Data Discovery: Screening

Data Access Restrictions

FERPA, HIPAA, Privacy Act, State-Level Regulations

LAWYERS!

Brainstorming Data Sources

Brainstorming for data source identification is a structured group creativity technique that brings together diverse participants to generate a wide range of ideas about where and how to find relevant data for a specific analytical policy problem. Facilitated sessions encourage participants to build on each other's suggestions, explore both conventional and unconventional sources, and use visual tools like mind maps or affinity diagrams to organize and expand upon the ideas generated. By deferring judgment and fostering an open, collaborative environment, brainstorming helps uncover data sources that may otherwise be overlooked, ensuring a more comprehensive and innovative foundation for policy analysis. This process not only stimulates creative thinking but also enables teams to quickly surface, evaluate, and prioritize potential data sources, supporting evidence-based decision-making.

Suggested Plan for Group Brainstorming to Identify Data Sources for Analytical Policy Analysis

Purpose:
To systematically identify relevant, reliable, and available data sources that will inform an analytical policy analysis, using structured group brainstorming techniques.


1. Preparation

  • Define the Policy Problem:
    Clearly articulate the policy issue, objectives, and the key questions that the analysis must address. This focuses the brainstorming on data relevant to the analytical goals[5][3].

  • Assemble a Diverse Group:
    Include staff, key partners, stakeholders, and, where possible, community representatives who bring varied perspectives and knowledge about potential data sources[1][3].

  • Gather Materials:
    Prepare chart paper, markers, digital collaboration tools (if remote), and any relevant background information or previous data lists[1].


2. Structured Brainstorming Session

A. Introduction (10 minutes)

  • Brief the group on the policy problem, objectives, and the importance of identifying comprehensive data sources.
  • Outline the session’s structure and encourage open, judgment-free idea sharing[2].

B. Individual Idea Generation (10 minutes)

  • Ask participants to spend a few minutes individually listing all potential data sources they can think of, considering both traditional (e.g., government reports, surveys) and non-traditional (e.g., social media, administrative data, stakeholder feedback) sources[4][5].

C. Group Sharing and Expansion (20 minutes)

  • Go around the group, with each participant sharing one idea at a time, while a facilitator records all suggestions on a visible board or digital platform.
  • Encourage building on others’ ideas and exploring less obvious sources, such as competitor data, open data portals, or community-generated information[2][4].

D. Prompted Exploration (10 minutes)

  • Use structured prompts to ensure breadth:
    • What quantitative data (e.g., statistics, metrics) is available?
    • What qualitative sources (e.g., interviews, case studies) could be useful?
    • Are there relevant data from other sectors, regions, or countries?
    • What unpublished or informal data might exist within organizations or communities[5][1]?

3. Prioritization and Gap Analysis

  • Cluster and Categorize:
    Group similar data sources and categorize them (e.g., administrative, survey, big data, qualitative)[5][4].

  • Assess Relevance, Availability, and Reliability:
    Quickly evaluate each source for its relevance to the policy question, accessibility, and trustworthiness[5].

  • Identify Gaps:
    Note areas where data is missing or insufficient, which may require new data collection or alternative approaches[5][3].
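The quick assessment and gap check in this step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed scoring method: the 1-to-5 rating scale, the weights, the category labels, and the example sources are all assumptions made up for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CandidateSource:
    name: str
    category: str       # e.g., "administrative", "survey", "big data", "qualitative"
    relevance: int      # assumed scale: 1 (low) to 5 (high)
    availability: int   # same assumed scale
    reliability: int    # same assumed scale


def prioritize(sources, weights=(0.5, 0.25, 0.25)):
    """Return sources sorted by a weighted score, highest first.

    The weights (relevance, availability, reliability) are illustrative only.
    """
    w_rel, w_avail, w_reliab = weights

    def score(s):
        return (w_rel * s.relevance
                + w_avail * s.availability
                + w_reliab * s.reliability)

    return sorted(sources, key=score, reverse=True)


def find_gaps(sources, needed_categories):
    """List the needed categories that no brainstormed source covers."""
    covered = {s.category for s in sources}
    return sorted(set(needed_categories) - covered)


# Hypothetical output of a brainstorming session.
candidates = [
    CandidateSource("Social media feed", "big data", 2, 4, 1),
    CandidateSource("State administrative records", "administrative", 5, 2, 4),
    CandidateSource("Household survey microdata", "survey", 3, 5, 3),
]

ranked = prioritize(candidates)
gaps = find_gaps(candidates,
                 ["administrative", "survey", "big data", "qualitative"])

for s in ranked:
    print(s.name)
print("Gaps:", gaps)
```

In practice the group would agree on the ratings together; the point of the sketch is only that a shared, explicit scoring rule makes the prioritization repeatable.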


4. Documentation and Next Steps

  • Record All Outputs:
    Document the full list of suggested data sources, including notes on their strengths, limitations, and any follow-up actions needed (e.g., verifying access, seeking permissions)[1][5].

  • Feedback and Validation:
    Circulate the compiled list to participants and other stakeholders for validation and additional suggestions[3].

  • Plan for Data Collection:
    Develop a work plan for obtaining, processing, and analyzing the prioritized data sources as the next step in the policy analysis process[3][4].


Summary Table: Key Steps

Step | Purpose | Example Activities
Preparation | Focus and organize the session | Define problem, assemble group
Brainstorming | Generate diverse data source ideas | Individual listing, group sharing
Prompted Exploration | Ensure breadth and depth | Use prompts for types and sectors
Prioritization & Gap Analysis | Focus on actionable, relevant sources | Cluster, assess, identify gaps
Documentation & Next Steps | Ensure follow-through and completeness | Record, validate, plan data collection

Tips for Success:

  • Use visuals (charts, sticky notes, digital boards) to keep ideas visible and spark further creativity[2].
  • Encourage participation from all group members, including quieter voices.
  • Consider follow-up sessions or online surveys to capture additional ideas after the meeting[1].

This structured, participatory approach ensures a comprehensive and relevant mapping of data sources, laying a strong foundation for evidence-based policy analysis[1][5][3].

Sources

[1] [PDF] Participatory Policymaking - Goldman School of Public Policy https://gspp.berkeley.edu/assets/uploads/page/GSPP_Participatory_Policy_Toolkit_Version_1.pdf
[2] Brainstorming with Data: How to Turn Insights into Innovation https://www.newhorizons.com/resources/blog/data-driven-brainstorming-strategies
[3] Turn Data into policy https://www.datatopolicy.org/navigator/turn-data-into-policy
[4] Strategic Policy Development Through Data-Driven Insights - LinkedIn https://www.linkedin.com/pulse/strategic-policy-development-through-data-driven-insights-bryce-undy-qu3ef
[5] [PDF] Guide to Policy Analysis | ETF (europa.eu) https://www.etf.europa.eu/sites/default/files/m/72B7424E26ADE1AFC12582520051E25E_Guide%20to%20policy%20analysis.pdf
[6] 8 Best Practices for Mastering Data-Driven Strategy - 180ops https://www.180ops.com/blog/best-practices-for-mastering-data-driven-strategy
[7] Nominal Group Technique (NGT) - ASQ https://asq.org/quality-resources/nominal-group-technique
[8] [PDF] Basic Methods of Policy Analysis and Planning http://surjonopwkub.lecture.ub.ac.id/files/2019/01/Basic_Methods_of_Policy_Analysis_and_Planing.pdf
[9] Eliciting patient-important outcomes through group brainstorming https://pmc.ncbi.nlm.nih.gov/articles/PMC6360192/
[10] 30 Effective Brainstorming Techniques for Teams To Try | Indeed.com https://www.indeed.com/career-advice/career-development/brainstorming-techniques
[11] Foundations of Policy Analysis | Intro to Public Policy Class Notes https://library.fiveable.me/introduction-to-public-policy/unit-4/foundations-policy-analysis/study-guide/crUWSnTqaamH1mro
[12] Brainstorming Sessions: Agenda Template + Best Practices https://www.wudpecker.io/blog/brainstorming-sessions-agenda-template-best-practices
[13] Phase 3: Collecting and Analyzing Data - NACCHO https://www.naccho.org/programs/public-health-infrastructure/performance-improvement/community-health-assessment/mapp/phase-3-the-four-assessments
[14] Analytical techniques | College of Policing https://www.college.police.uk/app/intelligence-management/analysis/analytical-techniques
[15] 7 Brainstorming Rules for Stronger Collaboration - IDEO U https://www.ideou.com/blogs/inspiration/7-simple-rules-of-brainstorming
[16] Top Policy Analysis Tools for Better Public Policy - Number Analytics https://www.numberanalytics.com/blog/policy-analysis-tools-public-policy
[17] Brainstorming in Design Thinking: Best Practices & Challenges https://voltagecontrol.com/blog/brainstorming-in-design-thinking-best-practices-challenges/
[18] [PDF] A Framework for Analyzing Public Policies: Practical Guide http://www.ncchpp.ca/docs/Guide_framework_analyzing_policies_En.pdf
[19] [PDF] Public Policy Analysis - Political Science - University of Florida https://polisci.ufl.edu/wp-content/uploads/sites/147/PUP6009-Robbins-1-2.pdf
[20] [PDF] Structured Analytic Techniques for Improving Intelligence Analysis ... https://www.stat.berkeley.edu/~aldous/157/Papers/Tradecraft%20Primer-apr09.pdf

Data Source Inventory

Following an initial screening inventory, a subset of the sources is selected for a full inventory.

Example from a recent project:

Full Inventory Process

  • Description/Features

    • What is the temporal nature of the data: longitudinal, time-series, or one time point?
    • Are the data geospatial? If yes, at what level, (e.g., census tracts, coordinates)?
  • Metadata

    • Is there information available to assess the transparency and soundness of the methods used to gather the data for our purposes, (i.e., supplementing the census)?
    • Is there a description of each variable in the source along with their valid values?
    • Are there unique IDs for unique elements that can be used for linking data?
    • Is there a data dictionary or codebook?
  • Selectivity

    • What unit is represented at the record level of the data source, (e.g., person, household, family, housing unit, property)?
    • Does this universe match the stated intentions for the data collection? If not, what has been included or excluded and why, (e.g., do the data exclude certain individuals due to the way the data are collected)?
    • What is the sampling technique used (if applicable, e.g., convenience, snowball, random)?
    • What is the coverage, (e.g., response rate)?
  • Stability/Coherence

    • Were there any changes to the universe of data being captured (including geographical areas covered) and if so what were they, (e.g., changed the geographical boundaries of census tracts)?
    • Were there any changes in the data capture method and if so what were they, (e.g., revised questions, data collection mode, classification categories, algorithms for social media data)?
    • Were there any changes in the sources of data and if so what were they, (e.g., data were reported by teachers in 2010 and reported by principals in 2011; used Current Population Survey in 2011 and American Community Survey in 2012)?
  • Accuracy

    • Are there any known sources of error, (e.g., missing records, missing values, duplications, erroneous inclusions)?
    • Describe any quality control checks performed by the data's owner, (e.g., deleted duplicates, checked for recording errors).
  • Accessibility

    • Are any records or fields collected, but not included in the data source, such as for confidentiality reasons, (e.g., does not include any student files in which there are fewer than 5 students in a category)?
    • Is there a subset of variables and/or data that must be obtained through a separate process, (e.g., state level data openly available, but one must apply to get census tract)?
    • If yes, are there separate legal, regulatory, or administrative restrictions on accessing the data source?
    • Cost? Is it a one-time, annual, or project-based payment?
  • Privacy and security

    • Was consent given by the participant? If so, how was consent given, (e.g., online form, in-person discussion)?
    • Are there legal limitations or restrictions on the use of the data, (e.g., Family Educational Rights and Privacy Act - FERPA)?
    • What confidentiality policies are in place, (e.g., cannot share data outside of requesting institution; does not include personally identifiable information)?
  • Research

    • What research has been done with this dataset, (e.g., impact of policies, predictors of student success, housing stock inventory assessment)?
    • Include any links to research if provided.
    • List any other data use notes provided by the supplier.

Data Source Screening

The first step, before conducting a full data inventory, is to screen the data sources, identifying which sources are worthy of a deeper look and which are worthy of consideration for profiling. The screening includes five questions and a qualitative evaluation of purpose, data collection method, selectivity, accessibility, and description.

Example from a recent project:

Screening Inventory Process

1. Are the data collected opinion-based, (e.g., people’s attitudes, preferences)?
2. Is the data collection recurring, (i.e., must be collected at least annually)?
3. Are there data available for 2013?
4. Geographic granularity

  • For Education

    1. Are the data collected at least at the school level?
    2. Can the data be linked to other education/workforce datasets, (e.g., K-12, higher education, workforce)?
    3. If this is a state dataset, how are school districts defined within this state?
    4. If applicable, what types of schools does it cover, (e.g., public, private, charter)?
  • For Housing

    1. Are the data collected at the property or housing unit level?
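One way to apply the screening questions consistently across many candidate sources is to record the answers in a small structure and check them against an explicit rule. The sketch below is illustrative only. The pass rule (not opinion-based, recurring, available for 2013, and collected at an acceptable geographic level) is an assumption made for the example, as are the field names and the two example sources; a real project would set its own rule.

```python
from dataclasses import dataclass


@dataclass
class ScreeningAnswers:
    name: str
    opinion_based: bool     # Q1
    recurring: bool         # Q2: collected at least annually
    available_2013: bool    # Q3
    finest_geography: str   # Q4: e.g., "school", "district", "property", "state"


def passes_screening(answers, acceptable_geographies):
    """Apply an ASSUMED pass rule; adjust to the project's actual criteria."""
    return (not answers.opinion_based
            and answers.recurring
            and answers.available_2013
            and answers.finest_geography in acceptable_geographies)


# Hypothetical education sources.
sources = [
    ScreeningAnswers("School enrollment file", False, True, True, "school"),
    ScreeningAnswers("Parent attitude survey", True, False, True, "state"),
]

shortlist = [s.name for s in sources
             if passes_screening(s, acceptable_geographies={"school"})]
print(shortlist)
```

Sources that fail the rule are not discarded; the answers stay on file so the decision can be revisited if the study's needs change.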

Additional Screening Information

  • Purpose

    • What is the purpose of the organization collecting the data, (e.g., the Virginia Department of Education (VDOE) coordinates education for the state and makes policy recommendations)?
    • Why are the data collected and how does the organization use the data, (e.g., VDOE collects the data for administrative purposes to assess student and school progress and to inform school policies)?
    • Who else uses these data, (e.g., businesses, policy-makers, citizens, researchers)?
    • Who do they sell the data to, (e.g., Zillow for individual homeowners, CoreLogic for multiple uses, business for economic development, Chief Economists at trade associations)?
  • Method

    • What is the data collection method, (e.g., paper questionnaire, operator entry, online survey, interview, sensors, algorithms for creating datasets from Twitter feeds)?
    • What is the type of data collected, (e.g., designed collection, intentional observation, administrative data, digital data)?
    • If designed, who created the questions, (e.g., government, researchers, private business)?
    • What are the raw sources of the collected data prior to any aggregation, (e.g., self-report, third party)?
  • Description

    • What is the general topic of the data, (e.g., student learning, housing quality)?
    • What are the earliest and latest dates for which data are available, (e.g., 1995-2005)?
  • Timeliness

    • Are the data collected and available periodically, (e.g., every year or decade)?
    • How soon after a reference period ends can a data source be prepared and provided, (e.g., one year)?
  • Selectivity

    • What is the universe (e.g., population) that the data represent, (e.g., students who attended public school in Virginia in 1995)?
  • Accessibility

    • How are the data accessed, (e.g., API, downloaded - csv, txt, etc.)?
    • Are they open data?
    • Any legal, regulatory, or administrative restrictions on accessing the data source?
    • Cost? Is it one-time or annual or project-based payment?
    • Describe any gaps/concerns you see with this dataset
  • Does this dataset appear to meet the needs of your study? Yes/No

Data Linkage Restrictions

Do common IDs and/or identifiable demographics exist for deterministic and probabilistic record linkage?
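The difference between the two linkage types can be shown with a small Python sketch. Deterministic linkage joins records on an exact shared ID. Probabilistic linkage, shown here as a simplified Fellegi-Sunter style weight, scores agreement across demographic fields when no shared ID exists. The records, field names, m/u probabilities, and threshold below are all made-up assumptions for illustration; real values would be estimated from the data.

```python
import math

# Hypothetical records from two sources; "id" is missing for some rows in B.
records_a = [
    {"id": "A1", "first": "maria", "last": "lopez", "birth_year": 1990},
    {"id": "A2", "first": "john", "last": "smith", "birth_year": 1985},
]
records_b = [
    {"id": "A1", "first": "maria", "last": "lopez", "birth_year": 1990},
    {"id": None, "first": "john", "last": "smith", "birth_year": 1985},
    {"id": None, "first": "john", "last": "smyth", "birth_year": 1972},
]


def deterministic_links(a_rows, b_rows, key="id"):
    """Link records whose shared ID is present and exactly equal."""
    index = {r[key]: r for r in b_rows if r[key] is not None}
    return [(a, index[a[key]]) for a in a_rows if a[key] in index]


# ASSUMED per-field probabilities:
#   m = P(fields agree | records are a true match)
#   u = P(fields agree | records are not a match)
M_U = {"first": (0.95, 0.05), "last": (0.95, 0.02), "birth_year": (0.98, 0.03)}


def match_weight(a, b):
    """Sum log-likelihood-ratio weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if a[field] == b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def probabilistic_links(a_rows, b_rows, threshold=8.0):
    """Link each A record to its best-scoring B record if above the threshold."""
    links = []
    for a in a_rows:
        best = max(b_rows, key=lambda b: match_weight(a, b))
        if match_weight(a, best) >= threshold:
            links.append((a, best))
    return links


det = deterministic_links(records_a, records_b)
prob = probabilistic_links(records_a, records_b)
print(len(det), "deterministic link(s);", len(prob), "probabilistic link(s)")
```

Here the shared ID links only one pair, while the field-agreement weights also recover the second pair. Whether either approach is permitted at all depends on the linkage restrictions attached to each source.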

Snowballing

Snowballing is a simple process of expanding the zone of contacts through initial contacts. The process begins by identifying an initial group of data stakeholders, ideally those who are already involved in the preliminary stages of the process. These actors or participants are then asked to identify individuals whom they feel should be involved in the data discovery process as well. This is the “first-order” zone. The researcher then contacts those actors (whether individuals or groups) and asks these “second-order” actors to identify others who they think would have an interest in the project or process (Wasserman and Faust, 1994: 34; see also Goldenberg, 1992; Babbie, 1998; Doreian and Woodward, 1992).

Suggested Plan for Using Snowballing to Identify Data Sources for Analytical Policy Analysis

1. Define the Scope and Criteria

  • Clearly articulate the policy question and the types of data sources needed (e.g., administrative records, survey data, expert reports, grey literature).
  • Set inclusion and exclusion criteria for what constitutes a relevant data source, considering factors such as data quality, accessibility, and relevance to the policy issue[3][5].

2. Identify Initial “Seed” Sources

  • Start with a small set of known, high-quality data sources or experts in the policy area. These are your “seeds”[5][8].
  • These may include foundational reports, key datasets, or recognized experts and organizations relevant to the policy topic.

3. Conduct Initial Outreach and Data Mapping

  • Review the reference lists, bibliographies, and acknowledgments of the seed sources to identify additional data sources (backward snowballing).
  • Contact the authors, data custodians, or experts associated with the seed sources to ask for recommendations of other relevant data sources or contacts (forward snowballing)[5][8].

4. Expand the Network Iteratively

  • For each newly identified data source or expert, repeat the process: review their references and seek further recommendations.
  • Continue this iterative process in multiple “waves” until new referrals yield little or no new information (saturation point)[2][8].
  • Track relationships and connections between sources to identify clusters or gaps in coverage[8].
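The iterative waves described above amount to a breadth-first expansion that stops when a wave produces no new names. The following Python sketch shows that logic under stated assumptions: `get_referrals` stands in for the real outreach step (emails, calls, reading reference lists), and the organizations in the example are hypothetical.

```python
def snowball(seeds, get_referrals, max_waves=10):
    """Expand from seed sources in waves until no new sources appear.

    get_referrals(source) returns the sources or contacts that `source`
    recommends. Returns the set of all sources found and the list of
    (referrer, referred) pairs, which can be used to spot clusters or gaps.
    """
    known = set(seeds)
    wave = list(seeds)
    edges = []
    waves_run = 0
    while wave and waves_run < max_waves:
        next_wave = []
        for source in wave:
            for referred in get_referrals(source):
                edges.append((source, referred))
                if referred not in known:   # only new names start a new wave
                    known.add(referred)
                    next_wave.append(referred)
        wave = next_wave                    # an empty wave means saturation
        waves_run += 1
    return known, edges


# Hypothetical referral answers collected during outreach.
referrals = {
    "State education agency": ["District data office", "University research center"],
    "District data office": ["State education agency", "Regional workforce board"],
    "University research center": ["Regional workforce board"],
    "Regional workforce board": [],
}

found, edges = snowball(["State education agency"],
                        lambda s: referrals.get(s, []))
print(sorted(found))
print(len(edges), "referral links recorded")
```

The `max_waves` cap is a practical safeguard for the sketch; in a real project the stopping point is a judgment about when new referrals stop adding useful information.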

5. Document and Assess Sources

  • Maintain a detailed log of all identified sources, how they were found, and their relevance to the policy analysis.
  • Regularly assess the quality, credibility, and diversity of the sources being identified to ensure a comprehensive and balanced data set[5][8].

6. Ethical Considerations

  • Ensure that all outreach respects privacy and consent, especially if contacting individuals or using unpublished data[1][4][8].
  • Clearly explain the purpose of the inquiry and obtain permission before sharing contact details or unpublished information.

7. Review and Finalize

  • Once saturation is reached and no significant new sources are emerging, review the full set of identified data sources.
  • Assess for completeness, diversity, and potential biases, and supplement with targeted searches if necessary.

This structured snowballing approach leverages expert networks and reference chains to systematically uncover both well-known and obscure data sources, providing a robust foundation for analytical policy analysis[5][8].

Sources

[1] Snowball Sampling - Division of Research and Innovation https://research.oregonstate.edu/ori/irb/policies-and-guidance-investigators/guidance/snowball-sampling
[2] What Is Snowball Sampling? | Definition & Examples - Scribbr https://www.scribbr.com/methodology/snowball-sampling/
[3] Snowball Sampling: How to Do It and Pros & Cons - InnovateMR https://www.innovatemr.com/insights/snowball-sampling-how-to-do-it-and-pros-and-cons/
[4] Guidelines for Investigators Using Snowball Sampling Recruitment ... https://www.boisestate.edu/research-compliance/irb/guidance/guidelines-for-investigators-using-snowball-sampling-recruitment-methods/
[5] Snowball sampling - Wikipedia https://en.wikipedia.org/wiki/Snowball_sampling
[6] [PDF] Snowball Research Strategies https://sru.soc.surrey.ac.uk/SRU33.PDF
[7] Snowball Sampling: Explanation, Examples, Pros, and Cons - Dovetail https://dovetail.com/research/snowball-sampling/
[8] What Is Snowball Sampling: 6 Simple Steps With Examples https://surveysparrow.com/blog/snowball-sampling/
[9] Snowball Sampling Method: Techniques & Examples https://www.simplypsychology.org/snowball-sampling.html
[10] Snowball Sampling: Introduction - Johnson - Wiley Online Library https://onlinelibrary.wiley.com/doi/abs/10.1002/9781118445112.stat05720

Data Redistribution Restrictions

Is anybody other than you allowed to work with the data?

What additional transformations/protections are required before data can be redistributed?
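One common protection applied before redistribution is small-cell suppression, in which counts below a minimum size are withheld. The Python sketch below illustrates the idea only. The threshold of 5 echoes the student-file example in the full inventory, but the actual threshold and any further requirements depend on the governing law and data use agreement; the category names and counts are invented.

```python
def suppress_small_cells(counts, threshold=5):
    """Replace counts below the threshold with None so they are not published.

    This is primary suppression only. If row or column totals are also
    released, additional (complementary) suppression may be needed so the
    withheld values cannot be recovered by subtraction.
    """
    return {category: (n if n >= threshold else None)
            for category, n in counts.items()}


# Hypothetical counts of students by program in one school.
raw_counts = {"Program A": 42, "Program B": 3, "Program C": 5}

safe_counts = suppress_small_cells(raw_counts)
print(safe_counts)
```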

Gaining Entry and Building Trust

More often than not, gaining entry and building trust needs to be the primary focus of your initial data-gathering efforts.