Abstract
The technical challenges of accessing large administrative datasets can be readily addressed given advances in data security, computational resources, and the Internet. The most vexing barriers are legal and ethical issues and the control of the data by the agencies that generate them. This article describes those issues and argues that partnerships with data providers are necessary not only to facilitate access for researchers, both inside and outside government, but also to provide benefits to the data providers themselves in the form of evidence, research, and information. Ultimately, all stakeholders need training in the secure and responsible use of data and in appropriate data stewardship to support the increased use of administrative data that is required to develop evidence that will have an impact on government services and programs for individuals and families.
Keywords
While it was not always so, it is now generally accepted that administrative data are a valuable resource for rigorous research and evaluation (Card et al. 2010; NSF 2007).1
Administrative datasets can contain data on the outcomes of program participation and on the characteristics of all program participants. They can contain identifying information for both individuals and organizations, as well as data that are sensitive and confidential because the characteristics of specific individuals can be known. Because outcome and program data are not always in the same dataset, multiple administrative datasets need to be combined, or administrative data need to be linked to survey data, for administrative data to become a powerful tool for rigorous, large-scale, cost-effective research. The federal government has recognized the potential value of administrative data by asking the Commission on Evidence-Based Policymaking to determine the optimal arrangement under which administrative data on federal programs and tax expenditures, survey data, and related statistical data series may be integrated and made available to facilitate program evaluation, continuous improvement, policy-relevant research, and cost-benefit analyses by qualified researchers and institutions.2
The creation of the commission recognizes that the current state of affairs is not optimal. Administrative data are often not easily available to researchers either inside or outside of government, as there are few arrangements through which the data flow easily to researchers and research organizations. Much has been written and said about the barriers to accessing such data from states for research, policy, and program purposes (Goerge 1997; Hotz, Goerge, et al. 1998; U.S. General Accounting Office 1992). However, even as the federal government states that “the increased use of administrative data for statistical purposes can generate a range of benefits,” little has been done to ensure that a primary collector of such data, state agencies, has the incentives or resources to improve data use for research and evaluation.3 The premise of this article is that, to address the barriers, states need to be more closely engaged in the analysis of their data for their own benefit.
State agencies are not required to provide access to any researcher. These data are collected primarily for operational purposes, whether determining eligibility for programs, tracking benefits paid to program participants, paying providers of services, or supporting frontline case management: the management of prisoners, patients, foster children, students, and others for whom the state has responsibility. These systems track data on the human resources employed by the state and on services provided by nongovernmental organizations (NGOs). They compile information on taxes paid by individuals and businesses. They collect data on births, deaths, disabilities, and diseases for public health and other purposes. Using administrative data for research, whether by analysts inside or outside of government, is a low priority relative to these primary uses. Until 30 years ago, researchers in the domains of employment, criminal and juvenile justice, human services, higher education, early childhood education, and health care used either survey methods or paper record extraction to compile data for their work. Using administrative data for research is still not routine, but it is becoming more commonplace as elected officials and the public increasingly expect states to be data-driven. Even government staff trained in research at universities may not be familiar with the notion of using administrative data for research.
There are multiple stakeholders concerned with improving access to administrative data from state agencies. Each can affect access as a provider, user, or facilitator of data use. As the controllers of the data, state agencies are the primary stakeholder. Researchers and research organizations increasingly value administrative data for their research. This article focuses on the interests of state agency leaders, as the primary decision-makers about access, and of researchers, as the primary users of state administrative data. Aligning the interests of these two groups would benefit both. Closer collaboration and greater communication would help to address the barriers and change what is often an adversarial process between the holders and users of the data.
Others clearly have a special interest and can be instrumental in improving the secure and effective use of a specific set of administrative data. As the funder of many programs and of the collection of much of the data, the federal government has an interest in the quality of federal program management and the implementation of policy, including the protection of human subjects. Governors and state agencies other than the one holding the data also have a stake, because they may need administrative data to evaluate and improve other state programs. Advocates may be interested in analyses of the data because such analyses can show where classes of individuals may require protection under the law. Advocates, along with the agencies that collect the data, are also protectors of individuals' privacy. Class action lawsuits have been both a positive and a negative force in the use of administrative data.4 Finally, the public has an interest in such data: individuals can be served by better use of data but can also be harmed by breaches or other improper uses, which is why data security and responsible use are of the utmost importance.
This article describes the barriers and risks, but primarily focuses on how state agencies may benefit when researchers and research organizations have increased access to state administrative data to improve the effectiveness of state programs. A less appealing and likely politically infeasible alternative, which would benefit the federal government, is to require state and local agencies to provide data to researchers to evaluate federally funded programs. Cataloging the many strands of public and political pushback against such a requirement is beyond the scope of this article. The focus here is on the better option of optimizing the benefits of increased access for all stakeholders.
While this article focuses on researcher access to data, both inside and outside of government, other stakeholders seeking access to an agency’s data face similar issues. The barriers are similar, and the potential approaches to addressing them are as well.
This article argues for the creation and use of curated longitudinal administrative data that can be linked at the individual level with programmatic and outcome data in a secure, legal, and efficient manner. Access to these data should be governed by data sharing agreements (DSA) or memoranda of understanding (MOU) that describe the conditions under which researchers could access the administrative data. Other data, such as survey data, could be linked to longitudinal administrative data to enhance both datasets. Ideally, data across states could be linked.
The approach I advocate centers on state agency leadership as a driving force for the increased analysis of their data to improve federal, state, and local policy and programs. I discuss the current circumstances around accessing state administrative data in terms of barriers to access, then present approaches that would facilitate the goal of accessing the data necessary to improve the well-being of U.S. residents.
Barriers to Access
This section discusses the barriers to accessing administrative data. Both the need for such data and concerns about how these data are protected are increasing. The final report of the Commission on Evidence-Based Policymaking (2017) states that “the American public will be concerned about exactly how those data are being used and whether the privacy and confidentiality of individuals and organizations are being protected” (p. 8). And as the report calls for additional evidence to support policymaking, it also states that “these administrative data, collected in the first instance to serve routine program operation purposes, also can be used to assess how well programs are achieving their intended goals” (p. 9).
The concerns that key stakeholders have around privacy and confidentiality as they protect the public can act as barriers to access and to the increased use of administrative data. Organizations providing administrative data to third parties must be confident that the data will be kept secure and not disclosed for purposes beyond the explicit purposes of evidence-building.
It is important to point out that getting permission to access the data and actually accessing the data are two different activities and should likely be the job of two different units of an organization. The capacity of state agencies to produce a dataset varies tremendously. Some can produce one in days if they regularly curate their own data and the data fit the needs of the researchers. At the other extreme, there may be no one whose role includes the creation of appropriate datasets, and producing one could mean months of delay.
Risk aversion
Although state agency data are sometimes accessed for external research and other purposes, the controllers of the data, state agency leaders, are often either reluctant to provide data to researchers or simply have not made such use a priority. Their reluctance is reasonable, because the results could harm the agency or a particular leader. Although good data sharing agreements can ensure that a state agency is able to review interpretations of results and provide input, even in these cases a report may negatively affect the funding for a program and, more importantly, the vulnerable individuals and families who rely on the program. Often, the reaction to a critical evaluation is not to improve the program but to defend it through political or anecdotal arguments.
Inadequacy of federal datasets compiled from state data
State agencies do provide data to other agencies, particularly federal agencies, when they are required to do so. This compliance, though, requires significant state resources. In theory, these data, held by federal agencies, could be used for research and evaluation, just as federally collected data such as tax data have been shown to be an excellent resource for research. However, although federal agencies receive these data from states, the data are often subject to significant restrictions on reuse for other purposes (e.g., research). They either cannot be used for general research or evaluation purposes because of federal law (e.g., the National Directory of New Hires at the U.S. Department of Health and Human Services [HHS]) or require state permission to be used for research or evaluation (e.g., unemployment insurance wage data at the U.S. Census Bureau). Addressing these restrictions, particularly when the federal government has an interest in knowing which programs are working for whom, would be a step forward, especially if states benefit from the increased ability to reuse data. However, the legislative change that would be required in many cases is slow to come, and in some cases there have been steps backward. Recently, the Child Care and Development Fund program (HHS) removed the requirement to provide personally identifiable information (PII), specifically Social Security numbers, to HHS for parents and children participating in the program.5 Another example of such restrictions is the unemployment insurance wage data that the U.S. Census Bureau receives under the Longitudinal Employer-Household Dynamics program: in most cases, the Census Bureau cannot use the state data for other purposes without the state's permission.
Is it legal?
Some state agency leaders believe that it is not legal for them to share data with researchers because, in several programs, administrative data may be used only to improve or benefit the administration of the program. Such restrictions apply to research conducted with administrative data from Temporary Assistance for Needy Families (TANF), Medicaid, the Supplemental Nutrition Assistance Program (SNAP), and education programs. For example, Medicaid data can be used to improve the administration of the Medicaid program. Education data with PII can be used for “conducting studies for, or on behalf of, schools, school districts, or postsecondary institutions.”6,7 Internal Revenue Service (IRS) data can be used for research if the research contributes to tax administration. Clearly, whether research supports the administration of a program is open to interpretation. The onus is on the parties to justify that the use of administrative data does inform the administration of the program. An evaluation of a program should improve the administration of the program in some way.
The most straightforward way for a research organization to access administrative data is through a contract between the state agency and the research organization. States may also participate in national evaluation efforts whose intention is to provide evidence on the effectiveness of a new program or policy. External researchers are often part of these efforts, as national evaluators or as evaluators for specific states. That said, both this form of access and contracts are restrictive in that the product is controlled by the state or federal agency and the administrative data can be used only for the specific evaluation purpose.
What is the benefit?
States do provide data to researchers when they see that there is a benefit to a state agency. That these data should improve program administration can be the basis for overcoming all other barriers—it changes the premise under which state agencies and researchers are discussing the reasons to provide access to researchers. The requests for data should always include how states would benefit from the research. If researchers cannot describe the benefit for a state providing the data, they should perhaps not get access.
Defining the legal relationship
As mentioned above, state administrative data sharing requires a data sharing agreement (DSA or MOU). Once a researcher reaches the stage of creating a DSA with a state agency, some of the barriers will have been addressed. Although templates have been developed, they are seldom used, and DSAs are often created from whole cloth. Good DSAs clearly outline the duties and interests of each party; give the data providers the ability to review and comment on draft reports; clearly grant permission for specific research activities or describe how permission for each activity is obtained; have a duration beyond a single project; name official contacts for all parties; and specify data security requirements. Other features may include the authorizing rules, regulations, or legislation; detailed descriptions of the data to be shared; and lists of the individuals who will access the data. Any component of a data sharing agreement can become a barrier to accessing and using data, because there may be contention over any of these issues.
Once a DSA is in place, the path to data access may be clearer, even if the process does not move quickly. Even with a DSA, state agencies can decide that addressing a particular research question is not in their interest and can employ language in the DSA to reject a particular project.
It comes down to trust
Currently, the core of successful state administrative data sharing is trust between the agency and the individual researcher or research organization. State agency leaders and staff must trust researchers (and other external entities) to protect data from breaches, not to disclose data or preliminary findings to other organizations (researchers, media, advocacy groups) without explicit permission, and not to pursue research that is unknown to the agency. They must also trust researchers to protect human subjects and follow institutional review board (IRB) regulations where applicable, to understand their programs and their data, to facilitate the state's review of methods and findings, to employ the most rigorous methods possible, to work with the state to understand the implications of the research, and to work in good faith to improve the state agency's programs. Ultimately, because state agency leaders have so much control, a lack of trust between them and researchers is a primary barrier to using administrative data.
The need for multiagency or program data
Because addressing a research question often requires the combination or linking of data across agencies, accessing data from multiple sources may be a barrier simply because of the need to convince multiple state decision-makers to share their data. It increases the number of “yes” answers needed to accomplish a goal. Many projects stumble because key datasets cannot be accessed.
Furthermore, state leadership may be concerned about researchers having data from multiple agencies. When data are combined from multiple sources, new data can be created that increase the potential costs of a breach in data security—for example, if combining school and health data results in a current street address of an individual with a disability. However, since such breaches are rare, perhaps a more pertinent concern for government leaders is what will be said about their program participants on topics for which they do not have data.
State agencies tend to be reluctant to share data with each other. Some of their concerns about sharing with other government agencies are similar to those that they have in sharing data with external researchers. In addition, there are sometimes concerns about other agencies “knowing their business.” The reality is that state agencies are often in competition for scarce resources, that their programs are at risk of being cut, that their staff may be reassigned, or that their authority over their operations is diminished as a result of information that is externally compiled. These factors sometimes actually benefit researchers since state agencies would rather share data with a third party than with each other when the benefit is clear or the risk is low. Again, however, it is difficult to disentangle the complex relationships between stakeholders.
Can the data be produced?
The capacity of states to provide data to external organizations is another potential barrier to using administrative data. Even if state leaders are supportive and a DSA can be finalized, a state agency's ability to physically provide data may be limited, because its staff may lack the technical capacity or the time. Many states do not create datasets that can be easily shared. If a file containing the data a researcher needs already exists, and the researcher can receive and work with it, accepting it in its existing format is often the most expedient route to getting the data to an external organization. Any special extracts or reformatting may delay access significantly or prevent it altogether.
Documentation
Similarly, researchers' capacity to help states provide what is needed, or to use the data provided, is a potential barrier. Numerous proposed projects end because researchers simply do not know what is available. This is partly due to a lack of documentation: states have sparse, if any, metadata to inform researchers about the contents or quality of the data. Because the raw data are typically used by only a few agency staff who work near each other, all the knowledge about a particular dataset may be “in their heads” or kept in documentation that is impossible for an outsider to understand. Field (variable) names made up of numbers and letters, such as “H05AB,” may be obvious to those who work with the data all the time but are foreign to a researcher who receives a dataset without documentation. The lack of documentation or metadata usually results in back-and-forth between agency staff and researchers about what is available and what is needed, and these conversations often cause further delays while staff pull together information. Often, the first dataset does not meet the researcher's specific needs, because of either its contents or the researcher's inability to use the data.
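Even minimal metadata can shorten the back-and-forth described above. The sketch below is a hypothetical illustration, not any state's actual practice: a small data dictionary maps coded field names to plain-language descriptions, and a helper flags fields in an extract that arrive undocumented. Apart from “H05AB,” which comes from the example above, every field name, description, and note is invented.

```python
from dataclasses import dataclass


# Hypothetical data dictionary entry; the fields shown below are invented
# for illustration and do not describe any real state system.
@dataclass(frozen=True)
class FieldDoc:
    name: str          # coded field name as it appears in the extract
    description: str   # plain-language meaning for an outside researcher
    dtype: str         # expected storage type
    notes: str = ""    # known quality issues, coding changes, etc.


DATA_DICTIONARY = {
    doc.name: doc
    for doc in [
        FieldDoc("H05AB", "Date case was opened", "date",
                 "Blank for cases opened before a system conversion"),
        FieldDoc("H05AC", "County of residence at case opening", "str"),
    ]
}


def undocumented_fields(columns, dictionary):
    """Return the columns in an extract that have no dictionary entry."""
    return sorted(c for c in columns if c not in dictionary)


def describe(column, dictionary):
    """Return a readable description of a coded field, or a placeholder."""
    doc = dictionary.get(column)
    if doc is None:
        return f"{column}: (undocumented)"
    return f"{column}: {doc.description} [{doc.dtype}]"


if __name__ == "__main__":
    extract_columns = ["H05AB", "H05AC", "H07ZZ"]  # H07ZZ is hypothetical
    for col in extract_columns:
        print(describe(col, DATA_DICTIONARY))
    print("Needs documentation:", undocumented_fields(extract_columns, DATA_DICTIONARY))
```

A list of undocumented fields like this could serve as the agenda for a single conversation with agency staff rather than several rounds of ad hoc questions.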
Skills of the researcher
States today often store their data in relational databases whose structure is optimized to support the administration of the relevant programs. The database is therefore likely to be highly normalized to support the speedy processing of individual transactions. However, this also means that the data a researcher is asking for may be spread across dozens, if not hundreds, of database tables. For example, one table may contain only the race of individuals in a program, while another contains only their gender. Either state programmers or the researcher must then de-normalize the data, which again delays the researcher's analysis. As mentioned above, any existing file that the agency uses to calculate statistics is often the quickest way to obtain most of the data a researcher might need.
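To make the de-normalization step concrete, here is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical three-table schema that mirrors the race/gender example above. The table names, columns, and rows are assumptions for illustration; a real agency database would have far more tables and its own naming conventions.

```python
import sqlite3

# Hypothetical normalized schema: race and gender live in separate tables
# keyed by a person identifier, as in the example in the text.
SCHEMA = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, program_code TEXT);
CREATE TABLE person_race (person_id INTEGER, race TEXT);
CREATE TABLE person_gender (person_id INTEGER, gender TEXT);
"""

# De-normalizing query: one row per person with attributes pulled from each
# table. LEFT JOINs keep people who are missing a race or gender record.
FLAT_QUERY = """
SELECT p.person_id, p.program_code, r.race, g.gender
FROM person AS p
LEFT JOIN person_race AS r ON r.person_id = p.person_id
LEFT JOIN person_gender AS g ON g.person_id = p.person_id
ORDER BY p.person_id
"""


def make_demo_connection():
    """Build an in-memory database with a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.executemany("INSERT INTO person_race VALUES (?, ?)",
                     [(1, "White"), (2, "Black")])
    # Person 2 deliberately has no gender record.
    conn.executemany("INSERT INTO person_gender VALUES (?, ?)", [(1, "F")])
    return conn


def build_flat_file(conn):
    """Return (person_id, program_code, race, gender) tuples, one per person."""
    return conn.execute(FLAT_QUERY).fetchall()


if __name__ == "__main__":
    for row in build_flat_file(make_demo_connection()):
        print(row)
```

One design caveat: if a person can have several race or gender records (for example, a history of updates), these joins return one row per combination, so a real extract would also need a rule for choosing among or combining those records. Deciding on such rules is part of why the work takes time.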
Cost
Obtaining the financial resources to process administrative data into research-ready datasets is another significant barrier. While it is often said that using administrative data is a less expensive way to obtain data than primary data collection, the extraction, transformation, and curation of administrative data is an expense that is not easily covered by traditional funding mechanisms. Several foundations have committed to funding such efforts, particularly in university settings, but as standards and best practices are developed, it is clear that sustainability is a challenge.8
It is often impossible for an external party to understand the internal decision-making process of an agency that is deciding whether to provide access to a researcher. The reason given for denying access may not be the actual reason access was denied. This lack of transparency makes it difficult to address barriers one by one. Most state agencies do not have clear policies regarding access to their publicly available data; this creates further barriers.
The next sections address promising approaches to overcoming these barriers, approaches that have worked in some places and have features that could be generalized to other contexts. I begin with a set of requirements that are necessary for both the state and the researcher to go forward.
Addressing Data Security
First and foremost, administrative data must be kept secure and protected from unauthorized access. Without the peace of mind that identified data will not be released or “hacked,” state leaders would not take the first step in providing external access. The good news is that the technology exists to keep the data secure and restrict access to only those who are fully vetted to view and process the data. State agencies are increasingly able to assess a researcher’s ability to keep the data secure. Technical and procedural safeguards must be implemented, maintained, updated, and then communicated to the owners of the data so that data security is no longer a barrier.
A few states and private research organizations have curated data for use by researchers. Best practices, meaning the policies and procedures that organizations must implement, are forming in response to federal laws and regulations such as the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and 42 Code of Federal Regulations (CFR) Part 2, which require specific policies and procedures to be in place.9 Data security is the topic of another article in this volume, but it is important for state administrators, who increasingly employ chief information security officers, to have confidence in research organizations' data security.
Opportunities
State agency leaders clearly face important programmatic and policy challenges that could be addressed by more analysis of their own data and, likely, of data from other local, state, or federal agencies. In a recent needs assessment, state agency leaders expressed the need to link to data outside of their agencies to better understand the characteristics of their program participants and the outcomes that they experience (Weigensberg et al. 2015).
It is difficult for them to pursue this when they cannot attract the workforce or financial resources both to acquire and curate high-quality data and to analyze the resulting databases. With a few exceptions, state agencies in the health, human services, education, and public safety areas are understaffed.
To improve researcher access to state and local agency data, the research and academic communities should help fill two gaps: agency leaders' need for more analysis and their agencies' lack of staff to do it. Researchers must show that their work can address the needs of state agencies and is not meant only for their peers. The academic community can train the current and future federal, state, and local agency workforce so that, at a minimum, these agencies can partner with researchers to make their data work for them. Ideally, government analysts could do more to meet their leaders' needs for analysis and help them realize its benefits.
Working to address state agencies' needs is typically not rewarded in university settings, because it often does not affect whether research is published. Most researchers have no incentive to maintain relationships in which they provide benefits to state agencies over a long period, because a couple of well-placed peer-reviewed journal articles can be sufficient for promotion. Perhaps the process of securing data from government agencies should be given the same credit in academic departments as designing primary data collection efforts.
Research organizations that have a mission to produce high-quality research that influences public policy, that depend on the cooperation of operating state agencies, and that rely on “soft” money have a greater incentive to maintain relationships with state agencies and to provide technical or research assistance over a sustained period. The quid pro quo is for these research organizations to receive administrative data from state agencies, often on a regular basis, so that the data can be maintained and their quality ensured.
A common complaint among state agency leaders is that they cannot get information from researchers quickly when they need it. University faculty often cannot provide information quickly: they rely on student research assistants and cannot drop everything to address a state agency leader's needs. Also, as noted above, this activity provides few benefits for faculty members. However, research centers such as the Institute for Research on Poverty at the University of Wisconsin, which curates Wisconsin state agency data, are able to respond quickly. State leaders are often extremely appreciative of whatever they can get when their own staff cannot provide the needed information. This capacity issue could be addressed by improving agencies' access to organizations that can provide a quick response.
The fluency of all stakeholders in the use of administrative data should be continually enhanced. Federal statistical agencies require annual training on handling sensitive data, on data security, and on the legal requirements and penalties associated with staff positions. This, while very important, is only a first step. With the rise of big data, expectations that government leaders will make use of their own data have increased. While being data-driven has been a goal for years, the informed general public, including advocates and the media, is asking why government is not making better use of its data.
Even when a state leader is willing to provide his or her agency's data for study, data from other state agencies may be needed for the research. Multiple efforts have clarified and facilitated the linkage of datasets across programs. Perhaps the best example is the use of unemployment insurance (UI) wage data to measure the impact of job training or other programs whose primary outcome is increased earnings, such as the TANF leaver studies conducted at the time of welfare reform (Cancian et al. 2002). Another example is a multistate study of the impact of the Great Recession on the use of UI and SNAP (Finifter and Prell 2013). Accessing data from multiple programs or agencies to build comprehensive datasets (integrated data systems, or IDS) is at least on the agenda of a few agencies in most states. The Department of Education's State Longitudinal Data Systems program, which operates in multiple states, is an example of a database-building initiative that can benefit researchers inside and outside of government.
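As a rough illustration of how UI wage records can be linked to program data to describe earnings, the sketch below sums each person's wages across employers by quarter and then averages earnings in the four quarters before and after program exit. It is a hypothetical, descriptive example rather than the method of any study cited here: the identifiers, integer quarter encoding, four-quarter window, and treatment of missing quarters as zero earnings are all assumptions, and a simple pre/post comparison is not by itself an estimate of program impact.

```python
from collections import defaultdict

# Hypothetical records. UI wage files are typically reported by employers per
# person per quarter; here quarters are encoded as consecutive integers purely
# for arithmetic convenience. All identifiers and amounts are invented.
WAGE_RECORDS = [
    # (person_id, quarter, wages)
    ("p1", 8000, 1200.0), ("p1", 8001, 1500.0),
    ("p1", 8004, 3000.0), ("p1", 8004, 500.0),  # two employers in one quarter
    ("p2", 8002, 2000.0),
]
PROGRAM_EXITS = {"p1": 8002, "p2": 8002}  # person_id -> quarter of program exit


def quarterly_earnings(records):
    """Sum wages across employers for each (person, quarter)."""
    totals = defaultdict(float)
    for person_id, quarter, wages in records:
        totals[(person_id, quarter)] += wages
    return totals


def pre_post_earnings(person_id, exit_quarter, totals, window=4):
    """Mean quarterly earnings in the `window` quarters before and after exit.

    The exit quarter itself is excluded. Quarters with no wage record count
    as zero earnings, a common but debatable assumption: the person may have
    worked out of state or in employment not covered by UI.
    """
    before = [totals.get((person_id, exit_quarter - k), 0.0) for k in range(1, window + 1)]
    after = [totals.get((person_id, exit_quarter + k), 0.0) for k in range(1, window + 1)]
    return sum(before) / window, sum(after) / window


if __name__ == "__main__":
    totals = quarterly_earnings(WAGE_RECORDS)
    for pid, exit_q in PROGRAM_EXITS.items():
        print(pid, pre_post_earnings(pid, exit_q, totals))
```

The zero-earnings assumption in the last function is where single-state data show their limits: a participant who moves and works in another state looks, in this calculation, like someone with no earnings.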
Strategies for Creating Infrastructures for Increased Data Sharing
There have been several efforts to share state administrative data with external researchers over the past three decades, although no one effort has been replicated to the extent that it has become a model for access. What can we learn from these efforts to build more generalized best practices around securing external state administrative data access?
State government examples
State governments have traditionally been inhospitable to researchers, although there are notable exceptions. The Washington State Department of Social and Health Services, since welfare reform; the Illinois Department of Children and Family Services; the New York State Office of Temporary and Disability Assistance; and the Florida Department of Education have (or have had) research units that operate very successfully within the bureaucratic structure. Does this internal success have implications for external data sharing? In these particular cases, it has facilitated data sharing with external research organizations in limited ways. Trusted researchers and research organizations have received data regularly to conduct multiple research projects. The Actionable Intelligence for Social Policy (AISP) project has, in recent years, brought together states that have built integrated data systems internally along with research organizations that have done the same. Within the AISP group, there is variation in how welcoming government agencies are to sharing data with external research organizations.
A state government–based model is the South Carolina Budget and Control Board (now the Revenue and Fiscal Affairs Office). After nearly 30 years of building data capacity across state agencies and programs, South Carolina has the capacity to provide datasets to researchers. South Carolina state agencies and qualified researchers can purchase data, at cost, from this office. Many researchers have benefited from this model.
Illinois recently launched the State Data Practice, housed in the Illinois Department of Innovation and Technology, which is intended to produce analytics that improve services to state program participants across the human services, health, public safety, and employment sectors. It has hired a data scientist and a data architect and will have access to an unprecedented set of data within Illinois. Its mandate is to work with state agencies to produce actionable results that improve the well-being of the state's residents.
In the past year, a few states have implemented "enterprise memoranda of understanding" (E-MOUs). 10 These E-MOUs are intended to facilitate data sharing among state agencies. They do not currently cover sharing data with external parties, although the states intend to extend them eventually. Because the E-MOUs are new, it remains to be seen whether they will be implemented well and will improve access to data within state government. The further hope is that they can serve as a model for improving access to data for external organizations and researchers.
Federal agency examples
The U.S. Census Bureau has implemented a different model over the past two decades (Johnson, Massey, and O'Hara 2014). The Census Bureau initially collected administrative data from federal agencies to address coverage issues in its count of the U.S. population. Today, its Center for Administrative Records and Research Applications (CARRA) collects state administrative data not only to improve coverage and data quality but also to support program evaluation. In most cases, state agencies retain some level of control over these data, as specified in agreements between each state agency and the Census Bureau. CARRA researchers and external researchers can access these administrative data for research purposes (Goerge et al. 2009; Meyer and Goerge 2011).
The U.S. Census Bureau, CARRA, and the IRS allow federal data covering multiple states to be accessed in certain instances (Chetty, Friedman, and Rockoff 2014). Access comes with conditions; for example, research using IRS data must contribute to tax administration. Building good evidence may require measuring outcomes across states. If the intended outcomes of an intervention include employment, earnings, incarceration, college attendance, or welfare program use, a single state's data holdings may be insufficient to measure them adequately. The National Student Clearinghouse, which collects data from nearly all postsecondary institutions in the United States, is one multistate source, but few comparable sources are readily available to researchers.
State agency leaders may be particularly interested in multistate data, especially in states with dense populations near their borders. Likewise, because inexpensive travel allows greater mobility, even among low-income individuals, certain outcomes may need to be measured across states. A data infrastructure that brings together datasets across domains for all states would therefore be not only an important tool for evidence-building but also a way to show state leaders the true outcomes for their populations, free of the bias that comes from looking only at the state's own outcome data.
University examples
Chapin Hall at the University of Chicago and the Center on Urban Poverty and Community Development at Case Western Reserve University are two examples of university-based research organizations that compile, curate, link, and analyze state data for research and evaluation. Their data holdings span multiple decades and multiple state agencies. The value of this model is that it shows how an external party can effectively support the production of evidence and knowledge through innovations valued by state agencies and the research community alike.
Embedded researchers
One potential approach for improving communication between researchers and state agency leaders is for researchers to work inside state government as temporary or permanent staff. The federal government uses the Intergovernmental Personnel Act to bring researchers into federal agencies to conduct research of mutual benefit to the agency and the researcher. 11 State agencies have employed embedded researchers to help leadership manage and implement research. For example, Washington State's Department of Social and Health Services has a Research and Data Analysis unit that "develops, initiates and supervises activities designed to meet the needs of agency decision-makers." 12 This can be a relatively low-cost and quick way to address the barriers described above, because the researcher is brought to the data instead of the data having to leave the agency.
Clearinghouse
Finally, the U.S. Census Bureau has funded a university consortium, including New York University and the University of Chicago, to create a prototype administrative data research facility (ADRF) to pilot the concept of an administrative data clearinghouse. The ADRF is being built with state-of-the-art technical resources and includes tools that facilitate the creation of high-quality metadata and help build a learning community around specific datasets. Such a facility could serve state agencies as well as researchers, reducing costs and providing modern tools to facilitate access.
Concluding Thoughts
In this article, I have posited that a closer partnership between state agencies and researchers is necessary to increase access to state agency data by researchers for rigorous evaluation and analysis. The collaboration between the two is not a natural one because the interests of each have not typically overlapped. Other stakeholders, particularly the federal government, will need to facilitate such a collaboration.
Collaboration between government and researchers may seem to blur what should be a clear line between program operators and researchers that ensures objective, independent research. Researchers or research organizations closely affiliated with specific programs may have their objectivity or integrity questioned. Vigilance to ensure objectivity will be needed on both sides.
A data security standard for certifying organizations to perform the functions listed above for research purposes is needed. Currently, the federal government uses the FedRAMP standard, which an entity must meet to obtain the "authority to operate" a data facility that provides access to researchers. Many states apply the HIPAA standards for covered entities (e.g., health care providers, insurers, government agencies), which do not necessarily apply to research organizations. The E-MOU is one piece of the solution, but states need complete solutions that have not yet been specified. Is the FedRAMP specification the right one for states to adopt? Or should states buy into a service built to address their specific requirements and to create the ideal conditions for producing the evidence that states so urgently need?
States need funding to transform their raw administrative data into high-quality research data. States already receive funding to collect data and transmit it to the federal government for compliance purposes, so funding should also be available for this purpose. Giving states the flexibility to use federal funds earmarked for administration to conduct evaluation would also increase the use of administrative data.
Data must be curated, and their quality must be known as fully as possible. 13 Where and how that should be done is an open question. Is it the role of the state agency, which perhaps knows the data best? Is it up to the researcher to identify data quality issues and communicate them? Or should it be the purview of a third party: a government or private organization responsible for ingesting administrative data, developing metadata, managing legal and physical access to the data, facilitating analysis, fostering discovery and learning about the data, and checking output for disclosure risk?
Creating de-identified datasets, to the extent that doing so does not restrict analysis, is necessary and must be an area of further research. Creating hashed identification numbers, or ID numbers unique to a particular linkage of several datasets, is an increasingly common strategy. The risk of disclosing personally identifiable information (PII) must be actively managed. Researchers should have access only to the data they need to conduct rigorous research.
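The linkage-specific ID strategy can be illustrated with a minimal sketch. It assumes a trusted party (for example, a state agency or clearinghouse) holds a separate secret key for each linkage project and replaces raw identifiers with a keyed hash (HMAC-SHA256) before data reach researchers. The field names (`ssn`, `link_id`) and the helper functions are hypothetical illustrations, not a method described by any of the organizations above. A keyed hash is used rather than a plain hash because Social Security numbers span a small space, so an unkeyed hash could be reversed by hashing every possible value.

```python
import hashlib
import hmac
import secrets


def new_linkage_key() -> bytes:
    """Generate a fresh secret key for one linkage project (hypothetical helper)."""
    return secrets.token_bytes(32)


def normalize_ssn(ssn: str) -> str:
    """Keep only digits so '123-45-6789' and '123456789' produce the same ID."""
    return "".join(ch for ch in ssn if ch.isdigit())


def linkage_id(ssn: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized identifier.

    Records hashed with the same project key share an ID and can be linked;
    under a different project's key the same person gets an unrelated ID.
    """
    digest = hmac.new(key, normalize_ssn(ssn).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymize(records: list[dict], key: bytes) -> list[dict]:
    """Replace the hypothetical 'ssn' field with 'link_id', dropping the raw value."""
    out = []
    for rec in records:
        clean = {k: v for k, v in rec.items() if k != "ssn"}
        clean["link_id"] = linkage_id(rec["ssn"], key)
        out.append(clean)
    return out


if __name__ == "__main__":
    project_key = new_linkage_key()
    welfare = [{"ssn": "123-45-6789", "program": "TANF"}]
    wages = [{"ssn": "123456789", "quarterly_earnings": 4200}]

    a = pseudonymize(welfare, project_key)
    b = pseudonymize(wages, project_key)
    # The two records link within this project.
    print(a[0]["link_id"] == b[0]["link_id"])  # True
    # Under another project's key, the IDs do not match.
    other = pseudonymize(wages, new_linkage_key())
    print(a[0]["link_id"] == other[0]["link_id"])
```

Because each project uses its own key, a researcher who receives two separately linked extracts cannot join them on the hashed IDs, which supports the principle that researchers should see only the data a given study requires.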
Recommendations
Encourage ongoing collaborations among state and local agencies and researchers to jointly address the barriers in using administrative data across programs and agencies.
Build collections of data in secure facilities with the proper controls to ensure that only those individuals with the proper permission have access to data in a quick, manageable fashion.
Develop and hire agency leadership that understands the need for evaluation and research.
Train state and local government staff in the use of administrative data for program management and evaluation.
Train researchers not only in the techniques necessary to process and analyze administrative data, but also in state information system contents and database technologies that will allow them to facilitate the physical transfer of data from state agencies.
Notes
Robert M. Goerge is a Chapin Hall senior research fellow with more than 25 years of research focused on improving the available data and information on children and families, particularly those who require specialized services related to maltreatment, disability, poverty, or violence. He is also a senior fellow at the Computation Institute and a senior fellow and lecturer at the Harris School of Public Policy, both at the University of Chicago.
