Abstract
Linkage of federal, state, and local administrative records to survey data holds great promise for research on families, in particular research on low-income families. Researchers can use administrative records in conjunction with survey data to better measure family relationships and to capture the experiences of individuals and family members across multiple points in time and social and economic domains. Administrative data can be used to evaluate participation in government social welfare programs, as well as to evaluate the accuracy of reporting on receipt of such benefits. Administrative records can also be used to enhance collection and accuracy of survey and census data and to improve coverage of hard-to-reach populations. This article discusses potential uses of linked administrative and survey data, gives an overview of the linking methodology and infrastructure (including limitations), and reviews social science literature that has used this method to date.
Family is a central and enduring social institution. Whether in the form of a nuclear family, extended family, childless couple, or one of many other household structures, families are fundamental building blocks of neighborhoods, religious communities, and social networks. Because of the pivotal role that family plays in child development and socialization and its intersections with domains such as gender, socioeconomic status, race and ethnicity, paid work, care work, living arrangements, and the law, it is a topic of interest to social scientists and policy-makers alike (Gornick and Meyers 2003). As families and family members pass through the life course, they interact with multiple institutions, generating administrative data on their encounters and outcomes. Linking such administrative data with existing research data, including survey and census data, provides great opportunity to understand families.
The U.S. Census Bureau accesses data from federal agencies, including the Internal Revenue Service (IRS), Department of Housing and Urban Development (HUD), Social Security Administration (SSA), and Centers for Medicare and Medicaid Services, as well as state-level data on programs such as the Supplemental Nutrition Assistance Program (SNAP), the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), and the Temporary Assistance for Needy Families (TANF) program, among others. The Census Bureau links these files at the person level with data from household surveys such as the Survey of Income and Program Participation (SIPP), the Current Population Survey (CPS), and the American Community Survey (ACS) as well as with decennial census data. We discuss this linking process below.
Linking administrative data to survey data holds a great deal of promise for family researchers. For example, administrative data can help researchers to understand individual family members’ labor force experiences (IRS data), health insurance coverage (Medicare and Medicaid data), and interactions with the criminal justice system (corrections data), as well as outcomes for veterans and military families (Veterans Affairs data). When linked to survey data, these records may allow researchers to observe how the experiences of one family member affect other members of the family.
Administrative data are particularly valuable for understanding low-income families, who are often underrepresented in survey data. For example, records generated by programs such as TANF, SNAP, and WIC can help researchers to understand how low-income families meet their material needs and how well these programs serve their intended populations. Administrative data can also help to illuminate family relationships. With increasing family complexity, outcomes for individuals and families may need to be observed across multiple households (Manning, Brown, and Stykes 2014) and to include members whose relationships to one another are not easily captured by surveys. Such complex relationships may sometimes be better understood by combining information from administrative and survey data. Individual family members experience change across their own lives, and family composition changes over time as unions are formed and dissolved, children are born, and family members die. Administrative data—which may capture information about an individual at multiple points in time—can help to extend the chronological reach of survey data.
In this article, we discuss the potential contributions to family research of linking administrative and survey data. We briefly review some of the economic, sociological, and policy research that to date has productively used linked administrative data and survey data to address five research questions. We discuss the process of record linkage and describe the administrative infrastructure available at the U.S. Census Bureau to facilitate researchers’ access to these data. Finally, we discuss some data limitations endemic to administrative data, and the potential for future research in this area.
Framework
The value of linking administrative data to survey data to study family issues is best demonstrated by citing some of the research that has been conducted using this method. The increased prevalence of multipartner fertility, complex families, multigenerational households, and household doubling up challenges researchers to capture and understand family and household relationships. Survey and census data may capture only one dimension of an interrelationship and may not capture all members of a family that spans multiple households.
This review of a selection of the literature investigates whether and how the linkage of administrative records and survey data can address five questions that highlight challenges in family research:
1. Can administrative data generate new ways of measuring individual families and/or households?
2. Can administrative data be used to extend information on families in survey and decennial census data?
3. Can administrative data be used to evaluate accuracy of survey data?
4. Can administrative data be used to evaluate coverage in census records of hard-to-reach populations who may be better represented in administrative data?
5. Can administrative data be used to evaluate access to and participation in social welfare programs?
By answering these questions, we highlight how linkages between administrative records and survey data can create broader and deeper data resources. In the section that follows, we describe data sources and linkage methods that permit researchers to undertake analyses that address these five questions.
Data and Methods
The U.S. Census Bureau links administrative data at the person and address level to information collected by the Bureau, including data from SIPP, the CPS, and the ACS, as well as decennial census data. Using fields common across sources, the Bureau applies probabilistic matching to assign each person record a unique identifying number, called a protected identification key, and then strips the records of personally identifying details. Unique identifying numbers called master address file identifiers are also appended to facilitate linkages by address. These persistent unique identifiers enable data records for the same person to be linked across files and over time while preserving confidentiality. The Census Bureau’s authority to access records with personal identifiers, together with its linkage infrastructure, enables researchers to identify family relationships in multiple data sources and follow family members and their experiences in multiple domains. The Census Bureau is also involved in a joint effort with the National Academy of Sciences and academic researchers to digitize and match names from the 1990 Decennial Census to administrative data. This pilot project is part of the American Opportunity Study, which seeks linkages of census, survey, and administrative data across generations to study social and economic mobility (Grusky, Smeeding, and Snipp 2015).
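As a rough illustration of the general idea behind probabilistic matching, the following Python sketch scores a survey record against a reference file and attaches a key when the score clears a threshold. It is not the Census Bureau's production system: the field names, the agreement probabilities, the threshold, and the key values (`K001`, `K002`) are all hypothetical, and the scoring is a simplified log-likelihood-ratio weighting.

```python
from dataclasses import dataclass
from math import log2


@dataclass(frozen=True)
class Record:
    name: str
    birth_year: int
    zip_code: str


# Hypothetical agreement probabilities per field (assumed values, not estimates):
# m = P(field agrees | same person), u = P(field agrees | different persons).
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "zip_code": (0.90, 0.05),
}


def match_weight(a: Record, b: Record) -> float:
    """Sum of log-likelihood-ratio weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if getattr(a, field) == getattr(b, field):
            total += log2(m / u)              # agreement adds evidence for a match
        else:
            total += log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


def assign_keys(records, reference, threshold=8.0):
    """Return one key (or None) per record; `reference` maps key -> Record.

    `reference` must be non-empty. Downstream files would keep only the key
    and the analytic variables, dropping name and other identifiers.
    """
    keys = []
    for rec in records:
        best_key, best_ref = max(
            reference.items(), key=lambda kv: match_weight(rec, kv[1])
        )
        score = match_weight(rec, best_ref)
        keys.append(best_key if score >= threshold else None)
    return keys


reference = {
    "K001": Record("ana diaz", 1980, "20001"),
    "K002": Record("bo chen", 1975, "60637"),
}
survey = [
    Record("ana diaz", 1980, "20002"),  # moved: ZIP code differs, still linked
    Record("cy lee", 1990, "10001"),    # no plausible counterpart, left unkeyed
]
assigned = assign_keys(survey, reference)
```

In this toy example the first survey record receives a key despite a ZIP code mismatch, while the second is left without one; records that go unkeyed cannot be linked, which is one source of the coverage limitations discussed later in the article.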
Chapin Hall at the University of Chicago maintains an integrated database on child and family programs for the state of Illinois. Chapin Hall’s database spans education, employment, income support, and health domains, including programs such as public school, pre-K, and Head Start data; unemployment insurance wage and benefit records; Workforce Innovation and Opportunity Act data; Supplemental Security Income and Aid to the Aged, Blind, or Disabled; TANF and TANF work programs; SNAP; childcare subsidy and licensing data; and data on Medicaid eligible individuals, Medicaid providers, and Medicaid claims. This collection of rich data has been carefully curated for decades and allows for deep and diverse studies in Illinois.
Although the Census Bureau currently has a large collection of national administrative data files, the Bureau is also seeking to expand its collection of state-level human services program data from programs such as SNAP, WIC, and TANF. The Bureau hopes to eventually acquire data from all states to provide additional, state-level information for researchers, in particular on low-income families and individuals.
Discussion
Linking administrative data with survey data can contribute to expanding and improving family research. Below we discuss papers that illustrate how administrative data can most productively be used to study families, in ways that address our five research questions. Table 1 recaps our research questions and summarizes examples of data sources that can be used to generate answers to these questions.
Table 1. Synopsis of Research Questions and Data Sources
Improving measurement of families and households
The measurement of families has evolved over time in both censuses and sample surveys (Ruggles and Brower 2003). To measure families in sample surveys, the Census Bureau currently captures information about householders and other persons in a given housing unit. The amount of detail that associates a household’s residents with the householder varies across surveys. Some surveys contain rich relationship pointers (CPS, SIPP), allowing analysts to determine relationships for families and subfamilies (Kennedy and Fitch 2012). Other surveys, such as the ACS, only measure relationships of household residents to the reference person.
Previous research by Heggeness, Alexander, and Stern (2012) has demonstrated that differences in how family units are measured can affect larger estimates, such as household incidence of poverty. Administrative records allow for ways of measuring family units that complement family measurement in survey files, particularly with respect to measurement of household-level resources. Some administrative data sources are person level, such as information returns from the IRS. For example, wage statements (Form W-2) reflect earner information. In contrast, individual income tax returns (Form 1040) include tax filing units, which may include a secondary filer and tax dependents. This forms a “family” unit of sorts, using conditions on coresidence and support defined by the Internal Revenue Code. Similarly, data from SNAP and TANF programs include household units, reported to the human services agencies by program applicants. SNAP units are defined as groups of people who live together and purchase and prepare meals together (Harris 2014). TANF rules vary from state to state as to whether pregnant women are eligible, and whether teen parents are considered head of their own family unit or must be counted with their own parents (Huber et al. 2015). These alternative ways of measuring family relationships can augment the understanding of family units as defined in survey data.
Extending family information from survey and census records
Families present significant measurement challenges for social scientists. There are many types of families (e.g., two-parent families, single-parent families, childless families, families with same-sex parents, and grandparent-headed families). Membership of families changes as new members are born or adopted and as others separate. Family members may all live in the same place (or not) for different periods of time (Manning, Brown, and Stykes 2014; Wall and Bolzman 2013). Multiple subfamilies may live in the same household, and also compose extended or multigenerational families (Mykyta and McCartney 2011). Finally, not all families are related by blood, as children may be formally or informally adopted, or individuals may identify as fictive kin (Radel, Bramlett, and Waters 2010). Families may have complex structures that vary over time and context. Thus, household membership may or may not align with family membership.
Different household members may be observed in different survey and administrative data. Figure 1 shows a stylized example of a family centered on a child. Assume the child lives in the maternal grandmother’s house. The child’s mother is often present, and the child’s father is occasionally present. The node between the child and maternal grandmother is observed in census and survey contexts where questions are asked of the householder. The other nodes connecting the child with his or her mother and father do not appear in the survey. The node linking the mother and child appears in SSA and state vital records data. The Social Security and vital records may also include the node between the child and his or her father. In other sources that may provide more information about relationships, such as public housing or voucher data from HUD, Medicaid, food security support from SNAP or WIC, or IRS Earned Income Tax Credit (EITC) records, the family relationships reflected will depend on whether the grandmother or mother was the responsible party who enrolled the child.

Figure 1. Family Relationships as Reflected in Survey and Administrative Data
Census Bureau researchers have been generating research on how individuals, especially children, are captured in administrative data, and how family structures differ from self-reported information collected in survey or population census data. These comparisons can be used to improve survey accuracy and to offer new information about respondents at various points in their lives. Research matching decennial census data to administrative data uses probabilistic linkage infrastructure at the Census Bureau to identify records in each data source that are likely to belong to the same person. Luque and Wagner (2015) compare parent-child links constructed from SSA data using probabilistic name matching to filer-dependent child data from individuals’ IRS income tax returns and parent-child relationships from the 2010 U.S. Census. Parent-child information from IRS data can be used to supplement information from census data. Luque and Wagner (2015) argue that more accurately linking parents to children in administrative and survey data can improve researchers’ ability to study intergenerational mobility, labor market outcomes, and child development. Massey (2014) shows how record linkage can be used to identify family relationships over time. She finds that data from the 1960 U.S. Census can be linked to current administrative data to study individuals’ outcomes over the life course as well as their relationships to other family members. Using a transcribed sample of the 1960 U.S. Census with very limited identifiers (name, quarter of birth, and year of birth), she shows that including parent name information increases the match rate to the administrative data.
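One intuition for why parent names can raise the match rate is that they break ties between people who share a name and birth date. The sketch below illustrates this with exact matching on invented records; it is a simplification for exposition, not the probabilistic procedure used in the studies cited above, and every record, field name, and function name is hypothetical.

```python
def unique_match_rate(census, admin, fields):
    """Share of census records that agree with exactly one admin record on `fields`."""
    matched = 0
    for rec in census:
        key = tuple(rec[f] for f in fields)
        candidates = [a for a in admin if tuple(a[f] for f in fields) == key]
        if len(candidates) == 1:  # zero or several candidates count as unmatched
            matched += 1
    return matched / len(census)


# Invented records: two people share a name, birth year, and birth quarter.
people = [
    {"name": "john smith", "birth_year": 1955, "birth_quarter": 2, "parent_name": "mary smith"},
    {"name": "john smith", "birth_year": 1955, "birth_quarter": 2, "parent_name": "ruth smith"},
    {"name": "lena ortiz", "birth_year": 1958, "birth_quarter": 4, "parent_name": "rosa ortiz"},
]
census_sample = [dict(p) for p in people]
admin_file = [dict(p) for p in people]

limited = ("name", "birth_year", "birth_quarter")
rate_limited = unique_match_rate(census_sample, admin_file, limited)
rate_with_parent = unique_match_rate(census_sample, admin_file, limited + ("parent_name",))
```

With the limited identifiers, only one of the three records finds a unique counterpart; adding the parent name resolves the two ambiguous cases. Real files also contain spelling variation and missing values, which is why exact matching of this kind is only a starting point.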
Evaluating accuracy of survey data
By linking administrative data on participants in social programs to survey data, researchers can evaluate the accuracy of reporting on participation in those programs. Recipients of public benefits such as TANF, SNAP, WIC, Medicaid, and Medicare commonly underreport their receipt of these benefits, either because of confusion about the nature of the benefit that they are receiving (Marquis and Moore 1990) or because of social desirability bias (Nederhof 1985). Meyer and Mittag (2015) link the CPS with New York State SNAP, TANF, and housing subsidy data to examine the coverage of the safety net, specifically the share of people without work or program receipt. They find that survey data both understate the income of low-income households and overestimate the share of single mothers who are neither working nor receiving benefits. Similarly, Meyer and Goerge (2011) find underreporting of food stamp benefits by roughly 35 percent in the ACS and 50 percent in the SIPP. Card et al. (2004) use California state-level Medicaid data to show that the SIPP underreports Medicaid receipt by about 10 percent, with the highest underreporting for young children. Huynh, Rupp, and Sears (2002) match national-level SSA data to the SIPP to show that the SIPP underestimates the dollar amount of benefits received. Kim and Tamborini (2014) match SIPP and IRS W-2 data to show that SIPP respondents misreport their earnings, with different patterns of misreporting among different educational and racial/ethnic groups. As the U.S. Census Bureau and Chapin Hall acquire more administrative data sources and make them available to researchers, we expect that researchers will be able to further examine survey-administrative record discrepancies, thereby improving the understanding of the strengths and limitations of survey data for all researchers.
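To make the notion of an underreporting rate concrete, the following sketch computes, for a set of linked person records, the share of administratively recorded recipients who did not report receipt in the survey. It assumes the administrative record is correct, which the studies above do not take for granted, and it ignores survey weights; the function name and the data are invented for illustration.

```python
def underreporting_rate(linked):
    """Share of administrative recipients whose survey response reports no receipt.

    `linked` is a list of (survey_reports_receipt, admin_shows_receipt) pairs,
    one per linked person. The administrative flag is treated as the benchmark.
    """
    survey_answers = [survey for survey, admin in linked if admin]
    if not survey_answers:
        raise ValueError("no administrative recipients among the linked records")
    missed = sum(1 for survey in survey_answers if not survey)
    return missed / len(survey_answers)


# Invented linked records: (survey report, administrative record)
linked_records = [
    (True, True),
    (False, True),   # receipt recorded administratively but not reported
    (True, True),
    (True, True),
    (False, False),  # non-recipient in both sources, excluded from the rate
]
rate = underreporting_rate(linked_records)  # 1 of 4 recipients, i.e. 0.25
```

The same comparison run separately by age group, education, or household type would show where misreporting concentrates, in the spirit of the subgroup findings summarized above.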
Evaluating coverage of hard-to-reach individuals and families
The availability of high-quality data for social science research has long relied on federally funded large-scale longitudinal surveys in education, health, and human services. Surveys sponsored by the U.S. Census Bureau, Department of Health and Human Services, Department of Agriculture, and other federal agencies provide much of the traditional research data in the social sciences. These surveys face declining respondent cooperation and budget constraints, and sponsors are turning to auxiliary and complementary sources of information. Administrative data can be used to improve imputation as a way around unit and item nonresponse (Meyer, Mok, and Sullivan 2015) and to evaluate the accuracy of existing data. The Census Bureau is exploring methods to use administrative data instead of asking respondents particular questions, especially those that are considered intrusive or are difficult to answer. For example, the Census Bureau is assessing the extent to which IRS income records could replace the ACS income questions. Data quality, coverage, and timing issues must be analyzed, but in the future, IRS data may make data collection for some subpopulations unnecessary.
Substantial internal research at the Census Bureau has been dedicated to evaluating administrative data coverage of various subpopulations in decennial census records. Some groups of people—such as children (O’Hare 2015), men, and African Americans (Clogg, Massagli, and Eliason 1989; Raley 2002)—are known to be systematically missed and undercounted by censuses and sample surveys. Internal studies have evaluated how well individuals from national-level Indian Health Service, IRS, Medicaid, and Medicare files—and state-level SNAP and WIC files—are represented in 2010 U.S. Census files, and how much information matches across these files. This coverage and quality information can improve survey sampling frames, reduce cost and burden during the 2020 Decennial Census, and may also be used to supplement incomplete records and improve imputation in the case of unit and item nonresponse. In addition, research has evaluated whether hard-to-reach and frequently underreported groups may be better represented in administrative data than in decennial census records. Results from internal research show, for example, that supplementing census records with WIC and SNAP records substantially improves coverage of young children.
Evaluating social welfare program access and participation
Linking survey and administrative data allows comparison of household and benefit units. This allows researchers to evaluate the extent to which individuals and households estimated to be eligible for benefits actually take advantage of them. Such information can help program administrators to better target their services to eligible populations and nonparticipating individuals by generating information about which characteristics are associated with increased likelihood of nonparticipation. In the future, better understanding of household- and family-level patterns of benefit usage may help survey methodologists to design questionnaires to prompt respondents’ memories in ways that will improve data accuracy. Czajka, Cunnyngham, and Rosso (2015) link SNAP administrative data from New York and Colorado to three surveys, the ACS, CPS, and SIPP. They compare SNAP unit membership as recorded in the administrative data to simulated unit membership (ACS and CPS) and reported unit membership (SIPP). They show that in 50 percent of survey households, all members of both the ACS survey household and the New York administrative unit were matched to the other dataset, and that the numbers of simulated and administrative SNAP units were the same. The vast majority of these households contained only one SNAP unit. In addition, between 44 and 47 percent of the households receiving SNAP benefits also included SNAP nonparticipants.
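Once both files carry the same person-level identifier, a comparison of household and benefit units reduces to comparing sets of identifiers. The sketch below shows one minimal way to do this; the identifiers (`P1`, `P2`, `P3`) and the function name are hypothetical, and the cited study involves additional steps, such as simulating eligibility units, that are not represented here.

```python
def compare_units(survey_household, admin_unit):
    """Compare the members of a survey household with those of an administrative benefit unit."""
    survey, admin = set(survey_household), set(admin_unit)
    return {
        "in_both": survey & admin,
        "survey_only": survey - admin,  # household members not on the benefit case
        "admin_only": admin - survey,   # case members not listed in the household
        "exact_match": survey == admin,
    }


# Invented example: a three-person household in which two members form the SNAP unit.
result = compare_units(["P1", "P2", "P3"], ["P1", "P2"])
```

Here `P3` appears only in the survey household, the pattern that corresponds to a household receiving SNAP benefits that also includes a nonparticipant.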
Also using the ACS, Scherpf, Newman, and Prell (2015) link survey data to state SNAP records to study how well SNAP is reaching the intended population. They find that substituting SNAP receipt as measured in administrative data for survey reports of SNAP receipt increases the proportion of lower-income units receiving SNAP benefits in a year. More individuals and families receive SNAP than survey research would indicate. Looking at total benefit amounts and number of months of receipt in the year reveals that lower-income units use SNAP more intensively. Newman and Scherpf (2013) use ACS and state-level data to examine uptake of SNAP in Texas across different geographic areas and demographic groups. They find that eligible households with children had higher access rates than other groups, and that eligible people aged 65 and up had the lowest access rates.
Limitations
Our approach to improving family research by linking federal surveys and administrative data relies not only on access to federal and state program files but also on the ability to conduct analyses and publish findings while maintaining the privacy of individuals. This section describes these access and confidentiality challenges—as well as other data access barriers facing researchers—and the inherent limitations of using administrative records for research purposes.
Both Chapin Hall and the U.S. Census Bureau have been acquiring and integrating administrative data for decades. This involves the engagement and cooperation of the local, state, or federal agency that owns the data. The data are often trapped in silos that span levels of government (e.g., federal, state, county, and city), silos that exist within and across agencies, and silos across domains (e.g., welfare and benefit programs, human services, law enforcement, education, employment/wages, and public health programs). Data access cannot occur without agreement from lawyers on permissions for data use and the cooperation of program and information technology staff to define, extract, and transmit data. Once studies are prepared, researchers must also engage program administrators and sometimes obtain their permission to release the findings.
At the Census Bureau, all record linkages must support its programs by improving census and survey collection and accuracy, or supporting other statistical activity. When the Census Bureau is deciding whether to link datasets, the linkage’s potential to support the agency’s mission is evaluated along with other criteria including sensitivity, cost, burden, timeliness, and data quality. Protecting the confidentiality of individuals described in these data is critical; all personal and business information acquired by the Census Bureau, and any resulting linkages, are protected by law under Title 13 of the U.S. Code.
Once the datasets are linked, academic researchers from across disciplines, such as economics, policy analysis, and sociology, may want access to the linked data. Federal, state, and local government program administrators who wish to evaluate their own program implementation or policy impacts may also want access to the data. Obtaining permission to use the data and gaining access to the data are difficult and time-consuming for researchers. To help ameliorate these challenges, the Census Bureau brings administrative databases from different institutions and domains together in one place. By leveraging existing systems for governance and privacy protection, the Census Bureau is expanding secure access to data for researchers through its existing network of Federal Statistical Research Data Centers. The Census Bureau is working to improve the infrastructure by which researchers gain access to data. For example, the Bureau is creating user interfaces that allow for the use of comprehensive metadata, as well as robust protocols that allow for access to data integrated across surveys, to federal administrative data, and to state and local administrative data. These efforts align with the charge of the Evidence-Based Policymaking Commission, which was established under Public Law 114-140 to study how administrative data on federal programs, survey data, and other statistical data series can inform program evaluation, cost-benefit analyses, and policy-relevant studies.
Finally, while linked administrative and survey data have unique characteristics and benefits for researchers, they also have unique shortcomings. For example, a major advantage of administrative records is that they include information on the full population receiving a particular benefit or service at a given time, and are not subject to the uncertainty inherent in survey reporting. However, they are representative of neither the entire U.S. population nor even the entire population that may be eligible for a benefit or service. Another major advantage is that administrative data may include different ways of measuring family units and may contain information about families that can augment the information available in survey research. However, administrative data include many fewer variables than typically appear in survey data, so the extent to which they can augment survey data may be inconsistent. Because administrative data have not been designed explicitly for use in social science research, they may contain systematic or random error.
Conclusion
The dynamic nature of family and the expanding number of potential social research and policy questions concerning family call for new and different research approaches to studying families. Linking administrative data to existing survey data is one methodological innovation that offers researchers the capacity to enhance, extend, and improve survey data.
Given the growing electronic data collection around many facets of individuals’ lives, researchers can make use of administrative data to study not only individuals in families, but also the family unit itself, whatever form it might take. By pursuing administrative data sources, making these files usable for researchers, and providing a platform by which internal and external researchers may use these data, the Census Bureau and research partners, such as Chapin Hall, hope to launch a new era in quantitative research on families. As discussed above, linkage of administrative data with survey data allows for better measurement of family units and different ways to understand interrelationships in families and households (for example, in terms of food and meal production, social supports, and financial resources). This will help researchers to better meet the measurement challenges around family identification and to understand family systems and the functioning of families as entities within local, global, social, and economic contexts. Linking administrative records to survey data may also allow researchers to observe individuals and their needs at multiple points in time, thus tracing their outcomes over longer periods than are typically available outside of longitudinal studies. Linking administrative data and survey data allows researchers to trace family outcomes over time as well—for example, observing links between children, parents, and grandparents, their use of social services, and their near- and long-term social and economic mobility. We believe that the availability of these data will spark the creation of new and innovative methods; generate new knowledge to the benefit of social science research; and improve federal, state, and local policymaking.
Footnotes
Amy O’Hara is chief of the Center for Administrative Records Research and Applications at the U.S. Census Bureau. She addresses legal, policy, and methodological issues to expand use of administrative records data in federal statistics.
Rachel M. Shattuck is a sociologist and demographer in the Center for Administrative Records Research and Applications at the U.S. Census Bureau. Among her current projects is an analysis of childcare records linked to subsequent educational outcomes in survey data.
Robert M. Goerge is a senior research fellow at Chapin Hall at the University of Chicago and has senior fellow positions at the University of Chicago Harris School of Public Policy and Computation Institute and NORC.
