Our long-run research objective is to create the Longitudinal Intergenerational Family Electronic Micro-dataset (LIFE-M), which spans the early and middle 20th-century US. Funded by the National Science Foundation Cyberinfrastructure Framework for 21st Century Science and Engineering (SMA 1539228), the LIFE-M project uses millions of US vital records to reconstruct how and why individuals' health has changed across time and generations. This multi-generational, longitudinal micro-database will transform the research frontier in studies of health and longevity, childbearing and family structure, and the long-run health effects of early-life circumstances and exposures.
To advance the creation of LIFE-M, the proposed pilot project seeks funding to compare and improve the performance of different automated linking algorithms. Automated linking methods are required to complete LIFE-M because it is neither feasible nor cost-effective to link millions of records by hand. This project's specific aims are to vet the most popular automated linking methods in order to: (1) produce systematic evidence regarding the performance of many popular algorithms in four dimensions: (A) match rates, (B) representativeness of the underlying population, (C) erroneous match rates (type I errors), and (D) systematic measurement error (arising from type I errors or from type II errors, i.e., missed links); (2) examine how phonetic name-cleaning methods affect match quality; (3) examine how match quality metrics vary for different underrepresented subgroups, including women, racial/ethnic minorities, and immigrants, to determine how linking methods could affect inferences differently for different populations; and (4) formulate recommended practices for researchers based upon the findings in aims (1) – (3).
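
As a minimal illustration of how dimensions (A), (C), and (D) might be computed, the following Python sketch compares an algorithm's links against a hand-linked reference set. The function name, the dictionary layout, and the exact metric definitions (in particular, counting a wrong link as both an erroneous match and a missed link) are assumptions made for this example, not the project's actual procedure.

```python
from typing import Dict, Optional


def linking_metrics(
    predicted: Dict[str, Optional[str]],
    truth: Dict[str, Optional[str]],
) -> Dict[str, float]:
    """Compare automated links with a hand-linked reference (hypothetical layout).

    Both arguments map a source record ID to the ID of the record it is
    linked to, or to None when no link exists or none was made.
    """
    records = list(truth)

    # Records for which the algorithm produced some link.
    linked = [r for r in records if predicted.get(r) is not None]

    # Type I errors: links that disagree with the reference,
    # including links made where the reference has none.
    wrong = [r for r in linked if predicted[r] != truth[r]]

    # Type II errors: true links the algorithm failed to recover
    # (assumed here to include cases where it chose a wrong record).
    true_links = [r for r in records if truth[r] is not None]
    missed = [r for r in true_links if predicted.get(r) != truth[r]]

    def rate(numerator: int, denominator: int) -> float:
        # Return 0.0 instead of dividing by zero for empty inputs.
        return numerator / denominator if denominator else 0.0

    return {
        "match_rate": rate(len(linked), len(records)),
        "false_match_rate": rate(len(wrong), len(linked)),
        "missed_link_rate": rate(len(missed), len(true_links)),
    }


# Toy reference set: record "c" has no true link.
truth = {"a": "x", "b": "y", "c": None, "d": "z"}
# Toy algorithm output: "b" and "c" are linked incorrectly, "d" is missed.
predicted = {"a": "x", "b": "w", "c": "v", "d": None}

print(linking_metrics(predicted, truth))
```

In this toy case three of the four records are linked, two of those three links are erroneous, and two of the three true links are not recovered. Computing the same quantities separately by subgroup would be one way to approach dimension (B) and aim (3).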

In addition to improving the quality of the LIFE-M database, this project provides information highly relevant to ongoing projects on population health and aging. For instance, the Minnesota Population Center (MPC) and the Census Bureau have partnered with LIFE-M to create the American Longitudinal Infrastructure for Research on Aging (ALIRA), which uses record linkage to trace individuals and families across successive generations (NIH funding pending). This project may also enhance the NIH-funded Early Indicators Project, which links Union Army veterans to their children and grandchildren, as well as recent initiatives to link Medicare records with census and administrative data.