Levenstein on the use of synthetic administrative data to protect confidentiality
Maggie Levenstein discusses how synthetic datasets allow researchers to study social systems without compromising individual identities. To create synthetic census data, for example, researchers feed the original confidential census data into a complex statistical model, which generates a simulated population with the same general features as the original. Validation analyses then check that results obtained from the synthetic data also hold for the original data. “The validation is a way of making sure that the assumptions that were built into the synthetic data are not driving the results, as opposed to the thing that the person is trying to study,” said Levenstein.
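The workflow described above can be illustrated with a toy sketch. This is not the model used for real census data, which is far more complex. Everything here is an assumption made for illustration: the two variables (age and income), the simple linear model, the helper names `ols`, `fit_model`, `synthesize`, and `validate`, the 10 percent tolerance, and the randomly generated stand-in for the confidential file.

```python
import random
from statistics import fmean, stdev


def ols(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_model(records):
    """Fit a toy model: age is normal, income is linear in age plus normal noise."""
    ages = [a for a, _ in records]
    incomes = [y for _, y in records]
    slope, intercept = ols(ages, incomes)
    residuals = [y - (intercept + slope * a) for a, y in records]
    return {
        "age_mean": fmean(ages),
        "age_sd": stdev(ages),
        "slope": slope,
        "intercept": intercept,
        "resid_sd": stdev(residuals),
    }


def synthesize(model, n, rng):
    """Draw a simulated population from the fitted model (no real records are copied)."""
    population = []
    for _ in range(n):
        age = rng.gauss(model["age_mean"], model["age_sd"])
        noise = rng.gauss(0, model["resid_sd"])
        income = model["intercept"] + model["slope"] * age + noise
        population.append((age, income))
    return population


def validate(original, synthetic, tolerance=0.10):
    """Run the same analysis on both datasets and compare the estimated slopes."""
    slope_orig, _ = ols(*zip(*original))
    slope_syn, _ = ols(*zip(*synthetic))
    return abs(slope_syn - slope_orig) / abs(slope_orig) <= tolerance


# Hypothetical stand-in for the confidential file (randomly generated here).
rng = random.Random(0)
original = []
for _ in range(5000):
    age = rng.gauss(45, 12)
    original.append((age, 20000 + 800 * age + rng.gauss(0, 5000)))

model = fit_model(original)
synthetic = synthesize(model, len(original), rng)
print("analysis validated:", validate(original, synthetic))
```

In this sketch the analysis being validated is the same relationship the synthesizer was built to preserve, so agreement is expected. Validation matters most for analyses the model did not explicitly capture, where built-in assumptions could otherwise drive the result.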