Christy DeSmith
Researchers from Harvard and across the U.S. are raising awareness about the incredible value of information gathered by the United States government.
The U.S. government publishes hundreds of thousands of datasets every year. For decades, social scientists have eagerly mined them, crunching the numbers to glean insights on everything from wage inequality and health outcomes to long-term trends in standardized test scores.
“Federal data allows us to see how we’re doing as a country, and where we need to improve,” said Christina Ciocca Eller, assistant professor of sociology and social studies. “If we want to know what’s happening across the population, we really need that federal data.”
Access to some of Washington, D.C.’s richest federal recordkeeping has fallen off during President Donald Trump’s second term. Many long-running statistics on climate and human health have disappeared from government websites. A shrunken federal workforce makes other information sets harder for researchers to obtain. Amid these disruptions, scholars from Harvard and across the U.S. are raising awareness about the incredible value of data gathered by the federal government.
“The cultural conversation around data really needs to change,” said Molly Hardy, project lead for the Public Data Project at Harvard Law School’s Library Innovation Lab. “It’s a service our taxpayer dollars pay for, and it is infrastructure that is essential to our democracy and a functioning society.”
The U.S. government has been sharing the data it collects for all of its 250 years, Hardy explained. By the 19th century, it was publishing sophisticated environmental and economic records in addition to census findings that are critical for municipal planning and representative governance. The Office of Education, a precursor to the Department of Education, started measuring enrollment, literacy, and academic quality in schools nationwide in the 1860s.
Since the 1970s, the Federal Emergency Management Agency (FEMA) has mapped flood risk across U.S. neighborhoods. In the 1980s, the National Oceanic and Atmospheric Administration (NOAA) developed a vegetative health index farmers still consult when applying for drought relief. More examples of everyday uses can be found via America’s Essential Data, put together by a coalition of data scientists and former government employees.
Demand for federal data surged with modern computing. As the digital age dawned, a new generation of social scientists trained for quantitative literacy. Many sought access to enormous administrative data sets assembled by federal agencies including the U.S. Department of the Treasury, Department of Housing and Urban Development (HUD), and Bureau of Labor Statistics (BLS).
“The government has been slow under every administration to catch up to these distribution and preservation needs,” offered Hardy, a former senior program officer in the Division of Preservation and Access at the National Endowment for the Humanities.
The Evidence Act of 2018, championed in part by former Republican House Speaker Paul Ryan, sought to modernize the flow of federal data to the research community. During a public service leave in 2021 and ’22, Ciocca Eller helped researchers from across the country access government records as assistant director of evidence and policy for the White House Office of Science and Technology Policy.
Ciocca Eller, who joined the Harvard faculty in 2019, is also a prodigious user of federal data in her own research. At Harvard, she has pursued a broad range of topics related to the U.S. higher education system, regularly drawing on data from both the U.S. Department of Education and the U.S. Department of Labor.
A recent study drew on a data set, provided by a major state university system, that amounted to anonymized transcripts of hundreds of thousands of students enrolled in its many campuses over a 20-year period. Ciocca Eller linked it with information drawn from the Integrated Postsecondary Education Data System (IPEDS), a U.S. Department of Education initiative that surveys every higher education institution participating in federal financial aid programs.
The methodology allowed her to analyze what characteristics and programs drive graduation rates at each campus for all students, focusing on students from groups traditionally underrepresented in higher education.
Her results demonstrated that college rankings are not always a good indicator of college quality, especially at less resourced public institutions like the ones she studied. Instead, her methods — drawing on federal data — helped show that quality can and should be assessed in ways that account for colleges’ ability to enhance upward mobility among enrolled students.
Harvard Law School’s Library Innovation Lab launched the Public Data Project to preserve and make accessible federal records in this moment. In early 2025, it announced its arrival by sharing more than 311,000 data sets copied from data.gov, the single largest repository of federal data. Other federal records have been archived and shared with the public via Harvard Dataverse.
But many researchers rely on restricted datasets, which are accessible only to licensed users in order to protect privacy. Sharing these records would violate the agreements those researchers signed with federal agencies.
Ciocca Eller’s current project relies on a restricted data set provided by the Department of Education’s National Center for Education Statistics (NCES). “These are your federally funded surveys that do deep dives into the experiences and outcomes of students across each and every level of our education system, from pre-K to college graduation and even into the labor market,” she explained.
By relying on these data, the social scientist can generate new insights on the career trajectories of U.S. residents who have some college but no degree.
“I’m trying to understand their labor market outcomes as a function of the colleges they attend, and also as a function of time,” Ciocca Eller said. “I’m very curious to understand whether, for example, people with high school degrees in the early ’90s could get the kind of jobs that now people need at least some college to get.”
Her NCES dataset was obtained before layoffs hit the Department of Education last year. But Ciocca Eller’s agreement with the agency requires clearance before the presentation of any research results, a process now stalled by sharp NCES staff reductions.
A report, released in late February, promised streamlined dissemination of priority NCES datasets in the future. The impact on education researchers, and their access to the agency’s powerful restricted data sets, remains uncertain.
“Our best population-level data are funded by the federal government, fielded by the federal government, and delivered by the federal government,” Ciocca Eller said recently. “In the United States, there’s just no other source for this information.”