Federal data is under attack, but data users can work together to preserve and democratize it

Overview
Staffing shortages, political interference, and a federal government shutdown are all disrupting the routine operations of the nonpartisan federal statistical agencies that reliably gather facts and publish data about the state of the U.S. economy and U.S. labor market. Many people in private industry, state and local governments, think tanks, associations, and academia are stepping up to support and defend the integrity and importance of federal data. Simultaneously, many people and organizations can and should engage in building data collaborations and datasets that, over time, could lead to even better economic data.
Of course, there can be no replacement for the value created when federal statistical agencies, such as the Bureau of Labor Statistics, follow publicly available standard operating procedures that protect confidential information and produce objective, timely, and accurate data for the public, as required under a bipartisan 2018 law and 2019 rulemaking. Since 1992, the National Academy of Sciences has regularly recommended best practices to federal statistical agencies. And, until it was recently disbanded, independent technical experts on a Data Users Advisory Committee routinely met with and advised BLS staff. All other data products, including private ones, benchmark against federal data.
At the same time, the anemic BLS budget has fallen by 22 percent since 2010 in real dollars, despite the need to sustain funding and improve operations. While professional associations of economists and statisticians recommend a 10 percent increase in the 2026 federal budget to support current BLS operations, President Donald Trump’s budget instead recommends an 8 percent cut and further reductions in staff. (Estimates suggest that 20 percent of BLS staff have already left their positions since January 2025, and a third of the agency’s leadership roles are vacant.)
Making reliable data accessible in a timely manner to inform decision-making is foundational for economic growth and equity in the United States. As Jonathan Cohen at the American Academy of Arts and Sciences and political scientist Katherine Cramer at the University of Wisconsin–Madison pointed out earlier this year, the right data are essential for a democracy that depends on an informed citizenry.
Defending the BLS tradition of “the fearless publication of the facts without regard to the influence those facts may have upon any party’s position or any partisan’s views” is essential for reliable federal data. But another path forward can be pursued simultaneously. Data users can work on projects that contribute to democratizing our data.
If data users do not act, public data can disappear. This column highlights past and present work by many organizations and researchers to protect and preserve labor market data and introduces a new working paper that illustrates a path toward creating more data, knowledge, and value than the status quo allows.
What can be done now to improve U.S. labor market data
In 2020, New York University Wagner School of Public Service professor emeritus Julia Lane authored a manifesto, Democratizing Our Data, laying out a vision for transforming public data by engaging data consumers in the collection of data and the construction of statistics that can be made available to the public. Her vision for community-engaged public data has the potential to cut costs, increase timeliness, create more value, enable more adaptability to different uses, and spark greater innovation through wider participation—all while continuing to protect the essential privacy of data.
Lane’s vision is based on her pioneering experience of first conceiving, and then building, the dataset at the U.S. Census Bureau that links household and employer data, which involved the construction of state-by-state partnerships to link state and federal data sources. She then went on to build the Coleridge Initiative, a secure platform for state and federal data-sharing. Both projects are mentioned in her award citation for the 2025 SOLE Prize for Contributions to Data & Measurement from the Society of Labor Economists.
Thousands of practitioners and researchers rely on the data that she helped assemble and construct. Building such datasets required her to convince many people across many organizations, most with few incentives to work on improving the collection and construction of public data, that they would all benefit from its existence. Her efforts laid essential groundwork and blazed a trail for data collaborations, such as a new online job-ad data aggregation project with the National Labor Exchange in which my collaborators and I are engaged.
In a new Washington Center for Equitable Growth working paper, “Extracting O*NET Features from the NLx Corpus to Build Public Use Aggregate Labor Market Data,” my co-authors Stephen Meisenbacher at the Technical University of Munich, Svetlozar Nestorov at Loyola University Chicago, and I describe the construction of an aggregate dataset of features extracted from online job postings in the United States, covering September 2015 to June 2025. Our project builds on the 2024 unanimous recommendation by the U.S. Department of Labor’s Workforce Information Advisory Council that the U.S. Secretary of Labor invest in timely, localized, and actionable data. Their top recommendation was to strengthen the National Labor Exchange, or NLx.
NLx is the data trustee for our nation’s online job-ad data and is sponsored and maintained by the National Association of State Workforce Agencies and the Direct Employers Association, which includes the nation’s largest private-sector employers. We built our dataset from more than 155 million job posts collected by the NLx Research Hub, a nonprofit partnership whose mission is to “provide the most accurate and comprehensive collection of real, online job openings at no additional cost to state workforce agencies and employers.”
Our dataset follows the O*NET taxonomy for understanding work used by many researchers and practitioners. It contains far more data aligned with standard classifications for understanding the U.S. labor market than any other dataset currently available.
Importantly, NLx data-use agreements and provisions protect sensitive and disaggregated information. The natural language processing tools we developed to extract standard O*NET features from job ads are hosted publicly on the code-sharing platform GitHub and the AI community platform Hugging Face, both of which permit others to test and adopt this software, which we make available freely for noncommercial uses. Aggregate data at the occupation, industry, and geographic levels can and will be released publicly after peer review and publication.
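To make the idea of releasing aggregates rather than individual postings concrete, here is a minimal sketch in Python. It is not the procedure used by NLx or in the working paper. The record fields, the placeholder labels, and the minimum cell size are all hypothetical, and the small-cell suppression rule is an assumption added for illustration.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer postings than this are withheld.
MIN_CELL_SIZE = 3


def aggregate_postings(postings, min_cell_size=MIN_CELL_SIZE):
    """Count postings per (occupation, geography) cell and drop small cells.

    `postings` is a list of dicts with hypothetical keys "occupation" and
    "geography". Only the aggregate counts leave this function.
    """
    counts = Counter((p["occupation"], p["geography"]) for p in postings)
    return {cell: n for cell, n in counts.items() if n >= min_cell_size}


# Invented toy records with placeholder labels, not real job postings.
postings = (
    [{"occupation": "occ_A", "geography": "state_1"}] * 3
    + [{"occupation": "occ_B", "geography": "state_1"}] * 1
    + [{"occupation": "occ_B", "geography": "state_2"}] * 4
)

released = aggregate_postings(postings)
for cell, n in sorted(released.items()):
    print(cell, n)
```

In this toy example, the single posting in the ("occ_B", "state_1") cell is withheld, while the two larger cells are released as counts only.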
NLx’s own data products include JOE, a soon-to-be-launched publicly available Job Opening Estimator, where users can find a prediction of the BLS monthly Job Openings and Labor Turnover Survey, or JOLTS, a month earlier than the official release, based on the historically tight correlation between benchmark JOLTS data and NLx data. Then, there’s the NLx On Demand platform, which enables users to access aggregate online job-ad data. A team using NLx data also has developed a skills extraction tool at the Leveraging AI for Skills Extraction & Research, or LAiSER, project at George Washington University’s Institute for Public Policy and is working with partners across the country to analyze employer demand. These are just some of the many projects in this vibrant national ecosystem of workforce development professionals at the state and local level and private-sector and academic collaborators.
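The general idea behind an estimator such as JOE, using a timely series to anticipate a benchmark that is published later, can be illustrated with a simple least-squares fit. This is a sketch under stated assumptions and does not describe JOE's actual method, which is not detailed here. The function names are hypothetical, and the numbers are invented toy values rather than NLx or JOLTS figures.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def nowcast(intercept, slope, latest_x):
    """Predict the not-yet-published benchmark from the timely series."""
    return intercept + slope * latest_x


# Invented toy history: months where both series are already known.
posting_counts = [2.0, 2.5, 3.0, 3.5, 4.0]      # timely job-ad series
benchmark_openings = [5.0, 6.0, 7.0, 8.0, 9.0]  # benchmark published later

intercept, slope = fit_line(posting_counts, benchmark_openings)

# The newest month has a posting count but no benchmark release yet.
estimate = nowcast(intercept, slope, 4.5)
print(round(estimate, 2))
```

A real estimator would need to handle seasonality, revisions to the benchmark, and uncertainty around the prediction, none of which this sketch attempts.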
Small, agile teams dedicated to data collection and the production of aggregated statistics can have an impact. The NLx Research Hub, and successor work being done with it, emerged from the work of a small team at the National Association of State Workforce Agencies, or NASWA. A doctoral student at George Washington University, Emma Northcott, first suggested applying for the National Science Foundation and Gates Foundation investments that now make it possible for researchers to access high-quality NLx data through the Research Hub.
Since 2007, NASWA leaders have continuously and incrementally stewarded the work of archiving online job ads from the national distribution pipeline of labor market information clearinghouses first envisioned in the 1933 Wagner-Peyser Act to facilitate efficient labor market matching. The NLx model is still in its early days but has already demonstrated success in a short period of time, and it can endure with the support of practitioners and academic researchers.
Indeed, the NLx partnership with the Direct Employers Association was just renewed until 2037. NLx has suggestions for private employers, state and local government agencies, analysts and researchers, and others to get involved in supporting accurate data collection and additional uses of these data.
Changes are needed to protect and improve federal data collection
Everyone who aspires to produce useful public data stands on the shoulders of giants and should have gratitude for the many contributions that create the world-class public data products in the United States today. Sustained investment in federal statistical agencies is necessary, but so are changes to reduce the cost, increase the speed of change, and adapt to users’ evolving needs through a different way of working.
Federal statistical agencies also recognize a need for change. One challenge to federal data collection is that survey responses have declined, especially in the wake of the COVID-19 pandemic. Recent posts from U.S. Census Bureau interim director Ron Jarmin outline major efforts to incorporate real-time feeds of data from outside providers and improve data collection on the business ecosystem. Even so, as the information scientists Christine L. Borgman at the University of California, Los Angeles and Philip E. Bourne at the University of Virginia have written, “it takes a village to manage and share data” in work that describes how commons approaches are needed to build sustainable systems.
Funders, including the federal government, could support community-engaged co-production of public data. Seed investments in projects, such as the NLx, are needed to collect data and establish infrastructure that protects privacy and enables an ecosystem of interested users to access data and build aggregate statistics that inform the public and create more value for users.
Academic institutions and journals could publish and reward risk-taking efforts to develop the software and demonstration projects of public data that engage a relevant community. Educational institutions should recognize where the demand is: In her manifesto, NYU’s Lane also outlines the need for a trained workforce capable of building and working in data collaborations that will be necessary for this work. Faculty can involve students in the work.
Nothing can replace the value of federal statistics. A fully funded, nonpartisan, independent Bureau of Labor Statistics and Census Bureau are essential. At the same time, public data that are taken for granted can disappear. One major case in point: The entire history of online job ads from the early internet through 2007 was destroyed when federal funding to support America’s Job Bank ended.
Picking up the pieces after a demolition is hard. Online job ad data prior to 2015 remains patchy. Near- and medium-term efforts from an engaged user community can partially fill gaps and make progress. In the long run, these efforts could be combined with, expand, and complement the unique capabilities of our nation’s vital federal statistical agencies.