You filed your taxes. Congrats, you’re administrative data!
One of life’s great certainties, Tax Day, is just around the corner. This annual rite is usually a time for reflection about the state of the federal tax code, its level, and the various rates—marginal, effective, and average—that we all pay. But there’s another angle that rarely, if ever, gets noted. After you submit your tax filing, you’ll be contributing to an important resource: administrative data.
Administrative data can be thought of as the publicly controlled side of “big data.” When you file your taxes, your returns become part of a dataset at the U.S. Department of the Treasury that includes everyone else who filed that year. And the years before that. Unlike datasets that come from the Current Population Survey, which only samples some of the population, administrative data can sometimes cover a larger portion of the population under study. The annual income data from the March Current Population Survey, for example, comes from a sample of roughly 160,000 observations. The IRS has annual earnings data on more than 140 million tax units.
For researchers, data of this size and accuracy offer a lot of opportunity. The tax dataset alone has been the source of some important economics research. Because these data do a better job at capturing top incomes than survey data, administrative tax data are the core of estimates of top-end income inequality from Thomas Piketty and Equitable Growth Steering Committee member Emmanuel Saez, and of top-end wealth inequality by Saez and Gabriel Zucman of the University of California, Berkeley. The same data source underlies the research by Raj Chetty of Stanford University and our Steering Committee, Saez, and others on economic mobility, as well as other research on the changes in the sources of business income.
The problem with administrative data is that access is hard to obtain. Unlike the Current Population Survey and other datasets like it, the data aren’t freely available for download on the Internet. Due to important concerns about privacy, the data are kept on a much tighter leash. Researchers have much taller hurdles to clear when it comes to accessing administrative tax data.
Access is also a significant problem for other datasets, such as data from the Social Security Administration. These large datasets are important for understanding topics such as the rise of inter-firm inequality in the United States, but it’s far from easy to get access to them. There are concerns about privacy, and sometimes, as a result, individual researchers have to write their code with artificial data and have agency staffers run the code for them. This creates a bottleneck, since only a few researchers can be accommodated at a time. Staff economists at these agencies have direct access to the data, so they are spared that inconvenience. This is one reason why Treasury or Social Security Administration economists are often co-authors on these projects.
It’s different in other countries. Take a look at research abroad that draws on administrative data and you’ll notice quite a few papers using data from Germany, where administrative data linking workers to their employers are much more readily available. The relatively easier access to data in Germany means we’re seeing researchers in the United States focusing on that country, while research on the United States lags behind. And when research is done abroad, it’s hard to replicate here, because without access to U.S. administrative data there is no way to see what the results would be for the United States.
Unfortunately, some policymakers have tried to restrict the amount of data researchers and the public already have easy access to—even though the gains from the data almost certainly make up for the costs of collection. Additionally, expanding access to administrative data would help increase the amount of research and our knowledge of the U.S. economy. If we’re all contributing to these datasets, then we might as well have more access.