U.S. scholars need access to public and private big data
Big Data holds the promise of a wealth of information to uncover new insights into how our economy works but also the peril of exposing private information that could harm individual citizens. We all know that commercial ventures primarily use data gathered on their customers to track their purchases and spending habits—promising to varying degrees to protect such individual information—but now some private companies are allowing select scholars access to this information for research usage after the companies “anonymize” it.
One case in point is the JPMorgan Chase Institute, which last week unveiled its first report on the financial habits of retail banking customers at JPMorgan Chase & Co. The new research institute tapped into the commercial banking arm’s internal administrative data to determine how income and consumption fluctuate on a monthly and yearly basis. These findings will have important policy implications for lawmakers seeking to improve citizens’ financial well-being.
Researchers are constantly looking for new sources of information in order to answer the most challenging economic questions. But it is important to understand that, by definition, JPMorgan Chase’s data can only tell us about its own customers. It cannot give us insight into the whole U.S. population, or even into specific demographic groups. To create effective policies, we must gather information on all banking customers, not just those from one bank.
Still, researchers are flocking to private sources of data such as those released by JPMorgan Chase as well as credit-reporting companies. Yet the private sector is not the sole source of administrative data out there. Not by a long shot. The U.S. government holds tax records, school district filings, social security information—the list goes on—in order to administer its tax and benefit programs. Such recordkeeping has gone on for decades, but recent technological advances have made it easier to process these large datasets. Most importantly, government administrative data is representative of the entire population.
But because of perfectly reasonable privacy concerns, this data is difficult to access, which puts a critical source of information out of reach: one that could allow us to investigate deep into our economy and provide better questions for policymakers to consider. When handled correctly, however, these privacy concerns can be resolved. Those who are able to access the data have done amazing things. Work done using information from the U.S. Internal Revenue Service, for example, has transformed our understanding of the composition of incomes for those at the very top of the income ladder. And using administrative data, Harvard University economics professor Raj Chetty has repeatedly illustrated the extent to which your family and place of birth shape your success later on in life.
Such findings, however, are limited to the few scholars who have the means to gain access to this information. Even Chetty, the best-known user of government data, must occasionally rely on European countries, many of which have created secure data systems for researchers, to do his research on retirement subsidies, unemployment insurance, and the effects of taxes on labor supply, among other topics. Such research tells us a great deal about European economies and their labor markets but cannot directly translate into usable information for policymakers in the United States, a tragedy for U.S. researchers and policymakers alike.
Because U.S. scholars lack the kind of access to government data that European countries provide, it is welcome indeed that researchers can turn to the private sector. But this is a temporary solution to a much bigger problem. Yes, privacy surrounding government data is an issue. At the same time, we tacitly allow private companies to track our information with little vocal apprehension. What companies find out about us (and in the case of banks, they find out quite a lot) can be used to answer important economic and behavioral questions, but it can also be used by firms seeking to expand their profits.
Without full access to public administrative data, U.S. researchers cannot explore Big Data in pursuit of meaningful research. And firms like JPMorgan Chase cannot completely fill that gap. We need both public and private sources of information and, right now, public access is far behind.