About Data Commons
Why Data Commons
Publicly available data from open sources (census.gov, cdc.gov, data.gov, etc.) are vital resources for students and researchers in a variety of disciplines. Unfortunately, processing these datasets is often cumbersome. Each organization follows its own conventions for encoding its datasets. Combining data from different sources requires mapping common entities (city, county, etc.) and resolving different types of keys and identifiers. This process is time-consuming and tedious, and it is repeated over and over. Our goal with Data Commons is to address this problem.
Data Commons synthesizes a single graph from these different data sources. It links references to the same entities (such as cities, counties, and organizations) across different datasets to nodes on the graph, so that users can access data about a particular entity, aggregated from different sources, without cleaning or joining the data themselves. We hope the data contained within Data Commons will be useful to students, researchers, and enthusiasts across different disciplines.
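The linking idea can be illustrated with a toy sketch. This is not how Data Commons is implemented. The source records, field names, node ID, and values below are all made up for illustration; the sketch only assumes that each source identifies the same county with a different kind of key, and that a lookup table maps each key to one shared graph node.

```python
# Illustrative only: identifiers, field names, and values are made up,
# not real statistics and not the Data Commons schema.
SOURCE_A = [{"fips": "06085", "population": 100}]
SOURCE_B = [{"name": "Santa Clara County, CA", "median_age": 30}]

# Hypothetical resolution table: (key type, key value) -> graph node ID.
ENTITY_INDEX = {
    ("fips", "06085"): "county/santa_clara_ca",
    ("name", "Santa Clara County, CA"): "county/santa_clara_ca",
}


def build_graph(sources, index):
    """Merge records from several sources onto shared entity nodes."""
    graph = {}
    for key_field, records in sources:
        for record in records:
            # Resolve the source-specific key to the shared node ID.
            node_id = index[(key_field, record[key_field])]
            node = graph.setdefault(node_id, {})
            # Copy every non-key field onto the node.
            for field, value in record.items():
                if field != key_field:
                    node[field] = value
    return graph


graph = build_graph([("fips", SOURCE_A), ("name", SOURCE_B)], ENTITY_INDEX)
print(graph["county/santa_clara_ca"])
```

Once both records resolve to the same node, a reader of the graph sees population and median age together on one entity and never has to reconcile the two key formats.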
Who can use it?
Data Commons can be accessed by anyone via the tools available on datacommons.org. Students, researchers and developers can use the REST, Python and Google Sheets APIs, all of which are free for educational, academic and journalistic research purposes.
Data Commons has benefited greatly from many collaborations. In addition to help from the US Department of Commerce (notably the Census Bureau), we have received help from our many academic collaborators, including UC San Francisco, Stanford University, UC Berkeley and Harvard.
We are looking for more collaborators, both for adding new data to Data Commons and for building new and interesting applications of Data Commons. Contact us if you are interested in working with us.
We are fortunate to have the counsel of our Advisory Board, which includes:
- Vint Cerf, Chief Internet Evangelist for Google.
- Gary King, Director of the Institute for Quantitative Social Science at Harvard University.
- Arun Majumdar, Jay Precourt Provostial Chair Professor at Stanford University.
- Sendhil Mullainathan, Roman Family University Professor of Computation and Behavioral Science at Chicago Booth.
- Alfred Spector, Former Head of Research for Google.
- Hal Varian, Chief Economist for Google.