Datasets in the Data Commons Graph

The base data in the Data Commons Graph, including taxonomy, is derived from schema.org.

US Census Gazetteer - Geographic Areas

From the 2018 US Census Gazetteer, Data Commons has data about:

  • Counties
  • County Subdivisions
  • 116th Congressional Districts
  • Census Tracts (2018)
  • Core Based Statistical Areas
  • Places (City, Census Designated Place, Village, Town)
  • School Districts - Elementary
  • School Districts - Secondary
  • School Districts - Unified
  • ZIP Code Tabulation Areas
Data made available under US Census Terms of Service.

National Oceanic and Atmospheric Administration (NOAA) - Daily Weather: Historical and Forecast

Data Commons has historical daily weather statistics (minimum, mean, and maximum) for temperature, visibility, rainfall, snowfall, atmospheric pressure, and humidity from 2009 to 2019.

Data made available under National Weather Service Use of NOAA/NWS Data and Products Terms of Service.

United States Geological Survey (USGS) Geographic Names Information System (GNIS) - National Federal Codes

The National Federal Codes dataset, maintained by GNIS, includes all national features in a single file.

Data made available under USGS Copyrights and Credits Terms of Service.

US Census Bureau - Cartographic Boundary Files

Data Commons has KML files from the 2018 US Cartographic Boundary Files for:

  • Congressional Districts: 116th Congress
  • County
  • Place
  • State
  • Census Tracts
  • County Subdivisions
  • ZIP Code Tabulation Areas
  • School Districts - Elementary
  • School Districts - Secondary
  • School Districts - Unified
Data made available under US Census Terms of Service.

US Census American Community Survey (ACS) - 5-year Estimates

Data Commons includes ACS data on a broad range of topics covering social, economic, demographic, and housing characteristics of the US population at the country, state, county, city, ZIP code tabulation area, school district, and census tract levels, among others. The ACS publishes 5-year estimates every year.

Data made available under US Census Terms of Service.

US Census - County Business Patterns

Data Commons includes the County Business Patterns datasets for 2011-2016. Each annual dataset includes the number of establishments, employment during the week of March 12, first-quarter payroll, and annual payroll. The record layouts and the industry and geography reference files are also available for each year.

Data made available under US Census Terms of Service.

US Census - Small Area Health Insurance Estimates (SAHIE)

Data Commons includes the SAHIE dataset, which provides yearly estimates of health insurance coverage status for all counties and states. These estimates are available by age, race, sex, and income.

Data made available under US Census Terms of Service.

The Bureau of Labor Statistics - Monthly County-Level Employment and Unemployment

Data Commons includes monthly labor force data by county (not seasonally adjusted).

Data made available under BLS Terms of Service.

The Bureau of Labor Statistics - Quarterly Census of Employment and Wages (QCEW)

Data Commons contains the QCEW NAICS-based county high-level data files for 1990-2017.

Data made available under BLS Terms of Service.

Bureau of Economic Analysis (BEA) - GDP Datasets

Data Commons has the following BEA datasets:

Data made available under BEA Terms of Service.

Federal Election Commission

Data Commons includes the Candidate Master, Candidate-Committee Linkages, and All Candidates datasets from the FEC.

Data made available under FEC Terms of Use.

College Scorecard - University Data

The College Scorecard dataset includes data from 1996 through 2017 for all undergraduate degree-granting institutions of higher education and supporting data on student completion, debt and repayment, earnings, and more.

Data made available under US Department of Education Terms of Service.

National Center for Education Statistics (NCES) - Public School and School District Data

The NCES dataset includes data on student populations in public schools and school districts, disaggregated by race, gender, and grade at both the school district and school levels.

Data made available under NCES Data Usage Agreement and US Department of Education Copyright Status Notice.

CDC - 500 Cities: Local Data for Better Health

The 500 Cities Project datasets contain model-based small area estimates for 27 measures of chronic disease related to unhealthy behaviors (5), health outcomes (13), and use of preventive services (9).

Data made available under CDC Data Terms of Service.

CDC Wonder - Daily County-Level PM2.5 Concentrations

Daily County-Level PM2.5 Concentrations "provides modeled predictions of particulate matter (PM2.5) levels from the EPA's Downscaler model. These data are used by the CDC's National Environmental Public Health Tracking Network to generate air quality measures. Data are at the county levels for 2001-2014".

Data made available under CDC Wonder Data Terms of Service.

CDC Wonder - Underlying Cause of Death

The Underlying Cause of Death, 1999-2017 dataset "contains mortality and population counts for all US counties. Data are based on death certificates for US residents. Each death certificate identifies a single underlying cause of death and demographic data."

Data made available under CDC Wonder Data Terms of Service.

US Drug Enforcement Administration - Retail Drug Distributions by Drug at the County Level

Automated Reports and Consolidated Ordering System (ARCOS) is a data collection system in which manufacturers and distributors report their controlled substances transactions to the Drug Enforcement Administration (DEA).

Data Commons includes quarterly retail drug distributions from ARCOS Report 1, published annually for 2006 through 2017. The 3-digit ZIP code prefixes in the report were aggregated to the county level using the 2010 ZIP Code Tabulation Area (ZCTA) Relationship records from the US Census.
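The report gives totals per 3-digit ZIP code prefix, while the graph stores counties, so each prefix total has to be spread across the counties that its ZCTAs overlap. The description above does not say how that split was weighted. The following is a minimal Python sketch that assumes population-weighted allocation, using hypothetical simplified relationship records of the form (ZCTA, county, population in the overlap) rather than the actual Census file layout; `allocate_prefix_totals` and all values are illustrative only.

```python
from collections import defaultdict

# ASSUMPTION: population-weighted allocation. This is an illustration,
# not the documented Data Commons method.
# Each hypothetical relationship record is (zcta5, county_id, population_in_overlap).


def allocate_prefix_totals(prefix_totals, relationships):
    """Split each 3-digit ZIP prefix total across the counties it overlaps.

    prefix_totals: dict of 3-digit prefix (str) -> reported quantity.
    relationships: iterable of (zcta5, county_id, population) tuples.
    Returns a dict of county_id -> allocated quantity.
    """
    overlap_pop = defaultdict(float)  # (prefix, county) -> population
    prefix_pop = defaultdict(float)   # prefix -> total population
    for zcta5, county, pop in relationships:
        prefix = zcta5[:3]  # a ZCTA's first 3 digits give its ZIP prefix
        overlap_pop[(prefix, county)] += pop
        prefix_pop[prefix] += pop

    county_totals = defaultdict(float)
    for (prefix, county), pop in overlap_pop.items():
        # Skip prefixes with no reported data or no population to weight by.
        if prefix not in prefix_totals or prefix_pop[prefix] == 0:
            continue
        share = pop / prefix_pop[prefix]
        county_totals[county] += prefix_totals[prefix] * share
    return dict(county_totals)


# Hypothetical example: prefix "107" straddles two counties.
relationships = [
    ("10001", "county_A", 30000.0),
    ("10002", "county_A", 70000.0),
    ("10701", "county_B", 60000.0),
    ("10705", "county_C", 40000.0),
]
print(allocate_prefix_totals({"100": 500.0, "107": 1000.0}, relationships))
```

Here county_B receives 60% of prefix 107's total because it holds 60% of that prefix's population. Any weighting that sums to one per prefix preserves the overall total; population is only one plausible choice.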

Please see the disclaimers page about the scope of the data, as well as the US Department of Justice Legal Policies and Disclaimers Terms of Use.

FBI Uniform Crime Reporting (UCR) Program - Offenses Known to Law Enforcement, by State by City

Table 8 provided by the UCR Program's Crime in the US Report "provides the volume of violent crime (murder and nonnegligent manslaughter, rape, robbery, and aggravated assault) and property crime (burglary, larceny-theft, and motor vehicle theft) as reported by city and town law enforcement agencies (listed alphabetically by state) that contributed data to the UCR Program."

Data made available under US Department of Justice Legal Policies and Disclaimers Terms of Use.

USGS Advanced National Seismic System Comprehensive Earthquake Catalog (ComCat)

Data Commons has data on earthquakes of magnitude 3 and above since 1900. Variables include date, time, location, magnitudes of various types, magnitude error, depth, depth error, and review status.

Data made available under USGS Copyrights and Credits Terms of Service.

NOAA International Best Track Archive for Climate Stewardship (IBTrACS)

Data Commons includes cyclones from the IBTrACS dataset. Variables include name, start date, end date, max wind speed, minimum pressure, max classification, oceanic basin, and affected places.

Data made available under National Weather Service Use of NOAA/NWS Data and Products Terms of Service.

National Climatic Data Center Storm Events Database

Data Commons includes data on all available types of storm events (e.g., tornado, hail) and storm episodes in the US from the Storm Events Database. Variables include location, start datetime, end datetime, wind speed, precipitation type and amount, recorded description, affected places, number of direct and indirect injuries, number of direct and indirect deaths, property damage cost, crop damage cost, and more.

Data made available under National Weather Service Use of NOAA/NWS Data and Products Terms of Service.

NIFC Interagency Situation Report - 209 (SIT-209)

Data Commons includes data on fires in the US reported via the SIT-209 application, starting from 1999. Variables include name, fire type, fire cause, location, discovery date, controlled date, affected area, and estimated costs.

Data made available under US Forest Service Terms of Service.

Collaboration: Opportunity Insights - Outcomes and Neighborhood Datasets

Opportunity Insights provides datasets on "social mobility and a variety of other outcomes from life expectancy to patent rates by neighborhood, college, parental income level, and racial background." Data Commons includes the following datasets of predicted outcomes for children, broken down by the listed factors:

  • All Outcomes by Census Tract, Race, Gender and Parental Income Percentile
  • All Outcomes by County, Race, Gender and Parental Income Percentile
  • All Outcomes by Commuting Zone, Race, Gender and Parental Income Percentile
Data Commons also includes the following datasets of covariates, used throughout the paper or shown in the Opportunity Atlas as neighborhood characteristics, at the respective geographic levels:
  • Neighborhood Characteristics by Census Tract
  • Neighborhood Characteristics by County
  • Neighborhood Characteristics by Commuting Zone
Data made available under Opportunity Insights Data and Data Usage.

Collaboration: Encyclopedia of DNA Elements (ENCODE) - BED (Browser Extensible Data) Files

The ENCODE dataset contains information on approximately 7,000 experiments and 14,000 BED files collected by the Encyclopedia of DNA Elements (ENCODE) Consortium. Examples of captured experiment metadata include the target, biosample, assay type, and genome assembly. Each BED file links to its individual BED lines, each of which states the genomic position of a single peak. Data Commons ingested all experimental data in BED format.
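To make "genomic position" concrete: in the BED format, each line is tab-separated, and the first three fields are the chromosome, a 0-based start, and an exclusive end, so a peak's length is simply end minus start. Optional fields such as name, score, and strand follow, and ENCODE peak files (e.g., narrowPeak) append further columns. The Python sketch below parses the first six fields; `BedLine` and `parse_bed_line` are hypothetical names, and this says nothing about how Data Commons itself stores BED lines.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BedLine:
    chrom: str
    start: int                      # 0-based, inclusive
    end: int                        # 0-based, exclusive
    name: Optional[str] = None
    score: Optional[float] = None   # parsed as float to tolerate non-integer scores
    strand: Optional[str] = None

    @property
    def length(self) -> int:
        # Half-open interval [start, end), so no +1 is needed.
        return self.end - self.start


def parse_bed_line(line: str) -> BedLine:
    """Parse the standard leading BED fields; extra columns are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 fields, got {len(fields)}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if start > end:
        raise ValueError("chromStart must not exceed chromEnd")
    # "." marks an empty optional field in BED files.
    name = fields[3] if len(fields) > 3 and fields[3] != "." else None
    score = float(fields[4]) if len(fields) > 4 and fields[4] != "." else None
    strand = fields[5] if len(fields) > 5 and fields[5] != "." else None
    return BedLine(chrom, start, end, name, score, strand)


# Hypothetical narrowPeak-style line (10 columns; the last 4 are ignored here).
peak = parse_bed_line("chr1\t9980\t10480\tpeak1\t1000\t.\t5.2\t-1\t0.01\t250")
print(peak.chrom, peak.start, peak.end, peak.length)  # chr1 9980 10480 500
```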

Data made available under ENCODE Data Use Policy for External Users.

Collaboration: New York Botanical Garden (NYBG) - C. V. Starr Virtual Herbarium

C. V. Starr Virtual Herbarium is a public specimen database with photos and detailed records about millions of plants, fungi, and algae.