When applicable, disclaimers for data provenances will go here, in alphabetical order.

  • Federal Election Commission (FEC): Financial reports are associated with the candidate rather than with the candidate's election campaign. We asked the FEC whether it could provide annual rather than biennial summaries of the All Candidates dataset, so that we could better associate financial activity with a campaign instead of a candidate. An excerpt from the FEC's email response follows (contact support@datacommons.org for the full exchange): "For the website and for the data files we don’t try to pull apart special and general election data. It’s just programmatically very hard to do because, as you know, specials can happen at any time. If she’s looking for special election money only, she’d need to determine the filing period and then sum the financial data from those reports. For example NC-03’s, (link omitted), financial disclosure period is January 1, 2019 (pre-primary) through July 29, 2019 (post-general). As you know, candidates can raise and report contributions for the 2020 general on any of these reports. If she really wants only special election data, she’d need to look at transaction level data."
  • US Drug Enforcement Administration (DEA) Automated Reports and Consolidated Ordering System (ARCOS) dataset: ARCOS Report 1 includes quarterly retail drug distributions (by total weight in grams) for each state and drug. Data for each state is reported at the 3-digit zip code level. We deduce that these are zip code prefixes, matching similar work by other researchers. To convert to county-based results, the 3-digit zip codes were first expanded to 5-digit ZCTA codes using population proportions from the 2010 Census. The 2010 Census also reports the percentage of each ZCTA's population that falls in each related county, which was used to weight the summed results at the county level. This was done across all ARCOS report years, from 2006 to 2017.

    Due to missing ZCTA-County relationship data, the following reported information from ARCOS is missing from Data Commons:
    • Missing states: American Samoa (60), Guam (66), U.S. Virgin Islands (78)
    • Missing zip prefixes: California (901, 942, 965), Florida (332), Georgia (311), Louisiana (702), Massachusetts (091), Tennessee (375), Puerto Rico (000)
    • ZCTA prefixes with a total population of 0: District of Columbia (202, 204), Texas (753, 772)
    • 212 ZCTAs with null reported populations
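
The ARCOS prefix-to-county conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline Data Commons actually runs: the function name `prefix_to_county`, the input dictionaries, and the sample figures are all hypothetical, and the population and percentage values would in practice come from the 2010 Census files.

```python
from collections import defaultdict


def prefix_to_county(grams_by_prefix, zcta_population, zcta_county_pct):
    """Allocate grams reported per 3-digit zip prefix to counties.

    All argument shapes are assumptions made for this sketch:
      grams_by_prefix: {"100": 1000.0}            grams per 3-digit prefix
      zcta_population: {"10001": 300}             population per 5-digit ZCTA
      zcta_county_pct: {"10001": {"36061": 100}}  percent of ZCTA population
                                                  in each county
    """
    # Group ZCTAs under their 3-digit prefix, skipping null populations.
    zctas_by_prefix = defaultdict(list)
    for zcta, population in zcta_population.items():
        if population is None:
            continue
        zctas_by_prefix[zcta[:3]].append(zcta)

    county_grams = defaultdict(float)
    for prefix, grams in grams_by_prefix.items():
        zctas = zctas_by_prefix.get(prefix, [])
        total_population = sum(zcta_population[z] for z in zctas)
        if total_population == 0:
            # No usable ZCTAs, or zero total population: nothing to weight by.
            continue
        for zcta in zctas:
            # Step 1: split the prefix total across ZCTAs by population share.
            zcta_grams = grams * zcta_population[zcta] / total_population
            # Step 2: split each ZCTA's share across counties by percentage.
            for county, pct in zcta_county_pct.get(zcta, {}).items():
                county_grams[county] += zcta_grams * pct / 100.0
    return dict(county_grams)


# Made-up numbers, for illustration only.
example = prefix_to_county(
    {"100": 1000.0},
    {"10001": 300, "10002": 100},
    {"10001": {"36061": 100.0}, "10002": {"36061": 50.0, "36047": 50.0}},
)
print(example)  # {'36061': 875.0, '36047': 125.0}
```

In this sketch, prefixes with no matching ZCTAs or a total population of 0, and ZCTAs with no county relationship, simply contribute nothing, which mirrors the kinds of gaps listed above.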
  • US Drug Enforcement Administration (DEA) Controlled Substances dataset: Only 78 drug codes and names reported in the US DEA ARCOS report are included in Data Commons. Due to truncation issues in the ARCOS reports, drug names were resolved manually from the following sources:
  • We source our geological data from the United States Geological Survey. We do not attempt to accurately reflect the geopolitical situation on the ground or territorial control.
  • National Fire and Aviation Management SIT-209 Fire Incident Reports: Data Commons includes selected fields from the final record of each wildfire, wildland fire use, prescribed fire, and complex fire incident. Data Commons also computes yearly aggregates based on the discovery date and reported latitude and longitude coordinates. These aggregates are based on incidents reported by dispatchers using the SIT-209 application and may not represent the true count of fires for a given place and year. Data Commons represents both the incident start date (used until 2013) and discovery date (used since 2014) as discovery date for consistency across years.
  • US Department of Labor ETA 539 Reports: Data Commons includes selected fields from the ETA 539 CSV. We attribute statistics to the week the data describes, not the week the data was filed.

    States may update any of their data at any time, so to reflect recent changes, Data Commons updates the last two years of data for all states on a daily basis.

    Data Commons also computes two types of aggregates:
    1. Summing state claim counts to USA claim counts.
    2. Summing initial and continued claims to total claim counts.
    These aggregates are marked in the Data Commons knowledge graph (KG) with "dcAggregate" in the measurementMethod property of Observation nodes.

    Data Commons reports all numbers as-is. Note that the source CSV itself contains anomalous values, likely due to data entry errors, but according to the Office of Unemployment Insurance, "corrections to historical figures are not being actively sought from states offices at this time".
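
The two ETA 539 aggregates described above can be sketched as follows. This is a minimal illustration, not the actual Data Commons import code: the row keys (`state`, `week`, `initial`, `continued`), the variable names, and the flat observation dictionaries are assumptions made for the example. Only the "dcAggregate" marker in `measurementMethod` comes from the text.

```python
from collections import defaultdict

AGGREGATE = "dcAggregate"  # marker named in the text above


def make_observation(place, week, variable, value, method=None):
    """Build a simplified, hypothetical stand-in for an Observation node."""
    return {
        "place": place,
        "week": week,
        "variable": variable,
        "value": value,
        "measurementMethod": method,
    }


def build_observations(state_rows):
    """Emit source values as-is, plus the two kinds of aggregates."""
    observations = []
    usa_totals = defaultdict(int)  # (week, variable) -> summed count

    for row in state_rows:
        state, week = row["state"], row["week"]
        counts = {"initial": row["initial"], "continued": row["continued"]}

        # Source-reported values carry no aggregate marker.
        for variable, value in counts.items():
            observations.append(make_observation(state, week, variable, value))

        # Aggregate 2: initial + continued claims = total claims.
        counts["total"] = row["initial"] + row["continued"]
        observations.append(
            make_observation(state, week, "total", counts["total"], AGGREGATE)
        )

        # Aggregate 1: accumulate state counts into USA counts.
        for variable, value in counts.items():
            usa_totals[(week, variable)] += value

    for (week, variable), value in usa_totals.items():
        observations.append(
            make_observation("USA", week, variable, value, AGGREGATE)
        )
    return observations


# Made-up rows, for illustration only.
rows = [
    {"state": "CA", "week": "2020-04-04", "initial": 10, "continued": 20},
    {"state": "NY", "week": "2020-04-04", "initial": 5, "continued": 7},
]
for observation in build_observations(rows):
    print(observation)
```

Because the anomalous source values are passed through unchanged, any such value also flows into the sums in this sketch.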