Frequently Asked Questions

Q: What is Data Commons?
Please see About Data Commons.

Q: What is the difference between Data Commons and public dataset projects like Dataverse, Kaggle datasets, Google BigQuery Public Datasets, Dataset Search, etc.?
These collections of datasets provide a great service by aggregating topical open datasets. However, although the data is open, using it to answer specific questions often involves tedious 'foraging': finding the data, cleaning the data, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, etc. This error-prone and tedious process is repeated, once or more, by each organization. This is a problem in almost every area of study involving data, from the social sciences and physical sciences to public policy.

Data Commons is an attempt to ameliorate some of this tedium by doing this work once, on a large scale, and providing cloud-accessible APIs to the cleaned, normalized and joined data. There are millions of datasets, and it will be a while before Data Commons includes a substantial fraction of them. In every domain, however, some collections of data get used more frequently than others. We have started with a core set of these in the hope that useful applications can be built on top of them.

Q: What is the difference between Data Commons and Wikidata?
The focus in Data Commons is on aggregating external, already available data (with an emphasis on statistical data) from government agencies and other authoritative sources, as opposed to creating a corpus of structured data from scratch.

Q: What is the relation between DataCommons.org and Schema.org?
DataCommons.org builds upon the vocabularies defined by Schema.org, with additional terms defined to cover concepts (e.g. "citizenship") that are important to the data in Data Commons but which have not been a priority for Schema.org-based Web markup. The Data Commons schemas constitute an "external extension" to Schema.org, similar to that provided by GS1. Some schemas could migrate into Schema.org if the community finds value in them.

Q: What are the kinds of entities you are resolving now? Which ones are you not resolving?
At this time we are resolving entities that relate to a geographical location or place, for example "population", "weather", and "crime statistics" for a particular "city". We will continue to expand the set of resolved entities to include organizations, people, events, products, etc. The long-term goal is to be able to resolve any arbitrary entity using Reference by Description.

Q: What are the usage rights of the data in Data Commons?
The Data Commons knowledge graph and the compilation of the datasets are licensed under CC BY. The Data Commons REST API and the R and Python libraries are released under the Apache License 2.0. The data included in the Data Commons Graph comes from different sources. The source of the data (provenance) is provided for all the data, and includes the URL of the source. While effort is made to obtain data from sources that offer unrestricted usage of the underlying data, this data may be subject to different licenses and terms of use, as specified at the provenance URL.

Q: How can we access data in Data Commons?
The data in the knowledge graph can be accessed through the Data Commons Graph Browser and the APIs for Python, REST and Google Sheets.

Q: How can we add our own data to the knowledge graph?
Data Commons is intended to be a community project and seeks your involvement. To learn more about publishing data that can be included in Data Commons, check out the Contributing page. You can also contact support@datacommons.org if you have an interesting dataset that you think should be included in Data Commons and would like to help. In the future we plan to allow users to ingest data into the Data Commons Graph using an upload tool. We will update the community when this functionality is released.

Q: What does per capita mean in the Time Series tool?
Different variables are measured over different populations. For example, the number of people whose gender is male (in a given place) is measured over the population of all people. On the other hand, the number of people whose educational attainment is High School is measured over the set of people aged 25 years or older. Per capita values are therefore calculated by dividing by the population over which the variable was measured, which is not necessarily the total population of the place.
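
As a rough illustration of the point above, the sketch below divides each count by the population it was measured over. All figures are made up for the example, and the function and variable names are hypothetical; this is not how the Time Series tool is implemented, only the arithmetic the answer describes.

```python
def per_capita(count: float, measured_population: float) -> float:
    """Divide a count by the population over which it was measured."""
    if measured_population <= 0:
        raise ValueError("measured_population must be positive")
    return count / measured_population


# Made-up figures for a hypothetical place.
total_population = 100_000        # all people
population_25_and_over = 60_000   # people aged 25 years or older

male_count = 49_000               # measured over all people
high_school_count = 15_000        # measured over people aged 25 or older

# Each variable uses its own denominator.
male_share = per_capita(male_count, total_population)
high_school_share = per_capita(high_school_count, population_25_and_over)

print(f"male: {male_share:.2f}")
print(f"high school: {high_school_share:.2f}")
```

Dividing the High School count by the total population instead would give 0.15 rather than 0.25, which understates the share among the people who were actually measured.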

Q: How long will you store the data for?
Data Commons is not an archival service. We collect the data, build the knowledge graph and provide access to the Graph. As with any website, long-term storage and safekeeping of the data are the responsibility of the primary publisher.

Q: Where can I download all the data?
Given the size and evolving nature of the Data Commons Graph, we prefer you access it via the APIs. If your project needs local access to a large fraction of the Data Commons Graph, please contact support@datacommons.org.

Q: How much does this service cost to use?
The public data in the Data Commons Graph is hosted on Google Cloud Platform by Data Commons and is made available to users. There is no cost for the data itself when it is publicly available for free. Usage limits for the service beyond the free-tier quota will be in line with the pricing of the BigQuery public datasets program. In the future, as users add more data to the knowledge graph, we expect that, just as on the Web, some data will be free, some will be private, and some may have an associated cost to access.

Q: How do we know if the data is accurate?
Data Commons provides an access mechanism to data and makes no commitment on the accuracy of the data. Answers to queries will include the provenance (source of the data). The choice of which data to use, based on its source, is in the developer's control. There may be errors introduced in cleaning or otherwise processing the data. If you find something you think is in error, we would love to hear from you.
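
One way a developer might exercise that control is sketched below: keep only the values whose provenance URL points at a host the developer trusts. The record layout (a dict with "value" and "provenance_url" keys), the function name, the hosts and the values are all assumptions made for this example; they do not reflect the actual response format of the Data Commons APIs.

```python
from urllib.parse import urlparse


def select_by_provenance(observations, trusted_hosts):
    """Keep observations whose provenance URL host is in trusted_hosts."""
    return [
        obs
        for obs in observations
        if urlparse(obs["provenance_url"]).hostname in trusted_hosts
    ]


# Hypothetical records with made-up values and placeholder URLs.
observations = [
    {"value": 1000, "provenance_url": "https://stats.example.gov/population"},
    {"value": 1010, "provenance_url": "https://example.org/estimates"},
]

# The developer decides which sources to accept.
trusted_hosts = {"stats.example.gov"}
selected = select_by_provenance(observations, trusted_hosts)

for obs in selected:
    print(obs["value"], obs["provenance_url"])
```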

Q: How often is the data refreshed?
Different data sources refresh at different frequencies. We try to keep the data updated as the sources publish new versions of their data.

Q: What are the SLAs / Performance levels we can expect?
The service is provided on an as-is basis with no SLA or commitments on availability or uptime.

Q: I have a question / feedback. Whom do I contact?
You can post your question on the GitHub forum or contact support@datacommons.org.