Frequently Asked Questions

Q: What is Data Commons?

Data Commons, an open source initiative from Google, organizes the world’s publicly available information and makes it more accessible and useful. Learn more on the About Data Commons page.

Q: Who can use Data Commons?

Data Commons is available for anyone to use. Our goal is to make the world’s publicly available data more helpful to people and organizations working on the big societal challenges like climate change, food security, or economic inequity.

Q: Is Data Commons free or is there a cost to use it?

There is no cost for the publicly available data, which is hosted on Google Cloud by Data Commons. For individuals or organizations who exceed the free usage limits, pricing will be in line with the BigQuery public dataset program.

Q: What is the new Explore interface to Data Commons?

Data Commons has a new Explore interface that uses large language models (LLMs) to map your natural language question to public datasets and select the right visualizations for it. We do not use LLMs to generate any data or visualizations; all responses are based on real data with sourced provenance from Data Commons.

Q: How do you choose which dataset to show in the Explore interface?

The LLMs powering Data Commons’ Explore interface use generative AI to identify the most likely response to your query. As we continue to improve the interface, we will look to provide more options that allow the user to select sources themselves. For now, you can always click the “Explore in …” Tool setting to change the source of data.

Q: Can I submit or suggest data I think should be added to Data Commons?

Yes. Data Commons is meant to be for the community, by the community, and we welcome new submissions and suggestions. If you’d like to submit data, please review these resources and follow this development process. If you have a suggestion, please use this Google Form.

Q: How can we access data in Data Commons?

The data in the knowledge graph can be accessed through the Data Commons Graph Browser, the Data Commons Visualization tools, and APIs for Python, REST, and Google Sheets.
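As an illustration of the REST route, the sketch below only builds a request URL and does not send it. The endpoint path, the parameter names (key, date, entity.dcids, variable.dcids, select), and the example identifiers country/USA and Count_Person are assumptions not stated in this FAQ; check the current API reference before relying on them, and note that an API key may be required.

```python
from urllib.parse import urlencode

# Assumed endpoint for observations; verify against the current API reference.
BASE_URL = "https://api.datacommons.org/v2/observation"


def build_observation_url(entity_dcid, variable_dcid, api_key, date="LATEST"):
    """Build (but do not send) a request URL for one entity and one variable."""
    # A list of pairs is used because the "select" parameter is repeated.
    params = [
        ("key", api_key),
        ("date", date),
        ("entity.dcids", entity_dcid),
        ("variable.dcids", variable_dcid),
        ("select", "entity"),
        ("select", "variable"),
        ("select", "value"),
        ("select", "date"),
    ]
    return f"{BASE_URL}?{urlencode(params)}"


# Hypothetical identifiers and a placeholder key, for illustration only.
url = build_observation_url("country/USA", "Count_Person", api_key="YOUR_API_KEY")
print(url)
```

The resulting URL can be passed to any HTTP client. Keeping URL construction separate from the network call makes it easy to inspect the request before sending it.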

Q: Where can I download all the data?

Given the size and evolving nature of the Data Commons knowledge graph, we prefer you access it via the APIs. If your project needs local access to a large fraction of the Data Commons Graph, please fill out this form.

Q: How do we know if the data is accurate?

Data Commons provides an access mechanism to data, but cannot ensure accuracy. To provide as much context as possible, answers to queries will include the provenance (source of the data). The choice of which data to use is up to individuals. If you find something you think is in error, we would love to hear from you.

Q: How often is the data refreshed?

Different data sources refresh at different frequencies. We try to keep the data updated as the sources publish new versions of their data. If you see something out of date, please file an issue on GitHub.

Q: What are the SLAs / Performance levels we can expect?

The service is provided on an as-is basis with no SLA or commitments on availability or uptime.

Q: How do I cite datacommons.org?

To cite charts and tools on this site, please use the following format.

Data Commons 2024, Data Commons, viewed 27 Apr 2024, <https://datacommons.org>.

If citing data from a particular dataset, e.g. CDC Places, then use:

Data Commons 2024, CDC Places, electronic dataset, Data Commons, viewed 27 Apr 2024, <https://datacommons.org>.

In both cases, please use the date you viewed the site (in the examples above, we used 27 Apr 2024).
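The two citation formats above can be generated with a small helper. This is a minimal sketch, not an official tool: the function name cite_data_commons is hypothetical, and it assumes the leading year in the citation is the year in which the site was viewed.

```python
from datetime import date

# Fixed English abbreviations, so the output does not depend on the system locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def cite_data_commons(viewed, dataset=None, url="https://datacommons.org"):
    """Format a citation for the site or, if `dataset` is given, for one dataset.

    Assumption: the leading year is the year of the viewing date.
    """
    viewed_str = f"{viewed.day} {MONTHS[viewed.month - 1]} {viewed.year}"
    if dataset:
        return (f"Data Commons {viewed.year}, {dataset}, electronic dataset, "
                f"Data Commons, viewed {viewed_str}, <{url}>.")
    return f"Data Commons {viewed.year}, Data Commons, viewed {viewed_str}, <{url}>."


print(cite_data_commons(date(2024, 4, 27)))
print(cite_data_commons(date(2024, 4, 27), dataset="CDC Places"))
```

Passing date.today() as the first argument produces a citation with the current viewing date.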

Q: What is the difference between Data Commons and other public dataset projects?

Many public dataset projects provide a great service by aggregating topical open datasets. However, using those datasets to answer specific questions often involves 'foraging': finding the data, cleaning it, reconciling different formats and schemas, figuring out how to merge data about the same entity from different sources, and so on. This error-prone and tedious process is repeated, once or more, by each organization working on an issue. It is a challenge in almost every area of study involving data, from the social and physical sciences to public policy. Data Commons does this work once, on a large scale, and provides cloud-accessible APIs to the cleaned, normalized, and joined data. While there are millions of datasets in every domain, some collections of data get used more frequently than others. We have started with a core set of these (over 120) in the hope that useful applications can be built on top of them.

Q: What is the difference between Data Commons and Wikidata?

The focus in Data Commons is on aggregating external, already available data (with an emphasis on statistical data) from government agencies and other authoritative sources.

Q: What is the relation between DataCommons.org and Schema.org?

DataCommons.org builds upon the vocabularies defined by Schema.org, with additional terms defined to cover concepts (e.g. "citizenship") that are important to the data in Data Commons but which have not been a priority for Schema.org-based Web markup. The Data Commons schemas constitute an "external extension" to Schema.org, similar to that provided by GS1. Some schemas could migrate into Schema.org if the community finds value in them.

Q: What are the usage rights of the data in Data Commons?

The Data Commons knowledge graph and the compilation of the datasets are licensed under CC BY. The Data Commons REST API and the R and Python libraries are released under the Apache License 2.0. The data included in Data Commons comes from different sources. Provenance is provided for all the data, including a link to the source. While we make every effort to obtain data from sources that offer unrestricted use of the underlying data, individual datasets may be subject to different licenses and terms of use, as specified by the linked source.

Q: Can my educational institution use Data Commons while complying with the Family Educational Rights and Privacy Act (FERPA) and/or similar state privacy requirements?

Data Commons collects no personally identifiable information (PII), records, or private information from users and can be used in compliance with FERPA. For specific questions about FERPA compliance, please contact your organization’s legal counsel for advice.

Q: What data do you collect about me?

Data Commons uses Google Analytics to collect non-identifiable usage data to improve the product. We log all queries asked in the Search tool, but do not associate IP addresses or any other identifiers with the queries. We use in-session cookies to manage state.

Q: How do I ask a question or offer feedback on Data Commons?

You can post your question on the GitHub forum or fill out this feedback form.