This is a repository for tools and pipelines for importing data into Data Commons.
Data Commons is an Open Knowledge Graph that provides a unified view across multiple public data sets and statistics. It includes APIs and visual tools to easily explore and analyze data across different datasets without data cleaning or joining.
Detailed documentation on the Import Tool is available here.
- Make sure Java 11+ is installed (download link).
- Download the tool and run it with:
java -jar <path-to-jar> lint <list of mcf/tmcf/csv files>
It's useful to create an alias like
alias dc-import='java -jar <path-to-jar>'
so you can invoke the tool as
dc-import lint
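Bash does not expand aliases in non-interactive scripts by default, so a shell function can be a handier wrapper when the tool is called from automation. The following is a minimal sketch, not part of the tool itself: the function name `dc_import` and the environment variable `DC_IMPORT_JAR` are hypothetical names chosen for illustration.

```shell
# Hypothetical wrapper around the import tool JAR.
# DC_IMPORT_JAR is an assumed variable name; point it at your downloaded JAR.
dc_import() {
  local jar="${DC_IMPORT_JAR:-}"
  # Fail early with a readable message if the JAR path is unset or wrong.
  if [ -z "$jar" ] || [ ! -f "$jar" ]; then
    echo "dc_import: set DC_IMPORT_JAR to the path of the import tool JAR" >&2
    return 1
  fi
  # Forward all arguments (for example: lint <files>) to the tool unchanged.
  java -jar "$jar" "$@"
}

# Example usage (commented out so that sourcing this file has no side effects):
# export DC_IMPORT_JAR=<path-to-jar>
# dc_import lint <list of mcf/tmcf/csv files>
```

Unlike the alias, the function reports a clear error when the JAR path is missing instead of leaving Java to fail on an unreadable file.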
- If there are warnings or errors, the tool will produce a JSON report with a table of exemplar errors.
- To view the JSON report, it helps to install an extension like Json-As-Table (be sure to allow the extension access to file URLs).
Another option is to copy and paste the JSON content into jsongrid.
- To see the list of flags that can be used and their default values, run:
dc-import --help
- The tools are built using Apache Maven version 3.8.0.
- For macOS:
brew install maven
- The tools use protobuf and require that protoc be installed.
- For macOS (the Homebrew formula that provides protoc is named protobuf):
brew install protobuf
- Make sure Java 11+ (but not Java 16) is installed
- You can install it from here
- Check what version of Java Maven is using:
mvn --version
- If Maven is using Java 16:
- Open ~/.bash_profile
- Add export JAVA_HOME=<Path to your downloaded Java 11>
- Save your change and run source ~/.bash_profile
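The version requirement above (Java 11 or newer, but not Java 16) can be checked mechanically. The sketch below is illustrative only: the function names `java_major_version` and `java_version_supported` are hypothetical, and the commented usage line assumes that `mvn --version` prints a line of the form `Java version: 11.0.2, vendor: ...`.

```shell
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major_version() {
  local version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;  # legacy scheme: 1.8.0_292 -> 8.0_292
  esac
  # Strip everything from the first non-digit onward, leaving the major version.
  echo "${version%%[!0-9]*}"
}

# Succeed only when the major version is 11 or newer and is not 16.
java_version_supported() {
  local major
  major="$(java_major_version "$1")"
  [ -n "$major" ] && [ "$major" -ge 11 ] && [ "$major" -ne 16 ]
}

# Example usage (commented out so that sourcing this file has no side effects):
# maven_java="$(mvn --version | sed -n 's/^Java version: \([^,]*\).*/\1/p')"
# java_version_supported "$maven_java" || echo "Maven is using unsupported Java $maven_java" >&2
```

If the check fails, setting JAVA_HOME as described in the steps above is the fix.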
You can build and test the Java code from a Unix shell.
To build: mvn compile
To run tests: mvn test
To build binary: mvn package
This produces
tool/target/datacommons-import-tool-0.1-alpha.1-jar-with-dependencies.jar
which you can run with
java -jar tool/target/datacommons-import-tool-0.1-alpha.1-jar-with-dependencies.jar
The repo also hosts an experimental server for private DC.
To build: mvn compile
To run tests: mvn test
To build binary: mvn package
This produces
server/target/datacommons-server-0.1-alpha.1.jar
which you can run with
java -jar server/target/datacommons-server-0.1-alpha.1.jar <file1.tmcf> <file2.csv>
Send a request (quote the URL so the shell does not interpret the "&"):
curl "http://localhost:8080/stat/series?place=country/USA&statVar=<statVar>"
You should then see "Hello World!" in the console output.
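Because the query string contains `&`, the URL must be quoted when passed to curl; otherwise the shell runs curl in the background and drops the `statVar` parameter. A minimal sketch of a helper that builds and fetches the URL is shown below. The function names and the `DC_SERVER_URL` variable are hypothetical, `Count_Person` is used only as an example statistical variable, and the helper does no URL encoding, so it assumes the values contain no characters that need escaping.

```shell
# Build the request URL for the /stat/series endpoint shown above.
# DC_SERVER_URL is an assumed variable name; it defaults to the local server.
series_url() {
  local place="$1"
  local stat_var="$2"
  local base="${DC_SERVER_URL:-http://localhost:8080}"
  printf '%s/stat/series?place=%s&statVar=%s\n' "$base" "$place" "$stat_var"
}

# Fetch a series. The URL is passed as one quoted argument, so "&" stays
# part of the query string instead of acting as a shell operator.
fetch_series() {
  curl --silent --show-error --fail "$(series_url "$1" "$2")"
}

# Example usage (commented out; requires the server to be running):
# fetch_series country/USA Count_Person
```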
The code is formatted using google-java-format. Please follow the instructions in the README to integrate with IntelliJ/Eclipse IDEs.
The formatting is done as part of the build. It can be checked by running:
mvn com.coveo:fmt-maven-plugin:check
From the repo page, click the "Fork" button to fork the repo.
Clone your forked repo to your desktop.
Add datacommonsorg/import repo as a remote:
git remote add dc https://github.com/datacommonsorg/import.git
Each time you want to send a Pull Request, follow these steps:
git checkout master
git pull dc master
git checkout -b new_branch_name
# Make some code change
git add .
git commit -m "commit message"
git push -u origin new_branch_name
Then in your forked repo, you can send a Pull Request. If this is your first time contributing to a Google Open Source project, you may need to follow the steps in contributing.md.
Wait for approval of the Pull Request and merge the change.
Apache 2.0
For general questions or issues, please open an issue on our issues page. For all other questions, please send an email to [email protected].
Note - This is not an officially supported Google product.