Add morph-kgc materialize #220

Merged
merged 4 commits into from
Feb 24, 2022
3 changes: 3 additions & 0 deletions README.md
@@ -134,6 +134,9 @@ See:
width="231"
/>

## Test
To run the example Jupyter notebooks as tests (this requires the `nbmake` plugin for pytest): `pytest --nbmake examples/*.ipynb`


## License and Copyright

36 changes: 36 additions & 0 deletions dat/morph-default-config.ini
@@ -0,0 +1,36 @@
[DEFAULT]
main_dir: .
mappings_dir: .


[CONFIGURATION]

# INPUT
na_filter=yes
na_values=,#N/A,N/A,#N/A N/A,n/a,NA,<NA>,#NA,NULL,null,NaN,nan,None

# OUTPUT
output_dir=${main_dir}/morph-output
output_file=result
output_format=N-QUADS
clean_output_dir=no
only_printable_characters=no
safe_percent_encoding=

# MAPPINGS
mapping_partition=PARTIAL-AGGREGATIONS
infer_sql_datatypes=no

# MATERIALIZATION
chunksize=100000

# MULTIPROCESSING
number_of_processes=2

# LOGS
logging_level=INFO
logs_file=


[DataSource1]
mappings=${main_dir}/recipes.ttl
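
The `${main_dir}` placeholders in this config match the syntax of Python's `configparser.ExtendedInterpolation`. As a minimal sketch, assuming the file is read in that interpolation mode (the loading code inside Morph-KGC is not part of this diff), the following shows how `output_dir` and `mappings` would resolve. `SAMPLE` is a trimmed copy of the config above, and `load_config` is a hypothetical helper name used only for illustration.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Trimmed copy of dat/morph-default-config.ini, kept inline so the sketch is self-contained.
SAMPLE = """
[DEFAULT]
main_dir: .

[CONFIGURATION]
output_dir=${main_dir}/morph-output
output_format=N-QUADS
safe_percent_encoding=

[DataSource1]
mappings=${main_dir}/recipes.ttl
"""


def load_config(text: str) -> ConfigParser:
    """Parse INI text with `${option}` style interpolation enabled."""
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    parser.read_string(text)
    return parser


config = load_config(SAMPLE)

# Options defined under [DEFAULT] are visible from every other section,
# which is why `${main_dir}` resolves inside [CONFIGURATION] and [DataSource1].
resolved = {
    "output_dir": config["CONFIGURATION"]["output_dir"],
    "mappings": config["DataSource1"]["mappings"],
    "safe_percent_encoding": config["CONFIGURATION"]["safe_percent_encoding"],
}
print(resolved)
```

Because `main_dir` is `.`, both paths are relative to the working directory of the process that reads the config, so the notebook and the config need to agree on where `recipes.ttl` lives.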
Empty file added dat/morph-output/.gitkeep
Empty file.
117 changes: 117 additions & 0 deletions examples/ex6_2.ipynb
@@ -0,0 +1,117 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# for use in tutorial and development; do not include this `sys.path` change in production:\n",
"import sys ; sys.path.insert(0, \"../\")\n",
"\n",
"from icecream import ic"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Load data via Morph-KGC\n",
"\n",
"> [`morph-kgc`](https://github.com/oeg-upm/morph-kgc) is an engine that constructs RDF knowledge graphs from heterogeneous data sources with R2RML and RML mapping languages. Morph-KGC is built on top of pandas and it leverages mapping partitions to significantly reduce execution times and memory consumption for large data sources.\n",
"\n",
"Data can be loaded from multiple text formats, and also via ORMs such as SQLAlchemy, using a config file with the `.ini` extension.\n",
"\n",
"For documentation see [USAGE](https://github.com/oeg-upm/Morph-KGC/wiki/Usage)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, let's load our recipe KG:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from os.path import dirname\n",
"import kglab\n",
"import os\n",
"\n",
"namespaces = {\n",
" \"nom\": \"http://example.org/#\",\n",
" \"wtm\": \"http://purl.org/heals/food/\",\n",
" \"ind\": \"http://purl.org/heals/ingredient/\",\n",
" \"skos\": \"http://www.w3.org/2004/02/skos/core#\",\n",
" }\n",
"\n",
"kg = kglab.KnowledgeGraph(\n",
" name = \"A recipe KG example based on Food.com\",\n",
" base_uri = \"https://www.food.com/recipe/\",\n",
" namespaces = namespaces,\n",
" )\n",
"\n",
"datapath = dirname(os.getcwd()) + \"/dat/recipes.ttl\"\n",
"configpath = dirname(os.getcwd()) + \"/dat/morph-default-config.ini\"\n",
"\n",
"# config = f\"\"\"[DataSource1]\n",
"# mappings={datapath}\"\"\"\n",
"\n",
"print(configpath)\n",
"kg.materialize(configpath)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's try to query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"sparql = \"\"\"\n",
" SELECT ?subject ?object\n",
" WHERE {\n",
" ?subject rdf:type wtm:Recipe .\n",
" ?subject wtm:hasIngredient ?object .\n",
" }\n",
" \"\"\"\n",
"\n",
"for row in kg.query(sparql):\n",
" ic(row.asdict())"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
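
The commented-out `config` string in the notebook hints that a config holding only a `[DataSource1]` section with a `mappings` entry may be enough. As a rough sketch under that assumption (not confirmed by this diff), such a file could be generated with the standard library instead of being edited by hand. `write_minimal_config` and the file name `morph-minimal-config.ini` are hypothetical names chosen for illustration.

```python
import tempfile
from configparser import ConfigParser
from pathlib import Path


def write_minimal_config(mappings_path: str, out_dir: str) -> Path:
    """Write an INI file containing a single data source section."""
    parser = ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["DataSource1"] = {"mappings": mappings_path}

    config_path = Path(out_dir) / "morph-minimal-config.ini"
    with config_path.open("w", encoding="utf-8") as handle:
        # no spaces around "=", matching the style of the config in this PR
        parser.write(handle, space_around_delimiters=False)
    return config_path


# A temporary directory keeps the sketch free of side effects;
# a real run would write to a directory that persists.
with tempfile.TemporaryDirectory() as tmp:
    path = write_minimal_config("../dat/recipes.ttl", tmp)
    written = path.read_text(encoding="utf-8")

print(written)
```

The resulting path could then be passed to `kg.materialize()` in the same way as `configpath` in the notebook, provided the file is written somewhere that outlives the call.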
21 changes: 18 additions & 3 deletions kglab/kglab.py
@@ -1,7 +1,7 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# see license https://github.com/DerwenAI/kglab#license-and-copyright
""" KG Lab main class

see license https://github.com/DerwenAI/kglab#license-and-copyright
"""
######################################################################
## kglab - core classes

@@ -35,6 +35,9 @@
import typing
import urlpath # type: ignore # pylint: disable=E0401

### third-party bindings
import morph_kgc

if get_gpu_count() > 0:
    import cudf # type: ignore # pylint: disable=E0401

@@ -1449,3 +1452,15 @@ def infer_skos_hierarchical_mappings (
self.add(s, _skos.narrower, o)
else:
self.remove(s, _skos.narrowMatch, o)

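    def materialize(self, config: str) -> rdflib.Graph:
        """ Binding to morph-kgc `materialize()` """

        if len(self._g) == 0:
            # generate the triples and load them to an RDFlib graph
            self._g = morph_kgc.materialize(config)
        else:
            # merge the generated triples into the existing graph;
            # `Graph.parse()` expects a source to read, not a `Graph`,
            # so the triples are added with `+=` instead
            # for caveats about merging this way:
            # <https://rdflib.readthedocs.io/en/stable/merging.html>
            self._g += morph_kgc.materialize(config)

        # return the graph, as the annotated return type promises
        return self._g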
1 change: 1 addition & 0 deletions requirements.txt
@@ -6,6 +6,7 @@ gcsfs >= 0.7.1
gensim >= 3.8.3
icecream >= 2.1
matplotlib >= 3.3.4
morph-kgc >= 1.5
networkx >= 2.6
numpy >= 1.19.2
owlrl >= 6.0.2