Add morph-kgc materialize #220

Merged: 4 commits, Feb 24, 2022

Conversation

Mec-iS
Contributor

@Mec-iS Mec-iS commented Feb 23, 2022

Please try pytest --nbmake examples/ex6_2.ipynb

New expected behaviour

see #108 (comment)

Change logs

Add the materialized() method to kglab.py.
Add an example for it at ex6_2.ipynb

Add docstring at the top of kglab.py

@Mec-iS Mec-iS requested a review from ceteri February 23, 2022 23:08
@arenas-guerrero-julian
Contributor

Hi @Mec-iS ,

Keep in mind that materialize can also receive a config in the form of a string and not a path. E.g.:

config = """
            [DataSource1]
            mappings=/path/to/mapping/mapping_file.rml.ttl
            db_url=mysql+pymysql://user:password@localhost:3306/db_name
         """

graph = morph_kgc.materialize(config)

see doc
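
The traceback further down this thread shows morph-kgc testing its argument with os.path.isfile and otherwise treating it as a config string. A minimal sketch of that dispatch, using only the standard-library configparser; `load_config` is a hypothetical helper name for illustration, not part of morph-kgc:

```python
import configparser
import os


def load_config(config_entry: str) -> configparser.ConfigParser:
    """Hypothetical helper: accept a path to an INI file or the INI text itself."""
    config = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation()
    )
    if os.path.isfile(config_entry):
        # The argument names an existing file, so read it from disk.
        config.read(config_entry)
    else:
        # Otherwise assume the argument is the config content as a string.
        config.read_string(config_entry)
    return config


CONFIG_TEXT = """
[DataSource1]
mappings=/path/to/mapping/mapping_file.rml.ttl
db_url=mysql+pymysql://user:password@localhost:3306/db_name
"""

parsed = load_config(CONFIG_TEXT)
print(parsed.sections())
```

Either form ends up as the same parsed sections, so a wrapper in kglab could probably pass its argument straight through without caring which one it received.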

@Mec-iS
Contributor Author

Mec-iS commented Feb 24, 2022

yes thanks. I am just doing it step by step to test the integration.

@Mec-iS
Contributor Author

Mec-iS commented Feb 24, 2022

It looks like the file we currently use as a test, recipes.ttl, has a faulty header according to morph-kgc:

~/drwn/.venv/lib/python3.8/site-packages/morph_kgc/args_parser.py in load_config_from_argument(config_entry)
     86     config = Config(interpolation=ExtendedInterpolation())
     87     if os.path.isfile(config_entry):
---> 88         config.read(config_entry)
     89     else:
     90         # it is a string

MissingSectionHeaderError: File contains no section headers.
file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1
'@prefix dct:  <http://purl.org/dc/terms/> .\n'
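
The error comes from Python's INI parser rather than from any RDF validation. A small sketch that should reproduce it with the standard library alone, assuming nothing about morph-kgc beyond what the traceback shows (the temporary file stands in for recipes.ttl):

```python
import configparser
import os
import tempfile

# First line of the Turtle file, as quoted in the error message above.
TURTLE_LINE = "@prefix dct:  <http://purl.org/dc/terms/> .\n"

with tempfile.TemporaryDirectory() as tmp:
    ttl_path = os.path.join(tmp, "recipes.ttl")
    with open(ttl_path, "w", encoding="utf-8") as f:
        f.write(TURTLE_LINE)

    parser = configparser.ConfigParser()
    try:
        # An INI parser expects a [section] header before any other content.
        parser.read(ttl_path)
        error_message = None
    except configparser.MissingSectionHeaderError as err:
        error_message = str(err)

print(error_message)
```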

@arenas-guerrero-julian
Contributor

Hi @Mec-iS ,

materialize expects a .ini file, not a .ttl. To test morph-kgc with the recipes data, you would need to define an RML mapping from recipes.csv to recipes.ttl and prepare the config.ini.

Another option is to use one of the examples in the morph-kgc repo, e.g. the json-example.
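
As a rough sketch of the config.ini half of that setup, the file can be generated with the standard-library configparser. The section name and the `mappings` key follow the config snippet quoted earlier in this thread; the `write_config` helper and the recipes.rml.ttl file name are assumptions for illustration, and whether a file-based source needs further keys should be checked against the morph-kgc documentation:

```python
import configparser
import os
import tempfile


def write_config(directory: str, mapping_path: str) -> str:
    """Hypothetical helper: write a one-section config.ini pointing at an RML mapping."""
    # Interpolation is disabled so that unusual characters in paths are kept verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config["DataSource1"] = {"mappings": mapping_path}
    ini_path = os.path.join(directory, "config.ini")
    with open(ini_path, "w", encoding="utf-8") as f:
        config.write(f)
    return ini_path


with tempfile.TemporaryDirectory() as tmp:
    # Assumed mapping file name; the mapping itself still has to be written by hand.
    ini_path = write_config(tmp, os.path.join(tmp, "recipes.rml.ttl"))

    # Read the file back to confirm it has the section header the INI parser requires.
    check = configparser.ConfigParser(interpolation=None)
    check.read(ini_path)
    sections = check.sections()
    ini_name = os.path.basename(ini_path)

print(ini_name, sections)
```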

@Mec-iS
Contributor Author

Mec-iS commented Feb 24, 2022

Thanks for the feedback. I am trying to establish a baseline.
I have added a default ini as taken from here.

What is the expected way to "define an RML mapping from recipes.csv to recipes.ttl"?

In the kglab repository we already have both formats, do you mean something like this?
I think I am missing the point on how to generate the mapping file, .rml.ttl or .rml.csv.

@ceteri
Collaborator

ceteri commented Feb 24, 2022

MissingSectionHeaderError: File contains no section headers.
file: '/home/lorenzo/drwn/kglab/dat/recipes.ttl', line: 1
'@prefix dct:  <http://purl.org/dc/terms/> .\n'

Does that "File contains no section headers" message refer to the input .ini file, and not the recipes.ttl file?
There's no particular definition of "section headers" in RDF files.

We've used recipes.ttl with a number of different platforms and validators, with no errors.

It does go straight into @prefix definitions without specifying the optional @base; could that have triggered a warning?

@ceteri
Collaborator

ceteri commented Feb 24, 2022

In the kglab repository we already have both formats, do you mean something like this?

For this integration, the biggest use cases will be to formalize how sources from SQL, CSV, etc., can be ingested into an RDF graph. We're not looking at means to translate between serialization formats, e.g., go between CSV and TTL.

While there are means of importing CSV already in kglab, through the csvwlib integration, this integration with morph-kgc would provide superior means, and also ways to parallelize and make these kinds of inputs much more efficient.

In the most immediate use cases (for our colleagues in Madrid and Murcia) they have many smaller SQL databases and thousands of CSV files, so there are performance issues at scale for ingest, and Morph can really help! :)

@ceteri ceteri marked this pull request as ready for review February 24, 2022 19:19
Collaborator

@ceteri ceteri left a comment


Looks great @Mec-iS !!

@ceteri ceteri merged commit d9f54d2 into DerwenAI:main Feb 24, 2022
@Mec-iS
Contributor Author

Mec-iS commented Feb 24, 2022

@ceteri

I don't think this is finished, because the RML mapping file is missing (the one that has the .rml.ttl extension in the morph-kgc documentation). The ttl file is not enough to make it work; there should be a way to generate the RML mapping from a ttl or CSV.

@ceteri
Collaborator

ceteri commented Feb 24, 2022

@Mec-iS
Thank you, I've reverted this merge.

Was just trying examples within my own branch and ran into problems with the csv-examples in Morph, when running locally.

Instead of applying an RML mapping to a TTL file as input, how about if we show an example in the tutorial notebook that takes 2 simple CSV files? This is a general pattern among users, where they already have node+edge files as CSV.

@ArenasGuerreroJulian do you have a simple RML mapping for CSV examples? Something like the proverbial minimum viable example that would show nodes and edges for a simple graph? That would help our community, where people aren't familiar with RML yet. For example, how about something like this? https://rml.io/yarrrml/tutorial/getting-started/#example
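
To make the request concrete, here is one possible shape for such a minimum viable example: two small CSV files plus an RML mapping with one triples map for nodes and one for edges. Everything in it is an assumption for illustration (file names, column names, the ex: namespace, the sample rows), and it has not been run through morph-kgc:

```python
import os
import tempfile

# Made-up sample data: one CSV for nodes, one for edges.
NODES_CSV = "id,label\n1,flour\n2,bread\n"
EDGES_CSV = "src,dst\n1,2\n"

# Hypothetical RML mapping: rml:source paths are relative, so they are
# presumably resolved against the working directory at run time.
RML_MAPPING = """\
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix rml: <http://semweb.mmlab.be/ns/rml#> .
@prefix ql: <http://semweb.mmlab.be/ns/ql#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex: <http://example.org/> .
@base <http://example.org/mapping/> .

<#NodeMap>
    rml:logicalSource [ rml:source "nodes.csv" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.org/node/{id}" ; rr:class ex:Node ] ;
    rr:predicateObjectMap [
        rr:predicate rdfs:label ;
        rr:objectMap [ rml:reference "label" ]
    ] .

<#EdgeMap>
    rml:logicalSource [ rml:source "edges.csv" ; rml:referenceFormulation ql:CSV ] ;
    rr:subjectMap [ rr:template "http://example.org/node/{src}" ] ;
    rr:predicateObjectMap [
        rr:predicate ex:linksTo ;
        rr:objectMap [ rr:template "http://example.org/node/{dst}" ]
    ] .
"""


def write_example(directory: str) -> list:
    """Write the two CSV files and the mapping; return the sorted file names."""
    files = {
        "nodes.csv": NODES_CSV,
        "edges.csv": EDGES_CSV,
        "mapping.rml.ttl": RML_MAPPING,
    }
    for name, text in files.items():
        with open(os.path.join(directory, name), "w", encoding="utf-8") as f:
            f.write(text)
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    written = write_example(tmp)

print(written)
```

A config.ini whose `mappings` entry points at mapping.rml.ttl could then be passed to morph_kgc.materialize, as in the snippet quoted earlier in the thread.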

@ceteri
Collaborator

ceteri commented Feb 24, 2022

@Mec-iS here's a branch with preparations for a new release, for the morph-kgc integration:
https://github.com/DerwenAI/kglab/tree/morph-kgc

  • moved example notebook to 2.1, along with other discussion about imports and serialization
  • resolved pylint/mypy warnings
  • added changenotes
  • bumped version to 0.4.4
