[Discussion] PySHACL Alternate Modes #60
Comments
Another alternate operating mode could be a highly targeted version. You would be able to specify a single focus-node in the data graph to target, or use one of the SHACL-AF This would allow for the ability to programmatically guide the PySHACL operation using your own external logic. |
I have another idea for a different tool entirely, called shacl-quickcheck. That will be a heavily cut-down version of pyshacl, it will have zero dependencies.
This will result in:
|
First: thank you for all your work on PySHACL. This project and its compliance to the W3 specs has made development of SHACL-based tooling incredibly easy. This issue is tagged as "help wanted". I've been diving into the code base for the
There's a lot of exciting things about the advanced features and your exploration of alternate modes. One thing that intersects with my interests in the project is how many matches a particular rule was evaluated against. There's a whole class of false positives caused by ill-formed shapes that never match anything in the ontology being validated - building an engine that can recognize those cases could be very interesting and possibly spur development in an expansion of the SHACL validation report spec. Two other notes on the
|
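The idea of recognising shapes that never match anything can be illustrated with a minimal sketch. It is not PySHACL code: triples are plain Python tuples, only `sh:targetClass`-style targeting by direct `rdf:type` is modelled, and the names `focus_nodes_per_shape` and `unmatched_shapes` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def focus_nodes_per_shape(
    data: Set[Triple], shape_targets: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each shape name to the nodes directly typed with its target class."""
    return {
        shape: {s for (s, p, o) in data if p == RDF_TYPE and o == target_class}
        for shape, target_class in shape_targets.items()
    }


def unmatched_shapes(data: Set[Triple], shape_targets: Dict[str, str]) -> List[str]:
    """Return the shapes whose target class selected no focus node at all."""
    matches = focus_nodes_per_shape(data, shape_targets)
    return sorted(shape for shape, nodes in matches.items() if not nodes)


# Hypothetical data graph and shape-to-target-class mapping.
data = {
    ("ex:dumbo", RDF_TYPE, "ex:Elephant"),
    ("ex:tweety", RDF_TYPE, "ex:Bird"),
}
shape_targets = {
    "ex:ElephantShape": "ex:Elephant",
    "ex:ElefantShape": "ex:Elefant",  # misspelled target class: selects nothing
}

print(unmatched_shapes(data, shape_targets))
```

A shape in this list trivially "conforms" because it was never evaluated against any node, which is the kind of silent result a match count in the report would expose.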
I was about to add a new issue - but I guess this fits under here... When validating, it is tricky to get the graph closure right: too little and you get validation errors for object types where the object is not in the graph closure; too much and you get validation errors for all the stuff in the graph closure and not the graph you are validating. So, rather than running a specific rule (which is probably a good idea...) I want to be able to target a specific graph in validation (i.e. ignore the ont_graph contents as focus nodes for validation and entailment). I also want (I currently do this long-hand in scripting) to be able to do entailment+validation stepwise, generating validation checks as a set of entailments and/or transformations is performed on a series of interoperability goals, and identifying which rules are failing to entail or validate properly. |
Another idea is to validate in a Linked Data context - where object references are resolved (via URI or a local catalog of some form) - could output the local catalog if you wish to persist between executions. |
Hi @rob-metalinkage
PySHACL already has built-in support for validating on multi-graph datasets (using To make it a bit easier for implementation and testing, would you be able to generate a small demonstration set of files (shape_graph, ont_graph, and data_graph) which demonstrates both cases:
I don't think I'm quite following what you're asking there. Can you simplify it, or provide an example?
I think another user asked about a similar feature. Could be worth visiting in the future. |
ok here are three tiny graphs for the source, validator and ont_graph and the test report. the extra ont has two things - one a skos:Concept to make the test.ttl valid and an extra invalid reference to show it gets validated as well. |
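The closure problem raised in this thread can be sketched without any RDF library. The snippet below is an illustration under assumptions, not PySHACL behaviour: triples are plain tuples, the single check and the property name `ex:category` are made up for the example, and `check_category_is_concept` is a hypothetical helper. It separates the graph that supplies focus nodes from the graph used to look up referenced objects.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def check_category_is_concept(
    focus_graph: Set[Triple], lookup_graph: Set[Triple]
) -> List[str]:
    """Return subjects in focus_graph whose ex:category value is not
    typed skos:Concept anywhere in lookup_graph."""
    concepts = {
        s for (s, p, o) in lookup_graph if p == RDF_TYPE and o == "skos:Concept"
    }
    return sorted(
        s for (s, p, o) in focus_graph if p == "ex:category" and o not in concepts
    )


# Hypothetical stand-ins for the data graph and the extra ontology graph.
data_graph = {("ex:item1", "ex:category", "ex:conceptA")}
ont_graph = {
    ("ex:conceptA", RDF_TYPE, "skos:Concept"),  # makes ex:item1 valid
    ("ex:item2", "ex:category", "ex:missing"),  # invalid reference inside the ontology
}
mixed = data_graph | ont_graph

too_little = check_category_is_concept(data_graph, data_graph)  # closure missing
too_much = check_category_is_concept(mixed, mixed)              # ontology validated too
targeted = check_category_is_concept(data_graph, mixed)         # focus on data only

print(too_little, too_much, targeted)
```

Only the third call behaves as requested: the ontology contributes type information, but none of its nodes become focus nodes.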
And it's also important that when running advanced features the output graph excludes the content of the ont_graph... I don't think it can be treated as working unless that's the default behaviour. There may be a need to include it but I can't imagine it - and it would be trivial for users to add after the fact if required. |
It's unclear: are you talking about using SHACL-AF features as they stand now, or talking about potential future alternate modes (as per the discussion in this thread)? In its current form, PySHACL's output graph is a graph containing a validation report and nothing else. I think you're referring to the input data graph. PySHACL creates a working copy of the input datagraph in memory, which it then uses to mix in the ont_graph, apply rdfs/owl inferencing, and apply SHACL-AF Rules. This is done in a working copy to avoid polluting the datagraph (it's also a requirement that SHACL validation engines should not modify the datagraph). The When using the If however you're talking about a future alternate mode of operation, where PySHACL is used as a kind of entailment engine, then yes, in that case the output graph would be an inflated closure of the input graph, and I agree, it should probably not include the ont_graph contents by default. |
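A minimal sketch of that hypothetical entailment-engine output, assuming plain tuple triples and a toy `rdfs:subClassOf` inference in place of real RDFS/OWL reasoning or SHACL-AF Rules. The function names are invented for illustration and do not exist in PySHACL.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Add rdf:type triples implied by rdfs:subClassOf until nothing changes."""
    result = set(graph)
    changed = True
    while changed:
        changed = False
        # Snapshots, so adding to `result` inside the loops is safe.
        subclass_pairs = [(s, o) for (s, p, o) in result if p == SUBCLASS_OF]
        typed_nodes = [(s, o) for (s, p, o) in result if p == RDF_TYPE]
        for node, cls in typed_nodes:
            for sub, sup in subclass_pairs:
                new_triple = (node, RDF_TYPE, sup)
                if sub == cls and new_triple not in result:
                    result.add(new_triple)
                    changed = True
    return result


def expand_excluding_ontology(data: Set[Triple], ont: Set[Triple]) -> Set[Triple]:
    """Infer over a working copy of data + ontology, then drop the triples
    that came only from the ontology. Neither input set is modified."""
    working_copy = data | ont  # a new set; the inputs stay untouched
    expanded = infer_types(working_copy)
    return expanded - (ont - data)


data = {(":thatFlyingBirdISaw", RDF_TYPE, ":Eagle")}
ont = {
    (":Eagle", SUBCLASS_OF, ":Bird"),
    (":Bird", SUBCLASS_OF, ":Animal"),
}
output = expand_excluding_ontology(data, ont)
print(sorted(output))
```

The output holds the original triple plus the two inferred types, and none of the `rdfs:subClassOf` axioms. A user who wants the ontology included can take the union afterwards.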
I'm talking about the entailment engine idea, I guess. When I'm dealing with a source artefact my in-memory graph is always a working copy, so I'm agnostic about changing it. |
A related topic has come up on the SHACL Mailing list today, in relation to the TopBraid validation engine. It was discussed there that TopBraid has a tool called I think that is a logical starting point to look at how such a mode would work for PySHACL. |
TQ have a few SHACL rules execution components implemented in different places - the programmatically accessible ones work exactly as I have asked for. |
Hi, May I ask if it is possible to get inferred triples using PySHACL? If so, how should I do it? Best, |
Hi @huanyu-li |
wouldn't be too hard eh? put Line 267 in 0e1f643
|
@majidaldo Correct, the implementation of returning the working graph to the user is not difficult. The difficult part is the thought process behind it. The validator's internal working datagraph ( However, I do believe this to be a valuable feature to have, and I am in the planning phase of a major update for PySHACL, that will expand its capabilities beyond just validation. |
do you mind letting the community know of these plans somewhere? a discussion? i say all generated data, owl/rdfs inferenced and shacl rules, should be a separate step that is written out (for skipping expensive calc). |
can shacl rules subsume n3 rules? |
This thread is the discussion. You are already participating in it. And remember there is always the official SHACL community Discord server used for discussion and help topics too: https://discord.gg/RTbGfJqdKB |
I could really use an alternate "ontology mode" that allows SHACL validation of classes instead of only instances. If I have a class, for example "Animal", then I can use SHACL to validate its instances:
This works on DBpedia, where animals are modeled as instances, for example https://dbpedia.org/page/Elephant has rdf:type dbo:Mammal, which is a subclass (rdfs:subClassOf) of dbo:Animal. However, assume I want to model animals as classes, because an elephant is just a set of actual elephant individuals:
This will not be validated using the aforementioned SHACL shape. However, I would like to have an alternate pySHACL mode that does just that. |
(This and Konrad's comment might be worth breaking out into a separate Issue. If GitHub's make-an-Issue button doesn't grab both at once, I'll happily migrate this comment.)
(Opinions in this post are my own and nobody else's. I'm also not a biologist and some of my domain knowledge is likely outdated.)
@KonradHoeffner - I think your example of animal taxonomies is an OWL design question more than a SHACL issue. I do have a suggestion that follows on where I think SHACL could be used on OWL-specific tasks.
I did try to start a SHACL-based discussion on using OWL to model animal taxonomies, but I ended up getting sidetracked by the specific properties in the example (family and order) being, in my own opinion, better to model as classes because of some specific benefits from OWL entailment (/inference/knowledge expansion). OWL and SHACL might be writable to ensure that the family and order values are related to one another, but I think this example more motivates a different OWL subclass design without putting family and order into object properties. I've left that whole discussion under a block to be expanded by those interested.
Your first snippet states kb:thatElephantISaw a :Elefant . If kb:thatElephantISaw a :Animal . Is that how you want your model to work?
Take instead a knowledge-refining example. I'll use birds and an OWL design with a different application of metaclasses, one that focuses the metaclasses on describing taxonomic levels.
:Animal a owl:Class .
:Bird a owl:Class ; rdfs:subClassOf :Animal .
:Eagle a owl:Class ; rdfs:subClassOf :Bird .
:Seagull a owl:Class ; rdfs:subClassOf :Bird .
One day, you take a picture of a bird flying high, and can only see a silhouette. You can record this in your journal-graph:
:thatFlyingBirdISaw a :Bird .
Later, you check silhouette references and conclude that, among your taxonomy, eagle's the most likely answer, and note so in the same journal-graph:
:thatFlyingBirdISaw a :Eagle .
You've made your graph more precise, and your prior triple is now redundant from entailment. OWL entailment of the latter triple would expand your graph to include:
:thatFlyingBirdISaw a :Bird .
:thatFlyingBirdISaw a :Animal .
Going back to the family and order classifications, one taxonomy design could do the tree-of-life division (kingdom, phylum, class, order, family, genus, species - apologies in advance if it's outdated, I think the last time I thought of that whole ordering was 20 years ago). Somewhere in there, (Bald) Eagle's class hierarchy would show up as:
:Animalia a owl:Class .
:Accipitriformes a owl:Class ; rdfs:subClassOf :Animalia . # (This skipped a few steps.)
:Accipitridae a owl:Class ; rdfs:subClassOf :Accipitriformes .
:BaldEagle rdfs:subClassOf :Accipitridae . # (This skipped a few steps.)
A metaclass-based design could note the family and order:
:Animalia a owl:Class , :TaxonomicKingdom .
:Accipitriformes a owl:Class , :TaxonomicOrder .
:Accipitridae a owl:Class , :TaxonomicFamily ; rdfs:subClassOf :Accipitriformes .
Then, these are entailed, and look (to me) correct - the eagle-individual is an instance of If you're curious what the family of
SELECT ?nFamily
WHERE {
:thatFlyingBirdISaw a/rdfs:subClassOf* ?nFamily .
?nFamily a :TaxonomicFamily .
}
From the above, I meant to supplement what I saw in the discussion thread on StackOverflow. In summary, I don't think your example demonstrates a SHACL problem with reviewing OWL - it looks particular to a question on when to use metaclasses. You could write SHACL to require all instances of
:MustHaveFamily-shape
a sh:NodeShape ;
sh:targetClass :Animal ;
sh:sparql [
a sh:SPARQLConstraint ;
sh:description "Find all animal individuals that do not have a taxonomic family specified."@en ;
sh:message "Focus node is not a subclass of a class that is a taxonomic family instance."@en ;
sh:select """
SELECT $this
WHERE {
$this a ?nClass .
FILTER NOT EXISTS {
?nClass rdfs:subClassOf* ?nFamilyClass .
?nFamilyClass a :TaxonomicFamily .
}
}
""" ;
] ;
.
Warning - While I think the constraint will give correct answers, I do not think it would be fast to execute, because it's necessarily searching for absent information.
Back to SHACL and specific review of OWL class and property design (i.e. the TBox), rather than OWL individuals-data (i.e. the ABox): SHACL can be used to review OWL syntax and constructs, such as for conformance versus the OWL to RDF mapping, or for consistency checking between OWL definitions and SHACL shapes like done in this shape 1 that checks that Footnotes
|
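The family lookup and the "must have a family" check from the comment above can be mimicked in a few lines, as a rough sketch only. It assumes plain tuple triples instead of an RDF store, treats Accipitridae as the class tagged `:TaxonomicFamily` (its usual rank), and assumes the sighted bird is typed `:BaldEagle`. The helper names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(cls: str, triples: Set[Triple]) -> Set[str]:
    """Return cls plus everything reachable over rdfs:subClassOf
    (the reflexive-transitive closure, like rdfs:subClassOf* in SPARQL)."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(
            o for (s, p, o) in triples if s == current and p == SUBCLASS_OF
        )
    return seen


def taxonomic_families(individual: str, triples: Set[Triple]) -> Set[str]:
    """Mimic the path a/rdfs:subClassOf* filtered to :TaxonomicFamily classes."""
    direct_types = {o for (s, p, o) in triples if s == individual and p == RDF_TYPE}
    reachable: Set[str] = set()
    for cls in direct_types:
        reachable |= superclasses(cls, triples)
    return {c for c in reachable if (c, RDF_TYPE, ":TaxonomicFamily") in triples}


graph = {
    (":Accipitriformes", SUBCLASS_OF, ":Animalia"),
    (":Accipitridae", SUBCLASS_OF, ":Accipitriformes"),
    (":BaldEagle", SUBCLASS_OF, ":Accipitridae"),
    (":Accipitridae", RDF_TYPE, ":TaxonomicFamily"),  # metaclass tag (assumed)
    (":thatFlyingBirdISaw", RDF_TYPE, ":BaldEagle"),
    (":somethingAlive", RDF_TYPE, ":Animalia"),        # no family reachable
}

families = taxonomic_families(":thatFlyingBirdISaw", graph)
missing_family = sorted(
    ind
    for ind in (":thatFlyingBirdISaw", ":somethingAlive")
    if not taxonomic_families(ind, graph)
)
print(families, missing_family)
```

The second result corresponds to what the SPARQL constraint would report: individuals for which no class on the subclass path carries the family metaclass.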
@ajnelson-nist: Wow, thank you for the extremely detailed answer! While I totally agree with your points in theory, I want to explain my experiences and motivation for such a mode: In Semantic Web ontology / knowledge base research projects there are often three groups: the domain experts (A), the ontologists (B) and the Semantic Web / Linked Open Data people (C). A: Know everything about the domain but are not experts in ontologies or Semantic Web technologies. They can model their domain into an ontology / knowledge base with good tooling provided by C, but have difficulties understanding some theoretical differences like subclass vs part of. The domain they are modelling contains extensive hierarchies and the concepts they are describing are abstract (like "Elephant", "Hospital" and so on), so while validation would be easy with SHACL if they modelled a knowledge base (i.e. individuals), the particularities of the domain are better expressed with an ontology of classes, even though it is more of an "ontology light" or a "knowledge base with classes instead of individuals". In the end, the data is entered with a table-like tool, so there are only database-like relations + hierarchies at this stage. What the research project needs is a simple method to validate basic errors such as missing values, wrong cardinalities, invalid references and so on; complex reasoning is not needed. If it can be integrated into continuous integration, like a GitHub action, all the better. However, after reading your take again and writing this, I guess I should just accept that metaclasses are the correct solution here and just add them. |
I've developed a Python-coordinated 'rules' engine around Oxigraph.
This works for my use with practicality in mind but provides a pathway to elegance. After reviewing the pySHACL and owlrl codebases, I felt that both could make use of a common rule system. Side: I'm not an ontologist, but why isn't SPARQL used for inferencing? |
A SHACL 'infer' capability would be a great idea! Do you have any additional thoughts on your planning? |
PySHACL was originally built to be a basic (but fully standards compliant) SHACL validator. That is, it uses SHACL shapes to check conformance of a data graph, and gives you the result (True/False, plus a ValidationReport).
PySHACL does that job quite well. It can be called from Python or from the command line, and it delivers the results users expect.
Over the last 12 months, I've been slowly implementing more of the SHACL Advanced Features spec, and pySHACL is now almost AF-complete.
The Advanced features add capability to SHACL which extends beyond that of just validating. Eg, the SHACL Rules allow you to run SHACL-based entailment on your data graph. SHACL Functions allow you to execute parameterised custom SPARQL Functions over the data graph. Custom Targets allow you to bypass the standard SHACL node-targeting mechanism and use SPARQL to select targets.
These features can be useful to execute validation in a more customisable way, but their major benefit is in the general use outside of just validating a data graph against constraints.
With these new features I see the possibility of PySHACL operating in additional alternative modes, besides that of just validating. Eg, expansion mode could run SHACL-AF Functions and Rules on the data graph, then return the expanded data graph (without validating).
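What such an expansion mode might look like can be sketched in plain Python. This is purely illustrative and not an existing PySHACL API: triples are tuples, each "rule" is an ordinary function standing in for a SHACL-AF Rule, the names `expand`, `square_rule` and `square_is_shape_rule` are hypothetical, and repeating the rules until no new triples appear is a choice of this sketch, not a statement about how SHACL Rules are ordered or iterated.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]
RDF_TYPE = "rdf:type"


def square_rule(graph: Set[Triple]) -> Set[Triple]:
    """Anything whose ex:width equals its ex:height is an ex:Square."""
    widths = {s: o for (s, p, o) in graph if p == "ex:width"}
    heights = {s: o for (s, p, o) in graph if p == "ex:height"}
    return {
        (s, RDF_TYPE, "ex:Square")
        for s, w in widths.items()
        if heights.get(s) == w
    }


def square_is_shape_rule(graph: Set[Triple]) -> Set[Triple]:
    """Every ex:Square is also an ex:Shape (depends on square_rule's output)."""
    return {
        (s, RDF_TYPE, "ex:Shape")
        for (s, p, o) in graph
        if p == RDF_TYPE and o == "ex:Square"
    }


def expand(data: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply all rules to a working copy until no new triples are produced,
    then return the expanded graph. No validation is performed."""
    expanded = set(data)  # working copy; the input graph is left untouched
    while True:
        new_triples: Set[Triple] = set()
        for rule in rules:
            new_triples |= rule(expanded) - expanded
        if not new_triples:
            return expanded
        expanded |= new_triples


data = {
    ("ex:r1", "ex:width", "4"),
    ("ex:r1", "ex:height", "4"),
    ("ex:r2", "ex:width", "4"),
    ("ex:r2", "ex:height", "7"),
}
# The dependent rule is listed first on purpose, to show that iteration
# still reaches the chained conclusion.
expanded = expand(data, [square_is_shape_rule, square_rule])
print(sorted(expanded - data))
```

Returning only `expanded - data` would give the caller just the inferred triples, which is one way the "get inferred triples" request in this thread could be served.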
Related to #20