Description
Background
One of the issues with using an ontology is that the concept names are prone to being typo'd as programming and example drafting happens. Some mechanism is needed to catch typos.
SHACL does not provide the ability to turn a namespace into a closed concept set. There is an ability to declare a node shape to be `sh:closed`, but the effect of this is preventing any properties outside of the `sh:property`-associated properties being used, regardless of namespace. CASE and UCO are choosing to not use the `sh:closed` property. (One example why: even `rdfs:comment` would trigger non-conformance.)
This leaves the current state of ontology-encoded enforcement unable to report that use of this concept in instance data is an error:
<https://ontology.unifiedcyberontology.org/uco/core/name_WITH_TYPO>
Hence, checking for concept typos may be delegated from the ontology to a downstream validation mechanism.
One form of solution for this problem is, in the abstract, to run a set-difference between the CASE and UCO prefixed concepts that are in the ontology versus in the instance data. The CASE-Examples-QC repository did this, with a combination of shell scripting and Python. That repository, until the 0.4.0 release of this repository, was tracked as a submodule to import that functionality as part of unit tests.
Unfortunately, tracking that repository as a dependency (i.e. Git submodule) of this repository introduced a circular Git submodule dependency, eventually looping back to this repository. With the 0.4.0 release of `case_utils`, that submodule is gone, but so is its "IRI typo check" functionality that, with all due irony, helped catch a bug as the CASE 0.6.0 release branch of this repository was being drafted.
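The set-difference idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the CASE-Examples-QC implementation: the function name `find_unknown_iris` and the `CHECKED_PREFIXES` tuple are hypothetical, and the IRIs are handled as plain strings that would in practice be gathered from the ontology and instance-data graphs.

```python
from typing import Iterable, Set, Tuple

# Hypothetical prefix list. Only the UCO prefix shown earlier in this issue is
# included; a CASE prefix would be added the same way.
CHECKED_PREFIXES: Tuple[str, ...] = (
    "https://ontology.unifiedcyberontology.org/",
)


def find_unknown_iris(
    ontology_iris: Iterable[str],
    data_iris: Iterable[str],
    prefixes: Tuple[str, ...] = CHECKED_PREFIXES,
) -> Set[str]:
    """Return IRIs used in instance data, under a checked prefix, that the ontology does not define."""
    # Only IRIs under the checked prefixes are candidates for being typos.
    in_scope = {iri for iri in data_iris if iri.startswith(prefixes)}
    # Anything in scope that the ontology does not define is reported.
    return in_scope - set(ontology_iris)
```

IRIs outside the checked prefixes (for example, a user's own knowledge-base namespace) are ignored, so only concepts that claim to be in CASE or UCO are compared against the ontology.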
Solution Summary
No firm solution is offered quite yet. I have two ideas that could be explored, and I'm not sure offhand if they are mutually exclusive:
- An RDFLib `DefinedNamespace` could be used to catch typos at object instantiation time within Python.
  - For example, trying to refer to `rdflib.RDFS.typo` yields the runtime error `AttributeError: term 'typo' not in namespace 'http://www.w3.org/2000/01/rdf-schema#'`.
  - This might be an unpleasant user experience. A user could be handed CASE data, try to read it, and get a traceback while parsing the graph.
- An extra step within `case_validate` might be able to generate SHACL `sh:ValidationResult`s by running an independent test like the CASE-Examples-QC test had been doing.
  - However, it's not clear if this can be "slipped into" the pySHACL functions that generate the output graph. It might be possible to modify the `rdflib.Graph` holding the `sh:ValidationReport`, but it's not clear if the `human` output could also be influenced.