Some function should confirm CASE- and UCO-prefixed concepts used in instance data exist #40

Closed
@ajnelson-nist

Description

Background

One of the issues with using an ontology is that concept names are prone to typos while programs and examples are being drafted. Some mechanism is needed to catch these typos.

SHACL does not provide a way to turn a namespace into a closed concept set. A node shape can be declared sh:closed, but the effect is to prohibit any property that is not associated with the shape through sh:property, regardless of namespace. CASE and UCO have chosen not to use sh:closed. (One example of why: even rdfs:comment would trigger non-conformance.)
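
The following simplified, hypothetical model of the rule in plain Python makes the sh:closed behavior concrete. It is not pySHACL code, and it treats every sh:path as a single predicate. A closed shape accepts only the predicates named as sh:path in its sh:property constraints, plus any listed in sh:ignoredProperties. As a result, an rdfs:comment on the focus node is flagged even though rdfs: has nothing to do with the CASE or UCO namespaces. The shape contents, the example node, and the uco-core name and UcoObject terms are chosen only for illustration.

```python
from typing import Iterable, Set, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"
UCO_CORE = "https://ontology.unifiedcyberontology.org/uco/core/"

Triple = Tuple[str, str, str]


def closed_shape_violations(
    focus_node: str,
    triples: Iterable[Triple],
    property_paths: Set[str],
    ignored_properties: Set[str],
) -> Set[str]:
    """Return predicates on focus_node that a closed shape would reject.

    Simplified model of sh:closed: only predicates used as sh:path in the
    shape's sh:property constraints, or listed in sh:ignoredProperties, are
    allowed. The predicate's namespace plays no role in the decision.
    """
    allowed = property_paths | ignored_properties
    return {p for (s, p, _o) in triples if s == focus_node and p not in allowed}


# Illustrative instance data: one node with a type, a name, and a comment.
data = [
    ("http://example.org/kb/thing-1", RDF_TYPE, UCO_CORE + "UcoObject"),
    ("http://example.org/kb/thing-1", UCO_CORE + "name", "Thing 1"),
    ("http://example.org/kb/thing-1", RDFS_COMMENT, "An annotation."),
]

violations = closed_shape_violations(
    "http://example.org/kb/thing-1",
    data,
    property_paths={UCO_CORE + "name"},
    ignored_properties={RDF_TYPE},
)
print(sorted(violations))  # only the rdfs:comment predicate is reported
```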

This leaves the ontology-encoded enforcement, in its current state, unable to report that the use of this concept in instance data is an error:

<https://ontology.unifiedcyberontology.org/uco/core/name_WITH_TYPO>

Hence, checking for concept typos may be delegated from the ontology to a downstream validation mechanism.

One form of solution for this problem is, in the abstract, to compute a set difference between the CASE- and UCO-prefixed concepts used in the instance data and those defined in the ontology. The CASE-Examples-QC repository did this with a combination of shell scripting and Python. Until the 0.4.0 release of this repository, that repository was tracked as a submodule so its functionality could be imported as part of unit tests.
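
The set-difference check can be sketched with the standard library alone. This is not the CASE-Examples-QC implementation. It is a minimal, hypothetical example with four steps:

- It assumes both the ontology and the instance data are available as N-Triples text.
- It pulls out every IRI with a regular expression.
- It keeps only the IRIs under the CASE and UCO prefixes.
- It reports any instance-data IRI that the ontology never mentions.

The CASE prefix https://ontology.caseontology.org/ is an assumption here. The UCO prefix matches the name_WITH_TYPO IRI in the background section.

```python
import re
from typing import Set

# Assumed namespace prefixes; adjust to the ontology release in use.
CONCEPT_PREFIXES = (
    "https://ontology.caseontology.org/",
    "https://ontology.unifiedcyberontology.org/",
)

# Matches IRIs written as <...> in N-Triples. This is a simplification: a real
# RDF parser would also handle other serializations and escaped characters.
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def prefixed_iris(ntriples_text: str) -> Set[str]:
    """Collect every CASE- or UCO-prefixed IRI appearing in N-Triples text."""
    return {
        iri
        for iri in IRI_PATTERN.findall(ntriples_text)
        if iri.startswith(CONCEPT_PREFIXES)
    }


def undefined_concepts(ontology_nt: str, instance_nt: str) -> Set[str]:
    """Return CASE/UCO IRIs used in instance data but never mentioned in the ontology."""
    return prefixed_iris(instance_nt) - prefixed_iris(ontology_nt)


ontology = """
<https://ontology.unifiedcyberontology.org/uco/core/name> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#DatatypeProperty> .
"""
instance = """
<http://example.org/kb/thing-1> <https://ontology.unifiedcyberontology.org/uco/core/name_WITH_TYPO> "Thing 1" .
<http://example.org/kb/thing-1> <https://ontology.unifiedcyberontology.org/uco/core/name> "Thing 1" .
"""
print(undefined_concepts(ontology, instance))
```

Treating any mention in the ontology as a definition keeps the sketch short. A stricter check might only count IRIs that appear as subjects of rdf:type declarations.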

Unfortunately, tracking that repository as a dependency (i.e., a Git submodule) of this repository introduced a circular Git submodule dependency that eventually looped back to this repository. With the 0.4.0 release of case_utils, that submodule is gone, but so is its "IRI typo check" functionality, which, with all due irony, helped catch a bug while the CASE 0.6.0 release branch of this repository was being drafted.

Solution Summary

No firm solution is offered quite yet. I have two ideas that could be explored, and I'm not sure offhand if they are mutually exclusive:

  1. An RDFLib DefinedNamespace could be used to catch typos at object instantiation time within Python.
    • For example, trying to refer to rdflib.RDFS.typo yields the runtime error AttributeError: term 'typo' not in namespace 'http://www.w3.org/2000/01/rdf-schema#'.
    • This might be an unpleasant user experience. A user could be handed CASE data, try to read it, and get a traceback while parsing the graph.
  2. An extra step within case_validate might be able to generate SHACL sh:ValidationResults by running an independent test like the CASE-Examples-QC test had been doing.
    • However, it's not clear whether this can be "slipped into" the pySHACL functions that generate the output graph. It might be possible to modify the rdflib.Graph holding the sh:ValidationReport, but it's not clear whether the human-readable output could also be influenced.
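
Idea 2 might avoid touching pySHACL internals by post-processing its outcome instead. pyshacl.validate returns a (conforms, results_graph, results_text) tuple. The hypothetical helpers below sketch two steps:

- Describe each typo finding as sh:ValidationResult triples that could be added to the results graph.
- Force conforms to False and append a section to the human-readable text.

Triples are plain string tuples so the sketch has no dependencies. With rdflib they would be URIRef, BNode, and Literal terms added to the rdflib.Graph. The findings format, the function names, the "_:report" node label, and the message wording are all assumptions.

```python
from typing import Iterable, List, Tuple

SH = "http://www.w3.org/ns/shacl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]
Finding = Tuple[str, str]  # (focus node IRI, undefined concept IRI)


def typo_result_triples(report_node: str, findings: Iterable[Finding]) -> List[Triple]:
    """Describe each finding as an sh:ValidationResult attached to report_node.

    Hypothetical helper. A complete SHACL result would also carry
    sh:sourceConstraintComponent, omitted here because no built-in constraint
    component corresponds to this custom check. The report's own sh:conforms
    value would also need to be changed to false.
    """
    triples: List[Triple] = []
    for n, (focus_node, iri) in enumerate(findings):
        result = f"_:typo{n}"  # blank-node label for the new result
        triples += [
            (report_node, SH + "result", result),
            (result, RDF_TYPE, SH + "ValidationResult"),
            (result, SH + "focusNode", focus_node),
            (result, SH + "resultSeverity", SH + "Violation"),
            (result, SH + "value", iri),
            (result, SH + "resultMessage", f"Concept not defined in ontology: {iri}"),
        ]
    return triples


def merge_outcome(
    conforms: bool, results_text: str, findings: List[Finding]
) -> Tuple[bool, str]:
    """Fold typo findings into a SHACL (conforms, text) outcome."""
    if not findings:
        return conforms, results_text
    section = "\n".join(
        ["Additional check: undefined CASE/UCO concepts"]
        + [f"  {iri} (used on {focus})" for focus, iri in findings]
    )
    return False, results_text.rstrip("\n") + "\n\n" + section + "\n"


findings = [
    (
        "http://example.org/kb/thing-1",
        "https://ontology.unifiedcyberontology.org/uco/core/name_WITH_TYPO",
    )
]
extra_triples = typo_result_triples("_:report", findings)
conforms, text = merge_outcome(True, "(pySHACL text report)\n", findings)
print(conforms)
print(text)
```

This leaves pySHACL's own text untouched and appends a separate section. Whether case_validate should own such a merged report is still the open question raised in idea 2.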

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
