This document introduces an extension to SHACL called Dynamic SHACL.
In standard SHACL, the values of constraint parameters such as sh:maxCount are constant values such as the RDF literal 1.
In Dynamic SHACL, these values may be SHACL Node Expressions that are computed before the validation starts.
This added flexibility significantly expands the expressivity of the SHACL language, in particular for
constraints that apply with different values depending on the context.
This document describes what will be available in TopBraid 8.3 and above.
In general, SHACL constraints apply to all target nodes of a shape.
For example, a sh:in constraint at a property ex:state applies to all instances of the following class:
ex:Address
  a rdfs:Class, sh:NodeShape ;
  sh:property ex:Address-state ;
.
ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in ( "AL" "AK" "AZ" "AR" ... ) ; # Here, the US states
.
Imagine instance data such as this:
ex:ArizonaAddress1
  a ex:Address ;
  ex:street "123 John Muir Ave" ;
  ex:country ex:USA ;
  ex:state "AZ" ;
.
ex:QueenslandAddress1
  a ex:Address ;
  ex:street "123 Bob Katter Cl" ;
  ex:country ex:Australia ;
  ex:state "QLD" ;
.
In the example above, the QLD address obviously violates the sh:in constraint because that is limited to the US states.
A recurring requirement for SHACL ontologies is to define constraints that apply only to certain instances of a class, or under certain circumstances. For example, we may want to express that if the address is in Australia, then the valid state codes are ACT, NSW, NT, QLD, SA, TAS, VIC and WA.
The following techniques can be used with standard SHACL to express this.
One technique is to define distinct subclasses such as ex:USAddress and ex:AUAddress and define a separate sh:in constraint at each.
This, however, requires changes to the instance data and would likely lead to an artificial explosion of types for all combinations of distinguishing conditions.
Also, an instance of ex:Address would need to dynamically change its rdf:type depending on the value of ex:country.
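A minimal sketch of the subclass approach for the Australian case, assuming the subclass ex:AUAddress named above. The shape name ex:AUAddress-state and the namespace IRI bound to ex: are illustrative placeholders, not part of the original model; ex:USAddress would follow the same pattern with the US state codes.

```turtle
# Placeholder namespace for the ex: prefix used throughout this document.
@prefix ex:   <http://example.org/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .

# Country-specific subclass carrying its own enumeration of state codes.
ex:AUAddress
  a rdfs:Class, sh:NodeShape ;
  rdfs:subClassOf ex:Address ;
  sh:property ex:AUAddress-state ;
.
ex:AUAddress-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in ( "ACT" "NSW" "NT" "QLD" "SA" "TAS" "VIC" "WA" ) ;
.
# The instance data has to be retyped to pick up the constraint.
ex:QueenslandAddress1
  a ex:AUAddress ;
  ex:street "123 Bob Katter Cl" ;
  ex:country ex:Australia ;
  ex:state "QLD" ;
.
```

Note that the rdf:type and the ex:country value now state the same fact twice, so changing the country also means changing the type.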
Another technique to define conditional constraints is to define a shape that uses a SPARQL-based Target.
In the SPARQL query it would be possible to target exactly the addresses that have country ex:Australia.
However, this is not efficient because an engine would need to execute all SPARQL target queries to identify which shapes apply.
Furthermore, this isn't really declarative, and a lot of business logic is hidden in SPARQL strings.
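As a rough illustration, a SPARQL-based target for the Australian addresses might look like the sketch below. It uses sh:SPARQLTarget from SHACL Advanced Features; the shape name ex:AustralianAddressShape and the namespace IRI for ex: are assumptions made for this example.

```turtle
# Placeholder namespace for the ex: prefix used throughout this document.
@prefix ex:  <http://example.org/ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Prefix declaration that sh:prefixes below refers to.
<http://example.org/ns#>
  a owl:Ontology ;
  sh:declare [
    sh:prefix "ex" ;
    sh:namespace "http://example.org/ns#"^^xsd:anyURI ;
  ] ;
.
# Hypothetical shape that only targets addresses located in Australia.
ex:AustralianAddressShape
  a sh:NodeShape ;
  sh:target [
    a sh:SPARQLTarget ;
    sh:prefixes <http://example.org/ns#> ;
    sh:select """
      SELECT ?this
      WHERE {
        ?this a ex:Address ;
              ex:country ex:Australia .
      }
      """ ;
  ] ;
  sh:property [
    sh:path ex:state ;
    sh:in ( "ACT" "NSW" "NT" "QLD" "SA" "TAS" "VIC" "WA" ) ;
  ] ;
.
```

The condition (country is Australia) lives inside the query string, which is exactly the part that tools other than a SPARQL engine cannot easily interpret.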
We could also express all cases through sh:or and sh:hasValue, e.g.
ex:Address
  a rdfs:Class, sh:NodeShape ;
  sh:property ex:Address-state ;
  sh:or (
    [
      sh:property [
        sh:path ex:country ;
        sh:hasValue ex:USA ;
      ] ;
      sh:property [
        sh:path ex:state ;
        sh:in ( "AL" "AK" "AZ" "AR" ... ) ;
      ]
    ]
    [
      sh:property [
        sh:path ex:country ;
        sh:hasValue ex:Australia ;
      ] ;
      sh:property [
        sh:path ex:state ;
        sh:in ( "ACT" "NSW" "NT" "QLD" "SA" "TAS" "VIC" "WA" ) ;
      ]
    ]
  ) ;
.
ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
.
This solution is quite convoluted.
Although it would work for validation, it would be next to impossible for non-validation use cases such as user interface generators
to use this information.
In particular, a common technique in form builders would be to display a drop-down list whenever a property declares a sh:in constraint, see Enum Select Editor.
A static analysis of the constraint would have trouble understanding the intent here as the sh:in is hidden deep within the shape definition.
Human readers would also struggle to parse the meaning of this ontology.
This structure is also neither modular, model-driven nor extensible.
A further technique is a SPARQL-based constraint. It works best if the valid state codes are attached to the country instances:
ex:Australia
  a ex:Country ;
  ex:stateCode "ACT", "NSW", "NT", "QLD", "SA", "TAS", "VIC", "WA" ;
.
ex:USA
  a ex:Country ;
  ex:stateCode "AL", "AK", "AZ", "AR", ... ;
.
This is a nice declarative and extensible solution. Using this background information, the constraint can be expressed as follows:
ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:sparql [
    a sh:SPARQLConstraint ;
    sh:message "State is not among those declared for the country" ;
    sh:prefixes ... ; # omitted
    sh:select """
      SELECT $this ?value
      WHERE {
        $this ex:state ?value .
        FILTER NOT EXISTS {
          $this ex:country ?country .
          ?country ex:stateCode ?value .
        }
      }
      """ ;
  ] .
The query above will return all instances of Address ($this) that have a state that is not listed as ex:stateCode for the ex:country of the address.
Such instances will be flagged as constraint violations.
This is a reasonable solution assuming you're ready to use SPARQL, but it suffers from the same drawback as the other solutions in this section: the information can be used for constraint validation but hardly for any other purpose, such as user interface generation.
In the proposed Dynamic SHACL extension, the values of a constraint parameter such as sh:in can be computed dynamically based on SHACL AF Node Expressions.
In general, Node Expressions take a focus node as input and produce a list of result nodes. These nodes may
be computed by looking up property values elsewhere, by filtering values, by applying operations such as union and minus,
or even by executing a SPARQL query.
Using Dynamic SHACL, the country state example above could be expressed using the following techniques.
This Dynamic SHACL solution relies on the same helper structure as the SPARQL constraint above, in which the valid state codes are attached to the country instances.
It computes the values of sh:in by fetching all values of the SHACL path expression ex:country / ex:stateCode.
ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in [
    sh:path ( ex:country ex:stateCode )
  ] .
This is elegant because, by using sh:in, it clearly expresses that the valid values for ex:state come from an enumeration, which instructs a user interface builder to pick something like a drop-down selection list.
It also provides enough information for any engine to compute the valid values beforehand, from a node expression.
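The same pattern should extend to the other node expression types mentioned earlier. The sketch below combines two path expressions with sh:union from SHACL Advanced Features. It assumes that the engine accepts a union expression as the value of sh:in, and the property ex:territoryCode is hypothetical, introduced only to have a second source of codes.

```turtle
# Placeholder namespace for the ex: prefix used throughout this document.
@prefix ex: <http://example.org/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in [
    # Union of the result nodes of two path expressions.
    sh:union (
      [ sh:path ( ex:country ex:stateCode ) ]
      [ sh:path ( ex:country ex:territoryCode ) ]  # hypothetical property
    )
  ] .
```

The shape still exposes sh:in at the top level of the property shape, so a form builder can recognize the enumeration without inspecting how the values are produced.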
In this solution, the value of sh:in is a SHACL Node Expression of type sh:select, which is basically any SPARQL SELECT query that returns a collection of nodes as bindings of its result variable.
ex:Address
  a rdfs:Class, sh:NodeShape ;
  sh:property ex:Address-state ;
.
ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in [
    sh:select """
      SELECT ?stateCode
      WHERE {
        $this ex:country/ex:stateCode ?stateCode .
      }
      """
  ]
.
This is more general than the path-based solution because you can use any SPARQL feature, for example to perform joins, look up values, convert literal datatypes, or even build new string values.
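As a sketch of the string-building case, the variant below normalizes the stored codes to upper case before they are used as the enumeration. It assumes, purely for illustration, that some country instances store their ex:stateCode values in mixed case; prefix declarations for the query are omitted, as in the example above.

```turtle
# Placeholder namespace for the ex: prefix used throughout this document.
@prefix ex: <http://example.org/ns#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

ex:Address-state
  a sh:PropertyShape ;
  sh:path ex:state ;
  sh:in [
    sh:select """
      SELECT ?stateCode
      WHERE {
        # Look up the raw codes via the country of the focus node ...
        $this ex:country/ex:stateCode ?raw .
        # ... and build the upper-case string used as the allowed value.
        BIND (UCASE(STR(?raw)) AS ?stateCode)
      }
      """
  ] .
```

A transformation like this has no counterpart in a plain path expression, which is where the SPARQL-based variant earns its extra complexity.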
This document (draft) only scratches the surface of what is possible with Dynamic SHACL.
TopBraid 8.3 implements support for sh:in as described.
Most other constraint types such as sh:minCount or sh:class could also benefit from the added expressiveness of Dynamic SHACL.
TODO: Mention that some node expressions do not depend on the focus node and may therefore be computed statically for all target nodes, yielding good performance.
TODO: Mention that node expressions can also be used at sh:deactivated