This document introduces a declarative RDF data model to represent suggestions that can be used by tools to repair SHACL constraint violations [[shacl]]. The framework uses SPARQL [[sparql11-query]] to describe updates that need to be applied to a graph in order to fix a given constraint violation. The document illustrates the use of the framework as part of the TopBraid platform.

This document uses the prefix dash for the DASH Data Shapes namespace http://datashapes.org/dash#, which is accessible via its URL http://datashapes.org/dash.

Scope of This Document

Note that this document covers both the general design of the suggestions framework and specific tool support as part of the TopBraid platform. The TopBraid binding should be understood as only one possible implementation and in no way limits the general applicability of the framework.

Overview

The DASH Suggestions vocabulary provides a declarative data model for representing and sharing instructions on how to fix a data graph so that it no longer violates SHACL constraints. Suggestions represented using this vocabulary can be presented by tools to users as part of a semi-automated process, or employed by advanced tools to fully automate the repair of incorrect data.

The general design attaches instances of dash:SuggestionGenerator to SHACL constraint components or SPARQL-based constraints using the dedicated properties dash:propertySuggestionGenerator and dash:suggestionGenerator. This document describes two subclasses of dash:SuggestionGenerator: dash:SPARQLUpdateSuggestionGenerator and dash:ScriptSuggestionGenerator. Each SPARQL-based suggestion generator includes a SPARQL UPDATE request that is executed with certain pre-bound variables, producing a change set consisting of triples to add and delete. The resulting change sets can be represented in RDF using the class dash:GraphUpdate, which points at triples to add or delete using the properties dash:addedTriple and dash:deletedTriple.

The following screenshots of TopBraid EVN and TopBraid EDG illustrate how the suggestions framework can be used to guide users through the repair of incorrect data:



Suggestions on Property Constraints

The property dash:propertySuggestionGenerator is used to point from a sh:ConstraintComponent to an instance of dash:SuggestionGenerator. The following sections introduce the two currently supported kinds of suggestion generators:

dash:SPARQLUpdateSuggestionGenerators

Suggestions may be produced by a SPARQL UPDATE command. The following example illustrates how this mechanism can be used to represent a repair strategy for sh:maxLength constraints. If a string value is too long, the suggestion is to prune the string to the permitted maximum character length:

sh:MaxLengthConstraintComponent
  dash:propertySuggestionGenerator [
      rdf:type dash:SPARQLUpdateSuggestionGenerator ;
      sh:message "Prune string to only {$maxLength} characters" ;
      sh:order 1 ;
      sh:update """
        DELETE {
            $focusNode $predicate $value .
        }
        INSERT {
            $focusNode $predicate $newValue .
        }
        WHERE {
            FILTER (isLiteral($value) && datatype($value) = xsd:string) .
            BIND (SUBSTR($value, 1, $maxLength) AS ?newValue) .
        }
        """ ;
    ] .

The example links the constraint component of sh:maxLength (sh:MaxLengthConstraintComponent) with a dash:SPARQLUpdateSuggestionGenerator using dash:propertySuggestionGenerator. The generator must have a string representation of a valid SPARQL UPDATE request as its value for sh:update. It may have a value for sh:order to indicate preference between multiple suggestions: a higher value indicates that the given suggestion is more likely to fix the issue than those with lower values. The suggestion may also provide a template string for a human-readable display label using sh:message. This string may contain placeholders for the parameters of the constraint component (here: {$maxLength}).
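
How a tool might expand such a message template can be sketched in JavaScript. The helper name renderMessage and the plain bindings object are assumptions made for this illustration; they are not part of DASH or of any TopBraid API.

```javascript
// Hypothetical helper, not part of DASH: replaces {$name} and {?name}
// placeholders with values from a plain bindings object.
function renderMessage(template, bindings) {
  return template.replace(/\{[$?]([A-Za-z_][A-Za-z0-9_]*)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(bindings, name)
      ? String(bindings[name])
      : match // leave unknown placeholders untouched
  );
}

const label = renderMessage("Prune string to only {$maxLength} characters", { maxLength: 4 });
console.log(label); // Prune string to only 4 characters
```

Leaving unknown placeholders in place is a design choice of this sketch; a tool could equally report them as errors.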

The SPARQL UPDATE is performed on the data graph containing the violated triples using pre-bound variables for each parameter of the constraint component, similar to how SPARQL-based constraint components are evaluated in SHACL. For example, the value of sh:maxLength is pre-bound to the variable $maxLength. Likewise, the variable $predicate must point at the sh:path of the property shape that caused the violation (if the value is an IRI), and $focusNode must point at the focus node (sh:focusNode) from the validation result. Finally, $sourceShape must be bound to the value of sh:sourceShape, $sourceConstraintComponent must point at the value of sh:sourceConstraintComponent and $shapesGraph must be the URI of the shapes graph.
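
The pre-binding rules above can be sketched as a small JavaScript function that collects the variables for one validation result. The function name, the plain-object representation of the result and the convention that an IRI path is a string are all assumptions of this sketch; a real engine would work with RDF terms. The binding of $value is inferred from its use in the sh:maxLength example rather than from the list above.

```javascript
// Hypothetical sketch: builds the variable bindings for one validation result.
// result      plain object mirroring a sh:ValidationResult (assumed shape)
// parameters  constraint component parameters, e.g. { maxLength: 4 }
// shapesGraph URI of the shapes graph
function preBoundVariables(result, parameters, shapesGraph) {
  const bindings = {
    ...parameters,
    focusNode: result.focusNode,
    sourceShape: result.sourceShape,
    sourceConstraintComponent: result.sourceConstraintComponent,
    shapesGraph: shapesGraph,
  };
  // $predicate is only bound when the path of the property shape is an IRI.
  // In this sketch an IRI path is a string; other paths would be objects.
  if (typeof result.path === "string") {
    bindings.predicate = result.path;
  }
  // Assumption: $value is bound when the result has a value node.
  if (result.value !== undefined) {
    bindings.value = result.value;
  }
  return bindings;
}

const bindings = preBoundVariables(
  {
    focusNode: "ex:InvalidResource1",
    path: "schema:postalCode",
    value: "58093",
    sourceShape: "ex:PostalCodeShape", // hypothetical shape name
    sourceConstraintComponent: "sh:MaxLengthConstraintComponent",
  },
  { maxLength: 4 },
  "http://example.org/shapes" // hypothetical shapes graph URI
);
console.log(bindings.predicate, bindings.maxLength);
```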

If the suggestions are supposed to be presented to the user so that she can confirm them before they are applied, then the SPARQL UPDATE can be performed on a modified graph that includes the data graph triples but can record the triples that a given UPDATE would add or delete. The resulting adds and deletes can be attached to the sh:ValidationResult instance as part of a dash:GraphUpdate using the property dash:suggestion. This is illustrated in the following example.

[
    rdf:type sh:ValidationResult ;
    sh:focusNode ex:InvalidResource1 ;
    sh:resultPath schema:postalCode ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:MaxLengthConstraintComponent ;
    sh:value "58093" ;
    dash:suggestion [
        rdf:type dash:GraphUpdate ;
        dash:addedTriple [
            rdf:type rdf:Statement ;
            rdf:object "5809" ;
            rdf:predicate schema:postalCode ;
            rdf:subject ex:InvalidResource1 ;
        ] ;
        dash:deletedTriple [
            rdf:type rdf:Statement ;
            rdf:object "58093" ;
            rdf:predicate schema:postalCode ;
            rdf:subject ex:InvalidResource1 ;
        ] ;
        sh:order 1 ;
    ]
] .

As shown above, the results of applying a SPARQL UPDATE are represented as instances of dash:GraphUpdate and each added triple is represented using an instance of rdf:Statement and its properties rdf:subject, rdf:predicate and rdf:object. Likewise, the triples that shall be deleted are represented using the property dash:deletedTriple.
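
The idea of recording adds and deletes instead of applying them can be sketched in JavaScript as follows. The class RecordingGraph, its method names and the representation of triples as three-element arrays are assumptions made for this illustration; the sketch does not execute SPARQL and only mirrors the dash:GraphUpdate structure as a plain object.

```javascript
// Hypothetical recorder: wraps the data graph triples without changing them
// and collects what an update would add or delete. Triples are [s, p, o] arrays.
class RecordingGraph {
  constructor(baseTriples) {
    this.baseTriples = baseTriples; // read only, never mutated
    this.added = [];
    this.deleted = [];
  }

  static sameTriple(a, b) {
    return a[0] === b[0] && a[1] === b[1] && a[2] === b[2];
  }

  contains(triple) {
    return this.baseTriples.some((t) => RecordingGraph.sameTriple(t, triple));
  }

  // Record an addition only if the triple is not already in the data graph.
  add(triple) {
    if (!this.contains(triple)) this.added.push(triple);
  }

  // Record a deletion only if the triple actually exists in the data graph.
  delete(triple) {
    if (this.contains(triple)) this.deleted.push(triple);
  }

  // Plain-object mirror of a dash:GraphUpdate with reified rdf:Statements.
  toGraphUpdate(order) {
    const statement = ([s, p, o]) => ({
      "rdf:type": "rdf:Statement",
      "rdf:subject": s,
      "rdf:predicate": p,
      "rdf:object": o,
    });
    return {
      "rdf:type": "dash:GraphUpdate",
      "dash:addedTriple": this.added.map(statement),
      "dash:deletedTriple": this.deleted.map(statement),
      "sh:order": order,
    };
  }
}

const graph = new RecordingGraph([["ex:InvalidResource1", "schema:postalCode", "58093"]]);
graph.delete(["ex:InvalidResource1", "schema:postalCode", "58093"]);
graph.add(["ex:InvalidResource1", "schema:postalCode", "5809"]);
const graphUpdate = graph.toGraphUpdate(1);
console.log(JSON.stringify(graphUpdate, null, 2));
```

Because the base triples are never modified, the recorded change set can be shown to a user and applied only after confirmation.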

dash:ScriptSuggestionGenerators

This requires TopBraid 7.0 or later.

Instances of dash:ScriptSuggestionGenerator implement suggestion generators that are backed by an Active Data Shapes script. The script needs to return a JSON object or an array of JSON objects if it shall generate multiple suggestions. It may also return null to indicate that nothing was suggested.

Note that the whole script is evaluated as a JavaScript expression, so the value of the last statement becomes the result. Simply putting an object at the end of the script is therefore sufficient. Alternatively, define the bulk of the operation as a function and call that function at the end of the script.
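
Both styles can be tried out with the standard JavaScript eval function, which returns the value of the last evaluated expression statement. This is only a sketch of the language rule: how an Active Data Shapes engine actually evaluates scripts is not specified here, and the script contents and variable names are made up for the illustration.

```javascript
// Variant 1: the parenthesized object literal is the last expression of the script.
// Without the parentheses, the braces would be parsed as a block statement.
const scriptWithTrailingObject = `
let newValue = "58093".substring(0, 4);
({ message: "Prune to 4 characters", newValue: newValue })
`;

// Variant 2: the work lives in a function and the script ends with the call.
const scriptWithFunction = `
function suggest() {
    let newValue = "58093".substring(0, 4);
    return { message: "Prune to 4 characters", newValue: newValue };
}
suggest()
`;

// eval returns the value of the last evaluated expression statement.
const first = eval(scriptWithTrailingObject);
const second = eval(scriptWithFunction);
console.log(first.newValue, second.newValue); // 5809 5809
```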

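Each response object can have fields such as message, add and delete, all of which appear in the example below: message holds the display label, while add and delete hold arrays of triples, with each triple written as an array of subject, predicate and object.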

Suggestions with neither added nor deleted triples will be discarded.

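At execution time, the script operates on the data graph as the active graph, with pre-bound variables such as focusNode, predicate and value, plus constraint parameters such as maxLength, all of which are used in the example below.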

The script will be executed in read-only mode, i.e. it cannot modify the graph.

This example implements the same use case as before, but using dash:js:

sh:MaxLengthConstraintComponent
    dash:propertySuggestionGenerator [
        a dash:ScriptSuggestionGenerator ;
        dash:js """
let newValue = value.substring(0, maxLength);
(
    {
        message: `Prune to maximum length of ${maxLength} characters`,
        add: [
            [ focusNode, predicate, newValue ]
        ],
        delete: [
            [ focusNode, predicate, value ]
        ]
    }
)
""" ] .
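
The rules for script results described in this section (a single object, an array of objects or null, with empty suggestions discarded) can be sketched as a small normalization step. The function name normalizeSuggestions is an assumption of this sketch and not part of any TopBraid API.

```javascript
// Hypothetical post-processing of a script result:
// null means no suggestion, a single object becomes a one-element list,
// and suggestions with neither added nor deleted triples are discarded.
function normalizeSuggestions(scriptResult) {
  if (scriptResult === null || scriptResult === undefined) {
    return [];
  }
  const candidates = Array.isArray(scriptResult) ? scriptResult : [scriptResult];
  return candidates.filter((suggestion) => {
    const adds = Array.isArray(suggestion.add) ? suggestion.add.length : 0;
    const deletes = Array.isArray(suggestion.delete) ? suggestion.delete.length : 0;
    return adds + deletes > 0;
  });
}

const kept = normalizeSuggestions([
  { message: "Prune", add: [["ex:A", "ex:p", "5809"]], delete: [["ex:A", "ex:p", "58093"]] },
  { message: "Nothing to do", add: [], delete: [] },
]);
console.log(kept.length); // 1
```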

Suggestions for SPARQL-based Constraints

If a validation result has been produced by a SPARQL-based constraint (using sh:sparql), then the constraint may point at a suggestion generator similar to the previous section. The main difference is that fewer variables will be pre-bound when the UPDATE executes since there are no parameters.

The following example illustrates the use of the suggestions framework to ensure that the full name of a person is the concatenation of given name and family name.

# baseURI: http://example.org
# imports: http://datashapes.org/dash
# prefix: ex

@prefix dash: <http://datashapes.org/dash#> .
@prefix ex: <http://example.org#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org>
	a owl:Ontology ;
	owl:imports <http://datashapes.org/dash> ;
	sh:declare [
		a sh:PrefixDeclaration ;
		sh:namespace "http://example.org#"^^xsd:anyURI ;
		sh:prefix "ex" ;
	] .

ex:Person
	a rdfs:Class, sh:NodeShape ;
	rdfs:label "Person" ;
	rdfs:subClassOf rdfs:Resource ;
	sh:property [
		a sh:PropertyShape ;
		sh:path ex:familyName ;
		sh:datatype xsd:string ;
		sh:maxCount 1 ;
		sh:minCount 1 ;
	] ;
	sh:property [
		a sh:PropertyShape ;
		sh:path ex:fullName ;
		sh:datatype xsd:string ;
		sh:maxCount 1 ;
		sh:minCount 1 ;
	] ;
	sh:property [
		a sh:PropertyShape ;
		sh:path ex:givenName ;
		sh:datatype xsd:string ;
		sh:maxCount 1 ;
		sh:minCount 1 ;
	] ;
	sh:sparql [
		dash:suggestionGenerator ex:FullNameSuggestionGenerator ;
		sh:message "The full name should be \"{?suggestedFullName}\"" ;
		sh:prefixes <http://example.org> ;
		sh:select """
			SELECT $this ?value ?suggestedFullName (ex:fullName AS ?path)
			WHERE {
				$this ex:fullName ?value .
				$this ex:givenName ?givenName .
				$this ex:familyName ?familyName .
				BIND (CONCAT(?givenName, \" \", ?familyName) AS ?suggestedFullName) .
				FILTER (?suggestedFullName != ?value) .
			}""" ;
	] .
	
ex:FullNameSuggestionGenerator
	a dash:SPARQLUpdateSuggestionGenerator ;
	rdfs:label "Full name suggestion generator" ;
	sh:prefixes <http://example.org> ;
	sh:update """
		DELETE {
			$focusNode ex:fullName ?oldFullName .
		}
		INSERT {
			$focusNode ex:fullName ?suggestedFullName .
		}
		WHERE {
			$focusNode ex:fullName ?oldFullName .
			$focusNode ex:givenName ?givenName .
			$focusNode ex:familyName ?familyName .
			BIND (CONCAT(?givenName, \" \", ?familyName) AS ?suggestedFullName) .
		}""" .

For the following data graph

ex:JohnDoe
	a ex:Person ;
	ex:givenName "John" ;
	ex:familyName "Doe" ;
	ex:fullName "John Due" .

the suggestions framework would produce:

[
	a sh:ValidationResult ;
	sh:focusNode ex:JohnDoe ;
	sh:resultPath ex:fullName ;
	sh:resultSeverity sh:Violation ;
	sh:sourceConstraint _:b53919 ;
	sh:sourceConstraintComponent sh:SPARQLConstraintComponent ;
	sh:sourceShape ex:Person ;
	sh:value "John Due" ;
	dash:suggestion [
		a dash:GraphUpdate ;
		dash:addedTriple [
			a rdf:Statement ;
			rdf:subject ex:JohnDoe ;
			rdf:predicate ex:fullName ;
			rdf:object "John Doe" ;
		] ;
		dash:deletedTriple [
			a rdf:Statement ;
			rdf:subject ex:JohnDoe ;
			rdf:predicate ex:fullName ;
			rdf:object "John Due" ;
		] ;
		sh:order 0 ;
	] ;
] .

In addition to using a single sh:update query, there is an alternative syntax that combines a SELECT query with an UPDATE:

ex:FullNameSuggestionGenerator
	a dash:SPARQLUpdateSuggestionGenerator ;
	rdfs:label "Full name suggestion generator" ;
	sh:message "Set full name to {?suggestedFullName}" ;
	sh:prefixes <http://example.org> ;
	sh:select """
		SELECT $this ?oldFullName ?suggestedFullName
		WHERE {
			$this ex:fullName ?oldFullName .
			$this ex:givenName ?givenName .
			$this ex:familyName ?familyName .
			BIND (CONCAT(?givenName, \" \", ?familyName) AS ?suggestedFullName) .
		}""" ;
	sh:update """
		DELETE {
			$this ex:fullName ?oldFullName .
		}
		INSERT {
			$this ex:fullName ?suggestedFullName .
		}
		WHERE {
		}""" .

In this variation, the system will use the variable bindings produced by the sh:select query to execute the sh:update query once for each result row. This has the advantage that each suggestion may get a different sh:message (in the example above, that would be based on the variable ?suggestedFullName).
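
The per-row behavior of this variation can be sketched in JavaScript without a SPARQL engine. The result row is hard-coded to what the sh:select query would be expected to return for ex:JohnDoe, and the triple templates, function names and array-based triple format are assumptions made for this illustration.

```javascript
// Rows as the sh:select query might return them for ex:JohnDoe (hard-coded here).
const rows = [
  { this: "ex:JohnDoe", oldFullName: "John Due", suggestedFullName: "John Doe" },
];

// Triple templates mirroring the DELETE and INSERT parts of the sh:update above.
const deleteTemplate = [["$this", "ex:fullName", "?oldFullName"]];
const insertTemplate = [["$this", "ex:fullName", "?suggestedFullName"]];

// Replaces a $name or ?name term with the row value, leaving constants as they are.
function instantiate(term, row) {
  const isVariable = term.startsWith("$") || term.startsWith("?");
  return isVariable ? row[term.slice(1)] : term;
}

// Produces one suggestion per result row, each with its own message.
function suggestionsForRows(rows, messageTemplate) {
  return rows.map((row) => ({
    message: messageTemplate.replace(/\{[$?](\w+)\}/g, (match, name) =>
      name in row ? row[name] : match
    ),
    deleted: deleteTemplate.map((triple) => triple.map((term) => instantiate(term, row))),
    added: insertTemplate.map((triple) => triple.map((term) => instantiate(term, row))),
  }));
}

const suggestions = suggestionsForRows(rows, "Set full name to {?suggestedFullName}");
console.log(suggestions[0].message); // Set full name to John Doe
```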