This document introduces a declarative RDF data model to represent suggestions that can be used by tools to repair SHACL constraint violations [[!shacl]]. The framework uses SPARQL [[!sparql11-query]] to describe updates that need to be applied to a graph in order to fix a given constraint violation. The document illustrates the use of the framework as part of the TopBraid platform.

Note that this document covers both the general design of the suggestions framework and specific tool support as part of the TopBraid platform. The TopBraid binding should be understood as only one possible implementation, and it in no way limits the general applicability of the framework. The document has been updated for TopBraid release 5.2.

This document uses the prefix dash to represent the namespace http://datashapes.org/dash#, which is accessible via its URL http://datashapes.org/dash.

Overview

The DASH Suggestions vocabulary provides a declarative data model for representing and sharing instructions on how to fix a data graph so that it no longer violates SHACL constraints. Suggestions represented using this vocabulary can be presented by tools to users as part of a semi-automated process, or employed by advanced tools to fully automate the repair of incorrect data.

The general design attaches instances of dash:SuggestionGenerator to SHACL constraint components or SPARQL-based constraints using the dedicated properties dash:propertySuggestionGenerator and dash:suggestionGenerator. The current version of the DASH vocabulary includes only one subclass of dash:SuggestionGenerator, called dash:SPARQLUpdateSuggestionGenerator. Each of these SPARQL-based suggestion generators includes a SPARQL UPDATE request that is executed with certain pre-bound variables, producing a change set consisting of triples to add and delete. The resulting change sets can be represented in RDF using the class dash:GraphUpdate, which points at triples to add or delete using the properties dash:addedTriple and dash:deletedTriple.
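To make the change-set idea concrete, here is a minimal Python sketch of what a dash:GraphUpdate carries and how a tool might apply it. It assumes triples are plain (subject, predicate, object) string tuples; the class name GraphUpdate and its fields added and deleted are hypothetical in-memory stand-ins for dash:GraphUpdate, dash:addedTriple and dash:deletedTriple, not part of any TopBraid API.

```python
from dataclasses import dataclass

# Assumption: triples are modelled as plain (subject, predicate, object) string tuples.
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class GraphUpdate:
    """Hypothetical in-memory counterpart of a dash:GraphUpdate change set."""
    added: frozenset[Triple] = frozenset()    # values of dash:addedTriple
    deleted: frozenset[Triple] = frozenset()  # values of dash:deletedTriple

    def apply(self, graph: set[Triple]) -> set[Triple]:
        # As with SPARQL DELETE/INSERT, deletions are processed before insertions.
        # A new set is returned; the input graph is not modified.
        return (graph - self.deleted) | self.added


data = {("ex:InvalidResource1", "schema:postalCode", '"58093"')}
fix = GraphUpdate(
    added=frozenset({("ex:InvalidResource1", "schema:postalCode", '"5809"')}),
    deleted=frozenset({("ex:InvalidResource1", "schema:postalCode", '"58093"')}),
)
print(fix.apply(data))
```

Keeping the change set as an immutable value, separate from the graph, mirrors the design described above: a suggestion can be computed, shown to a user and discarded without touching the data.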

The following screenshots of TopBraid EVN and TopBraid EDG illustrate how the suggestions framework can guide users through the repair of incorrect data:

Suggestions on Property Constraints

The property dash:propertySuggestionGenerator is used to point from a sh:ConstraintComponent to an instance of dash:SuggestionGenerator (e.g., dash:SPARQLUpdateSuggestionGenerator).

The following example illustrates how this framework can be used to represent a repair strategy for sh:maxLength constraints. If a string value is too long, the suggestion is to prune the string to the permitted maximum character length:

sh:MaxLengthConstraintComponent
  dash:propertySuggestionGenerator [
      rdf:type dash:SPARQLUpdateSuggestionGenerator ;
      sh:message "Prune string to only {$maxLength} characters" ;
      sh:order 1 ;
      sh:update """
        DELETE {
            $focusNode $predicate $value .
        }
        INSERT {
            $focusNode $predicate $newValue .
        }
        WHERE {
            FILTER (isLiteral($value) && datatype($value) = xsd:string) .
            BIND (SUBSTR($value, 1, $maxLength) AS ?newValue) .
        }
        """ ;
    ] .

The example links the constraint component of sh:maxLength (sh:MaxLengthConstraintComponent) with a dash:SPARQLUpdateSuggestionGenerator using dash:propertySuggestionGenerator. The generator must have a string representation of a valid SPARQL UPDATE request as its value for sh:update. It may have a value for sh:order to indicate preference between multiple suggestions: a higher value indicates that the given suggestion is more likely to fix the issue than those with lower values. The generator may also provide a template string for a human-readable display label using sh:message. This string may contain placeholders for the parameters of the constraint component (here: {$maxLength}).
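A tool presenting suggestions has to fill in the sh:message template and decide which suggestion to show first. The following Python sketch shows one way to do both. The helper names render_message and rank_suggestions are hypothetical; the placeholder syntax handled is only the {$name} form used in the example, and treating a missing sh:order as 0 is an assumption of this sketch.

```python
import re


def render_message(template: str, params: dict) -> str:
    """Replace each {$name} placeholder with the value of parameter name."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched rather than raising an error.
        return str(params[name]) if name in params else match.group(0)
    return re.sub(r"\{\$(\w+)\}", substitute, template)


def rank_suggestions(suggestions: list) -> list:
    """Order suggestions so that higher sh:order values come first.

    Suggestions without an "order" entry are treated as order 0 (assumption).
    sorted() is stable, so ties keep their original relative order.
    """
    return sorted(suggestions, key=lambda s: s.get("order", 0), reverse=True)


print(render_message("Prune string to only {$maxLength} characters", {"maxLength": 4}))
# Prune string to only 4 characters
```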

The SPARQL UPDATE is performed on the data graph containing the violating triples, using pre-bound variables for each parameter of the constraint component, similar to how SPARQL-based constraint components are evaluated in SHACL. For example, the value of sh:maxLength is pre-bound to the variable $maxLength. Likewise, the variable $predicate must point at the sh:path of the property shape that caused the violation, $focusNode must point at the current focus node (sh:focusNode) from the validation result, and $value must point at the offending value (sh:value).
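The following Python sketch walks through that binding step for the sh:maxLength example. It does not execute SPARQL: max_length_update is a hand-written emulation of the UPDATE above, and prebound_variables, the Literal class and the dictionary layout of the validation result are all hypothetical. The value 4 for sh:maxLength is assumed, since it is consistent with the "5809" suggestion shown later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Minimal stand-in for an RDF literal (lexical form plus datatype)."""
    lexical: str
    datatype: str = "xsd:string"


def prebound_variables(result: dict, params: dict) -> dict:
    """Hypothetical helper: result-derived variables plus component parameters."""
    bindings = {
        "focusNode": result["focusNode"],   # from sh:focusNode
        "predicate": result["resultPath"],  # from sh:resultPath, i.e. the sh:path
        "value": result["value"],           # from sh:value
    }
    bindings.update(params)                 # e.g. {"maxLength": 4} from sh:maxLength
    return bindings


def max_length_update(b: dict) -> tuple[set, set]:
    """Emulate the sh:maxLength UPDATE for one binding; returns (deleted, added)."""
    value = b["value"]
    # FILTER (isLiteral($value) && datatype($value) = xsd:string)
    if not (isinstance(value, Literal) and value.datatype == "xsd:string"):
        return set(), set()  # no solution, hence an empty change set
    # SUBSTR($value, 1, $maxLength): SPARQL positions start at 1,
    # so this keeps the first $maxLength characters.
    new_value = Literal(value.lexical[: b["maxLength"]])
    deleted = {(b["focusNode"], b["predicate"], value)}
    added = {(b["focusNode"], b["predicate"], new_value)}
    return deleted, added


result = {"focusNode": "ex:InvalidResource1",
          "resultPath": "schema:postalCode",
          "value": Literal("58093")}
deleted, added = max_length_update(prebound_variables(result, {"maxLength": 4}))
print(added)
# {('ex:InvalidResource1', 'schema:postalCode', Literal(lexical='5809', datatype='xsd:string'))}
```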

If the suggestions are to be presented to the user for confirmation before they are applied, then the SPARQL UPDATE can be performed on a wrapper graph that exposes the data graph triples but records the triples that a given UPDATE would add or delete instead of applying them. The resulting additions and deletions can be attached to the sh:ValidationResult instance as a dash:GraphUpdate using the property dash:suggestion. This is illustrated in the following example.

[
    rdf:type sh:ValidationResult ;
    sh:focusNode ex:InvalidResource1 ;
    sh:resultPath schema:postalCode ;
    sh:resultSeverity sh:Violation ;
    sh:sourceConstraintComponent sh:MaxLengthConstraintComponent ;
    sh:value "58093" ;
    dash:suggestion [
        rdf:type dash:GraphUpdate ;
        dash:addedTriple [
            rdf:type rdf:Statement ;
            rdf:object "5809" ;
            rdf:predicate schema:postalCode ;
            rdf:subject ex:InvalidResource1 ;
        ] ;
        dash:deletedTriple [
            rdf:type rdf:Statement ;
            rdf:object "58093" ;
            rdf:predicate schema:postalCode ;
            rdf:subject ex:InvalidResource1 ;
        ] ;
        sh:order 1 ;
    ]
] .

As shown above, the results of applying a SPARQL UPDATE are represented as instances of dash:GraphUpdate. Each added triple is linked via dash:addedTriple and represented as an instance of rdf:Statement with the properties rdf:subject, rdf:predicate and rdf:object. Likewise, each triple to be deleted is linked via dash:deletedTriple.
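The recording approach and the resulting RDF structure can be sketched together in Python. RecordingGraph is a hypothetical wrapper that lets reads see the data graph plus pending changes while writes are only recorded; graph_update_turtle then writes the recorded changes in the shape of the validation result example above. Both names are illustrative assumptions, and the sketch assumes all terms are already written in Turtle syntax, so it performs no escaping.

```python
class RecordingGraph:
    """Hypothetical wrapper: writes are recorded instead of modifying the data graph."""

    def __init__(self, base: set):
        self._base = base
        self.added: set = set()
        self.deleted: set = set()

    def __contains__(self, triple) -> bool:
        # Reads see the data graph with the pending changes applied.
        if triple in self.added:
            return True
        return triple in self._base and triple not in self.deleted

    def remove(self, triple) -> None:
        self.added.discard(triple)
        if triple in self._base:
            self.deleted.add(triple)

    def add(self, triple) -> None:
        # Re-adding a triple that was deleted earlier cancels the deletion.
        self.deleted.discard(triple)
        if triple not in self._base:
            self.added.add(triple)


def _statement(prop: str, triple) -> str:
    s, p, o = triple
    return (f"    {prop} [\n"
            f"        rdf:type rdf:Statement ;\n"
            f"        rdf:object {o} ;\n"
            f"        rdf:predicate {p} ;\n"
            f"        rdf:subject {s} ;\n"
            f"    ] ;\n")


def graph_update_turtle(g: RecordingGraph, order=None) -> str:
    """Write the recorded changes as a dash:GraphUpdate blank node in Turtle."""
    body = "".join(_statement("dash:addedTriple", t) for t in sorted(g.added))
    body += "".join(_statement("dash:deletedTriple", t) for t in sorted(g.deleted))
    if order is not None:
        body += f"    sh:order {order} ;\n"
    return "[\n    rdf:type dash:GraphUpdate ;\n" + body + "]"


old = ("ex:InvalidResource1", "schema:postalCode", '"58093"')
new = ("ex:InvalidResource1", "schema:postalCode", '"5809"')
data = {old}
g = RecordingGraph(data)
g.remove(old)  # effect of the DELETE template
g.add(new)     # effect of the INSERT template
print(graph_update_turtle(g, order=1))
```

Because a removal followed by a re-insertion of the same triple cancels out, an UPDATE that deletes and re-inserts an unchanged value yields an empty change set rather than a misleading suggestion.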

Suggestions for SPARQL-based Constraints

If a validation result has been produced by a SPARQL-based constraint (using sh:sparql), then the constraint may point at a suggestion generator using dash:suggestionGenerator, similar to the previous section. The main difference is that fewer variables are pre-bound when the UPDATE executes, since SPARQL-based constraints have no parameters.