DASH Multi-Functions

This document introduces multi-functions: functions that can return not only multiple result rows but also multiple variables per row. The concept builds on previous work such as SPIN Magic Properties and can greatly extend the expressiveness of SPARQL and other query languages, including Active Data Shapes (ADS). We introduce an RDF vocabulary based on the SHACL standard to declare such multi-functions in a platform-neutral way.

Motivation

Most query languages for RDF graph data support a notion of functions that calculate values from arguments. For example, SPARQL has a library of built-in functions plus an extension point that lets specific SPARQL engines support additional functions. The SHACL Advanced Features specification defines an RDF vocabulary for declaring such SPARQL functions dynamically, using classes such as sh:SPARQLFunction.

However, such functions are limited to returning a single RDF node as their result. Many use cases require more than one result value, for example splitting a string into multiple substrings. To support such use cases, some SPARQL implementations have introduced their own "native" syntaxes and custom extensions. One of the earliest popular examples is the Apache Jena API, which supports property functions, also known as magic properties (including a property function apf:strSplit). Other SPARQL implementations offer similar capabilities through workarounds such as the SERVICE keyword, where certain property patterns have dedicated special meanings.

While the SPARQL 1.1 standard does not officially support such multi-valued functions, there are proposals to add them in a future SPARQL version. As an earlier attempt to address this, SPIN (one of the predecessors of SHACL) introduced a vocabulary to represent SPIN Magic Properties, allowing users to declare multi-valued functions for use in SPARQL engines that support SPIN.

This document introduces a modernized version of this idea, using SHACL instead of SPIN as its foundation. The basic idea is that ontologies or shape graphs may declare multi-functions in RDF, so that SPARQL engines and other query processors or API generators can pick them up dynamically. The resulting multi-functions may implement convenience features that greatly extend the expressiveness of queries and support better reuse of recurring query logic.

Example Multi-Function for SPARQL

This example declares a multi-function ex:namedSuperClasses that returns all superclasses of a given class, excluding blank nodes, owl:Thing and rdfs:Resource. Here is what the definition of that function looks like in TopBraid EDG:

The main part of the definition is the SPARQL SELECT query. Like SPARQL queries in other SHACL-based vocabularies, the SELECT query may access input parameters as pre-bound variables, typically written as $ variables. In this example, the variable $subClass already has a value when the query executes, and this value must be provided as an input argument. Using such input arguments, the query can return the result variables ?superClass and ?label.

Definition in RDF/Turtle

The complete definition in Turtle includes the details of the parameters and result variables:


ex:namedSuperClasses
  a dash:SPARQLMultiFunction ;
  rdfs:label "named super classes" ;
  rdfs:comment "Gets all (transitive) superclasses of a given class." ;
  dash:apiStatus dash:Stable ;
  sh:parameter ex:namedSuperClasses-subClass ;
  dash:resultVariable ex:namedSuperClasses-label ;
  dash:resultVariable ex:namedSuperClasses-superClass ;
  sh:prefixes <http://datashapes.org/dash> ;
  sh:select """
        SELECT ?superClass ?label
        WHERE {
            $subClass rdfs:subClassOf+ ?superClass .
            ?superClass rdfs:label ?label .
            FILTER isIRI(?superClass) .
            FILTER (?superClass != $subClass) .
            FILTER (?superClass NOT IN ( owl:Thing, rdfs:Resource ) )
        }""" ;
.
ex:namedSuperClasses-subClass
  a sh:Parameter ;
  sh:name "sub class" ;
  sh:description "The class to get the superclasses of." ;
  sh:path ex:subClass ;
  sh:class rdfs:Class ;
  sh:order 0 ;   # Optional, there is only one parameter
.
ex:namedSuperClasses-superClass
  a sh:Parameter ;
  sh:path ex:superClass ;
  sh:class rdfs:Class ;
  sh:description "The superclass resource." ;
  sh:name "super class" ;
  sh:order 0 ;
.
ex:namedSuperClasses-label
  a sh:Parameter ;
  sh:name "label" ;
  sh:description "The label of the superclass." ;
  sh:path ex:label ;
  sh:datatype xsd:string ;
  sh:order 1 ;
.
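
To make the semantics of the SELECT query above more concrete, the following sketch emulates it in plain JavaScript over a tiny in-memory list of triples. It is only an illustration under stated assumptions: the sample triples, the objects helper and the standalone namedSuperClasses function are hypothetical, and this is not how a SPARQL engine or TopBraid actually evaluates the multi-function.

```javascript
// Toy graph as [subject, predicate, object] triples (hypothetical sample data).
const triples = [
  ["schema:DayOfWeek", "rdfs:subClassOf", "schema:Enumeration"],
  ["schema:Enumeration", "rdfs:subClassOf", "schema:Intangible"],
  ["schema:Intangible", "rdfs:subClassOf", "schema:Thing"],
  ["schema:Thing", "rdfs:subClassOf", "owl:Thing"],
  ["schema:Thing", "rdfs:subClassOf", "_:b0"],
  ["schema:Enumeration", "rdfs:label", "Enumeration"],
  ["schema:Intangible", "rdfs:label", "Intangible"],
  ["schema:Thing", "rdfs:label", "Thing"],
  ["owl:Thing", "rdfs:label", "Thing"],
];

// Hypothetical helper: all objects of triples matching (subject, predicate, ?o).
function objects(subject, predicate) {
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

// Emulates the SELECT query; the argument plays the role of the pre-bound $subClass.
function namedSuperClasses(subClass) {
  const excluded = new Set(["owl:Thing", "rdfs:Resource"]);
  const seen = new Set();
  const queue = [subClass];
  // Breadth-first walk, corresponding to the path rdfs:subClassOf+ (one or more steps).
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of objects(current, "rdfs:subClassOf")) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  const rows = [];
  for (const superClass of seen) {
    // The three FILTER clauses: IRIs only, not the input class, not an excluded class.
    const isIRI = !superClass.startsWith("_:");
    if (!isIRI || superClass === subClass || excluded.has(superClass)) continue;
    // The join with rdfs:label: a superclass without a label yields no row.
    for (const label of objects(superClass, "rdfs:label")) {
      rows.push({ superClass, label });
    }
  }
  return rows;
}

const result = namedSuperClasses("schema:DayOfWeek");
console.log(result);
```

Each returned object corresponds to one result row, with one field per declared result variable.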

Example Use from SPARQL

Using this declaration, a SPARQL engine like Apache Jena in TopBraid can expose a new query feature as a property function which could be used as follows:

SELECT *
WHERE {
	schema:DayOfWeek ex:namedSuperClasses ( ?superClass ?label ) .
}

This query would return the following variable bindings:

?superClass	?label
`schema:Thing`	"Thing"
`schema:Intangible`	"Intangible"
`schema:Enumeration`	"Enumeration"

Note that such multi-functions can not only return multiple result rows but also multiple variables (or columns) per row. In the SPARQL syntax above, the values to the left of the "magic" property ex:namedSuperClasses are the input arguments (declared as parameters using SHACL) and the variables on the right will receive the value bindings that are produced by the SELECT query defined by the multi-function.

In the example above, the SPARQL property function uses the same variable names ?superClass and ?label as the declaration. However, it could also use any other "fresh" variables that are not yet bound to other values, for example:

SELECT *
WHERE {
	schema:DayOfWeek ex:namedSuperClasses ( ?class ?classLabel ) .
}
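
The renaming in the query above can be pictured as a positional mapping from the declared result variables to the variables listed on the right-hand side of the property. The following is a minimal sketch of that idea, assuming the result variables are matched to the caller's list in sh:order (0 for superClass, 1 for label in the declaration above). The bindResults helper and the sample rows are hypothetical and not part of DASH or TopBraid.

```javascript
// Declared result variables with their sh:order values, as in the Turtle above.
// Assumption: sh:order determines the position in the caller's variable list.
const resultVariables = [
  { name: "label", order: 1 },
  { name: "superClass", order: 0 },
];

// Hypothetical helper: maps rows produced by the multi-function onto the
// variable names that the caller wrote to the right of the property.
function bindResults(rows, callerVariables) {
  const ordered = [...resultVariables].sort((a, b) => a.order - b.order);
  return rows.map(row => {
    const binding = {};
    ordered.forEach((variable, index) => {
      const callerName = callerVariables[index];
      // Only bind positions for which the caller supplied a variable.
      if (callerName !== undefined) {
        binding[callerName] = row[variable.name];
      }
    });
    return binding;
  });
}

// Hypothetical rows, named after the declared result variables.
const sampleRows = [
  { superClass: "schema:Thing", label: "Thing" },
  { superClass: "schema:Intangible", label: "Intangible" },
];

const renamed = bindResults(sampleRows, ["class", "classLabel"]);
console.log(renamed);
```

Under this assumption the caller's variable names carry no meaning of their own; only their position in the list decides which result value they receive.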

Note that for such SPARQL property functions to be recognized and installed by TopBraid, the multi-functions should be declared in files containing .api. in their name. Alternatively, they may be stored in graphs that are part of the ui:graph (imported by any .ui.ttlx file), or imported from any .spin.ttl file.

Example Use from JavaScript/ADS

TopBraid's Active Data Shapes (ADS) framework makes it possible to query and manipulate RDF graphs through JavaScript or TypeScript. Assuming the instances of dash:SPARQLMultiFunction are defined in an Ontology asset collection or in a file that is imported by the Ontology, the ADS code generator will produce JavaScript functions that can be used as follows:

let superClasses = ex.namedSuperClasses(schema.DayOfWeek);
// Result will be an array of objects with fields { superClass, label }
console.log(`DayOfWeek has ${superClasses.length} superclasses:`);
superClasses.forEach(result => {
	console.log(`- ${result.label}: ${result.superClass}`);
});
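
Because the result is an ordinary array of objects, it can be post-processed with standard array methods. The sketch below shows this outside TopBraid by substituting hypothetical stand-ins for the generated ex and schema objects. The stub returns canned rows that mirror the result table shown earlier, and plain strings take the place of whatever node objects ADS would actually return, which is an assumption made only for illustration.

```javascript
// Hypothetical stand-ins for the generated ADS objects, so that the snippet
// runs outside TopBraid. Plain strings replace the real node objects.
const schema = { DayOfWeek: "schema:DayOfWeek" };
const ex = {
  namedSuperClasses(subClass) {
    // Canned rows mirroring the result table shown earlier; the argument is ignored.
    return [
      { superClass: "schema:Thing", label: "Thing" },
      { superClass: "schema:Intangible", label: "Intangible" },
      { superClass: "schema:Enumeration", label: "Enumeration" },
    ];
  },
};

// Collect the labels of all result rows and sort them alphabetically.
const labels = ex.namedSuperClasses(schema.DayOfWeek)
  .map(row => row.label)
  .sort();
console.log(labels.join(", ")); // prints "Enumeration, Intangible, Thing"
```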