This document introduces the concept of multi-functions: functions that can produce not only multiple results but also multiple variables per result. This concept, based on previous work such as SPIN Magic Properties, can be used to greatly extend the expressiveness of SPARQL and other query languages, including Active Data Shapes (ADS). We introduce an RDF vocabulary based on the SHACL standard to declare such multi-functions in a platform-neutral way.
This document uses the prefix dash, which represents the DASH Data Shapes namespace http://datashapes.org/dash# and is accessible via its URL http://datashapes.org/dash.
This requires TopBraid 7.1 or above.
Most query languages for RDF graph data support a notion of functions to calculate values from arguments. For example, SPARQL has a library of built-in functions plus an extension point for specific SPARQL engines to support additional functions. The SHACL Advanced Features specifies an RDF vocabulary for declaring such SPARQL functions dynamically, using classes such as sh:SPARQLFunction.
However, such functions are limited to returning just a single RDF node as their result.
Many use cases require more than a single result value, for example, splitting a string into multiple sub-strings.
To support such use cases, some SPARQL implementations have come up with their own "native" syntaxes and custom extensions.
As one of the earliest popular examples of such a syntax, the Apache Jena API supports the concept of property functions, also known as magic properties (including a property function apf:strSplit).
Other SPARQL implementations offer similar capabilities through workarounds such as the SERVICE keyword, where certain property patterns have dedicated special meanings.
While the SPARQL 1.1 standard does not have official support for such multi-valued functions, there are proposals for adding this for a future SPARQL version. As an earlier attempt to resolve this, SPIN (one of the predecessors of SHACL) had introduced a vocabulary to represent SPIN Magic Properties, allowing users to declare multi-valued functions for use in SPARQL engines that support SPIN.
This document introduces a modernized version of this idea, using SHACL instead of SPIN as its foundation. The basic idea is that ontologies/shape graphs may declare multi-functions in RDF, so that SPARQL engines and other query processors or API generators can pick them up dynamically. The resulting multi-functions may implement convenience features that greatly extend the expressiveness of queries and support better reuse of recurring query logic.
This example declares a multi-function ex:namedSuperClasses that returns all superclasses of a given class, excluding blank nodes, owl:Thing and rdfs:Resource.
Here is what the definition of that function looks like in TopBraid EDG:
[Screenshot: definition of ex:namedSuperClasses in TopBraid EDG]
The main part of the definition is the SPARQL SELECT query. Like SPARQL queries in other SHACL-based vocabularies, the SELECT query may access input parameters as pre-bound variables, typically represented with $ variables. In this example, the variable $subClass already has a value when the query is executing, and this value must be provided as input argument. Using such input arguments, the query can return result variables ?superClass and ?label.
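To make the role of the pre-bound variable concrete, the following is a minimal JavaScript sketch that mimics the evaluation of such a query over a tiny in-memory triple list. It is not how TopBraid or Jena evaluates SPARQL; the triples array (sample data chosen for illustration) and the helpers objects, transitiveSuperClasses and namedSuperClasses are hypothetical stand-ins. The argument subClass plays the role of $subClass, and each returned object carries the two result variables.

```javascript
// Hypothetical sample graph as [subject, predicate, object] triples.
const triples = [
  ["schema:DayOfWeek", "rdfs:subClassOf", "schema:Enumeration"],
  ["schema:Enumeration", "rdfs:subClassOf", "schema:Intangible"],
  ["schema:Intangible", "rdfs:subClassOf", "schema:Thing"],
  ["schema:Thing", "rdfs:subClassOf", "owl:Thing"], // illustrative only
  ["schema:Enumeration", "rdfs:label", "Enumeration"],
  ["schema:Intangible", "rdfs:label", "Intangible"],
  ["schema:Thing", "rdfs:label", "Thing"],
];

// Returns all objects ?o of the pattern (subject, predicate, ?o).
function objects(subject, predicate) {
  return triples
    .filter(([s, p]) => s === subject && p === predicate)
    .map(([, , o]) => o);
}

// Mimics the property path rdfs:subClassOf+ by walking links transitively.
function transitiveSuperClasses(cls) {
  const seen = new Set();
  const queue = [cls];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of objects(current, "rdfs:subClassOf")) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return [...seen];
}

// subClass is supplied by the caller, like the pre-bound variable $subClass.
// The isIRI filter is not modeled because every node here is a plain string.
function namedSuperClasses(subClass) {
  const excluded = new Set(["owl:Thing", "rdfs:Resource"]);
  const rows = [];
  for (const superClass of transitiveSuperClasses(subClass)) {
    if (superClass === subClass || excluded.has(superClass)) continue;
    // One row per label, like the join with ?superClass rdfs:label ?label.
    for (const label of objects(superClass, "rdfs:label")) {
      rows.push({ superClass, label });
    }
  }
  return rows;
}

console.log(namedSuperClasses("schema:DayOfWeek"));
```

Each row is one solution of the query; a class without a label produces no row, just as the label triple pattern would eliminate it in SPARQL.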
The complete definition in Turtle includes the details of the parameters and result variables:
ex:namedSuperClasses
    a dash:SPARQLMultiFunction ;
    rdfs:label "named super classes" ;
    rdfs:comment "Gets all (transitive) superclasses of a given class." ;
    dash:apiStatus dash:Stable ;
    sh:parameter ex:namedSuperClasses-subClass ;
    dash:resultVariable ex:namedSuperClasses-label ;
    dash:resultVariable ex:namedSuperClasses-superClass ;
    sh:prefixes <http://datashapes.org/dash> ;
    sh:select """
        SELECT ?superClass ?label
        WHERE {
            $subClass rdfs:subClassOf+ ?superClass .
            ?superClass rdfs:label ?label .
            FILTER isIRI(?superClass) .
            FILTER (?superClass != $subClass) .
            FILTER (?superClass NOT IN ( owl:Thing, rdfs:Resource ) )
        }""" ;
.
ex:namedSuperClasses-subClass
    a sh:Parameter ;
    sh:name "sub class" ;
    sh:description "The class to get the superclasses of." ;
    sh:path ex:subClass ;
    sh:class rdfs:Class ;
    sh:order 0 ; # Optional, there is only one parameter
.
ex:namedSuperClasses-superClass
    a sh:Parameter ;
    sh:path ex:superClass ;
    sh:class rdfs:Class ;
    sh:description "The superclass resource." ;
    sh:name "super class" ;
    sh:order 0 ;
.
ex:namedSuperClasses-label
    a sh:Parameter ;
    sh:name "label" ;
    sh:description "The label of the superclass." ;
    sh:path ex:label ;
    sh:datatype xsd:string ;
    sh:order 1 ;
.
Using this declaration, a SPARQL engine like Apache Jena in TopBraid can expose a new query feature as a property function which could be used as follows:
SELECT *
WHERE {
    schema:DayOfWeek ex:namedSuperClasses ( ?superClass ?label ) .
}
This would return variable bindings as follows:
| ?superClass        | ?label        |
| schema:Thing       | "Thing"       |
| schema:Intangible  | "Intangible"  |
| schema:Enumeration | "Enumeration" |
Note that such multi-functions can not only return multiple result rows but also multiple variables (or columns) per row.
In the SPARQL syntax above, the values to the left of the "magic" property ex:namedSuperClasses are the input arguments (declared as parameters using SHACL) and the variables on the right will receive the value bindings that are produced by the SELECT query defined by the multi-function.
In the example above, the SPARQL property function uses the same variable names ?superClass and ?label; however, it could also use any other "fresh" variables that are not yet bound to other values, e.g.
SELECT *
WHERE {
    schema:DayOfWeek ex:namedSuperClasses ( ?class ?classLabel ) .
}
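The positional mapping between the declared result variables and the variables written by the caller can be sketched in plain JavaScript. This is only an illustration, not the TopBraid or Jena implementation: bindByPosition is a hypothetical helper, and it assumes that result variables are matched to the caller's variables by position, ordered by their sh:order values (superClass first, then label).

```javascript
// Declared result variables with the sh:order values from the example.
const resultVariables = [
  { name: "label", order: 1 },
  { name: "superClass", order: 0 },
];

// Hypothetical helper: renames each row's declared result variables to the
// variable names the caller wrote on the right-hand side, by position.
function bindByPosition(rows, declared, callerVars) {
  const ordered = [...declared].sort((a, b) => a.order - b.order);
  if (callerVars.length > ordered.length) {
    throw new Error("More caller variables than declared result variables");
  }
  return rows.map((row) => {
    const binding = {};
    callerVars.forEach((callerVar, i) => {
      binding[callerVar] = row[ordered[i].name];
    });
    return binding;
  });
}

// Rows as the SELECT query of the multi-function might produce them.
const rows = [
  { superClass: "schema:Thing", label: "Thing" },
  { superClass: "schema:Intangible", label: "Intangible" },
];

// Counterpart of: schema:DayOfWeek ex:namedSuperClasses ( ?class ?classLabel )
const bindings = bindByPosition(rows, resultVariables, ["class", "classLabel"]);
console.log(bindings);
```

The caller's variable names never need to match the declared names; only the position in the list matters in this sketch.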
Note that in order for such SPARQL property functions to be recognized and installed by TopBraid, the multi-function should be declared in files containing .api. in their name. Alternatively, they may be stored in graphs that are part of the ui:graph (imported by any .ui.ttlx file), or imported from any .spin.ttl file.
TopBraid's Active Data Shapes (ADS) framework makes it possible to query and manipulate RDF graphs through JavaScript or TypeScript. Assuming the instances of dash:SPARQLMultiFunction are defined in an Ontology asset collection or in a file that is imported by the Ontology, the ADS code generator will produce JavaScript functions that can be called as follows:
let superClasses = ex.namedSuperClasses(schema.DayOfWeek);
// Result will be an array of objects with fields { superClass, label }
console.log(`DayOfWeek has ${superClasses.length} superclasses:`);
superClasses.forEach(result => {
    console.log(`- ${result.label}: ${result.superClass}`);
});
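Because the result is a plain array of objects, the usual array methods apply to it. The sketch below relies only on the result shape { superClass, label } described above. To be runnable outside TopBraid, it replaces the generated ex.namedSuperClasses with a hypothetical stub that returns fixed sample data, and it uses plain strings where ADS would presumably return node objects.

```javascript
// Hypothetical stub standing in for the ADS-generated function so that the
// snippet runs outside TopBraid. Inside ADS, `ex` would be generated.
const ex = {
  namedSuperClasses: (cls) => [
    { superClass: "schema:Thing", label: "Thing" },
    { superClass: "schema:Intangible", label: "Intangible" },
    { superClass: "schema:Enumeration", label: "Enumeration" },
  ],
};

const results = ex.namedSuperClasses("schema:DayOfWeek");

// Collect the labels in alphabetical order.
const labels = results.map((result) => result.label).sort();

// Index the rows by label for quick lookup of the superclass.
const byLabel = new Map(results.map((r) => [r.label, r.superClass]));

console.log(labels.join(", "));
console.log(byLabel.get("Intangible"));
```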
This requires TopBraid 8.3 or above.
The Active Data Shapes (ADS) framework is an extension of SHACL that allows RDF processing logic to be written in JavaScript. ADS can be used to declare new multi-functions, so that JavaScript is executed whenever the multi-function is called.
The following example declares a multi-function ex:generateItems that takes an integer (count) and returns that many result rows/objects, each with an index and a label based on that index.
ex:generateItems
    a dash:ScriptMultiFunction ;
    dash:apiStatus dash:Experimental ;
    dash:js """let results = [];
for(let i = 0; i < count; i++) {
    results.push({
        index: i,
        label: `Item ${i + 1}`
    })
}
results;""" ;
    dash:resultVariable ex:generateItems-index ;
    dash:resultVariable ex:generateItems-label ;
    rdfs:comment "A sample multi-function that takes an integer (count) and returns as many result rows/objects with an index and a label based on the current index." ;
    rdfs:label "generate items" ;
    sh:parameter ex:generateItems-count ;
.
ex:generateItems-count
    a sh:Parameter ;
    sh:path ex:count ;
    sh:datatype xsd:integer ;
    sh:description "The number of results to produce." ;
    sh:name "count" ;
.
ex:generateItems-index
    a sh:Parameter ;
    sh:path ex:index ;
    sh:datatype xsd:integer ;
    sh:description "The current index of each result, starting with 0." ;
    sh:name "index" ;
.
ex:generateItems-label
    a sh:Parameter ;
    sh:path ex:label ;
    sh:datatype xsd:string ;
    sh:description "The label such as \"Item 1\" for the first item." ;
    sh:name "label" ;
    sh:order "1"^^xsd:decimal ;
.
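To see what the dash:js body yields, its logic can be wrapped in an ordinary function and run outside TopBraid. This is a sketch of the script's logic only: the wrapper generateItems is hypothetical, and it assumes, as the declaration suggests, that the parameter count is visible to the script as a variable of the same name and that the value of the final expression (results) becomes the result rows.

```javascript
// Hypothetical wrapper around the dash:js body; `count` is the parameter.
function generateItems(count) {
  const results = [];
  for (let i = 0; i < count; i++) {
    results.push({
      index: i,               // field named after ex:generateItems-index
      label: `Item ${i + 1}`, // field named after ex:generateItems-label
    });
  }
  return results; // stands in for the script's final expression
}

console.log(generateItems(3));
```

Each object in the array corresponds to one result row, and each field name corresponds to the local name of a declared result variable.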