This document introduces Active Data Shapes, a framework to execute and share scripts written in languages like JavaScript to query, modify and otherwise process knowledge graphs. Active Data Shapes closely integrate declarative graph schema definitions (using SHACL shapes) with imperative scripting language code by means of an API generator that mirrors the shapes of the graph data as JavaScript classes. Using familiar design patterns, Active Data Shapes may significantly lower the cost of entry into graph technology for "mainstream" developers, while greatly enhancing the productivity of graph technology power users. This tutorial walks through a couple of use cases with screenshots and hands-on exercises in TopBraid EDG.

Scope of This Document

Active Data Shapes are both a general, platform-independent framework and a feature of TopBraid. In this tutorial we will include details of both the technology and the actual steps using the TopBraid EDG user interface. Future revisions may more clearly articulate the separation between the framework and the tool.

This document uses the prefix dash for the namespace http://datashapes.org/dash#, which is accessible via the URL http://datashapes.org/dash.

Motivation

Knowledge graph technology has been established as a flexible framework for representing almost arbitrary data, schemas and query patterns. In the W3C semantic technology space, RDF triples are used to store data, while SHACL can be used to declare the structure that the data is assumed to have. Inference rules may derive new statements from asserted facts. Many applications and use cases rely on SPARQL to perform queries and updates to the data. However, these technologies are primarily about data, not executable behavior.

Applications that want to perform services on such knowledge graphs are typically written in conventional programming languages such as JavaScript, Java and Python. Often, there is imperative program code that alternates between SPARQL queries, looping through result sets and then performing further steps such as making more SPARQL queries or changing data through API calls and SPARQL UPDATEs. Programming libraries such as RDF4J and Jena offer powerful and generic APIs to operate on RDF nodes and graphs, yet even a simple query requires a fair amount of boilerplate code and comes with a learning curve.

Domain experts who know their ontologies and data graphs are often unable to use these generic frameworks, in particular if they require compile-time cycles and complex plugin mechanisms to integrate them into their everyday workflows. Many organizations have graph technology experts who do have decent technological skills, yet their true potential remains unused due to the gaps in the technology landscape. Such graph technology experts often shepherd larger groups of end users who have repetitive tasks that are specific to the domain model and ontology used in their organization.

We introduce Active Data Shapes as a novel way of interacting with knowledge graphs, based on JavaScript and a domain-specific API generator. This framework is based on many years of experience with home-grown frameworks such as SPIN, SWP and SPARQLMotion and the various plugin mechanisms of the TopBraid platform. A common theme of those technologies was to attach actionable rules, HTML templates, SPARQL queries or other forms of behavior to class definitions. The linkage between ontology concepts and actions worked well in principle, but we continuously ran into limitations of the execution languages. For example we often needed to hard-code all kinds of extension functions to SPARQL to raise the expressivity that was needed to solve real-world tasks, and we needed to add "declarative" means for things like IFs and FOR-loops. In this new design, we acknowledge that these tasks can only be solved with the power of general-purpose programming languages.

Sitting on a solid mainstream technology foundation, Active Data Shapes attempt to combine the elegance and flexibility of RDF-based data representation with the power of imperative programming environments. This technology may become a cornerstone of extension development for TopBraid and similar platforms in the future.

Introduction to Active Data Shapes

Please take a second to digest the following architecture diagram:

At the base of the framework sits the Data Graph with the actual RDF statements ("instance data"). In TopBraid EDG this is basically the currently open asset collection. There is also a Shapes Graph that defines the structure of the data using RDFS classes and SHACL shapes. All TopBraid ontologies use SHACL, and an increasing number of externally developed data models come with SHACL shape definitions. Shapes can also be auto-generated from OWL and RDFS ontologies. The shapes graph is often part of the data graph, so that queries can seamlessly move between the data and the schema level.

When a script is executed, for example triggered through an incoming web service request initiated by an end user, the script is interpreted by the Script Runtime. TopBraid exploits Oracle's GraalVM engine which offers polyglot support for multiple scripting languages including JavaScript, Python and R. In this version we focus on JavaScript but hope to expand to Python and possibly other languages once the framework has stabilized. Although scripts execute on the server alongside the application platform, they "live" in a safe sandbox-like environment where the script may access the data graph but is otherwise isolated from the rest of the application platform.

This means that although a script is executed on the TopBraid server it can only access a very focused API consisting of a (hard-coded) Core API and an automatically Generated API. Scripts can of course also use all standard features of the ECMAScript platform, including things like if-then-else, for loops, array operations, class and function definitions.

Core API

The Core API of the Active Data Shapes framework consists of:

Here is a picture illustrating the three Core API classes that we use to represent RDF nodes:

Every node is an instance of GraphNode. RDF literals (strings, booleans, dates etc.) are represented as instances of LiteralNode, each of which makes the lexical form of the literal available using lex. All literals have a datatype property holding the URI of the RDF datatype, and a lang property with the language tag (usually an empty string unless the literal has datatype rdf:langString). All of those values are read-only properties. You can use graph.node(...) and other convenience methods such as graph.langString(lex, lang) to produce new nodes if you need them. For example, you can use graph.literal(4.2) to produce an instance of LiteralNode with "4.2" as lexical form and datatype xsd:decimal. Or use graph.node({lex: "2020-06-01", datatype: xsd.date}) for an xsd:date literal. (In case you wondered, xsd.date is part of the automatically generated APIs for the XSD namespace.)

Non-literals are represented as instances of NamedNode and they all have a uri property holding the identifier of the node. (For the RDF geeks, we represent blank nodes also as named nodes, only with a URI consisting of _: and then the blank node identifier. Use .isBlankNode() and .isURI() if you need to distinguish between them.) URIs are also read-only, but you can construct new NamedNodes with graph.named("...") or graph.named({qname: "..."}) and with the factory methods from the generated namespaces explained below.

Keep in mind that you cannot simply use the JavaScript == operator on instances of these node types. This is because the same URI node may be represented by multiple JavaScript objects over the lifetime of your script. Instead, use nodeA.equals(nodeB) to compare two nodes, including when you search for an item in an array.
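
The difference between identity and value comparison can be tried out with a small stand-alone sketch. The StandInNode class below is a hypothetical simplification written for this illustration; it is not the Core API implementation and merely assumes that equals compares URIs.

```javascript
// Hypothetical stand-in for a named node; NOT the real Core API class.
class StandInNode {
    constructor(uri) {
        this.uri = uri;
    }
    // Assumption for this sketch: two nodes are equal when their URIs match.
    equals(other) {
        return other instanceof StandInNode && other.uri === this.uri;
    }
}

// Two separate JavaScript objects that stand for the same URI node
const a = new StandInNode("http://example.org/Canada");
const b = new StandInNode("http://example.org/Canada");

const sameObject = (a == b);    // false: == compares object identity
const sameNode = a.equals(b);   // true: same URI

// Array.includes also compares by identity, so search with a predicate instead
const nodes = [a];
const foundByIncludes = nodes.includes(b);            // false
const foundByEquals = nodes.some(n => n.equals(b));   // true
```

The same applies to indexOf and to using nodes as keys in a Set or Map, all of which rely on object identity in JavaScript.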

NamedNode has two functions to query RDF property values. For example you can use focusNode.values(skos.broader) to return an array of values for skos:broader of the focus node. However, typically you may prefer to go through the much more readable properties from the generated APIs, as follows.

Generated API

The Generated API is the domain-specific part and is entirely based on the SHACL shape definitions from the shapes graph. The following diagram shows an example from the well-known SKOS Taxonomy vocabulary.

The API generator will produce one JavaScript class for each named SHACL node shape (or class that is also a sh:NodeShape). The name of the class will be the prefix of the namespace plus underscore plus the local name of the class. For example, skos:Concept becomes skos_Concept. Each property shape for such node shapes is mapped to a JavaScript property based on the local name of the property that is the sh:path of the property shape. For example, skos:broader is mapped to a JavaScript property broader at the class skos_Concept. If no suitable local name exists (e.g. if it contains a dash or other unsuitable characters), or if you want to use a plural name such as broaderConcepts, you can specify a different name using the property graphql:name at the property shape. Such JavaScript properties are backed by so-called getters and setters which means that their values are fetched from the RDF database when requested, and assignments of the JavaScript property will actually create new triples in the data graph.

/**
 * Generated from the shape http://www.w3.org/2004/02/skos/core#Concept
 */
class skos_Concept extends NamedNode {
		
	/**
	 * Relates a concept to a concept that is more general in meaning.
	 * The RDF path is <http://www.w3.org/2004/02/skos/core#broader>
	 * @returns {skos_Concept[]}
	 */
	get broader() {
		return this.values('<http://www.w3.org/2004/02/skos/core#broader>', skos_Concept);
	}
	
	/**
	 * @param {skos_Concept[]} values  the new values or null to remove all previous values
	 */
	set broader(values) {
		...
	}

	
	/**
	 * The preferred lexical label for a resource, in a given language.
	 * The RDF path is <http://www.w3.org/2004/02/skos/core#prefLabel>
	 * @returns {LiteralNode[]}
	 */
	get prefLabel() {
		return this.values('<http://www.w3.org/2004/02/skos/core#prefLabel>', LiteralNode);
	}
	
	/**
	 * @param {LiteralNode[]} values  the new values or null to remove all previous values
	 */
	set prefLabel(values) {
		...
	}

	... plus custom functions declared via dash:shapeScript
}
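
To make the getter and setter behavior more tangible, here is a minimal stand-alone sketch in which an array of triples plays the role of the data graph. The triples array, the ConceptSketch class and the ex: identifiers are hypothetical and only mimic the pattern; the real generated classes extend NamedNode and talk to the RDF database.

```javascript
// Hypothetical in-memory "data graph": an array of {s, p, o} triples.
const triples = [
    { s: "ex:Canada", p: "skos:broader", o: "ex:NorthAmerica" }
];

// Hypothetical stand-in for a generated class.
class ConceptSketch {
    constructor(uri) {
        this.uri = uri;
    }
    // Reading the property queries the store each time it is accessed.
    get broader() {
        return triples
            .filter(t => t.s === this.uri && t.p === "skos:broader")
            .map(t => new ConceptSketch(t.o));
    }
    // Assigning replaces all previous values; null removes them.
    set broader(values) {
        for (let i = triples.length - 1; i >= 0; i--) {
            if (triples[i].s === this.uri && triples[i].p === "skos:broader") {
                triples.splice(i, 1);
            }
        }
        for (const v of values || []) {
            triples.push({ s: this.uri, p: "skos:broader", o: v.uri });
        }
    }
}

const canada = new ConceptSketch("ex:Canada");
const before = canada.broader.map(c => c.uri);
canada.broader = [new ConceptSketch("ex:Americas")];
const after = canada.broader.map(c => c.uri);
```

In this sketch the array returned by the getter is a fresh snapshot, so pushing to it does not change the store until the array is assigned back to the property.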

The API generator will also produce one JavaScript object for each namespace prefix that contains either classes, node shapes, properties or datatypes. These prefix objects have automatically generated factory methods such as skos.asConcept(...) that can be used to conveniently create new instances. For example, use skos.asConcept(g.NS + 'Canada') to produce an instance of skos_Concept for the node with the given URI. A key feature of Active Data Shapes is polymorphism, which means that you can cast any named node into an instance of any other named node class. For example, you can use skos.asConcept(anyNode).broader to convert the given named node into an instance of skos_Concept and then fetch its broader concepts. The generated prefix objects also contain identifiers for any declared datatype, class, node shape or property from that namespace. For example, you can use xsd.string to access a NamedNode for the xsd:string datatype.

/**
 * Generated from the namespace <http://www.w3.org/2004/02/skos/core#>
 */
const skos = {
		
	/**
	 * Converts a value into an instance of skos_Concept
	 * @returns {skos_Concept}
	 */
	asConcept: (obj) => {
		return new skos_Concept(obj);
	},
	... same for each node shape in the skos namespace
		
	/**
	 * Creates a new instance of skos_Concept based on initial property values.
	 * @param {skos_Concept} props - name-value pairs for the initial properties
	 * @returns {skos_Concept}
	 */
	createConcept: (props) => {
		...
	},
	... same for each class in the skos namespace
	
	/**
	 * Gets all instances of the class skos:Concept in the data graph.
	 * @returns {skos_Concept[]} all instances including those of subclasses
	 */
	everyConcept: () => {
		...
	},
	... same for each class in the skos namespace
	
	get broader() { return new NamedNode("http://www.w3.org/2004/02/skos/core#broader") },
	... same for each property, node shape, class or datatype in the skos namespace

	NS: "http://www.w3.org/2004/02/skos/core#",
	PREFIX: "skos",
}

From TopBraid 7.1 onwards, you need to inform the system about the namespaces and/or classes that shall be generated. To do so, navigate to the "home" resource of your Ontology and switch the form to Script API. You will see the following properties:

In the case above, the ontology would state dash:generatePrefixClasses "skos" and dash:generatePrefixConstants "skos". The SKOS Shapes namespace that gets included into TopBraid EDG Taxonomies will automatically have those statements for the "skos" prefix already, but you will need those settings for your own custom ontologies, in particular when you are migrating from 7.0 to 7.1.

While not shown above, such prefix objects will also contain JavaScript functions that mirror any declared SHACL function, making it easy to call most SPARQL functions with a familiar JavaScript syntax. Note that from TopBraid 7.1 onwards, any function that shall be included in the API needs to have a value for the property dash:apiStatus, e.g. dash:Stable. Functions without any API status will no longer be included even if they were included in 7.0.

One big advantage of going through such domain-specific JavaScript APIs instead of a generic API is that you can benefit from the code completion and other IntelliSense features of your JavaScript editor, including the online editor bundled with TopBraid EDG. Using the API generated from the SHACL shapes, the editor will know in advance that the values of broader are again instances of skos_Concept and therefore can help you select the next operation on those values. Furthermore, if a shape declares that a certain property has sh:datatype xsd:string, then the API will directly return a native JavaScript string. Likewise, it will return native booleans for xsd:boolean and native numbers for any numeric datatype such as xsd:integer and xsd:decimal. This means you can write comparisons such as country.areaKM > 200000.
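
The effect of the datatype mapping can be pictured with the following sketch. The toNative function is a hypothetical helper written for this illustration and is not part of the ADS API; it only covers the datatypes named above and represents datatypes as prefixed-name strings for brevity.

```javascript
// Hypothetical helper that converts a lexical form into a native JavaScript value.
function toNative(lex, datatype) {
    switch (datatype) {
        case "xsd:string":
            return lex;
        case "xsd:boolean":
            // xsd:boolean allows the lexical forms true, false, 1 and 0
            return lex === "true" || lex === "1";
        case "xsd:integer":
        case "xsd:decimal":
            return Number(lex);
        default:
            // In this sketch, anything else stays a literal-like object
            return { lex, datatype };
    }
}

// Made-up area value used only to demonstrate the comparison
const areaKM = toNative("9984670", "xsd:integer");
const isLarge = areaKM > 200000;   // numeric comparison on a native number
```

JavaScript numbers are IEEE 754 doubles, so it may be worth keeping in mind that very large integers or high-precision decimals cannot always be represented exactly as native numbers.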

Another big advantage of using the generated APIs is that you can attach additional methods (functions and mutators) to the shape definitions from where they will be injected into the API classes. We will see a couple of examples of this in action soon.

Finally, the generated API can make deeper use of the SHACL shape definitions, for example to recognize that some property values should be sorted by their dash:index, meaning that arrays will maintain their order. Future versions may also perform basic constraint checks before a property value can be assigned.

Note that the JavaScript properties based on property shapes may use complex path expressions (including inverse paths) and even inferred values that are computed dynamically using sh:values rules. However, only non-inferred paths that consist of a simple predicate can be assigned using =; all others are read-only.

IMPORTANT: In TopBraid, any RDF statements that are relevant to ADS code generation must be represented in either Ontology asset collections or files in the workspace. Storing shape definitions or any other triples that impact the code generation in other asset collection types such as Taxonomies is not supported.

Getting Started

TopBraid EDG is a graph technology platform that includes, among many other features, comprehensive browser-based editors for Ontologies, Taxonomies and other knowledge graphs.

For this tutorial we use the Geography Taxonomy from the freely available TopBraid EDG Samples but you may also use any other SKOS-based taxonomy, assuming you have an EDG Ontology for the class definitions including skos:Concept.

To get started with Active Data Shapes scripts in TopBraid 6.4, make sure that an administrator has enabled Scripts in the Advanced section of the Server Configuration Parameters page. This step is not needed from version 7.0 onwards. Once enabled, you can open various new panels such as the Script Editor and Script Results panels shown below:

Please start with the Script Editor, which offers a full-blown JavaScript editor with syntax highlighting, auto-completion etc. (Not all potential features are implemented yet; for example, you cannot yet use CTRL+click to navigate around the ontology.) Use the Execute ("Play") button to run the script, which may in the beginning only consist of the word focusNode. When you run that "script" (as shown above) the Script Results panel will show the currently selected asset. More generally, the Script Editor can evaluate whatever JavaScript expression you have entered. If you have multiple lines of JavaScript, it will simply return the value of the last expression.

The variable focusNode may serve as your starting point to play with the automatically generated API, for example using auto-complete after the dot as shown:

As another starting point into the API, try running a SPARQL query using the graph object:

Such SPARQL queries produce an object with two arrays: vars contains the names of the variables, and bindings is an array of JavaScript objects for each result row.
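
Processing such a result object might look like the following sketch. The result object is written by hand in the shape just described; the variable names, the ex: values and the assumption that each binding is keyed by variable name are illustrative only, and plain strings stand in for the values that a real query would return.

```javascript
// Hand-written result object with the two arrays vars and bindings.
const result = {
    vars: ["country", "label"],
    bindings: [
        { country: "ex:Canada", label: "Canada" },
        { country: "ex:Mexico", label: "Mexico" }
    ]
};

// Turn each binding into a row array, ordered by the declared variables
const rows = result.bindings.map(b => result.vars.map(v => b[v]));

// Or collect the values of a single variable
const labels = result.bindings.map(b => b.label);
```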

Here is a similar query, using pure JavaScript and the generated API for the namespace prefix g.

Finally, here is an example of using JavaScript string templates to produce HTML:

There is a similar function graph.xml() to produce XML output.

Example JSON Web Service

You have seen from the previous toy examples that the Active Data Shapes framework can be used to produce even complex output with few lines of code. We will now put our experiments into action for real-world use cases.

This example produces a JSON object that is difficult or even impossible to create with query languages such as SPARQL or GraphQL, because it requires recursion to produce a tree structure of concepts. In JavaScript this is fairly easy to do, through a recursive function that gets attached to the skos:Concept class (which in this example is also a sh:NodeShape). In the RDF data model, the property dash:shapeScript links a node shape with an instance of dash:ShapeScript, which primarily consists of the source code string that is to be injected into the API class skos_Concept:

skos:Concept
  a rdfs:Class, sh:NodeShape ;
  ...
  dash:shapeScript skos:Concept-ShapeScript ;
.

skos:Concept-ShapeScript
  a dash:ShapeScript ;
  dash:js """

    /**
     * Recursively produces a JSON object with two fields suitable for rendering a SKOS concept tree.
     * @return {object}
     */
    hierarchyJSON() {
        return {
            label: this.toString(),
            children: this.narrower.map(child => child.hierarchyJSON())
        }
    }
""" ;
.

TopBraid hides this complexity and instead offers a convenient editor for the source code using the Shape Scripts panel:

You need to make such changes in the Ontology editor. For example, if you are using the Geography Taxonomy to follow along, switch to the TopBraid Examples Geography Ontology and navigate to the class Concept in the class hierarchy. Open the Shape Scripts panel from the Panels drop-down menu in the header. After you have entered the source code, don't forget to press Save Changes. Now you can move back to the instances graph (Taxonomy).

Once the generated API has been updated (press the Refresh the generated API button in the Script Editor panel), the new function hierarchyJSON is available at any instance of skos:Concept. For example, if the selected resource is the Concept g:NorthAmerica then the function can be tested as follows:

This new function can now be called as a JSON web service using TopBraid's script servlet. You can see an example of the correct syntax when you show the Network requests through a browser inspection tool.
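
The recursion used by hierarchyJSON can also be tried outside of TopBraid. The sketch below uses a hypothetical ConceptStandIn class with a plain narrower array in place of the generated skos_Concept class; the labels are sample data.

```javascript
// Hypothetical stand-in for skos_Concept: a label plus an array of narrower concepts.
class ConceptStandIn {
    constructor(label, narrower = []) {
        this.label = label;
        this.narrower = narrower;
    }
    toString() {
        return this.label;
    }
    // Same body as the shape script shown earlier
    hierarchyJSON() {
        return {
            label: this.toString(),
            children: this.narrower.map(child => child.hierarchyJSON())
        };
    }
}

const northAmerica = new ConceptStandIn("North America", [
    new ConceptStandIn("Canada"),
    new ConceptStandIn("United States", [new ConceptStandIn("Alaska")])
]);

// Nested object with label and children at every level
const tree = northAmerica.hierarchyJSON();
const json = JSON.stringify(tree);
```

Note that a recursion of this kind does not terminate if the narrower relationships contain a cycle, so data with cycles would need an extra guard such as a set of already visited concepts.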

Example Modify Service and Action

In this example we will create a script that inserts a new RDF statement about a concept. In particular, it will be a script that applies to instances of the class skos:Concept, takes the label with language tag "en" and duplicates it with some other sub-language tag such as "en-UK" or "en-US". This time we are not content with programmatic access alone but also want to expose the feature to end users through a clickable user interface!

We perform this exercise in three simple steps:

  1. Develop the actual implementation against sample instances, using the Script Editor
  2. Copy the implementation into a proper JavaScript function attached to skos:Concept
  3. Declare a Resource Action that will insert the function into a context menu for the selected instance

Step 1: Develop Script using Sample Instances

The scripting framework makes it easy for users to enter JavaScript expressions against sample data. We select a sample instance of skos:Concept, such as g:Canada shown in the screenshot below and use the Script Editor to explore the property values that we want to modify.

In the above expression, the variable focusNode is the currently selected instance of skos_Concept. The class skos_Concept is automatically generated from the shape definitions, and has a property prefLabel to access the values of skos:prefLabel from the data graph. As this is a multi-valued property, and the values may either be strings or strings with a language tag (rdf:langString), the property prefLabel has type LiteralNode[]. The JavaScript Array class has useful functions to query its members. One of them is find which takes a boolean function as its argument and returns the first element where that function returns true. Above we use focusNode.prefLabel.find(label => label.lang == 'en') to return a LiteralNode where the lang attribute is "en". Executing that little script returns the equivalent of the RDF node "Canada"@en, which is a value of skos:prefLabel at g:Canada.

We can now incrementally build up the code to create a new LiteralNode that has the same string value (that is, the same lexical form lex) but a different sub-tag of English, such as "en-UK". Then we add this new literal to the existing prefLabels array and assign that array back to focusNode.prefLabel as follows:

let prefLabels = focusNode.prefLabel;
let enLabel = prefLabels.find(label => label.lang == 'en');
let newLabel = graph.langString(enLabel.lex, 'en-UK');
prefLabels.push(newLabel);
focusNode.prefLabel = prefLabels;

Note: if you receive the error "All properties are read-only in this mode.", make sure you have unlocked the graph using the padlock setting in the upper right corner of the Script Editor panel. The editor is in read-only mode by default to prevent accidental edits to the data, because scripts are powerful and should be handled with care! Even in read-write mode, we suggest you try the Preview button prior to actually making assertions. Undo will also work in case something went wrong.
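
The same steps can be rehearsed outside of TopBraid. The sketch below replaces graph.langString and focusNode with hypothetical plain-object stand-ins, and it adds a guard for the case where no "en" label exists, which the snippet above does not handle (find would return undefined and accessing lex would fail).

```javascript
// Hypothetical stand-in for graph.langString: a plain object with lex and lang.
function langStringStandIn(lex, lang) {
    return { lex, lang };
}

// Hypothetical focus node; in this sketch prefLabel is a plain array property.
const focusNodeStandIn = {
    prefLabel: [
        langStringStandIn("Canada", "en"),
        langStringStandIn("Kanada", "de")
    ]
};

// Copies the "en" label to the given language tag; returns false if there is none.
function deriveLabelSketch(node, langTag) {
    const prefLabels = node.prefLabel;
    const enLabel = prefLabels.find(label => label.lang == 'en');
    if (!enLabel) {
        // Array.find returns undefined when no element matches
        return false;
    }
    prefLabels.push(langStringStandIn(enLabel.lex, langTag));
    node.prefLabel = prefLabels;
    return true;
}

const derived = deriveLabelSketch(focusNodeStandIn, "en-UK");
const skipped = deriveLabelSketch({ prefLabel: [] }, "en-UK");
```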

Step 2: Attach Script to its Node Shape or Class

Once we are satisfied with the script, we move to the shape definitions (in the Ontology in TopBraid) to make the script available to all instances of skos:Concept. In this case, we generalize the script's logic to also take a parameter subLang, with values such as "UK". We also need to make sure that we replace focusNode from our experiments with this:

    /**
     * Adds a new preferred label which uses the "en" label as base but
     * uses a more specialized language tag such as "en-UK".
     * @param {string} subLang - the sub-language such as "UK"
     */
    deriveLabel(subLang) {
        let prefLabels = this.prefLabel;
        let enLabel = prefLabels.find(label => label.lang == 'en');
        let newLabel = graph.langString(enLabel.lex, 'en-' + subLang);
        prefLabels.push(newLabel);
        this.prefLabel = prefLabels;
    }

In TopBraid's Ontology editor, select skos:Concept, paste this code into the Shape Scripts panel and save the changes.

Back in the Taxonomy with the sample instances, we can now invoke this new deriveLabel function as it has become part of our auto-generated API derived from the shape definitions. Make sure the API is up-to-date using the Refresh button in the Script Editor panel. You can also inspect the full API on the Script API Viewer panel:

We can now use this new function as a reusable Lego brick in other scripts, for example for batch processes:

Step 3: Declare a Resource Action for End Users

Not everybody is a programmer, so we now want to make sure that our friends from the domain expert department can also use this script. The Active Data Shapes framework introduces a small vocabulary for representing so-called Resource Actions, which can be either Explore Actions or Modify Actions. As the name suggests, Modify actions can make changes to the data graph while Explore actions will execute with the data graph in read-only mode. TopBraid EDG will insert any suitable resource action into the Explore or Modify menus of a selected resource, assuming that the resource action has been assigned a dash:actionGroup and is not sh:deactivated true. Here is the final source code for our addition to the Modify menu:

skos:DeriveLabelAction
  a dash:ModifyAction ;
  dash:actionGroup skos:ConceptActions ;
  dash:actionIconClass "fas fa-language" ;
  dash:js "focusNode.deriveLabel(subLang)" ;
  rdfs:comment "Derives a new label from the \"en\" label." ;
  rdfs:label "Derive label..." ;
  sh:parameter [
      a sh:Parameter ;
      sh:path <urn:param:subLang> ;
      sh:datatype xsd:string ;
      sh:description "The sub-language such as \"UK\"" ;
      sh:name "sub language" ;
    ] ;
.

To produce this with the Ontology Editor, find the resource actions section on the Form of your class and use Create Resource Action... from the little drop-down menu:

This will open the following dialog, which allows you to select the type of action, label and internal URI of the action:

As the next step, fill in a comment, select or create an action group, select an optional icon class for the menu (see Font Awesome for some suggestions) and declare the parameter and JavaScript code for the actual work:

This dialog can be used to declare the parameter:

Once all this has been completed, the users of the Taxonomy should find the new item in the Modify menu of any Concept. (If you don't see it, refresh the browser as those menu items are only loaded when the page starts.)

When selected, the Derive label... feature will open a dialog allowing the user to fill in the required parameter(s) and possibly preview the change before applying it.

If you want to restrict this menu item to concepts that actually have a preferred label with language "en", you can annotate the dash:ModifyAction with a pre-condition using dash:jsCondition. This pre-condition must be an expression that uses the variable focusNode and returns true if and only if the action can currently be applied to that node. In this particular case, use:

focusNode.prefLabel.some(label => label.lang == 'en')
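
The effect of such a pre-condition can be pictured as a filter over candidate nodes. In the following sketch the candidates array and its plain-object labels are hypothetical stand-ins; only the condition expression itself is taken from above.

```javascript
// Hypothetical stand-in nodes with plain-object labels
const candidates = [
    { uri: "ex:Canada", prefLabel: [{ lex: "Canada", lang: "en" }] },
    { uri: "ex:Oesterreich", prefLabel: [{ lex: "Österreich", lang: "de" }] },
    { uri: "ex:Unlabeled", prefLabel: [] }
];

// Same expression as the pre-condition, wrapped in a function of focusNode
const condition = focusNode => focusNode.prefLabel.some(label => label.lang == 'en');

// Only nodes that pass the condition would be offered the action
const applicable = candidates.filter(condition).map(n => n.uri);
```

Array.some is a good fit here because it always returns a boolean, including false for an empty array, whereas Array.find would return either a label or undefined.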

Using the same technique you can now produce any number of custom features in declarative form and with relatively little imperative code. As this executable code is stored together with the shape definitions, anyone using your ontology will benefit from the same extensions. This means you can now, among other things, produce a library of utility functions for common tasks that are custom-tailored for your domain data graph.

In TopBraid versions before 7.1, the Explore and Modify menus only showed actions that are defined in an included subgraph of the asset collection. From 7.1 onwards, you may also declare such actions in files that are not included, for example to define operations that can be applied to all asset collections of a given type. To do so, define your actions in a separate file and in any .ui.ttlx file use the property teamwork:scriptGraph to link from the applicable instances of teamwork:ProjectType to the graph URIs of that file. For Explore actions, the system will then look into all such graphs linked to any asset collection included into the currently focused asset collection. For Modify actions, only the directly associated graphs will be used; for example, this prevents people from making modifications to an Ontology from a Taxonomy.

Example Inferred Property Values

The SHACL Advanced Features include a mechanism to dynamically compute (infer) values of certain properties, based on the sh:values property. Active Data Shapes technology introduces a new kind of SHACL Node Expression based on scripts that makes it possible to dynamically compute property values from JavaScript expressions. These node expressions are blank nodes that have the script as value of dash:js. In the script, you can use the variable focusNode to reference the current context node.

In the following example, we introduce a new derived property skos:narrowerConceptCount which is an integer computed as the number of narrower concepts.

skos:Concept-narrowerConceptCount
  a sh:PropertyShape ;
  sh:path skos:narrowerConceptCount ;
  sh:datatype xsd:integer ;
  sh:description "The number of narrower concepts of this." ;
  sh:group skos:HierarchicalRelationships ;
  sh:maxCount 1 ;
  sh:name "narrower concept count" ;
  sh:order "10"^^xsd:decimal ;
  sh:values [
      dash:js "focusNode.narrower.length" ;
    ] .

skos:Concept sh:property skos:Concept-narrowerConceptCount .

With the new property defined as above, forms of SKOS concepts will now show the count:

The dash:js expression may also return an array, in which case multiple values would be inferred.

Make sure to use this (powerful) feature with care, as it might have a performance impact if the system needs to repeatedly fire up a Script Runtime for each individual value on a form. Performance should, however, be good as long as you call your inference as part of some other script. In the above case, you could now query the new property like any other property of the instance:

As usual, inferred properties cannot be directly modified, i.e. you cannot assign them. Change the underlying values (here: skos:broader) instead.

Pro tip: if you want the Form panel and other parts of the UI to update automatically after changes to the asserted properties, annotate your node expression with dash:dependencyPredicate:

skos:Concept-narrowerConceptCount
  ...
  sh:values [
      dash:js "focusNode.narrower.length" ;
      dash:dependencyPredicate skos:broader ;
    ] .

The above will mean that the counter will automatically update whenever some skos:broader triple has been changed (we are using skos:broader as the sh:inversePath in the computation of the narrower property).
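
Why skos:broader is the right dependency can be seen in a small stand-alone sketch. The broaderTriples array and the two helper functions are hypothetical stand-ins: only skos:broader statements are stored, and the narrower values are derived by following them in the inverse direction.

```javascript
// Hypothetical in-memory store; each entry stands for "s skos:broader o".
const broaderTriples = [
    { s: "ex:Canada", o: "ex:NorthAmerica" },
    { s: "ex:Mexico", o: "ex:NorthAmerica" }
];

// narrower is derived by following skos:broader in the inverse direction
function narrowerOf(uri) {
    return broaderTriples.filter(t => t.o === uri).map(t => t.s);
}

// Counterpart of the dash:js expression "focusNode.narrower.length"
function narrowerConceptCount(uri) {
    return narrowerOf(uri).length;
}

const countBefore = narrowerConceptCount("ex:NorthAmerica");
// Asserting another skos:broader statement changes the derived count
broaderTriples.push({ s: "ex:Greenland", o: "ex:NorthAmerica" });
const countAfter = narrowerConceptCount("ex:NorthAmerica");
```

In this sketch the count changes from 2 to 3 although nothing was written to the count itself, which is the situation that the dependency declaration lets the user interface react to.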

Current limitations: these script-based inferred values only work within graphs that are under teamwork control, i.e. you can only query them within EDG asset collections, not from other (file-based) graphs in your installation. Furthermore, make sure that the focus node only has a single type, so that the system is able to pick the right JavaScript class for the focusNode variable.