This document explains how Active Data Shapes can be used to import data from files that have been downloaded through HTTP requests or uploaded to the system. Examples include importing arbitrary JSON, spreadsheet and XML files. The document also explains how to query and update relational (SQL) databases. The document defines and illustrates the provided (JavaScript) APIs.

See also this Video on Importing Spreadsheet Data into TopBraid EDG using Active Data Shapes.

Scope of This Document

This document is part of the Active Data Shapes framework. We assume that you are familiar with the Active DASH Tutorial.

Getting Started

It is very common for data graphs to be populated from external files - Spreadsheets, XML, JSON etc. Users may need to upload files or files may originate from web services and are downloaded for processing. The content of these non-RDF files shall be transformed into RDF triples, typically in a given ontology. This document introduces how Active Data Shapes and SHACL can be used to facilitate such import processes.

The easiest way to get started with this technology is by trying it out, here with the Script Editor panel of TopBraid EDG 6.4. This panel has a button to upload any file to the server, so that the file can be processed by a script:

You can incrementally develop your importer script by trying it out, especially using the Preview button:

Once you are happy with your script, you can turn it into a Resource Action that will for example appear in the Modify menu of the 'Home' resource of your asset collection.

Working with Uploaded Files

Files that have been uploaded will go into a temporary folder on the server from which they will be removed once they are no longer needed (in TopBraid, they are discarded if they haven't been accessed for more than an hour). Scripts can access those files using a provided file identifier, which in Script Panel from the example above is automatically called file. Resource actions may declare multiple files to upload and restrict what mime type they can have. The built-in function graph.uploadedFile(file) can then be used to return an instance of the JavaScript class UploadedFile which provides the following functions:

class UploadedFile {
	
	/**
	 * Assuming this file represents a spreadsheet, cast it into a Spreadsheet instance.
	 * This is only supported for files with suitable mime types.
	 * @returns {UploadedSpreadsheet}
	 */
	asSpreadsheet() {
		...
	}
	
	/**
	 * Assuming this file is a valid XML document, cast it into an XMLNode.
	 * @returns {XMLNode}
	 */
	asXML() {
		...
	}
	
	/**
	 * Gets the (client) name under which this file was uploaded to the server.
	 * @return {string}
	 */
	get name() {
		...
	}
	
	/**
	 * Assuming this is a text file (e.g. JSON, XML, HTML, CSV), this gets the content of the file as a string.
	 */
	get text() {
		...
	}
	
	/**
	 * Gets the mime type of the file as a string.
	 */
	get type() {
		...
	}
}

For example, use graph.uploadedFile(file).text to get the full content from the file as a string. Your script can then proceed with whatever it like wants to do with that file content, for example produce new instances for each row of a spreadsheet. Details will be explained in the following sections.

Downloading Files via HTTP

With TopBraid EDG 7.0 onwards, files can also be downloaded through HTTP requests. The entry point is the function IO.http() which performs arbitrary HTTP requests and either returns the response data to the program or saves the file as an 'uploaded' file so that it can be processed further.

The following example requests a tab-separated values spreadsheet from a server (here, it's actually using a TopBraid SPARQL endpoint but that doesn't matter). The resulting spreadsheet is saved to a temp file which can be processed further by the script, for example to count the number of rows.


let r = IO.http({
    url: 'http://localhost:4500/tbl/sparql',
    method: 'POST',
    params: {
        query: `
            SELECT *
            WHERE {
                GRAPH <http://topbraid.org/examples/kennedys> {
                    ?person a <http://topbraid.org/examples/kennedys#Person> .
                    ?person <http://topbraid.org/examples/kennedys#birthYear> ?birthYear .
                }
            } ORDER BY ?birthYear`
    },
    headers: {
        Accept: 'text/tab-separated-values'
    },
    toFileSuffix: 'tsv'
})
IO.uploadedFile(r.file).asSpreadsheet().rows().length

As usual, you can explore the features of this API either in the script editor (via auto-complete etc) or on the API documentation pages.

Importing JSON Files

Handling JSON files from JavaScript is very easy. You simply need to fetch the text content of the uploaded file and parse it, then query the JSON like any other JavaScript object:

let u = graph.uploadedFile(file);
let json = JSON.parse(u.text);
schema.createPerson({
	uri: 'http://example.org/Person#' + encodeURIComponent(json.firstName + '_' + json.lastName),
	givenName: json.firstName,
	familyName: json.lastName,
});

Importing Spreadsheets

Spreadsheet files may be uploaded as Tab-separated values (.tsv), Comma-separated values (.csv) or Excel files (.xlsx). Two distinct APIs are provided to access the rows of those files. One with a cell-based API that loads the whole spreadsheet in bulk, and an alternative streaming API for large files.

Bulk Loader for Spreadsheets

A unified API is provided to access the sheets, rows and columns of such spreadsheets. To get started, use .asSpreadsheet() on the UploadedFile. This produces an object with the following signature:

class UploadedSpreadsheet {
		
	/**
	 * Gets the value in a given row and column, as a string, number, boolean or LiteralNode.
	 * @param {number} rowIndex  the index of the row, starting at 0
	 * @param {number} [columnIndex]  the index of the column, starting at 0, defaulting to 0
	 * @param {number} [sheetIndex]  the index of the sheet, starting at 0, defaulting to 0
	 */
	cell(rowIndex, columnIndex, sheetIndex) {
		...
	}
	
	/**
	 * Gets the names of the columns on a given sheet.
	 * @param {number} [sheetIndex]  the index of the sheet, starting at 0, defaulting to 0
	 * @returns {string[]}
	 */
	columnNames(sheetIndex) {
		...
	}
	
	/**
	 * Gets the number of rows on a given sheet.
	 * @param {number} [sheetIndex]  the index of the sheet, starting at 0, defaulting to 0
	 * @returns {number}
	 */
	rowCount(sheetIndex) {
		...
	}
	
	/**
	 * Gets a given row as a JavaScript object, with the column names as object keys
	 * and strings, numbers, booleans or LiteralNodes as values.
	 * @param {number} rowIndex  the index of the row, starting at 0
	 * @param {number} [sheetIndex]  the index of the sheet, starting at 0, defaulting to 0
	 * @returns {object}
	 */
	row(rowIndex, sheetIndex) {
		...
	}
		
	/**
	 * Gets all rows as an array of JavaScript objects, as in the row function.
	 * @param {number} [sheetIndex]  the index of the sheet, starting at 0, defaulting to 0
	 * @returns {object[]}
	 */
	rows(sheetIndex) {
		...
	}

	/**
	 * Gets the names of the sheets (at least one) as a string array.
	 * @returns {string[]}
	 */
	sheetNames() {
		...
	}
}

By default, all cell values will be strings. However, for Excel files the system will produce JavaScript booleans (true and false) if the cell values are recognized booleans, and numbers will be converted to JavaScript numbers automatically. Date cells will be converted to instances of LiteralNode, i.e. you can use .lex to retrieve the lexical form of the date in XSD format.

A small example of all this was already shown in the Getting Started section. The spreadsheet used there was a tab-separated file such as:

City Maori Name Population LatLong
Tauranga 135000 -37.683333 176.166667
Dunedin Ōtepoti 104500 -45.866667 170.5

The example script to convert those rows into instances of g:City is:

let s = graph.uploadedFile(file).asSpreadsheet();
for(let i = 0; i < s.rowCount(); i++) {
    let row = s.row(i);
    let city = g.createCity({
        uri: 'http://topquadrant.com/ns/examples/geography#' + encodeURIComponent(row.City),
        prefLabel: row.City,
        broader: focusNode,
        population: row.Population,
        lat: parseFloat(row.LatLong.split(' ')[0]),
        long: parseFloat(row.LatLong.split(' ')[1]),
    })
    let maoriName = row['Maori Name'];
    if(maoriName) {
        city.altLabel = skosxl.createLabel({
            literalForm: graph.langString(maoriName, 'mi')
        })
    }
}

Let's walk through a couple of techniques used here.

The first line, let s = graph.uploadedFile(file).asSpreadsheet(); fetches an UploadedSpreadsheet instance that we can then query to walk through the columns and rows.

The for loop walks each row individually. In each loop, the line let row = s.row(i) delivers a JavaScript object with one value for each column. For example, if the column name is City then you can query the current value using row.City. For names that are not valid JavaScript identifiers, use the array-based syntax such as row['Maori Name'].

An alternative technique (not shown here) to fetch values is through s.cell(rowIndex, colIndex).

Empty cells are not present so you would get null or undefined when you query them. This means that in some cases you may want to insert guarding if clauses such as if(maoriName) above.

The expression row.LatLong.split(' ') turns the combined Lat/Long string into an array of two strings, separated by spaces. You can use any number of pre-processing steps before assigning values here - the whole power of JavaScript string processing is at your disposal.

To assign values with the correct datatype, the SHACL shape definitions help you. In the example above, the property g.population has sh:datatype xsd:integer. Then although the value of row.Population is a JavaScript string, the system will automatically cast it into an xsd:integer literal as part of the assignment. In the case of lat and long however, no single sh:datatype constraint exists as the values may be either xsd:integer, xsd:decimal or xsd:double. It is therefore necessary to use the built-in parseFloat function to produce a JavaScript number. If you ever need to produce literals of specific datatypes, use something like graph.literal({ lex: '42', datatype: xsd.double }).

The line broader: focusNode assigns the currently selected resource (such as the g:Country of New Zealand) to the skos:broader property of the newly created city instance.

Streaming Loader for Spreadsheets

Very large spreadsheets are best imported with a streaming loader, row by row, minimizing memory consumption. This is an alternative API that is provided for files ending with .csv and .tsv only. The following example illustrates how to use it with a for...of loop in JavaScript.

let s = graph.uploadedFile(file).asSpreadsheetIterator();
for(let row of s) {
    let city = g.createCity({
        uri: 'http://topquadrant.com/ns/examples/geography#' + encodeURIComponent(row.City),
        prefLabel: row.City,
        broader: focusNode,
        population: row.Population,
        lat: parseFloat(row.LatLong.split(' ')[0]),
        long: parseFloat(row.LatLong.split(' ')[1]),
    })
    let maoriName = row['Maori Name'];
    if(maoriName) {
        city.altLabel = skosxl.createLabel({
            literalForm: graph.langString(maoriName, 'mi')
        })
    }
}

The result of asSpreadsheetIterator() can also be used as a iterator directly, i.e. calling next() will return an object with the fields done and value. Each value (row) is a JavaScript object with the column names as field names. All cell values are strings and may need to be converted downstream in the JavaScript code. The field s.columnNames can be used to query the names of the columns.

Importing XML Files

For importing XML, we provide a convenient read-only DOM-like API, based on the following class XMLNode. You can fetch an instance of XMLNode using graph.uploadedFile(file).asXML().


/**
 * Represents an XML DOM Node as delivered by UploadedFile.asXML().
 * In addition to the declared fields and methods, nodes that represent elements have all attributes
 * of the XML node as direct properties.
 * If attribute names clash with declared fields or methods then an underscore is appended.
 */
class XMLNode {
	
	/**
	 * Gets the child nodes as instances of XMLNode.
	 * @returns {XMLNode[]}
	 */
	get childNodes() {
		...
	}
	
	/**
	 * Checks if this node represents a CDATA section.
	 * @returns {boolean}
	 */
	isCDATASection() {
		...
	}
	
	/**
	 * Checks if this node represents a Comment.
	 * @returns {boolean}
	 */
	isComment() {
		...
	}
	
	/**
	 * Checks if this node represents an element.
	 * @returns {boolean}
	 */
	isElement() {
		...
	}
	
	/**
	 * Checks if this node represents a text node.
	 * @returns {boolean}
	 */
	isText() {
		...
	}
	
	/**
	 * Returns the local name.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getLocalName--
	 * @returns {string}
	 */
	get localName() {
		...
	}
	
	/**
	 * Returns the namespace URI.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getNamespaceURI--
	 * @returns {string}
	 */
	get namespaceURI() {
		...
	}
	
	/**
	 * Gets the node name, e.g. the tag name for element nodes.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getNodeName--
	 * @returns {string}
	 */
	get nodeName() {
		...
	}
	
	/**
	 * Gets the node value, e.g. the content of a text node.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getNodeValue--
	 * @returns {string}
	 */
	get nodeValue() {
		...
	}
	
	/**
	 * Returns the prefix.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getPrefix--
	 * @returns {string}
	 */
	get prefix() {
		...
	}
	
	/**
	 * Gets the text context of this node and its descendants.
	 * See https://docs.oracle.com/javase/8/docs/api/org/w3c/dom/Node.html#getTextContent--
	 * @returns {string}
	 */
	get text() {
		...
	}
	
	/**
	 * Gets the first result of an XPath expression starting with this node, or null.
	 * Attribute values are returned as plain strings.
	 * @param {string} xpath  the XPath expression
	 * @returns {XMLNode|string}
	 */
	xpathNode(xpath) {
		...
	}
	
	/**
	 * Gets all results of an XPath expression starting with this node.
	 * Attribute values are returned as plain strings.
	 * @param {string} xpath  the XPath expression
	 * @returns {XMLNode|string[]}
	 */
	xpathNodes(xpath) {
		...
	}	
}

Let's assume we have this XML file:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>44.95</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
   </book>
   <book id="bk102">
      <author>Ralls, Kim</author>
      <title>Midnight Rain</title>
      <genre>Fantasy</genre>
      <price>5.95</price>
      <publish_date>2000-12-16</publish_date>
      <description>A former architect battles corporate zombies, an evil sorceress, and her own childhood to become queen of the world.</description>
   </book>
</catalog>

Here are some example queries and their results.

let xml = graph.uploadedFile(file).asXML();

xml.nodeName; // "catalog"

xml.isElement(); // true

xml.childNodes.length; // 2

xml.childNodes[0].id; // "bk101"

xml.childNodes[1].childNodes[0].text; // "Ralls, Kim"

xml.xpathNode('book[2]/@id'); // "bk102"

xml.xpathNodes('book').map(node => node.id); // ["bk101","bk102"]

Resource Actions

Resource Actions allow you to define plugins for the user interface, especially to add items to Explore and Modify drop down menus. The general technique of creating Resource Actions has been covered by the Tutorial. Here we cover the topic of resource actions that allow users to upload and process files.

You can attach such resource actions to any class where it makes sense. For example if your importer is aimed at defining instances of g:City which then become narrower concepts of a parent g:Country then you may want to attach your resource action to the Country class. The variable focusNode can in that case be used to access the currently selected g:Country. If you don't really need a focus node, you could decide to attach the resource action to the class owl:Ontology, which would make it show up for the Home resource of your asset collection.

Resource actions that perform a file import should be instances of dash:ModifyAction, because they require write access to the data. Other actions that accept an uploaded file, for example just to produce a report or HTML page, may be dash:ExploreActions.

In order to enable file uploads, use the property dash:mimeTypes at one or more sh:Parameter declarations. The value of that property must be a single comma-separated string of well-known mime types. The following table summarizes the most common file types:

File Type Mime Types string
JSON Files application/json
Spreadsheets .csv,.tsv,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
XML Files application/xml

The wizard to create parameters has a built-in convenience for these types:

A complete example is shown below:

g:Country
	...
	dash:resourceAction g:ImportCitiesAction .			

g:ImportCitiesAction
	a dash:ModifyAction ;
  	rdfs:label "Import cities..." ;
	dash:actionGroup g:ActionGroup ;
	sh:parameter [
		a sh:Parameter ;
		sh:path g:file ;
		dash:mimeTypes ".csv,.tsv" ;
		sh:datatype xsd:string ;
		sh:description "The spreadsheet to upload." ;
		sh:name "file" ;
	] ;
	dash:js """
let s = graph.uploadedFile(file).asSpreadsheet();
for(let i = 0; i < s.rowCount(); i++) {
    let row = s.row(i);
    let city = g.createCity({
        uri: 'http://topquadrant.com/ns/examples/geography#' + encodeURIComponent(row.City),
        prefLabel: row.City,
        broader: focusNode,
        population: row.Population,
        lat: parseFloat(row.LatLong.split(' ')[0]),
        long: parseFloat(row.LatLong.split(' ')[1]),
    })
    let maoriName = row['Maori Name'];
    if(maoriName) {
        city.altLabel = skosxl.createLabel({
            literalForm: graph.langString(maoriName, 'mi')
        })
    }
}""" .

The action will show up in the Modify menu of a selected g:Country and would open a dialog as shown:

Working with Relational Databases via SQL

The built-in object SQL can be used to connect to relational databases through SQL. This may be used to implement batch importers or update scripts. Two functions are provided right now: SQL.query and SQL.update.

The following example walks through all rows of the table called 'country' in the database called 'world' and creates one instance of the RDFS class g:Country for each, based on the values of the Code and Name columns.

let conn = {
    url: 'jdbc:mysql://localhost:3306/world', 
    user: 'newuser',
    password: '...'
}
let sql = `
    SELECT Code, Name
    FROM country;
`
let results = SQL.query(conn, sql);
results.forEach((row) => {
    g.createCountry({
        isoCountryCode3: row.Code,
        prefLabel: row.Name
    })
})

The following example shows the syntax of SQL update commands. As usual, you may want to use JavaScript string substitution to produce a suitable SQL string from your data.

let conn = {
    url: 'jdbc:mysql://localhost:3306/world', 
    user: 'newuser',
    password: '...'
}
let sql = `
	DROP TABLE country;
`
SQL.update(conn, sql);

Web Services

The API of import scripts can also be used from ADS Web Services. This is the preferred way of performing large-scale operations such as bulk imports, as going through the user interface and resource actions will carry additional overhead to update the UI and record the change history.