Skip to content

Commit

Permalink
Added page for the integrate command
Browse files Browse the repository at this point in the history
  • Loading branch information
Aklakan committed Sep 13, 2024
1 parent 5ed9a29 commit 7ffec78
Showing 1 changed file with 106 additions and 0 deletions.
106 changes: 106 additions & 0 deletions docs/integrate/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
---
title: Integrate
has_children: true
nav_order: 30
layout: default
---



# Command: `integrate`

`rpt integrate` is the command for mixed RDF data and SPARQL statement processing. The name stems from the various SPARQL extensions that make it possible to reference and process non-RDF data inside a SPARQL statements.



## Basic Usage

### Example 1: Simple Processing

`rpt integrate file1.ttl 'INSERT DATA { eg:s eg:p eg:o }' spo.rq`

The command above does the following:

* It loads `file1.ttl` (into the default graph)
* It runs the given SPARQL update statement which adds a triple. For convenience, RPT includes a static copy of prefixes from [prefix.cc](https://prefix.cc). The prefix `eg` is defined as `http://www.example.org/`.
* It executes the query in the "file" `spo.rq` which is `CONSTRUCT WHERE { ?s ?p ?o }` and prints out the result. To be precise `spo.rq` is provided file in the JAR bundle (a class path resource). RPT ships with several predefined queries for common use cases.



#### Notes

* If you want RPT to print out the result of a query then you need to provide a query! If you omit `spo.rq` in the example above, rpt will only run the loading and the update.
* As alternatives for `spo.rq`, you can use `gspo.rq` to print out all quads and `spogspo.rq` to print out the union of triples and quads.
* The file extension `.rq` stands for `RDF query`. Likewise `.ru` stands for `RDF update`.



### Example 2: Starting a server

`rpt integrate --server`

This command starts a SPARQL server, by default on `port 8642`. Use e.g. `--port 8003` to pick a different one. You can mix this with the arguments from the first example.



* SPARQL endpoint and Yasgui frontend: http://localhost:8642/sparql
* GraphQL endpoint: http://localhost:8642/graphql
* Snorql frontend: http://localhost:8642/snorql
* Resource Viewer: <a href="http://localhost:8642/view/?*?http://www.wikidata.org/entity/Q1000094">http://localhost:8642/view/?*?http://www.wikidata.org/entity/Q1000094</a>



### Example 3: Using a different RDF Database Engine

RPT can run the RDF loading and SPARQL query execution on different (embedded) engines.

`rpt integrate --db-engine tdb2 --db-loc --db-keep mystores/mydata file.ttl spo.rq`

`rpt integrate -e tdb2 --loc --db-keep mystores/mydata file.ttl spo.rq`

By default, `rpt integrate` uses the in-memory engine. The `--engine` (short `-e`) option allows choosing a different RDF engine. For engines that require a file or a database folder, the location can be uniformly specified with `--db-loc` (short `--loc`). By default, **RPT will by default delete data it created itself but it will never delete existing data**. The flag `--db-keep` instructs RPT to keep database it created after termination.



### Example 4: SPARQL Proxy

You can quickly launch a SPARQL proxy with the combination of `-e` and `--server`:

`rpt integrate -e remote --loc https://dbpedia.org/sparql --server`

The proxy gives you a Yasgui frontend and the Linked Data Viewer.

Endpoints protected by basic authentication can be proxied by supplying the credentials with the URL:

`rpt integrate -e remote --loc https://USER:[email protected]/sparql --server`

Note, that this is **unsafe** and should be avoided in production, but it can be useful during development.



## Embedded SPARQL Engines

Embedded SPARQL engines are built into RPT and thus readily available. The following engines are currently available:

<table>
<tr><th>Engine</th><th>Description</tr>
<tr><td><b>mem</b></td><td>The default in-memory engine based on Apache Jena. Data is discarded once the RPT process terminates.</td></tr>
<tr><td><b>tdb2</b></td><td>Apache Jena's TDB2 persisent engine. Use <i>--loc</i> to specfify the database folder.</td></tr>
<tr><td><b>binsearch</b></td><td>Binary search engine that operates directly on sorted N-Triples files. Use <i>--loc</i> to specify the file path or HTTP(s) URL to the N-Triples file. For URLs, HTTP range requests must be supported!</td></tr>
<tr><td><b>remote</b></td><td>A pseudo engine that forwards all processing to the SPARQL endpoint whole URL is specified in <i>--loc</i>.</td></tr>
</table>



### (ARQ) Engine Configuration

The engines `mem`, `tdb2` and `binsearch` build an Jena's query engine `ARQ` and thus respect its configuration.



`rpt integrate --set 'arq:queryTimeout=60000' myQuery.rq`

###



0 comments on commit 7ffec78

Please sign in to comment.