From 7ffec781e7e14fa8d3795511ebadcc0de60e319d Mon Sep 17 00:00:00 2001
From: Claus Stadler
Date: Fri, 13 Sep 2024 13:01:36 +0200
Subject: [PATCH] Added page for the integrate command

---
 docs/integrate/index.md | 106 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 docs/integrate/index.md

diff --git a/docs/integrate/index.md b/docs/integrate/index.md
new file mode 100644
index 0000000..5f13ae3
--- /dev/null
+++ b/docs/integrate/index.md
@@ -0,0 +1,106 @@
+---
+title: Integrate
+has_children: true
+nav_order: 30
+layout: default
+---
+
+# Command: `integrate`
+
+`rpt integrate` is the command for mixed RDF data and SPARQL statement processing. The name stems from the various SPARQL extensions that make it possible to reference and process non-RDF data inside SPARQL statements.
+
+## Basic Usage
+
+### Example 1: Simple Processing
+
+`rpt integrate file1.ttl 'INSERT DATA { eg:s eg:p eg:o }' spo.rq`
+
+The command above does the following:
+
+* It loads `file1.ttl` (into the default graph).
+* It runs the given SPARQL update statement, which adds a triple. For convenience, RPT includes a static copy of prefixes from [prefix.cc](https://prefix.cc). The prefix `eg` is defined as `http://www.example.org/`.
+* It executes the query in the "file" `spo.rq`, which is `CONSTRUCT WHERE { ?s ?p ?o }`, and prints out the result. To be precise, `spo.rq` is a file provided in the JAR bundle (a class path resource). RPT ships with several predefined queries for common use cases.
+
+#### Notes
+
+* If you want RPT to print out the result of a query, then you need to provide a query! If you omit `spo.rq` in the example above, RPT will only run the loading and the update.
+* As alternatives to `spo.rq`, you can use `gspo.rq` to print out all quads and `spogspo.rq` to print out the union of triples and quads.
+* The file extension `.rq` stands for `RDF query`. Likewise, `.ru` stands for `RDF update`.
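Example 1 can be tried end to end with a small input file. The following is a minimal sketch, not part of the original page: the content of `file1.ttl` is made-up sample data, and the script assumes that `rpt` is available on the `PATH` (the command is skipped otherwise).

```shell
# Create a tiny Turtle file to load (hypothetical sample data).
cat > file1.ttl <<'EOF'
@prefix eg: <http://www.example.org/> .
eg:a eg:p eg:b .
EOF

# Load the file, add one triple via a SPARQL update, then print all triples
# using the bundled spo.rq query. Skipped if rpt is not installed.
if command -v rpt >/dev/null 2>&1; then
  rpt integrate file1.ttl 'INSERT DATA { eg:s eg:p eg:o }' spo.rq
else
  echo "rpt not found on PATH; skipping" >&2
fi
```

If RPT behaves as described above, the printed result should contain both the triple loaded from the file and the one added by the update.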
+
+### Example 2: Starting a server
+
+`rpt integrate --server`
+
+This command starts a SPARQL server, by default on port `8642`. Use e.g. `--port 8003` to pick a different one. You can combine this with the arguments from the first example.
+
+Once started, the following endpoints are available:
+
+* SPARQL endpoint and Yasgui frontend: http://localhost:8642/sparql
+* GraphQL endpoint: http://localhost:8642/graphql
+* Snorql frontend: http://localhost:8642/snorql
+* Resource Viewer: http://localhost:8642/view/?*?http://www.wikidata.org/entity/Q1000094
+
+### Example 3: Using a different RDF Database Engine
+
+RPT can run the RDF loading and SPARQL query execution on different (embedded) engines.
+
+`rpt integrate --db-engine tdb2 --db-loc mystores/mydata --db-keep file.ttl spo.rq`
+
+`rpt integrate -e tdb2 --loc mystores/mydata --db-keep file.ttl spo.rq`
+
+By default, `rpt integrate` uses the in-memory engine. The `--db-engine` (short `-e`) option allows choosing a different RDF engine. For engines that require a file or a database folder, the location can be uniformly specified with `--db-loc` (short `--loc`). By default, **RPT deletes data it created itself, but it never deletes existing data**. The flag `--db-keep` instructs RPT to keep the database it created after termination.
+
+### Example 4: SPARQL Proxy
+
+You can quickly launch a SPARQL proxy with the combination of `-e` and `--server`:
+
+`rpt integrate -e remote --loc https://dbpedia.org/sparql --server`
+
+The proxy gives you a Yasgui frontend and the Linked Data Viewer.
+
+Endpoints protected by basic authentication can be proxied by supplying the credentials with the URL:
+
+`rpt integrate -e remote --loc https://USER:PWD@dbpedia.org/sparql --server`
+
+Note that this is **unsafe** and should be avoided in production, but it can be useful during development.
+
+## Embedded SPARQL Engines
+
+Embedded SPARQL engines are built into RPT and thus readily available. The following engines are currently available:
+
+| Engine | Description |
+|--------|-------------|
+| `mem` | The default in-memory engine based on Apache Jena. Data is discarded once the RPT process terminates. |
+| `tdb2` | Apache Jena's TDB2 persistent engine. Use `--loc` to specify the database folder. |
+| `binsearch` | Binary search engine that operates directly on sorted N-Triples files. Use `--loc` to specify the file path or HTTP(S) URL of the N-Triples file. For URLs, HTTP range requests must be supported! |
+| `remote` | A pseudo engine that forwards all processing to the SPARQL endpoint whose URL is specified in `--loc`. |
+
+### (ARQ) Engine Configuration
+
+The engines `mem`, `tdb2` and `binsearch` build on Jena's query engine `ARQ` and thus respect its configuration.
+
+`rpt integrate --set 'arq:queryTimeout=60000' myQuery.rq`
+
+###
+
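The `--set` option shown above can be tried with a small query file. The following is a minimal sketch under stated assumptions: `myQuery.rq` and its query text are hypothetical, `rpt` is assumed to be on the `PATH` (the commands are skipped otherwise), and the second invocation assumes that `--set` can be combined with the engine options from Example 3. In Apache Jena, `arq:queryTimeout` is given in milliseconds, so `60000` corresponds to one minute.

```shell
# Hypothetical query file; the query text is only an example.
cat > myQuery.rq <<'EOF'
SELECT (COUNT(*) AS ?c) WHERE { ?s ?p ?o }
EOF

if command -v rpt >/dev/null 2>&1; then
  # Default in-memory engine with a 60000 ms query timeout.
  rpt integrate --set 'arq:queryTimeout=60000' myQuery.rq

  # Assumption: the same setting combined with a persistent TDB2 store.
  rpt integrate -e tdb2 --loc mystores/mydata --db-keep \
    --set 'arq:queryTimeout=60000' myQuery.rq
else
  echo "rpt not found on PATH; skipping" >&2
fi
```

The timeout only applies to engines that build on ARQ, so by the description above it would not be expected to affect the `remote` pseudo engine.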