A minimal protocol for all services #27
Comments
Points 2, 3, 6, 7 and 8 are already required by the SPARQL Protocol and SPARQL Update specifications, so any service that conforms to these specs will already meet those requirements.

Point 1 is stricter than the SPARQL Protocol. The Protocol places no restriction on the URL; in practice, URLs terminating in …

Point 4 is also stricter than the SPARQL Protocol. The Protocol doesn't place any restriction on the default response format if no Accept header is sent. CONSTRUCT and DESCRIBE queries will be a problem here because they require an RDF format and not JSON.

Point 5 is not quite clear. Are you saying that JSON and XML must be supported? Are you saying that other formats such as CSV/TSV must not be supported? And again, what about the RDF formats for CONSTRUCT and DESCRIBE?
So, is the proposal here to define some kind of additional stricter protocol, where a server conforming to that protocol must support the SPARQL Protocol with a query+update endpoint (which means support for SPARQL Query and SPARQL Update), and on top of that also meet the additional restrictions of points 1, 4, 5 and 9? Or is the proposal to change/tighten the SPARQL Protocol and make those points part of it?
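The default-format concern for point 4 could be handled client-side by choosing the Accept header per query form. A minimal sketch of that idea (the mapping below is an illustration, not part of any spec, and it naively looks at the first keyword, ignoring PREFIX/BASE declarations):

```python
def accept_for_query(query: str) -> str:
    """Return a plausible Accept header value for a SPARQL query string.

    Illustrative only: SELECT/ASK results fit the SPARQL Query Results
    JSON Format, while CONSTRUCT/DESCRIBE return graphs and need an RDF
    serialisation such as Turtle. Ignores leading PREFIX/BASE clauses.
    """
    form = query.lstrip().split(None, 1)[0].upper()
    if form in ("SELECT", "ASK"):
        return "application/sparql-results+json"
    if form in ("CONSTRUCT", "DESCRIBE"):
        return "text/turtle"
    raise ValueError(f"unrecognised query form: {form}")

print(accept_for_query("SELECT * WHERE { ?s ?p ?o }"))
```

This also makes the incompatibility concrete: a single default format cannot serve both result tables and RDF graphs.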
For point 4, how do you write JSON unless it's JSON-LD?
Point 4 means you can't send back HTML, which makes endpoints like sparql.uniprot.org sad when dealing with some browsers, as I think `Accept: */*` should return HTML in most practical deployments.
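The browser-versus-client distinction here can be sketched with a tiny Accept-header check (illustrative only; a real implementation would honour q-values and full content negotiation):

```python
def prefers_html(accept: str) -> bool:
    """True if the Accept header explicitly lists text/html.

    Naive sketch: splits the header into media types and ignores
    q-value weighting, which full content negotiation would use.
    """
    media_types = [part.split(";")[0].strip() for part in accept.split(",")]
    return "text/html" in media_types

# A typical browser Accept header versus a SPARQL client's:
print(prefers_html("text/html,application/xhtml+xml,*/*;q=0.8"))
print(prefers_html("application/sparql-results+json"))
```

Because browsers explicitly list `text/html`, a server can serve them an HTML page while still giving machine clients a machine-readable default.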
Yes
No. JSON and XML are the bare minimum in SPARQL; a service may also support CSV/TSV.
This minimal protocol is for all types of operations: SELECT, CONSTRUCT, DESCRIBE, INSERT, UPDATE, CLEAR, etc.
I think this difference unnecessarily complicates the implementation of SPARQL clients, which need to know who developed a SPARQL service in order to determine whether to store one or two endpoint URLs (and to know all the other protocol differences between vendors of SPARQL services). Safety reasons are not good reasons to have different endpoints, because SELECT queries also need access control when there are graphs with limited access. These differences are useless, and removing them would simplify SPARQL.
I do not plan to create a new protocol, only to tighten the perimeter of the original protocol and replace several "MAY"s with "MUST"s in the original specifications in order to simplify SPARQL.
@VladimirAlexiev
> For 4, how do you write JSON unless its JSONLD?

For a SELECT query, it will be the "SPARQL Query Results JSON Format". I am talking about a minimum here.
Browsers automatically include HTML in the Accept header, so it is not necessary to make HTML the default format. A browser will still display HTML when a user tests a query in it.
The knowledge of whether to store one or two endpoint URLs should not require knowing who developed the service. That information can come from the service description.
I strongly believe that these differences are not useless, and that safety reasons are very good reasons for this design. The current spec was specifically designed so that different authn/authz approaches could be taken for query and update.
In the protocol, a service SHOULD return a service description document... but in practice, this information often does not exist.
@BorderCloud "best solution" is very subjective here. It is the simplest, but it is not expressive enough for some implementations, and it is not backwards compatible. More use of Service Descriptions would certainly help people working on query federation, client tools, etc. The fact that we don't see more widespread use does not strike me as a reason to ignore the potential benefits, so much as a reason to promote those benefits and try to help implementations add support for Service Descriptions.
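Per SPARQL 1.1 Service Description, a client discovers a service's capabilities by dereferencing the endpoint URL itself with an RDF Accept header. A minimal sketch of that discovery request (the endpoint URL is a placeholder and the request is built but not sent):

```python
import urllib.request

# Hypothetical endpoint URL for illustration only.
ENDPOINT = "http://example.org/sparql"

# Dereference the endpoint URL asking for an RDF serialisation;
# per SPARQL 1.1 Service Description, the response would be an RDF
# document that may contain sd:endpoint, sd:supportedLanguage,
# sd:resultFormat, etc.
req = urllib.request.Request(ENDPOINT, headers={"Accept": "text/turtle"})

print(req.get_full_url(), req.get_header("Accept"))
```

A client that reads the service description this way does not need out-of-band knowledge of who implemented the service, which is the point made above.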
I don’t understand why XML and what you mean by debugging. Wouldn’t it be simpler to require only JSON? And again, if the minimum is JSON and XML (I assume you mean the SPARQL Query Results JSON and XML Formats), how should a server answer to CONSTRUCT and DESCRIBE? Their responses require an RDF serialisation format such as Turtle and can’t be represented in the SPARQL Query Results JSON and XML Formats.
@kasei I am sad that service descriptions are rarely used, but after 6 years, I think this tool will only be adopted in 10 years, or maybe more. It is useless to wait for it if we can quickly simplify this protocol...
JSON is now the format used in production (it is less verbose, and Wikidata uses it by default).
That's true. Depending on the query type, the default format can be different. I have updated my top message.
What detail is there in the XML results format that is not in the JSON results format?
But SELECT is read-only, so remove the word "writing".
@VladimirAlexiev it's done
Very good question. I checked the SPARQL 1.1 specifications. In theory, there is no difference between the metadata in JSON and XML. In practice, I use some information that is only available in Virtuoso's XML output; I thought it was in the specification. If the metadata is the same in XML and JSON, we can simplify my proposal to use only JSON.
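The equivalence question can be made concrete by parsing the same single-binding SELECT result in both standard formats. A small sketch (the binding data is invented for illustration):

```python
import json
import xml.etree.ElementTree as ET

# The same result in the SPARQL Query Results JSON Format...
json_doc = """{
  "head": {"vars": ["name"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Alice"}}
  ]}
}"""

# ...and in the SPARQL Query Results XML Format.
xml_doc = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/></head>
  <results>
    <result>
      <binding name="name"><literal>Alice</literal></binding>
    </result>
  </results>
</sparql>"""

j = json.loads(json_doc)
json_value = j["results"]["bindings"][0]["name"]["value"]

ns = {"sr": "http://www.w3.org/2005/sparql-results#"}
root = ET.fromstring(xml_doc)
xml_value = root.find(".//sr:binding[@name='name']/sr:literal", ns).text

# Both formats carry the same variable names and binding values.
print(json_value == xml_value)
```

As far as the specs go, the two formats carry the same information; anything extra (such as vendor-specific metadata in Virtuoso's XML) is an extension, not part of the standard formats.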
Regarding the various update forms: in the case of an error, there should be a machine-readable error response; we have #8 for that. In the case of success, is it really necessary to say anything about what the server should return?
The minimum can be:
After a success, a database may want to communicate with the SPARQL client... for example, to raise a warning:
The success/warning messages relate to the error-message issue (#8).
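As a purely hypothetical illustration of the idea (neither the field names nor the shape are in any spec or in #8), a success-with-warnings body could look like this:

```python
import json

# Invented example of a machine-readable "success with warnings"
# response body; every field name here is a placeholder, not a
# proposal from any specification.
response_body = {
    "status": "success",
    "warnings": [
        {"code": "W001", "message": "Graph <http://example.org/g> was empty"}
    ],
    "statistics": {"triplesDeleted": 0},
}

# Round-trip through JSON as a server and client would.
encoded = json.dumps(response_body)
decoded = json.loads(encoded)
print(decoded["status"], len(decoded["warnings"]))
```

The point is only that a 2xx HTTP status can carry structured detail (warnings, counts) that a bare status code cannot.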
The SPARQL protocol is a web protocol and uses features of HTTP such as content negotiation. This makes it a "good citizen" and helps because programming-language libraries and frameworks already have code for HTTP interactions. That makes writing clients easier (people even use …).

As an implementer, I can say that even something like "number of triples deleted" can have a big impact.

Having suggested profiles and a described set of choices is fine and useful. A minimum is something different: it is about enabling interoperation, and it should provide implementation freedom on both sides, client and server.
@afs |
The example was specifically not about an error condition; that is issue #8. If it helps some use cases, then writing up a specific set of choices from the least-restrictive protocol may be useful. It then has to make its case to the world.
If the CG agrees on the principle, I will participate in defining this minimal API in detail in the WG, and I will upgrade all my libraries to this minimal API.
@kasei You said above that the proposal here (specifically, requiring a single endpoint URL for query and update) is “not backwards compatible”. I have trouble understanding why. If the proposal were accepted, clients designed for the 1.1 protocol would still work with 1.2 servers. I don't agree with some of the things proposed here, but I am in favour of tightening down the protocol so that a minimal 1.2 client can be simpler than a minimal 1.1 client. (There is a conversation to be had about the relative lack of adoption for SPARQL-SD, but I'm not sure that a GitHub issue is the right format for that.)
The title is "minimal protocol", not "minimal client"; they are different things. A 1.2 system that offers only a "minimal protocol" may not work with a 1.1 client. Something that helps a "minimal client" is a profile of the protocol.
This minimal protocol "1.2" is compatible with the 1.1 specification, and it is already supported by several SPARQL services. Today, no client supports all implementations of "1.1-compliant" SPARQL service protocols; I have to use Varnish to adapt the different "1.1-compliant" protocols, otherwise my client could not work with many SPARQL services. This is only a proposal... I am only asking for the same minimal, simplest protocol for all SPARQL services.
Yes, such a minimal protocol "1.2" is compatible with the 1.1 specification. The next statement is about the reverse situation, a 1.1 client against a 1.2 server, where the client starts the interaction. This is also a discussion for issue #1.

An HTTP client does not have to handle all formats; it asks for the formats it wants. A valid 1.1 client may ask for XML (e.g. because it wants to process it as XML).

My suggestion is to write a "best practice guide" for writing a client. Such a document is something a CG can produce. It would say things like "Use the HTTP header Accept: application/sparql-results+json to get JSON results" (and they all do provide them, don't they?). It's the HTTP way of doing things, and I think we should fit into the wider web ecosystem by encouraging the use of existing technology. In HTTP, no Accept header means "any", so explaining that would be helpful as well.

As in the EasierRDF discussions, creating documentation and guides will be more effective at helping people than technology changes. The hard part of writing a client can be understanding the nitty-gritty details.
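The Accept-header advice above can be sketched as a minimal client request. A hedged example (the endpoint URL is a placeholder; the request is built but not sent, and uses only the standard 1.1 Protocol urlencoded-POST form):

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint URL for illustration only.
ENDPOINT = "http://example.org/sparql"

query = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

# SPARQL 1.1 Protocol "query via URL-encoded POST":
# the query goes in a urlencoded "query" form parameter.
body = urllib.parse.urlencode({"query": query}).encode("ascii")

req = urllib.request.Request(
    ENDPOINT,
    data=body,
    headers={
        # Ask explicitly for JSON results, per the best-practice advice.
        "Accept": "application/sparql-results+json",
        "Content-Type": "application/x-www-form-urlencoded",
    },
)

print(req.get_method(), req.get_header("Accept"))
```

Sending an explicit Accept header sidesteps the whole default-format debate: the client never relies on what the server does when no preference is stated.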
Yes, but a 1.2 client would no longer be able to work with some 1.1 servers. What's being proposed is taking what is a perfectly fine setup for 1.1 and making it non-conforming for 1.2. This change would also mean that such deployments couldn't be updated to 1.2. Doing so to simplify the job of writing a test client seems like a bad trade to me, especially since I've already suggested a solution (using service descriptions) that maintains backwards compatibility and can be employed using the existing specs right now.
I'd be interested in knowing about the specific cases you've come across in your testing. |
I tried to translate the "text" about protocol tests into JMeter tests. Today, it's impossible to use the same protocol tests for all SPARQL services: their protocols are different yet "1.1-compliant". Here is my implementation of JMeter tests: … Consequently, I have to use Varnish to align their different protocols in order to run the same SPARQL query/update tests.
@BorderCloud I'm not familiar with the format for these Varnish rules, but I don't see anything that indicates that the endpoints aren't using the standard Protocol. Can you summarize some of the things that make the implementations different that require you to do implementation-specific alignment? |
@kasei |
@BorderCloud Interesting. Some of those seem like simple violations of HTTP or the SPARQL Protocol (failing without an Accept header, "update" becoming "query", and performing updates with GET). As I've mentioned previously, having two endpoints is conforming by design. Clients aren't meant to "guess" endpoint names; they must be provided out-of-band, or discovered via the service description. I'm not sure any of these qualify as issues showing that a properly written SPARQL 1.1 Protocol client would fail to "support" a conforming Protocol endpoint.
@BorderCloud --
I believe this is the first I've heard of this discrepancy in output (the details of which are not clear here). If you have not already done so, please raise this to an appropriate Virtuoso-focused space -- whether an open source project issue, the public OpenLink Community Forum, or a confidential OpenLink Support Case. |
Well I suppose I would agree with that, but I'm not interested in writing a test client. I'm interested in making SPARQL clients easier to implement and use. Having thought more about this, I will climb down a little bit and advocate something slightly different: The 1.2 version of the protocol should say that if a service implements update, it SHOULD also implement query. That (1) preserves backwards compatibility, (2) avoids the security concerns over having to secure the update operation on a public query endpoint, and (3) still might get us to a future where one can do R/W ops against a service without needing to handle multiple endpoint URLs. |
One thing which I think falls reasonably into this discussion (though perhaps it should be split to its own) is the need for a "soft" success/failure status/response code -- similar to ODBC's
...
As discussed above, there are many situations where a simple success/failure HTTP response code is not enough -- even if the body of such response may contain more info. |
Yes, I think that's a great solution. So long as deployments can continue to have an endpoint for query and another for update (which SHOULD also allow queries), I'd welcome that change. |
A minimal protocol MUST be defined for SPARQL services to facilitate the development of SPARQL clients [2]:
For CONSTRUCT, DESCRIBE, INSERT, UPDATE and CLEAR, the default format also needs to be defined.
To simplify the development of tests, I propose to add 2 things:
7. Support deletion of all data with a CLEAR ALL request (which may be disabled or enabled in the configuration of the SPARQL service).
8. Support loading data (Turtle) with a LOAD INTO GRAPH query.
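As an illustration (not from the proposal itself), points 7 and 8 would be sent per the SPARQL 1.1 Protocol as updates via POST with a urlencoded `update` parameter; the graph and file URIs below are placeholders:

```python
import urllib.parse

# The two test-support operations as SPARQL 1.1 Update strings.
# Both URIs are invented placeholders for illustration.
clear_all = "CLEAR ALL"
load = "LOAD <http://example.org/data.ttl> INTO GRAPH <http://example.org/g>"

# Per the SPARQL 1.1 Protocol, an update is POSTed with a urlencoded
# "update" form parameter (Content-Type: application/x-www-form-urlencoded).
clear_body = urllib.parse.urlencode({"update": clear_all})
load_body = urllib.parse.urlencode({"update": load})

print(clear_body)
```

Together these let a test harness reset a store and reload a known dataset before each run, which is exactly what protocol test suites need.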
[2] K. Rafes, "Le Linked Data à l'université : la plateforme LinkedWiki" (in French), chapter 5, 2019.
Edit: The default format differs depending on the query type (CONSTRUCT, CLEAR, etc.).
Edit 2: Clarification of points 4 and 5.