
A minimal protocol for all services #27

Open · BorderCloud opened this issue Apr 3, 2019 · 34 comments
Labels
protocol improving sending queries over the wire

Comments

@BorderCloud

BorderCloud commented Apr 3, 2019

A minimal protocol MUST be defined for SPARQL services to facilitate the development of SPARQL clients [2]:

  1. For reading and writing, the SPARQL endpoint URL ends with sparql (for example: http://example.org/dataset/sparql).
  2. For reading, the HTTP GET and POST (for very long queries) methods are supported, and both use the same parameter named "query" to transmit the query.
  3. For writing, the POST method is supported, using the parameter named "update" to send the request.
  4. The default response format (i.e. when no Accept header is sent) must depend only on the query type, e.g. JSON for SPARQL SELECT.
  5. The response formats are JSON (and XML?) for SPARQL SELECT.
  6. To choose the format of the service response, the client must set the Accept header in the HTTP request.

For CONSTRUCT, DESCRIBE, INSERT, UPDATE and CLEAR, the default format also needs to be defined.

To simplify the development of tests, I propose adding 2 things:
7. Support deleting all data with a CLEAR ALL request (which may be disabled or enabled in the configuration of the SPARQL service).
8. Support loading data (Turtle) with a LOAD INTO GRAPH query.

[2] (chapter 5) Le Linked Data à l'université: la plateforme LinkedWiki (French) K Rafes - 2019

Edit: The default format differs depending on the query type (CONSTRUCT, CLEAR, etc.).
Edit 2: Clarified points 4 and 5.
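As I read points 1-3 and 6, a client interaction with such a minimal service could be sketched like this (the endpoint URL and queries are placeholders; this is an illustration of the proposal, not a normative definition):

```python
# Sketch of the proposed minimal protocol (points 1-3 and 6).
# The endpoint URL and queries are placeholders, not a real service.
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/dataset/sparql"  # point 1: single URL ending in sparql

# Point 2: reads go to the same endpoint via GET (or POST when the query is
# very long), always under the parameter named "query".
select_url = ENDPOINT + "?" + urlencode({"query": "SELECT * WHERE { ?s ?p ?o } LIMIT 10"})
# Point 6: the client picks the response format with the Accept header.
read_req = Request(select_url, headers={"Accept": "application/sparql-results+json"})

# Point 3: writes go to the same endpoint via POST, under the parameter "update".
update_body = urlencode({"update": "CLEAR ALL"}).encode("utf-8")
write_req = Request(
    ENDPOINT,
    data=update_body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(read_req.get_method())   # GET
print(write_req.get_method())  # POST
```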

@cygri

cygri commented Apr 3, 2019

Points 2, 3, 6, 7 and 8 are already required by the SPARQL Protocol and SPARQL Update specifications, so any service that conforms to these specs will already meet these requirements.

Point 1 is stricter than the SPARQL Protocol. The Protocol places no restriction on the URL. In practice, URLs terminating in /sparql are very common, of course.

Point 4 is also stricter than the SPARQL Protocol. The Protocol doesn't place any restriction on the default response format if no accept header is sent. CONSTRUCT and DESCRIBE queries will be a problem here because they require an RDF format and not JSON.

Point 5 is not quite clear. Are you saying that JSON and XML must be supported? Are you saying that other formats such as CSV/TSV must not be supported? And again, what about the RDF formats for CONSTRUCT and DESCRIBE?

  1. Are you assuming that query endpoint URL and update endpoint URL are required to be the same? The SPARQL Protocol spec doesn't have that restriction (in part for safety reasons; access control rules are likely to be different for both).

So, is the proposal here to define some kind of additional stricter protocol, and a server conforming to that protocol must support the SPARQL Protocol with a query+update endpoint (which means support for SPARQL Query and SPARQL Update), and on top of that also meet the additional restrictions of points 1, 4, 5 and 9? Or is the proposal to change/tighten the SPARQL Protocol and make those points part of it?

@VladimirAlexiev
Contributor

For 4, how do you write JSON unless it's JSON-LD?

@JervenBolleman JervenBolleman added the protocol improving sending queries over the wire label Apr 3, 2019
@JervenBolleman
Collaborator

Point 4 means you can't send back HTML, which makes endpoints like sparql.uniprot.org sad when dealing with some browsers, as I think Accept: */* should return HTML in most practical deployments.

@BorderCloud
Author

BorderCloud commented Apr 3, 2019

Point 5 is not quite clear. Are you saying that JSON and XML must be supported?

Yes

Are you saying that other formats such as CSV/TSV must not be supported?

No. JSON and XML are the bare minimum in SPARQL; a service may also support CSV/TSV.

And again, what about the RDF formats for CONSTRUCT and DESCRIBE?

This minimal protocol is for all query types: SELECT, CONSTRUCT, DESCRIBE, INSERT, UPDATE, CLEAR, etc.

  1. Are you assuming that query endpoint URL and update endpoint URL are required to be the same? The SPARQL Protocol spec doesn't have that restriction (in part for safety reasons; access control rules are likely to be different for both).

I think this difference unnecessarily complicates the implementation of SPARQL clients, which need to know who developed a SPARQL service in order to determine whether to store one or two endpoint URLs (and to know all the other protocol differences between vendors of SPARQL services).

Safety is not a good reason to have different endpoints, because SELECT queries also need access control when some graphs have limited access.

These differences are useless, and removing them would simplify SPARQL.

So, is the proposal here to define some kind of additional stricter protocol, and a server conforming to that protocol must support the SPARQL Protocol with a query+update endpoint (which means support for SPARQL Query and SPARQL Update), and on top of that also meet the additional restrictions of points 1, 4, 5 and 9? Or is the proposal to change/tighten the SPARQL Protocol and make those points part of it?

I do not have the project to create a new protocol but only to tighten the perimeter of the original protocol and replace several "MAY" by "MUST" in the original specifications in order to simplify SPARQL.

@BorderCloud
Author

@VladimirAlexiev > For 4, how do you write JSON unless its JSONLD?

For a SELECT query, it will be the "Query Results JSON Format". I am speaking here of a minimum.

@BorderCloud
Author

@JervenBolleman

Point 4 means you can't send back HTML which makes endpoints like sparql.uniprot.org sad when dealing with some browsers as I think the accept:/ should return HTML in most practical deployments.

Browsers automatically include HTML in the Accept header, so it is not useful to make HTML the default format. A browser will continue to display HTML when a user tests a query through it.

@kasei
Collaborator

kasei commented Apr 3, 2019

I think this difference unnecessarily complicates the implementation of SPARQL clients, which need to know who developed a SPARQL service in order to determine whether to store one or two endpoint URLs (and to know all the other protocol differences between vendors of SPARQL services).

Knowing "whether to store one or two endpoint URLs" should not require knowing who developed the service. That information can come from the service description.

Safety is not a good reason to have different endpoints, because SELECT queries also need access control when some graphs have limited access.

These differences are useless, and removing them would simplify SPARQL.

I strongly believe that these differences are not useless, and that safety is a very good reason for this design. The current spec was specifically designed so that different authn/authz approaches could be taken for query and update.

@BorderCloud
Author

@kasei

That information can come from the service description.

In the protocol, a service SHOULD return a service description document... but in practice, this information often does not exist.
A single URL is the best solution to simplify the work of all developers.

@kasei
Collaborator

kasei commented Apr 3, 2019

@BorderCloud "best solution" is very subjective here. It is the simplest, but it is not expressive enough for some implementations, and it is not backwards compatible.

More use of Service Descriptions would certainly help people working on query federation, client tools, etc. The fact that we don't see more widespread use does not strike me as a reason to ignore the potential benefits, so much as a reason to promote those benefits and try to help implementations add support for Service Descriptions.
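For context, discovery via a service description can be sketched as follows; the Turtle document and the extraction helper are hand-written illustrations (a real client would dereference the endpoint URL with an RDF Accept header and use a proper RDF parser):

```python
# Minimal sketch of endpoint discovery via a SPARQL 1.1 Service Description.
# The Turtle below is a hand-written sample, not fetched from a real service,
# and the regex is a naive stand-in for real RDF parsing.
import re

sample_sd = """
@prefix sd: <http://www.w3.org/ns/sparql-service-description#> .
[] a sd:Service ;
   sd:endpoint <http://example.org/dataset/query> ;
   sd:supportedLanguage sd:SPARQL11Query .
"""

def find_endpoints(turtle: str) -> list:
    """Naively extract sd:endpoint IRIs from a Turtle service description."""
    return re.findall(r"sd:endpoint\s+<([^>]+)>", turtle)

print(find_endpoints(sample_sd))  # ['http://example.org/dataset/query']
```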

@cygri

cygri commented Apr 3, 2019

@BorderCloud The response formats are JSON and XML for debugging.

I don’t understand why XML and what you mean by debugging. Wouldn’t it be simpler to require only JSON?

And again, if the minimum is JSON and XML (I assume you mean the SPARQL Query Results JSON and XML Formats), how should a server answer to CONSTRUCT and DESCRIBE? Their responses require an RDF serialisation format such as Turtle and can’t be represented in the SPARQL Query Results JSON and XML Formats.

@BorderCloud
Author

BorderCloud commented Apr 3, 2019

@kasei I am sad that service descriptions are rarely used; after 6 years, I think this tool will take 10 more years or maybe longer to be adopted. It's useless to wait for it if we can quickly simplify this protocol...

@cygri

I don’t understand why XML and what you mean by debugging. Wouldn’t it be simpler to require only JSON?

JSON is now the format used in production (it is less verbose, and Wikidata uses it by default).
XML is for SPARQL editors/developers (for debugging) because it carries more detail than JSON.

And again, if the minimum is JSON and XML (I assume you mean the SPARQL Query Results JSON and XML Formats), how should a server answer to CONSTRUCT and DESCRIBE? Their responses require an RDF serialisation format such as Turtle and can’t be represented in the SPARQL Query Results JSON and XML Formats.

That's true. Depending on the query type, the default format can differ. I updated my top message.

@cygri

cygri commented Apr 4, 2019

XML is for SPARQL editors/developers (for debugging) because it carries more detail than JSON.

What detail is there in the XML results format that is not in the JSON results format?

@VladimirAlexiev
Contributor

  1. For reading or writing, the format .. JSON for SPARQL SELECT.

But SELECT is only reading. So remove the word "writing".

@BorderCloud
Author

BorderCloud commented Apr 4, 2019

@VladimirAlexiev it's done

@cygri

What detail is there in the XML results format that is not in the JSON results format?

Very good question. I checked the SPARQL 1.1 specifications. In theory, there is no difference between the metadata in JSON and XML. In practice, I use several pieces of information that are only available in Virtuoso's XML output. I thought they were in the specification.
Maybe it's necessary to extend/clarify the metadata in the output of SPARQL SELECT results in SPARQL 1.2.

If the metadata is the same in XML and JSON, we can simplify my proposal to use only JSON.

@cygri

cygri commented Apr 4, 2019

Regarding the various update forms (INSERT, UPDATE, etc.), the current protocol spec says that success or failure is indicated through the HTTP status code, and the message body can be whatever the server chooses.

In the case of an error, there should be a machine-readable error response. We have #8 for that.

In the case of success, is it really necessary to say anything about what the server should return?

@BorderCloud
Author

BorderCloud commented Apr 4, 2019

@cygri

In the case of success, is it really necessary to say anything about what the server should return?

The minimum could be:

  • the number of rows created/updated/deleted by the query.
  • the execution time.

After a success, a database may want to communicate with the SPARQL client, for example to raise a warning:

  • "Query is not optimized"
  • "The hard drive is almost full"
  • "The RAM is insufficient"
  • "SHACL rules raised a warning"
  • etc.

The success/warning messages are related to the error-message issue (#8).
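To make the idea concrete, a success body carrying that metadata might look like the following sketch; every field name here is hypothetical, since the current SPARQL Protocol leaves the body of a successful update unspecified:

```python
# One possible shape for a success response carrying the proposed metadata.
# All field names are invented for illustration; nothing like this exists in
# the current SPARQL Protocol.
import json

success_body = {
    "status": "success",
    "rowsAffected": 42,      # created/updated/deleted by the query
    "executionTimeMs": 17,   # the execution time
    "warnings": [
        "Query is not optimized",
    ],
}

print(json.dumps(success_body, indent=2))
```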

@afs
Collaborator

afs commented Apr 4, 2019

SPARQL protocol is a web protocol and uses features of HTTP such as content negotiation
and status codes.

This makes it a "good citizen" and helps because programming-language libraries and frameworks already have code for HTTP interactions. That makes writing clients easier (people even use curl on the client side).

As an implementer, I can say that even something like "number of triples deleted" can have a big impact.
As an example, in a lightweight implementation of a SPARQL server, DROP GRAPH <uri> might simply delete a file or unlink a data structure for the garbage collector to clear up later. The number of triples is not available, and calculating it may significantly slow down what is otherwise a quick and cheap operation (see Linked Data Platform).

Having suggested profiles and a described set of choices is fine and useful. A minimum is something different: it is about enabling interoperation, and it should preserve implementation freedom on both sides, client and server.

@BorderCloud
Author

BorderCloud commented Apr 4, 2019

@afs
I agree, but developers often ask for feedback while creating a new query (the minimum is an error message readable by all SPARQL editors).
Maybe we can separate queries in production (with an empty JSON containing only true or false) from queries run via a SPARQL editor or a test runner (where we can imagine a debug mode with a JSON that leaves room for a maximum of details/messages/reports).

@afs
Collaborator

afs commented Apr 4, 2019

The example was specifically not about an error condition (that is issue #8).

If it is helpful to some use cases, then writing up a specific set of choices from the least restrictive protocol may be helpful. It then has to make its case to the world.

@BorderCloud
Author

If the CG agrees on the principle, I will participate in defining this minimal API in detail in the WG, and I will upgrade all my libraries to this minimal API.

@cygri

cygri commented Apr 5, 2019

@kasei You said above that the proposal here (specifically, requiring a single endpoint URL for query and update) is “not backwards compatible”. I have trouble understanding why. If the proposal were accepted, clients designed for the 1.1 protocol would still work with 1.2 servers.

I don't agree with some of the things proposed here, but I am in favour of tightening down the protocol so that a minimal 1.2 client can be simpler than a minimal 1.1 client.

(There is a conversation to be had about the relative lack of adoption for SPARQL-SD, but I'm not sure that a GitHub issue is the right format for that.)

@afs
Collaborator

afs commented Apr 5, 2019

The title is "minimal protocol" not "minimal client". They are different things.

A 1.2 system that offers only a "minimal protocol" may not work with a 1.1 client.

What helps a "minimal client" is a profile of the protocol.

@BorderCloud
Author

@afs

A system 1.2 that offers only a "minimal protocol" may not work with a 1.1 client.

This minimal protocol "1.2" is compatible with the 1.1 specification, and it is already supported by several SPARQL services.
So if a 1.1 client doesn't support this minimal protocol, that client does not support the 1.1 protocol.

Today, no client supports all implementations of SPARQL service protocols that are "compliant with 1.1". Today, I have to use Varnish to adapt the different "1.1-compliant" protocols, otherwise my client could not work with a lot of SPARQL services.

This is only a proposal... I am only asking for the same minimal, simplest protocol for all SPARQL services.

@afs
Collaborator

afs commented Apr 5, 2019

Yes, such a minimal protocol "1.2" is compatible with the 1.1 specification. My statement was about the reverse situation: a 1.1 client against a 1.2 server, where the client starts the interaction. This is also a discussion for issue #1.

An HTTP client does not have to handle all formats. It asks for the formats it wants. A valid 1.1 client may ask for XML (e.g. because it wants to process it as XML).

My suggestion is to write a "best practice guide" for writing a client. Such a document is something a CG can produce.

It would say things like "Use the HTTP header Accept: application/sparql-results+json to get JSON results" (and they all do provide them, don't they?). It's the HTTP way of doing things, and I think we should fit into the wider web ecosystem by encouraging the use of existing technology. In HTTP, no Accept header means "any", so explaining that would be helpful as well.

As in the EasierRDF discussions, creating documentation and guides will be more effective to help people than technology changes. The hard part of writing a client can be understanding the nitty-gritty details.
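As a sketch of the kind of advice such a guide could give, here is a hypothetical mapping from query form to a sensible Accept header; the mapping and the helper function are my own illustration, not part of any spec:

```python
# Hypothetical client-side helper picking an Accept header per query form.
# The media types are the standard SPARQL/RDF ones; the mapping itself is an
# illustration of what a best-practice guide might recommend, not a spec.
ACCEPT_FOR_QUERY_FORM = {
    "SELECT":    "application/sparql-results+json",
    "ASK":       "application/sparql-results+json",
    "CONSTRUCT": "text/turtle",  # CONSTRUCT/DESCRIBE need an RDF format
    "DESCRIBE":  "text/turtle",
}

def accept_header(query: str) -> str:
    """Pick an Accept header from the first query-form keyword found.
    A real client would parse the query properly (prologue, comments, etc.)."""
    upper = query.upper()
    for form, accept in ACCEPT_FOR_QUERY_FORM.items():
        if form in upper:
            return accept
    return "*/*"  # HTTP's "any", per the point about a missing Accept header

print(accept_header("SELECT * WHERE { ?s ?p ?o }"))  # application/sparql-results+json
```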

@kasei
Collaborator

kasei commented Apr 5, 2019

@cygri

I have trouble understanding why. If the proposal were accepted, clients designed for the 1.1 protocol would still work with 1.2 servers.

Yes, but a 1.2 client would no longer be able to work with some 1.1 servers. What's being proposed is taking what is a perfectly fine setup for 1.1 and making it non-conforming for 1.2. This change would also mean that such deployments couldn't be updated to 1.2. Doing so to simplify the job of writing a testing client seems like a bad trade to me, especially since I've already suggested a solution (using service descriptions) that maintains backwards compatibility and can be employed using the existing specs right now.

@kasei
Collaborator

kasei commented Apr 5, 2019

@BorderCloud

Today, no client supports all implementations of SPARQL service protocols "compliant with 1.1".

I'd be interested in knowing about the specific cases you've come across in your testing.

@BorderCloud
Author

@kasei

I tried to translate the "text" of the protocol tests into JMeter tests. Today, it's impossible to use the same protocol tests for all SPARQL services. Their protocols are different, yet "1.1-compliant".

Here is my implementation of the JMeter tests:
https://github.com/BorderCloud/rdf-tests/tree/withJmeter/sparql11/data-sparql11/protocol

As a consequence, I have to use Varnish to align their different protocols so I can run the same SPARQL query/update tests.
Here are 3 examples of Varnish configurations realigning a SPARQL service onto a common API:
https://github.com/BorderCloud/tft-jena-fuseki/blob/master/default.vcl
https://github.com/BorderCloud/tft-virtuoso7-stable/blob/master/default.vcl
https://github.com/BorderCloud/tft-stardog/blob/master/default.vcl

@kasei
Collaborator

kasei commented Apr 5, 2019

@BorderCloud I'm not familiar with the format for these Varnish rules, but I don't see anything that indicates that the endpoints aren't using the standard Protocol. Can you summarize some of the things that make the implementations different that require you to do implementation-specific alignment?

@BorderCloud
Author

@kasei
For example, several SPARQL services have only one endpoint, but many use two endpoints... A SPARQL client cannot guess.
The classic parameter "update" for SPARQL Update sometimes becomes "query"...
Sometimes SPARQL Update works only with HTTP GET and sometimes only with HTTP POST (and I don't know how to fix this difference with Varnish).
An HTTP request without an Accept header sometimes raises an error and sometimes does not.
Etc.
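To illustrate the burden this puts on a portable client, here is a sketch of the kind of per-service quirk table a test harness ends up encoding; the vendor names and quirks are invented placeholders, not a survey of specific products:

```python
# Sketch of the per-service quirks a portable test client currently has to
# encode. The vendor names and their quirks are invented for illustration,
# echoing the kinds of differences listed above.
from urllib.parse import urlencode

QUIRKS = {
    "vendor-a": {"update_endpoint": "/sparql", "update_param": "update"},  # single endpoint
    "vendor-b": {"update_endpoint": "/update", "update_param": "update"},  # second endpoint
    "vendor-c": {"update_endpoint": "/sparql", "update_param": "query"},   # wrong parameter
}

def update_request_body(vendor: str, update: str) -> str:
    """Build the form-encoded body this vendor expects for a SPARQL update."""
    quirks = QUIRKS[vendor]
    return urlencode({quirks["update_param"]: update})

print(update_request_body("vendor-c", "CLEAR ALL"))  # query=CLEAR+ALL
```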

@kasei
Collaborator

kasei commented Apr 5, 2019

@BorderCloud Interesting. Some of those seem like simple violations of HTTP or the SPARQL Protocol (failing without an Accept header, "update" becoming "query", and performing updates with GET). As I've mentioned previously, having two endpoints is conforming by design. Clients aren't meant to "guess" endpoint names; they must be provided out-of-band, or discovered via the service description.

I'm not sure any of these would qualify as issues that mean a properly written SPARQL 1.1 Protocol client would "support" a conforming Protocol endpoint.

@TallTed
Member

TallTed commented Apr 5, 2019

@BorderCloud --

In theory, there is no difference between the metadata in JSON and XML. In practice, I use several pieces of information that are only available in Virtuoso's XML output.

I believe this is the first I've heard of this discrepancy in output (the details of which are not clear here).

If you have not already done so, please raise this to an appropriate Virtuoso-focused space -- whether an open source project issue, the public OpenLink Community Forum, or a confidential OpenLink Support Case.

@cygri

cygri commented Apr 5, 2019

@kasei

Doing so to simplify the job of writing a testing client seems like a bad trade to me

Well I suppose I would agree with that, but I'm not interested in writing a test client. I'm interested in making SPARQL clients easier to implement and use.

Having thought more about this, I will climb down a little bit and advocate something slightly different: The 1.2 version of the protocol should say that if a service implements update, it SHOULD also implement query. That (1) preserves backwards compatibility, (2) avoids the security concerns over having to secure the update operation on a public query endpoint, and (3) still might get us to a future where one can do R/W ops against a service without needing to handle multiple endpoint URLs.

@TallTed
Member

TallTed commented Apr 5, 2019

One thing which I think falls reasonably into this discussion (though perhaps it should be split to its own) is the need for a "soft" success/failure status/response code -- similar to ODBC's SQL_SUCCESS_WITH_INFO status, as came up in Solid development when discussing SPARQL UPDATE --

From the SPARQL UPDATE spec:

3.1.3.3 DELETE WHERE
Analogous to DELETE/INSERT, deleting triples that are not present, or from a graph that is not present will have no effect and will result in success.

...

I am very disappointed to see the SPARQL 1.1 UPDATE spec excerpt, which says to me there's a tragic bug in the SPARQL 1.1 UPDATE spec, and a failure to learn from existing standards and writings, particularly those which support multiple users acting on the same resources (e.g., ODBC clients of a DBMS instance).

I would suggest that the appropriate response is not a full-success, but a warning or soft-fail/soft-success like ODBC's SQL_SUCCESS_WITH_INFO -- so the user/app can know that the thing they tried to delete wasn't there to delete, which would allow them to address typos and similar situations -- while also not being a hard-fail, so handling can be gentler (and users/apps might choose to ignore the INFO portion, but this is then an informed choice, and they need not be left guessing whether they'd actually deleted the thing)...

(We MUST find a way to modify W3 specs more quickly than is currently possible, especially when such app-blocking errors are discovered!)

As discussed above, there are many situations where a simple success/failure HTTP response code is not enough -- even if the body of such response may contain more info.

@kasei
Collaborator

kasei commented Apr 5, 2019

@cygri

The 1.2 version of the protocol should say that if a service implements update, it SHOULD also implement query.

Yes, I think that's a great solution. So long as deployments can continue to have an endpoint for query and another for update (which SHOULD also allow queries), I'd welcome that change.
