Whitespace management: remove excess whitespace in long quoted literals #83

Open
kouralex opened this issue Jan 18, 2021 · 2 comments

@kouralex
Contributor

Skosify already has a whitespace trimming mechanism, but it does not take into account long quoted literals such as

""" This is an  example
of a long quoted
literal
which is     ridden    with   
   whitespace  
"""

in which the whitespace in the middle of the literal (between words and at line starts/endings) is preserved.
I would prefer the example above to be stripped into

"""This is an example
of a long quoted
literal
which is ridden with
whitespace"""

I believe there could be an option that sets how whitespace is trimmed, since whitespace is sometimes intended and meaningful. Line breaks, at least, are such a feature. I wonder what the best option would be for representing them, for example as meaningful HTML entities. I can already see that two consecutive line breaks should be treated differently from a single one, both in these cases and in general.
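
To make the proposed behavior concrete, here is a minimal sketch of the trimming I have in mind. It is not Skosify code: the function name `normalize_literal_whitespace` and the `keep_line_breaks` option are hypothetical, made up for illustration only.

```python
import re


def normalize_literal_whitespace(text: str, keep_line_breaks: bool = True) -> str:
    """Hypothetical helper (not part of Skosify): trim excess whitespace.

    With keep_line_breaks=True, runs of spaces/tabs inside each line are
    collapsed to one space, every line is stripped, and blank lines at the
    very start and end are dropped. Blank lines in the middle are kept, so
    a double line break stays distinct from a single one.
    With keep_line_breaks=False, all whitespace runs (including line
    breaks) collapse to a single space.
    """
    if not keep_line_breaks:
        return " ".join(text.split())
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip("\n")


# The literal value from the example above
messy = (
    " This is an  example\n"
    "of a long quoted\n"
    "literal\n"
    "which is     ridden    with   \n"
    "   whitespace  \n"
)
print(normalize_literal_whitespace(messy))
```

With the default setting this yields the stripped form shown above; whether line breaks should be kept, collapsed, or mapped to something else is exactly the part that would need an option.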

Also, I wonder if long quoted literals could be switched into single quoted literals and vice versa (when applicable) based on an option, even though they yield the very same information.

Any thoughts on this, @osma ?

@osma
Member

osma commented Jan 18, 2021

PR welcome ;) Do you have any examples of a vocabulary where this is a problem?

Regarding literal quote style: there is no semantic difference between single-quoted and triple-quoted literals in Turtle. It is up to the serializer (i.e. rdflib) to choose the appropriate quoting style. I believe the choice depends on whether the literal includes newlines or not. I don't see this as something Skosify should try to control.

@kouralex
Contributor Author

Heh, I might come up with one, if only we can agree on the behavior. :)

I am currently working on https://finto.fi/ykl/, and the issues I mentioned occur there even in preferred labels. One can download the skosify'd file and search for, e.g., \s".*\ \ (this won't catch the long triple-quoted cases, though).
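
For the triple-quoted cases, a rough scan along the following lines could work. This is only a sketch under assumptions: it is a crude textual match over the Turtle source, not a real parser, so it ignores `'`-quoted literals and can be fooled by quotes inside comments. The names `LITERAL`, `EXCESS` and `find_untidy_literals` are made up for illustration.

```python
import re

# Matches a """long""" literal first, then falls back to a "short" one.
LITERAL = re.compile(r'"""(.*?)"""|"((?:[^"\\\n]|\\.)*)"', re.DOTALL)

# Whitespace at the start/end of the value, spaces or tabs at the start/end
# of any line, or two or more consecutive spaces/tabs anywhere.
EXCESS = re.compile(r"\A\s|\s\Z|^[ \t]|[ \t]$|[ \t]{2,}", re.MULTILINE)


def find_untidy_literals(turtle_text: str) -> list:
    """Return the literal values that contain excess whitespace."""
    hits = []
    for match in LITERAL.finditer(turtle_text):
        value = match.group(1) if match.group(1) is not None else match.group(2)
        if EXCESS.search(value):
            hits.append(value)
    return hits


sample = '''ex:a skos:prefLabel "Clean label"@fi ;
    skos:altLabel "Double  space"@fi ;
    skos:note """ Long
literal  """@fi .
'''
for value in find_untidy_literals(sample):
    print(repr(value))
```

Blank lines inside a literal are deliberately not flagged, since a paragraph break may well be intended.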

As for the latter comment, I certainly know that there is no semantic difference. I had a use case in mind, but as you pointed out, it probably isn't a good idea anyway (it would complicate things).
