Support for csvw spec 'separator' #8

Open
SimonGreenhill opened this issue Sep 3, 2023 · 2 comments

@SimonGreenhill

csvw can define a 'separator' in column definitions, e.g.:

    {
        "datatype": "string",
        "propertyUrl": "http://cldf.clld.org/v1.0/terms.rdf#source",
        "required": false,
        "separator": ";",
        "name": "Source"
    }

...which means that a cell containing "a;b" should be parsed into something like c("a", "b"). It would be nice to support this.
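As a rough illustration of the requested behaviour, here is a minimal Python sketch (Python only because it is easy to run; the package itself is R). It assumes a simplified reading of the CSVW cell-parsing rules: an empty cell becomes an empty list, a cell equal to one of the column's null values becomes null, and anything else is split on the separator. `split_cell` is a hypothetical helper, and trimming whitespace around each item is an assumption rather than something the issue specifies.

```python
from typing import List, Optional, Sequence


def split_cell(
    raw: str,
    separator: str = ";",
    null_values: Sequence[str] = ("",),
) -> Optional[List[str]]:
    """Turn one raw cell string into a list of strings.

    Simplified sketch of CSVW list-valued cells (not a full implementation):
    empty cell -> empty list, null token -> None, otherwise split and trim.
    """
    if raw == "":
        # An empty cell in a list-valued column becomes an empty list.
        return []
    if raw in null_values:
        # The whole cell matches one of the column's null values.
        return None
    # Split on every occurrence of the separator; trimming is an assumption.
    return [item.strip() for item in raw.split(separator)]


print(split_cell("a;b"))                       # ['a', 'b']
print(split_cell(""))                          # []
print(split_cell("NA", null_values=("NA",)))   # None
```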

@Robsteranium
Owner

I agree this would be nice to have.

The complication is that we would need to handle all datatypes, not just strings. We might want a list column whose values are each vectors of the relevant type.
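A list column of typed vectors could look roughly like the following Python sketch, where each row holds a list of values parsed with the column's datatype. The `PARSERS` table and `parse_list_column` are hypothetical names; only a handful of datatype names are covered, and CSVW datatype objects with a `base`, formats, or constraints are ignored.

```python
from datetime import date
from decimal import Decimal
from typing import Any, Callable, Dict, List

# Hypothetical mapping from a few CSVW datatype names to Python parsers.
# Formats, constraints and datatype objects with a "base" are not handled.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "string": str,
    "integer": int,
    "decimal": Decimal,
    "double": float,
    "date": date.fromisoformat,  # ISO 8601 dates only (Python 3.7+)
}


def parse_list_column(cells: List[str], datatype: str, separator: str) -> List[List[Any]]:
    """Build a list column: one list per row, every item converted to `datatype`."""
    parse = PARSERS[datatype]
    column: List[List[Any]] = []
    for cell in cells:
        if cell == "":
            # An empty cell becomes an empty list rather than a list holding "".
            column.append([])
        else:
            # Split first, then convert each piece with the column's parser.
            column.append([parse(item.strip()) for item in cell.split(separator)])
    return column


print(parse_list_column(["1 2 3", "", "4"], "integer", " "))  # [[1, 2, 3], [], [4]]
```

The point of the sketch is only the ordering: splitting has to happen before type conversion, so a typed parser can no longer be applied to the cell as a whole.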

It'd be nice to handle this within the call to readr::read_csv, which is responsible for parsing. That parsing is done in C++, and I'm not sure how easy it would be to extend.

An alternative would be to read all separated cells in as strings and then post-process them in R. This would, of course, be a lot slower.
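The read-as-strings-then-post-process approach can be sketched in two passes. This is a Python illustration only (an R version would presumably read the affected columns as character and split them afterwards); `METADATA` is a cut-down, hypothetical table schema modelled on the column description above, and `read_with_separators` is a hypothetical helper.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical, cut-down tableSchema shaped like the column in the issue.
METADATA: Dict[str, Any] = {
    "tableSchema": {
        "columns": [
            {"name": "ID", "datatype": "string"},
            {"name": "Source", "datatype": "string", "separator": ";"},
        ]
    }
}


def read_with_separators(text: str, metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Read every cell as a string, then split the columns that declare a separator."""
    separators = {
        col["name"]: col["separator"]
        for col in metadata["tableSchema"]["columns"]
        if col.get("separator") is not None
    }
    rows: List[Dict[str, Any]] = []
    # Pass 1: the stock csv module, which only ever yields strings.
    for row in csv.DictReader(io.StringIO(text)):
        # Pass 2: post-process the list-valued columns in plain Python.
        for name, sep in separators.items():
            cell = row[name]
            row[name] = [] if cell == "" else cell.split(sep)
        rows.append(dict(row))
    return rows


data = "ID,Source\n1,smith2001;jones2010\n2,\n"
for r in read_with_separators(data, METADATA):
    print(r)
```

The second pass touches every list-valued cell in interpreted code, which is where the slowdown mentioned above would come from.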

This isn't something I expect to have time to work on, but I would gladly review a PR.

@xrotwang

xrotwang commented Sep 8, 2023

In our csvw Python package, I found that the various requirements of dialect specs (treating lines as comments, etc.) already precluded using Python's standard-library csv module out of the box. So for me, the (slow) "post-process separated strings in Python" solution seemed unavoidable.
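To illustrate why the stock `csv` module is not enough on its own, here is a hypothetical Python pre-filter for two CSVW dialect options, `commentPrefix` and `skipBlankRows`. The function and parameter names are assumptions for this sketch; it is not the csvw package's actual code.

```python
import csv
import io
from typing import Iterator, List


def dialect_rows(
    text: str,
    comment_prefix: str = "#",
    skip_blank_rows: bool = False,
    delimiter: str = ",",
) -> Iterator[List[str]]:
    """Yield parsed rows, dropping comment lines before the csv module sees them.

    Simplification: comments are detected per physical line, so a quoted
    field spanning several lines whose continuation starts with the prefix
    would be mangled. A faithful reader has to decide this while parsing rows.
    """
    kept = [
        line
        for line in io.StringIO(text)
        if not (comment_prefix and line.startswith(comment_prefix))
    ]
    for row in csv.reader(kept, delimiter=delimiter):
        # csv.reader yields [] for an empty line; all() of [] is True.
        if skip_blank_rows and all(cell == "" for cell in row):
            continue
        yield row


for row in dialect_rows("# a comment\nID,Source\n1,a;b\n\n2,c\n", skip_blank_rows=True):
    print(row)
```

Even with this pre-filter in place, cells such as `a;b` still come out as plain strings, so the separator handling remains a separate post-processing step.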
