-
-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow for escaping the delimiter like the python counterpart (Read) #233
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change accomplishes what you want. You're right that it breaks everything. :-)
I'm also not sure whether this is worthwhile to add. The data you've given is not CSV. I know this parser goes out of its way to minimally support some kinds of invalid CSV, but I'm not sure how far down that path we want to go.
(Also, in order to get something like this merged aside from the above, there needs to be tests. And ideally a doc example showing its use in the main csv
crate.)
Thanks for the quick reply. All of what you said makes sense, apologies for the poor (and quite clearly broken code). I just needed this to work quickly and I thought I'd share my current changes with a PR to spur a conversation. Using your notes I thought of this:
Would that be something that could be considered for this csv library? I know it's not standard csv, but it is technically still comma-separated values albeit with an escape character. Again, apologies for the code. |
No need to apologize. I'm happy to receive lower effort stuff first: it lets us prove out the idea, and ultimately if we don't move forward with it, then not much has been lost. Your plan sounds okay I think, although, I wonder if it makes sense to re-use the current |
Yes, I'm all for reusing that. I also think it's better if it's only enabled when quoting is false and the escape character is set. In most if not all cases, you either run with quotes for escaping the value or an escape character to escape the actual delimiter. There shouldn't really be a good reason to use both ways at the same time. From python std docs for escapechar in csv:
Would that be manageable? The escape option is never used if quoting is disabled correct? If I'm right, then you could re-use the escape character for this, quite freely. Assuming quoting also disables double quote, it might look like this: let mut csv = ReaderBuilder::new()
.quoting(false)
.escape(Some(b'\\'))
... |
Oh I see. Off hand, I believe that's right, so no new config knob is needed. Nice. |
Sounds good, I'll give it a go this weekend 👍 |
96b2a5b
to
e31d44f
Compare
I must be misinterpreting something, I really thought something like this would work. But it's still interpreting the EDIT: Also, let me know if you find it unwanted to modify the state numbers like this. |
c4547f4
to
91998f6
Compare
Solved by moving escape out from under quoting: rust-csv/csv-core/src/reader.rs Lines 807 to 812 in 91998f6
However, while this doesn't give an error and partly works, it's removing the escaped
Becomes: StringRecord(["field1", "field2", "field3 with commas", "field4"]) What am I missing? |
Not sure. I went and looked for the obvious thing (not using I would suggest debugging it at the |
7102087
to
cb2bc96
Compare
cb2bc96
to
938a372
Compare
Some gdb later and I felt silly, I forgot to add it back to One thing I'm unsure about is the rust-csv/csv-core/src/reader.rs Lines 685 to 689 in 938a372
On another note, if this is accepted as a valuable feature, could adding it to the writer be considered as well? |
Fixes #232
As mentioned in the issue, this is probably sub-optimal, and it
mightwill probably break stuff. In fact this might even be possible already, just hidden in the docs. In any case I would just like to hear some ideas.Using these options:
One can now read this properly: