Fails caused by non-standard characters in file names #339

Open
EveWright opened this issue Feb 15, 2022 · 6 comments

@EveWright

I have found that CSV Validator cannot locate some files listed in the CSV when using the checksum expression (MD5) if their file names contain non-standard characters. This happens when the characters +, %, ~ or excessive white space appear in the file name.

Is there a way around this so CSV Validator can locate the files (other than me removing the characters from the file names and updating my CSV file)?

@DavidUnderdown

Again, do you have some specific examples of file names where this has happened? Is your CSV coming from DROID again for these? (It's possible the issue lies with the URLs being created there rather than with the validator as such.)

@EveWright
Author

Yes the CSV is coming from DROID for these as well. You can find some example files attached. Let me know if you require anything further.

Character Issue.zip

@DavidUnderdown

Thanks, I will see if I can recreate your issue. One question, are you testing against the FILE_PATH field from DROID or the URI field?

@EveWright
Author

Thanks, I am using FILE_PATH.

@DavidUnderdown

OK, I've looked at this one. I suspect it's related to #80, which was down to an encoding issue. That was fixed, but by the looks of it the fix was for Unix paths, and I guess this may be on Windows (I've certainly recreated the issue on Windows).

It probably won't be possible to resolve this in the current round of development work. However, I've also verified, for the example files supplied at least, that if URI is used as the location to check against rather than FILE_PATH, then CSV Validator correctly finds the files, as here the character encoding is applied correctly. The only thing to watch out for in making the switch is that if you are using a path substitution you will need to enter it using encoded values; for example, any spaces earlier in the path would need to be replaced with %20.
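
For anyone preparing an encoded path substitution by hand, the following is a minimal Python sketch of the percent-encoding step. It is only an illustration: the function name `encode_path_prefix` and the example folder names are hypothetical, and the exact set of characters that DROID encodes in its URI column may differ from what `urllib.parse.quote` encodes, so compare the result against a real value from your CSV before relying on it.

```python
from urllib.parse import quote


def encode_path_prefix(windows_path: str) -> str:
    """Percent-encode a Windows path prefix for use in a path substitution.

    Hypothetical helper, not part of CSV Validator or DROID.
    """
    # URIs use forward slashes, so normalise backslashes first.
    forward_slashed = windows_path.replace("\\", "/")
    # Leave '/' and the drive-letter ':' literal; spaces become %20.
    return quote(forward_slashed, safe="/:")


# Hypothetical folder names containing spaces.
print(encode_path_prefix(r"C:\My Transfers\Batch 1"))
# prints: C:/My%20Transfers/Batch%201
```

The sketch returns only the encoded path, without a `file:` scheme prefix, because the prefix has to match whatever form appears in the URI column of your own CSV.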

@EveWright
Author

Thanks David, that's really helpful. I will give the URI a go.


2 participants