Don't raise unicode error on malformed JPEG meta #17
I encountered a JPEG file that had an invalid Unicode string inside its "software" meta tag. This caused a UnicodeDecodeError in filemagic's compatibility.py.
In this PR I am passing the `errors='replace'` option to the `decode` method so that we can return a safe string with the rest of the file description intact. Another alternative is `ignore`, which I deemed less safe.

When testing my specific file I noticed that the strings returned with `replace` and `ignore` are Unicode-equivalent when copied to some contexts (like this GitHub page), but they do look different in Terminal. See the comparison text and screenshots below.

python, `replace`

python, `ignore`

`file` shell command
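For reference, here is a minimal sketch of how the two error handlers differ in plain Python. The byte string and the UTF-8 encoding are assumptions made up for illustration. They are not taken from the actual JPEG or from filemagic's code.

```python
# Hypothetical bytes standing in for a "software" tag that contains
# one byte (0xFF) that is never valid in UTF-8.
raw = b"software: Acme\xffCam 1.0"

# Default behaviour (errors='strict') raises UnicodeDecodeError.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# errors='replace' substitutes U+FFFD for each undecodable byte,
# so the position of the bad data stays visible.
replaced = raw.decode("utf-8", errors="replace")

# errors='ignore' silently drops the undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))
print(repr(ignored))
```

With `replace`, the rest of the description survives and the damaged spot is still marked. With `ignore`, the bad byte disappears without a trace, which is why it can hide data problems.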