Don't raise unicode error on malformed JPEG meta #17

Open

@amw commented Mar 17, 2017

I encountered a JPEG file with an invalid Unicode string inside its "software" meta tag. This caused a UnicodeDecodeError in filemagic's compatability.py:

    description = magic.id_buffer(chunk)
  File "/Users/amw/.virtualenvs/asd/lib/python3.6/site-packages/magic/identify.py", line 29, in wrapper
    return func(self, *args, **kwargs)
  File "/Users/amw/.virtualenvs/asd/lib/python3.6/site-packages/magic/compatability.py", line 30, in wrapper
    return func(*encoder(args), **kwargs)
  File "/Users/amw/.virtualenvs/asd/lib/python3.6/site-packages/magic/compatability.py", line 56, in wrapper
    return value.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 210: invalid continuation byte

This PR passes errors='replace' to the decode call so that we return a safe string with the rest of the file description intact: each undecodable byte becomes the U+FFFD replacement character. The alternative is errors='ignore', which silently drops the invalid bytes; I considered that less safe.
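
To illustrate the difference between the two error handlers, here is a minimal sketch on a made-up byte string (not the actual description of my file). `safe_decode` is a hypothetical name used only for this example; it is not the function in compatability.py.

```python
def safe_decode(value: bytes, errors: str = "replace") -> str:
    """Hypothetical helper: decode as UTF-8 without raising on bad bytes."""
    return value.decode("utf-8", errors=errors)


# Made-up sample: 0xc7 starts a two-byte UTF-8 sequence, but the next
# byte ("=") is not a continuation byte, so strict decoding fails.
raw = b"software=\xc7=ok"

try:
    raw.decode()  # default is errors="strict"
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc.reason

replaced = safe_decode(raw)                   # bad byte becomes U+FFFD
ignored = safe_decode(raw, errors="ignore")   # bad byte is dropped

print(strict_error)
print(repr(replaced))
print(repr(ignored))
```

With replace the result still shows where data was lost, while with ignore the invalid byte disappears without a trace, which is why I lean towards replace.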

When testing my specific file I noticed that the strings returned with replace and ignore come out identical when copied into some contexts (like this GitHub page), but they are displayed differently in Terminal. See the comparison text and screenshots below.

python replace

JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=4, xresolution=62, yresolution=70, resolutionunit=2, software=ˮӡרҵ], baseline, precision 8, 560x372, frames 3

[Screenshot: replace]

python ignore

JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=4, xresolution=62, yresolution=70, resolutionunit=2, software=ˮӡרҵ], baseline, precision 8, 560x372, frames 3

[Screenshot: ignore]

file shell command

JPEG image data, JFIF standard 1.01, resolution (DPI), density 96x96, segment length 16, Exif Standard: [TIFF image data, little-endian, direntries=4, xresolution=62, yresolution=70, resolutionunit=2, software=????ˮӡרҵ??], baseline, precision 8, 560x372, frames 3

[Screenshot: file-cmd]

@coveralls

Coverage decreased (-3.0%) to 88.177% when pulling ae5ec7c on amw:replace-invalid-unicode into 1386490 on aliles:master.
