You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
From the docs, I think this is working as designed, but wanted to check. There's a decent chance I'm doing something wrong.
File auto-detection becomes less useful to library consumer like me if I have to check the type of file I'm using and take different branches for it. It's still great to get a normalized representation of various formats . . . thanks for that.
If you open an Excel file (.xlsx) with just mode r, you get an error zipfile.BadZipFile: File is not a zip file. Doesn't matter if I specify the format in the second parameter to load() when calling open() myself. You must specify mode rb.
This in on Ubuntu 20.04, python3.9, openpyxl 3.0.9
# zipfile.BadZipFile: File is not a zip filewithopen("mySheet.xlsx", "r") asfh:
data.load(fh)
# OKwithopen("mySheet.xlsx", "rb") asfh:
data.load(fh)
If you pass the file path, it cannot determine the file type. If you specify the type, it doesn't open binary.
# zipfile.BadZipFile: File is not a zip filedata.load("mySheet.xlsx", "xlsx")
# tablib.exceptions.UnsupportedFormat: Tablib has no format 'None' or it is not registered.data.load("mySheet.xlsx")
I think the issue in data.load("mySheet.xlsx", "xlsx") is because core.normalize_input() converts the path given as a str to a StringIO. And when openpyxl passes it to zipfile, zipfile would call open() with mode rb if it was still a string. I'm not sure zipfile is ready for StringIO at all. It hits this code block which doesn't seem to expect a StringIO: https://github.com/python/cpython/blob/main/Lib/zipfile.py#L1273
Now if I shift to CSV, I can't use mode rb. It does give instructions in the exception if you give it the file type hint. And it does get it right if you leave the call to open() to tablib and give it the format hint.
# tablib.exceptions.UnsupportedFormat: Tablib has no format 'None' or it is not registered.withopen("myCsv.csv", "rb") asfh:
data.load(fh)
# _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)withopen("myCsv.csv", "rb") asfh:
data.load(fh, "csv")
# tablib.exceptions.UnsupportedFormat: Tablib has no format 'None' or it is not registered.data.load("myCsv.csv")
# OKdata.load("myCsv.csv", "csv")
The text was updated successfully, but these errors were encountered:
I have the exact same problem myself, and it's very frustrating. What use is autodetection if you've already opened the file object in the wrong mode?
I believe the API design should be improved here.
It would help if you could get a list of supported formats, with a boolean indicating if they're binary or not.
However, this does not fully support the special case of writing CSV files, since they should be opened in text mode but with no newline support -- open('output.csv', 'w', newline='')
Ideally, there should be an additional set of API function calls for working directly with filenames or PathLike objects.
I still want this feature. 😊 I would like to take a run at creating a PR with a new API call and class methods for automatically loading a filename or Path. Hopefully I'll have a draft soon.
From the docs, I think this is working as designed, but wanted to check. There's a decent chance I'm doing something wrong.
File auto-detection becomes less useful to library consumer like me if I have to check the type of file I'm using and take different branches for it. It's still great to get a normalized representation of various formats . . . thanks for that.
If you open an Excel file (.xlsx) with just mode
r
, you get an errorzipfile.BadZipFile: File is not a zip file
. Doesn't matter if I specify the format in the second parameter toload()
when callingopen()
myself. You must specify moderb
.This in on Ubuntu 20.04, python3.9, openpyxl 3.0.9
If you pass the file path, it cannot determine the file type. If you specify the type, it doesn't open binary.
I think the issue in
data.load("mySheet.xlsx", "xlsx")
is becausecore.normalize_input()
converts the path given as a str to a StringIO. And when openpyxl passes it to zipfile, zipfile would callopen()
with moderb
if it was still a string. I'm not sure zipfile is ready for StringIO at all. It hits this code block which doesn't seem to expect a StringIO: https://github.com/python/cpython/blob/main/Lib/zipfile.py#L1273Now if I shift to CSV, I can't use mode
rb
. It does give instructions in the exception if you give it the file type hint. And it does get it right if you leave the call toopen()
to tablib and give it the format hint.The text was updated successfully, but these errors were encountered: