Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected complaint about too-wide a table #357

Closed
ablaom opened this issue Nov 22, 2024 · 4 comments · Fixed by #360
Closed

Unexpected complaint about too-wide a table #357

ablaom opened this issue Nov 22, 2024 · 4 comments · Fixed by #360

Comments

@ablaom
Copy link
Contributor

ablaom commented Nov 22, 2024

I don't expect tables with 5 columns to be "too wide". Should I?

Pkg.activate(temp=true)
Pkg.add("Tables")
Pkg.add("OpenML")

using OpenML, Tables

# [8b6db2d4] OpenML v0.3.2
# [bd369af6] Tables v1.12.0

table = OpenML.load(61)
# Tables.DictColumnTable with 150 rows, 5 columns, and schema:
#  :sepallength  Float64
#  :sepalwidth   Float64
#  :petallength  Float64
#  :petalwidth   Float64
#  :class        CategoricalArrays.CategoricalValue{String, UInt32}

Tables.columntable(table)
# ERROR: ArgumentError: input table too wide (5 columns) to convert to `NamedTuple` of `AbstractVector`s
# Stacktrace:
#  [1] columntable(sch::Tables.Schema{nothing, nothing}, cols::Tables.DictColumnTable)
#    @ Tables ~/.julia/packages/Tables/8p03y/src/namedtuples.jl:180
#  [2] columntable(itr::Tables.DictColumnTable)
#    @ Tables ~/.julia/packages/Tables/8p03y/src/namedtuples.jl:190
#  [3] top-level scope
#    @ REPL[7]:1
@ablaom
Copy link
Contributor Author

ablaom commented Nov 22, 2024

@jbrea Have you run into this kind of thing?

@jbrea
Copy link

jbrea commented Nov 22, 2024

Weird. No, I usually use DataFrames, which work fine on this example.

@ablaom
Copy link
Contributor Author

ablaom commented Dec 1, 2024

@quinnj

quinnj added a commit that referenced this issue Dec 3, 2024
Fixes #357.

The issue here is for stored schema, the type of the schema is
`Schema{nothing, nothing}` which usually indicates tables with
many columns. Some tables implementations, however, like ARFFFiles.jl,
may choose to explicitly store _all_ schemas, even for very narrow tables.
We already have a generated branch which checks for a specialization threshold
for the known-schema case, so the fix here is fairly straightforward in just
actually checking if the stored schema # of columns is actually too many or
not.

In the end, users should be aware that `Tables.columntable` isn't a perfect,
100% kind of table implementation that is always expected to work. It was originally
meant as just a test implementation that then turned out to be fairly convenient
for REPL use. Users should note that generating a named tuple of columns from stored
schema doesn't have a way to be particularly efficient, since it necessarily has to
generate the NamedTuple type at runtime.
@quinnj
Copy link
Member

quinnj commented Dec 3, 2024

Thanks for the ping. I put up a PR for a fix: #360. I added some commentary however on users maybe expecting too much of Tables.columntable and to just be aware that it's not meant to be a super robust column table implementation.

quinnj added a commit that referenced this issue Dec 3, 2024
Fixes #357.

The issue here is for stored schema, the type of the schema is
`Schema{nothing, nothing}` which usually indicates tables with
many columns. Some tables implementations, however, like ARFFFiles.jl,
may choose to explicitly store _all_ schemas, even for very narrow tables.
We already have a generated branch which checks for a specialization threshold
for the known-schema case, so the fix here is fairly straightforward in just
actually checking if the stored schema # of columns is actually too many or
not.

In the end, users should be aware that `Tables.columntable` isn't a perfect,
100% kind of table implementation that is always expected to work. It was originally
meant as just a test implementation that then turned out to be fairly convenient
for REPL use. Users should note that generating a named tuple of columns from stored
schema doesn't have a way to be particularly efficient, since it necessarily has to
generate the NamedTuple type at runtime.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants