Unexpected complaint about too-wide a table #357

ablaom · 2024-11-22T00:36:17Z

I don't expect tables with 5 columns to be "too wide". Should I?

using Pkg
Pkg.activate(temp=true)
Pkg.add("Tables")
Pkg.add("OpenML")

using OpenML, Tables

# [8b6db2d4] OpenML v0.3.2
# [bd369af6] Tables v1.12.0

table = OpenML.load(61)
# Tables.DictColumnTable with 150 rows, 5 columns, and schema:
#  :sepallength  Float64
#  :sepalwidth   Float64
#  :petallength  Float64
#  :petalwidth   Float64
#  :class        CategoricalArrays.CategoricalValue{String, UInt32}

Tables.columntable(table)
# ERROR: ArgumentError: input table too wide (5 columns) to convert to `NamedTuple` of `AbstractVector`s
# Stacktrace:
#  [1] columntable(sch::Tables.Schema{nothing, nothing}, cols::Tables.DictColumnTable)
#    @ Tables ~/.julia/packages/Tables/8p03y/src/namedtuples.jl:180
#  [2] columntable(itr::Tables.DictColumnTable)
#    @ Tables ~/.julia/packages/Tables/8p03y/src/namedtuples.jl:190
#  [3] top-level scope
#    @ REPL[7]:1

ablaom · 2024-11-22T00:45:06Z

@jbrea Have you run into this kind of thing?

jbrea · 2024-11-22T09:22:31Z

Weird. No, I usually use DataFrames, which work fine on this example.

ablaom · 2024-12-01T19:56:57Z

@quinnj

quinnj · 2024-12-03T05:55:04Z

Thanks for the ping. I put up a PR with a fix: #360. I also added some commentary there, since users may be expecting too much of `Tables.columntable`: it's not meant to be a super robust column table implementation.

Fixes #357. The issue here is that for a stored schema, the type of the schema is `Schema{nothing, nothing}`, which usually indicates a table with many columns. Some table implementations, however, like ARFFFiles.jl, may choose to explicitly store _all_ schemas, even for very narrow tables. We already have a generated branch that checks a specialization threshold for the known-schema case, so the fix here is fairly straightforward: actually check whether the stored schema has too many columns.

In the end, users should be aware that `Tables.columntable` isn't a perfect table implementation that is always expected to work. It was originally meant as just a test implementation that then turned out to be fairly convenient for REPL use. Note also that generating a named tuple of columns from a stored schema can't be particularly efficient, since it necessarily has to generate the `NamedTuple` type at runtime.
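
For readers following along, here is a minimal sketch in plain Julia (no Tables.jl involved) of what "generating the `NamedTuple` type at runtime" and "checking the actual number of columns" could look like. It is not the code from #360. The names `runtime_columntable` and `WIDTH_LIMIT`, and the limit value itself, are hypothetical and chosen only for illustration.

```julia
# Hypothetical cutoff, made up for this sketch; not necessarily what Tables.jl uses.
const WIDTH_LIMIT = 1_000

"""
Build a NamedTuple of columns from names and columns that are only known at runtime.
"""
function runtime_columntable(colnames::Vector{Symbol}, cols::Vector; limit::Int = WIDTH_LIMIT)
    length(colnames) == length(cols) ||
        throw(ArgumentError("names and columns differ in length"))
    # Look at the actual column count rather than assuming that
    # runtime-stored names imply a very wide table.
    length(colnames) > limit &&
        throw(ArgumentError("input table too wide ($(length(colnames)) columns)"))
    # The names become type parameters here, so the NamedTuple type is
    # constructed from runtime values and the return type cannot be inferred.
    return NamedTuple{Tuple(colnames)}(Tuple(cols))
end

colnames = [:sepallength, :class]
cols = Any[[5.1, 4.9], ["setosa", "versicolor"]]
nt = runtime_columntable(colnames, cols)
```

Because the result type depends on the contents of `colnames`, every call has to build (or look up) the `NamedTuple` type dynamically, which is why this path is convenient at the REPL but not a good fit for performance-sensitive code.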

ablaom mentioned this issue Nov 22, 2024

Flush out content of "Getting Started" JuliaAI/MLJ#20

Open

quinnj mentioned this issue Dec 3, 2024

Fix columntable materialization on stored schema #360

Merged

quinnj closed this as completed in #360 Dec 3, 2024
