Feature Request: Automatic saving of metadata with arrow files #3480

Open
alex-s-gardner opened this issue Nov 28, 2024 · 2 comments
Labels: ecosystem (Issues in DataFrames.jl ecosystem)
Milestone: patch

Comments

@alex-s-gardner

It would be great if a DataFrame and its metadata could be saved to a single file. I believe the Arrow format supports this.

Right now, metadata is not saved automatically when a DataFrame is written to an Arrow file. I believe a PR was opened for this, but it appears to have stalled.

It would be great to have this functionality.

Current status of saving to Arrow with metadata:

```julia
using Arrow
using DataFrames

df = DataFrame(a = 1:3, b = 'A':'C')
Arrow.write("test.arrow", df)
df = DataFrame(Arrow.Table("test.arrow"))

# attach column-level metadata to column :a
colmetadata!(df, :a, "test", "hope this works"; style = :note)
colmetadata(df, :a, "test")  # returns "hope this works"

# write the DataFrame again and read it back: the column metadata is lost
Arrow.write("test2.arrow", df)
df = DataFrame(Arrow.Table("test2.arrow"))
colmetadata(df, :a, "test")  # throws the error below
```

```
ERROR: ArgumentError: no column-level metadata found for column "a"
Stacktrace:
 [1] colmetadata(df::DataFrame, col::Symbol, key::String, default::DataFrames.MetadataMissingDefault; style::Bool)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:367
 [2] colmetadata
   @ ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360 [inlined]
 [3] colmetadata(df::DataFrame, col::Symbol, key::String)
   @ DataFrames ~/.julia/packages/DataFrames/kcA9R/src/other/metadata.jl:360
 [4] top-level scope
   @ ~/Documents/GitHub/ItsLivePlayground.jl/src/RiverTest.jl:41
```
@bkamins
Member

bkamins commented Nov 29, 2024

This is an issue with Arrow.jl. Hopefully the stalled PR will soon be merged and released by the maintainers.

@bkamins added the ecosystem label (Issues in DataFrames.jl ecosystem) on Nov 29, 2024
@bkamins added this to the patch milestone on Nov 29, 2024
@asinghvi17
Contributor

asinghvi17 commented Dec 18, 2024

https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/table.jl#L365-L366

https://github.com/apache/arrow-julia/blob/2583a66f54ac4087bfe7ae34c1ffbab3cb3c81f6/src/write.jl#L48

It looks like writing metadata could be supported with a simple change to the generic implementation of the getmetadata function in Arrow.jl?

It would need to fetch the metadata dictionary through the DataAPI.jl interface and then convert its keys and values to strings, since Arrow custom metadata stores only string key-value pairs.
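A minimal sketch of that idea, written as user-side code rather than as a change inside Arrow.jl: it reads table- and column-level metadata through the DataAPI functions that DataFrames.jl implements (metadatakeys, metadata, colmetadatakeys, colmetadata), converts every key and value with string, and passes the result to the metadata and colmetadata keyword arguments of Arrow.write, which take String => String dictionaries. The helper names arrow_table_metadata and arrow_col_metadata are hypothetical, and the sketch assumes recent Arrow.jl and DataFrames.jl releases. Metadata styles such as :note are not stored, because Arrow custom metadata holds only strings.

```julia
using Arrow
using DataFrames

# Hypothetical helper (not part of Arrow.jl or DataFrames.jl): table-level
# metadata as String => String, or `nothing` if there is none. Arrow custom
# metadata stores only strings, so values are converted with `string` and
# metadata styles (e.g. :note) are dropped.
function arrow_table_metadata(df::DataFrame)
    meta = Dict{String,String}(String(k) => string(metadata(df, k))
                               for k in metadatakeys(df))
    return isempty(meta) ? nothing : meta
end

# Hypothetical helper: column-level metadata as
# column name (Symbol) => Dict{String,String}, skipping columns without any.
function arrow_col_metadata(df::DataFrame)
    out = Dict{Symbol,Dict{String,String}}()
    for col in propertynames(df)
        ks = colmetadatakeys(df, col)
        isempty(ks) && continue
        out[col] = Dict{String,String}(String(k) => string(colmetadata(df, col, k))
                                       for k in ks)
    end
    return isempty(out) ? nothing : out
end

df = DataFrame(a = 1:3, b = 'A':'C')
colmetadata!(df, :a, "test", "hope this works"; style = :note)

# Pass the converted metadata explicitly through Arrow.write's keyword arguments.
Arrow.write("test2.arrow", df;
            metadata = arrow_table_metadata(df),
            colmetadata = arrow_col_metadata(df))

# Read the column-level custom metadata back through Arrow.jl directly.
tbl = Arrow.Table("test2.arrow")
Arrow.getmetadata(tbl.a)
```

Whether DataFrame(Arrow.Table(...)) then also exposes the values through colmetadata may depend on the package versions involved, which is why the last line reads them back with Arrow.getmetadata instead.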
