-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve field name when casting List #13468
Changes from 3 commits
a68ef01
f0292d3
bf61504
9564d79
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -324,18 +324,29 @@ pub fn longest_consecutive_prefix<T: Borrow<usize>>( | |
/// Wrap an array into a single element `ListArray`. | ||
/// For example `[1, 2, 3]` would be converted into `[[1, 2, 3]]` | ||
/// The field in the list array is nullable. | ||
pub fn array_into_list_array_nullable(arr: ArrayRef) -> ListArray { | ||
array_into_list_array(arr, true) | ||
pub fn array_into_list_array_nullable( | ||
arr: ArrayRef, | ||
field_name: Option<&str>, | ||
) -> ListArray { | ||
array_into_list_array(arr, true, field_name) | ||
} | ||
|
||
/// Array Utils | ||
|
||
/// Wrap an array into a single element `ListArray`. | ||
/// For example `[1, 2, 3]` would be converted into `[[1, 2, 3]]` | ||
pub fn array_into_list_array(arr: ArrayRef, nullable: bool) -> ListArray { | ||
pub fn array_into_list_array( | ||
arr: ArrayRef, | ||
nullable: bool, | ||
field_name: Option<&str>, | ||
) -> ListArray { | ||
let offsets = OffsetBuffer::from_lengths([arr.len()]); | ||
ListArray::new( | ||
Arc::new(Field::new_list_field(arr.data_type().to_owned(), nullable)), | ||
Arc::new(Field::new( | ||
field_name.unwrap_or("item"), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know this is not a concern of this PR, but looks as if this naming convention should be formalized/centralized somewhere (at least a constant), since otherwise there's a whole lot of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 100% agree -- I think it should be done in arrow-rs Maybe we could add some comments on https://docs.rs/arrow/latest/arrow/datatypes/struct.Field.html#method.new_list or something There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense—I can pick that up if there's no objection, maybe something like this impl Field {
+ /// Default list member field name
+ pub const LIST_FIELD_DEFAULT_NAME: &'static str = "item";
+
/// Creates a new field with the given name, type, and nullability
pub fn new(name: impl Into<String>, data_type: DataType, nullable: bool) -> Self {
Field {
@@ -144,7 +147,7 @@ impl Field {
/// );
/// ```
pub fn new_list_field(data_type: DataType, nullable: bool) -> Self {
- Self::new("item", data_type, nullable)
+ Self::new(Self::LIST_FIELD_DEFAULT_NAME, data_type, nullable)
}
I think There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. at the very least having a default constant exposed would be a nice improvement I think |
||
arr.data_type().to_owned(), | ||
nullable, | ||
)), | ||
offsets, | ||
arr, | ||
None, | ||
|
@@ -344,10 +355,17 @@ pub fn array_into_list_array(arr: ArrayRef, nullable: bool) -> ListArray { | |
|
||
/// Wrap an array into a single element `LargeListArray`. | ||
/// For example `[1, 2, 3]` would be converted into `[[1, 2, 3]]` | ||
pub fn array_into_large_list_array(arr: ArrayRef) -> LargeListArray { | ||
pub fn array_into_large_list_array( | ||
arr: ArrayRef, | ||
field_name: Option<&str>, | ||
) -> LargeListArray { | ||
let offsets = OffsetBuffer::from_lengths([arr.len()]); | ||
LargeListArray::new( | ||
Arc::new(Field::new_list_field(arr.data_type().to_owned(), true)), | ||
Arc::new(Field::new( | ||
field_name.unwrap_or("item"), | ||
arr.data_type().to_owned(), | ||
true, | ||
)), | ||
offsets, | ||
arr, | ||
None, | ||
|
@@ -357,10 +375,15 @@ pub fn array_into_large_list_array(arr: ArrayRef) -> LargeListArray { | |
pub fn array_into_fixed_size_list_array( | ||
arr: ArrayRef, | ||
list_size: usize, | ||
field_name: Option<&str>, | ||
) -> FixedSizeListArray { | ||
let list_size = list_size as i32; | ||
FixedSizeListArray::new( | ||
Arc::new(Field::new_list_field(arr.data_type().to_owned(), true)), | ||
Arc::new(Field::new( | ||
field_name.unwrap_or("item"), | ||
arr.data_type().to_owned(), | ||
true, | ||
)), | ||
list_size, | ||
arr, | ||
None, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this
Option<impl Into<String>>
to align with the type inField::new
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can but that means we'd now have to add type annotations to everywhere we call this, including those that are going to pass
None
because they don't have field elements. I think this would add bloat that doesn't really add much value. If you feel strongly, I'll make the adjustment.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah fair point, let's keep it as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function is public I don't think we can change its signature in a minor release
https://docs.rs/datafusion/latest/datafusion/common/utils/fn.array_into_list_array.html
Thus I suggest we keep the existing functions but mark them as deprecated and add a new API
Maybe we can do something
builder
style to make future modifications easierOr something -- that would also give us a way to take the FieldRef directly as well as document what the defaults are 🤔