Support a 0-length array result in summarize, when working on an empty DataFrame #6637
Comments
I think those results are consistent:
That's fair. I'm still trying to wrap my head around this, but something feels a bit inconsistent. For example, mutate on an empty frame accepts either a 0-length or a 1-length value:
library(tidyverse)
df <- tibble(a = integer())
df %>% mutate(b = integer())
#> # A tibble: 0 × 2
#> # … with 2 variables: a <int>, b <int>
df %>% mutate(b = 1)
#> # A tibble: 0 × 2
#> # … with 2 variables: a <int>, b <dbl>
Created on 2023-01-11 by the reprex package (v2.0.1)
As long as aggregation functions always return a single value, even when they're given empty data, it seems like the current behavior shouldn't be a problem for summarize?
Yeah, because
Okay, thanks, this is all super helpful. Knowing that aggregate functions should always return some 1-length value, even when given empty data, was the missing piece! (And that recycling can also reduce a 1-length value to 0-length!)
@machow these recycling rules are actually the same as the broadcasting rules used by numpy, if you want a Python connection: https://numpy.org/doc/stable/user/basics.broadcasting.html#general-broadcasting-rules
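To make the recycling point concrete, here is a rough pure-Python sketch of the "only length-1 values get recycled" rule discussed above. The helper names `common_size` and `recycle` are hypothetical, written just for illustration; they are not dplyr, vctrs, or numpy functions, and the real implementations handle many more cases.

```python
def common_size(*sizes):
    """Return the shared size when only length-1 inputs may be recycled.

    Hypothetical helper: a simplified model of the recycling rule,
    not the actual dplyr/vctrs or numpy implementation.
    """
    target = 1  # stays 1 until a non-1 size is seen
    for size in sizes:
        if size == 1:
            continue  # length-1 inputs adapt to whatever the others are
        if target == 1:
            target = size
        elif size != target:
            raise ValueError(f"can't recycle sizes {target} and {size}")
    return target


def recycle(values, size):
    """Repeat a length-1 list to `size`; other lengths must already match."""
    if len(values) == size:
        return list(values)
    if len(values) == 1:
        return list(values) * size
    raise ValueError(f"can't recycle length {len(values)} to {size}")


# An empty column paired with a length-1 value: the common size is 0,
# so the length-1 value shrinks to length 0 (like mutate(b = 1) above).
size_empty_and_one = common_size(0, 1)
recycled = recycle([1], size_empty_and_one)

# Two length-1 values stay at length 1.
size_ones = common_size(1, 1)
```

Under this model, a length-1 summary paired with zero rows collapses to zero rows, while a length-2 value paired with zero rows would be an error, since neither side has length 1.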
Currently, afaict dplyr handles summarizing an empty dataframe as follows:
discard its value (edit: grouped summarize discards the value, ungrouped summarize keeps it)
This results in a sort of funky situation where operations like + raise a warning, because integer() + 1 results in a 0-length array. I wonder if a 0-length result when summarizing an empty data.frame should not be deprecated? It seems like there is already some special behavior around empty frames, and some operations returning 0-length results is likely in this case :/.
Created on 2023-01-10 by the reprex package (v2.0.1)
related to machow/siuba#467
edit: wait -- I just noticed ungrouped summarize keeps the value, but grouped summarize discards it...
Created on 2023-01-10 by the reprex package (v2.0.1)