group_by >> summarize on an empty df #467
Comments
Thanks for reporting. Digging a bit into dplyr, it seems to handle this case carefully:
For example:
library(dplyr)
df <- tibble(a = integer(), b = integer())
# in all the examples below, the value is discarded (e.g. 1, 1.2 get thrown away)
# c is an int
df %>% group_by(a) %>% summarize(c = 1)
# c is a dbl
df %>% group_by(a) %>% summarize(c = 1.2)
# c is an int, since sum(a) is 0
df %>% group_by(a) %>% summarize(c = sum(a))
Note also that the experimental behavior of summarize being able to return 0 or > 1 rows per group is deprecated (a new function, tentatively called reframe, will handle that behavior). The code above still seems to work on the main branch of dplyr, but this case now prints a warning:
df %>% group_by(a) %>% summarize(c = integer())
Ah, this is quite an interesting way of looking at it. "A grouped summarise always return 1 row per group"
Regarding this process:
It seems to me that there are no groups to group by, so there is no empty data to summarize with a function like
It seems that most summarizing methods in pandas like
If siuba needs to match dplyr behaviour on this point, then is there the possibility of adding an optional argument to the
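To make the "one row per group" rule concrete, here is a minimal pure-Python sketch. It is not how siuba, pandas, or dplyr implement this; `grouped_summarize` is a hypothetical helper, and `z` is just the column name from the original report. The point it illustrates is that the output column names come from the call itself, so they can be known even when there are zero groups and the summary function is never invoked.

```python
def grouped_summarize(rows, by, **summaries):
    """Hypothetical helper: one output row per distinct value of `by`.

    `rows` is a list of dicts; each summary maps the rows of one group
    to a single value. Returns (columns, out_rows).
    """
    # Bucket rows by their grouping key (dicts keep insertion order).
    groups = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row)

    # Column names are determined by the call, not by the data,
    # so they exist even when `groups` is empty.
    columns = [by, *summaries]

    out_rows = []
    for key, members in groups.items():
        out = {by: key}
        for name, fn in summaries.items():
            out[name] = fn(members)  # never runs when there are no groups
        out_rows.append(out)
    return columns, out_rows


def total_b(group):
    return sum(r["b"] for r in group)


# Empty input: zero groups, zero rows, but `z` is still a known column.
empty_cols, empty_rows = grouped_summarize([], "a", z=total_b)
print(empty_cols, empty_rows)

# Non-empty input: exactly one row per group.
data = [{"a": 1, "b": 10}, {"a": 1, "b": 5}, {"a": 2, "b": 7}]
cols, rows = grouped_summarize(data, "a", z=total_b)
print(cols, rows)
```

Under this reading, an empty input gives a zero-row result that still carries the `z` column, which matches the expectation in the original report. What dtype that empty column should have is a separate question that this sketch does not address.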
Consider the following:
This doesn't add the column z:
I would have expected