strange behaviour when using filter + if_else #474

danielspringt · 2023-06-20T13:45:28Z

Hi - the following example produces strange results:

import siuba as sb
from siuba import _, mutate, count, if_else
from siuba.data import penguins

print(f'initial rows:{penguins.shape[0]}')
dat = penguins >> sb.filter(_.island != "Torgersen") 
print(f'rows after filtering:{dat.shape[0]}')

dat = dat >> mutate(
    binary_col = if_else(_.island == 'Biscoe', 1, 0)
    )

dat_count = dat >> count(_.binary_col)
print(dat_count)

I use filter to drop some of the rows. When I then call mutate on the filtered dataframe, the previously dropped rows
somehow still show up in the result.

I would expect a count output like:

   binary_col    n
0         0.0  110
1         1.0  130

but the dropped observations get labeled with NaN:

   binary_col    n
0         0.0  110
1         1.0  130
2         NaN   52

What am I doing wrong?

jonesworks · 2023-07-08T21:23:58Z

Note that after filtering, the index is not reset.

Instead, try this:

dat = (penguins >> sb.filter(_.island != "Torgersen")).reset_index(drop=True)
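Here is a minimal pandas-only sketch of why the un-reset index can matter. It assumes that if_else hands back a Series with a fresh 0..n-1 index; that is a guess about siuba's behaviour, not something confirmed in this thread. The toy data and the names filtered, flags, misaligned and aligned are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the penguins data (hypothetical values).
df = pd.DataFrame({"island": ["Torgersen", "Biscoe", "Dream", "Biscoe"]})

# Boolean filtering keeps the original labels: the index is [1, 2, 3], not [0, 1, 2].
filtered = df[df["island"] != "Torgersen"]

# A Series built from plain values gets a fresh index [0, 1, 2].
# Assumption: this mimics what if_else returns.
flags = pd.Series([1, 0, 1])

# Column assignment aligns on index labels, not on position.
# Label 3 has no partner in flags, so it becomes NaN, and the labels
# that do match are paired with values computed for other rows.
misaligned = filtered.assign(binary_col=flags)

# Resetting the index first makes the labels line up again.
aligned = filtered.reset_index(drop=True).assign(binary_col=flags)

print(misaligned)
print(aligned)
```

If this is what happens here, the NaN rows are not the dropped rows coming back; they are kept rows whose labels fall outside the new 0..n-1 range.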

I've encountered similar issues in R, where the resolution was droplevels().

Also, running this will perhaps shed a bit more light on the discrepancy between the output above and the expected output.

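from siuba import _, group_by, count, arrange
from siuba.data import penguins

# Rows per island, largest group first
(
    penguins
    >> group_by(_.island)
    >> count()
    >> arrange(-_.n)
)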
