how can i conditionally apply, parse dates
ritchie46 committed Dec 13, 2020
1 parent ce5890f commit 050c984
Showing 10 changed files with 90 additions and 6 deletions.
3 changes: 3 additions & 0 deletions Makefile
@@ -21,10 +21,13 @@ data/: .venv


run: data
	@mkdir -p book/src/outputs
	$(PYTHON) -m micro_bench.plot_results
	$(PYTHON) -m book.src.examples.lazy_chapter.data_head
	$(PYTHON) -m book.src.examples.lazy_chapter.predicate_pushdown_0
	$(PYTHON) -m book.src.examples.lazy_chapter.predicate_pushdown_1
	$(PYTHON) -m book.src.examples.lazy_chapter.projection_pushdown_0
	$(PYTHON) -m book.src.examples.how_can_i.groupby
	$(PYTHON) -m book.src.examples.how_can_i.aggregate
	$(PYTHON) -m book.src.examples.how_can_i.parse_dates
	$(PYTHON) -m book.src.examples.how_can_i.conditionally_apply
2 changes: 2 additions & 0 deletions book/src/SUMMARY.md
@@ -9,3 +9,5 @@
- [How can I?](how_can_i/intro.md)
    * [GroupBy](how_can_i/groupby.md)
    * [Aggregate](how_can_i/aggregate.md)
    * [Conditionally apply](how_can_i/conditionally_apply.md)
    * [Parse dates](how_can_i/parse_dates.md)
5 changes: 2 additions & 3 deletions book/src/examples/how_can_i/aggregate.py
@@ -1,9 +1,8 @@
import pypolars as pl
from pypolars.lazy import *

reddit = (
    pl.scan_csv("data/reddit.csv")
    .select([pl.sum("comment_karma"), pl.min("link_karma")])
reddit = pl.scan_csv("data/reddit.csv").select(
    [pl.sum("comment_karma"), pl.min("link_karma")]
)

if __name__ == "__main__":
16 changes: 16 additions & 0 deletions book/src/examples/how_can_i/conditionally_apply.py
@@ -0,0 +1,16 @@
import pypolars as pl
from pypolars.lazy import *
import numpy as np

df = pl.DataFrame({"range": np.arange(10), "left": ["foo"] * 10, "right": ["bar"] * 10})

out = df.lazy().with_column(
    when(col("range") >= 5)
    .then(col("left"))
    .otherwise(col("right"))
    .alias("foo_or_bar")
)

if __name__ == "__main__":
    with open("book/src/outputs/how_can_i_conditionally_apply.txt", "w") as f:
        f.write(str(out.collect()))
16 changes: 16 additions & 0 deletions book/src/examples/how_can_i/parse_dates.py
@@ -0,0 +1,16 @@
import pypolars as pl
from pypolars.lazy import *

df = pl.DataFrame(
{"date": ["2020-01-02", "2020-01-03", "2020-01-04"], "index": [1, 2, 3]}
)

parsed = df.lazy().with_column(
col("date").str_parse_date(pl.datatypes.Date32, "%Y-%m-%d")
)

if __name__ == "__main__":
    with open("book/src/outputs/how_can_i_parse_dates_0.txt", "w") as f:
        f.write(str(df))
    with open("book/src/outputs/how_can_i_parse_dates_1.txt", "w") as f:
        f.write(str(parsed.collect()))
3 changes: 2 additions & 1 deletion book/src/how_can_i/aggregate.md
@@ -1,9 +1,10 @@
# How can I aggregate?

Aggregations can be done in a `.select` or a `.with_column` method.
Aggregations can be done in a `.select` or a `.with_column`/`.with_columns` method.

If you want to do a specific aggregation on all columns, you can use the wildcard expression: `.select(col("*").sum())`
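
A rough sketch of what that could look like on a small made-up DataFrame (rather than the reddit data used below); the data and column names here are only for illustration:

```python
import pypolars as pl
from pypolars.lazy import *

# Small made-up DataFrame purely for illustration.
df = pl.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Sum every column at once with the wildcard expression.
totals = df.lazy().select(col("*").sum())
print(totals.collect())
```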

## Examples
```python
{{#include ../examples/how_can_i/aggregate.py:1:8}}
reddit.collect()
17 changes: 17 additions & 0 deletions book/src/how_can_i/conditionally_apply.md
@@ -0,0 +1,17 @@
# How can I conditionally apply?

You often want to modify or add a column to a DataFrame based on some condition/predicate. This is what the
`when().then().otherwise()` expressions are for. Because they read almost like a full English sentence, they need little
further explanation.


## Examples

```python
{{#include ../examples/how_can_i/conditionally_apply.py:1:10}}
print(out.collect())
```

```text
{{#include ../outputs/how_can_i_conditionally_apply.txt}}
```
4 changes: 3 additions & 1 deletion book/src/how_can_i/groupby.md
@@ -1,10 +1,12 @@
# How can I groupby?

The groupby operations is done with the `.groupby` method following by `.agg` method.
The groupby operation is done with the `.groupby` method followed by the `.agg` method.
In the `.agg` method you can do as many aggregations on as many columns as you want.

If you want to do a specific aggregation on all columns, you can use the wildcard expression: `.agg(col("*").sum())`
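
A rough sketch of that on a small made-up DataFrame (this assumes `.groupby` accepts a single column name; the data and column names are only for illustration):

```python
import pypolars as pl
from pypolars.lazy import *

# Small made-up DataFrame purely for illustration.
df = pl.DataFrame({"group": ["a", "a", "b"], "x": [1, 2, 3], "y": [10, 20, 30]})

# Sum every remaining column per group with the wildcard expression.
out = df.lazy().groupby("group").agg(col("*").sum())
print(out.collect())
```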

## Examples

```python
{{#include ../examples/how_can_i/groupby.py:1:8}}
reddit.collect()
28 changes: 28 additions & 0 deletions book/src/how_can_i/parse_dates.md
@@ -0,0 +1,28 @@
# Date parsing

Polars has two date data types:

* Date32
    - a naive date, represented as the number of days since the Unix epoch as a 32-bit signed integer.
    - Use this for Date objects.
* Date64
    - a naive datetime, represented as the number of milliseconds since the Unix epoch as a 64-bit signed integer.
    - Use this for DateTime objects.

Utf8 types can be parsed as one of the two date data types. You can let Polars try to parse the date(time) implicitly, or
apply your own `fmt` rule. Some examples are:

* `"%Y-%m-%d"` for `"2020-12-31"`
* `"%Y/%B/%d"` for `"2020/December/31"`
* `"%B %y"` for `"December 20"`

## Examples

```python
{{#include ../examples/how_can_i/parse_dates.py:4:10}}
print(parsed.collect())
```

```text
{{#include ../outputs/how_can_i_parse_dates_1.txt}}
```
2 changes: 1 addition & 1 deletion book/src/lazy_polars/intro.md
@@ -1,5 +1,5 @@
# Lazy Polars
We directly skip the eager API and dive into the lazy API of Polars. We will be exploring it's functionality by exploring
We directly skip the eager API and dive into the lazy API of Polars. We will explore its functionality using
two medium-large datasets of usernames: the [reddit usernames dataset](https://www.reddit.com/r/datasets/comments/9i8s5j/dataset_metadata_for_69_million_reddit_users_in/),
containing 69+ million rows, and a [runescape username dataset](https://github.com/RuneStar/name-cleanup-2014), containing
55+ million rows.
