From 377dd933fd6cc6d6f926b496b6c2558c0aad27a4 Mon Sep 17 00:00:00 2001
From: drizk1 <rizkytennis@gmail.com>
Date: Tue, 30 Jul 2024 17:57:42 -0400
Subject: [PATCH 1/5] add tidierdata to frameworks

---
 docs/src/man/querying_frameworks.md | 100 ++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md
index abda7ec6f..98f3c088f 100644
--- a/docs/src/man/querying_frameworks.md
+++ b/docs/src/man/querying_frameworks.md
@@ -8,6 +8,106 @@ DataFramesMeta.jl, DataFrameMacros.jl and Query.jl. They implement a functionali
 These frameworks are designed both to make it easier for new users to start working with data frames in Julia
 and to allow advanced users to write more compact code.
 
+## TidierData.jl
+[TidierData.jl](https://tidierorg.github.io/TidierData.jl/latest/), part of the [Tidier](https://tidierorg.github.io/Tidier.jl/dev/) metapackage, is a macro based interface that works on `DataFrames`.  The instructions below are for version 0.16.0 of TidierData.jl.
+
+First, install the TidierData.jl package:
+
+```julia
+using Pkg
+Pkg.add("TidierData")
+```
+
+TidierData.jl allows clean, readable, and fast code for all major data transformation functions including [aggregating](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/summarize/), [pivoting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/pivots/), [nesting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/nesting/), and [joining](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/joins/). TidierData reexports `@chain` from Chains.jl in addition to Statistics.jl to streamline working data operations and pipelines. 
+
+TidierData abstracts away vectorization with "autovectorization" (which a user can override with `~`). This abstraction means 
+TidierData code can work directly on databases via [TidierDB](https://github.com/TidierOrg/TidierDB.jl), 
+which converts TidierData Chains to DuckDB-compatible SQL which then runs on the database (in addition to 10 other backends). 
+
+```jldoctest tidierdata
+julia> using TidierData
+
+julia> df = DataFrame(name=["John", "Sally", "Roger"],
+                      age=[54.0, 34.0, 79.0],
+                      children=[0, 2, 4])
+3×3 DataFrame
+ Row │ name    age      children
+     │ String  Float64  Int64
+─────┼───────────────────────────
+   1 │ John       54.0         0
+   2 │ Sally      34.0         2
+   3 │ Roger      79.0         4
+
+julia> @chain df begin
+         @filter(children != 2)
+         @select(name, num_children = children)
+       end
+2×2 DataFrame
+ Row │ name    num_children 
+     │ String  Int64        
+─────┼──────────────────────
+   1 │ John               0
+   2 │ Roger              4
+```
+
+Below are examples showcasing `@group_by` with `@summarize` or `@mutate` - analagous to the split, apply combine pattern.
+
+```jldoctest tidierdata
+julia> df = DataFrame(groups = repeat('a':'e', inner = 2), b_col = 1:10, c_col = 11:20,  d_col = 111:120)
+10×4 DataFrame
+ Row │ groups  b_col  c_col  d_col 
+     │ Char    Int64  Int64  Int64 
+─────┼─────────────────────────────
+   1 │ a           1     11    111
+   2 │ a           2     12    112
+   3 │ b           3     13    113
+   4 │ b           4     14    114
+   5 │ c           5     15    115
+   6 │ c           6     16    116
+   7 │ d           7     17    117
+   8 │ d           8     18    118
+   9 │ e           9     19    119
+  10 │ e          10     20    120
+
+julia> @chain df begin
+         @filter(b_col > 2)
+         @group_by(groups)
+         @summarise(median_b = median(b_col), across((b_col:d_col), mean))   
+       end
+4×5 DataFrame
+ Row │ groups  median_b  b_col_mean  c_col_mean  d_col_mean 
+     │ Char    Float64   Float64     Float64     Float64    
+─────┼──────────────────────────────────────────────────────
+   1 │ b            3.5         3.5        13.5       113.5
+   2 │ c            5.5         5.5        15.5       115.5
+   3 │ d            7.5         7.5        17.5       117.5
+   4 │ e            9.5         9.5        19.5       119.5
+
+julia> @chain df begin
+         @filter(b_col > 4 && c_col <= 18)
+         @group_by(groups)
+         @mutate begin
+            new_col = b_col + maximum(d_col)
+            new_col2 = c_col - maximum(d_col)
+            new_col3 = case_when(c_col >= 18 => "high",
+                                 c_col > 15 => "medium",
+                                 true => "low")
+         end
+         @select(starts_with("new"))
+         @ungroup
+      end
+4×4 DataFrame
+ Row │ groups  new_col  new_col2  new_col3 
+     │ Char    Int64    Int64     String   
+─────┼─────────────────────────────────────
+   1 │ c           121      -101  low
+   2 │ c           122      -100  medium
+   3 │ d           125      -101  medium
+   4 │ d           126      -100  high
+```
+
+For more examples, please visit the getting started [TidierData documentation page.](https://tidierorg.github.io/TidierData.jl/latest/)
+
 ## DataFramesMeta.jl
 
 The [DataFramesMeta.jl](https://github.com/JuliaStats/DataFramesMeta.jl) package

From e994df47d0c07377f297ea5269275b4a319db23d Mon Sep 17 00:00:00 2001
From: drizk1 <rizkytennis@gmail.com>
Date: Tue, 30 Jul 2024 18:41:32 -0400
Subject: [PATCH 2/5] adds TidierData to docs toml

---
 docs/Project.toml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/Project.toml b/docs/Project.toml
index f6a9f940e..d821a4f08 100755
--- a/docs/Project.toml
+++ b/docs/Project.toml
@@ -9,6 +9,7 @@ Missings = "e1d29d7a-bbdc-5cf2-9ac0-f12de2c33e28"
 Query = "1a8c2f83-1ff3-5112-b086-8aa67b057ba1"
 Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
 Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
+TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
 
 [compat]
 Documenter = "1"

From 935679b46dadd315978e62dc54a1cdd5bf6abdfe Mon Sep 17 00:00:00 2001
From: drizk1 <rizkytennis@gmail.com>
Date: Tue, 30 Jul 2024 19:22:14 -0400
Subject: [PATCH 3/5] change from begin end block

---
 docs/src/man/querying_frameworks.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md
index 98f3c088f..eb7f9158f 100644
--- a/docs/src/man/querying_frameworks.md
+++ b/docs/src/man/querying_frameworks.md
@@ -50,7 +50,7 @@ julia> @chain df begin
    2 │ Roger              4
 ```
 
-Below are examples showcasing `@group_by` with `@summarize` or `@mutate` - analagous to the split, apply combine pattern.
+Below are examples showcasing `@group_by` with `@summarize` or `@mutate` - analagous to the split, apply, combine pattern.
 
 ```jldoctest tidierdata
 julia> df = DataFrame(groups = repeat('a':'e', inner = 2), b_col = 1:10, c_col = 11:20,  d_col = 111:120)
@@ -86,13 +86,12 @@ julia> @chain df begin
 julia> @chain df begin
          @filter(b_col > 4 && c_col <= 18)
          @group_by(groups)
-         @mutate begin
-            new_col = b_col + maximum(d_col)
-            new_col2 = c_col - maximum(d_col)
+         @mutate(
+            new_col = b_col + maximum(d_col),
+            new_col2 = c_col - maximum(d_col),
             new_col3 = case_when(c_col >= 18 => "high",
                                  c_col > 15 => "medium",
-                                 true => "low")
-         end
+                                 true => "low"))
          @select(starts_with("new"))
          @ungroup
       end

From 115a6880cd395372e426ecfc44a9aac5e0f87fb1 Mon Sep 17 00:00:00 2001
From: Daniel Rizk <rizkytennis@gmail.com>
Date: Tue, 3 Sep 2024 21:07:34 -0400
Subject: [PATCH 4/5] add @kdpsingh edits

---
 docs/src/man/querying_frameworks.md | 94 ++++++++++++++++++++---------
 1 file changed, 67 insertions(+), 27 deletions(-)

diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md
index eb7f9158f..fd2d2ca4f 100644
--- a/docs/src/man/querying_frameworks.md
+++ b/docs/src/man/querying_frameworks.md
@@ -9,7 +9,10 @@ These frameworks are designed both to make it easier for new users to start work
 and to allow advanced users to write more compact code.
 
 ## TidierData.jl
-[TidierData.jl](https://tidierorg.github.io/TidierData.jl/latest/), part of the [Tidier](https://tidierorg.github.io/Tidier.jl/dev/) metapackage, is a macro based interface that works on `DataFrames`.  The instructions below are for version 0.16.0 of TidierData.jl.
+[TidierData.jl](https://tidierorg.github.io/TidierData.jl/latest/), part of 
+the [Tidier](https://tidierorg.github.io/Tidier.jl/dev/) ecosystem, is a macro-based 
+data analysis interface that wraps `DataFrames`.  The instructions below are for version 
+0.16.0 of TidierData.jl.
 
 First, install the TidierData.jl package:
 
@@ -18,18 +21,49 @@ using Pkg
 Pkg.add("TidierData")
 ```
 
-TidierData.jl allows clean, readable, and fast code for all major data transformation functions including [aggregating](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/summarize/), [pivoting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/pivots/), [nesting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/nesting/), and [joining](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/joins/). TidierData reexports `@chain` from Chains.jl in addition to Statistics.jl to streamline working data operations and pipelines. 
-
-TidierData abstracts away vectorization with "autovectorization" (which a user can override with `~`). This abstraction means 
-TidierData code can work directly on databases via [TidierDB](https://github.com/TidierOrg/TidierDB.jl), 
-which converts TidierData Chains to DuckDB-compatible SQL which then runs on the database (in addition to 10 other backends). 
+TidierData.jl enables clean, readable, and fast code for all major data transformation 
+functions including 
+[aggregating](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/summarize/), 
+[pivoting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/pivots/), 
+[nesting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/nesting/), 
+and [joining](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/joins/) 
+data frames. TidierData re-exports `DataFrame()` from DataFrames.jl, `@chain` from Chain.jl, and 
+Statistics.jl to streamline data operations. 
+
+TidierData.jl is heavily inspired by the `dplyr` and `tidyr` R packages (part of the R 
+`tidyverse`), which it aims to implement in pure Julia by wrapping DataFrames.jl. While
+TidierData.jl borrows conventions from the `tidyverse`, note that the 
+`tidyverse` itself is often not considered idiomatic R code. TidierData.jl brings 
+data analysis conventions from the `tidyverse` into Julia to offer the best of both worlds: 
+tidy syntax and the speed and flexibility of the Julia language.
+
+TidierData.jl has two major differences from other macro-based packages. First, TidierData.jl 
+uses tidy expressions. An example of a tidy expression is `a = mean(b)`, where `b` refers 
+to an existing column in the data frame, and `a` refers to either a new or existing column. 
+Referring to variables outside of the data frame requires prefixing variables with `!!`. 
+For example, `a = mean(!!b)` refers to a variable `b` outside the data frame. Second, 
+TidierData.jl aims to make broadcasting mostly invisible through 
+[auto-vectorization](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/autovec/). TidierData.jl currently uses a lookup table to decide which functions not to 
+vectorize; all other functions are automatically vectorized. This allows 
+concise expressions: `@mutate(df, a = a - mean(a))` transforms the `a` column 
+by subtracting the column mean from each value. Behind the scenes, the right-hand 
+expression is converted to `a .- mean(a)` because `mean()` is in the lookup table as a 
+function that should not be vectorized. Take a look at the 
+[auto-vectorization](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/autovec/) documentation for details.
+
+One major benefit of combining tidy expressions with auto-vectorization is that 
+TidierData.jl code (which uses DataFrames.jl as its backend) can work directly on 
+databases using [TidierDB.jl](https://github.com/TidierOrg/TidierDB.jl), 
+which converts tidy expressions into SQL, supporting DuckDB and several other backends.
 
 ```jldoctest tidierdata
 julia> using TidierData
 
-julia> df = DataFrame(name=["John", "Sally", "Roger"],
-                      age=[54.0, 34.0, 79.0],
-                      children=[0, 2, 4])
+julia> df = DataFrame(
+                name = ["John", "Sally", "Roger"],
+                age = [54.0, 34.0, 79.0],
+                children = [0, 2, 4]
+            )
 3×3 DataFrame
  Row │ name    age      children
      │ String  Float64  Int64
@@ -39,8 +73,8 @@ julia> df = DataFrame(name=["John", "Sally", "Roger"],
    3 │ Roger      79.0         4
 
 julia> @chain df begin
-         @filter(children != 2)
-         @select(name, num_children = children)
+           @filter(children != 2)
+           @select(name, num_children = children)
        end
 2×2 DataFrame
  Row │ name    num_children 
@@ -53,7 +87,12 @@ julia> @chain df begin
 Below are examples showcasing `@group_by` with `@summarize` or `@mutate` - analagous to the split, apply, combine pattern.
 
 ```jldoctest tidierdata
-julia> df = DataFrame(groups = repeat('a':'e', inner = 2), b_col = 1:10, c_col = 11:20,  d_col = 111:120)
+julia> df = DataFrame(
+                groups = repeat('a':'e', inner = 2), 
+                b_col = 1:10, 
+                c_col = 11:20, 
+                d_col = 111:120
+            )
 10×4 DataFrame
  Row │ groups  b_col  c_col  d_col 
      │ Char    Int64  Int64  Int64 
@@ -70,9 +109,10 @@ julia> df = DataFrame(groups = repeat('a':'e', inner = 2), b_col = 1:10, c_col =
   10 │ e          10     20    120
 
 julia> @chain df begin
-         @filter(b_col > 2)
-         @group_by(groups)
-         @summarise(median_b = median(b_col), across((b_col:d_col), mean))   
+           @filter(b_col > 2)
+           @group_by(groups)
+           @summarise(median_b = median(b_col), 
+                      across((b_col:d_col), mean))   
        end
 4×5 DataFrame
  Row │ groups  median_b  b_col_mean  c_col_mean  d_col_mean 
@@ -84,17 +124,17 @@ julia> @chain df begin
    4 │ e            9.5         9.5        19.5       119.5
 
 julia> @chain df begin
-         @filter(b_col > 4 && c_col <= 18)
-         @group_by(groups)
-         @mutate(
-            new_col = b_col + maximum(d_col),
-            new_col2 = c_col - maximum(d_col),
-            new_col3 = case_when(c_col >= 18 => "high",
-                                 c_col > 15 => "medium",
-                                 true => "low"))
-         @select(starts_with("new"))
-         @ungroup
-      end
+           @filter(b_col > 4 && c_col <= 18)
+           @group_by(groups)
+           @mutate(
+               new_col = b_col + maximum(d_col),
+               new_col2 = c_col - maximum(d_col),
+               new_col3 = case_when(c_col >= 18  => "high",
+                                    c_col > 15   => "medium",
+                                    true         => "low"))
+           @select(starts_with("new"))
+           @ungroup # required because `@mutate` does not ungroup
+       end
 4×4 DataFrame
  Row │ groups  new_col  new_col2  new_col3 
      │ Char    Int64    Int64     String   
@@ -105,7 +145,7 @@ julia> @chain df begin
    4 │ d           126      -100  high
 ```
 
-For more examples, please visit the getting started [TidierData documentation page.](https://tidierorg.github.io/TidierData.jl/latest/)
+For more examples, please visit the [TidierData.jl](https://tidierorg.github.io/TidierData.jl/latest/) documentation.
 
 ## DataFramesMeta.jl
 

From 1eb0da0cd3abf28cc9ff727105c0795fc120a88c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bogumi=C5=82=20Kami=C5=84ski?= <bkamins@sgh.waw.pl>
Date: Sat, 7 Sep 2024 12:48:13 +0200
Subject: [PATCH 5/5] Apply suggestions from code review

---
 docs/src/man/querying_frameworks.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/src/man/querying_frameworks.md b/docs/src/man/querying_frameworks.md
index fd2d2ca4f..dad7471b2 100644
--- a/docs/src/man/querying_frameworks.md
+++ b/docs/src/man/querying_frameworks.md
@@ -11,7 +11,7 @@ and to allow advanced users to write more compact code.
 ## TidierData.jl
 [TidierData.jl](https://tidierorg.github.io/TidierData.jl/latest/), part of 
 the [Tidier](https://tidierorg.github.io/Tidier.jl/dev/) ecosystem, is a macro-based 
-data analysis interface that wraps `DataFrames`.  The instructions below are for version 
+data analysis interface that wraps DataFrames.jl.  The instructions below are for version 
 0.16.0 of TidierData.jl.
 
 First, install the TidierData.jl package:
@@ -27,7 +27,7 @@ functions including
 [pivoting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/pivots/), 
 [nesting](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/nesting/), 
 and [joining](https://tidierorg.github.io/TidierData.jl/latest/examples/generated/UserGuide/joins/) 
-data frames. TidierData re-exports `DataFrame()` from DataFrames.jl, `@chain` from Chain.jl, and 
+data frames. TidierData re-exports `DataFrame` from DataFrames.jl, `@chain` from Chain.jl, and 
 Statistics.jl to streamline data operations. 
 
 TidierData.jl is heavily inspired by the `dplyr` and `tidyr` R packages (part of the R