From 72d87d26d9391026baf8baaeb601fd857bf62fb9 Mon Sep 17 00:00:00 2001
From: nathanrboyer <nathanrobertboyer@gmail.com>
Date: Fri, 13 Oct 2023 17:04:13 -0400
Subject: [PATCH] Move back to basics.md and add comparison

---
 docs/make.jl                           |    1 -
 docs/src/index.md                      |    7 -
 docs/src/man/basics.md                 | 2108 ++++++++++++++++++------
 docs/src/man/manipulation_functions.md | 1431 ----------------
 4 files changed, 1568 insertions(+), 1979 deletions(-)
 delete mode 100644 docs/src/man/manipulation_functions.md

diff --git a/docs/make.jl b/docs/make.jl
index d854981e2c..fa64782dac 100644
--- a/docs/make.jl
+++ b/docs/make.jl
@@ -34,7 +34,6 @@ makedocs(
             "Data manipulation frameworks" => "man/querying_frameworks.md",
             "Comparison with Python/R/Stata" => "man/comparisons.md"
         ],
-        "A Gentle Introduction to Data Frame Manipulation Functions" => "man/manipulation_functions.md",
         "API" => Any[
             "Types" => "lib/types.md",
             "Functions" => "lib/functions.md",
diff --git a/docs/src/index.md b/docs/src/index.md
index e259fd7f13..66ed6f3e5f 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -229,13 +229,6 @@ Pages = ["man/basics.md",
 Depth = 2
 ```
 
-## A Gentle Introduction to Data Frame Manipulation Functions
-
-```@contents
-Pages = ["man/manipulation_functions.md"]
-Depth = 1
-```
-
 ## API
 
 Only exported (i.e. available for use without `DataFrames.` qualifier after
diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index 4e8ba02f75..55937b849b 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1565,599 +1565,1627 @@ julia> german[Not(5), r"S"]
                 984 rows omitted
 ```
 
-## Basic Usage of Manipulation Functions
-
-In DataFrames.jl there are seven functions
-which can be used to perform operations on data frame columns:
-
-- `combine`: creates a new data frame populated with columns that result from
-  operations applied to the source data frame columns, potentially combining
-  its rows;
-- `select`: creates a new data frame that has the same number of rows as the
-  source data frame populated with columns that result from operations
-  applied to the source data frame columns;
-- `select!`: the same as `select` but updates the passed data frame in place;
-- `transform`: the same as `select` but keeps the columns that were already
-  present in the data frame (note though that these columns can be potentially
-  modified by the transformation passed to `transform`);
-- `transform!`: the same as `transform` but updates the passed data frame in
-  place.
-- `subset`: creates a new data frame populated with the same columns
-as the source data frame, but with only the rows where the passed operations are true;
-- `subset!`: the same as `subset` but updates the passed data frame in place;
-
-!!! Note Other Resources
-    * For formal, comprehensive explanations of all manipulation functions,
-    see the [Functions](@ref) API.
-    * For an informal, long-form tutorial on these functions,
-    see [A Gentle Introduction to Data Frame Manipulation Functions](@ref).
-
-Let us now move straight to examples using the German dataset.
+## Manipulation Functions
 
-```jldoctest dataframe
-julia> using Statistics
+The seven functions below can be used to manipulate data frames
+by applying operations to them.
+
+The functions without a `!` in their name
+will create a new data frame based on the source data frame,
+so you will probably want to assign the new data frame to a new variable,
+e.g. `new_df = transform(source_df, operation)`.
+The functions with a `!` at the end of their name
+will modify an existing data frame in-place,
+so there is typically no need to assign the result to a variable,
+e.g. `transform!(source_df, operation)` instead of
+`source_df = transform(source_df, operation)`.
+
+The number of columns and rows in the resultant data frame varies
+depending on the manipulation function employed.
+
+| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
+| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
+| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
+| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
+
+### Constructing Operations
+
+All of the functions above share the same syntax, most commonly
+`manipulation_function(dataframe, operation)`.
+The `operation` argument defines the
+operation to be applied to the source `dataframe`,
+and it can take any of the common forms listed below:
+
+`source_column_selector`
+: selects source column(s) without manipulating or renaming them
+
+   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
+
+`source_column_selector => operation_function`
+: passes source column(s) as arguments to a function
+and automatically names the resulting column(s)
+
+   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
+
+`source_column_selector => operation_function => new_column_names`
+: passes source column(s) as arguments to a function
+and names the resulting column(s) `new_column_names`
+
+   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
+
+   *(Not available for `subset` or `subset!`)*
+
+`source_column_selector => new_column_names`
+: renames a source column,
+or splits a column containing collection elements into multiple new columns
+
+   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
+
+   *(Not available for `subset` or `subset!`)*
+
+The `=>` operator constructs a
+[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
+which is a type that links one object to another.
+(Pairs are commonly used to create elements of a
+[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
+In DataFrames.jl manipulation functions,
+`Pair` arguments are used to define column `operations` to be performed.
+The examples shown above will be explained in more detail later.
+
+*The manipulation functions also have methods for applying multiple operations.
+See the later sections [Applying Multiple Operations per Manipulation](@ref)
+and [Broadcasting Operation Pairs](@ref) for more information.*
+
+#### `source_column_selector`
+Inside an `operation`, `source_column_selector` is usually a column name
+or column index which identifies a data frame column.
+
+`source_column_selector` may be used as the entire `operation`
+with `select` or `select!` to isolate or reorder columns.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
+3×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+
+julia> select(df, :b)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, "b")
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, 2)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+```
+
+`source_column_selector` may also be used as the entire `operation`
+with `subset` or `subset!` if the source column contains `Bool` values.
+
+```julia
+julia> df = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+       )
+4×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Scott   false
+   2 │ Jill     true
+   3 │ Erica   false
+   4 │ Jimmy    true
+
+julia> subset(df, :minor)
+2×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Jill     true
+   2 │ Jimmy    true
+```
+
+`source_column_selector` may instead be a collection of columns such as a vector,
+a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
+a `Not`, `Between`, `All`, or `Cols` expression,
+or a `:`.
+See the [Indexing](@ref) API for the full list of possible values with references.
+
+!!! Note
+      The Julia parser sometimes prevents `:` from being used by itself.
+      If you get
+      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
 
-julia> combine(german, :Age => mean => :mean_age)
+```julia
+julia> df = DataFrame(
+           id = [1, 2, 3],
+           first_name = ["José", "Emma", "Nathan"],
+           last_name = ["Garcia", "Marino", "Boyer"],
+           age = [61, 24, 33]
+       )
+3×4 DataFrame
+ Row │ id     first_name  last_name  age
+     │ Int64  String      String     Int64
+─────┼─────────────────────────────────────
+   1 │     1  José        Garcia        61
+   2 │     2  Emma        Marino        24
+   3 │     3  Nathan      Boyer         33
+
+julia> select(df, [:last_name, :first_name])
+3×2 DataFrame
+ Row │ last_name  first_name
+     │ String     String
+─────┼───────────────────────
+   1 │ Garcia     José
+   2 │ Marino     Emma
+   3 │ Boyer      Nathan
+
+julia> select(df, r"name")
+3×2 DataFrame
+ Row │ first_name  last_name
+     │ String      String
+─────┼───────────────────────
+   1 │ José        Garcia
+   2 │ Emma        Marino
+   3 │ Nathan      Boyer
+
+julia> select(df, Not(:id))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> select(df, Between(2,4))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> df2 = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+           male = [true, false, false, true],
+       )
+4×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼──────────────────────
+   1 │ Scott   false   true
+   2 │ Jill     true  false
+   3 │ Erica   false  false
+   4 │ Jimmy    true   true
+
+julia> subset(df2, [:minor, :male])
+1×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼─────────────────────
+   1 │ Jimmy    true  true
+```
+
+!!! Note
+      Using `Symbol` in `source_column_selector` will perform slightly faster than using `String`.
+      However, `String` is convenient when column names contain spaces.
+
+      All elements of `source_column_selector` must be the same type
+      (unless wrapped in `Cols`),
+      e.g. `subset(df2, [:minor, "male"])` will error
+      since `Symbol` and `String` are used simultaneously.
+
+#### `operation_function`
+Inside an `operation` pair, `operation_function` is a function
+which operates on data frame columns passed as vectors.
+When multiple columns are selected by `source_column_selector`,
+the `operation_function` will receive the columns as separate positional arguments
+in the order they were selected, e.g. `f(column1, column2, column3)`.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      4
+
+julia> combine(df, :a => sum)
 1×1 DataFrame
- Row │ mean_age
+ Row │ a_sum
+     │ Int64
+─────┼───────
+   1 │     6
+
+julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
+3×3 DataFrame
+ Row │ a      b      b_maximum
+     │ Int64  Int64  Int64
+─────┼─────────────────────────
+   1 │     1      4          5
+   2 │     2      5          5
+   3 │     3      4          5
+
+julia> transform(df, [:b, :a] => -) # vector subtraction is okay
+3×3 DataFrame
+ Row │ a      b      b_a_-
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      3
+   2 │     2      5      3
+   3 │     3      4      1
+
+julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
+ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
+```
+
+Don't worry! There is a quick fix for the previous error.
+If you want to apply a function to each element in a column
+instead of to the entire column vector,
+then you can wrap your element-wise function in `ByRow` like
+`ByRow(my_elementwise_function)`.
+This will apply `my_elementwise_function` to every element in the column
+and then collect the results back into a vector.
+
+```julia
+julia> transform(df, [:a, :b] => ByRow(*))
+3×3 DataFrame
+ Row │ a      b      a_b_*
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      4
+   2 │     2      5     10
+   3 │     3      4     12
+
+julia> transform(df, Cols(:) => ByRow(max))
+3×3 DataFrame
+ Row │ a      b      a_b_max
+     │ Int64  Int64  Int64
+─────┼───────────────────────
+   1 │     1      4        4
+   2 │     2      5        5
+   3 │     3      4        4
+
+julia> f(x) = x + 1
+f (generic function with 1 method)
+
+julia> transform(df, :a => ByRow(f))
+3×3 DataFrame
+ Row │ a      b      a_f
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+Alternatively, you may just want to define the function itself so it
+[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+over vectors.
+
+```julia
+julia> g(x) = x .+ 1
+g (generic function with 1 method)
+
+julia> transform(df, :a => g)
+3×3 DataFrame
+ Row │ a      b      a_g
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+
+julia> h(x, y) = x .+ y .+ 1
+h (generic function with 1 method)
+
+julia> transform(df, [:a, :b] => h)
+3×3 DataFrame
+ Row │ a      b      a_b_h
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      6
+   2 │     2      5      8
+   3 │     3      4      8
+```
+
+[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
+are a convenient way to define and use an `operation_function`
+all within the manipulation function call.
+
+```julia
+julia> select(df, :a => ByRow(x -> x + 1))
+3×1 DataFrame
+ Row │ a_function
+     │ Int64
+─────┼────────────
+   1 │          2
+   2 │          3
+   3 │          4
+
+julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
+3×3 DataFrame
+ Row │ a      b      a_b_function
+     │ Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4             6
+   2 │     2      5             9
+   3 │     3      4            10
+
+julia> subset(df, :b => ByRow(x -> x < 5))
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+
+julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+```
+
+!!! Note
+    An `operation_function` within a `subset` or `subset!` function call
+    must return a Boolean vector.
+    `true` elements in the Boolean vector will determine
+    which rows are retained in the resulting data frame.
+
+As demonstrated above, `DataFrame` columns are usually passed
+from `source_column_selector` to `operation_function` as one or more
+vector arguments.
+However, when `AsTable(source_column_selector)` is used,
+the selected columns are collected and passed as a single `NamedTuple`
+to `operation_function`.
+
+This is often useful when your `operation_function` is defined to operate
+on a single collection argument rather than on multiple positional arguments.
+The distinction is somewhat similar to the difference between the built-in
+`min` and `minimum` functions.
+`min` is defined to find the minimum value among multiple positional arguments,
+while `minimum` is defined to find the minimum value
+among the elements of a single collection argument.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      2
+   2 │     2      4      6      1
+
+julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_etc_min
+     │ Int64
+─────┼─────────────
+   1 │           1
+   2 │           1
+
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
+2×1 DataFrame
+ Row │ a_b_etc_minimum
+     │ Int64
+─────┼─────────────────
+   1 │               1
+   2 │               1
+
+julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_+
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     6
+
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
+2×1 DataFrame
+ Row │ a_b_sum
+     │ Int64
+─────┼─────────
+   1 │       4
+   2 │       6
+
+julia> using Statistics # contains the `mean` function
+
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
+2×1 DataFrame
+ Row │ b_c_d_mean
      │ Float64
-─────┼──────────
-   1 │   35.546
+─────┼────────────
+   1 │    3.33333
+   2 │    3.66667
+```
 
-julia> select(german, :Age => mean => :mean_age)
-1000×1 DataFrame
-  Row │ mean_age
-      │ Float64
-──────┼──────────
-    1 │   35.546
-    2 │   35.546
-    3 │   35.546
-    4 │   35.546
-    5 │   35.546
-    6 │   35.546
-    7 │   35.546
-    8 │   35.546
-  ⋮   │    ⋮
-  994 │   35.546
-  995 │   35.546
-  996 │   35.546
-  997 │   35.546
-  998 │   35.546
-  999 │   35.546
- 1000 │   35.546
- 985 rows omitted
-```
-
-As you can see in both cases the `mean` function was applied to `:Age` column
-and the result was stored in the `:mean_age` column. The difference between
-the `combine` and `select` functions is that the `combine` aggregates data
-and produces as many rows as were returned by the transformation function.
-On the other hand the `select` function always keeps the number of rows in a
-data frame to be the same as in the source data frame. Therefore in this case
-the result of the `mean` function got broadcasted.
-
-As `combine` potentially allows any number of rows to be produced as a result
-of the transformation if we have a combination of transformations where some of
-them produce a vector, and other produce scalars then scalars get broadcasted
-exactly like in  `select`. Here is an example:
+`AsTable` can also be used to pass columns to a function which operates
+on fields of a `NamedTuple`.
 
-```jldoctest dataframe
-julia> combine(german, :Age => mean => :mean_age, :Housing => unique => :housing)
-3×2 DataFrame
- Row │ mean_age  housing
-     │ Float64   String7
-─────┼───────────────────
-   1 │   35.546  own
-   2 │   35.546  free
-   3 │   35.546  rent
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      7
+   2 │     2      4      6      8
+
+julia> f(nt) = nt.a + nt.d
+f (generic function with 1 method)
+
+julia> transform(df, AsTable(:) => ByRow(f))
+2×5 DataFrame
+ Row │ a      b      c      d      a_b_etc_f
+     │ Int64  Int64  Int64  Int64  Int64
+─────┼───────────────────────────────────────
+   1 │     1      3      5      7          8
+   2 │     2      4      6      8         10
 ```
 
-Note, however, that it is not allowed to return vectors of different lengths in
-different transformations:
+As demonstrated above,
+in the `source_column_selector => operation_function` operation pair form,
+the results of an operation will be placed into a new column with an
+automatically-generated name based on the operation;
+the new column name will be the `operation_function` name
+appended to the source column name(s) with an underscore.
 
-```jldoctest dataframe
-julia> combine(german, :Age, :Housing => unique => :Housing)
-ERROR: ArgumentError: New columns must have the same length as old columns
+This automatic column naming behavior can be avoided in two ways.
+First, the operation result can be placed back into the original column
+with the original column name by changing the keyword argument `renamecols`
+from its default value (`true`) to `false`.
+This option prevents the function name from being appended to the column name
+as it usually would be.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10 and keep the column name `a`
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    11      5
+   2 │    12      6
+   3 │    13      7
+   4 │    14      8
 ```
 
-Let us discuss some other examples using `select`. Often we want to apply some
-function not to the whole column of a data frame, but rather to its individual
-elements. Normally we can achieve this using broadcasting like this:
+The second method to avoid the default manipulation column naming is to
+specify your own `new_column_names`.
 
-```jldoctest dataframe
-julia> select(german, :Sex => (x -> uppercase.(x)) => :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+#### `new_column_names`
+
+`new_column_names` can be included at the end of an `operation` pair to specify
+the name of the new column(s).
+`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, Cols(:) => ByRow(+) => :c)
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, Cols(:) => ByRow(+) => "a+b")
+4×3 DataFrame
+ Row │ a      b      a+b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, :a => ByRow(x->x+10) => "a+10")
+4×3 DataFrame
+ Row │ a      b      a+10
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     11
+   2 │     2      6     12
+   3 │     3      7     13
+   4 │     4      8     14
 ```
 
-This pattern is encountered very often in practice, therefore there is a `ByRow`
-convenience wrapper for a function that creates its broadcasted variant. In
-these examples `ByRow` is a special type used for selection operations to signal
-that the wrapped function should be applied to each element (row) of the
-selection. Here we are passing `ByRow` wrapper to target column name `:Sex`
-using `uppercase` function:
+The `source_column_selector => new_column_names` operation form
+can be used to rename columns without an intermediate function.
+However, the `rename` and `rename!` functions,
+which accept similar syntax,
+tend to be more convenient for this operation.
 
-```jldoctest dataframe
-julia> select(german, :Sex => ByRow(uppercase) => :SEX)
-1000×1 DataFrame
-  Row │ SEX
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => :apple) # adds column `apple`
+4×3 DataFrame
+ Row │ a      b      apple
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+
+julia> select(df, :a => :apple) # retains only column `apple`
+4×1 DataFrame
+ Row │ apple
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+   4 │     4
+
+julia> rename(df, :a => :apple) # returns a new data frame with column `a` renamed to `apple`
+4×2 DataFrame
+ Row │ apple  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
 ```
 
-In this case we transform our source column `:Age` using `ByRow` wrapper and
-automatically generate the target column name:
+If `new_column_names` already exist in the source data frame,
+those columns will be replaced in the existing column location
+rather than being added to the end.
+This can be done by manually specifying an existing column name
+or by using the `renamecols=false` keyword argument.
 
-```jldoctest dataframe
-julia> select(german, :Age, :Age => ByRow(sqrt))
-1000×2 DataFrame
-  Row │ Age    Age_sqrt
-      │ Int64  Float64
-──────┼─────────────────
-    1 │    67   8.18535
-    2 │    22   4.69042
-    3 │    49   7.0
-    4 │    45   6.7082
-    5 │    53   7.28011
-    6 │    35   5.91608
-    7 │    53   7.28011
-    8 │    35   5.91608
-  ⋮   │   ⋮       ⋮
-  994 │    30   5.47723
-  995 │    50   7.07107
-  996 │    31   5.56776
-  997 │    40   6.32456
-  998 │    38   6.16441
-  999 │    23   4.79583
- 1000 │    27   5.19615
-        985 rows omitted
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
+4×3 DataFrame
+ Row │ a      b      b_function
+     │ Int64  Int64  Int64
+─────┼──────────────────────────
+   1 │     1      5          15
+   2 │     2      6          16
+   3 │     3      7          17
+   4 │     4      8          18
+
+julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # overwrite column :b in the result
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1     15
+   2 │     2     16
+   3 │     3     17
+   4 │     4     18
+
+julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    15      5
+   2 │    16      6
+   3 │    17      7
+   4 │    18      8
 ```
 
-When we pass just a column (without the `=>` part) we can use any column selector
-that is allowed in indexing.
+More precisely, `renamecols=false` only prevents the function name from being appended to the generated column name, so the result replaces the source column only when a single source column is selected.
 
-Here we exclude the column `:Age` from the resulting data frame:
+```julia
+julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
+4×3 DataFrame
+ Row │ a      b      a_b_+
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
+4×3 DataFrame
+ Row │ a      b      a_b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
 
-```jldoctest dataframe
-julia> select(german, Not(:Age))
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     6      5
+   2 │     8      6
+   3 │    10      7
+   4 │    12      8
 ```
 
-In the next example we drop columns `"Age"`, `"Saving accounts"`,
-`"Checking account"`, `"Credit amount"`, and `"Purpose"`. Note that this time
-we use string column selectors because some of the column names have spaces
-in them:
+In the `source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may also be a renaming function which operates on a string
+to create the destination column names programmatically.
 
-```jldoctest dataframe
-julia> select(german, Not(["Age", "Saving accounts", "Checking account",
-                           "Credit amount", "Purpose"]))
-1000×5 DataFrame
-  Row │ id     Sex      Job    Housing  Duration
-      │ Int64  String7  Int64  String7  Int64
-──────┼──────────────────────────────────────────
-    1 │     0  male         2  own             6
-    2 │     1  female       2  own            48
-    3 │     2  male         1  own            12
-    4 │     3  male         2  free           42
-    5 │     4  male         2  free           24
-    6 │     5  male         1  free           36
-    7 │     6  male         2  own            24
-    8 │     7  male         3  rent           36
-  ⋮   │   ⋮       ⋮       ⋮       ⋮        ⋮
-  994 │   993  male         3  own            36
-  995 │   994  male         2  own            12
-  996 │   995  female       1  own            12
-  997 │   996  male         3  own            30
-  998 │   997  male         2  own            12
-  999 │   998  male         2  free           45
- 1000 │   999  male         2  own            45
-                                 985 rows omitted
-
-```
-
-As another example let us present that the `r"S"` regular expression we used
-above also works with `select`:
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
 
-```jldoctest dataframe
-julia> select(german, r"S")
-1000×2 DataFrame
-  Row │ Sex      Saving accounts
-      │ String7  String15
-──────┼──────────────────────────
-    1 │ male     NA
-    2 │ female   little
-    3 │ male     little
-    4 │ male     little
-    5 │ male     little
-    6 │ male     NA
-    7 │ male     quite rich
-    8 │ male     little
-  ⋮   │    ⋮            ⋮
-  994 │ male     little
-  995 │ male     NA
-  996 │ female   little
-  997 │ male     little
-  998 │ male     little
-  999 │ male     little
- 1000 │ male     moderate
-                 985 rows omitted
-```
-
-The benefit of `select` or `combine` over indexing is that it is easier
-to get the union of several column selectors, e.g.:
+julia> add_prefix(s) = "new_" * s
+add_prefix (generic function with 1 method)
 
-```jldoctest dataframe
-julia> select(german, r"S", "Job", 1)
-1000×4 DataFrame
-  Row │ Sex      Saving accounts  Job    id
-      │ String7  String15         Int64  Int64
-──────┼────────────────────────────────────────
-    1 │ male     NA                   2      0
-    2 │ female   little               2      1
-    3 │ male     little               1      2
-    4 │ male     little               2      3
-    5 │ male     little               2      4
-    6 │ male     NA                   1      5
-    7 │ male     quite rich           2      6
-    8 │ male     little               3      7
-  ⋮   │    ⋮            ⋮           ⋮      ⋮
-  994 │ male     little               3    993
-  995 │ male     NA                   2    994
-  996 │ female   little               1    995
-  997 │ male     little               3    996
-  998 │ male     little               2    997
-  999 │ male     little               2    998
- 1000 │ male     moderate             2    999
-                               985 rows omitted
-```
-
-Taking advantage of this flexibility here is an idiomatic pattern to move some
-column to the front of a data frame:
+julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+
+julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+```
+
+!!! note
+      It is a good idea to wrap anonymous functions in parentheses
+      to avoid the `=>` operator accidentally becoming part of the anonymous function.
+      The examples above do not work correctly without the parentheses!
+      ```julia
+      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼────────────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>add_prefix
+         2 │     2      6  [10, 20, 30, 40]=>add_prefix
+         3 │     3      7  [10, 20, 30, 40]=>add_prefix
+         4 │     4      8  [10, 20, 30, 40]=>add_prefix
+
+      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+      4×3 DataFrame
+       Row │ a      b      a_function
+           │ Int64  Int64  Pair…
+      ─────┼─────────────────────────────────────
+         1 │     1      5  [10, 20, 30, 40]=>#18
+         2 │     2      6  [10, 20, 30, 40]=>#18
+         3 │     3      7  [10, 20, 30, 40]=>#18
+         4 │     4      8  [10, 20, 30, 40]=>#18
+      ```
+
+A renaming function will not work in the
+`source_column_selector => new_column_names` operation form
+because a function in the second position of an operation pair is always
+interpreted as an `operation_function`, i.e. the pair is read as the
+`source_column_selector => operation_function` operation form.
+To work around this limitation, use the
+`source_column_selector => operation_function => new_column_names` operation form
+with `identity` as the `operation_function`.
 
-```jldoctest dataframe
-julia> select(german, "Sex", :)
-1000×10 DataFrame
-  Row │ Sex      id     Age    Job    Housing  Saving accounts  Checking accou ⋯
-      │ String7  Int64  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │ male         0     67      2  own      NA               little         ⋯
-    2 │ female       1     22      2  own      little           moderate
-    3 │ male         2     49      1  own      little           NA
-    4 │ male         3     45      2  free     little           little
-    5 │ male         4     53      2  free     little           little         ⋯
-    6 │ male         5     35      1  free     NA               NA
-    7 │ male         6     53      2  own      quite rich       NA
-    8 │ male         7     35      3  rent     little           moderate
-  ⋮   │    ⋮       ⋮      ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │ male       993     30      3  own      little           little         ⋯
-  995 │ male       994     50      2  own      NA               NA
-  996 │ female     995     31      1  own      little           NA
-  997 │ male       996     40      3  own      little           little
-  998 │ male       997     38      2  own      little           NA             ⋯
-  999 │ male       998     23      2  free     little           little
- 1000 │ male       999     27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+```julia
+julia> transform(df, :a => add_prefix)
+ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
+
+julia> transform(df, :a => identity => add_prefix)
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
 ```
 
-Below, we are simply passing source column and target column name to rename them
-(without specifying the transformation part):
+In this case, though,
+it is probably again more convenient to use the `rename` or `rename!` function
+rather than one of the manipulation functions:
+it renames the columns directly (in-place with `rename!`)
+and avoids the intermediate `operation_function`.
+
+```julia
+julia> rename(add_prefix, df)  # rename all columns with a function
+4×2 DataFrame
+ Row │ new_a  new_b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
+4×2 DataFrame
+ Row │ new_a  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
 
-```jldoctest dataframe
-julia> select(german, :Sex => :x1, :Age => :x2)
-1000×2 DataFrame
-  Row │ x1       x2
-      │ String7  Int64
-──────┼────────────────
-    1 │ male        67
-    2 │ female      22
-    3 │ male        49
-    4 │ male        45
-    5 │ male        53
-    6 │ male        35
-    7 │ male        53
-    8 │ male        35
-  ⋮   │    ⋮       ⋮
-  994 │ male        30
-  995 │ male        50
-  996 │ female      31
-  997 │ male        40
-  998 │ male        38
-  999 │ male        23
- 1000 │ male        27
-       985 rows omitted
+In the `source_column_selector => new_column_names` operation form,
+only a single source column may be selected per operation,
+so why is `new_column_names` plural?
+It is possible to split the data contained inside a single column
+into multiple new columns by supplying a vector of strings or symbols
+as `new_column_names`.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> transform(df, :data => [:first, :second]) # manual naming
+2×3 DataFrame
+ Row │ data    first  second
+     │ Tuple…  Int64  Int64
+─────┼───────────────────────
+   1 │ (1, 2)      1       2
+   2 │ (3, 4)      3       4
 ```
 
-It is important to note that `select` always returns a data frame, even if a
-single column selected as opposed to indexing syntax. Compare the following:
+This kind of data splitting can even be done automatically with `AsTable`.
 
-```jldoctest dataframe
-julia> select(german, :Age)
-1000×1 DataFrame
-  Row │ Age
-      │ Int64
-──────┼───────
-    1 │    67
-    2 │    22
-    3 │    49
-    4 │    45
-    5 │    53
-    6 │    35
-    7 │    53
-    8 │    35
-  ⋮   │   ⋮
-  994 │    30
-  995 │    50
-  996 │    31
-  997 │    40
-  998 │    38
-  999 │    23
- 1000 │    27
-985 rows omitted
+```julia
+julia> transform(df, :data => AsTable) # default automatic naming with tuples
+2×3 DataFrame
+ Row │ data    x1     x2
+     │ Tuple…  Int64  Int64
+─────┼──────────────────────
+   1 │ (1, 2)      1      2
+   2 │ (3, 4)      3      4
+```
 
-julia> german[:, :Age]
-1000-element Vector{Int64}:
- 67
- 22
- 49
- 45
- 53
- 35
- 53
- 35
- 61
- 28
-  ⋮
- 34
- 23
- 30
- 50
- 31
- 40
- 38
- 23
- 27
-```
-
-By default `select` copies columns of a passed source data frame. In order to
-avoid copying, pass the `copycols=false` keyword argument:
+If a data frame column contains `NamedTuple`s,
+then `AsTable` will preserve the field names.
+
+```julia
+julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
+2×1 DataFrame
+ Row │ data
+     │ NamedTup…
+─────┼────────────────
+   1 │ (a = 1, b = 2)
+   2 │ (a = 3, b = 4)
 
-```jldoctest dataframe
-julia> df = select(german, :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+julia> transform(df, :data => AsTable) # keeps names from named tuples
+2×3 DataFrame
+ Row │ data            a      b
+     │ NamedTup…       Int64  Int64
+─────┼──────────────────────────────
+   1 │ (a = 1, b = 2)      1      2
+   2 │ (a = 3, b = 4)      3      4
+```
 
-julia> df.Sex === german.Sex # copy
-false
+!!! note
+      To pack multiple columns into a single column of `NamedTuple`s
+      (the reverse of the above operation),
+      apply the `identity` function wrapped in `ByRow`, e.g.
+      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
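+      A minimal sketch of this packing operation is shown below.
+      The data frame and the name `packed` are illustrative assumptions,
+      and `select` is used instead of `transform` so that only the packed column is kept.
+      ```julia
+      using DataFrames
+
+      df = DataFrame(a = [1, 3], b = [2, 4])
+
+      # Each row of columns :a and :b is passed to `identity` as a NamedTuple
+      packed = select(df, AsTable([:a, :b]) => ByRow(identity) => :data)
+
+      packed.data == [(a = 1, b = 2), (a = 3, b = 4)]  # true
+      ```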
 
-julia> df = select(german, :Sex, copycols=false)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+Renaming functions also work for multi-column transformations,
+but they must operate on a vector of strings.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)])
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> new_names(v) = ["primary ", "secondary "] .* v
+new_names (generic function with 1 method)
+
+julia> transform(df, :data => identity => new_names)
+2×3 DataFrame
+ Row │ data    primary data  secondary data
+     │ Tuple…  Int64         Int64
+─────┼──────────────────────────────────────
+   1 │ (1, 2)             1               2
+   2 │ (3, 4)             3               4
+```
+
+### Applying Multiple Operations per Manipulation
+
+All data frame manipulation functions can accept multiple `operation` pairs
+at once using any of the following methods:
+- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
+- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
+- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
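+
+The three calling methods listed above can be compared with a short sketch.
+The data frame and the variable names here are illustrative assumptions;
+the point is only that each call receives the same two operations.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+# The same two operations passed in three different ways
+as_arguments = combine(df, :a => maximum, :b => sum)          # multiple arguments
+as_vector    = combine(df, [:a => maximum, :b => sum])        # vector argument
+as_matrix    = combine(df, [(:a => maximum) (:b => sum)])     # 1×2 matrix argument
+
+as_arguments == as_vector == as_matrix  # true
+```
+
+The parentheses in the matrix form are not essential to the technique;
+they are used here only to make the two matrix elements easy to tell apart.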
+
+Passing multiple operations is especially useful for the `select`, `select!`,
+and `combine` manipulation functions,
+since they only retain columns which are a result of the passed operations.
+
+```julia
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     1     50  hat
+   2 │     2     50  bat
+   3 │     3     60  cat
+   4 │     4     60  dog
+
+julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
+1×3 DataFrame
+ Row │ a_maximum  b_sum  c_join
+     │ Int64      Int64  String
+─────┼────────────────────────────────
+   1 │         4    220  hatbatcatdog
+
+julia> select(df, :c, :b, :a) # re-order columns
+4×3 DataFrame
+ Row │ c       b      a
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ hat        50      1
+   2 │ bat        50      2
+   3 │ cat        60      3
+   4 │ dog        60      4
+
+julia> select(df, :b, :) # `:` here means all other columns
+4×3 DataFrame
+ Row │ b      a      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │    50      1  hat
+   2 │    50      2  bat
+   3 │    60      3  cat
+   4 │    60      4  dog
+
+julia> select(
+           df,
+           :c => (x -> "a " .* x) => :one_c,
+           :a => (x -> 100x),
+           :b,
+           renamecols=false
+       ) # can mix operation forms
+4×3 DataFrame
+ Row │ one_c   a      b
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ a hat     100     50
+   2 │ a bat     200     50
+   3 │ a cat     300     60
+   4 │ a dog     400     60
+
+julia> select(
+           df,
+           :c => ByRow(reverse),
+           :c => ByRow(uppercase)
+       ) # multiple operations on same column
+4×2 DataFrame
+ Row │ c_reverse  c_uppercase
+     │ String     String
+─────┼────────────────────────
+   1 │ tah        HAT
+   2 │ tab        BAT
+   3 │ tac        CAT
+   4 │ god        DOG
+```
+
+In the last two examples,
+the manipulation function arguments were split across multiple lines.
+This is a good way to make manipulations with many operations more readable.
+
+Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
+on a particular row of data.
+
+```julia
+julia> subset(
+           df,
+           :b => ByRow(==(60)),
+           :c => ByRow(contains("at"))
+       ) # rows with 60 and "at"
+1×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     3     60  cat
+```
 
-julia> df.Sex === german.Sex # no-copy is performed
+Note that all operations within a single manipulation must use the data
+as it existed before the function call,
+i.e. you cannot use newly created columns for subsequent operations
+within the same manipulation.
+
+```julia
+julia> transform(
+           df,
+           [:a, :b] => ByRow(+) => :d,
+           :d => (x -> x ./ 2),
+       ) # requires two separate transformations
+ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
+
+julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
+4×4 DataFrame
+ Row │ a      b      c       d
+     │ Int64  Int64  String  Int64
+─────┼─────────────────────────────
+   1 │     1     50  hat        51
+   2 │     2     50  bat        52
+   3 │     3     60  cat        63
+   4 │     4     60  dog        64
+
+julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
+4×5 DataFrame
+ Row │ a      b      c       d      d_2
+     │ Int64  Int64  String  Int64  Float64
+─────┼──────────────────────────────────────
+   1 │     1     50  hat        51     25.5
+   2 │     2     50  bat        52     26.0
+   3 │     3     60  cat        63     31.5
+   4 │     4     60  dog        64     32.0
+```
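+
+When the intermediate column is not needed,
+one possible alternative is to fold both steps into a single `operation_function`.
+The sketch below assumes the same values for columns `a` and `b` as above;
+the name `halved` is only illustrative.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+# Add the two columns and halve the sum in one row-wise operation,
+# so no intermediate column :d has to exist first
+halved = transform(df, [:a, :b] => ByRow((a, b) -> (a + b) / 2) => :d_2)
+
+halved.d_2 == [25.5, 26.0, 31.5, 32.0]  # true
+```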
+
+
+### Broadcasting Operation Pairs
+
+[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+pairs with `.=>` is often a convenient way to generate multiple
+similar `operation`s to be applied within a single manipulation.
+Broadcasting within the `Pair` of an `operation` is no different than
+broadcasting in base Julia.
+The broadcasting `.=>` will be expanded into a vector of pairs
+(`[operation1, operation2, ...]`),
+and this expansion will occur before the manipulation function is invoked.
+Then the manipulation function will use the
+`manipulation_function(dataframe, [operation1, operation2, ...])` method.
+This process will be explained in more detail below.
+
+To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
+In DataFrames.jl, a symbol, string, or integer
+may be used to select a single column.
+Some `Pair`s with these types are below.
+
+```julia
+julia> typeof(:x => :a)
+Pair{Symbol, Symbol}
+
+julia> typeof("x" => "a")
+Pair{String, String}
+
+julia> typeof(1 => "a")
+Pair{Int64, String}
+```
+
+Any of the `Pair`s above could be used to rename the first column
+of the data frame below to `a`.
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+
+julia> select(df, :x => :a)
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+
+julia> select(df, 1 => "a")
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+```
+
+What should we do if we want to keep and rename both the `x` and `y` columns?
+One option is to supply a `Vector` of operation `Pair`s to `select`.
+`select` will process all of these operations in order.
+
+```julia
+julia> ["x" => "a", "y" => "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x" => "a", "y" => "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+We can use broadcasting to simplify the syntax above.
+
+```julia
+julia> ["x", "y"] .=> ["a", "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x", "y"] .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Notice that `select` sees the same `Vector{Pair{String, String}}` operation
+argument whether the individual pairs are written out explicitly or
+constructed with broadcasting.
+The broadcasting is applied before the call to `select`.
+
+```julia
+julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
 true
 ```
 
-To perform the selection operation in-place use `select!`:
+!!! note
+      These operation pairs (or vector of pairs) can be given variable names.
+      This is uncommon in practice but could be helpful for intermediate
+      inspection and testing.
+      ```julia
+      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+      typeof(operation)                      # check type of operation
+      first(operation)                       # check first pair in operation
+      last(operation)                        # check last pair in operation
+      select(df, operation)                  # manipulate `df` with `operation`
+      ```
 
-```jldoctest dataframe
-julia> select!(german, Not(:Age));
+In Julia,
+a scalar (non-collection) value broadcast with a vector is repeated in each resulting pair.
 
-julia> german
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+```julia
+julia> ["x", "y"] .=> :a    # :a is repeated
+2-element Vector{Pair{String, Symbol}}:
+ "x" => :a
+ "y" => :a
+
+julia> 1 .=> [:a, :b]       # 1 is repeated
+2-element Vector{Pair{Int64, Symbol}}:
+ 1 => :a
+ 1 => :b
 ```
 
-As you can see the `:Age` column was dropped from the `german` data frame.
+We can use this fact to easily broadcast an `operation_function` to multiple columns.
 
-The `transform` and `transform!` functions work identically to `select` and
-`select!` with the only difference that they retain all columns that are present
-in the source data frame. Here are some examples:
+```julia
+julia> f(x) = 2 * x
+f (generic function with 1 method)
 
-```jldoctest dataframe
-julia> german = copy(german_ref);
+julia> ["x", "y"] .=> f  # f is repeated
+2-element Vector{Pair{String, typeof(f)}}:
+ "x" => f
+ "y" => f
 
-julia> df = german_ref[1:8, 1:5]
-8×5 DataFrame
- Row │ id     Age    Sex      Job    Housing
-     │ Int64  Int64  String7  Int64  String7
-─────┼───────────────────────────────────────
-   1 │     0     67  male         2  own
-   2 │     1     22  female       2  own
-   3 │     2     49  male         1  own
-   4 │     3     45  male         2  free
-   5 │     4     53  male         2  free
-   6 │     5     35  male         1  free
-   7 │     6     53  male         2  own
-   8 │     7     35  male         3  rent
-
-julia> transform(df, :Age => maximum)
-8×6 DataFrame
- Row │ id     Age    Sex      Job    Housing  Age_maximum
-     │ Int64  Int64  String7  Int64  String7  Int64
+julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
+3×2 DataFrame
+ Row │ x_f    y_f
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
+2-element Vector{Pair{String, Pair{typeof(f), String}}}:
+ "x" => (f => "a")
+ "y" => (f => "b")
+
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+A renaming function can be applied to multiple columns in the same way.
+It will also be repeated in each operation `Pair`.
+
+```julia
+julia> newname(s::String) = s * "_new"
+newname (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
+2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
+ "x" => (f => newname)
+ "y" => (f => newname)
+
+julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
+3×2 DataFrame
+ Row │ x_new  y_new
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+```
+
+You can see from the type output above
+that a three element pair does not actually exist.
+A `Pair` (as the name implies) can only contain two elements.
+Thus, `:x => :y => :z` becomes a nested `Pair`,
+where `:x` is the first element and points to the `Pair` `:y => :z`,
+which is the second element.
+
+```julia
+julia> p = :x => :y => :z
+:x => (:y => :z)
+
+julia> p[1]
+:x
+
+julia> p[2]
+:y => :z
+
+julia> p[2][1]
+:y
+
+julia> p[2][2]
+:z
+
+julia> p[3] # there is no index 3 for a pair
+ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
+```
+
+In the previous examples, the source columns have been individually selected.
+When broadcasting multiple columns to the same function,
+similarities in the column names or positions can often be exploited to avoid
+tedious selection.
+Consider a data frame with temperature data at three different locations
+taken over time.
+
+```julia
+julia> df = DataFrame(Time = 1:4,
+                      Temperature1 = [20, 23, 25, 28],
+                      Temperature2 = [33, 37, 41, 44],
+                      Temperature3 = [15, 10, 4, 0])
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1            20            33            15
+   2 │     2            23            37            10
+   3 │     3            25            41             4
+   4 │     4            28            44             0
+```
+
+To convert all of the temperature data in one transformation,
+we just need to define a conversion function and broadcast
+it to all of the "Temperature" columns.
+
+```julia
+julia> celsius_to_kelvin(x) = x + 273
+celsius_to_kelvin (generic function with 1 method)
+
+julia> transform(
+           df,
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
+           renamecols = false
+       )
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1           293           306           288
+   2 │     2           296           310           283
+   3 │     3           298           314           277
+   4 │     4           301           317           273
+```
+
+Or, to rename the columns in the same manipulation:
+
+```julia
+julia> rename_function(s) = "Temperature $(last(s)) (K)"
+rename_function (generic function with 1 method)
+
+julia> select(
+           df,
+           "Time",
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
+       )
+4×4 DataFrame
+ Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
+     │ Int64  Int64              Int64              Int64
+─────┼────────────────────────────────────────────────────────────────
+   1 │     1                293                306                288
+   2 │     2                296                310                283
+   3 │     3                298                314                277
+   4 │     4                301                317                273
+```
+
+!!! note "Notes"
+      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+        Without `ByRow`, the manipulations above would have thrown
+        `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+      * Regular expression (`r""`) and `:` `source_column_selector`s
+        must be wrapped in `Cols` to be properly broadcasted
+        because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
+
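+As an alternative sketch (not the approach used above),
+the matching column names can be collected explicitly with `names(df, r"Temp")`
+and then broadcast like any other vector of strings.
+The variable names `temperature_columns`, `operations`, and `converted` are illustrative assumptions.
+
+```julia
+using DataFrames
+
+df = DataFrame(Time = 1:4,
+               Temperature1 = [20, 23, 25, 28],
+               Temperature2 = [33, 37, 41, 44],
+               Temperature3 = [15, 10, 4, 0])
+
+celsius_to_kelvin(x) = x + 273
+
+# Resolve the regular expression into a plain vector of column names first
+temperature_columns = names(df, r"Temp")
+
+# Broadcasting now builds one operation pair per matching column
+operations = temperature_columns .=> ByRow(celsius_to_kelvin)
+
+converted = transform(df, operations, renamecols = false)
+converted.Temperature1 == [293, 296, 298, 301]  # true
+```
+
+Saving the operations to a variable makes it possible to inspect
+exactly which pairs will be passed to `transform`.
+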
+You could also broadcast different columns to different functions
+by supplying a vector of functions.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> f1(x) = x .+ 1
+f1 (generic function with 1 method)
+
+julia> f2(x) = x ./ 10
+f2 (generic function with 1 method)
+
+julia> transform(df, [:a, :b] .=> [f1, f2])
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+However, this form is not much more convenient than supplying
+multiple individual operations.
+
+```julia
+julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+Perhaps more useful for broadcasting syntax
+is to apply multiple functions to multiple columns
+by changing the vector of functions to a 1-by-x matrix of functions.
+(Recall that a list, a vector, or a matrix of operation pairs are all valid
+for passing to the manipulation functions.)
+
+```julia
+julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
+2×2 Matrix{Pair{Symbol}}:
+ :a=>f1  :a=>f2
+ :b=>f1  :b=>f2
+
+julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
+4×6 DataFrame
+ Row │ a      b      a_f1   b_f1   a_f2     b_f2
+     │ Int64  Int64  Int64  Int64  Float64  Float64
+─────┼──────────────────────────────────────────────
+   1 │     1      5      2      6      0.1      0.5
+   2 │     2      6      3      7      0.2      0.6
+   3 │     3      7      4      8      0.3      0.7
+   4 │     4      8      5      9      0.4      0.8
+```
+
+In this way, every combination of selected columns and functions will be applied.
+
+Pair broadcasting is a simple but powerful tool
+that can be used in any of the manipulation functions listed under
+[Basic Usage of Manipulation Functions](@ref).
+Experiment for yourself to discover other useful operations.
+
+### Additional Resources
+
+More details and examples of operation pair syntax can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+(The official wording describing the syntax has changed since the blog post was written,
+but the examples are still illustrative.
+The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
+or Domain-Specific Language.)
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
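+
+As a rough sketch of what this can look like,
+the example below uses the `@chain` macro from Chain.jl
+to run two dependent transformations in sequence.
+It assumes that Chain.jl is installed alongside DataFrames.jl;
+the data and column names are only illustrative.
+
+```julia
+using DataFrames, Chain
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+# Each step receives the result of the previous step as its first argument,
+# so the second transformation can use the column :d created by the first
+result = @chain df begin
+    transform([:a, :b] => ByRow(+) => :d)
+    transform(:d => ByRow(x -> x / 2) => :d_2)
+end
+
+result.d_2 == [25.5, 26.0, 31.5, 32.0]  # true
+```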
+
+## Approach Comparison
+
+After that deep dive into [Manipulation Functions](@ref),
+it is a good idea to review the alternative approaches covered in
+[Getting and Setting Data in a Data Frame](@ref).
+Let us compare the two approaches with a few examples.
+
+### Convenience
+
+For simple operations,
+getting/setting data with dot syntax is often
+simpler than the equivalent data frame manipulation.
+Here we will add the two columns of our data frame together
+and place the result in a new third column.
+
+Setup:
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)  # define data frame
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Manipulation:
+
+```julia
+julia> transform(df, [:x, :y] => (+) => :z)  # returns a new data frame; `df` itself is unchanged
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
+
+Dot Syntax:
+
+```julia
+julia> df.x  # dot syntax returns a vector
+3-element Vector{Int64}:
+ 1
+ 2
+ 3
+
+julia> df.z = df.x + df.y
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
+
+Recall that the return type from a data frame manipulation function call is always a `DataFrame`.
+The return type of a data frame column accessed with dot syntax is a `Vector`.
+Thus the expression `df.x + df.y` gets the column data as vectors
+and returns the result of the vector addition.
+However, in that same line,
+we assigned the resultant `Vector` to a new column `z` in the data frame `df`.
+We could have instead assigned the resultant `Vector` to some other variable,
+and then `df` would not have been altered.
+The approach with dot syntax is very versatile
+since the data getting, mathematics, and data setting can be separate steps.
+
+```julia
+julia> df.x
+3-element Vector{Int64}:
+ 1
+ 2
+ 3
+
+julia> v = df.x + df.y
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df.z = v
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+```
+
+One downside to dot syntax is that the column name must be explicitly written in the code.
+Indexing syntax can perform a similar operation with dynamic column names.
+(Manipulation functions can also work with dynamic column names, as the next example shows.)
+
+```julia
+julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # define data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
+
+# Imagine the above data was read from a file or entered by a user at runtime.
+
+julia> df.c1  # dot syntax expects an explicit column name and cannot be used
+ERROR: ArgumentError: column name :c1 not found in the data frame
+
+julia> df[:, c3] = df[:, c1] + df[:, c2]  # access columns with names stored in variables
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
 ─────┼────────────────────────────────────────────────────
-   1 │     0     67  male         2  own               67
-   2 │     1     22  female       2  own               67
-   3 │     2     49  male         1  own               67
-   4 │     3     45  male         2  free              67
-   5 │     4     53  male         2  free              67
-   6 │     5     35  male         1  free              67
-   7 │     6     53  male         2  own               67
-   8 │     7     35  male         3  rent              67
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
 ```
 
-In the example below we are swapping values stored in columns `:Sex` and `:Age`:
+One benefit of using manipulation functions is that
+the name of the data frame only needs to be written once.
 
-```jldoctest dataframe
-julia> transform(german, :Age => :Sex, :Sex => :Age)
-1000×10 DataFrame
-  Row │ id     Age      Sex    Job    Housing  Saving accounts  Checking accou ⋯
-      │ Int64  String7  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male        67      2  own      NA               little         ⋯
-    2 │     1  female      22      2  own      little           moderate
-    3 │     2  male        49      1  own      little           NA
-    4 │     3  male        45      2  free     little           little
-    5 │     4  male        53      2  free     little           little         ⋯
-    6 │     5  male        35      1  free     NA               NA
-    7 │     6  male        53      2  own      quite rich       NA
-    8 │     7  male        35      3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │   993  male        30      3  own      little           little         ⋯
-  995 │   994  male        50      2  own      NA               NA
-  996 │   995  female      31      1  own      little           NA
-  997 │   996  male        40      3  own      little           little
-  998 │   997  male        38      2  own      little           NA             ⋯
-  999 │   998  male        23      2  free     little           little
- 1000 │   999  male        27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+Setup:
+
+```julia
+julia> my_very_long_data_frame_name = DataFrame(
+           "My First Column" => 1:3,
+           "My Second Column" => 4:6
+       )  # define data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
 ```
 
-If we give more than one source column to a transformation they are passed as
-consecutive positional arguments. So for example the
-`[:Age, :Job] => (+) => :res` transformation below evaluates `+(df1.Age, df1.Job)`
-(which adds two columns) and stores the result in the `:res` column:
+Manipulation:
 
-```jldoctest dataframe
-julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
-1000×3 DataFrame
-  Row │ Age    Job    res
-      │ Int64  Int64  Int64
-──────┼─────────────────────
-    1 │    67      2     69
-    2 │    22      2     24
-    3 │    49      1     50
-    4 │    45      2     47
-    5 │    53      2     55
-    6 │    35      1     36
-    7 │    53      2     55
-    8 │    35      3     38
-  ⋮   │   ⋮      ⋮      ⋮
-  994 │    30      3     33
-  995 │    50      2     52
-  996 │    31      1     32
-  997 │    40      3     43
-  998 │    38      2     40
-  999 │    23      2     25
- 1000 │    27      2     29
-            985 rows omitted
-```
-
-This concludes the introductory examples of data frame manipulations.
-See later sections of the manual,
-particularly [A Gentle Introduction to Data Frame Manipulation Functions](@ref),
-for additional explanations and functionality,
-including how to broadcast operation functions and operation pairs
-and how to pass or produce multiple columns using `AsTable`.
+```julia
+
+julia> transform!(my_very_long_data_frame_name, [c1, c2] => (+) => c3)
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
+
+Indexing:
+
+```julia
+julia> my_very_long_data_frame_name[:, c3] = my_very_long_data_frame_name[:, c1] + my_very_long_data_frame_name[:, c2]
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> my_very_long_data_frame_name  # see that the previous expression updated the data frame
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
+
+### Speed
+
+TODO: Compare speed, memory, and view options (`@view`, `!`, `:`, `copycols=false`).
+(May need someone else to write this part unless I do more studying.)
diff --git a/docs/src/man/manipulation_functions.md b/docs/src/man/manipulation_functions.md
deleted file mode 100644
index 72df944763..0000000000
--- a/docs/src/man/manipulation_functions.md
+++ /dev/null
@@ -1,1431 +0,0 @@
-# A Gentle Introduction to Data Frame Manipulation Functions
-
-The seven functions below can be used to manipulate data frames
-by applying operations to them.
-This section of the documentation aims to methodically build understanding
-of these functions and their possible arguments
-by reinforcing foundational concepts and slowly increasing complexity.
-
-The functions without a `!` in their name
-will create a new data frame based on the source data frame,
-so you will probably want to store the new data frame to a new variable name,
-e.g. `new_df = transform(source_df, operation)`.
-The functions with a `!` at the end of their name
-will modify an existing data frame in-place,
-so there is typically no need to assign the result to a variable,
-e.g. `transform!(source_df, operation)` instead of
-`source_df = transform(source_df, operation)`.
-
-The number of columns and rows in the resultant data frame varies
-depending on the manipulation function employed.
-
-| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
-| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
-| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
-| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
-| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
-| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
-| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
-
-## Constructing Operations
-
-All of the functions above use the same syntax which is commonly
-`manipulation_function(dataframe, operation)`.
-The `operation` argument defines the
-operation to be applied to the source `dataframe`,
-and it can take any of the following common forms explained below:
-
-`source_column_selector`
-: selects source column(s) without manipulating or renaming them
-
-   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
-
-`source_column_selector => operation_function`
-: passes source column(s) as arguments to a function
-and automatically names the resulting column(s)
-
-   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
-
-`source_column_selector => operation_function => new_column_names`
-: passes source column(s) as arguments to a function
-and names the resulting column(s) `new_column_names`
-
-   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => + => :a_plus_b`
-
-   *(Not available for `subset` or `subset!`)*
-
-`source_column_selector => new_column_names`
-: renames a source column,
-or splits a column containing collection elements into multiple new columns
-
-   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
-
-   (*Not available for `subset` or `subset!`*)
-
-The `=>` operator constructs a
-[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
-which is a type to link one object to another.
-(Pairs are commonly used to create elements of a
-[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
-In DataFrames.jl manipulation functions,
-`Pair` arguments are used to define column `operations` to be performed.
-The examples shown above will be explained in more detail later.
-
-*The manipulation functions also have methods for applying multiple operations.
-See the later sections [Applying Multiple Operations per Manipulation](@ref)
-and [Broadcasting Operation Pairs](@ref) for more information.*
-
-### `source_column_selector`
-Inside an `operation`, `source_column_selector` is usually a column name
-or column index which identifies a data frame column.
-
-`source_column_selector` may be used as the entire `operation`
-with `select` or `select!` to isolate or reorder columns.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
-3×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      7
-   2 │     2      5      8
-   3 │     3      6      9
-
-julia> select(df, :b)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, "b")
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-
-julia> select(df, 2)
-3×1 DataFrame
- Row │ b
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     5
-   3 │     6
-```
-
-`source_column_selector` may also be used as the entire `operation`
-with `subset` or `subset!` if the source column contains `Bool` values.
-
-```julia
-julia> df = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-       )
-4×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Scott   false
-   2 │ Jill     true
-   3 │ Erica   false
-   4 │ Jimmy    true
-
-julia> subset(df, :minor)
-2×2 DataFrame
- Row │ name    minor
-     │ String  Bool
-─────┼───────────────
-   1 │ Jill     true
-   2 │ Jimmy    true
-```
-
-`source_column_selector` may instead be a collection of columns such as a vector,
-a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
-a `Not`, `Between`, `All`, or `Cols` expression,
-or a `:`.
-See the [Indexing](@ref) API for the full list of possible values with references.
-
-!!! Note
-      The Julia parser sometimes prevents `:` from being used by itself.
-      If you get
-      `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
-      try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
-
-```julia
-julia> df = DataFrame(
-           id = [1, 2, 3],
-           first_name = ["José", "Emma", "Nathan"],
-           last_name = ["Garcia", "Marino", "Boyer"],
-           age = [61, 24, 33]
-       )
-3×4 DataFrame
- Row │ id     first_name  last_name  age
-     │ Int64  String      String     Int64
-─────┼─────────────────────────────────────
-   1 │     1  José        Garcia        61
-   2 │     2  Emma        Marino        24
-   3 │     3  Nathan      Boyer         33
-
-julia> select(df, [:last_name, :first_name])
-3×2 DataFrame
- Row │ last_name  first_name
-     │ String     String
-─────┼───────────────────────
-   1 │ Garcia     José
-   2 │ Marino     Emma
-   3 │ Boyer      Nathan
-
-julia> select(df, r"name")
-3×2 DataFrame
- Row │ first_name  last_name
-     │ String      String
-─────┼───────────────────────
-   1 │ José        Garcia
-   2 │ Emma        Marino
-   3 │ Nathan      Boyer
-
-julia> select(df, Not(:id))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> select(df, Between(2,4))
-3×3 DataFrame
- Row │ first_name  last_name  age
-     │ String      String     Int64
-─────┼──────────────────────────────
-   1 │ José        Garcia        61
-   2 │ Emma        Marino        24
-   3 │ Nathan      Boyer         33
-
-julia> df2 = DataFrame(
-           name = ["Scott", "Jill", "Erica", "Jimmy"],
-           minor = [false, true, false, true],
-           male = [true, false, false, true],
-       )
-4×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼──────────────────────
-   1 │ Scott   false   true
-   2 │ Jill     true  false
-   3 │ Erica   false  false
-   4 │ Jimmy    true   true
-
-julia> subset(df2, [:minor, :male])
-1×3 DataFrame
- Row │ name    minor  male
-     │ String  Bool   Bool
-─────┼─────────────────────
-   1 │ Jimmy    true  true
-```
-
-### `operation_function`
-Inside an `operation` pair, `operation_function` is a function
-which operates on data frame columns passed as vectors.
-When multiple columns are selected by `source_column_selector`,
-the `operation_function` will receive the columns as separate positional arguments
-in the order they were selected, e.g. `f(column1, column2, column3)`.
-
-```julia
-julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      4
-
-julia> combine(df, :a => sum)
-1×1 DataFrame
- Row │ a_sum
-     │ Int64
-─────┼───────
-   1 │     6
-
-julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
-3×3 DataFrame
- Row │ a      b      b_maximum
-     │ Int64  Int64  Int64
-─────┼─────────────────────────
-   1 │     1      4          5
-   2 │     2      5          5
-   3 │     3      4          5
-
-julia> transform(df, [:b, :a] => -) # vector subtraction is okay
-3×3 DataFrame
- Row │ a      b      b_a_-
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      3
-   2 │     2      5      3
-   3 │     3      4      1
-
-julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
-ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
-```
-
-Don't worry! There is a quick fix for the previous error.
-If you want to apply a function to each element in a column
-instead of to the entire column vector,
-then you can wrap your element-wise function in `ByRow` like
-`ByRow(my_elementwise_function)`.
-This will apply `my_elementwise_function` to every element in the column
-and then collect the results back into a vector.
-
-```julia
-julia> transform(df, [:a, :b] => ByRow(*))
-3×3 DataFrame
- Row │ a      b      a_b_*
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      4
-   2 │     2      5     10
-   3 │     3      4     12
-
-julia> transform(df, Cols(:) => ByRow(max))
-3×3 DataFrame
- Row │ a      b      a_b_max
-     │ Int64  Int64  Int64
-─────┼───────────────────────
-   1 │     1      4        4
-   2 │     2      5        5
-   3 │     3      4        4
-
-julia> f(x) = x + 1
-f (generic function with 1 method)
-
-julia> transform(df, :a => ByRow(f))
-3×3 DataFrame
- Row │ a      b      a_f
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-```
-
-Alternatively, you may just want to define the function itself so it
-[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-over vectors.
-
-```julia
-julia> g(x) = x .+ 1
-g (generic function with 1 method)
-
-julia> transform(df, :a => g)
-3×3 DataFrame
- Row │ a      b      a_g
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      2
-   2 │     2      5      3
-   3 │     3      4      4
-
-julia> h(x, y) = x .+ y .+ 1
-h (generic function with 1 method)
-
-julia> transform(df, [:a, :b] => h)
-3×3 DataFrame
- Row │ a      b      a_b_h
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      4      6
-   2 │     2      5      8
-   3 │     3      4      8
-```
-
-[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
-are a convenient way to define and use an `operation_function`
-all within the manipulation function call.
-
-```julia
-julia> select(df, :a => ByRow(x -> x + 1))
-3×1 DataFrame
- Row │ a_function
-     │ Int64
-─────┼────────────
-   1 │          2
-   2 │          3
-   3 │          4
-
-julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
-3×3 DataFrame
- Row │ a      b      a_b_function
-     │ Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      4             6
-   2 │     2      5             9
-   3 │     3      4            10
-
-julia> subset(df, :b => ByRow(x -> x < 5))
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-
-julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
-2×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     3      4
-```
-
-!!! Note
-    `operation_functions` within `subset` or `subset!` function calls
-    must return a Boolean vector.
-    `true` elements in the Boolean vector will determine
-    which rows are retained in the resulting data frame.
-
-As demonstrated above, `DataFrame` columns are usually passed
-from `source_column_selector` to `operation_function` as one or more
-vector arguments.
-However, when `AsTable(source_column_selector)` is used,
-the selected columns are collected and passed as a single `NamedTuple`
-to `operation_function`.
-
-This is often useful when your `operation_function` is defined to operate
-on a single collection argument rather than on multiple positional arguments.
-The distinction is somewhat similar to the difference between the built-in
-`min` and `minimum` functions.
-`min` is defined to find the minimum value among multiple positional arguments,
-while `minimum` is defined to find the minimum value
-among the elements of a single collection argument.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      2
-   2 │     2      4      6      1
-
-julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
-2×1 DataFrame
- Row │ a_b_etc_min
-     │ Int64
-─────┼─────────────
-   1 │           1
-   2 │           1
-
-julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
-2×1 DataFrame
- Row │ a_b_etc_minimum
-     │ Int64
-─────┼─────────────────
-   1 │               1
-   2 │               1
-
-julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on a multiple arguments
-2×1 DataFrame
- Row │ a_b_+
-     │ Int64
-─────┼───────
-   1 │     4
-   2 │     6
-
-julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
-2×1 DataFrame
- Row │ a_b_sum
-     │ Int64
-─────┼─────────
-   1 │       4
-   2 │       6
-
-julia> using Statistics # contains the `mean` function
-
-julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
-2×1 DataFrame
- Row │ b_c_d_mean
-     │ Float64
-─────┼────────────
-   1 │    3.33333
-   2 │    3.66667
-```
-
-`AsTable` can also be used to pass columns to a function which operates
-on fields of a `NamedTuple`.
-
-```julia
-julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
-2×4 DataFrame
- Row │ a      b      c      d
-     │ Int64  Int64  Int64  Int64
-─────┼────────────────────────────
-   1 │     1      3      5      7
-   2 │     2      4      6      8
-
-julia> f(nt) = nt.a + nt.d
-f (generic function with 1 method)
-
-julia> transform(df, AsTable(:) => ByRow(f))
-2×5 DataFrame
- Row │ a      b      c      d      a_b_etc_f
-     │ Int64  Int64  Int64  Int64  Int64
-─────┼───────────────────────────────────────
-   1 │     1      3      5      7          8
-   2 │     2      4      6      8         10
-```
-
-As demonstrated above,
-in the `source_column_selector => operation_function` operation pair form,
-the results of an operation will be placed into a new column with an
-automatically-generated name based on the operation;
-the new column name will be the `operation_function` name
-appended to the source column name(s) with an underscore.
-
-This automatic column naming behavior can be avoided in two ways.
-First, the operation result can be placed back into the original column
-with the original column name by switching the keyword argument `renamecols`
-from its default value (`true`) to `renamecols=false`.
-This option prevents the function name from being appended to the column name
-as it usually would be.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10 in-place
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │    11      5
-   2 │    12      6
-   3 │    13      7
-   4 │    14      8
-```
-
-The second method to avoid the default manipulation column naming is to
-specify your own `new_column_names`.
-
-### `new_column_names`
-
-`new_column_names` can be included at the end of an `operation` pair to specify
-the name of the new column(s).
-`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, Cols(:) => ByRow(+) => :c)
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, Cols(:) => ByRow(+) => "a+b")
-4×3 DataFrame
- Row │ a      b      a+b
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, :a => ByRow(x->x+10) => "a+10")
-4×3 DataFrame
- Row │ a      b      a+10
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     11
-   2 │     2      6     12
-   3 │     3      7     13
-   4 │     4      8     14
-```
-
-The `source_column_selector => new_column_names` operation form
-can be used to rename columns without an intermediate function.
-However, there are `rename` and `rename!` functions,
-which accept similar syntax,
-that tend to be more useful for this operation.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :a => :apple) # adds column `apple`
-4×3 DataFrame
- Row │ a      b      apple
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-
-julia> select(df, :a => :apple) # retains only column `apple`
-4×1 DataFrame
- Row │ apple
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-   4 │     4
-
-julia> rename(df, :a => :apple) # renames column `a` to `apple` in-place
-4×2 DataFrame
- Row │ apple  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-If `new_column_names` already exist in the source data frame,
-those columns will be replaced in the existing column location
-rather than being added to the end.
-This can be done by manually specifying an existing column name
-or by using the `renamecols=false` keyword argument.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
-4×3 DataFrame
- Row │ a      b      b_function
-     │ Int64  Int64  Int64
-─────┼──────────────────────────
-   1 │     1      5          15
-   2 │     2      6          16
-   3 │     3      7          17
-   4 │     4      8          18
-
-julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # transform column in-place
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1     15
-   2 │     2     16
-   3 │     3     17
-   4 │     4     18
-
-julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │    15      5
-   2 │    16      6
-   3 │    17      7
-   4 │    18      8
-```
-
-Actually, `renamecols=false` just prevents the function name from being appended to the final column name such that the operation is *usually* returned to the same column.
-
-```julia
-julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
-4×3 DataFrame
- Row │ a      b      a_b_+
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
-4×3 DataFrame
- Row │ a      b      a_b
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      6
-   2 │     2      6      8
-   3 │     3      7     10
-   4 │     4      8     12
-
-julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     6      5
-   2 │     8      6
-   3 │    10      7
-   4 │    12      8
-```
-
-In the `source_column_selector => operation_function => new_column_names` operation form,
-`new_column_names` may also be a renaming function which operates on a string
-to create the destination column names programmatically.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> add_prefix(s) = "new_" * s
-add_prefix (generic function with 1 method)
-
-julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-
-julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5     10
-   2 │     2      6     20
-   3 │     3      7     30
-   4 │     4      8     40
-```
-
-!!! note
-      It is a good idea to wrap anonymous functions in parentheses
-      to avoid the `=>` operator accidentally becoming part of the anonymous function.
-      The examples above do not work correctly without the parentheses!
-      ```julia
-      julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼────────────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>add_prefix
-         2 │     2      6  [10, 20, 30, 40]=>add_prefix
-         3 │     3      7  [10, 20, 30, 40]=>add_prefix
-         4 │     4      8  [10, 20, 30, 40]=>add_prefix
-
-      julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
-      4×3 DataFrame
-       Row │ a      b      a_function
-           │ Int64  Int64  Pair…
-      ─────┼─────────────────────────────────────
-         1 │     1      5  [10, 20, 30, 40]=>#18
-         2 │     2      6  [10, 20, 30, 40]=>#18
-         3 │     3      7  [10, 20, 30, 40]=>#18
-         4 │     4      8  [10, 20, 30, 40]=>#18
-      ```
-
-A renaming function will not work in the
-`source_column_selector => new_column_names` operation form
-because a function in the second element of the operation pair is assumed to take
-the `source_column_selector => operation_function` operation form.
-To work around this limitation, use the
-`source_column_selector => operation_function => new_column_names` operation form
-with `identity` as the `operation_function`.
-
-```julia
-julia> transform(df, :a => add_prefix)
-ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
-
-julia> transform(df, :a => identity => add_prefix)
-4×3 DataFrame
- Row │ a      b      new_a
-     │ Int64  Int64  Int64
-─────┼─────────────────────
-   1 │     1      5      1
-   2 │     2      6      2
-   3 │     3      7      3
-   4 │     4      8      4
-```
-
-In this case though,
-it is probably again more useful to use the `rename` or `rename!` function
-rather than one of the manipulation functions
-in order to rename the columns directly and avoid the intermediate `operation_function`.
-```julia
-julia> rename(add_prefix, df)  # rename all columns with a function
-4×2 DataFrame
- Row │ new_a  new_b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
-4×2 DataFrame
- Row │ new_a  b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-```
-
-In the `source_column_selector => new_column_names` operation form,
-only a single source column may be selected per operation,
-so why is `new_column_names` plural?
-It is possible to split the data contained inside a single column
-into multiple new columns by supplying a vector of strings or symbols
-as `new_column_names`.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> transform(df, :data => [:first, :second]) # manual naming
-2×3 DataFrame
- Row │ data    first  second
-     │ Tuple…  Int64  Int64
-─────┼───────────────────────
-   1 │ (1, 2)      1       2
-   2 │ (3, 4)      3       4
-```
-
-This kind of data splitting can even be done automatically with `AsTable`.
-
-```julia
-julia> transform(df, :data => AsTable) # default automatic naming with tuples
-2×3 DataFrame
- Row │ data    x1     x2
-     │ Tuple…  Int64  Int64
-─────┼──────────────────────
-   1 │ (1, 2)      1      2
-   2 │ (3, 4)      3      4
-```
-
-If a data frame column contains `NamedTuple`s,
-then `AsTable` will preserve the field names.
-```julia
-julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
-2×1 DataFrame
- Row │ data
-     │ NamedTup…
-─────┼────────────────
-   1 │ (a = 1, b = 2)
-   2 │ (a = 3, b = 4)
-
-julia> transform(df, :data => AsTable) # keeps names from named tuples
-2×3 DataFrame
- Row │ data            a      b
-     │ NamedTup…       Int64  Int64
-─────┼──────────────────────────────
-   1 │ (a = 1, b = 2)      1      2
-   2 │ (a = 3, b = 4)      3      4
-```
-
-!!! note
-      To pack multiple columns into a single column of `NamedTuple`s
-      (reverse of the above operation)
-      apply the `identity` function wrapped in `ByRow`, e.g.
-      `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
-
-Renaming functions also work for multi-column transformations,
-but they must operate on a vector of strings.
-
-```julia
-julia> df = DataFrame(data = [(1,2), (3,4)])
-2×1 DataFrame
- Row │ data
-     │ Tuple…
-─────┼────────
-   1 │ (1, 2)
-   2 │ (3, 4)
-
-julia> new_names(v) = ["primary ", "secondary "] .* v
-new_names (generic function with 1 method)
-
-julia> transform(df, :data => identity => new_names)
-2×3 DataFrame
- Row │ data    primary data  secondary data
-     │ Tuple…  Int64         Int64
-─────┼──────────────────────────────────────
-   1 │ (1, 2)             1               2
-   2 │ (3, 4)             3               4
-```
-
-## Applying Multiple Operations per Manipulation
-All data frame manipulation functions can accept multiple `operation` pairs
-at once using any of the following methods:
-- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
-- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
-- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
-
-Passing multiple operations is especially useful for the `select`, `select!`,
-and `combine` manipulation functions,
-since they only retain columns which are a result of the passed operations.
-
-```julia
-julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
-4×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     1     50  hat
-   2 │     2     50  bat
-   3 │     3     60  cat
-   4 │     4     60  dog
-
-julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
-1×3 DataFrame
- Row │ a_maximum  b_sum  c_join
-     │ Int64      Int64  String
-─────┼────────────────────────────────
-   1 │         4    220  hatbatcatdog
-
-julia> select(df, :c, :b, :a) # re-order columns
-4×3 DataFrame
- Row │ c       b      a
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ hat        50      1
-   2 │ bat        50      2
-   3 │ cat        60      3
-   4 │ dog        60      4
-
-julia> select(df, :b, :) # `:` here means all other columns
-4×3 DataFrame
- Row │ b      a      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │    50      1  hat
-   2 │    50      2  bat
-   3 │    60      3  cat
-   4 │    60      4  dog
-
-julia> select(
-           df,
-           :c => (x -> "a " .* x) => :one_c,
-           :a => (x -> 100x),
-           :b,
-           renamecols=false
-       ) # can mix operation forms
-4×3 DataFrame
- Row │ one_c   a      b
-     │ String  Int64  Int64
-─────┼──────────────────────
-   1 │ a hat     100     50
-   2 │ a bat     200     50
-   3 │ a cat     300     60
-   4 │ a dog     400     60
-
-julia> select(
-           df,
-           :c => ByRow(reverse),
-           :c => ByRow(uppercase)
-       ) # multiple operations on same column
-4×2 DataFrame
- Row │ c_reverse  c_uppercase
-     │ String     String
-─────┼────────────────────────
-   1 │ tah        HAT
-   2 │ tab        BAT
-   3 │ tac        CAT
-   4 │ god        DOG
-```
-
-In the last two examples,
-the manipulation function arguments were split across multiple lines.
-This is a good way to make manipulations with many operations more readable.
-
-Passing multiple operations to `subset` or `subset!` is an easy way to narrow in
-on a particular row of data.
-
-```julia
-julia> subset(
-           df,
-           :b => ByRow(==(60)),
-           :c => ByRow(contains("at"))
-       ) # rows with 60 and "at"
-1×3 DataFrame
- Row │ a      b      c
-     │ Int64  Int64  String
-─────┼──────────────────────
-   1 │     3     60  cat
-```
-
-Note that all operations within a single manipulation must use the data
-as it existed before the function call,
-i.e., you cannot use newly created columns for subsequent operations
-within the same manipulation.
-
-```julia
-julia> transform(
-           df,
-           [:a, :b] => ByRow(+) => :d,
-           :d => (x -> x ./ 2),
-       ) # requires two separate transformations
-ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
-
-julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
-4×4 DataFrame
- Row │ a      b      c       d
-     │ Int64  Int64  String  Int64
-─────┼─────────────────────────────
-   1 │     1     50  hat        51
-   2 │     2     50  bat        52
-   3 │     3     60  cat        63
-   4 │     4     60  dog        64
-
-julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
-4×5 DataFrame
- Row │ a      b      c       d      d_2
-     │ Int64  Int64  String  Int64  Float64
-─────┼──────────────────────────────────────
-   1 │     1     50  hat        51     25.5
-   2 │     2     50  bat        52     26.0
-   3 │     3     60  cat        63     31.5
-   4 │     4     60  dog        64     32.0
-```
-
-
-## Broadcasting Operation Pairs
-
-[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
-pairs with `.=>` is often a convenient way to generate multiple
-similar `operation`s to be applied within a single manipulation.
-Broadcasting within the `Pair` of an `operation` is no different than
-broadcasting in base Julia.
-The broadcasting `.=>` will be expanded into a vector of pairs
-(`[operation1, operation2, ...]`),
-and this expansion will occur before the manipulation function is invoked.
-Then the manipulation function will use the
-`manipulation_function(dataframe, [operation1, operation2, ...])` method.
-This process will be explained in more detail below.
-
-To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
-In DataFrames.jl, a symbol, string, or integer
-may be used to select a single column.
-Some `Pair`s with these types are below.
-
-```julia
-julia> typeof(:x => :a)
-Pair{Symbol, Symbol}
-
-julia> typeof("x" => "a")
-Pair{String, String}
-
-julia> typeof(1 => "a")
-Pair{Int64, String}
-```
-
-Any of the `Pair`s above could be used to rename the first column
-of the data frame below to `a`.
-
-```julia
-julia> df = DataFrame(x = 1:3, y = 4:6)
-3×2 DataFrame
- Row │ x      y
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-
-julia> select(df, :x => :a)
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-
-julia> select(df, 1 => "a")
-3×1 DataFrame
- Row │ a
-     │ Int64
-─────┼───────
-   1 │     1
-   2 │     2
-   3 │     3
-```
-
-What should we do if we want to keep and rename both the `x` and `y` columns?
-One option is to supply a `Vector` of operation `Pair`s to `select`.
-`select` will process all of these operations in order.
-
-```julia
-julia> ["x" => "a", "y" => "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x" => "a", "y" => "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-We can use broadcasting to simplify the syntax above.
-
-```julia
-julia> ["x", "y"] .=> ["a", "b"]
-2-element Vector{Pair{String, String}}:
- "x" => "a"
- "y" => "b"
-
-julia> select(df, ["x", "y"] .=> ["a", "b"])
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      4
-   2 │     2      5
-   3 │     3      6
-```
-
-Notice that `select` sees the same `Vector{Pair{String, String}}` operation
-argument whether the individual pairs are written out explicitly or
-constructed with broadcasting.
-The broadcasting is applied before the call to `select`.
-
-```julia
-julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
-true
-```
-
-!!! note
-      These operation pairs (or vector of pairs) can be given variable names.
-      This is uncommon in practice but could be helpful for intermediate
-      inspection and testing.
-      ```julia
-      df = DataFrame(x = 1:3, y = 4:6)       # create data frame
-      operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
-      typeof(operation)                      # check type of operation
-      first(operation)                       # check first pair in operation
-      last(operation)                        # check last pair in operation
-      select(df, operation)                  # manipulate `df` with `operation`
-      ```
-
-In Julia,
-a non-vector broadcasted with a vector will be repeated in each resultant pair element.
-
-```julia
-julia> ["x", "y"] .=> :a    # :a is repeated
-2-element Vector{Pair{String, Symbol}}:
- "x" => :a
- "y" => :a
-
-julia> 1 .=> [:a, :b]       # 1 is repeated
-2-element Vector{Pair{Int64, Symbol}}:
- 1 => :a
- 1 => :b
-```
-
-We can use this fact to easily broadcast an `operation_function` to multiple columns.
-
-```julia
-julia> f(x) = 2 * x
-f (generic function with 1 method)
-
-julia> ["x", "y"] .=> f  # f is repeated
-2-element Vector{Pair{String, typeof(f)}}:
- "x" => f
- "y" => f
-
-julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
-3×2 DataFrame
- Row │ x_f    y_f
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-
-julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
-2-element Vector{Pair{String, Pair{typeof(f), String}}}:
- "x" => (f => "a")
- "y" => (f => "b")
-
-julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
-3×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-A renaming function can be applied to multiple columns in the same way.
-It will also be repeated in each operation `Pair`.
-
-```julia
-julia> newname(s::String) = s * "_new"
-newname (generic function with 1 method)
-
-julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
-2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
- "x" => (f => newname)
- "y" => (f => newname)
-
-julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
-3×2 DataFrame
- Row │ x_new  y_new
-     │ Int64  Int64
-─────┼──────────────
-   1 │     2      8
-   2 │     4     10
-   3 │     6     12
-```
-
-You can see from the type output above
-that a three-element pair does not actually exist.
-A `Pair` (as the name implies) can only contain two elements.
-Thus, `:x => :y => :z` becomes a nested `Pair`,
-where `:x` is the first element and points to the `Pair` `:y => :z`,
-which is the second element.
-
-```julia
-julia> p = :x => :y => :z
-:x => (:y => :z)
-
-julia> p[1]
-:x
-
-julia> p[2]
-:y => :z
-
-julia> p[2][1]
-:y
-
-julia> p[2][2]
-:z
-
-julia> p[3] # there is no index 3 for a pair
-ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
-```
-
-In the previous examples, the source columns have been individually selected.
-When broadcasting multiple columns to the same function,
-often similarities in the column names or position can be exploited to avoid
-tedious selection.
-Consider a data frame with temperature data at three different locations
-taken over time.
-```julia
-julia> df = DataFrame(Time = 1:4,
-                      Temperature1 = [20, 23, 25, 28],
-                      Temperature2 = [33, 37, 41, 44],
-                      Temperature3 = [15, 10, 4, 0])
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1            20            33            15
-   2 │     2            23            37            10
-   3 │     3            25            41             4
-   4 │     4            28            44             0
-```
-
-To convert all of the temperature data in one transformation,
-we just need to define a conversion function and broadcast
-it to all of the "Temperature" columns.
-
-```julia
-julia> celsius_to_kelvin(x) = x + 273
-celsius_to_kelvin (generic function with 1 method)
-
-julia> transform(
-           df,
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
-           renamecols = false
-       )
-4×4 DataFrame
- Row │ Time   Temperature1  Temperature2  Temperature3
-     │ Int64  Int64         Int64         Int64
-─────┼─────────────────────────────────────────────────
-   1 │     1           293           306           288
-   2 │     2           296           310           283
-   3 │     3           298           314           277
-   4 │     4           301           317           273
-```
-Or, simultaneously changing the column names:
-
-```julia
-julia> rename_function(s) = "Temperature $(last(s)) (K)"
-rename_function (generic function with 1 method)
-
-julia> select(
-           df,
-           "Time",
-           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
-       )
-4×4 DataFrame
- Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
-     │ Int64  Int64              Int64              Int64
-─────┼────────────────────────────────────────────────────────────────
-   1 │     1                293                306                288
-   2 │     2                296                310                283
-   3 │     3                298                314                277
-   4 │     4                301                317                273
-```
-
-!!! note "Notes"
-      * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
-      * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
-        Without `ByRow`, the manipulations above would have thrown
-        `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
-      * Regular expression (`r""`) and `:` `source_column_selector`s
-        must be wrapped in `Cols` to be properly broadcasted
-        because otherwise the broadcasting occurs before the expression is expanded into a vector of matches.
-
-You could also broadcast different columns to different functions
-by supplying a vector of functions.
-
-```julia
-julia> df = DataFrame(a=1:4, b=5:8)
-4×2 DataFrame
- Row │ a      b
-     │ Int64  Int64
-─────┼──────────────
-   1 │     1      5
-   2 │     2      6
-   3 │     3      7
-   4 │     4      8
-
-julia> f1(x) = x .+ 1
-f1 (generic function with 1 method)
-
-julia> f2(x) = x ./ 10
-f2 (generic function with 1 method)
-
-julia> transform(df, [:a, :b] .=> [f1, f2])
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-However, this form is not much more convenient than supplying
-multiple individual operations.
-
-```julia
-julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
-4×4 DataFrame
- Row │ a      b      a_f1   b_f2
-     │ Int64  Int64  Int64  Float64
-─────┼──────────────────────────────
-   1 │     1      5      2      0.5
-   2 │     2      6      3      0.6
-   3 │     3      7      4      0.7
-   4 │     4      8      5      0.8
-```
-
-A more useful application of broadcasting syntax
-is to apply multiple functions to multiple columns
-by changing the vector of functions to a 1-by-n matrix of functions.
-(Recall that a list, a vector, or a matrix of operation pairs are all valid
-for passing to the manipulation functions.)
-
-```julia
-julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
-2×2 Matrix{Pair{Symbol}}:
- :a=>f1  :a=>f2
- :b=>f1  :b=>f2
-
-julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
-4×6 DataFrame
- Row │ a      b      a_f1   b_f1   a_f2     b_f2
-     │ Int64  Int64  Int64  Int64  Float64  Float64
-─────┼──────────────────────────────────────────────
-   1 │     1      5      2      6      0.1      0.5
-   2 │     2      6      3      7      0.2      0.6
-   3 │     3      7      4      8      0.3      0.7
-   4 │     4      8      5      9      0.4      0.8
-```
-
-In this way, every combination of selected columns and functions will be applied.
-
-Pair broadcasting is a simple but powerful tool
-that can be used in any of the manipulation functions listed under
-[Basic Usage of Manipulation Functions](@ref).
-Experiment for yourself to discover other useful operations.
-
-## Additional Resources
-More details and examples of operation pair syntax can be found in
-[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
-(The official wording describing the syntax has changed since the blog post was written,
-but the examples are still illustrative.
-The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
-or Domain-Specific Language.)
-
-For additional practice,
-an interactive tutorial is provided on a variety of introductory topics
-by the DataFrames.jl package author
-[here](https://github.com/bkamins/Julia-DataFrames-Tutorial).
-
-
-For additional syntax niceties,
-many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
-and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
-packages useful
-to help simplify manipulations that may be tedious with operation pairs alone.
\ No newline at end of file