From dc59622fee8a45d3eb106d3f648231d3b7b8ecdf Mon Sep 17 00:00:00 2001
From: Nathan Boyer <65452054+nathanrboyer@users.noreply.github.com>
Date: Fri, 13 Dec 2024 06:46:54 -0500
Subject: [PATCH] Updated Basic Usage of Manipulation Functions (#3360)

---
 docs/src/man/basics.md | 2197 ++++++++++++++++++++++++++++++----------
 1 file changed, 1654 insertions(+), 543 deletions(-)

diff --git a/docs/src/man/basics.md b/docs/src/man/basics.md
index d1962262b..03e5c5082 100644
--- a/docs/src/man/basics.md
+++ b/docs/src/man/basics.md
@@ -1109,7 +1109,7 @@ true
 
 If in indexing you select a subset of rows from a data frame the mutation is
 performed in place, i.e. writing to an existing vector.
-Below setting values of column `:Job` in rows `1:3` to values `[2, 4, 6]`:
+Below we set the values of column `:Job` in rows `1:3` to `[2, 3, 2]`:
 
 ```jldoctest dataframe
 julia> df1[1:3, :Job] = [2, 3, 2]
@@ -1215,7 +1215,7 @@ DataFrameRow
    2 │    98  male        2
 ```
 
-This operations updated the data stored in the `df1` data frame.
+These operations updated the data stored in the `df1` data frame.
 
 In a similar fashion views can be used to update data stored in their parent
 data frame. Here are some examples:
@@ -1599,604 +1599,1715 @@ julia> german[Not(5), r"S"]
                 984 rows omitted
 ```
 
-## Basic Usage of Transformation Functions
+## Manipulation Functions
 
-In DataFrames.jl we have five functions that we can be used to perform
-transformations of columns of a data frame:
+The seven functions below can be used to manipulate data frames
+by applying operations to them.
 
-- `combine`: creates a new data frame populated with columns that are results of
-  transformation applied to the source data frame columns, potentially combining
-  its rows;
-- `select`: creates a new data frame that has the same number of rows as the
-  source data frame populated with columns that are results of transformations
-  applied to the source data frame columns;
-- `select!`: the same as `select` but updates the passed data frame in place;
-- `transform`: the same as `select` but keeps the columns that were already
-  present in the data frame (note though that these columns can be potentially
-  modified by the transformation passed to `transform`);
-- `transform!`: the same as `transform` but updates the passed data frame in
-  place.
+The functions without a `!` in their name
+will create a new data frame based on the source data frame,
+so you will probably want to assign the new data frame to a new variable,
+e.g. `new_df = transform(source_df, operation)`.
+The functions with a `!` at the end of their name
+will modify an existing data frame in-place,
+so there is typically no need to assign the result to a variable,
+e.g. `transform!(source_df, operation)` instead of
+`source_df = transform(source_df, operation)`.
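+
+A minimal sketch of the difference, assuming DataFrames.jl is installed
+(the names `source_df`, `new_df`, and `double_a` are only illustrative,
+and the `operation` syntax is explained below):
+
+```julia
+using DataFrames
+
+source_df = DataFrame(a = [1, 2, 3])
+
+# `transform` returns a new data frame; `source_df` still has only column `a`
+new_df = transform(source_df, :a => ByRow(x -> 2x) => :double_a)
+
+# `transform!` adds the new column to `source_df` itself
+transform!(source_df, :a => ByRow(x -> 2x) => :double_a)
+```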
 
-The fundamental ways to specify a transformation are:
+The number of columns and rows in the resultant data frame varies
+depending on the manipulation function employed.
 
-- `source_column => transformation => target_column_name`; In this scenario the
-  `source_column` is passed as an argument to `transformation` function and
-  stored in `target_column_name` column.
-- `source_column => transformation`; In this scenario we apply the
-  transformation function to `source_column` and the target column names is
-  automatically generated.
-- `source_column => target_column_name` renames the `source_column` to
-  `target_column_name`.
-- `source_column` just keep the source column as is in the result without any
-  transformation;
+| Function     | Memory Usage                     | Column Retention                        | Row Retention                                       |
+| ------------ | -------------------------------- | --------------------------------------- | --------------------------------------------------- |
+| `transform`  | Creates a new data frame.        | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `transform!` | Modifies an existing data frame. | Retains original and resultant columns. | Retains same number of rows as original data frame. |
+| `select`     | Creates a new data frame.        | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `select!`    | Modifies an existing data frame. | Retains only resultant columns.         | Retains same number of rows as original data frame. |
+| `subset`     | Creates a new data frame.        | Retains original columns.               | Retains only rows where condition is true.          |
+| `subset!`    | Modifies an existing data frame. | Retains original columns.               | Retains only rows where condition is true.          |
+| `combine`    | Creates a new data frame.        | Retains only resultant columns.         | Retains only resultant rows.                        |
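+
+As a rough illustration of the table, the sketch below applies several of the functions
+to the same small data frame and prints the `(rows, columns)` size of each result.
+The data and variable names are made up for this example.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = [1, 2, 3, 4], b = [10, 20, 30, 40])
+
+t = transform(df, :a => ByRow(iseven) => :a_even)  # all rows, original columns plus one
+s = select(df, :a => ByRow(iseven) => :a_even)     # all rows, only the new column
+u = subset(df, :a => ByRow(iseven))                # original columns, rows where `a` is even
+c = combine(df, :b => sum)                         # only the resultant column and row
+
+for (label, result) in (("transform", t), ("select", s), ("subset", u), ("combine", c))
+    println(label, ": ", size(result))
+end
+```
+
+This prints `transform: (4, 3)`, `select: (4, 1)`, `subset: (2, 2)`, and `combine: (1, 1)`.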
 
-These rules are typically called transformation mini-language.
+### Constructing Operations
 
-Let us move to the examples of application of these rules
+All of the functions above share a common syntax,
+`manipulation_function(dataframe, operation)`.
+The `operation` argument defines the
+operation to be applied to the source `dataframe`,
+and it can take any of the following common forms explained below:
 
-```jldoctest dataframe
-julia> using Statistics
+`source_column_selector`
+: selects source column(s) without manipulating or renaming them
+
+   Examples: `:a`, `[:a, :b]`, `All()`, `Not(:a)`
+
+`source_column_selector => operation_function`
+: passes source column(s) as arguments to a function
+and automatically names the resulting column(s)
+
+   Examples: `:a => sum`, `[:a, :b] => +`, `:a => ByRow(==(3))`
+
+`source_column_selector => operation_function => new_column_names`
+: passes source column(s) as arguments to a function
+and names the resulting column(s) `new_column_names`
+
+   Examples: `:a => sum => :sum_of_a`, `[:a, :b] => (+) => :a_plus_b`
+
+   *(Not available for `subset` or `subset!`)*
+
+`source_column_selector => new_column_names`
+: renames a source column,
+or splits a column containing collection elements into multiple new columns
+
+   Examples: `:a => :new_a`, `:a_b => [:a, :b]`, `:nt => AsTable`
+
+   *(Not available for `subset` or `subset!`)*
+
+The `=>` operator constructs a
+[Pair](https://docs.julialang.org/en/v1/base/collections/#Core.Pair),
+a type that links one object to another.
+(Pairs are commonly used to create elements of a
+[Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries).)
+In DataFrames.jl manipulation functions,
+`Pair` arguments are used to define column `operations` to be performed.
+The examples shown above will be explained in more detail later.
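+
+Because `=>` is right-associative, an operation with three parts is a nested `Pair`.
+The short sketch below (the variable name `op` is only illustrative) takes such a pair apart
+and shows that it can be stored in a variable before being passed to a manipulation function.
+
+```julia
+using DataFrames
+
+op = :a => sum => :sum_of_a  # parsed as :a => (sum => :sum_of_a)
+
+first(op)        # :a
+last(op)         # the inner Pair sum => :sum_of_a
+last(last(op))   # :sum_of_a
+
+df = DataFrame(a = [1, 2, 3])
+combine(df, op)  # one row, one column `sum_of_a` containing 6
+```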
+
+*The manipulation functions also have methods for applying multiple operations.
+See the later sections [Applying Multiple Operations per Manipulation](@ref)
+and [Broadcasting Operation Pairs](@ref) for more information.*
+
+#### `source_column_selector`
+Inside an `operation`, `source_column_selector` is usually a column name
+or column index which identifies a data frame column.
+
+`source_column_selector` may be used as the entire `operation`
+with `select` or `select!` to isolate or reorder columns.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 6], c = [7, 8, 9])
+3×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
+
+julia> select(df, :b)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, "b")
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+
+julia> select(df, 2)
+3×1 DataFrame
+ Row │ b
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     5
+   3 │     6
+```
+
+`source_column_selector` may also be used as the entire `operation`
+with `subset` or `subset!` if the source column contains `Bool` values.
+
+```julia
+julia> df = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+       )
+4×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Scott   false
+   2 │ Jill     true
+   3 │ Erica   false
+   4 │ Jimmy    true
+
+julia> subset(df, :minor)
+2×2 DataFrame
+ Row │ name    minor
+     │ String  Bool
+─────┼───────────────
+   1 │ Jill     true
+   2 │ Jimmy    true
+```
+
+`source_column_selector` may instead be a collection of columns such as a vector,
+a [regular expression](https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions),
+a `Not`, `Between`, `All`, or `Cols` expression,
+or a `:`.
+See the [Indexing](@ref) API for the full list of possible values with references.
+
+!!! note
+
+    The Julia parser sometimes prevents `:` from being used by itself.
+    If you get
+    `ERROR: syntax: whitespace not allowed after ":" used for quoting`,
+    try using `All()`, `Cols(:)`, or `(:)` instead to select all columns.
+
+```julia
+julia> df = DataFrame(
+           id = [1, 2, 3],
+           first_name = ["José", "Emma", "Nathan"],
+           last_name = ["Garcia", "Marino", "Boyer"],
+           age = [61, 24, 33]
+       )
+3×4 DataFrame
+ Row │ id     first_name  last_name  age
+     │ Int64  String      String     Int64
+─────┼─────────────────────────────────────
+   1 │     1  José        Garcia        61
+   2 │     2  Emma        Marino        24
+   3 │     3  Nathan      Boyer         33
+
+julia> select(df, [:last_name, :first_name])
+3×2 DataFrame
+ Row │ last_name  first_name
+     │ String     String
+─────┼───────────────────────
+   1 │ Garcia     José
+   2 │ Marino     Emma
+   3 │ Boyer      Nathan
+
+julia> select(df, r"name")
+3×2 DataFrame
+ Row │ first_name  last_name
+     │ String      String
+─────┼───────────────────────
+   1 │ José        Garcia
+   2 │ Emma        Marino
+   3 │ Nathan      Boyer
+
+julia> select(df, Not(:id))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> select(df, Between(2,4))
+3×3 DataFrame
+ Row │ first_name  last_name  age
+     │ String      String     Int64
+─────┼──────────────────────────────
+   1 │ José        Garcia        61
+   2 │ Emma        Marino        24
+   3 │ Nathan      Boyer         33
+
+julia> df2 = DataFrame(
+           name = ["Scott", "Jill", "Erica", "Jimmy"],
+           minor = [false, true, false, true],
+           male = [true, false, false, true],
+       )
+4×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼──────────────────────
+   1 │ Scott   false   true
+   2 │ Jill     true  false
+   3 │ Erica   false  false
+   4 │ Jimmy    true   true
+
+julia> subset(df2, [:minor, :male])
+1×3 DataFrame
+ Row │ name    minor  male
+     │ String  Bool   Bool
+─────┼─────────────────────
+   1 │ Jimmy    true  true
+```
+
+!!! note
+
+    Using a `Symbol` in `source_column_selector` is slightly faster than using a string.
+    However, a string is convenient when column names contain spaces.
+
+    All elements of `source_column_selector` must be the same type
+    (unless wrapped in `Cols`),
+    e.g. `subset(df2, [:minor, "male"])` will error
+    since it mixes a `Symbol` and a string.
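+
+As a sketch of the `Cols` exception mentioned in the note, the selection below mixes
+a `Symbol` and a string. `df2` is rebuilt here so the block stands on its own.
+
+```julia
+using DataFrames
+
+df2 = DataFrame(
+    name = ["Scott", "Jill", "Erica", "Jimmy"],
+    minor = [false, true, false, true],
+    male = [true, false, false, true],
+)
+
+# `Cols` accepts a mix of selector types, unlike a plain vector
+select(df2, Cols(:name, "male"))  # 4×2 data frame with columns `name` and `male`
+```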
+
+#### `operation_function`
+Inside an `operation` pair, `operation_function` is a function
+which operates on data frame columns passed as vectors.
+When multiple columns are selected by `source_column_selector`,
+the `operation_function` will receive the columns as separate positional arguments
+in the order they were selected, e.g. `f(column1, column2, column3)`.
+
+```julia
+julia> df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      4
+
+julia> combine(df, :a => sum)
+1×1 DataFrame
+ Row │ a_sum
+     │ Int64
+─────┼───────
+   1 │     6
+
+julia> transform(df, :b => maximum) # `transform` and `select` copy scalar result to all rows
+3×3 DataFrame
+ Row │ a      b      b_maximum
+     │ Int64  Int64  Int64
+─────┼─────────────────────────
+   1 │     1      4          5
+   2 │     2      5          5
+   3 │     3      4          5
+
+julia> transform(df, [:b, :a] => -) # vector subtraction is okay
+3×3 DataFrame
+ Row │ a      b      b_a_-
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      3
+   2 │     2      5      3
+   3 │     3      4      1
+
+julia> transform(df, [:a, :b] => *) # vector multiplication is not defined
+ERROR: MethodError: no method matching *(::Vector{Int64}, ::Vector{Int64})
+```
+
+Don't worry! There is a quick fix for the previous error.
+If you want to apply a function to each element in a column
+instead of to the entire column vector,
+then you can wrap your element-wise function in `ByRow` like
+`ByRow(my_elementwise_function)`.
+This will apply `my_elementwise_function` to every element in the column
+and then collect the results back into a vector.
+
+```julia
+julia> transform(df, [:a, :b] => ByRow(*))
+3×3 DataFrame
+ Row │ a      b      a_b_*
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      4
+   2 │     2      5     10
+   3 │     3      4     12
+
+julia> transform(df, Cols(:) => ByRow(max))
+3×3 DataFrame
+ Row │ a      b      a_b_max
+     │ Int64  Int64  Int64
+─────┼───────────────────────
+   1 │     1      4        4
+   2 │     2      5        5
+   3 │     3      4        4
+
+julia> f(x) = x + 1
+f (generic function with 1 method)
+
+julia> transform(df, :a => ByRow(f))
+3×3 DataFrame
+ Row │ a      b      a_f
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+```
+
+Alternatively, you may just want to define the function itself so it
+[broadcasts](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+over vectors.
+
+```julia
+julia> g(x) = x .+ 1
+g (generic function with 1 method)
+
+julia> transform(df, :a => g)
+3×3 DataFrame
+ Row │ a      b      a_g
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      2
+   2 │     2      5      3
+   3 │     3      4      4
+
+julia> h(x, y) = x .+ y .+ 1
+h (generic function with 1 method)
+
+julia> transform(df, [:a, :b] => h)
+3×3 DataFrame
+ Row │ a      b      a_b_h
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      6
+   2 │     2      5      8
+   3 │     3      4      8
+```
+
+[Anonymous functions](https://docs.julialang.org/en/v1/manual/functions/#man-anonymous-functions)
+are a convenient way to define and use an `operation_function`
+all within the manipulation function call.
+
+```julia
+julia> select(df, :a => ByRow(x -> x + 1))
+3×1 DataFrame
+ Row │ a_function
+     │ Int64
+─────┼────────────
+   1 │          2
+   2 │          3
+   3 │          4
+
+julia> transform(df, [:a, :b] => ByRow((x, y) -> 2x + y))
+3×3 DataFrame
+ Row │ a      b      a_b_function
+     │ Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      4             6
+   2 │     2      5             9
+   3 │     3      4            10
+
+julia> subset(df, :b => ByRow(x -> x < 5))
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+
+julia> subset(df, :b => ByRow(<(5))) # shorter version of the previous
+2×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     3      4
+```
+
+!!! note
+
+    `operation_function`s within `subset` or `subset!` function calls
+    must return a Boolean vector.
+    `true` elements in the Boolean vector will determine
+    which rows are retained in the resulting data frame.
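+
+For example, in the sketch below (with made-up data) the first condition operates on
+the whole column and returns a Boolean vector directly,
+while the second uses `ByRow` so that one `Bool` is returned per row.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = [1, 2, 3], b = [4, 5, 4])
+
+# whole-column function: `x .== maximum(x)` is already a Boolean vector
+subset(df, :b => (x -> x .== maximum(x)))  # keeps the row where `b` is 5
+
+# element-wise function wrapped in `ByRow`
+subset(df, :b => ByRow(==(4)))             # keeps the two rows where `b` is 4
+```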
+
+As demonstrated above, `DataFrame` columns are usually passed
+from `source_column_selector` to `operation_function` as one or more
+vector arguments.
+However, when `AsTable(source_column_selector)` is used,
+the selected columns are collected and passed as a single `NamedTuple`
+to `operation_function`.
+
+This is often useful when your `operation_function` is defined to operate
+on a single collection argument rather than on multiple positional arguments.
+The distinction is somewhat similar to the difference between the built-in
+`min` and `minimum` functions.
+`min` is defined to find the minimum value among multiple positional arguments,
+while `minimum` is defined to find the minimum value
+among the elements of a single collection argument.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 2:-1:1)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      2
+   2 │     2      4      6      1
+
+julia> select(df, Cols(:) => ByRow(min)) # min operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_etc_min
+     │ Int64
+─────┼─────────────
+   1 │           1
+   2 │           1
+
+julia> select(df, AsTable(:) => ByRow(minimum)) # minimum operates on a collection
+2×1 DataFrame
+ Row │ a_b_etc_minimum
+     │ Int64
+─────┼─────────────────
+   1 │               1
+   2 │               1
+
+julia> select(df, [:a,:b] => ByRow(+)) # `+` operates on multiple arguments
+2×1 DataFrame
+ Row │ a_b_+
+     │ Int64
+─────┼───────
+   1 │     4
+   2 │     6
+
+julia> select(df, AsTable([:a,:b]) => ByRow(sum)) # `sum` operates on a collection
+2×1 DataFrame
+ Row │ a_b_sum
+     │ Int64
+─────┼─────────
+   1 │       4
+   2 │       6
+
+julia> using Statistics # contains the `mean` function
+
+julia> select(df, AsTable(Between(:b, :d)) => ByRow(mean)) # `mean` operates on a collection
+2×1 DataFrame
+ Row │ b_c_d_mean
+     │ Float64
+─────┼────────────
+   1 │    3.33333
+   2 │    3.66667
+```
+
+`AsTable` can also be used to pass columns to a function which operates
+on fields of a `NamedTuple`.
+
+```julia
+julia> df = DataFrame(a = 1:2, b = 3:4, c = 5:6, d = 7:8)
+2×4 DataFrame
+ Row │ a      b      c      d
+     │ Int64  Int64  Int64  Int64
+─────┼────────────────────────────
+   1 │     1      3      5      7
+   2 │     2      4      6      8
+
+julia> f(nt) = nt.a + nt.d
+f (generic function with 1 method)
+
+julia> transform(df, AsTable(:) => ByRow(f))
+2×5 DataFrame
+ Row │ a      b      c      d      a_b_etc_f
+     │ Int64  Int64  Int64  Int64  Int64
+─────┼───────────────────────────────────────
+   1 │     1      3      5      7          8
+   2 │     2      4      6      8         10
+```
+
+As demonstrated above,
+in the `source_column_selector => operation_function` operation pair form,
+the results of an operation will be placed into a new column with an
+automatically-generated name based on the operation;
+the new column name will be the `operation_function` name
+appended to the source column name(s) with an underscore.
+
+This automatic column naming behavior can be avoided in two ways.
+First, the operation result can be placed back into the original column
+with the original column name by switching the keyword argument `renamecols`
+from its default value (`true`) to `renamecols=false`.
+This option prevents the function name from being appended to the column name
+as it usually would be.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => ByRow(x->x+10), renamecols=false) # add 10, keeping the column name `a`
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    11      5
+   2 │    12      6
+   3 │    13      7
+   4 │    14      8
+```
+
+The second method to avoid the default manipulation column naming is to
+specify your own `new_column_names`.
+
+#### `new_column_names`
+
+`new_column_names` can be included at the end of an `operation` pair to specify
+the name of the new column(s).
+`new_column_names` may be a symbol, string, function, vector of symbols, vector of strings, or `AsTable`.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, Cols(:) => ByRow(+) => :c)
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, Cols(:) => ByRow(+) => "a+b")
+4×3 DataFrame
+ Row │ a      b      a+b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, :a => ByRow(x->x+10) => "a+10")
+4×3 DataFrame
+ Row │ a      b      a+10
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     11
+   2 │     2      6     12
+   3 │     3      7     13
+   4 │     4      8     14
+```
+
+The `source_column_selector => new_column_names` operation form
+can be used to rename columns without an intermediate function.
+However, the `rename` and `rename!` functions accept a similar syntax
+and tend to be more convenient for this task.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :a => :apple) # adds column `apple`
+4×3 DataFrame
+ Row │ a      b      apple
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+
+julia> select(df, :a => :apple) # retains only column `apple`
+4×1 DataFrame
+ Row │ apple
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+   4 │     4
+
+julia> rename(df, :a => :apple) # renames column `a` to `apple` in its original position
+4×2 DataFrame
+ Row │ apple  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
+
+If `new_column_names` already exist in the source data frame,
+those columns will be replaced in the existing column location
+rather than being added to the end.
+This can be done by manually specifying an existing column name
+or by using the `renamecols=false` keyword argument.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> transform(df, :b => (x -> x .+ 10))  # automatic new column and column name
+4×3 DataFrame
+ Row │ a      b      b_function
+     │ Int64  Int64  Int64
+─────┼──────────────────────────
+   1 │     1      5          15
+   2 │     2      6          16
+   3 │     3      7          17
+   4 │     4      8          18
+
+julia> transform(df, :b => (x -> x .+ 10), renamecols=false)  # overwrite column :b, keeping its name
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1     15
+   2 │     2     16
+   3 │     3     17
+   4 │     4     18
+
+julia> transform(df, :b => (x -> x .+ 10) => :a)  # replace column :a
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │    15      5
+   2 │    16      6
+   3 │    17      7
+   4 │    18      8
+```
+
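+Strictly speaking, `renamecols=false` only prevents the function name from being appended to the generated column name,
+so the result is *usually*, but not always, written back to the source column.
+When several source columns are selected, the generated name joins all of their names,
+so a new column is created instead: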
+
+```julia
+julia> transform(df, [:a, :b] => +)  # new column name is all source columns and function name
+4×3 DataFrame
+ Row │ a      b      a_b_+
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => +, renamecols=false)  # same as above but with no function name
+4×3 DataFrame
+ Row │ a      b      a_b
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      6
+   2 │     2      6      8
+   3 │     3      7     10
+   4 │     4      8     12
+
+julia> transform(df, [:a, :b] => (+) => :a)  # manually overwrite column :a (see Note below about parentheses)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     6      5
+   2 │     8      6
+   3 │    10      7
+   4 │    12      8
+```
+
+In the `source_column_selector => operation_function => new_column_names` operation form,
+`new_column_names` may also be a renaming function which operates on a string
+to create the destination column names programmatically.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> add_prefix(s) = "new_" * s
+add_prefix (generic function with 1 method)
+
+julia> transform(df, :a => (x -> 10 .* x) => add_prefix) # with named renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+
+julia> transform(df, :a => (x -> 10 .* x) => (s -> "new_" * s)) # with anonymous renaming function
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5     10
+   2 │     2      6     20
+   3 │     3      7     30
+   4 │     4      8     40
+```
+
+!!! note
+
+    It is a good idea to wrap anonymous functions in parentheses
+    to avoid the `=>` operator accidentally becoming part of the anonymous function.
+    The examples above do not work correctly without the parentheses!
+    ```julia
+    julia> transform(df, :a => x -> 10 .* x => add_prefix)  # Not what we wanted!
+    4×3 DataFrame
+     Row │ a      b      a_function
+         │ Int64  Int64  Pair…
+    ─────┼────────────────────────────────────────────
+       1 │     1      5  [10, 20, 30, 40]=>add_prefix
+       2 │     2      6  [10, 20, 30, 40]=>add_prefix
+       3 │     3      7  [10, 20, 30, 40]=>add_prefix
+       4 │     4      8  [10, 20, 30, 40]=>add_prefix
+
+    julia> transform(df, :a => x -> 10 .* x => s -> "new_" * s)  # Not what we wanted!
+    4×3 DataFrame
+     Row │ a      b      a_function
+         │ Int64  Int64  Pair…
+    ─────┼─────────────────────────────────────
+       1 │     1      5  [10, 20, 30, 40]=>#18
+       2 │     2      6  [10, 20, 30, 40]=>#18
+       3 │     3      7  [10, 20, 30, 40]=>#18
+       4 │     4      8  [10, 20, 30, 40]=>#18
+    ```
+
+A renaming function will not work in the
+`source_column_selector => new_column_names` operation form,
+because a function in the second element of the operation pair is interpreted
+as an `operation_function`, as in the `source_column_selector => operation_function` form.
+To work around this limitation, use the
+`source_column_selector => operation_function => new_column_names` operation form
+with `identity` as the `operation_function`.
+
+```julia
+julia> transform(df, :a => add_prefix)
+ERROR: MethodError: no method matching *(::String, ::Vector{Int64})
+
+julia> transform(df, :a => identity => add_prefix)
+4×3 DataFrame
+ Row │ a      b      new_a
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      5      1
+   2 │     2      6      2
+   3 │     3      7      3
+   4 │     4      8      4
+```
+
+In this case, too, the `rename` or `rename!` function is usually more convenient
+than one of the manipulation functions,
+since it keeps the renamed columns in their original positions
+and avoids the intermediate `operation_function`.
+
+```julia
+julia> rename(add_prefix, df)  # rename all columns with a function
+4×2 DataFrame
+ Row │ new_a  new_b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> rename(add_prefix, df; cols=:a)  # rename some columns with a function
+4×2 DataFrame
+ Row │ new_a  b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+```
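+
+The mutating variant works the same way. A minimal sketch, reusing the `add_prefix`
+function defined above (redefined here so the block is self-contained):
+
+```julia
+using DataFrames
+
+add_prefix(s) = "new_" * s
+df = DataFrame(a = 1:4, b = 5:8)
+
+renamed = rename(add_prefix, df)    # new data frame; `df` still has columns `a` and `b`
+rename!(add_prefix, df; cols = :b)  # renames column `b` of `df` itself
+names(df)                           # ["a", "new_b"]
+```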
+
+In the `source_column_selector => new_column_names` operation form,
+only a single source column may be selected per operation,
+so why is `new_column_names` plural?
+It is possible to split the data contained inside a single column
+into multiple new columns by supplying a vector of strings or symbols
+as `new_column_names`.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)]) # vector of tuples
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> transform(df, :data => [:first, :second]) # manual naming
+2×3 DataFrame
+ Row │ data    first  second
+     │ Tuple…  Int64  Int64
+─────┼───────────────────────
+   1 │ (1, 2)      1       2
+   2 │ (3, 4)      3       4
+```
+
+This kind of data splitting can even be done automatically with `AsTable`.
+
+```julia
+julia> transform(df, :data => AsTable) # default automatic naming with tuples
+2×3 DataFrame
+ Row │ data    x1     x2
+     │ Tuple…  Int64  Int64
+─────┼──────────────────────
+   1 │ (1, 2)      1      2
+   2 │ (3, 4)      3      4
+```
+
+If a data frame column contains `NamedTuple`s,
+then `AsTable` will preserve the field names.
+
+```julia
+julia> df = DataFrame(data = [(a=1,b=2), (a=3,b=4)]) # vector of named tuples
+2×1 DataFrame
+ Row │ data
+     │ NamedTup…
+─────┼────────────────
+   1 │ (a = 1, b = 2)
+   2 │ (a = 3, b = 4)
+
+julia> transform(df, :data => AsTable) # keeps names from named tuples
+2×3 DataFrame
+ Row │ data            a      b
+     │ NamedTup…       Int64  Int64
+─────┼──────────────────────────────
+   1 │ (a = 1, b = 2)      1      2
+   2 │ (a = 3, b = 4)      3      4
+```
+
+!!! note
+
+    To pack multiple columns into a single column of `NamedTuple`s
+    (the reverse of the above operation),
+    apply `ByRow(identity)` to an `AsTable` selection, e.g.
+    `transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)`.
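+
+A sketch of this round trip with made-up data, packing two columns into
+`NamedTuple`s and then splitting them back out with `AsTable`:
+
+```julia
+using DataFrames
+
+df = DataFrame(a = [1, 3], b = [2, 4])
+
+# pack columns `a` and `b` into one column of NamedTuples
+packed = transform(df, AsTable([:a, :b]) => ByRow(identity) => :data)
+packed.data  # [(a = 1, b = 2), (a = 3, b = 4)]
+
+# unpack again; the NamedTuple field names become the column names
+unpacked = select(packed, :data => AsTable)
+```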
+
+Renaming functions also work for operations that produce multiple columns,
+but they must then return a vector of strings, one name per new column.
+
+```julia
+julia> df = DataFrame(data = [(1,2), (3,4)])
+2×1 DataFrame
+ Row │ data
+     │ Tuple…
+─────┼────────
+   1 │ (1, 2)
+   2 │ (3, 4)
+
+julia> new_names(v) = ["primary ", "secondary "] .* v
+new_names (generic function with 1 method)
+
+julia> transform(df, :data => identity => new_names)
+2×3 DataFrame
+ Row │ data    primary data  secondary data
+     │ Tuple…  Int64         Int64
+─────┼──────────────────────────────────────
+   1 │ (1, 2)             1               2
+   2 │ (3, 4)             3               4
+```
+
+### Applying Multiple Operations per Manipulation
+All data frame manipulation functions can accept multiple `operation` pairs
+at once using any of the following methods:
+- `manipulation_function(dataframe, operation1, operation2)`   : multiple arguments
+- `manipulation_function(dataframe, [operation1, operation2])` : vector argument
+- `manipulation_function(dataframe, [operation1 operation2])`  : matrix argument
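+
+These three forms should give the same result. A minimal sketch with made-up data
+that builds the same `combine` result each way:
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60])
+
+r1 = combine(df, :a => maximum, :b => sum)       # multiple arguments
+r2 = combine(df, [:a => maximum, :b => sum])     # vector argument
+r3 = combine(df, [(:a => maximum) (:b => sum)])  # 1×2 matrix argument
+
+r1 == r2 == r3  # true
+```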
+
+Passing multiple operations is especially useful for the `select`, `select!`,
+and `combine` manipulation functions,
+since they only retain columns which are a result of the passed operations.
+
+```julia
+julia> df = DataFrame(a = 1:4, b = [50,50,60,60], c = ["hat","bat","cat","dog"])
+4×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     1     50  hat
+   2 │     2     50  bat
+   3 │     3     60  cat
+   4 │     4     60  dog
+
+julia> combine(df, :a => maximum, :b => sum, :c => join) # 3 combine operations
+1×3 DataFrame
+ Row │ a_maximum  b_sum  c_join
+     │ Int64      Int64  String
+─────┼────────────────────────────────
+   1 │         4    220  hatbatcatdog
+
+julia> select(df, :c, :b, :a) # re-order columns
+4×3 DataFrame
+ Row │ c       b      a
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ hat        50      1
+   2 │ bat        50      2
+   3 │ cat        60      3
+   4 │ dog        60      4
+
+julia> select(df, :b, :) # `:` here means all other columns
+4×3 DataFrame
+ Row │ b      a      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │    50      1  hat
+   2 │    50      2  bat
+   3 │    60      3  cat
+   4 │    60      4  dog
+
+julia> select(
+           df,
+           :c => (x -> "a " .* x) => :one_c,
+           :a => (x -> 100x),
+           :b,
+           renamecols=false
+       ) # can mix operation forms
+4×3 DataFrame
+ Row │ one_c   a      b
+     │ String  Int64  Int64
+─────┼──────────────────────
+   1 │ a hat     100     50
+   2 │ a bat     200     50
+   3 │ a cat     300     60
+   4 │ a dog     400     60
+
+julia> select(
+           df,
+           :c => ByRow(reverse),
+           :c => ByRow(uppercase)
+       ) # multiple operations on same column
+4×2 DataFrame
+ Row │ c_reverse  c_uppercase
+     │ String     String
+─────┼────────────────────────
+   1 │ tah        HAT
+   2 │ tab        BAT
+   3 │ tac        CAT
+   4 │ god        DOG
+```
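+
+The examples above all pass operations as separate arguments.
+As a sketch that rebuilds the same `df`, the vector and matrix forms listed at the start of this section can be compared with the multiple-argument form.
+All three calls below are expected to give identical results.
+The parentheses in the matrix literal are only there to keep each `Pair` visually grouped.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60], c = ["hat", "bat", "cat", "dog"])
+
+r1 = combine(df, :a => maximum, :b => sum)        # multiple arguments
+r2 = combine(df, [:a => maximum, :b => sum])      # vector argument
+r3 = combine(df, [(:a => maximum) (:b => sum)])   # 1×2 matrix argument (no comma)
+
+r1 == r2 == r3  # expected to be true
+```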
+
+In the last two examples,
+the manipulation function arguments were split across multiple lines.
+This is a good way to make manipulations with many operations more readable.
+
+Passing multiple operations to `subset` or `subset!` is an easy way to narrow down
+to the rows that satisfy all of the conditions at once.
+
+```julia
+julia> subset(
+           df,
+           :b => ByRow(==(60)),
+           :c => ByRow(contains("at"))
+       ) # rows with 60 and "at"
+1×3 DataFrame
+ Row │ a      b      c
+     │ Int64  Int64  String
+─────┼──────────────────────
+   1 │     3     60  cat
+```
+
+Note that all operations within a single manipulation use the data
+as it existed before the function call,
+i.e. you cannot use columns created by one operation
+in subsequent operations within the same manipulation.
+
+```julia
+julia> transform(
+           df,
+           [:a, :b] => ByRow(+) => :d,
+           :d => (x -> x ./ 2),
+       ) # requires two separate transformations
+ERROR: ArgumentError: column name :d not found in the data frame; existing most similar names are: :a, :b and :c
+
+julia> new_df = transform(df, [:a, :b] => ByRow(+) => :d)
+4×4 DataFrame
+ Row │ a      b      c       d
+     │ Int64  Int64  String  Int64
+─────┼─────────────────────────────
+   1 │     1     50  hat        51
+   2 │     2     50  bat        52
+   3 │     3     60  cat        63
+   4 │     4     60  dog        64
+
+julia> transform!(new_df, :d => (x -> x ./ 2) => :d_2)
+4×5 DataFrame
+ Row │ a      b      c       d      d_2
+     │ Int64  Int64  String  Int64  Float64
+─────┼──────────────────────────────────────
+   1 │     1     50  hat        51     25.5
+   2 │     2     50  bat        52     26.0
+   3 │     3     60  cat        63     31.5
+   4 │     4     60  dog        64     32.0
+```
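+
+If only the final column is needed, the two steps can also be written as one expression.
+The sketch below shows two options, and it is not the only way to do this.
+The first nests the two `transform` calls so that the outer call sees the column created by the inner one.
+The second folds both steps into a single operation function.
+Both should produce the same `:d_2` values.
+
+```julia
+using DataFrames
+
+df = DataFrame(a = 1:4, b = [50, 50, 60, 60], c = ["hat", "bat", "cat", "dog"])
+
+# nested calls: the outer transform sees column :d created by the inner one
+nested = transform(transform(df, [:a, :b] => ByRow(+) => :d),
+                   :d => (x -> x ./ 2) => :d_2)
+
+# single operation: compute (a + b) / 2 row by row without an intermediate :d column
+one_step = transform(df, [:a, :b] => ByRow((a, b) -> (a + b) / 2) => :d_2)
+```
+
+Note that `nested` keeps the intermediate `:d` column, while `one_step` does not.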
+
+### Broadcasting Operation Pairs
+
+[Broadcasting](https://docs.julialang.org/en/v1/manual/arrays/#Broadcasting)
+pairs with `.=>` is often a convenient way to generate multiple
+similar `operation`s to be applied within a single manipulation.
+Broadcasting within the `Pair` of an `operation` is no different than
+broadcasting in base Julia.
+The broadcasting `.=>` will be expanded into a vector of pairs
+(`[operation1, operation2, ...]`),
+and this expansion will occur before the manipulation function is invoked.
+Then the manipulation function will use the
+`manipulation_function(dataframe, [operation1, operation2, ...])` method.
+This process will be explained in more detail below.
+
+To illustrate these concepts, let us first examine the `Type` of a basic `Pair`.
+In DataFrames.jl, a symbol, string, or integer
+may be used to select a single column.
+Some `Pair`s with these types are below.
+
+```julia
+julia> typeof(:x => :a)
+Pair{Symbol, Symbol}
+
+julia> typeof("x" => "a")
+Pair{String, String}
+
+julia> typeof(1 => "a")
+Pair{Int64, String}
+```
+
+Any of the `Pair`s above could be used to rename the first column
+of the data frame below to `a`.
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+
+julia> select(df, :x => :a)
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+
+julia> select(df, 1 => "a")
+3×1 DataFrame
+ Row │ a
+     │ Int64
+─────┼───────
+   1 │     1
+   2 │     2
+   3 │     3
+```
+
+What should we do if we want to keep and rename both the `x` and `y` columns?
+One option is to supply a `Vector` of operation `Pair`s to `select`.
+`select` will process all of these operations in order.
+
+```julia
+julia> ["x" => "a", "y" => "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x" => "a", "y" => "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+We can use broadcasting to simplify the syntax above.
+
+```julia
+julia> ["x", "y"] .=> ["a", "b"]
+2-element Vector{Pair{String, String}}:
+ "x" => "a"
+ "y" => "b"
+
+julia> select(df, ["x", "y"] .=> ["a", "b"])
+3×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
+```
+
+Notice that `select` sees the same `Vector{Pair{String, String}}` operation
+argument whether the individual pairs are written out explicitly or
+constructed with broadcasting.
+The broadcasting is applied before the call to `select`.
+
+```julia
+julia> ["x" => "a", "y" => "b"] == (["x", "y"] .=> ["a", "b"])
+true
+```
 
-julia> combine(german, :Age => mean => :mean_age)
-1×1 DataFrame
- Row │ mean_age
-     │ Float64
-─────┼──────────
-   1 │   35.546
+!!! note
 
-julia> select(german, :Age => mean => :mean_age)
-1000×1 DataFrame
-  Row │ mean_age
-      │ Float64
-──────┼──────────
-    1 │   35.546
-    2 │   35.546
-    3 │   35.546
-    4 │   35.546
-    5 │   35.546
-    6 │   35.546
-    7 │   35.546
-    8 │   35.546
-  ⋮   │    ⋮
-  994 │   35.546
-  995 │   35.546
-  996 │   35.546
-  997 │   35.546
-  998 │   35.546
-  999 │   35.546
- 1000 │   35.546
- 985 rows omitted
-```
-
-As you can see in both cases the `mean` function was applied to `:Age` column
-and the result was stored in the `:mean_age` column. The difference between
-the `combine` and `select` functions is that the `combine` aggregates data
-and produces as many rows as were returned by the transformation function.
-On the other hand the `select` function always keeps the number of rows in a
-data frame to be the same as in the source data frame. Therefore in this case
-the result of the `mean` function got broadcasted.
-
-As `combine` potentially allows any number of rows to be produced as a result
-of the transformation if we have a combination of transformations where some of
-them produce a vector, and other produce scalars then scalars get broadcasted
-exactly like in  `select`. Here is an example:
+    These operation pairs (or vectors of pairs) can be stored in variables.
+    This is uncommon in practice but could be helpful for intermediate
+    inspection and testing.
+    ```julia
+    df = DataFrame(x = 1:3, y = 4:6)       # create data frame
+    operation = ["x", "y"] .=> ["a", "b"]  # save operation to variable
+    typeof(operation)                      # check type of operation
+    first(operation)                       # check first pair in operation
+    last(operation)                        # check last pair in operation
+    select(df, operation)                  # manipulate `df` with `operation`
+    ```
+
+In Julia,
+when a scalar (non-vector) value is broadcast together with a vector,
+the scalar is repeated in every resulting pair.
 
-```jldoctest dataframe
-julia> combine(german, :Age => mean => :mean_age, :Housing => unique => :housing)
+```julia
+julia> ["x", "y"] .=> :a    # :a is repeated
+2-element Vector{Pair{String, Symbol}}:
+ "x" => :a
+ "y" => :a
+
+julia> 1 .=> [:a, :b]       # 1 is repeated
+2-element Vector{Pair{Int64, Symbol}}:
+ 1 => :a
+ 1 => :b
+```
+
+We can use this fact to easily broadcast an `operation_function` to multiple columns.
+
+```julia
+julia> f(x) = 2 * x
+f (generic function with 1 method)
+
+julia> ["x", "y"] .=> f  # f is repeated
+2-element Vector{Pair{String, typeof(f)}}:
+ "x" => f
+ "y" => f
+
+julia> select(df, ["x", "y"] .=> f)  # apply f with automatic column renaming
+3×2 DataFrame
+ Row │ x_f    y_f
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
+
+julia> ["x", "y"] .=> f .=> ["a", "b"]  # f is repeated
+2-element Vector{Pair{String, Pair{typeof(f), String}}}:
+ "x" => (f => "a")
+ "y" => (f => "b")
+
+julia> select(df, ["x", "y"] .=> f .=> ["a", "b"])  # apply f with manual column renaming
 3×2 DataFrame
- Row │ mean_age  housing
-     │ Float64   String7
-─────┼───────────────────
-   1 │   35.546  own
-   2 │   35.546  free
-   3 │   35.546  rent
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
 ```
 
-Note, however, that it is not allowed to return vectors of different lengths in
-different transformations:
+A renaming function can be applied to multiple columns in the same way.
+It will also be repeated in each operation `Pair`.
 
-```jldoctest dataframe
-julia> combine(german, :Age, :Housing => unique => :Housing)
-ERROR: ArgumentError: New columns must have the same length as old columns
+```julia
+julia> newname(s::String) = s * "_new"
+newname (generic function with 1 method)
+
+julia> ["x", "y"] .=> f .=> newname  # both f and newname are repeated
+2-element Vector{Pair{String, Pair{typeof(f), typeof(newname)}}}:
+ "x" => (f => newname)
+ "y" => (f => newname)
+
+julia> select(df, ["x", "y"] .=> f .=> newname)  # apply f then rename column with newname
+3×2 DataFrame
+ Row │ x_new  y_new
+     │ Int64  Int64
+─────┼──────────────
+   1 │     2      8
+   2 │     4     10
+   3 │     6     12
 ```
 
-Let us discuss some other examples using `select`. Often we want to apply some
-function not to the whole column of a data frame, but rather to its individual
-elements. Normally we can achieve this using broadcasting like this:
+You can see from the type output above
+that a three-element pair does not actually exist.
+A `Pair` (as the name implies) can only contain two elements.
+Because `=>` is right-associative, `:x => :y => :z` becomes a nested `Pair`,
+where `:x` is the first element and the `Pair` `:y => :z` is the second element.
 
-```jldoctest dataframe
-julia> select(german, :Sex => (x -> uppercase.(x)) => :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+```julia
+julia> p = :x => :y => :z
+:x => (:y => :z)
+
+julia> p[1]
+:x
+
+julia> p[2]
+:y => :z
+
+julia> p[2][1]
+:y
+
+julia> p[2][2]
+:z
+
+julia> p[3] # there is no index 3 for a pair
+ERROR: BoundsError: attempt to access Pair{Symbol, Pair{Symbol, Symbol}} at index [3]
 ```
 
-This pattern is encountered very often in practice, therefore there is a `ByRow`
-convenience wrapper for a function that creates its broadcasted variant. In
-these examples `ByRow` is a special type used for selection operations to signal
-that the wrapped function should be applied to each element (row) of the
-selection. Here we are passing `ByRow` wrapper to target column name `:Sex`
-using `uppercase` function:
+In the previous examples, the source columns have been individually selected.
+When broadcasting the same function to multiple columns,
+similarities in the column names or positions can often be exploited
+to avoid selecting each column by hand.
+Consider a data frame with temperature data at three different locations
+taken over time.
+```julia
+julia> df = DataFrame(Time = 1:4,
+                      Temperature1 = [20, 23, 25, 28],
+                      Temperature2 = [33, 37, 41, 44],
+                      Temperature3 = [15, 10, 4, 0])
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1            20            33            15
+   2 │     2            23            37            10
+   3 │     3            25            41             4
+   4 │     4            28            44             0
+```
+
+To convert all of the temperature data in one transformation,
+we just need to define a conversion function and broadcast
+it to all of the "Temperature" columns.
+
+```julia
+julia> celsius_to_kelvin(x) = x + 273
+celsius_to_kelvin (generic function with 1 method)
+
+julia> transform(
+           df,
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin),
+           renamecols = false
+       )
+4×4 DataFrame
+ Row │ Time   Temperature1  Temperature2  Temperature3
+     │ Int64  Int64         Int64         Int64
+─────┼─────────────────────────────────────────────────
+   1 │     1           293           306           288
+   2 │     2           296           310           283
+   3 │     3           298           314           277
+   4 │     4           301           317           273
+```
+Or, simultaneously changing the column names:
 
-```jldoctest dataframe
-julia> select(german, :Sex => ByRow(uppercase) => :SEX)
-1000×1 DataFrame
-  Row │ SEX
-      │ String
-──────┼────────
-    1 │ MALE
-    2 │ FEMALE
-    3 │ MALE
-    4 │ MALE
-    5 │ MALE
-    6 │ MALE
-    7 │ MALE
-    8 │ MALE
-  ⋮   │   ⋮
-  994 │ MALE
-  995 │ MALE
-  996 │ FEMALE
-  997 │ MALE
-  998 │ MALE
-  999 │ MALE
- 1000 │ MALE
-985 rows omitted
+```julia
+julia> rename_function(s) = "Temperature $(last(s)) (K)"
+rename_function (generic function with 1 method)
+
+julia> select(
+           df,
+           "Time",
+           Cols(r"Temp") .=> ByRow(celsius_to_kelvin) .=> rename_function
+       )
+4×4 DataFrame
+ Row │ Time   Temperature 1 (K)  Temperature 2 (K)  Temperature 3 (K)
+     │ Int64  Int64              Int64              Int64
+─────┼────────────────────────────────────────────────────────────────
+   1 │     1                293                306                288
+   2 │     2                296                310                283
+   3 │     3                298                314                277
+   4 │     4                301                317                273
 ```
 
-In this case we transform our source column `:Age` using `ByRow` wrapper and
-automatically generate the target column name:
+!!! note "Notes"
 
-```jldoctest dataframe
-julia> select(german, :Age, :Age => ByRow(sqrt))
-1000×2 DataFrame
-  Row │ Age    Age_sqrt
-      │ Int64  Float64
-──────┼─────────────────
-    1 │    67   8.18535
-    2 │    22   4.69042
-    3 │    49   7.0
-    4 │    45   6.7082
-    5 │    53   7.28011
-    6 │    35   5.91608
-    7 │    53   7.28011
-    8 │    35   5.91608
-  ⋮   │   ⋮       ⋮
-  994 │    30   5.47723
-  995 │    50   7.07107
-  996 │    31   5.56776
-  997 │    40   6.32456
-  998 │    38   6.16441
-  999 │    23   4.79583
- 1000 │    27   5.19615
-        985 rows omitted
+    * `Not("Time")` or `2:4` would have been equally good choices for `source_column_selector` in the above operations.
+    * Don't forget `ByRow` if your function is to be applied to elements rather than entire column vectors.
+      Without `ByRow`, the manipulations above would have thrown
+      `ERROR: MethodError: no method matching +(::Vector{Int64}, ::Int64)`.
+    * Regular expression (`r""`) and `:` column selectors
+      must be wrapped in `Cols` to be broadcast correctly,
+      because otherwise the broadcasting occurs before the selector is expanded into the vector of columns it matches.
+
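+The first note can be checked with a short sketch.
+It rebuilds the temperature data frame from above and selects the temperature columns by position (`2:4`) rather than by regular expression.
+Both transformations are expected to return the same data frame.
+
+```julia
+using DataFrames
+
+df = DataFrame(Time = 1:4,
+               Temperature1 = [20, 23, 25, 28],
+               Temperature2 = [33, 37, 41, 44],
+               Temperature3 = [15, 10, 4, 0])
+
+celsius_to_kelvin(x) = x + 273
+
+# columns selected by position: a range broadcasts like any other Julia vector
+by_position = transform(df, 2:4 .=> ByRow(celsius_to_kelvin), renamecols = false)
+
+# the same columns selected by a regular expression wrapped in Cols
+by_regex = transform(df, Cols(r"Temp") .=> ByRow(celsius_to_kelvin), renamecols = false)
+```
+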
+You could also broadcast different columns to different functions
+by supplying a vector of functions.
+
+```julia
+julia> df = DataFrame(a=1:4, b=5:8)
+4×2 DataFrame
+ Row │ a      b
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      5
+   2 │     2      6
+   3 │     3      7
+   4 │     4      8
+
+julia> f1(x) = x .+ 1
+f1 (generic function with 1 method)
+
+julia> f2(x) = x ./ 10
+f2 (generic function with 1 method)
+
+julia> transform(df, [:a, :b] .=> [f1, f2])
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
 ```
 
-When we pass just a column (without the `=>` part) we can use any column selector
-that is allowed in indexing.
+However, this form is not much more convenient than supplying
+multiple individual operations.
 
-Here we exclude the column `:Age` from the resulting data frame:
+```julia
+julia> transform(df, [:a => f1, :b => f2]) # same manipulation as previous
+4×4 DataFrame
+ Row │ a      b      a_f1   b_f2
+     │ Int64  Int64  Int64  Float64
+─────┼──────────────────────────────
+   1 │     1      5      2      0.5
+   2 │     2      6      3      0.6
+   3 │     3      7      4      0.7
+   4 │     4      8      5      0.8
+```
+
+Broadcasting syntax is perhaps more useful
+for applying multiple functions to multiple columns,
+which is done by changing the vector of functions to a 1×n row matrix of functions.
+(Recall that multiple arguments, a vector, or a matrix of operation pairs are all valid
+for passing to the manipulation functions.)
 
-```jldoctest dataframe
-julia> select(german, Not(:Age))
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+```julia
+julia> [:a, :b] .=> [f1 f2] # No comma `,` between f1 and f2
+2×2 Matrix{Pair{Symbol}}:
+ :a=>f1  :a=>f2
+ :b=>f1  :b=>f2
+
+julia> transform(df, [:a, :b] .=> [f1 f2]) # No comma `,` between f1 and f2
+4×6 DataFrame
+ Row │ a      b      a_f1   b_f1   a_f2     b_f2
+     │ Int64  Int64  Int64  Int64  Float64  Float64
+─────┼──────────────────────────────────────────────
+   1 │     1      5      2      6      0.1      0.5
+   2 │     2      6      3      7      0.2      0.6
+   3 │     3      7      4      8      0.3      0.7
+   4 │     4      8      5      9      0.4      0.8
+```
+
+In this way, every combination of selected columns and functions will be applied.
+
+Pair broadcasting is a simple but powerful tool
+that can be used in any of the manipulation functions listed under
+[Manipulation Functions](@ref).
+Experiment for yourself to discover other useful operations.
+
+### Additional Resources
+More details and examples of operation pair syntax can be found in
+[this blog post](https://bkamins.github.io/julialang/2020/12/24/minilanguage.html).
+(The official wording describing the syntax has changed since the blog post was written,
+but the examples are still illustrative.
+The operation pair syntax is sometimes referred to as the DataFrames.jl mini-language
+or Domain-Specific Language.)
+
+For additional syntax niceties,
+many users find the [Chain.jl](https://github.com/jkrumbiegel/Chain.jl)
+and [DataFramesMeta.jl](https://github.com/JuliaData/DataFramesMeta.jl)
+packages useful
+to help simplify manipulations that may be tedious with operation pairs alone.
+
+## Approach Comparison
+
+After that deep dive into [Manipulation Functions](@ref),
+it is a good idea to review the alternative approaches covered in
+[Getting and Setting Data in a Data Frame](@ref).
+Let us compare the approaches with a few examples.
+
+For simple operations,
+often getting/setting data with dot syntax
+is simpler than the equivalent data frame manipulation.
+Here we will add the two columns of our data frame together
+and place the result in a new third column.
+
+**Setup:**
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6)  # define a data frame
+3×2 DataFrame
+ Row │ x      y
+     │ Int64  Int64
+─────┼──────────────
+   1 │     1      4
+   2 │     2      5
+   3 │     3      6
 ```
 
-In the next example we drop columns `"Age"`, `"Saving accounts"`,
-`"Checking account"`, `"Credit amount"`, and `"Purpose"`. Note that this time
-we use string column selectors because some of the column names have spaces
-in them:
+**Manipulation:**
 
-```jldoctest dataframe
-julia> select(german, Not(["Age", "Saving accounts", "Checking account",
-                           "Credit amount", "Purpose"]))
-1000×5 DataFrame
-  Row │ id     Sex      Job    Housing  Duration
-      │ Int64  String7  Int64  String7  Int64
-──────┼──────────────────────────────────────────
-    1 │     0  male         2  own             6
-    2 │     1  female       2  own            48
-    3 │     2  male         1  own            12
-    4 │     3  male         2  free           42
-    5 │     4  male         2  free           24
-    6 │     5  male         1  free           36
-    7 │     6  male         2  own            24
-    8 │     7  male         3  rent           36
-  ⋮   │   ⋮       ⋮       ⋮       ⋮        ⋮
-  994 │   993  male         3  own            36
-  995 │   994  male         2  own            12
-  996 │   995  female       1  own            12
-  997 │   996  male         3  own            30
-  998 │   997  male         2  own            12
-  999 │   998  male         2  free           45
- 1000 │   999  male         2  own            45
-                                 985 rows omitted
-
-```
-
-As another example let us present that the `r"S"` regular expression we used
-above also works with `select`:
+```julia
+julia> transform!(df, [:x, :y] => (+) => :z)
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
 
-```jldoctest dataframe
-julia> select(german, r"S")
-1000×2 DataFrame
-  Row │ Sex      Saving accounts
-      │ String7  String15
-──────┼──────────────────────────
-    1 │ male     NA
-    2 │ female   little
-    3 │ male     little
-    4 │ male     little
-    5 │ male     little
-    6 │ male     NA
-    7 │ male     quite rich
-    8 │ male     little
-  ⋮   │    ⋮            ⋮
-  994 │ male     little
-  995 │ male     NA
-  996 │ female   little
-  997 │ male     little
-  998 │ male     little
-  999 │ male     little
- 1000 │ male     moderate
-                 985 rows omitted
-```
-
-The benefit of `select` or `combine` over indexing is that it is easier
-to get the union of several column selectors, e.g.:
+**Dot Syntax:**
 
-```jldoctest dataframe
-julia> select(german, r"S", "Job", 1)
-1000×4 DataFrame
-  Row │ Sex      Saving accounts  Job    id
-      │ String7  String15         Int64  Int64
-──────┼────────────────────────────────────────
-    1 │ male     NA                   2      0
-    2 │ female   little               2      1
-    3 │ male     little               1      2
-    4 │ male     little               2      3
-    5 │ male     little               2      4
-    6 │ male     NA                   1      5
-    7 │ male     quite rich           2      6
-    8 │ male     little               3      7
-  ⋮   │    ⋮            ⋮           ⋮      ⋮
-  994 │ male     little               3    993
-  995 │ male     NA                   2    994
-  996 │ female   little               1    995
-  997 │ male     little               3    996
-  998 │ male     little               2    997
-  999 │ male     little               2    998
- 1000 │ male     moderate             2    999
-                               985 rows omitted
-```
-
-Taking advantage of this flexibility here is an idiomatic pattern to move some
-column to the front of a data frame:
+```julia
+julia> df.z = df.x + df.y
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
 
-```jldoctest dataframe
-julia> select(german, "Sex", :)
-1000×10 DataFrame
-  Row │ Sex      id     Age    Job    Housing  Saving accounts  Checking accou ⋯
-      │ String7  Int64  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │ male         0     67      2  own      NA               little         ⋯
-    2 │ female       1     22      2  own      little           moderate
-    3 │ male         2     49      1  own      little           NA
-    4 │ male         3     45      2  free     little           little
-    5 │ male         4     53      2  free     little           little         ⋯
-    6 │ male         5     35      1  free     NA               NA
-    7 │ male         6     53      2  own      quite rich       NA
-    8 │ male         7     35      3  rent     little           moderate
-  ⋮   │    ⋮       ⋮      ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │ male       993     30      3  own      little           little         ⋯
-  995 │ male       994     50      2  own      NA               NA
-  996 │ female     995     31      1  own      little           NA
-  997 │ male       996     40      3  own      little           little
-  998 │ male       997     38      2  own      little           NA             ⋯
-  999 │ male       998     23      2  free     little           little
- 1000 │ male       999     27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      5
+   2 │     2      5      7
+   3 │     3      6      9
+```
+
+Recall that a data frame manipulation function call always returns a data frame.
+In contrast, a data frame column accessed with dot syntax is returned as a vector (here a `Vector{Int64}`).
+Thus the expression `df.x + df.y` gets the column data as vectors
+and returns the result of the vector addition.
+However, in that same line,
+we assigned the resultant `Vector` to a new column `z` in the data frame `df`.
+We could have instead assigned the resultant `Vector` to some other variable,
+and then `df` would not have been altered.
+The approach with dot syntax is very versatile
+since the data getting, mathematics, and data setting can be separate steps.
+
+```julia
+julia> df.x  # dot syntax returns a vector
+3-element Vector{Int64}:
+ 1
+ 2
+ 3
+
+julia> v = df.x + df.y  # assign mathematical result to a vector `v`
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> df.z = v  # place `v` into the data frame `df` with the column name `z`
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
 ```
 
-Below, we are simply passing source column and target column name to rename them
-(without specifying the transformation part):
+However, one way in which dot syntax is less versatile
+is that the column name must be written explicitly in the code.
+In these cases indexing syntax, which is only slightly longer to write,
+is a good alternative.
+Both indexing syntax and manipulation functions can operate on dynamic column names
+stored in variables.
 
-```jldoctest dataframe
-julia> select(german, :Sex => :x1, :Age => :x2)
-1000×2 DataFrame
-  Row │ x1       x2
-      │ String7  Int64
-──────┼────────────────
-    1 │ male        67
-    2 │ female      22
-    3 │ male        49
-    4 │ male        45
-    5 │ male        53
-    6 │ male        35
-    7 │ male        53
-    8 │ male        35
-  ⋮   │    ⋮       ⋮
-  994 │ male        30
-  995 │ male        50
-  996 │ female      31
-  997 │ male        40
-  998 │ male        38
-  999 │ male        23
- 1000 │ male        27
-       985 rows omitted
+**Setup:**
+
+Imagine this setup data was read from a file and/or entered by a user at runtime.
+
+```julia
+julia> df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)  # define a data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
 ```
 
-It is important to note that `select` always returns a data frame, even if a
-single column selected as opposed to indexing syntax. Compare the following:
+**Dot Syntax:**
 
-```jldoctest dataframe
-julia> select(german, :Age)
-1000×1 DataFrame
-  Row │ Age
-      │ Int64
-──────┼───────
-    1 │    67
-    2 │    22
-    3 │    49
-    4 │    45
-    5 │    53
-    6 │    35
-    7 │    53
-    8 │    35
-  ⋮   │   ⋮
-  994 │    30
-  995 │    50
-  996 │    31
-  997 │    40
-  998 │    38
-  999 │    23
- 1000 │    27
-985 rows omitted
+```julia
+julia> df.c1  # dot syntax needs a literal column name; it cannot look up a name stored in a variable
+ERROR: ArgumentError: column name :c1 not found in the data frame
+```
 
-julia> german[:, :Age]
-1000-element Vector{Int64}:
- 67
- 22
- 49
- 45
- 53
- 35
- 53
- 35
- 61
- 28
-  ⋮
- 34
- 23
- 30
- 50
- 31
- 40
- 38
- 23
- 27
-```
-
-By default `select` copies columns of a passed source data frame. In order to
-avoid copying, pass the `copycols=false` keyword argument:
+**Indexing:**
 
-```jldoctest dataframe
-julia> df = select(german, :Sex)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+```julia
+julia> df[:, c3] = df[:, c1] + df[:, c2]  # access columns with names stored in variables
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
 
-julia> df.Sex === german.Sex # copy
-false
+julia> df  # see that the previous expression updated the data frame `df`
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
 
-julia> df = select(german, :Sex, copycols=false)
-1000×1 DataFrame
-  Row │ Sex
-      │ String7
-──────┼─────────
-    1 │ male
-    2 │ female
-    3 │ male
-    4 │ male
-    5 │ male
-    6 │ male
-    7 │ male
-    8 │ male
-  ⋮   │    ⋮
-  994 │ male
-  995 │ male
-  996 │ female
-  997 │ male
-  998 │ male
-  999 │ male
- 1000 │ male
-985 rows omitted
+**Manipulation:**
 
-julia> df.Sex === german.Sex # no-copy is performed
-true
+```julia
+julia> transform!(df, [c1, c2] => (+) => c3)  # access columns with names stored in variables
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
 ```
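+
+For comparison, here is a minimal sketch of the non-mutating `transform`, which performs the same operation but returns a new data frame and leaves its input unchanged. It assumes DataFrames.jl is installed, and `df_new` is a hypothetical name used only for illustration:
+
+```julia
+using DataFrames
+
+# Same setup as above: column names are stored in variables
+df = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)
+c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column"
+
+# `transform` (without `!`) returns a new data frame; `df` itself is not modified
+df_new = transform(df, [c1, c2] => (+) => c3)
+
+ncol(df)       # 2: `df` still has only its original columns
+df_new[:, c3]  # [5, 7, 9]
+```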
 
-To perform the selection operation in-place use `select!`:
+Additionally, with manipulation functions the name of the data frame
+only needs to be written once, whereas indexing repeats it for every column accessed.
+This can be helpful when dealing with long variable and column names.
 
-```jldoctest dataframe
-julia> select!(german, Not(:Age));
+**Setup:**
 
-julia> german
-1000×9 DataFrame
-  Row │ id     Sex      Job    Housing  Saving accounts  Checking account  Cre ⋯
-      │ Int64  String7  Int64  String7  String15         String15          Int ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male         2  own      NA               little                ⋯
-    2 │     1  female       2  own      little           moderate
-    3 │     2  male         1  own      little           NA
-    4 │     3  male         2  free     little           little
-    5 │     4  male         2  free     little           little                ⋯
-    6 │     5  male         1  free     NA               NA
-    7 │     6  male         2  own      quite rich       NA
-    8 │     7  male         3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮       ⋮            ⋮                ⋮              ⋱
-  994 │   993  male         3  own      little           little                ⋯
-  995 │   994  male         2  own      NA               NA
-  996 │   995  female       1  own      little           NA
-  997 │   996  male         3  own      little           little
-  998 │   997  male         2  own      little           NA                    ⋯
-  999 │   998  male         2  free     little           little
- 1000 │   999  male         2  own      moderate         moderate
-                                                  3 columns and 985 rows omitted
+```julia
+julia> my_very_long_data_frame_name = DataFrame(
+           "My First Column" => 1:3,
+           "My Second Column" => 4:6
+       )  # define a data frame
+3×2 DataFrame
+ Row │ My First Column  My Second Column
+     │ Int64            Int64
+─────┼───────────────────────────────────
+   1 │               1                 4
+   2 │               2                 5
+   3 │               3                 6
+
+julia> c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column";  # define column names
 ```
 
-As you can see the `:Age` column was dropped from the `german` data frame.
+**Manipulation:**
 
-The `transform` and `transform!` functions work identically to `select` and
-`select!` with the only difference that they retain all columns that are present
-in the source data frame. Here are some examples:
+```julia
 
-```jldoctest dataframe
-julia> german = copy(german_ref);
+julia> transform!(my_very_long_data_frame_name, [c1, c2] => (+) => c3)
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
+─────┼────────────────────────────────────────────────────
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
+```
 
-julia> df = german_ref[1:8, 1:5]
-8×5 DataFrame
- Row │ id     Age    Sex      Job    Housing
-     │ Int64  Int64  String7  Int64  String7
-─────┼───────────────────────────────────────
-   1 │     0     67  male         2  own
-   2 │     1     22  female       2  own
-   3 │     2     49  male         1  own
-   4 │     3     45  male         2  free
-   5 │     4     53  male         2  free
-   6 │     5     35  male         1  free
-   7 │     6     53  male         2  own
-   8 │     7     35  male         3  rent
-
-julia> transform(df, :Age => maximum)
-8×6 DataFrame
- Row │ id     Age    Sex      Job    Housing  Age_maximum
-     │ Int64  Int64  String7  Int64  String7  Int64
+**Indexing:**
+
+```julia
+julia> my_very_long_data_frame_name[:, c3] = my_very_long_data_frame_name[:, c1] + my_very_long_data_frame_name[:, c2]
+3-element Vector{Int64}:
+ 5
+ 7
+ 9
+
+julia> my_very_long_data_frame_name  # see that the previous expression updated the data frame
+3×3 DataFrame
+ Row │ My First Column  My Second Column  My Third Column
+     │ Int64            Int64             Int64
 ─────┼────────────────────────────────────────────────────
-   1 │     0     67  male         2  own               67
-   2 │     1     22  female       2  own               67
-   3 │     2     49  male         1  own               67
-   4 │     3     45  male         2  free              67
-   5 │     4     53  male         2  free              67
-   6 │     5     35  male         1  free              67
-   7 │     6     53  male         2  own               67
-   8 │     7     35  male         3  rent              67
+   1 │               1                 4                5
+   2 │               2                 5                7
+   3 │               3                 6                9
 ```
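+
+If you prefer indexing, one possible workaround for a long name is to bind a short alias. This is a minimal sketch assuming DataFrames.jl is installed; the alias `d` is hypothetical. Plain assignment does not copy in Julia, so `d` refers to the same data frame and updates through it are visible under the original name:
+
+```julia
+using DataFrames
+
+my_very_long_data_frame_name = DataFrame("My First Column" => 1:3, "My Second Column" => 4:6)
+c1 = "My First Column"; c2 = "My Second Column"; c3 = "My Third Column"
+
+# `d` is another name for the same object, not a copy
+d = my_very_long_data_frame_name
+d[:, c3] = d[:, c1] + d[:, c2]
+
+# The new column is visible through the original name as well
+my_very_long_data_frame_name[:, c3]  # [5, 7, 9]
+```
+
+Use `copy` instead of plain assignment if the original data frame must stay untouched.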
 
-In the example below we are swapping values stored in columns `:Sex` and `:Age`:
+Another benefit of manipulation functions and indexing over dot syntax is that
+they make it easy to operate on a subset of columns chosen with selectors such as `Not`.
 
-```jldoctest dataframe
-julia> transform(german, :Age => :Sex, :Sex => :Age)
-1000×10 DataFrame
-  Row │ id     Age      Sex    Job    Housing  Saving accounts  Checking accou ⋯
-      │ Int64  String7  Int64  Int64  String7  String15         String15       ⋯
-──────┼─────────────────────────────────────────────────────────────────────────
-    1 │     0  male        67      2  own      NA               little         ⋯
-    2 │     1  female      22      2  own      little           moderate
-    3 │     2  male        49      1  own      little           NA
-    4 │     3  male        45      2  free     little           little
-    5 │     4  male        53      2  free     little           little         ⋯
-    6 │     5  male        35      1  free     NA               NA
-    7 │     6  male        53      2  own      quite rich       NA
-    8 │     7  male        35      3  rent     little           moderate
-  ⋮   │   ⋮       ⋮       ⋮      ⋮       ⋮            ⋮                ⋮       ⋱
-  994 │   993  male        30      3  own      little           little         ⋯
-  995 │   994  male        50      2  own      NA               NA
-  996 │   995  female      31      1  own      little           NA
-  997 │   996  male        40      3  own      little           little
-  998 │   997  male        38      2  own      little           NA             ⋯
-  999 │   998  male        23      2  free     little           little
- 1000 │   999  male        27      2  own      moderate         moderate
-                                                  4 columns and 985 rows omitted
+**Setup:**
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6, z = 7:9)  # define data frame
+3×3 DataFrame
+ Row │ x      y      z
+     │ Int64  Int64  Int64
+─────┼─────────────────────
+   1 │     1      4      7
+   2 │     2      5      8
+   3 │     3      6      9
 ```
 
-If we give more than one source column to a transformation they are passed as
-consecutive positional arguments. So for example the
-`[:Age, :Job] => (+) => :res` transformation below evaluates `+(df1.Age, df1.Job)`
-(which adds two columns) and stores the result in the `:res` column:
+**Dot Syntax:**
 
-```jldoctest dataframe
-julia> select(german, :Age, :Job, [:Age, :Job] => (+) => :res)
-1000×3 DataFrame
-  Row │ Age    Job    res
-      │ Int64  Int64  Int64
-──────┼─────────────────────
-    1 │    67      2     69
-    2 │    22      2     24
-    3 │    49      1     50
-    4 │    45      2     47
-    5 │    53      2     55
-    6 │    35      1     36
-    7 │    53      2     55
-    8 │    35      3     38
-  ⋮   │   ⋮      ⋮      ⋮
-  994 │    30      3     33
-  995 │    50      2     52
-  996 │    31      1     32
-  997 │    40      3     43
-  998 │    38      2     40
-  999 │    23      2     25
- 1000 │    27      2     29
-            985 rows omitted
-```
-
-In the examples given in this introductory tutorial we did not cover all
-options of the transformation mini-language. More advanced examples, in particular
-showing how to pass or produce multiple columns using the `AsTable` operation
-(which you might have seen in some DataFrames.jl demos) are given in the later
-sections of the manual.
+```julia
+julia> df.Not(:x)  # will not work; requires a literal column name
+ERROR: ArgumentError: column name :Not not found in the data frame
+```
+
+**Indexing:**
+
+```julia
+julia> df[:, :y_z_max] = maximum.(eachrow(df[:, Not(:x)]))  # row-wise maximum over all columns except `x`
+3-element Vector{Int64}:
+ 7
+ 8
+ 9
+
+julia> df  # see that the previous expression updated the data frame `df`
+3×4 DataFrame
+ Row │ x      y      z      y_z_max
+     │ Int64  Int64  Int64  Int64
+─────┼──────────────────────────────
+   1 │     1      4      7        7
+   2 │     2      5      8        8
+   3 │     3      6      9        9
+```
+
+**Manipulation:**
+
+```julia
+julia> df = DataFrame(x = 1:3, y = 4:6, z = 7:9);  # start again from the data frame defined in the setup
+
+julia> transform!(df, Not(:x) => ByRow(max))  # row-wise maximum over all columns except `x`
+3×4 DataFrame
+ Row │ x      y      z      y_z_max
+     │ Int64  Int64  Int64  Int64
+─────┼──────────────────────────────
+   1 │     1      4      7        7
+   2 │     2      5      8        8
+   3 │     3      6      9        9
+```
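+
+The same selectors also work with `names`, and naming the output column explicitly avoids relying on the automatically generated name. This is a minimal sketch assuming DataFrames.jl is installed; `cols` and `res` are hypothetical names:
+
+```julia
+using DataFrames
+
+df = DataFrame(x = 1:3, y = 4:6, z = 7:9)
+
+# `names` accepts the same column selectors as indexing and manipulation functions
+cols = names(df, Not(:x))  # ["y", "z"]
+
+# `select` returns a new data frame holding only the listed columns;
+# `=> :y_z_max` fixes the output column name explicitly
+res = select(df, :x, cols => ByRow(max) => :y_z_max)
+```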
+
+Moreover, indexing can operate on a subset of columns *and* rows.
+
+**Indexing:**
+
+```julia
+julia> y_z_max_row3 = maximum(df[3, Not(:x)])  # find maximum value across row 3 except for column `x`
+9
+```
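+
+Extending the example above to a range of rows is a matter of passing a range as the row selector. This is a minimal sketch assuming DataFrames.jl is installed; `sub` is a hypothetical name:
+
+```julia
+using DataFrames
+
+df = DataFrame(x = 1:3, y = 4:6, z = 7:9)
+
+# Rows 2 and 3, all columns except `x` (indexing with `:`-style selectors copies the data)
+sub = df[2:3, Not(:x)]
+
+# Row-wise maximum over the selected rows
+maximum.(eachrow(sub))  # [8, 9]
+```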
+
+Hopefully this small comparison has illustrated some of the benefits and drawbacks
+of the various syntaxes available in DataFrames.jl.
+The best syntax to use depends on the situation.