
Commit

Merge branch 'main' into nb/manipulation_function_basics
nathanrboyer authored Oct 17, 2023
2 parents 79a1171 + 1a5da8a commit 621f253
Showing 12 changed files with 401 additions and 66 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -36,7 +36,7 @@ jobs:
- uses: julia-actions/julia-buildpkg@v1
- uses: julia-actions/julia-runtest@v1
env:
-JULIA_NUM_THREADS: 4
+JULIA_NUM_THREADS: 4,1
- uses: julia-actions/julia-processcoverage@v1
- uses: codecov/codecov-action@v1
with:
14 changes: 14 additions & 0 deletions NEWS.md
@@ -2,10 +2,24 @@

## New functionalities

* Allow passing multiple values to add in `push!`, `pushfirst!`,
`append!`, and `prepend!`
([#3372](https://github.com/JuliaData/DataFrames.jl/pull/3372))
* `rename` and `rename!` now allow to apply a function transforming
column names only to a subset of the columns specified by the `cols`
keyword argument
([#3380](https://github.com/JuliaData/DataFrames.jl/pull/3380))
* `mapcols` and `mapcols!` now allow to apply a function transforming
columns only to a subset of the columns specified by the `cols`
keyword argument
([#3386](https://github.com/JuliaData/DataFrames.jl/pull/3386))

## Bug fixes

* Always use the default thread pool for multithreaded operations,
instead of using the interactive thread pool when Julia was started
with `-tM,N` with N > 0
([#3385](https://github.com/JuliaData/DataFrames.jl/pull/3385))
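
The difference between the two thread pools can be illustrated with a minimal sketch in plain Julia. It assumes Julia 1.9 or later (where thread pools exist) and uses only `Base.Threads` calls; it is not the DataFrames.jl implementation. It only shows that a task spawned with an explicit `:default` argument runs on the default pool, whatever `-tM,N` setting Julia was started with.

```julia
# Sketch only: plain Base.Threads calls, no DataFrames.jl internals.
# Assumes Julia >= 1.9, where the :default and :interactive pools exist.

# Thread counts per pool; started with `julia -t4,1` these would be 4 and 1.
n_default = Threads.nthreads(:default)
n_interactive = Threads.nthreads(:interactive)
println("default pool threads:     ", n_default)
println("interactive pool threads: ", n_interactive)

# Spawn a task explicitly on the :default pool and ask which pool it ran on.
task = Threads.@spawn :default Threads.threadpool()
pool = fetch(task)
println("spawned task ran on pool: ", pool)
```

Running this under different `-t` settings is a quick way to check how the pools are sized on a given machine.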

# DataFrames.jl v1.6.1 Release Notes

2 changes: 1 addition & 1 deletion Project.toml
@@ -1,6 +1,6 @@
name = "DataFrames"
uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
-version = "1.6.1"
+version = "1.7.0"

[deps]
Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"
3 changes: 2 additions & 1 deletion docs/src/lib/functions.md
@@ -7,7 +7,8 @@ CurrentModule = DataFrames
## Multithreading support

By default, selected operations in DataFrames.jl automatically use multiple threads
-when available. It is task-based and implemented using the `@spawn` macro from Julia Base.
+when available. Multi-threading is task-based and implemented using the `@spawn`
+macro from Julia Base. Tasks are therefore scheduled on the `:default` threadpool.
Functions that take user-defined functions and may run it in parallel
accept a `threads` keyword argument which allows disabling multithreading
when the provided function requires serial execution or is not thread-safe.
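
As a hedged illustration of the `threads` keyword argument described in the paragraph above, the sketch below passes `threads=false` to `combine`. It assumes an installed DataFrames.jl version that supports this keyword; the data and the helper `sum_and_log` are hypothetical and exist only for this example.

```julia
using DataFrames

# Hypothetical example data: two groups of two rows each.
df = DataFrame(g=[1, 1, 2, 2], x=1:4)

# A user-defined function that mutates shared state without synchronization,
# so it should not be called from several tasks at once.
seen = Int[]
function sum_and_log(x)
    push!(seen, length(x))  # unsynchronized mutation of a shared vector
    return sum(x)
end

# `threads=false` requests serial execution of the user-defined function.
res = combine(groupby(df, :g), :x => sum_and_log => :total; threads=false)
println(res)
```

With the default `threads=true`, the same call may run `sum_and_log` in separate tasks, which is why a function like this one needs the keyword.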
2 changes: 1 addition & 1 deletion src/abstractdataframe/abstractdataframe.jl
@@ -254,7 +254,7 @@ function rename!(df::AbstractDataFrame,
return df
end

-# needed because of dispach ambiguity
+# needed because of dispatch ambiguity
function rename!(df::AbstractDataFrame)
_drop_all_nonnote_metadata!(parent(df))
return df
78 changes: 61 additions & 17 deletions src/abstractdataframe/iteration.jl
@@ -107,20 +107,20 @@ as a `DataFrameRows` over a view of rows of parent of `dfr`.
julia> collect(Iterators.partition(eachrow(DataFrame(x=1:5)), 2))
3-element Vector{DataFrames.DataFrameRows{SubDataFrame{DataFrame, DataFrames.Index, UnitRange{Int64}}}}:
2×1 DataFrameRows
-Row │ x
-│ Int64
+Row │ x
+│ Int64
─────┼───────
1 │ 1
2 │ 2
2×1 DataFrameRows
-Row │ x
-│ Int64
+Row │ x
+│ Int64
─────┼───────
1 │ 3
2 │ 4
1×1 DataFrameRows
-Row │ x
-│ Int64
+Row │ x
+│ Int64
─────┼───────
1 │ 5
```
@@ -408,12 +408,17 @@ Base.show(dfcs::DataFrameColumns;
summary=summary, eltypes=eltypes, truncate=truncate, kwargs...)

"""
-mapcols(f::Union{Function, Type}, df::AbstractDataFrame)
+mapcols(f::Union{Function, Type}, df::AbstractDataFrame; cols=All())
+Return a `DataFrame` where each column of `df` selected by `cols` (by default, all columns)
+is transformed using function `f`.
+Columns not selected by `cols` are copied.
-Return a `DataFrame` where each column of `df` is transformed using function `f`.
`f` must return `AbstractVector` objects all with the same length or scalars
(all values other than `AbstractVector` are considered to be a scalar).
+The `cols` column selector can be any value accepted as column selector by the `names` function.
Note that `mapcols` guarantees not to reuse the columns from `df` in the returned
`DataFrame`. If `f` returns its argument then it gets copied before being stored.
@@ -440,15 +445,32 @@ julia> mapcols(x -> x.^2, df)
2 │ 4 144
3 │ 9 169
4 │ 16 196
+julia> mapcols(x -> x.^2, df, cols=r"y")
+4×2 DataFrame
+Row │ x y
+│ Int64 Int64
+─────┼──────────────
+1 │ 1 121
+2 │ 2 144
+3 │ 3 169
+4 │ 4 196
```
"""
-function mapcols(f::Union{Function, Type}, df::AbstractDataFrame)
+function mapcols(f::Union{Function, Type}, df::AbstractDataFrame; cols=All())
+if cols === All() || cols === Colon()
+apply = Iterators.repeated(true)
+else
+picked = Set(names(df, cols))
+apply = Bool[name in picked for name in names(df)]
+end

# note: `f` must return a consistent length
vs = AbstractVector[]
seenscalar = false
seenvector = false
-for v in eachcol(df)
-fv = f(v)
+for (v, doapply) in zip(eachcol(df), apply)
+fv = doapply ? f(v) : copy(v)
if fv isa AbstractVector
if seenscalar
throw(ArgumentError("mixing scalars and vectors in mapcols not allowed"))
@@ -470,9 +492,12 @@ function mapcols(f::Union{Function, Type}, df::AbstractDataFrame)
end

"""
-mapcols!(f::Union{Function, Type}, df::DataFrame)
+mapcols!(f::Union{Function, Type}, df::DataFrame; cols=All())
+Update a `DataFrame` in-place where each column of `df` selected by `cols` (by default, all columns)
+is transformed using function `f`.
+Columns not selected by `cols` are left unchanged.
-Update a `DataFrame` in-place where each column of `df` is transformed using function `f`.
`f` must return `AbstractVector` objects all with the same length or scalars
(all values other than `AbstractVector` are considered to be a scalar).
@@ -503,20 +528,39 @@ julia> df
2 │ 4 144
3 │ 9 169
4 │ 16 196
+julia> mapcols!(x -> 2 * x, df, cols=r"x");
+julia> df
+4×2 DataFrame
+Row │ x y
+│ Int64 Int64
+─────┼──────────────
+1 │ 2 121
+2 │ 8 144
+3 │ 18 169
+4 │ 32 196
```
"""
-function mapcols!(f::Union{Function, Type}, df::DataFrame)
-# note: `f` must return a consistent length
+function mapcols!(f::Union{Function,Type}, df::DataFrame; cols=All())
if ncol(df) == 0 # skip if no columns
_drop_all_nonnote_metadata!(df)
return df
end

+if cols === All() || cols === Colon()
+apply = Iterators.repeated(true)
+else
+picked = Set(names(df, cols))
+apply = Bool[name in picked for name in names(df)]
+end

+# note: `f` must return a consistent length
vs = AbstractVector[]
seenscalar = false
seenvector = false
-for v in eachcol(df)
-fv = f(v)
+for (v, doapply) in zip(eachcol(df), apply)
+fv = doapply ? f(v) : v
if fv isa AbstractVector
if seenscalar
throw(ArgumentError("mixing scalars and vectors in mapcols not allowed"))
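
The column-selection logic that this diff adds to `mapcols` and `mapcols!` (build a set of selected names, derive a Boolean mask in column order, then apply `f` or copy) can be sketched without DataFrames.jl. The function `mapcols_sketch`, its `cols=nothing` default, and the `name => column` input format are hypothetical stand-ins for illustration, not part of the package.

```julia
# Sketch only: a plain-Julia stand-in, not the DataFrames.jl implementation.
# `mapcols_sketch` and its `name => column` input format are hypothetical.
function mapcols_sketch(f, columns::AbstractVector{<:Pair}; cols=nothing)
    allnames = first.(columns)
    if cols === nothing
        # No selector given: apply `f` to every column.
        apply = Iterators.repeated(true)
    else
        # Build a Boolean mask aligned with the column order.
        picked = Set(cols)
        apply = Bool[name in picked for name in allnames]
    end
    # Transform selected columns; copy the rest so nothing is shared with the input.
    return [name => (doapply ? f(col) : copy(col))
            for ((name, col), doapply) in zip(columns, apply)]
end

data = ["x" => [1, 2, 3, 4], "y" => [11, 12, 13, 14]]
out = mapcols_sketch(v -> v .^ 2, data; cols=["y"])
println(out)
```

As in the non-mutating `mapcols` in the diff, unselected columns are copied rather than reused; the in-place `mapcols!` variant keeps them as they are instead.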
