Return a NamedTuple from a single transformation inside the DataFramesMeta.jl macros, @select, @transform, and their mutating and row-wise equivalents.
@astable acts on a single block. It works through all top-level expressions and collects all such expressions of the form :y = ... or $y = ..., i.e. assignments to a Symbol or an escaped column identifier, which is a syntax error outside of DataFramesMeta.jl macros. At the end of the expression, all assignments are collected into a NamedTuple to be used with the AsTable destination in the DataFrames.jl transformation mini-language.
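As a rough illustration of the collection step described above, the sketch below computes an intermediate value once and reuses it for two new columns. The data frame `df` and the column names `b_first` and `b_last` are invented for the example.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# `ex` is an ordinary local variable shared by the lines below it.
# Only the `:name = ...` assignments are collected into the NamedTuple
# and therefore become columns.
out = @transform df @astable begin
    ex = extrema(:b)
    :b_first = :b .- first(ex)
    :b_last = :b .- last(ex)
end
```

Because `ex` is not assigned with a leading `:`, it is not expected to appear as a column in `out`.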
cols : a column indicator (Symbol, Int, Vector{Symbol}, etc.)
e : keyword-like arguments, of the form :y = f(:x) specifying
new columns in terms of column groupings
kwargs : keyword arguments passed to DataFrames.combine
Returns
::DataFrame or a GroupedDataFrame
Details
Transformation inputs to @by can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation, or as a series of keyword-like arguments. For example, the following are equivalent:
@by df :g begin
:mx = mean(:x)
:sx = std(:x)
end
and
@by(df, :g, :mx = mean(:x), :sx = std(:x))
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same name-space. See ? @astable for more details.
@by accepts the same keyword arguments as DataFrames.combine, and they can be added in two ways. When inputs are given as multiple arguments, keyword arguments are added at the end after a semicolon ;, as in
@by(ds, :g, :x = first(:a); ungroup = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to the combine function.
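The two input formats and the two ways of passing keyword arguments can be sketched as follows. The data frame and its columns `g` and `x` are made up for illustration, and the example assumes a DataFramesMeta.jl version that supports `@kwarg`.

```julia
using DataFrames, DataFramesMeta, Statistics

df = DataFrame(g = [1, 1, 2, 2], x = [1.0, 3.0, 5.0, 9.0])

# Block format: one transformation per line.
block_form = @by df :g begin
    :mx = mean(:x)
    :sx = std(:x)
end

# The same transformations as a series of arguments.
call_form = @by(df, :g, :mx = mean(:x), :sx = std(:x))

# Keyword argument after a semicolon: keep the result grouped.
grouped_call = @by(df, :g, :mx = mean(:x); ungroup = false)

# The same keyword argument in block format, via @kwarg on the last line.
grouped_block = @by df :g begin
    :mx = mean(:x)
    @kwarg ungroup = false
end
```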
Broadcast operations within DataFramesMeta.jl macros.
@byrow is not a "real" Julia macro but rather serves as a "flag" to indicate that the anonymous function created by DataFramesMeta to represent an operation should be applied "by-row".
If an expression starts with @byrow, either of the form @byrow :y = f(:x) in transformations or @byrow f(:x) in @orderby, @subset, and @with, then the anonymous function created by DataFramesMeta is wrapped in the DataFrames.ByRow function wrapper, which broadcasts the function so that it runs on each row.
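A small sketch of what the flag does, using an invented data frame: the `@byrow` form, the explicitly broadcast form, and the hand-written `ByRow` pair are expected to produce the same result.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, 2, 3], y = [10, 20, 30])

# Row-wise via the @byrow flag: `+` sees one scalar per column.
flagged = @transform df @byrow :z = :x + :y

# Column-wise with explicit broadcasting.
broadcasted = @transform df :z = :x .+ :y

# Roughly the DataFrames.jl call that the @byrow form stands for.
by_hand = transform(df, [:x, :y] => ByRow((x, y) -> x + y) => :z)
```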
This problem comes up when using the @. macro as well, but can easily be fixed with $. Because $ is currently reserved for escaping column references, no solution currently exists with @byrow or in DataFramesMeta.jl at large. The best solution is simply
args... : transformations defining new columns, of the form :y = f(:x)
kwargs : keyword arguments passed to DataFrames.combine
Results
A DataFrame or a GroupedDataFrame
Details
Inputs to @combine can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation, or as a series of keyword-like arguments. For example, the following are equivalent:
@combine df begin
:mx = mean(:x)
:sx = std(:x)
end
and
@combine(df, :mx = mean(:x), :sx = std(:x))
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same name-space. See ? @astable for more details.
@combine accepts the same keyword arguments as DataFrames.combine and can be added in two ways. When inputs are given as multiple arguments, they are added at the end after a semi-colon ;, as in
@combine(gd, :x = first(:a); ungroup = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to combine function.
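For a grouped input, the formats described above might look like the sketch below. The data and column names are illustrative only.

```julia
using DataFrames, DataFramesMeta, Statistics

df = DataFrame(g = [1, 1, 2, 2], x = [1.0, 3.0, 5.0, 9.0])
gd = groupby(df, :g)

# Block format.
block_form = @combine gd begin
    :mx = mean(:x)
    :sx = std(:x)
end

# Argument format.
call_form = @combine(gd, :mx = mean(:x), :sx = std(:x))

# Keyword arguments are forwarded to DataFrames.combine.
still_grouped = @combine(gd, :first_x = first(:x); ungroup = false)
```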
In-place selection of unique rows in an AbstractDataFrame. Users should note that @distinct! differs from unique! in DataFrames.jl, such that @distinct!(df, [:x,:y]) is not equal to unique!(df, [:x,:y]). See Details for a discussion of these differences.
Arguments
d : an AbstractDataFrame
args... : transformations of the form :x designating
symbols to specify columns or f(:x) specifying their transformations
Returns
::AbstractDataFrame
Inputs to @distinct! can come in two formats: a begin ... end block, or as a series of arguments and keyword-like arguments. For example, the following are equivalent:
@distinct! df begin
:x .+ :y
end
and
@distinct!(df, :x .+ :y)
@distinct! uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. @distinct! allows @byrow at the beginning of a block of selections (i.e. @byrow begin ... end). The transformations in the block will operate by row. For example, the following two statements are equivalent.
@distinct! df @byrow begin
:x + :y
Return the first occurrence of unique rows in an AbstractDataFrame according to given combinations of values in selected columns or their transformations. args can be most column selectors or transformations accepted by select. Users should note that @distinct differs from unique in DataFrames.jl, such that @distinct(df, :x,:y) is not the same as unique(df, [:x,:y]). See Details for a discussion of these differences.
Arguments
d : an AbstractDataFrame
args... : transformations of the form :x designating
symbols to specify columns or f(:x) specifying their transformations
Returns
::AbstractDataFrame
Inputs to @distinct can come in two formats: a begin ... end block, or as a series of arguments and keyword-like arguments. For example, the following are equivalent:
@distinct df begin
:x + :y
end
and
@distinct(df, :x + :y)
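A minimal sketch of the two equivalent forms, using an invented data frame in which every row has the same value of `:x + :y`, so only the first row should survive. Explicit broadcasting (`.+`) is used here for clarity.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = 1:10, y = 10:-1:1)

# Every row has :x + :y == 11, so only the first occurrence is kept.
block_form = @distinct df begin
    :x .+ :y
end

call_form = @distinct(df, :x .+ :y)
```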
@distinct uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. @distinct allows @byrow at the beginning of a block of selections (i.e. @byrow begin ... end). The transformations in the block will operate by row. For example, the following two statements are equivalent.
@distinct df @byrow begin
:x + :y
Act on each row of a data frame in-place, similar to
for row in eachrow(df)
... # Actions that modify `df`.
end
Includes support for control flow and begin end blocks. Since the "environment" induced by @eachrow! df is implicitly a single row of df, use regular operators and comparisons instead of their elementwise counterparts as in @with. Note that the scope within @eachrow! is a hard scope.
@eachrow! also supports special syntax for allocating new columns. The syntax @newcol x::Vector{Int} allocates a new uninitialized column :x with a Vector container with eltype Int. This feature makes it easier to use @eachrow! for data transformations. _N is introduced to represent the number of rows in the data frame, _DF represents the data frame including added columns, and row represents the index of the current row.
Changes to the rows directly affect df. The operation will modify the data frame in place. See @eachrow which employs the same syntax but allocates a fresh data frame.
Like with @transform!, @eachrow! supports the use of $ to work with column names stored as variables. Using $ with a multi-column selector, such as a Vector of Symbols, is currently unsupported.
@eachrow! is a thin wrapper around a for-loop. As a consequence, inside an @eachrow! block, the reserved words break and continue function the same as if written in a for loop. Rows unaffected by break and continue are unmodified, but are still present in the modified data frame. Also because @eachrow! is a for-loop, re-assigning global variables inside an @eachrow! block is discouraged.
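The in-place behaviour can be sketched with a small, made-up data frame. Inside the block, `:A` and `:B` refer to single values of the current row, so the scalar `>` is used rather than `.>`.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = [1, 2, 3], B = [2, 1, 2])

# Ordinary control flow on scalars; assignments write into `df` itself.
@eachrow! df begin
    if :A > :B
        :A = 0
    end
end
```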
Act on each row of a data frame, producing a new dataframe. Similar to
for row in eachrow(copy(df))
...
end
Includes support for control flow and begin end blocks. Since the "environment" induced by @eachrow df is implicitly a single row of df, use regular operators and comparisons instead of their elementwise counterparts as in @with. Note that the scope within @eachrow is a hard scope.
@eachrow also supports special syntax for allocating new columns. The syntax @newcol x::Vector{Int} allocates a new uninitialized column :x with a Vector container with eltype Int. This feature makes it easier to use @eachrow for data transformations. _N is introduced to represent the number of rows in the data frame, _DF represents the data frame including added columns, and row represents the index of the current row.
Changes to the rows do not affect df but instead a freshly allocated data frame is returned by @eachrow. Also note that the returned data frame does not share columns with df. See @eachrow! which employs the same syntax but modifies the data frame in-place.
Like with @transform, @eachrow supports the use of $ to work with column names stored as variables. Using $ with a multi-column selector, such as a Vector of Symbols, is currently unsupported.
@eachrow is a thin wrapper around a for-loop. As a consequence, inside an @eachrow block, the reserved-word arguments break and continue function the same as if written in a for loop. Rows unaffected by break and continue are unmodified, but are still present in the returned data frame. Also because @eachrow is a for-loop, re-assigning global variables inside an @eachrow block is discouraged.
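The following sketch, on an invented data frame, shows the copying behaviour and one way to accumulate a value across rows without touching a global variable. The name `hits` is a placeholder chosen for the example.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(A = [1, 2, 3], B = [2, 1, 2])

# Returns a modified copy; `df` itself is left alone.
df2 = @eachrow df begin
    if :A > :B
        :A = 0
    end
end

# Wrapping the loop in `let` keeps the counter local instead of global.
n_matches = let hits = 0
    @eachrow df begin
        if :A + :B == 3
            hits += 1
        end
    end
    hits
end
```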
Sort rows by values in one or several columns or a transformation of columns. Always returns a fresh DataFrame. Does not accept a GroupedDataFrame.
Arguments
d: an AbstractDataFrame
i...: arguments on which to sort the object
Details
When given a DataFrame, @orderby applies the transformation given by its arguments (but does not create new columns) and sorts the given DataFrame on the result, returning a new DataFrame.
Inputs to @orderby can come in two formats: a begin ... end block, in which case each line in the block is a separate ordering operation, or as multiple arguments. For example, the following two statements are equivalent:
@orderby df begin
:x
-:y
end
and
@orderby(df, :x, -:y)
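As a quick check of the equivalence stated above, here is a sketch on made-up data: rows are sorted by `:x` ascending, with ties broken by `:y` descending.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [2, 1, 2, 1], y = [1, 2, 3, 4])

block_form = @orderby df begin
    :x
    -:y
end

call_form = @orderby(df, :x, -:y)
```

No new columns are created; the negated `:y` is used only as a sort key.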
Arguments
d : an AbstractDataFrame
i... : expression for sorting
If an expression provided to @orderby begins with @byrow, operations are applied "by row" along the data frame. To avoid writing @byrow multiple times, @orderby also allows @byrow to be placed at the beginning of a block of operations. For example, the following two statements are equivalent.
@orderby df @byrow begin
@passmissing is not a "real" Julia macro but rather serves as a "flag" to indicate that the anonymous function created by DataFramesMeta.jl to represent an operation should be wrapped in passmissing from Missings.jl.
@passmissing can only be combined with @byrow or the row-wise versions of macros such as @rtransform and @rselect, etc. If any of the arguments passed to the row-wise anonymous function created by DataFramesMeta.jl with @byrow is missing, the result will automatically be missing.
In the below example, @transform would throw an error without the @passmissing flag.
@passmissing is especially useful for functions which operate on strings, such as parse.
Examples
julia> no_missing(x::Int, y::Int) = x + y;
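A self-contained sketch of the behaviour described above. The data frame is invented for the example, and `no_missing` is redefined here so the block runs on its own.

```julia
using DataFrames, DataFramesMeta

# A function with no method for `missing` arguments.
no_missing(x::Int, y::Int) = x + y

df = DataFrame(a = [1, 2, missing], b = [4, 5, 6])

# Without @passmissing, the third row would raise a MethodError.
out = @transform df @passmissing @byrow :c = no_missing(:a, :b)
```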
args... : expressions of the form :new = :old specifying the change of a column's name
from "old" to "new". The left- and right-hand side of each expression can be passed as symbol arguments, as in :old_col, or strings escaped with $ as in $"new_col". See Details for a description of accepted values.
Returns
::AbstractDataFrame
Inputs to @rename! can come in two formats: a begin ... end block, or as a series of keyword-like arguments. For example, the following are equivalent:
@rename! df begin
:new_col = :old_col
end
and
@rename!(df, :new_col = :old_col)
Details
Both the left- and right-hand side of an expression specifying a column name assignment can be either a Symbol or a String escaped with $. For example, :new = ... and $"new" = ... are both valid ways of assigning a new column name.
This idea can be extended to pass arbitrary right-hand side expressions. For example, the following are equivalent:
args... : expressions of the form :new = :old specifying the change of a column's name
from "old" to "new". The left- and right-hand side of each expression can be passed as symbol arguments, as in :old_col, or strings escaped with $ as in $"new_col". See Details for a description of accepted values.
Returns
::AbstractDataFrame
Inputs to @rename can come in two formats: a begin ... end block, or as a series of keyword-like arguments. For example, the following are equivalent:
Both the left- and right-hand side of an expression specifying a column name assignment can be either a Symbol or an AbstractString (which may contain spaces) escaped with $. For example :new = ..., and $"new" = ... are both valid ways of assigning a new column name.
This idea can be extended to pass arbitrary right-hand side expressions. For example, the following are equivalent:
@rename(df, :new = :old1)
and
@rename(df, :new = old_col1)
The right-hand side can additionally be an Integer, escaped with $, to indicate column position. For example, to rename the 4th column in a data frame to a new name, write @rename df :newname = $4.
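The accepted forms can be sketched as follows, on a made-up data frame with columns `old_col` and `b`. Only the Symbol form and the `$`-escaped string form on the right-hand side are shown.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(old_col = [1, 2, 3], b = [4, 5, 6])

# Symbol on both sides.
call_form = @rename(df, :new_col = :old_col)

# The same rename in block format.
block_form = @rename df begin
    :new_col = :old_col
end

# Old name given as a string escaped with $.
string_form = @rename(df, :new_col = $"old_col")
```

`@rename` returns a new data frame, so `df` is expected to keep its original column names.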
Mutate d in-place to retain only columns or transformations specified by e and return it. No copies of existing columns are made.
Arguments
d : an AbstractDataFrame
i : transformations of the form :y = f(:x) specifying
new columns in terms of existing columns or symbols to specify existing columns
kwargs : keyword arguments passed to DataFrames.select!
Returns
::DataFrame
Details
Inputs to @select! can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation or selector, or as a series of arguments and keyword-like arguments. For example, the following are equivalent:
@select! uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. For example, the call
a transformation which cannot be conveniently expressed using broadcasting.
To avoid writing @byrow multiple times when performing multiple transformations by row, @select! allows @byrow at the beginning of a block of selections (i.e. @byrow begin ... end). All transformations in the block will operate by row.
To select many columns at once use the tools Not, Between, All, and Cols.
@select df Not(:a) keeps all columns except for :a
@select df Between(:a, :z) keeps all columns between :a and :z, inclusive
@select df All() keeps all columns
@select df Cols(...) can be used to combine many different selectors, as well as use regular expressions. For example Cols(r"^a") selects all columns that start with "a".
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same name-space. See ? @astable for more details.
In operations, it is also allowed to use AsTable(cols) to work with multiple columns at once, where the columns are grouped together in a NamedTuple. When AsTable(cols) appears in an operation, no other columns may be referenced in the block.
Using AsTable in this way is useful for working with many columns at once programmatically. For example, to compute the row-wise sum of the columns [:a, :b, :c, :d], write
@byrow :c = sum(AsTable([:a, :b, :c, :d]))
This constructs the pairs
AsTable([:a, :b, :c, :d]) => ByRow(sum) => :c
AsTable on the right-hand side also allows the use of the special column selectors Not, Between, and regular expressions. For example, to calculate the product of all the columns beginning with the letter "a", write
@byrow :d = prod(AsTable(r"^a"))
@select! accepts the same keyword arguments as DataFrames.select!, and they can be added in two ways. When inputs are given as multiple arguments, keyword arguments are added at the end after a semicolon ;, as in
@select!(gd, :a; ungroup = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to the select! function.
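A short sketch of the in-place behaviour, with an invented data frame: after the call, `df` itself holds only the selected and newly created columns.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6])

# Keep :a and add :y; columns :b and :c are dropped from `df`.
@select!(df, :a, :y = :a .+ :b)
```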
i : transformations of the form :y = f(:x) specifying
new columns in terms of existing columns or symbols to specify existing columns
kwargs : keyword arguments passed to DataFrames.select
Returns
::AbstractDataFrame or a GroupedDataFrame
Details
Inputs to @select can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation or selector, or as a series of arguments and keyword-like arguments. For example, the following are equivalent:
@select df begin
:x
:y = :a .+ :b
end
and
@select(df, :x, :y = :a .+ :b)
@select uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. For example, the call
a transformation which cannot be conveniently expressed using broadcasting.
To avoid writing @byrow multiple times when performing multiple transformations by row, @select allows @byrow at the beginning of a block of selections (i.e. @byrow begin... end). All transformations in the block will operate by row.
To select many columns at once use the tools Not, Between, All, and Cols.
@select df Not(:a) keeps all columns except for :a
@select df Between(:a, :z) keeps all columns between :a and :z, inclusive
@select df All() keeps all columns
@select df Cols(...) can be used to combine many different selectors, as well as use regular expressions. For example Cols(r"^a") selects all columns that start with "a".
Expressions inside Not(...), Between(...) etc. are untouched by DataFramesMeta's parsing. To refer to a variable x which represents a column inside Not, write Not(x), rather than Not($x).
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same name-space. See ? @astable for more details.
In operations, it is also allowed to use AsTable(cols) to work with multiple columns at once, where the columns are grouped together in a NamedTuple. When AsTable(cols) appears in an operation, no other columns may be referenced in the block.
Using AsTable in this way is useful for working with many columns at once programmatically. For example, to compute the row-wise sum of the columns [:a, :b, :c, :d], write
@byrow :c = sum(AsTable([:a, :b, :c, :d]))
This constructs the pairs
AsTable([:a, :b, :c, :d]) => ByRow(sum) => :c
AsTable on the right-hand side also allows the use of the special column selectors Not, Between, and regular expressions. For example, to calculate the product of all the columns beginning with the letter "a", write
@byrow :d = prod(AsTable(r"^a"))
@select accepts the same keyword arguments as DataFrames.select, and they can be added in two ways. When inputs are given as multiple arguments, keyword arguments are added at the end after a semicolon ;, as in
@select(df, :a; copycols = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to the select function.
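The pieces described above (the two input formats, column selectors, `AsTable` on the right-hand side, and keyword arguments) can be sketched together. The data frame and the column name `total` are made up for illustration.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2], b = [3, 4], c = [5, 6])

# Block format and argument format.
block_form = @select df begin
    :a
    :y = :a .+ :b
end
call_form = @select(df, :a, :y = :a .+ :b)

# A multi-column selector.
without_a = @select df Not(:a)

# Row-wise sum over several columns; each row arrives as a NamedTuple.
totals = @select df @byrow :total = sum(AsTable([:a, :b, :c]))

# Keyword argument forwarded to DataFrames.select: reuse columns, no copy.
shared = @select(df, :a; copycols = false)
```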
Select row subsets in AbstractDataFrames and GroupedDataFrames, mutating the underlying data-frame in-place.
Arguments
d : an AbstractDataFrame or GroupedDataFrame
i... : expression for selecting rows
kwargs : keyword arguments passed to DataFrames.subset!
Details
Multiple i expressions are "and-ed" together.
If given a GroupedDataFrame, @subset! applies transformations by group, and returns a fresh DataFrame containing the rows for which the generated values are all true.
Inputs to @subset! can come in two formats: a begin ... end block, in which case each line is a separate selector, or as multiple arguments. For example the following two statements are equivalent:
@subset! df begin
:x .> 1
:y .< 2
end
and
@subset!(df, :x .> 1, :y .< 2)
Note
@subset! treats missing values as false when filtering rows. Unlike DataFrames.subset! and other Boolean operations with missing, @subset! will not error on missing values, and will only keep true values.
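A brief sketch of the in-place filtering and the treatment of missing values, on an invented data frame: the row where the comparison yields `missing` is dropped rather than raising an error.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, 2, missing, 4], y = [0, 1, 1, 5])

# `missing .> 1` is missing, which @subset! treats as false.
@subset!(df, :x .> 1)
```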
If an expression provided to @subset! begins with @byrow, operations are applied "by row" along the data frame. To avoid writing @byrow multiple times, @subset! also allows @byrow to be placed at the beginning of a block of operations. For example, the following two statements are equivalent.
@subset! df @byrow begin
Select row subsets in AbstractDataFrames and GroupedDataFrames.
Arguments
d : an AbstractDataFrame or GroupedDataFrame
i... : expression for selecting rows
kwargs... : keyword arguments passed to DataFrames.subset
Details
Multiple i expressions are "and-ed" together.
If given a GroupedDataFrame, @subset applies transformations by group, and returns a fresh DataFrame containing the rows for which the generated values are all true.
Inputs to @subset can come in two formats: a begin ... end block, in which case each line is a separate selector, or as multiple arguments. For example the following two statements are equivalent:
@subset df begin
:x .> 1
:y .< 2
end
and
@subset(df, :x .> 1, :y .< 2)
Note
@subset treats missing values as false when filtering rows. Unlike DataFrames.subset and other Boolean operations with missing, @subset will not error on missing values, and will only keep true values.
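The equivalence of the input formats and the handling of missing values can be sketched as below. The data are made up, and a row-wise variant using a `@byrow` block is included for comparison.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, 2, missing, 4], y = [0, 1, 1, 5])

# Conditions are and-ed together; a missing comparison counts as false.
block_form = @subset df begin
    :x .> 1
    :y .< 2
end

call_form = @subset(df, :x .> 1, :y .< 2)

# Row-wise variant: scalar comparisons inside a @byrow block.
by_row = @subset df @byrow begin
    :x > 1
    :y < 2
end
```

Only the second row satisfies both conditions, and `df` itself is not modified.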
If an expression provided to @subset begins with @byrow, operations are applied "by row" along the data frame. To avoid writing @byrow multiple times, @subset also allows @byrow to be placed at the beginning of a block of operations. For example, the following two statements are equivalent.
@subset df @byrow begin
@@ -1269,7 +1269,7 @@
│ Int64? String?
─────┼─────────────────
1 │ 1 x
- 2 │ 2 y
Mutate d in place to add additional columns or keys based on keyword-like arguments and return it. No copies of existing columns are made.
Arguments
d : an AbstractDataFrame, or GroupedDataFrame
i... : transformations of the form :y = f(:x) defining new columns or keys
kwargs...: keyword arguments passed to DataFrames.transform!
Returns
::DataFrame or a GroupedDataFrame
Details
Inputs to @transform! can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation (:y = f(:x)), or as a series of keyword-like arguments. For example, the following are equivalent:
@transform! df begin
:a = :x
:b = :y
end
and
@transform!(df, :a = :x, :b = :y)
@transform! uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. This makes it possible to write a transformation which cannot be conveniently expressed using broadcasting.
To avoid writing @byrow multiple times when performing multiple transformations by row, @transform! allows @byrow at the beginning of a block of transformations (i.e. @byrow begin ... end). All transformations in the block will operate by row.
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same namespace. See ? @astable for more details.
In operations, it is also allowed to use AsTable(cols) to work with multiple columns at once, where the columns are grouped together in a NamedTuple. When AsTable(cols) appears in an operation, no other columns may be referenced in the block.
Using AsTable in this way is useful for working with many columns at once programmatically. For example, to compute the row-wise sum of the columns [:a, :b, :c, :d], write
@byrow :c = sum(AsTable([:a, :b, :c, :d]))
This constructs the pair
AsTable([:a, :b, :c, :d]) => ByRow(sum) => :c
AsTable on the right-hand side also allows the use of the special column selectors Not, Between, and regular expressions. For example, to calculate the product of all the columns beginning with the letter "a", write
@byrow :d = prod(AsTable(r"^a"))
@transform! accepts the same keyword arguments as DataFrames.transform!, which can be passed in two ways. When inputs are given as multiple arguments, they are added at the end after a semicolon ;, as in
@transform!(gd, :x = :a .- 1; ungroup = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to the transform! function.
Add additional columns or keys based on keyword-like arguments.
Arguments
d: an AbstractDataFrame, or GroupedDataFrame
i...: transformations defining new columns or keys, of the form :y = f(:x)
kwargs...: keyword arguments passed to DataFrames.transform
Returns
::AbstractDataFrame or ::GroupedDataFrame
Details
Inputs to @transform can come in two formats: a begin ... end block, in which case each line in the block is a separate transformation (:y = f(:x)), or as a series of keyword-like arguments. For example, the following are equivalent:
@transform df begin
:a = :x
:b = :y
end
and
@transform(df, :a = :x, :b = :y)
@transform uses the syntax @byrow to wrap transformations in the ByRow function wrapper from DataFrames, which applies a function row-wise, similar to broadcasting. This makes it possible to write a transformation which cannot be conveniently expressed using broadcasting.
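As a sketch of how @byrow relates to ByRow, the snippet below writes the same row-wise transformation twice: once with the macro and once with the DataFrames.jl pair it is assumed to correspond to. The data frame df, the column names, and the string labels are made up for illustration.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(x = [1, 2, 1])

# Row-wise: the ternary sees one scalar value of x per row.
with_macro = @transform(df, @byrow :y = :x == 1 ? "one" : "other")

# Assumed equivalent call in the DataFrames.jl mini-language.
with_pairs = transform(df, :x => ByRow(x -> x == 1 ? "one" : "other") => :y)
```

The ternary operator needs a single Bool, so it cannot be applied directly to the whole column :x; wrapping the function in ByRow sidesteps that.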
To avoid writing @byrow multiple times when performing multiple transformations by row, @transform allows @byrow at the beginning of a block of transformations (i.e. @byrow begin ... end). All transformations in the block will operate by row.
Transformations can also use the macro-flag @astable for creating multiple new columns at once and letting transformations share the same namespace. See ? @astable for more details.
In operations, it is also allowed to use AsTable(cols) to work with multiple columns at once, where the columns are grouped together in a NamedTuple. When AsTable(cols) appears in an operation, no other columns may be referenced in the block.
Using AsTable in this way is useful for working with many columns at once programmatically. For example, to compute the row-wise sum of the columns [:a, :b, :c, :d], write
@byrow :c = sum(AsTable([:a, :b, :c, :d]))
This constructs the pair
AsTable([:a, :b, :c, :d]) => ByRow(sum) => :c
AsTable on the right-hand side also allows the use of the special column selectors Not, Between, and regular expressions. For example, to calculate the product of all the columns beginning with the letter "a", write
@byrow :d = prod(AsTable(r"^a"))
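The AsTable form can be tried on a small table. The sketch below assumes a DataFramesMeta.jl version that supports AsTable on the right-hand side, as described above; the data frame df and the output column :total are hypothetical names chosen for the example.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2], b = [10, 20], c = [100, 200])

# Each row is passed to sum as a NamedTuple (a = ..., b = ..., c = ...).
via_macro = @transform(df, @byrow :total = sum(AsTable([:a, :b, :c])))

# The DataFrames.jl pair the macro is described as constructing.
via_pairs = transform(df, AsTable([:a, :b, :c]) => ByRow(sum) => :total)
```

Because the column list is an ordinary vector of Symbols, it can be built programmatically before being handed to AsTable.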
@transform accepts the same keyword arguments as DataFrames.transform, which can be passed in two ways. When inputs are given as multiple arguments, they are added at the end after a semicolon ;, as in
@transform(gd, :x = :a .- 1; ungroup = false)
When inputs are given in "block" format, the last lines may be written @kwarg key = value, which indicates keyword arguments to be passed to the transform function.
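The two ways of passing keyword arguments can be compared side by side. This is a sketch that assumes a DataFramesMeta.jl version providing @kwarg; the grouped data frame gd and its columns are invented for the example.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(g = [1, 1, 2], a = [1, 2, 3])
gd = groupby(df, :g)

# Keyword argument after a semicolon in the multiple-argument form.
inline = @transform(gd, :x = :a .- 1; ungroup = false)

# Keyword argument as the last line of the block form.
block = @transform gd begin
    :x = :a .- 1
    @kwarg ungroup = false
end
```

With ungroup = false, both calls are expected to return a GroupedDataFrame rather than a DataFrame.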
@with allows DataFrame column keys to be referenced as symbols.
Arguments
d : an AbstractDataFrame type
expr : the expression to evaluate in d
Details
@with works by parsing the expression body for all columns indicated by symbols (e.g. :colA). Then, a function is created that wraps the body and passes the columns as function arguments. This function is then called. Operations are efficient because:
A pseudo-anonymous function is defined, so types are stable.
Columns are passed as references, eliminating DataFrame indexing.
The following
@with(d, :a .+ :b .+ 1)
becomes
tempfun(a, b) = a .+ b .+ 1
tempfun(d[!, :a], d[!, :b])
If an expression is wrapped in ^(expr), expr gets passed through untouched. If an expression is wrapped in $(expr), the column is referenced by the variable expr rather than a symbol.
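The three ways of writing a symbol inside @with can be sketched as follows. The data frame df and the variable colname are hypothetical, and the snippet assumes a DataFramesMeta.jl version that uses $ for escaping column names.

```julia
using DataFrames, DataFramesMeta

df = DataFrame(a = [1, 2, 3], b = [10, 20, 30])
colname = :b  # hypothetical variable holding a column name

# A bare Symbol refers to the column with that name.
total_a = @with(df, sum(:a))

# $colname refers to the column named by the variable's value.
total_b = @with(df, sum($colname))

# ^(:b) is passed through as the literal Symbol :b, here used for indexing.
picked = @with(df, df[:a .> 1, ^(:b)])
```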
If the expression provided to @with begins with @byrow, the function created by the @with block is broadcast along the columns of the data frame.
Examples
julia> using DataFramesMeta
julia> y = 3;
@@ -1378,4 +1378,4 @@
2
2
6
-
Note
@with creates a function, so the scope within @with is a local scope. Variables in the parent can be read. Writing to variables in the parent scope differs depending on the type of scope of the parent. If the parent scope is a global scope, then a variable cannot be assigned without using the global keyword. If the parent scope is a local scope (inside a function or let block for example), the global keyword is not needed to assign to that parent scope.
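The scoping rule can be seen in plain Julia without any macro. In the sketch below, add_to_total! is a hypothetical stand-in for the function that @with generates, and local_parent stands in for a caller with a local scope.

```julia
total = 0

# Stand-in for the generated function when the parent scope is global.
function add_to_total!(a)
    global total  # required: total lives in global scope
    total += sum(a)
    return total
end

add_to_total!([1, 2, 3])

# When the parent scope is local, an inner function can assign directly.
function local_parent(a)
    acc = 0
    inner() = (acc += sum(a))  # no global keyword needed
    inner()
    return acc
end

local_result = local_parent([1, 2, 3])
```

Without the global declaration, the assignment inside add_to_total! would create a new local variable named total instead of updating the global one.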
Note
Using AsTable inside a @with block is currently not supported.