Removed tilde and updated dependencies #7

Merged
merged 12 commits into from
Aug 7, 2023
2 changes: 1 addition & 1 deletion .github/workflows/Documenter.yml
Expand Up @@ -21,7 +21,7 @@ jobs:
- uses: julia-actions/setup-julia@v1
- uses: julia-actions/cache@v1
with:
cache-registries: "true"
cache-registries: "false"
- name: Install documentation dependencies
run: julia --project=docs -e 'using Pkg; pkg"dev ."; Pkg.instantiate()'
- name: Build and deploy
Expand Down
7 changes: 7 additions & 0 deletions NEWS.md
@@ -0,0 +1,7 @@
# TidierCats.jl updates

## v0.1.1 - 2023-08-06
- Added the `TidierCats.jl` functions to the `TidierData.jl` list of `not_vectorized[]` functions, which means that the user does *not* need to explicitly prefix them with a `~` when used inside of a `@mutate()` within `TidierData.jl`. Thus, all the `~` prefixes have been removed from the examples.

## v0.1.0 - Initial commit
- Released to Julia general registry
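
The v0.1.1 note above can be illustrated with a minimal sketch. It assumes TidierData 0.9.2 or later and TidierCats 0.1.1 are installed, and that `TidierData` makes `DataFrame` and `@chain` available, as the updated user guide in this PR relies on. The small data frame is invented for illustration.

```julia
using TidierData   # assumed to provide DataFrame, @chain and @mutate
using TidierCats   # re-exports CategoricalArrays (categorical, levels)

# Illustrative data, not taken from the package documentation
df = DataFrame(CatVar = categorical(["Low", "High", "Medium", "Low"],
                                    levels = ["High", "Medium", "Low"]))

# No `~` prefix: cat_rev is on TidierData's not-vectorized list,
# so @mutate passes the whole column to it
reversed = @chain df begin
    @mutate(CatVar = cat_rev(CatVar))
end

println(levels(df.CatVar))
println(levels(reversed.CatVar))
```

With older TidierData releases the same call would have needed the `~cat_rev(CatVar)` form, which is what the removed lines in this PR show.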
3 changes: 1 addition & 2 deletions Project.toml
@@ -1,14 +1,13 @@
name = "TidierCats"
uuid = "79ddc9fe-4dbf-4a56-a832-df41fb326d23"
authors = ["Daniel Rizk"]
version = "0.1.0"
version = "0.1.1"

[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Reexport = "189a3867-3050-52da-a836-e630ba90ab69"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"

[compat]
CategoricalArrays = "0.10"
Expand Down
22 changes: 10 additions & 12 deletions README.md
Expand Up @@ -13,7 +13,6 @@

`TidierCats.jl` has one main goal: to bring forcats's straightforward syntax and ease of use with categorical variables to Julia users. While this package was developed to work seamlessly with `Tidier.jl` functions and macros, it can also work independently as a standalone package. This package is powered by `CategoricalArrays.jl`.


## What functions does TidierCats.jl support?

- `cat_rev()`
Expand All @@ -25,7 +24,6 @@
- `cat_lump_prop()`
- `as_categorical()`


## Installation

For the development version:
Expand Down Expand Up @@ -59,7 +57,7 @@ This function changes the order of levels in a categorical variable. It accepts

```julia
custom_order = @chain df begin
@mutate(CatVar = ~cat_relevel(CatVar, ["Zilch", "Medium", "High", "Low"]))
@mutate(CatVar = cat_relevel(CatVar, ["Zilch", "Medium", "High", "Low"]))
end

print(levels(df[!,:CatVar]))
Expand All @@ -76,7 +74,7 @@ This function reverses the order of levels in a categorical variable. It only re

```julia
reversed_order = @chain df begin
@mutate(CatVar = ~cat_rev(CatVar))
@mutate(CatVar = cat_rev(CatVar))
end

print(levels(df[!,:CatVar]))
Expand Down Expand Up @@ -109,7 +107,7 @@ end

```julia
orderedbyfrequency = @chain df begin
@mutate(CatVar = ~cat_infreq(CatVar))
@mutate(CatVar = cat_infreq(CatVar))
end

print(levels(df[!,:CatVar]))
Expand All @@ -126,7 +124,7 @@ This function lumps the least frequent levels into a new "Other" level. It accep

```julia
lumped_cats = @chain df begin
@mutate(CatVar = ~cat_lump(CatVar,2))
@mutate(CatVar = cat_lump(CatVar,2))
end

print(levels(df[!,:CatVar]))
Expand All @@ -149,11 +147,11 @@ df3 = DataFrame(
)

df4 = @chain df3 begin
@mutate(cat_var= ~cat_reorder(cat_var, order_var, "median" ))
@mutate(cat_var= cat_reorder(cat_var, order_var, "median" ))
end

@chain df3 begin
@mutate(catty = ~as_categorical(cat_var))
@mutate(catty = as_categorical(cat_var))
@group_by(cat_var)
@summarise(median = median(order_var))
end
Expand All @@ -179,7 +177,7 @@ This function collapses levels in a categorical variable according to a specifie

```julia
df5 = @chain df begin
@mutate(CatVar = ~cat_collapse(CatVar, Dict("Low" => "bad", "Zilch" => "bad")))
@mutate(CatVar = cat_collapse(CatVar, Dict("Low" => "bad", "Zilch" => "bad")))
end

@chain df begin
Expand Down Expand Up @@ -215,7 +213,7 @@ This function converts a standard Julia array to a categorical array. The only a
test = DataFrame( w = ["A", "B", "C", "D"])

@chain test begin
@mutate(w = ~as_categorical(w))
@mutate(w = as_categorical(w))
end
```

Expand All @@ -234,7 +232,7 @@ This function will lump any category with less than the minimum number of entries

```julia
lumpedbymin = @chain df begin
@mutate(CatVar = ~cat_lump_min(CatVar, 14))
@mutate(CatVar = cat_lump_min(CatVar, 14))
end

print(levels(df[!,:CatVar]))
Expand All @@ -250,7 +250,7 @@ This function will lump any category with less than the minimum proportion and rec

```julia
lumpedbyprop = @chain df begin
@mutate(CatVar = ~cat_lump_prop(CatVar, .25, "new name"))
@mutate(CatVar = cat_lump_prop(CatVar, .25, "new name"))
end

print(levels(df[!,:CatVar]))
Expand Down
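
The README states that the package also works as a standalone package, outside of `@mutate()`. A minimal sketch of that usage follows, assuming only TidierCats is loaded. The input vector is invented for illustration, and the two-argument form of `cat_relevel()` is the one shown in the README examples.

```julia
using TidierCats  # also brings in CategoricalArrays names such as `levels`

# Plain string vector, invented for illustration
raw = ["Low", "High", "Medium", "Low", "High"]

# Convert to a categorical array, then impose a custom level order
cats = as_categorical(raw)
releveled = cat_relevel(cats, ["Low", "Medium", "High"])

println(levels(cats))
println(levels(releveled))
```

Calling the functions directly like this does not involve TidierData at all, so the `~` question never arises.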
8 changes: 4 additions & 4 deletions docs/Project.toml
@@ -1,10 +1,10 @@
[deps]
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
Chain = "8be319e6-bccf-4806-a6f7-6fae938471bc"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterMarkdown = "997ab1e6-3595-5248-9280-8efb232c3433"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Tidier = "f0413319-3358-4bb0-8e7c-0c83523a93bd"
TidierData = "fe2206b3-d496-4ee9-a338-6a095c4ece80"
TidierCats = "79ddc9fe-4dbf-4a56-a832-df41fb326d23"

[compat]
TidierData = ">=0.9.2"
57 changes: 27 additions & 30 deletions docs/examples/UserGuide/supported_functions.jl
@@ -1,6 +1,5 @@
using Tidier
using TidierData
using TidierCats
using CategoricalArrays
using Random

Random.seed!(10)
Expand All @@ -9,7 +8,6 @@ categories = ["High", "Medium", "Low", "Zilch"]

random_indices = rand(1:length(categories), 57)


df = DataFrame(
ID = 1:57,
CatVar = categorical([categories[i] for i in random_indices], levels = categories)
Expand All @@ -20,27 +18,27 @@ first(df, 5)
# This function changes the order of levels in a categorical variable. It accepts two arguments - a column name and an array of levels in the desired order.

custom_order = @chain df begin
@mutate(CatVar = ~cat_relevel(CatVar, ["Zilch", "Medium", "High", "Low"]))
@mutate(CatVar = cat_relevel(CatVar, ["Zilch", "Medium", "High", "Low"]))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(custom_order[!,:CatVar]))
print(levels(custom_order.CatVar))


# ## `cat_rev()`
# This function reverses the order of levels in a categorical variable. It requires only one argument: the name of the column whose levels are to be reversed.
reversed_order = @chain df begin
@mutate(CatVar = ~cat_rev(CatVar))
@mutate(CatVar = cat_rev(CatVar))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(reversed_order[!,:CatVar]))
print(levels(reversed_order.CatVar))

# ## `cat_infreq()`
# This function reorders the levels of a categorical variable by frequency, with the most frequent level first. Its single argument is the column name.
Expand All @@ -50,14 +48,14 @@ print(levels(reversed_order[!,:CatVar]))
end

orderedbyfrequency = @chain df begin
@mutate(CatVar = ~cat_infreq(CatVar))
@mutate(CatVar = cat_infreq(CatVar))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(orderedbyfrequency[!,:CatVar]))
print(levels(orderedbyfrequency.CatVar))


@chain df begin
Expand All @@ -68,14 +66,14 @@ end
# This function lumps the least frequent levels into a new "Other" level. It accepts two arguments - a column name and an integer specifying the number of levels to keep.

lumped_cats = @chain df begin
@mutate(CatVar = ~cat_lump(CatVar,2))
@mutate(CatVar = cat_lump(CatVar,2))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(lumped_cats[!,:CatVar]))
print(levels(lumped_cats.CatVar))


@chain lumped_cats begin
Expand All @@ -91,43 +89,42 @@ df3 = DataFrame(
)

df4 = @chain df3 begin
@mutate(cat_var= ~cat_reorder(cat_var, order_var, "median" ))
@mutate(cat_var= cat_reorder(cat_var, order_var, "median" ))
end


print(levels(df3[!,:cat_var]))
print(levels(df3.cat_var))

# and

print(levels(df4[!,:cat_var]))
print(levels(df4.cat_var))


@chain df3 begin
@mutate(catty = ~as_categorical(cat_var))
@mutate(catty = as_categorical(cat_var))
@group_by(catty)
#@summarise(median = median(order_var))
end

# ## `cat_collapse()`
# This function collapses levels in a categorical variable according to a specified mapping. It requires two arguments - a categorical column and a dictionary that maps original levels to new ones.

df5 = @chain df begin
@mutate(CatVar = ~cat_collapse(CatVar, Dict("Low" => "bad", "Zilch" => "bad")))
@mutate(CatVar = cat_collapse(CatVar, Dict("Low" => "bad", "Zilch" => "bad")))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(df5[!,:CatVar]))
print(levels(df5.CatVar))

# ## `as_categorical()`
# This function converts a standard Julia array to a categorical array. The only argument it needs is the column name to be converted.

test = DataFrame( w = ["A", "B", "C", "D"])

@chain test begin
@mutate(w = ~as_categorical(w))
@mutate(w = as_categorical(w))
end

# ## `cat_lump_min()`
Expand All @@ -137,28 +134,28 @@ end
@count(CatVar)
end
lumpedbymin = @chain df begin
@mutate(CatVar = ~cat_lump_min(CatVar, 14))
@mutate(CatVar = cat_lump_min(CatVar, 14))
end

print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(lumpedbymin[!,:CatVar]))
print(levels(lumpedbymin.CatVar))

# ## `cat_lump_prop()`
# This function will lump any category with less than the minimum proportion and recategorize it as "Other" by default, or under a category name chosen by the user.

lumpedbyprop = @chain df begin
@mutate(CatVar = ~cat_lump_prop(CatVar, .25, "wow"))
@mutate(CatVar = cat_lump_prop(CatVar, .25, "wow"))
end


print(levels(df[!,:CatVar]))
print(levels(df.CatVar))

# and

print(levels(lumpedbyprop[!,:CatVar]))
print(levels(lumpedbyprop.CatVar))


# ## `cat_na_value_to_level()`
Expand Down
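
Throughout this file the PR swaps `df[!,:CatVar]` for `df.CatVar`. A short sketch of why the two are interchangeable in these examples, using only DataFrames.jl and an invented data frame:

```julia
using DataFrames

# Illustrative data frame
df = DataFrame(ID = 1:3, CatVar = ["High", "Low", "Medium"])

# Both forms return the stored column itself, without copying
same_object = df.CatVar === df[!, :CatVar]
println(same_object)

# By contrast, indexing rows with `:` returns a copy of the column
is_copy = df[:, :CatVar] !== df.CatVar
println(is_copy)
```

Because both forms return the same object, the change is purely stylistic and does not alter what `levels()` reports.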
6 changes: 3 additions & 3 deletions docs/make.jl
@@ -1,10 +1,10 @@
using Documenter, DocumenterMarkdown
using Tidier, TidierCats
using CategoricalArrays
using TidierCats

DocTestMeta = quote
using Tidier, TidierCats, DataFrames, Chain, Statistics
using TidierData, TidierCats, Statistics
end

DocMeta.setdocmeta!(TidierCats,
:DocTestSetup,
DocTestMeta;
Expand Down
25 changes: 13 additions & 12 deletions docs/src/index.md
@@ -1,18 +1,19 @@
<img src="assets/TidierCats\_logo.png" align="left" style="padding-right:10px"; width="150"></img>

## TidierCats
## TidierCats.jl

The goal of this package is to bring the convenience and simple usability of Forcats in R to Julia. This package was designed to work with Tidier.jl, but can also work independently.
The goal of this package is to bring the convenience and simple usability of `forcats` in R to Julia. This package was designed to work with `Tidier.jl` but can also work independently.

This package re-exports `CategoricalArrays.jl`.

This package includes:
In addition, this package includes:

- `cat_rev`
- `cat_relevel`
- `cat_infreq`
- `cat_lump`
- `cat_reorder`
- `cat_collapse`
- `cat_lump_min`
- `cat_lump_prop`
- `as_categorical`
- `cat_rev()`
- `cat_relevel()`
- `cat_infreq()`
- `cat_lump()`
- `cat_reorder()`
- `cat_collapse()`
- `cat_lump_min()`
- `cat_lump_prop()`
- `as_categorical()`
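
Since the page says the package re-exports `CategoricalArrays.jl`, names such as `categorical` and `levels` should be usable after loading TidierCats alone. A minimal sketch under that assumption, with an invented input vector:

```julia
using TidierCats  # no explicit `using CategoricalArrays`

# `categorical`, `levels` and `CategoricalArray` come from the re-export
x = categorical(["b", "a", "b"])

println(levels(x))
println(x isa CategoricalArray)
```

This is why the PR can drop `using CategoricalArrays` from the user guide and from `docs/make.jl`.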
3 changes: 2 additions & 1 deletion docs/src/reference.md
@@ -1,5 +1,6 @@
```@meta
DocTestSetup= quote
DocTestSetup = quote
using TidierData
using TidierCats
end
```
Expand Down
7 changes: 0 additions & 7 deletions src/TidierCats.jl
Expand Up @@ -10,13 +10,6 @@ using Reexport
export cat_rev, cat_relevel, cat_infreq, cat_lump, cat_reorder, cat_collapse, cat_lump_min, cat_lump_prop, as_categorical
include("catsdocstrings.jl")

function __init__()
try
append!(Main.TidierData.not_vectorized[], [:cat_rev, :cat_relevel, :cat_infreq, :cat_lump, :cat_reorder, :cat_collapse, :cat_lump_min, :cat_lump_prop, :as_categorical])
catch
end
end

"""
$docstring_cat_rev
"""
Expand Down
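
The removed `__init__()` appended the package's function names to `Main.TidierData.not_vectorized[]` and ignored any error. The sketch below mimics that pattern in plain Julia. It is not TidierData's code: `not_vectorized` here is a hypothetical stand-in (a `Ref` wrapping a `Vector{Symbol}`, inferred from the `[]` dereference in the removed lines), and `register_not_vectorized!` is an invented helper name.

```julia
# Hypothetical stand-in for TidierData's registry of non-vectorized functions
not_vectorized = Ref(Symbol[:mean, :median])

# Hypothetical helper mirroring the removed __init__ logic
function register_not_vectorized!(registry::Ref{Vector{Symbol}}, fns::Vector{Symbol})
    try
        # Mutates the vector held by the Ref in place
        append!(registry[], fns)
    catch
        # Swallow failures, as the removed code did when the registry was unreachable
    end
    return registry
end

register_not_vectorized!(not_vectorized, [:cat_rev, :cat_relevel, :as_categorical])
println(not_vectorized[])
```

Per the NEWS entry, the names now live in TidierData's own list, so the runtime registration and its dependence on `Main` are no longer needed.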