Use doctests throughout documentation, and add missing functions to docs
cjprybol committed Nov 14, 2017
1 parent 8fd0851 commit 202c42c
Showing 19 changed files with 1,125 additions and 333 deletions.
11 changes: 3 additions & 8 deletions docs/make.jl
@@ -19,16 +19,11 @@ makedocs(
  "Reshaping" => "man/reshaping_and_pivoting.md",
  "Sorting" => "man/sorting.md",
  "Categorical Data" => "man/categorical.md",
- "Querying frameworks" => "man/querying_frameworks.md",
+ "Querying frameworks" => "man/querying_frameworks.md"
  ],
  "API" => Any[
- "Main types" => "lib/maintypes.md",
- "Utilities" => "lib/utilities.md",
- "Data manipulation" => "lib/manipulation.md",
- ],
- "About" => Any[
- "Release Notes" => "NEWS.md",
- "License" => "LICENSE.md",
+ "Types" => "lib/types.md",
+ "Functions" => "lib/functions.md"
  ]
  ]
  )
23 changes: 0 additions & 23 deletions docs/src/LICENSE.md

This file was deleted.

Empty file removed docs/src/NEWS.md
Empty file.
29 changes: 24 additions & 5 deletions docs/src/index.md
@@ -1,21 +1,40 @@
# DataFrames Documentation Outline
# DataFrames.jl

Welcome to the DataFrames documentation! This resource aims to teach you everything you need
to know to get up and running with tabular data manipulation using the DataFrames.jl package
and the Julia language. If there is something you expect DataFrames to be capable of, but
cannot figure out how to do, please reach out with questions in Domains/Data on
[Discourse](https://discourse.julialang.org/new-topic?title=[DataFrames%20Question]:%20&body=%23%20Question:%0A%0A%23%20Dataset%20(if%20applicable):%0A%0A%23%20Minimal%20Working%20Example%20(if%20applicable):%0A&category=Domains/Data&tags=question).
Please report bugs by
[opening an issue](https://github.com/JuliaData/DataFrames.jl/issues/new). You can follow
the [**source**]() links throughout the documentation to jump right to the
source files on GitHub to make pull requests for improving the documentation and function
capabilities. Please review
[DataFrames contributing guidelines](https://github.com/JuliaData/DataFrames.jl/blob/master/CONTRIBUTING.md)
before submitting your first PR! Information on specific versions can be found on the [Release page](https://github.com/JuliaData/DataFrames.jl/releases).

## Package Manual

```@contents
Pages = ["man/getting_started.md", "man/joins.md", "man/split_apply_combine.md", "man/reshaping_and_pivoting.md", "man/sorting.md", "man/categorical.md", "man/querying_frameworks.md"]
Pages = ["man/getting_started.md",
"man/joins.md",
"man/split_apply_combine.md",
"man/reshaping_and_pivoting.md",
"man/sorting.md",
"man/categorical.md",
"man/querying_frameworks.md"]
Depth = 2
```

## API

```@contents
Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
Pages = ["lib/types.md", "lib/functions.md"]
Depth = 2
```

## Documentation Index
## Index

```@index
Pages = ["lib/maintypes.md", "lib/manipulation.md", "lib/utilities.md"]
Pages = ["lib/types.md", "lib/functions.md"]
```
54 changes: 54 additions & 0 deletions docs/src/lib/functions.md
@@ -0,0 +1,54 @@
```@meta
CurrentModule = DataFrames
```

# Functions

```@index
Pages = ["functions.md"]
```

## Grouping, Joining, and Split-Apply-Combine

```@docs
aggregate
by
colwise
groupby
join
melt
stack
unstack
stackdf
meltdf
```

## Basics

```@docs
categorical!
combine
completecases
deleterows!
describe
dropnull
dropnull!
eachcol
eachrow
eltypes
head
names
names!
nonunique
nullable!
order
rename!
rename
show
showcols
size
sort
sort!
tail
unique!
```
25 changes: 0 additions & 25 deletions docs/src/lib/manipulation.md

This file was deleted.

7 changes: 5 additions & 2 deletions docs/src/lib/maintypes.md → docs/src/lib/types.md
@@ -3,14 +3,17 @@
CurrentModule = DataFrames
```

# Main Types
# Types

```@index
Pages = ["maintypes.md"]
Pages = ["types.md"]
```

```@docs
AbstractDataFrame
DataFrame
DataFrameRow
GroupApplied
GroupedDataFrame
SubDataFrame
```
26 changes: 0 additions & 26 deletions docs/src/lib/utilities.md

This file was deleted.

145 changes: 122 additions & 23 deletions docs/src/man/categorical.md
@@ -2,52 +2,151 @@

Often, we have to deal with factors that take on a small number of levels:

```julia
v = ["Group A", "Group A", "Group A",
"Group B", "Group B", "Group B"]
```jldoctest categorical
julia> v = ["Group A", "Group A", "Group A", "Group B", "Group B", "Group B"]
6-element Array{String,1}:
"Group A"
"Group A"
"Group A"
"Group B"
"Group B"
"Group B"
```

The naive encoding used in an `Array` represents every entry of this vector as a full string. In contrast, we can represent the data more efficiently by replacing the strings with indices into a small pool of levels. This is what the `CategoricalArray` type does:

```julia
cv = CategoricalArray(["Group A", "Group A", "Group A",
"Group B", "Group B", "Group B"])
```jldoctest categorical
julia> using CategoricalArrays
julia> cv = CategoricalArray(v)
6-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
"Group A"
"Group A"
"Group A"
"Group B"
"Group B"
"Group B"
```
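
The pooled encoding described above can be sketched in plain Julia. This is only an illustration of the idea, not how `CategoricalArrays` is implemented: `encode_pool` is a hypothetical helper, and the choice of `UInt32` references simply mirrors the type parameter shown in the doctest output.

```julia
# Hypothetical helper, not part of CategoricalArrays: encode a vector of
# strings as integer references into a pool of unique levels.
function encode_pool(values::Vector{String})
    pool = unique(values)   # levels, in order of first appearance
    index = Dict(level => UInt32(i) for (i, level) in enumerate(pool))
    refs = [index[v] for v in values]   # one small integer per entry
    return pool, refs
end

v = ["Group A", "Group A", "Group A", "Group B", "Group B", "Group B"]
pool, refs = encode_pool(v)

# Decoding maps each reference back to its level.
decoded = [pool[r] for r in refs]
```

Each distinct string is stored once in `pool`, and every entry of the original vector becomes a fixed-size integer in `refs`.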

`CategoricalArrays` support missing values via the `Nulls` package.

```julia
using Nulls
cv = CategoricalArray(["Group A", null, "Group A",
"Group B", "Group B", null])
```jldoctest categorical
julia> using Nulls
julia> cv = CategoricalArray(["Group A", null, "Group A",
"Group B", "Group B", null])
6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt32}:
"Group A"
null
"Group A"
"Group B"
"Group B"
null
```

In addition to representing repeated data efficiently, the `CategoricalArray` type allows us to determine efficiently the allowed levels of the variable at any time using the `levels` function (note that levels may or may not be actually used in the data):

```julia
levels(cv)
```jldoctest categorical
julia> levels(cv)
2-element Array{String,1}:
"Group A"
"Group B"
```

The `levels!` function also allows changing the order of appearance of the levels, which can be useful for display purposes or when working with ordered variables.

By default, a `CategoricalArray` is able to represent 2<sup>32</sup> different levels. You can use less memory by calling the `compact` function:
```jldoctest categorical
julia> levels!(cv, ["Group B", "Group A"]);
julia> levels(cv)
2-element Array{String,1}:
"Group B"
"Group A"
julia> sort(cv)
6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt32}:
"Group B"
"Group B"
"Group A"
"Group A"
null
null
```julia
cv = compact(cv)
```

Often, you will have factors encoded inside a DataFrame with `Array` columns instead of `CategoricalArray` columns. You can do conversion of a single column using the `categorical` function:
By default, a `CategoricalArray` is able to represent 2<sup>32</sup> different levels. You can use less memory by calling the `compress` function:

```jldoctest categorical
julia> cv = compress(cv)
6-element CategoricalArrays.CategoricalArray{Union{Nulls.Null, String},1,UInt8}:
"Group A"
null
"Group A"
"Group B"
"Group B"
null
```julia
cv = categorical(v)
```
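
As a rough illustration of why a narrower reference type saves memory, the sketch below counts the bytes needed for one integer reference per entry. It assumes the references dominate storage and ignores the level pool and any array overhead; `reference_bytes` is a hypothetical helper, not a `CategoricalArrays` function.

```julia
# Illustrative only: bytes needed to store one integer reference per entry.
# This ignores the level pool itself and any array overhead.
reference_bytes(::Type{T}, n::Integer) where {T<:Unsigned} = n * sizeof(T)

n = 1_000_000
bytes_default = reference_bytes(UInt32, n)   # 4 bytes per entry
bytes_compact = reference_bytes(UInt8, n)    # 1 byte per entry
```

Under these assumptions, moving from `UInt32` to `UInt8` references cuts reference storage to a quarter.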

Or you can edit the columns of a `DataFrame` in-place using the `categorical!` function:
Often, you will have factors encoded inside a DataFrame with `Array` columns instead of
`CategoricalArray` columns. You can convert one or more columns of the DataFrame using the
`categorical!` function, which modifies the input DataFrame in-place.

```jldoctest categorical
julia> using DataFrames
julia> df = DataFrame(A = ["A", "B", "C", "D", "D", "A"],
B = ["X", "X", "X", "Y", "Y", "Y"])
6×2 DataFrames.DataFrame
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ A │ X │
│ 2 │ B │ X │
│ 3 │ C │ X │
│ 4 │ D │ Y │
│ 5 │ D │ Y │
│ 6 │ A │ Y │
julia> eltypes(df)
2-element Array{Type,1}:
String
String
julia> categorical!(df, :A) # change the column `:A` to be categorical
6×2 DataFrames.DataFrame
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ A │ X │
│ 2 │ B │ X │
│ 3 │ C │ X │
│ 4 │ D │ Y │
│ 5 │ D │ Y │
│ 6 │ A │ Y │
julia> eltypes(df)
2-element Array{Type,1}:
CategoricalArrays.CategoricalString{UInt32}
String
julia> categorical!(df) # change all columns to be categorical
6×2 DataFrames.DataFrame
│ Row │ A │ B │
├─────┼───┼───┤
│ 1 │ A │ X │
│ 2 │ B │ X │
│ 3 │ C │ X │
│ 4 │ D │ Y │
│ 5 │ D │ Y │
│ 6 │ A │ Y │
julia> eltypes(df)
2-element Array{Type,1}:
CategoricalArrays.CategoricalString{UInt32}
CategoricalArrays.CategoricalString{UInt32}
```julia
df = DataFrame(A = [1, 1, 1, 2, 2, 2],
B = ["X", "X", "X", "Y", "Y", "Y"])
categorical!(df, [:A, :B])
```

Using categorical arrays is important for working with the [GLM package](https://github.com/JuliaStats/GLM.jl). When fitting regression models, `CategoricalArray` columns in the input are translated into 0/1 indicator columns in the `ModelMatrix` with one column for each of the levels of the `CategoricalArray`. This allows one to analyze categorical data efficiently.
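
The 0/1 indicator encoding mentioned above can be sketched in plain Julia. This is a minimal illustration, not the code GLM uses to build a `ModelMatrix`: `indicator_columns` is a hypothetical helper, and an actual model matrix may differ (for example, a contrast coding may omit one reference level).

```julia
# Hypothetical helper, not GLM's implementation: build a matrix with one row
# per observation and one 0/1 column per level.
function indicator_columns(values::Vector{String}, levels::Vector{String})
    return [v == l ? 1 : 0 for v in values, l in levels]
end

x = ["X", "X", "Y", "Z"]
m = indicator_columns(x, ["X", "Y", "Z"])
```

Each row of `m` contains a single 1, in the column of the level that observation takes.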