-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathREADME.jmd
127 lines (95 loc) · 4.04 KB
/
README.jmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
---
title : SortingLab README
author : Dai ZJ
date: 2024--09-21
weave_options:
out_path : README.MD
doctype : github
---
# SortingLab
An alternative implementation of sorting algorithms and APIs. The ultimate aim is to contribute back to Julia base or SortingAlgorithms.jl. However, there is commitment to keep this package's API stable and supported, so other developers can rely on the implementation and API here.
# Faster Sort and Sortperm
The main function exported by SortingLab is `fsort` and `fsortperm` which generally implements faster algorithms than `sort` and `sortperm` for `CategoricalArrays.CategoricalVector`, `Vector{T}`, `Vector{Union{String, Missing}}` where `T` is
**Update Sep'2024**: SortingLab.jl used to be faster than base on integer sorting which is no longer the case! Well done base!
**Note**: The reason why we restrict the type to `Vector` is that SortingLab.jl assumes something about memory layout and hence `Vector` provides that guarantee in the types supported.
## Usage
```julia
using SortingLab;
using Test
N = 1_000_000;
K = 100;
svec = rand("id".*string.(1:N÷K, pad=10), N);
svec_sorted = fsort(svec);
@test issorted(svec_sorted)
@test issorted(svec) == false
```
```julia
# faster string sortperm
sorted_idx = fsortperm(svec)
issorted(svec[sorted_idx]) #true
# in place string sort
fsort!(svec);
issorted(svec) # true
```
```julia
# CategoricalArray sort
using CategoricalArrays
pools = "id".*string.(1:100,3);
byvec = CategoricalArray{String, 1}(rand(UInt32(1):UInt32(length(pools)), N), CategoricalPool(pools, false));
byvec = compress(byvec);
byvec_sorted = fsort(byvec);
@test issorted(byvec_sorted)
```
### Sorting `Vector{Union{T, Missing}}`
For vectors that contain `missing`, the `sort` and `sortperm` performance is often sub-optimal in `Base` and is not supported in `SortingAlgorithms.jl`'s radixsort implementation. This is solved by `SortingLab.jl` `fsort`, see Benchmarks Section
```julia
using Test
using Missings: allowmissing
x = allowmissing(rand(1:10_000, 1_000_000))
x[rand(1:length(x), 100_000)] .= missing
using SortingLab
@test isequal(fsort(x), sort(x))
```
## Benchmarks
![Base.sort vs SortingLab.radixsort](benchmarks/sort_vs_radixsort.png)
#![Base.sort vs SortingLab.radixsort](benchmarks/sortperm_vs_fsortperm.png)
## Benchmarking code
```julia
using SortingLab;
using BenchmarkTools;
import Random: randstring
using Test
using Missings: allowmissing
using Plots, StatsPlots
N = 1_000_000;
K = 100;
# String Sort
svec = rand("id".*string.(1:N÷K, pad=10), N);
sort_id_1m = @belapsed sort($svec);
radixsort_id_1m = @belapsed radixsort($svec);
sortperm_id_1m = @belapsed sortperm($svec);
fsortperm_id_1m = @belapsed fsortperm($svec);
rsvec = rand([randstring(rand(1:32)) for i = 1:N÷K], N);
sort_r_1m = @belapsed sort($rsvec);
radixsort_r_1m = @belapsed radixsort($rsvec);
sortperm_r_1m = @belapsed sortperm($rsvec);
fsortperm_r_1m = @belapsed fsortperm($rsvec);
groupedbar(
repeat(["IDs", "Random len 32"], inner=2),
[sort_id_1m, radixsort_id_1m, sort_r_1m, radixsort_r_1m],
group = repeat(["Base.sort","SortingLab.radixsort"], outer = 2),
title = "Strings sort (1m rows): Base vs SortingLab")
savefig("benchmarks/sort_vs_radixsort.png")
groupedbar(
repeat(["IDs", "Random len 32"], inner=2),
[sortperm_id_1m, fsortperm_id_1m, sortperm_r_1m, fsortperm_r_1m],
group = repeat(["Base.sortperm","SortingLab.fsortperm"], outer = 2),
title = "Strings sortperm (1m rows): Base vs SortingLab")
savefig("benchmarks/sortperm_vs_fsortperm.png")
```
# Similar package
https://github.com/JuliaCollections/SortingAlgorithms.jl
# Build status
[![Build Status](https://travis-ci.org/xiaodaigh/SortingLab.jl.svg?branch=master)](https://travis-ci.org/xiaodaigh/SortingLab.jl)
[![Coverage Status](https://coveralls.io/repos/xiaodaigh/SortingLab.jl/badge.svg?branch=master&service=github)](https://coveralls.io/github/xiaodaigh/SortingLab.jl?branch=master)
[![codecov.io](http://codecov.io/github/xiaodaigh/SortingLab.jl/coverage.svg?branch=master)](http://codecov.io/github/xiaodaigh/SortingLab.jl?branch=master)