Add more descriptive docs + some experiments #108

Merged
merged 23 commits into from
Apr 23, 2024
Commits
5cfdb56
Add more descriptive docs + some experiments
asinghvi17 Apr 14, 2024
467db0c
Update docs project to include experiment packages
asinghvi17 Apr 14, 2024
7f1b504
Add more benchmark files (still raw)
asinghvi17 Apr 14, 2024
4117333
Update apply return type docs
asinghvi17 Apr 16, 2024
0a4e66f
Update docs/src/paradigms.md
asinghvi17 Apr 16, 2024
a75640b
Update docs/src/paradigms.md
asinghvi17 Apr 16, 2024
6b17cf8
Update docs/src/peculiarities.md
asinghvi17 Apr 19, 2024
c19c315
Add code for `orient` demo
asinghvi17 Apr 20, 2024
8862541
Add examples from issue
asinghvi17 Apr 20, 2024
071e462
Add a true summary figure to the docs
asinghvi17 Apr 20, 2024
d849abb
Import the relevant Chairmarks/BenchmarkTools functions
asinghvi17 Apr 20, 2024
d0767fc
Update Project.toml
asinghvi17 Apr 20, 2024
3bb2606
Write the GeometryOps HackMD call notes to the docs
asinghvi17 Apr 20, 2024
abd70ef
Add MultiFloats
asinghvi17 Apr 20, 2024
76570f7
Add NaturalEarth.jl devbranch when building docs
asinghvi17 Apr 20, 2024
64d83db
make Julia actually execute the code
asinghvi17 Apr 20, 2024
4000fac
Merge branch 'main' into as/docs
asinghvi17 Apr 21, 2024
edc5840
Add Statistics, fix namespacing error
asinghvi17 Apr 21, 2024
4716965
`geometry_providers.jl`: Remove redundancy, add comments
asinghvi17 Apr 21, 2024
76a93d6
`vector_benchmark_plot.jl`: add a comment on top
asinghvi17 Apr 21, 2024
44b91cd
rearrange file
asinghvi17 Apr 21, 2024
9cb714d
Add warning that BoolsAsTypes are not public API
asinghvi17 Apr 21, 2024
f19b574
Merge branch 'main' into as/docs
asinghvi17 Apr 23, 2024
2 changes: 2 additions & 0 deletions .github/workflows/CI.yml
@@ -48,6 +48,8 @@ jobs:
      - uses: julia-actions/setup-julia@v1
        with:
          version: '1'
      - name: Add custom versions of packages
        run: julia --project=docs -e 'using Pkg; Pkg.add(PackageSpec(; url = "https://github.com/JuliaGeo/NaturalEarth.jl", rev = "as/scratchspaces"))'
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-docdeploy@v1
        env:
182 changes: 182 additions & 0 deletions benchmarks/geometry_providers.jl
@@ -0,0 +1,182 @@
#=
# Geometry providers

This file benchmarks GeometryOps methods on every GeoInterface.jl implementation we can find, in order to test:
a. genericness, i.e., does GeometryOps work correctly with all GeoInterface.jl implementations?
b. performance, i.e., how does GeometryOps compare to the native implementation?
c. performance issues in the packages' implementations of GeoInterface
=#

# First, we import GeoInterface and GeometryOps under their usual aliases
# (the `GI` alias must exist before `PROVIDERS` refers to `GI.Wrappers`),
import GeometryOps as GO, GeoInterface as GI
# then the providers:
using ArchGDAL, LibGEOS, Shapefile, GeoJSON, WellKnownGeometry, GeometryBasics, GeoInterface, GeoFormatTypes
PROVIDERS = (ArchGDAL, LibGEOS, GeometryBasics, GI.Wrappers)
# Finally, we import some utility benchmarking, plotting and data munging packages!
using BenchmarkTools, Chairmarks, CairoMakie, MakieThemes, DataFrames, Proj
using CoordinateTransformations, Rotations
import Statistics # `Statistics.mean` and `Statistics.std` are used in the plotting section


# Polylabel.jl is a package that finds the "pole of inaccessibility" of a polygon,
# i.e., the point within it that is furthest away from its boundaries.

# It depends on GeometryOps, but in this instance, we'll grab some of its test geometries
# to use.
import Polylabel

# TODO: we convert to LibGEOS as an intermediate step here so that the
# linear rings of the WKG polygons are interpreted correctly. Unfortunately,
# that doesn't work when they are read directly; there is an issue open for this.
water1 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water1.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
water2 = GeoFormatTypes.WellKnownText(GeoFormatTypes.Geom(), readchomp(joinpath(dirname(dirname(pathof(Polylabel))), "test", "data", "water2.wkt")) |> String) |> x -> GI.convert(LibGEOS, x) |> GO.tuples
# To fix these polygons is a complicated task, and even then LibGEOS gets it wrong:
# water1 |> x -> LibGEOS.makeValid(GI.convert(LibGEOS, x)) |> GI.getgeom |> collect |> x -> filter(y -> GI.trait(y) isa Union{GI.PolygonTrait, GI.MultiPolygonTrait}, x) |> first |> GO.tuples # hide

f, a, p = poly(water1; axis = (; title = "water1")); poly(f[1, 2], water2; axis = (; title = "water2")); f
# Now, we rotate the `water1` polygon about its centroid, so we can use it to
# test the time it takes to intersect complex polygons:
water1r = GO.transform(
Translation(GO.centroid(water1)) ∘ LinearMap(Makie.rotmatrix2d(π/2)) ∘ Translation((-).(GO.centroid(water1))),
water1
)
f, a, p = poly(water1; label = "Original")
poly!(water1r; label = "Rotated")
axislegend(a)
f
# WARNING: does not work
@b GO.union($(water1), $(water1r); target = GI.PolygonTrait()) seconds=3
@b LibGEOS.union($(GI.convert(LibGEOS, water1)), $(GI.convert(LibGEOS, water1r))) seconds=3
@b ArchGDAL.union($(GI.convert(ArchGDAL, water1)), $(GI.convert(ArchGDAL, water1r))) seconds=3

# Plot the union, assuming the same inputs as the `GO.union` benchmark above.
poly(GO.union(water1, water1r; target = GI.PolygonTrait()))

GI.getgeom(water1, 3) |> GI.trait

# We can benchmark each provider and see if any of them have glaring issues.

water1_centroid_suite = BenchmarkGroup()

for provider in PROVIDERS
@info "Benchmarking $provider"
geom = GI.convert(provider, water1)
water1_centroid_suite[string(provider)] = @be GO.centroid($geom) seconds=3
end


# ## Tables.jl performance in `apply`
#=
This code checks how Tables.jl performs when using `apply`.
We use two sources for this: `Shapefile.jl` and `DataFrames.jl`.
More will be coming in the future!
=#
shp_file = "/Users/anshul/Downloads/ne_10m_admin_0_countries (1)/ne_10m_admin_0_countries.shp"
table = Shapefile.Table(shp_file)
go_df = DataFrame(table)
go_df.geometry = GO.tuples(go_df.geometry);

table_suite = BenchmarkGroup()


ll2moll = Proj.Transformation("+proj=longlat +datum=WGS84", "+proj=moll")

# First, we try reprojecting the geometries using Proj,
reproject_suite = table_suite["reproject"] = BenchmarkGroup(["title:Reproject", "subtitle:All country borders from Natural Earth, 1:10m res."])

reproject_suite["Shapefile.Table"] = @be GO.reproject($table, $ll2moll) seconds=3
reproject_suite["DataFrame (Shapefile)"] = @be GO.reproject($(DataFrame(table)), $ll2moll) seconds=3
reproject_suite["DataFrame (GO)"] = @be GO.reproject($(go_df), $ll2moll) seconds=3
reproject_suite["Shapefile geoms"] = @be GO.reproject($(table.geometry), $ll2moll) seconds=3
reproject_suite["GeometryOps geoms"] = @be GO.reproject($(GO.tuples(table.geometry)), $ll2moll) seconds=3

# then transforming, just to see the difference in runtime
# between calling out to C vs pure Julia,
function _scaleby5(x)
return x .* 5
end

transform_suite = table_suite["transform"] = BenchmarkGroup(["title:Transform", "subtitle:All country borders from Natural Earth, 1:10m res."])
transform_suite["Shapefile.Table"] = @be GO.transform($_scaleby5, $table) seconds=3
transform_suite["DataFrame (Shapefile)"] = @be GO.transform($_scaleby5, $(DataFrame(table))) seconds=3
transform_suite["DataFrame (GO)"] = @be GO.transform($_scaleby5, $(go_df)) seconds=3
transform_suite["Shapefile geoms"] = @be GO.transform($_scaleby5, $(table.geometry)) seconds=3
transform_suite["GeometryOps geoms"] = @be GO.transform($_scaleby5, $(GO.tuples(table.geometry))) seconds=3

# and finally, calling `applyreduce` to find the area of each
# polygon.
area_suite = table_suite["area"] = BenchmarkGroup(["title:Area", "subtitle:All country borders from Natural Earth, 1:10m res."])

area_suite["Shapefile.Table"] = @be GO.area($(table)) seconds=3
area_suite["DataFrame (Shapefile)"] = @be GO.area($(DataFrame(table))) seconds=3
area_suite["DataFrame (GO)"] = @be GO.area($(go_df)) seconds=3
area_suite["Shapefile geoms"] = @be GO.area($(table.geometry)) seconds=3
area_suite["GeometryOps geoms"] = @be GO.area($(GO.tuples(table.geometry))) seconds=3

ts = getproperty.(area_suite["Shapefile.Table"].samples, :time)
boxplot(ones(length(ts)), ts)
violin(ones(length(ts)), ts; npoints = 3500, axis = (; yscale = log10,))


# ## Plotting
function Makie.convert_arguments(::Makie.PointBased, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
ts = getproperty.(Statistics.mean.(bs), :time)
return (xs, ts)
end

function Makie.convert_arguments(::Makie.PointBased, bs::AbstractVector{<: Chairmarks.Benchmark})
ts = getproperty.(Statistics.mean.(bs), :time)
return (1:length(bs), ts)
end

function Makie.convert_arguments(::Makie.SampleBased, b::Chairmarks.Benchmark)
ts = getproperty.(b.samples, :time)
return (ones(length(ts)), ts)
end

function Makie.convert_arguments(::Makie.SampleBased, n::Number, b::Chairmarks.Benchmark)
ts = getproperty.(b.samples, :time)
return (fill(n, length(ts)), ts)
end

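function Makie.convert_arguments(::Makie.SampleBased, labels::AbstractVector{<: AbstractString}, bs::AbstractVector{<: Chairmarks.Benchmark})
    ts = map(b -> getproperty.(b.samples, :time), bs)
    # Repeat each label once per sample, so labels and timings line up one-to-one.
    flat_labels = reduce(vcat, [fill(l, length(t)) for (l, t) in zip(labels, ts)])
    return (flat_labels, reduce(vcat, ts))
end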

function Makie.convert_arguments(::Type{Makie.Errorbars}, xs, bs::AbstractVector{<: Chairmarks.Benchmark})
    ts = map(b -> getproperty.(b.samples, :time), bs)
    means = map(Statistics.mean, ts)
    stds = map(Statistics.std, ts)
    # Errorbars takes (x, y, error): centre each bar on the mean, with one standard deviation either way.
    return (xs, means, stds)
end

ks = keys(area_suite) |> collect .|> identity

bs = getindex.((area_suite,), ks)
b_lengths = length.(getproperty.(bs, :samples))
b_timing_flattened = collect(Iterators.flatten(Iterators.map(b -> getproperty.(b.samples, :time), bs)))
k_strings = Iterators.flatten((fill(k, bl) for (k, bl) in zip(ks, b_lengths))) |> collect

f = Figure()
ax = Axis(f[1, 1];
convert_dim_1=Makie.CategoricalConversion(; sortby=nothing),
)
violin!(ax, k_strings, b_timing_flattened .|> log10)
f
ax.yscale = log10
ax.xticklabelrotation = π/12
f


bs = values(area_suite) |> collect .|> identity
labels = ["ST", "DS", "DG", "SG", "GG"]


using AlgebraOfGraphics

b1 = first(bs) # any single benchmark works here
boxplot(b1)
boxplot!.(1:5, values(area_suite) |> collect .|> identity)
Makie.current_figure()
Makie.current_axis().yscale = log10

data((; x = labels, y = bs)) * mapping(:y => verbatim, :x, :y) * visual(BoxPlot) |> draw
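The violin plot in `geometry_providers.jl` needs one label per timing sample, which is why the file builds `k_strings` alongside `b_timing_flattened`. A minimal sketch of that long-format flattening, using plain vectors in place of Chairmarks results (the `samples_by_label` data and its numbers are made up for illustration):

```julia
# Hypothetical stand-in data: label => sample times in seconds.
# In the benchmark file, these times would come from each benchmark's `samples`.
samples_by_label = [
    "Shapefile.Table" => [0.012, 0.011, 0.013],
    "DataFrame (GO)"  => [0.004, 0.005],
]

# Repeat each label once per sample, so labels and timings line up one-to-one.
labels = reduce(vcat, [fill(label, length(ts)) for (label, ts) in samples_by_label])

# Concatenate all sample vectors in the same order as the labels.
timings = reduce(vcat, last.(samples_by_label))

@assert length(labels) == length(timings)
```

The two vectors can then serve as the categorical x and numeric y of a violin or box plot.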
125 changes: 125 additions & 0 deletions benchmarks/vector_benchmark_plot.jl
@@ -0,0 +1,125 @@
#=
# `vector-benchmark` result plot

This code plots the results of the `kadyb/vector-benchmark` repository.
For now, it needs the MakieTeX SVG PR.

The unique feature (and the reason for so many lines of code) is that
the scatter markers for each language are SVGs of its logo! This
makes the plot eye-catching and lets readers quickly compare performance
across languages.

Stepwise, here's what is going on:
1. It loads the benchmark data from a CSV file into a DataFrame.
2. It defines color and marker mappings for each package, where the markers are SVG logos of the respective programming languages.
3. It uses the beeswarm function from the SwarmMakie package to create a scatter plot, where the x-axis represents the different benchmark tasks, and the y-axis represents the median execution time (in seconds) on a log scale.
4. The scatter points are colored and marked according to the package and programming language, using the predefined color and marker mappings.
5. It adds a legend to the plot, displaying the package names and their corresponding language logos.

=#

using CairoMakie, MakieTeX, SwarmMakie

using CSV, DataFrames, CategoricalArrays
using DataToolkit

path_to_makietex_datatoml = joinpath(dirname(dirname(@__DIR__)), "MakieTeX", "docs", "Data.toml")
data = DataToolkit.load(path_to_makietex_datatoml)


using DataToolkit, DataFrames, StatsBase
using CairoMakie, SwarmMakie #=beeswarm plots=#, Colors
using MakieTeX # for SVG icons

function svg_icon(name::String)
    # Define `path` up front so the fallback lookups below can always use it.
    path = "svg/$name.svg"
    if name == "go"
        icon = d"go-logo-solid::IO"
    else
        icon = get(d"file-icons::Dict{String,IO}", path, nothing)
    end
    if isnothing(icon)
        icon = get(d"file-icons-mfixx::Dict{String,IO}", path, nothing)
    end
    if isnothing(icon)
        icon = get(d"file-icons-devopicons::Dict{String,IO}", path, nothing)
    end
    isnothing(icon) && return missing
    return CachedSVG(read(seekstart(icon), String))
end

const colours_vibrant = range(LCHab(60,70,0), stop=LCHab(60,70,360), length=36)
const colours_dim = range(LCHab(25,50,0), stop=LCHab(25,50,360), length=36)

const julia_logo = svg_icon("Julia")
const r_logo = svg_icon("R")
const python_logo = svg_icon("python")

marker_map = Dict(
"geometryops" => julia_logo,
# "gdal-jl" => julia_logo,
"sf" => r_logo,
"terra" => r_logo,
"geos" => r_logo,
"s2" => r_logo,
"geopandas" => python_logo,
)


color_map = Dict(
# R packages
"sf" => Makie.wong_colors()[1],
"s2" => Makie.wong_colors()[5],
"terra" => Makie.wong_colors()[6],
"geos" => Makie.wong_colors()[4],
# Python package
"geopandas" => Makie.wong_colors()[2],
# Julia package
"geometryops" => Makie.wong_colors()[3],
)

path_to_vector_benchmark = "/Users/anshul/git/vector-benchmark"
timings_df = CSV.read(joinpath(path_to_vector_benchmark, "timings.csv"), DataFrame)
replace!(timings_df.package, "sf-project" => "sf", "sf-transform" => "sf")

# now plot

task_ca = CategoricalArray(timings_df.task)

group_marker = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in keys(marker_map)]
names_marker = collect(keys(marker_map))
lang_markers = ["R" => r_logo, "Python" => python_logo, "Julia" => julia_logo]
group_package = [MarkerElement(; marker, markersize = 12) for (lang, marker) in lang_markers]
names_package = first.(lang_markers)


f, a, p = beeswarm(
task_ca.refs, timings_df.median;
marker = getindex.((marker_map,), timings_df.package),
color = getindex.((color_map,), timings_df.package),
markersize = 10,
axis = (;
xticks = (1:length(task_ca.pool.levels), task_ca.pool.levels),
xlabel = "Task",
ylabel = "Median time (s)",
yscale = log10,
title = "Benchmark vector operations",
xgridvisible = false,
xminorgridvisible = true,
yminorgridvisible = true,
yminorticks = IntervalsBetween(5),
ygridcolor = RGBA{Float32}(0.0f0,0.0f0,0.0f0,0.05f0),
)
)
leg = Legend(
f[1, 2],
[group_marker, group_package],
[names_marker, names_package],
["Package", "Language"],
tellheight = false,
tellwidth = true,
gridshalign = :left,
)
resize!(f, 650, 450)
a.spinewidth[] = 0.5
f
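The `svg_icon` helper in `vector_benchmark_plot.jl` tries several icon sets in turn and returns `missing` only when none of them has the file. A minimal sketch of that fallback chain, assuming plain `Dict`s in place of the DataToolkit datasets (`icon_sets` and `first_icon` are hypothetical names, not part of the PR):

```julia
# Hypothetical icon sets, checked in priority order; the strings stand in for SVG sources.
icon_sets = [
    Dict("svg/Julia.svg" => "<svg>julia</svg>"),
    Dict("svg/R.svg" => "<svg>r</svg>"),
]

# Return the first hit for `path` across `sets`, or `missing` if no set contains it.
function first_icon(sets, path)
    for set in sets
        icon = get(set, path, nothing)
        isnothing(icon) || return icon
    end
    return missing
end

r_icon = first_icon(icon_sets, "svg/R.svg")
python_icon = first_icon(icon_sets, "svg/python.svg")
```

Looping over an ordered collection keeps the priority explicit and avoids repeating the `isnothing` check for every additional icon set.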
5 changes: 5 additions & 0 deletions docs/Project.toml
@@ -1,4 +1,5 @@
[deps]
AccurateArithmetic = "22286c92-06ac-501d-9306-4abd417d9753"
Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CairoMakie = "13f3f980-e62b-5c42-98c6-ff1f3baf88f0"
@@ -10,6 +11,8 @@ DataStructures = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
DocumenterVitepress = "4710194d-e776-4893-9690-8d956a29c365"
DoubleFloats = "497a8b3b-efae-58df-a0af-a86822472b78"
ExactPredicates = "429591f6-91af-11e9-00e2-59fbe8cec110"
GeoDatasets = "ddc7317b-88db-5cb5-a849-8449e5df04f9"
GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f"
GeoInterfaceMakie = "0edc0954-3250-4c18-859d-ec71c1660c08"
@@ -20,7 +23,9 @@ LibGEOS = "a90b1aa1-3769-5649-ba7e-abc5a9d163eb"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
MakieThemes = "e296ed71-da82-5faf-88ab-0034a9761098"
MultiFloats = "bdf0d083-296b-4888-a5b6-7498122e68a5"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Proj = "c94c279d-25a6-4763-9509-64d165bea63e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
7 changes: 7 additions & 0 deletions docs/make.jl
@@ -73,6 +73,9 @@ withenv("JULIA_DEBUG" => "Literate") do # allow Literate debug output to escape
# TODO: We should probably fix the above in `process_literate_recursive!`.
end

# Now that the Literate stuff is done, we also download the call notes from HackMD:
download("https://hackmd.io/kpIqAR8YRJOZQDJjUKVAUQ/download", joinpath(@__DIR__, "src", "call_notes.md"))

# Finally, make the docs!
makedocs(;
modules=[GeometryOps],
@@ -91,6 +94,10 @@ makedocs(;
pages=[
"Introduction" => "introduction.md",
"API Reference" => "api.md",
"Explanations" => [
"Paradigms" => "paradigms.md",
"Peculiarities" => "peculiarities.md",
],
"Source code" => literate_pages,
],
warnonly = true,