Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert series data to string as fallback. #2519

Closed
wants to merge 4 commits into from

Conversation

daschw
Copy link
Member

@daschw daschw commented Mar 30, 2020

With this we support the changes in CategoricalArrays.jl without introducing dependencies (JuliaData/CategoricalArrays.jl#256).

using Plots, CategoricalArrays

cstr = categorical(["B", "A", "C"])
cnum = categorical([10, 1, 2])
plot(cstr, cnum)

Before:

ERROR: Cannot convert CategoricalValue{String,UInt32} to series data for plotting
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] prepareSeriesData(::CategoricalValue{String,UInt32}) at /home/daniel/.julia/packages/Plots/cc8wh/src/series.jl:14
 [3] convertToAnyVector(::CategoricalValue{String,UInt32}, ::Dict{Symbol,Any}) at /home/daniel/.julia/packages/Plots/cc8wh/src/series.jl:29
 [4] (::Plots.var"#147#150"{Dict{Symbol,Any}})(::CategoricalValue{String,UInt32}) at ./none:0
 [5] iterate(::Base.Generator{CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}},Plots.var"#147#150"{Dict{Symbol,Any}}}) at ./generator.jl:47
 [6] convertToAnyVector(::CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}}, ::Dict{Symbol,Any}) at /home/daniel/.julia/packages/Plots/cc8wh/src/series.jl:44
 [7] macro expansion at /home/daniel/.julia/packages/Plots/cc8wh/src/series.jl:128 [inlined]
 [8] apply_recipe(::Dict{Symbol,Any}, ::Type{Plots.SliceIt}, ::CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}}, ::CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}, ::Nothing) at /home/daniel/.julia/packages/RecipesBase/G4s6f/src/RecipesBase.jl:279
 [9] _process_userrecipes(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}},CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}}) at /home/daniel/.julia/packages/Plots/cc8wh/src/pipeline.jl:85
 [10] _plot!(::Plots.Plot{Plots.GRBackend}, ::Dict{Symbol,Any}, ::Tuple{CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}},CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}}) at /home/daniel/.julia/packages/Plots/cc8wh/src/plot.jl:178
 [11] plot(::CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}}, ::Vararg{Any,N} where N; kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/daniel/.julia/packages/Plots/cc8wh/src/plot.jl:57
 [12] plot(::CategoricalArray{String,1,UInt32,String,CategoricalValue{String,UInt32},Union{}}, ::CategoricalArray{Int64,1,UInt32,Int64,CategoricalValue{Int64,UInt32},Union{}}) at /home/daniel/.julia/packages/Plots/cc8wh/src/plot.jl:51
 [13] top-level scope at none:0

With this:

┌ Warning: `lastindex(x::CategoricalString)` is deprecated, use `lastindex(String(x))` instead.
│   caller = _split(::CategoricalString{UInt32}, ::Function, ::Int64, ::Bool, ::Array{SubString{CategoricalString{UInt32}},1}) at util.jl:326
└ @ Base ./strings/util.jl:326
┌ Warning: `ncodeunits(x::CategoricalString)` is deprecated, use `ncodeunits(String(x))` instead.
│   caller = findnext(::Base.Fix2{typeof(isequal),Char}, ::CategoricalString{UInt32}, ::Int64) at search.jl:128
└ @ Base ./strings/search.jl:128
┌ Warning: `isvalid(x::CategoricalString, i::Integer)` is deprecated, use `isvalid(String(x), i)` instead.
│   caller = findnext(::Base.Fix2{typeof(isequal),Char}, ::CategoricalString{UInt32}, ::Int64) at search.jl:130
└ @ Base ./strings/search.jl:130
┌ Warning: `lastindex(x::CategoricalString)` is deprecated, use `lastindex(String(x))` instead.
│   caller = SubString at substring.jl:39 [inlined]
└ @ Core ./strings/substring.jl:39
┌ Warning: `ncodeunits(x::CategoricalString)` is deprecated, use `ncodeunits(String(x))` instead.
│   caller = checkbounds at basic.jl:188 [inlined]
└ @ Core ./strings/basic.jl:188
┌ Warning: `isvalid(x::CategoricalString, i::Integer)` is deprecated, use `isvalid(String(x), i)` instead.
│   caller = SubString{CategoricalString{UInt32}}(::CategoricalString{UInt32}, ::Int64, ::Int64) at substring.jl:31
└ @ Base ./strings/substring.jl:31
┌ Warning: `isvalid(x::CategoricalString, i::Integer)` is deprecated, use `isvalid(String(x), i)` instead.
│   caller = SubString{CategoricalString{UInt32}}(::CategoricalString{UInt32}, ::Int64, ::Int64) at substring.jl:32
└ @ Base ./strings/substring.jl:32
┌ Warning: `nextind(x::CategoricalString, i::Int)` is deprecated, use `nextind(String(x), i)` instead.
│   caller = SubString{CategoricalString{UInt32}}(::CategoricalString{UInt32}, ::Int64, ::Int64) at substring.jl:34
└ @ Base ./strings/substring.jl:34
┌ Warning: `iterate(x::CategoricalString, i::Int)` is deprecated, use `iterate(String(x), i)` instead.
│   caller = iterate at substring.jl:72 [inlined]
└ @ Core ./strings/substring.jl:72
┌ Warning: `nextind(x::CategoricalString, i::Int)` is deprecated, use `nextind(String(x), i)` instead.
│   caller = _split(::CategoricalString{UInt32}, ::Function, ::Int64, ::Bool, ::Array{SubString{CategoricalString{UInt32}},1}) at util.jl:329
└ @ Base ./strings/util.jl:329
┌ Warning: `lastindex(x::CategoricalString)` is deprecated, use `lastindex(String(x))` instead.
│   caller = SubString at substring.jl:39 [inlined]
└ @ Core ./strings/substring.jl:39

categorical_new

Note that the categories are not sorted. We did not have this before either. However, before the y axis would have been numeric, because we used to convert categorical arrays of numbers to numeric arrays. That is no longer possible without depending on CategoricalArrays, because a categorical array of numbers is not a subtype of AbstractArray{<:Number} anymore. @nalimilan pointed out in JuliaData/CategoricalArrays.jl#256 (comment) that the new behavior is actually correct.

I played around with automatic sorting of categorical values along the axes with a new attribute sortstrings on a local branch and got it working, but only for single series. If a new series added a new categorical value to the axis, we would have to update all the axis positions of the previous series. This would mean a significant change in the recipe processing pipeline, which I do not want to do at this point. An alternative idea for the future could be do to a common final processing of all series if sortstrings is provided.

I'm not sure, where in the pipeline these warnings above pop-up. We will have to investigate this further. Still I think this is an improvement to the current behavior of erroring, and we should probably release this soon.

cc: @mkborregaard @nalimilan @bkamins

@bkamins
Copy link
Contributor

bkamins commented Mar 30, 2020

Thank you for working on this. I think "not sorting categorical due to problems with handling multiple plots" makes sense (I understand the user can sort what is to be plotted before passing it for plotting - right?).

@mkborregaard
Copy link
Member

Sorting is useful in two cases:

  1. things like boxplots or grouped bar plots, where you want to determine the sort order on the x axis. For those cases we could add a sorting keyword to the signature I guess
  2. for the variable passed to group, which you often don't want to be alphabetic (the current default) but follow a certain sort order. I think that last case still works ?

In other cases user sorting should solve it.

@daschw
Copy link
Member Author

daschw commented Mar 30, 2020

OK, so apparently these convert to string fallbacks don't play well with animations.

However, I was reading @bkamins comment JuliaData/CategoricalArrays.jl#256 (comment) on the order of categories again where he comes to the conclusion

In short Plots.jl currently does not handle this correctly anyway, so:

  • we can either leave things as is (and use a simple recipe)
  • or we can make it correct (which will require some kind of integration between CategoricalArrays.jl and Plots.jl as Plots.jl must be in some way be able to discover the correct order of levels in categorical arrays)

This PR, if it passed tests, would be an implementation of the first bullet point without a recipe.

We can also make it correct (second bullet) without integrating Plots and CategoricalArrays.jl. With #2503 and #2515 we can use the following type recipe:

@recipe function f(::Type{T}, cv::T) where T <: CategoricalValue
    level_strings = string.(levels(cv))
    ticks --> eachindex(level_strings) # EDIT: `:=` would be safer than `-->`
    cv -> levelcode(cv), i -> level_strings[Int(i)]
end

For the example

cstr = categorical(["B", "A", "C"])
cnum = categorical([10, 1, 2])
plot(cstr, cnum)

we get
categorical_recipe

@bkamins
Copy link
Contributor

bkamins commented Mar 30, 2020

must be in some way be able to discover the correct order of levels in categorical arrays

you do not need integration I think - you can just sort the collection of categorical values and it will get correctly ordered (and you do not have to be aware that this is a CategoricalArray).

@daschw
Copy link
Member Author

daschw commented Mar 30, 2020

must be in some way be able to discover the correct order of levels in categorical arrays

That was actually still a quote from your post. However, I agree with your original opinion, because Plots does not want to sort Arrays in general, so it must be aware if an object is a CategoricalArray. For this purpose we can use my recipe from above.

@nalimilan
Copy link

Thanks for working on this! Unfortunately, the deprecation warnings are serious: these methods are going to be removed in the release after the next one (since CategoricalString is going to be deprecated in favor of CategoricalValue{String}, which isn't an AbstractString), so we have to find another solution in the longer term. The recipe you show looks reasonable (though I don't master the details).

BTW, have you tried with CategoricalArrays master? That's the version which is going to drop CategoricalString <: AbstractString.

That was actually still a quote from your post. However, I agree with your original opinion, because Plots does not want to sort Arrays in general, so it must be aware if an object is a CategoricalArray. For this purpose we can use my recipe from above.

Plots doesn't sort arrays, but doesn't it call unique somewhere to get the list of categories to represent? If so, it could use levels instead and categorical values would appear in the right order.

@daschw
Copy link
Member Author

daschw commented Mar 30, 2020

The recipe you show looks reasonable (though I don't master the details).

It tells Plots: whenever you see an AbstractArray of CategoricalValues inspect its first element cv and

  • Set the axis tick positions to eachindex(levels(cv)) (one tick for each level)
  • convert categorical values on the axis to their levelcode as datapoints. (the first function returned)
  • convert tick positions on the axis to a string of the corresponding level for the tick labels. (The second function returned)

BTW, have you tried with CategoricalArrays master?

Yes, everything I post here is with CategoricalArrays master.

Plots doesn't sort arrays, but doesn't it call unique somewhere to get the list of categories to represent? If so, it could use levels instead and categorical values would appear in the right order.

julia> leves(["a", "b"])
ERROR: UndefVarError: leves not defined
Stacktrace:
 [1] top-level scope at REPL[3]:1
 [2] eval(::Module, ::Any) at ./boot.jl:331
 [3] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [4] run_backend(::REPL.REPLBackend) at /home/daniel/.julia/packages/Revise/Pcs5V/src/Revise.jl:1073
 [5] top-level scope at none:0

How would it know, when to use unique and when to use levels?

I'm quite convinced that the recipe is the best option we have if we do not want Plots to depend on CategoricalArrays.

@bkamins
Copy link
Contributor

bkamins commented Mar 31, 2020

levels is defined in DataAPI.jl on which for sure you can depend as it is very lightweight.

The difference between unique and levels:

  • levels does not list missing if it is present in the array
  • levels gives you the right order while unique gives you the order of appearance in x

@nalimilan
Copy link

Yes, everything I post here is with CategoricalArrays master.

That's surprising. I asked because I see CategoricalString mentioned in the output you posted, yet it doesn't exist anymore on master. For example, I get this:

julia> lastindex(categorical(["a"])[1])
┌ Warning: `lastindex(x::CategoricalValue{String})` is deprecated, use `lastindex(String(x))` instead.
│   caller = top-level scope at REPL[6]:1
└ @ Core REPL[6]:1
1

@daschw
Copy link
Member Author

daschw commented Mar 31, 2020

OK, thanks for the hint! Let me check again.

@daschw
Copy link
Member Author

daschw commented Mar 31, 2020

You were right, there are no warnings with this PR and CategoricalArrays#master.
One issue with the levels suggestion is that this happens when Plots axes are populated, which is after input data types are processed. So levels would only see Array{String} and we would have the same behavior as we have now with this PR. The other issue is that this PR still has failing tests.

@daschw daschw closed this Jun 30, 2020
@daschw daschw deleted the categorical branch June 30, 2020 14:32
@bicepjai
Copy link

bicepjai commented Apr 4, 2021

Do we have a solution yet on ordering the x axis ticks for violin like categorical plots ?

Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Xeon(R) W-3275M CPU @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake-avx512)

StatsPlots v0.14.19

I get

Cannot convert CategoricalValue{String,UInt32} to series data for plotting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants