Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tables.jl support #3104

Merged
merged 18 commits into from
Oct 27, 2022
Merged

Tables.jl support #3104

merged 18 commits into from
Oct 27, 2022

Conversation

trulsf
Copy link
Contributor

@trulsf trulsf commented Oct 4, 2022

Support for solution tables supporting the Tables.jl interface for JuMP variable containers. Currently has support for Array, DenseAxisArray, SparseAxisArray.

Closes #3096

src/JuMP.jl Outdated Show resolved Hide resolved
@odow
Copy link
Member

odow commented Oct 14, 2022

So I took a look at this. Any reason that we can't just return a Vector{NamedTuple}? This satisfies the tables interface without us needing the dependency, and the code is pretty trivial:

using JuMP
model = Model()
@variable(model, x[1:2, 1:2])
@variable(model, y[1:2, 1:2], container = DenseAxisArray)
@variable(model, z[i=1:3, j=1:3; isodd(i+j)], container = SparseAxisArray)

function _row_iterator(x::Union{Array,Containers.DenseAxisArray})
    return zip(eachindex(x), Iterators.product(axes(x)...))
end

function _row_iterator(x::Containers.SparseAxisArray)
    return zip(eachindex(x.data), keys(x.data))
end

function table(x, name::Symbol, col_names::Symbol...)
    return table(identity, x, name, col_names...)
end

function table(
    f::Function, 
    x::Union{Array,Containers.DenseAxisArray,Containers.SparseAxisArray}, 
    name::Symbol, 
    col_names::Symbol...,
)
    C = (col_names..., name)
    return [NamedTuple{C}((args..., f(x[i]))) for (i, args) in _row_iterator(x)]
end

julia> for row in table(JuMP.name, x, :value, :A, :B)
           @show row
       end
row = (A = 1, B = 1, value = "x[1,1]")
row = (A = 2, B = 1, value = "x[2,1]")
row = (A = 1, B = 2, value = "x[1,2]")
row = (A = 2, B = 2, value = "x[2,2]")

julia> for row in table(JuMP.name, y, :value, :A, :B)
           @show row
       end
row = (A = 1, B = 1, value = "y[1,1]")
row = (A = 2, B = 1, value = "y[2,1]")
row = (A = 1, B = 2, value = "y[1,2]")
row = (A = 2, B = 2, value = "y[2,2]")

julia> for row in table(z, :value, :A, :B)
           @show row
       end
row = (A = 3, B = 2, value = z[3,2])
row = (A = 1, B = 2, value = z[1,2])
row = (A = 2, B = 1, value = z[2,1])
row = (A = 2, B = 3, value = z[2,3])

julia> import PrettyTables

julia> PrettyTables.pretty_table(table(z, :value, :A, :B))
┌───────┬───────┬─────────────┐
│     A │     B │       value │
│ Int64 │ Int64 │ VariableRef │
├───────┼───────┼─────────────┤
│     32 │      z[3,2] │
│     12 │      z[1,2] │
│     21 │      z[2,1] │
│     23 │      z[2,3] │
└───────┴───────┴─────────────┘

@trulsf
Copy link
Contributor Author

trulsf commented Oct 14, 2022

I was not aware that a vector of NamedTuple implemented the Tables.jl interface, but it makes sense. I should read the docs more carefully next time :)

The only problem with your suggested implementation is that you return an array of NamedTuples instead of a a vector for Array and DenseAxisArray, but it is fixed by e.g.

return vec([NamedTuple{C}((args..., f(x[i]))) for (i, args) in _row_iterator(x)])

(Maybe more efficient to fix it in the iterator?)

I can update the PR with your code or is it more efficient that you take this further if it is to be added to JuMP?

@odow
Copy link
Member

odow commented Oct 14, 2022 via email

@codecov
Copy link

codecov bot commented Oct 16, 2022

Codecov Report

Base: 97.64% // Head: 97.64% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (139ab15) compared to base (8b774fd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3104   +/-   ##
=======================================
  Coverage   97.64%   97.64%           
=======================================
  Files          32       33    +1     
  Lines        4330     4377   +47     
=======================================
+ Hits         4228     4274   +46     
- Misses        102      103    +1     
Impacted Files Coverage Δ
src/Containers/Containers.jl 90.90% <ø> (ø)
src/Containers/tables.jl 100.00% <100.00%> (ø)
src/copy.jl 96.15% <0.00%> (+0.43%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@odow odow marked this pull request as ready for review October 16, 2022 20:46
Copy link
Member

@odow odow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trulsf I tidied things up a little.

Questions:

  • Should this go in JuMP.table or JuMP.Containers.table? It seems like JuMP automatically exporting table could lead to clashes with other packages?
  • Is value_name, col_names the correct order? It seems to lead to a counter-intuitive ordering of the resulting named tuple. What if we just had names...?

We should also think about what documentation this needs:

julia> using JuMP, DataFrames

julia> x = Containers.@container([i = 1:2, j = [:A, :B]], (i, j))
2-dimensional DenseAxisArray{Tuple{Int64, Symbol},2,...} with index sets:
    Dimension 1, Base.OneTo(2)
    Dimension 2, [:A, :B]
And data, a 2×2 Matrix{Tuple{Int64, Symbol}}:
 (1, :A)  (1, :B)
 (2, :A)  (2, :B)

julia> DataFrames.DataFrame(table(x, :x, :i, :j))  # <- here's the ordering that feels wrong
4×3 DataFrame
 Row │ i      j       x       
     │ Int64  Symbol  Tuple  
─────┼────────────────────────
   11  A       (1, :A)
   22  A       (2, :A)
   31  B       (1, :B)
   42  B       (2, :B)

@odow
Copy link
Member

odow commented Oct 16, 2022

src/Containers/tables.jl Outdated Show resolved Hide resolved
Copy link
Member

@odow odow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I answered my two questions, and I added some docs. I think this will be quite useful.

src/Containers/tables.jl Outdated Show resolved Hide resolved
@mlubin
Copy link
Member

mlubin commented Oct 17, 2022

I'd like to take a look before we merge this. Please bug me if I don't review it quickly.

@odow
Copy link
Member

odow commented Oct 17, 2022

Sure. I guess one other potential change is to call it Containers.rowtable to match https://tables.juliadata.org/stable/#Tables.rowtable and to reflect the fact that we're returning a one-dimensional object.

@hellemo
Copy link
Contributor

hellemo commented Oct 17, 2022

Very nice!

Did you consider storing the names in the containers, which would simplify the use of tables further (as mentioned in #3096 and demonstrated in the last example), following up on #3088 ?

julia> using PrettyTables

julia> pretty_table(table(y)) # SparseVarArray
┌──────┬──────┬─────┐
│  car │ year │   y │
├──────┼──────┼─────┤
│  bmw │ 2001 │ 0.0 │
│ ford │ 2001 │ 0.0 │
│ ford │ 2000 │ 0.0 │
│  bmw │ 2002 │ 0.0 │
└──────┴──────┴─────┘```

@odow
Copy link
Member

odow commented Oct 17, 2022

Did you consider storing the names in the containers

Not yet. That's a larger, more orthogonal change. We'd have to think about how names etc are preserved during broadcasting etc. Otherwise someone would say "why does table(x) work, but table(value.(x)) not?

@trulsf
Copy link
Contributor Author

trulsf commented Oct 17, 2022

@odow Thanks for tidying up and fixing. Great learning experience to see how you approach and deal with the code. I fully agree with your decision to group names into one argument.

@hellemo
Copy link
Contributor

hellemo commented Oct 17, 2022

Not yet. That's a larger, more orthogonal change. We'd have to think about how names etc are preserved during broadcasting etc. Otherwise someone would say "why does table(x) work, but table(value.(x)) not?

I guess I imagined the scope of table to be narrower and more of an alternative way to get (all) the values. So I don't quite understand why you would want to do table(value.(x)) when table(x) is shorter and hopefully optimized for fetching all values at once. A follow-up question is of course your thoughts on how to make this possible/easy (assuming it is supported by solver APIs, which I haven't checked). Subsetting or slicing might complicate this approach a bit, I imagine.

I was kind of hoping to get rid of the call to tables where you have to specify column names (you already specified them once), and rather provide defaults if they are not specified to the macro on container creation, like e.g. DataFrames.

Just picking up on these in case they are important to consider at this point. (In terms of designing API for table and considering possibilities to optimize fetching of solution values)

@odow
Copy link
Member

odow commented Oct 17, 2022

So I don't quite understand why you would want to do table(value.(x)) when table(x) is shorter and hopefully optimized for fetching all values at once

The current design doesn't assume value is used. You can either go table(value, x, names...) or table(value.(x), names...) (the default map function is identity).

The problem with names is this:

julia> using JuMP.Containers

julia> Containers.@container(x[a=1:2, b=1:2], a + b, container = DenseAxisArray)
2-dimensional DenseAxisArray{Int64,2,...} with index sets:
    Dimension 1, Base.OneTo(2)
    Dimension 2, Base.OneTo(2)
And data, a 2×2 Matrix{Int64}:
 2  3
 3  4

julia> Containers.@container(y[i=1:2, j=1:2], i + j, container = DenseAxisArray)
2-dimensional DenseAxisArray{Int64,2,...} with index sets:
    Dimension 1, Base.OneTo(2)
    Dimension 2, Base.OneTo(2)
And data, a 2×2 Matrix{Int64}:
 2  3
 3  4

julia> x .+ y
2-dimensional DenseAxisArray{Int64,2,...} with index sets:
    Dimension 1, Base.OneTo(2)
    Dimension 2, Base.OneTo(2)
And data, a 2×2 Matrix{Int64}:
 4  6
 6  8

Currently, we consider two arrays to be alike if the have the same axes. The "name" associated with each axis is ignored.

I was kind of hoping to get rid of the call to tables where you have to specify column names (you already specified them once), and rather provide defaults if they are not specified to the macro on container creation, like e.g. DataFrames.

A reasonable compromise might be Symbol("x$I") for the axes, and :y for the result, if no names are specified.

@odow
Copy link
Member

odow commented Oct 17, 2022

This is now:

julia> model = Model();

julia> @variable(model, x[i=1:2, j=i:2] >= 0, start = i+j);

julia> Containers.table(start_value, x, :i, :j, :start)
3-element Vector{NamedTuple{(:i, :j, :start), Tuple{Int64, Int64, Float64}}}:
 (i = 1, j = 2, start = 3.0)
 (i = 1, j = 1, start = 2.0)
 (i = 2, j = 2, start = 4.0)

julia> Containers.table(x)
3-element Vector{NamedTuple{(:x1, :x2, :y), Tuple{Int64, Int64, VariableRef}}}:
 (x1 = 1, x2 = 2, y = x[1,2])
 (x1 = 1, x2 = 1, y = x[1,1])
 (x1 = 2, x2 = 2, y = x[2,2])

@mlubin
Copy link
Member

mlubin commented Oct 18, 2022

Agreed with the rowtable naming. table is a bit confusing and seems to "underdeliver" by returning a Vector{NamedTuple} that you need additional packages to work with nicely.

Overall this is a nice little addition.

@odow
Copy link
Member

odow commented Oct 18, 2022

Updated to rowtable. We can't add a method to Tables.rowtable because that would be piracy, and we return a slightly different notion of table:

julia> using Tables

julia> x = rand(2, 2)
2×2 Matrix{Float64}:
 0.262588  0.086919
 0.199496  0.542848

julia> Tables.rowtable(x)
ERROR: ArgumentError: a 'Matrix{Float64}' is not a table; see `?Tables.table` for ways to treat an AbstractVecOrMat as a table
Stacktrace:
 [1] rows(m::Matrix{Float64})
   @ Tables ~/.julia/packages/Tables/T7rHm/src/matrix.jl:3
 [2] rowtable(itr::Matrix{Float64})
   @ Tables ~/.julia/packages/Tables/T7rHm/src/namedtuples.jl:105
 [3] top-level scope
   @ REPL[4]:1

julia> Tables.table(x)
Tables.MatrixTable{Matrix{Float64}} with 2 rows, 2 columns, and schema:
 :Column1  Float64
 :Column2  Float64

julia> Tables.rowtable(Tables.table(x))
2-element Vector{NamedTuple{(:Column1, :Column2), Tuple{Float64, Float64}}}:
 (Column1 = 0.262588126998895, Column2 = 0.08691902246653505)
 (Column1 = 0.19949566550192044, Column2 = 0.5428477642995122)

I wonder if we should follow the ; header::Vector{Symbol} argument for the names. It'd be simpler.

help?> Tables.table
  Tables.table(m::AbstractVecOrMat; [header])

  Wrap an AbstractVecOrMat (Matrix, Vector, Adjoint, etc.) in a MatrixTable, which satisfies the
  Tables.jl interface. (An AbstractVector is treated as a 1-column matrix.) This allows accessing
  the matrix via Tables.rows and Tables.columns. An optional keyword argument iterator header can
  be passed which will be converted to a Vector{Symbol} to be used as the column names. Note that
  no copy of the AbstractVecOrMat is made.

julia> Tables.table(x, header = [:x1, :x2])
Tables.MatrixTable{Matrix{Float64}} with 2 rows, 2 columns, and schema:
 :x1  Float64
 :x2  Float64

julia> Tables.rowtable(Tables.table(x, header = [:x1, :x2]))
2-element Vector{NamedTuple{(:x1, :x2), Tuple{Float64, Float64}}}:
 (x1 = 0.262588126998895, x2 = 0.08691902246653505)
 (x1 = 0.19949566550192044, x2 = 0.5428477642995122)

@hellemo
Copy link
Contributor

hellemo commented Oct 18, 2022

The current design doesn't assume value is used. You can either go table(value, x, names...) or table(value.(x), names...) (the default map function is identity).

While I appreciate the elegance of this solution, doesn't this make it harder to specialize on a table method for value that is optimized for fetching all values at once? (or is there a problem with this idea more generally? - would appreciate some pointers on where to start looking into that)

Currently, we consider two arrays to be alike if the have the same axes. The "name" associated with each axis is ignored.

You could still keep that behaviour, no? I mean, you already disregard the names initially. The suggested change would just be to keep them around for when they would be of use, such as when getting results via table.

A reasonable compromise might be Symbol("x$I") for the axes, and :y for the result, if no names are specified.

Thanks.

@odow
Copy link
Member

odow commented Oct 18, 2022

While I appreciate the elegance of this solution, doesn't this make it harder to specialize on a table method for value that is optimized for fetching all values at once? (or is there a problem with this idea more generally? - would appreciate some pointers on where to start looking into that)

You could write a specific rowtable(::typeof(JuMP.value), x::SparseVarArray) method to make it faster.

I would expect that rowtable(value, x) takes about the same time as value.(x); it has to call value(x[i]) for every element. But I think the utility of rowtable is not speed, but that it simplifies converting into tables.

@hellemo
Copy link
Contributor

hellemo commented Oct 19, 2022

You could write a specific rowtable(::typeof(JuMP.value), x::SparseVarArray) method to make it faster.

Good idea. Would also want that for Sparse and Dense I think, if that's possible (to speed up by retrieving all at once).

I would expect that rowtable(value, x) takes about the same time as value.(x); it has to call value(x[i]) for every element. But I think the utility of rowtable is not speed, but that it simplifies converting into tables.

I would say both, but we can always revisit the performance later as long as the design permits that. Thanks.

@odow
Copy link
Member

odow commented Oct 27, 2022

Thanks @trulsf, I think this is a nice addition.

@trulsf
Copy link
Contributor Author

trulsf commented Oct 28, 2022

Thanks @odow for your work on incorporating this in JuMP. We are looking forward to using the new features!

@odow
Copy link
Member

odow commented Oct 28, 2022

I'll make a new release: #3117

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Tables.jl support for results
5 participants