Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Structure #15

Merged
merged 20 commits into from
Jul 16, 2022
Merged

Initial Structure #15

merged 20 commits into from
Jul 16, 2022

Conversation

dynamic-queries
Copy link
Contributor

I realized that a recurring theme of the conventional model reduction methods is based on linear (PCA) and non-linear data reduction.(Diffusion Maps, VAEs ...). As a result, I started working on the data reduction part of the package, which could be useful for some the new data-driven reduction schemes that have been proposed.

To check, if this direction is , inline with the ideas of the Maintainer, I have included a sample implementation of PCA. It is still unclear if PCA from MultivariateStats.jl should be leveraged in this package. I will make a benchmark and post in the comments soon.

Tests for Reduction schemes, seems "ad-hoc" in nature. For instance, the intro to POD / PCA in #2 has some MATLAB scripts that could be used to test the energy of the implementations. But I have to admit, I am coming up short in trying to find a systematic way to write tests, that could generalize to different data.

Please advise.

Rahul Manavalan added 6 commits June 25, 2022 14:35

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
ModelOrderReduction using Data-Driven methods often requires some data
reduction techniques to begin with. As a result, it is sensible to
abstract this out.

PCA which is used in Galerkin based POD is one
linear method, that Chris already refers to in Issue 2.

Non-linear data reduction methods like the Diffussion Maps is also
something that is in the pipeline.

Ofcourse, VAEs are well known to represent a state vector in latent
space, so this is a no-brainer, as there are some interesting model
reduction methods, that could benefit from this.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
The PCA routine is precompiled and tested with the Lorenz attractor.
It is to be determined, what qualifies as a good test for PCA and other
reduction routines for that matter.
Allow the PCA type to include Matrix{FT} as well.
Debug PCA reduce! implementation.
Add reduce! to the export list.
Added prelim test for the reduction of the Lorenz time series.
I know apriori that the energy of the system after reduction is >0.9.
A more systematic test setup has to be thought up.
Adding visualization of the data overlayed with the reduced basis would
be nice!
include("ErrorHandle.jl")
using LinearAlgebra
include("DataReduction/PCA.jl")
include("DataReduction/DifussionMaps.jl")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

@codecov
Copy link

codecov bot commented Jun 30, 2022

Codecov Report

Merging #15 (a06a9f7) into main (020f782) will increase coverage by 72.34%.
The diff coverage is 72.34%.

@@            Coverage Diff            @@
##           main      #15       +/-   ##
=========================================
+ Coverage      0   72.34%   +72.34%     
=========================================
  Files         0        3        +3     
  Lines         0       47       +47     
=========================================
+ Hits          0       34       +34     
- Misses        0       13       +13     
Impacted Files Coverage Δ
src/DataReduction/POD.jl 67.50% <67.50%> (ø)
src/ErrorHandle.jl 100.00% <100.00%> (ø)
src/Types.jl 100.00% <100.00%> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

new{eltype(snaps[1]),typeof(nmodes)}(snaps,nmodes)
end

function PCA(snaps::Matrix{FT},nmodes::IT) where {FT,IT}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Among all the names for the SVD, POD is by far the most common in the realm of model order reduction. I would suggest we go with that (if not all).

if typeof(pca.snapshots) == Vector{Vector{FT}}
op_matrix = matricize(pca.snapshots)
end
u,s,v = svd(op_matrix)
Copy link
Contributor

@FHoltorf FHoltorf Jun 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In cases where applying POD (model order reduction in general) makes sense, taking the FULL SVD of the whole data matrix is often prohibitively expensive. So I think we should default to using the method of snapshots to compute the dominant singular vectors/values. If the matrix is small it makes hardly a difference. In general it would be good to allow explicit specification of the algorithm to be used. Many options are conceivable: Randomized SVD (see RandomizedLinAlg.jl for example), Lancosz bidiagonalization (see TSVD.jl).

@FHoltorf
Copy link
Contributor

Also I am unsure what you mean by "testing the energy of the implementation"? If you want to test what percentage of the energy of the input data is captured by a reduced basis, I think it is perfectly fine to hardcode the number you would expect. Other than that, I think it is fair to assume correctness of svd, so you can always rely on that as benchmark for any alternative way to compute a truncated SVD (method of snapshots, rSVD, ...). Any other opinions?

Rahul Manavalan added 4 commits July 1, 2022 11:40
Ad-hoc tests are now implemented with the Lorenz attractor.
Regression tests with MultivariateStats.jl coming up.
Code Formatting to be done.
Include a Benchmark from the data in #2.
@dynamic-queries
Copy link
Contributor Author

dynamic-queries commented Jul 3, 2022

Regression Tests with MultivariateStats.jl is not straightforward, as there is some pre-processing done which does not really make sense in this package.

Instead a visual inspection can be realized by looking at the Lorenz attractor.
lorenz_attractor

The POD(mode=2) version looks as follows
pod2

It is also known that the POD(mode=1) of the attractor resembles its z-trajectory, which is also verified.

z
pod1

Please advise on how to close this request.

UPDATE :
While I await a reply, I shall start implementing the OperatorInference Module of Lift and Learn in a separate branch at my fork. :D

Rahul Manavalan added 2 commits July 3, 2022 06:26
Considering that we need the reduced bases in Matrix form down the line,
for POD based MOR methods, a change in the DS is made.
@ChrisRackauckas
Copy link
Member

Is this still WIP? If not, @FHoltorf should take another pass.

@dynamic-queries dynamic-queries changed the title WIP - Initial Structure Initial Structure Jul 7, 2022
@dynamic-queries
Copy link
Contributor Author

Is this still WIP? If not, @FHoltorf should take another pass.

Copy link
Contributor

@FHoltorf FHoltorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty good already. Most importantly, there is a critical mistake in the organization of the snapshot matrix and that really needs fixing. Also a trivial but usually informative lower bound on the rel. energy recovered by truncated SVD routines should be returned. Beyond that I made only some minor suggestions. Good work, thanks!

end

sol = lorenz_prob()
solution = Matrix(reduce(hcat,sol.u)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array(sol) is what you want to be using. You are not "correctly" (to be understood in the usual context of POD) concatenating the data snapshots (see comment on matricize). For a test this is alright but this isn't really how PCA is used in the context of model order reduction.

@@ -0,0 +1,65 @@
function matricize(VoV::Vector{Vector{FT}}) where {FT}
Matrix(reduce(hcat,VoV)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You usually want reduce(hcat,VoV) to get the right snapshot matrix (each column is a snapshot). For context: The POD basis is simply the orthogonalized basis that captures the column space of the best low rank r approximation to the snapshot matrix, i.e., a basis of the r-dimensional subspace that best embeds the time series data given by the snapshots in the Frobenius norm sense.

op_matrix = matricize(pod.snapshots)
end
u,s,v = tsvd(op_matrix,pod.nmodes)
pod.energy = NaN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to return sum(s)/(sum(s) + (size(op_matrix,1)-pod.nmodes)*s[end]) as trivial lower bound on the rel. energy captured. In cases where using POD makes sense, that will usually be quite informative.

op_matrix = matricize(pod.snapshots)
end
u,s,v = rsvd(op_matrix,pod.nmodes)
pod.energy = NaN
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to return sum(s)/(sum(s) + (size(op_matrix,1)-pod.nmodes)*s[end]) as trivial lower bound on the rel. energy captured. In cases where using POD makes sense, that will usually be quite informative.

print(io,"POD \n")
print(io,"Reduction Order = ",pod.nmodes,"\n")
print(io,"Snapshot size = (", length(pod.snapshots),",",length(pod.snapshots[1]),")\n")
print(io,"Energy = ", pod.energy,"\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would advocate for renaming to "relative Energy"

export SVD, TSVD, RSVD
export POD, reduce!, matricize
#========================Model Reduction========================================#
include("ModelReduction/LiftAndLearn.jl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LiftAndLearn.jl is empty? Perhaps remove for now?

end

sol = lorenz_prob()
solution = Matrix(reduce(hcat,sol.u)')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as in the example applies

test/runtests.jl Outdated
# Write your tests here.
end
include("DataReduction.jl")
include("ModelReduction.jl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ModelReduction.jl is empty. Perhaps remove for now?

function errorhandle(data::Matrix{FT},modes::IT) where {FT,IT}
@assert size(data,1)>1 "State vector is expected to be vector valued."
s = size(data,2)
@assert (modes>0)&(modes<s) "Number of modes should be [1,$(s)]."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the error message should read "Number of modes should be in {1, ..., $(s-1)}."


mutable struct POD{FT,IT} <: AbstractDRProblem
snapshots::Union{Vector{Vector{FT}},Matrix{FT}}
nmodes::IT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not force nmodes to be an integer? Seems like that would be better to catch user mistakes earlier rather than later.

Also in general, I would suggest to make the POD struct parametric in the type of the snapshots. That is less error prone and you also do not need the if statement in the reduce! functions to check whether the data has already been supplied as a matrix or vector-of-vectors. You can simply dispatch on the POD type appropriately.

Currently, all of this is a bit brittle. In particular, it really kind of directly assumes that the data is given in terms of Float64 (note that you require energy to be of the same type as the elements of the snapshots; that can yield some awkward errors later on), but that is not always the case.

Rahul Manavalan and others added 6 commits July 12, 2022 10:11
nmodes is now forced to be an integer.
Silly transpose error fix
Update relative energy for truncated SVD algos

TODO: Modify type of POD, similar to the type of snapshots
Copy link
Contributor

@FHoltorf FHoltorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is good to go @ChrisRackauckas.

I still have a few concerns about certain choices but for sake of expediting the process I will simply submit a PR with the suggested changes myself.

@ChrisRackauckas
Copy link
Member

Run the formatter and I think this is a fine start.

@ChrisRackauckas ChrisRackauckas merged commit 4910f4a into SciML:main Jul 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants