Introduction of ATOL in finite vectorisation method #348
Conversation
Thanks for the PR!
Could you include some minimal (or more extensive 😉) test, so we have a chance to notice early if we accidentally completely break the code at some point?
I added a minimal working example in the class documentation.
Is there a test file for representation methods that I cannot find?
Currently there is only src/python/test/test_representations.py. You can add to it or create a separate file as you like. For your example in the documentation, if you want, you can use syntax from https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html so that your example code is actually run while building the documentation, and the output is checked against what you have written. For instance, src/python/doc/wasserstein_distance_user.rst uses testcode/testoutput. |
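As an aside, here is a minimal sketch of the kind of doctest-style example being discussed (the `add` function is purely illustrative, not part of GUDHI): Sphinx's doctest extension, like Python's built-in doctest module, executes the `>>>` lines and compares the printed output against what is written below them.

```python
# Minimal doctest-style sketch (illustrative function, not GUDHI code).
# Sphinx's doctest extension and Python's own doctest module both run
# the ">>>" lines and check the output shown underneath.
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # fails loudly if the documented output drifts
```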
The current format already allows for that, i.e. the test I wrote in the doc is already being run during the build (hence my problem above this comment).
Oops, I missed that. Nice.
Er, which problem?
I notice you seem to be using the web interface to modify the code. Do you find it convenient? I usually find it easier to commit and push to my branch without messing with a browser.
Your name should probably appear in a more visible place than just one comment; maybe Vincent will have suggestions about that.
A problem with randomness that is not fixed by fixing the seed (the seed I used at home for KMeans led to a different center order than the one on the server).
""" | ||
if sample_weight is None: | ||
sample_weight = [self.weighting_method(measure) for measure in X] | ||
return np.stack([self(measure, sample_weight=weight) for measure, weight in zip(X, sample_weight)]) |
ATOL does not necessarily apply to diagrams, it could be simpler objects that have near zero production cost.
You're right about parallelizing the transform: our research shows ATOL can be very useful even after a rough calibration, so it's very likely that there are situations where fit will necessarily be quick and dirty, but then there is a lot of transforming to be done.
(thank you Marc!)
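For illustration, a rough sketch of what parallelising the per-measure transform could look like, using only the standard library. The `toy_vectorise` function is a hypothetical stand-in for the per-measure call, not GUDHI's actual Atol code; the point is only that when fit is cheap and there are many measures, the independent per-measure calls can be mapped over a pool.

```python
# Sketch: parallelising a per-measure transform with the standard library.
# `toy_vectorise` is a hypothetical stand-in for Atol.__call__.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

centers = np.array([[0.0, 0.0], [4.0, 4.0]])

def toy_vectorise(measure):
    # distance of each point to each center, min-pooled per center
    d = np.linalg.norm(measure[:, None, :] - centers[None, :, :], axis=-1)
    return d.min(axis=0)

measures = [np.random.default_rng(i).normal(size=(50, 2)) for i in range(8)]

# Each measure is processed independently, so the map parallelises trivially.
with ThreadPoolExecutor() as pool:
    out = np.stack(list(pool.map(toy_vectorise, measures)))

print(out.shape)  # (8, 2)
```

For CPU-bound numpy work a ProcessPoolExecutor or joblib's Parallel would be the more usual choice; the thread pool keeps the sketch dependency-free.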
self.centers = self.quantiser.cluster_centers_
dist_centers = pairwise.pairwise_distances(self.centers)
np.fill_diagonal(dist_centers, np.inf)
self.inertias = np.min(dist_centers, axis=0)/2
I guess this isn't meant to be used with a single center.
Actually there is no reason why it should not, and it is interesting.
dist_centers = pairwise.pairwise_distances(self.centers)
np.fill_diagonal(dist_centers, np.inf)
self.inertias = np.min(dist_centers, axis=0)/2
Suggested change:

- dist_centers = pairwise.pairwise_distances(self.centers)
- np.fill_diagonal(dist_centers, np.inf)
- self.inertias = np.min(dist_centers, axis=0)/2
+ if self.quantiser.n_clusters == 1:
+     dist_centers = pairwise.pairwise_distances(measures_concat)
+     np.fill_diagonal(dist_centers, 0)
+     self.inertias = np.max(dist_centers)/2
+ else:
+     dist_centers = pairwise.pairwise_distances(self.centers)
+     np.fill_diagonal(dist_centers, np.inf)
+     self.inertias = np.min(dist_centers, axis=0)/2
if self.quantiser.n_clusters == 1:
    dist_centers = pairwise.pairwise_distances(measures_concat)
    np.fill_diagonal(dist_centers, 0)
    self.inertias = np.max(dist_centers)/2
else:
    dist_centers = pairwise.pairwise_distances(self.centers)
    np.fill_diagonal(dist_centers, np.inf)
    self.inertias = np.min(dist_centers, axis=0)/2
return self
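The logic of this branch can be reproduced in a numpy-only sketch (the helper names here are invented for the example; the real code uses sklearn's `pairwise.pairwise_distances`): with a single center, the radius comes from the spread of the calibration points, otherwise each center's radius is half the distance to its nearest other center.

```python
import numpy as np

def pairwise_dists(pts):
    # Euclidean distance matrix; numpy-only stand-in for
    # sklearn.metrics.pairwise.pairwise_distances.
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def inertias_from_centers(centers, measures_concat):
    # Mirrors the branch above: single-center radius from data spread,
    # otherwise half the distance to the nearest other center.
    if len(centers) == 1:
        d = pairwise_dists(measures_concat)
        return np.array([np.max(d) / 2])
    d = pairwise_dists(centers)
    np.fill_diagonal(d, np.inf)
    return np.min(d, axis=0) / 2

centers = np.array([[0.0, 0.0], [3.0, 4.0]])
points = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])
print(inertias_from_centers(centers, points))  # both centers: 5/2 = 2.5
```

The two centers are at distance 5, so each gets inertia 2.5; with a single center, the farthest pair of calibration points (distance 5) gives the same value here.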
It happens that two quantiser centers are identical, which leads to 0s in self.inertias; that is a problem I could address here...?
That sounds like a good idea. Are they exactly identical, or can they be very close but slightly apart? Can there be an arbitrary number of identical centers? Do you already know how you want to handle this problem?
It happens that they can be very close but slightly apart, or that they are exactly identical. And there can be an arbitrary number of identical centers as far as I can tell. I am unsure as to how to handle this.
Options that I can think of include:
1- removing identical centers: it actually sounds rather solid, but we lose control of the number of centers and it does not feel natural.
2- recasting the resulting inertias that are too close to zero (these are what prompt errors when computing the contrast functions over measures). I actually like the simplicity of this; in a way it says that the quantised mean measure is supported on points that can have multiplicity higher than 1.
3- ensuring, ahead of quantising, that the calibration measure points are sufficiently distinct (using some sort of tree-trimming), so that the quantiser will pick sufficiently different cluster centers. Even though this could work well in practice, it is rather shaky: it assumes the quantiser will not mess up, and it adds an extra, arbitrary trimming step that could obfuscate the signal in the end.
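A minimal sketch of what option 2- could look like, assuming a simple floor on the inertias (the function name and the eps value are arbitrary illustrative choices, not GUDHI code):

```python
import numpy as np

def clamp_inertias(inertias, eps=1e-8):
    # Recast inertias that are too close to zero so the contrast
    # functions stay well defined when two centers (nearly) coincide.
    # eps is an arbitrary illustrative floor.
    return np.maximum(inertias, eps)

print(clamp_inertias(np.array([0.0, 1e-12, 0.5])))
```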
I don't understand what 2- means, but anyway, I am also ok with merging the current version and improving it once you have settled on a way to handle it, if the problem doesn't occur too often.
Yes it is rare and should be even rarer now, I'm ok with merging too.
Let's wait and see if Vincent has any comments, he can handle the merging then.
I have a hard time understanding how this PR breaks those tests...
The failure was due to a change in the continuous integration we use; Vincent fixed it, it should work just fine on your next push.
In src/python/test/test_representations.py (or a new test_atol.py), I would test your example:

import numpy as np
from sklearn.cluster import KMeans
from gudhi.representations.vector_methods import Atol

def test_atol():
    a = np.array([[1, 2, 4], [1, 4, 0], [1, 0, 4]])
    b = np.array([[4, 2, 0], [4, 4, 0], [4, 0, 2]])
    c = np.array([[3, 2, -1], [1, 2, -1]])
    atol_vectoriser = Atol(quantiser=KMeans(n_clusters=2, random_state=202006))
    assert np.allclose(atol_vectoriser.fit(X=[a, b, c]).centers,
                       np.array([[2., 0.66666667, 3.33333333], [2.6, 2.8, -0.4]]))
    assert np.allclose(atol_vectoriser(a), np.array([1.18168665, 0.42375966]))
    assert np.allclose(atol_vectoriser(c), np.array([0.02062512, 1.25157463]))
    assert np.allclose(atol_vectoriser.transform(X=[a, b, c]),
                       np.array([[1.18168665, 0.42375966], [0.29861028, 1.06330156], [0.02062512, 1.25157463]]))
In src/python/doc/representations_sum.inc, you add your name: :Author: Mathieu Carrière, Martin Royer (be careful with the table format: each line ends with a |).
I think it is already tested by sphinx?
Here it is just some text rendering. If you want sphinx to test it, you need to use
Regarding https://www.sphinx-doc.org/en/master/usage/extensions/doctest.html: it looks like this syntax also enables testing, and indeed it was causing trouble because Martin had a different version of sklearn at home and it was producing a different result.
Oh you are right, sphinx_py_test fails. I was not aware of this syntax, forget my remark.
I wrote this dummy test:

def test_dummy_atol():
    a = np.array([[1, 2, 4], [1, 4, 0], [1, 0, 4]])
    b = np.array([[4, 2, 0], [4, 4, 0], [4, 0, 2]])
    c = np.array([[3, 2, -1], [1, 2, -1]])
    for weighting_method in ["cloud", "iidproba"]:
        for contrast in ["gaussian", "laplacian", "indicator"]:
            atol_vectoriser = Atol(quantiser=KMeans(n_clusters=1, random_state=202006),
                                   weighting_method=weighting_method, contrast=contrast)
            atol_vectoriser(a)
            atol_vectoriser.transform(X=[a, b, c])

As Python is not compiled, it would be nice to have 100% code coverage.
Thank you!