Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make SequenceLearner points hashable by passing the sequence to the function. #266

Closed
wants to merge 9 commits into from

Conversation

basnijholt
Copy link
Member

Description

With this change, the function of the SequenceLearner contains the sequence.

This is needed because points that are passed to the Runner need to be hashable, and this is not required by the entries of the sequence in SequenceLearner.

Fixes #265

Checklist

  • Fixed style issues using pre-commit run --all (first install using pip install pre-commit)
  • pytest passed

Type of change

Check relevant option(s).

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

@basnijholt basnijholt force-pushed the fix-sequence-learner branch from 897d4c8 to 093f148 Compare April 14, 2020 16:14
@basnijholt basnijholt requested review from akhmerov and jbweston April 14, 2020 16:15
@basnijholt basnijholt force-pushed the fix-sequence-learner branch from e5f7dd5 to 35930c7 Compare April 16, 2020 15:03
@akhmerov
Copy link
Contributor

I propose to:

@basnijholt what do you think?

@codecov-io
Copy link

Codecov Report

Merging #266 into master will increase coverage by 0.16%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #266      +/-   ##
==========================================
+ Coverage   79.52%   79.68%   +0.16%     
==========================================
  Files          32       33       +1     
  Lines        4424     4416       -8     
  Branches      815      814       -1     
==========================================
+ Hits         3518     3519       +1     
+ Misses        779      768      -11     
- Partials      127      129       +2     
Impacted Files Coverage Δ
adaptive/learner/sequence_learner.py 86.66% <100.00%> (+0.95%) ⬆️
adaptive/tests/test_learners.py 98.63% <100.00%> (-0.08%) ⬇️
adaptive/tests/test_sequence_learner.py 100.00% <100.00%> (ø)
adaptive/learner/learner1D.py 91.71% <0.00%> (-0.62%) ⬇️
adaptive/runner.py 69.66% <0.00%> (+3.37%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3d9397d...230b356. Read the comment docs.

@basnijholt
Copy link
Member Author

basnijholt commented Apr 16, 2020

@akhmerov, I've implemented the changes you suggested and opened #267.

Thanks for your review! 🙌

@akhmerov
Copy link
Contributor

Sorry, I'm having second thoughts. Passing the complete sequence of arguments to the function means we're distributing it across all workers. Is that definitely OK? It might be, but I think we should consider this carefully.

@akhmerov
Copy link
Contributor

Passing the complete sequence of arguments to the function means we're distributing it across all workers. Is that definitely OK?

Ultimately I think this would only be a problem in extremely exotic cases, and the simplification of the codebase justifies this downside.

@basnijholt
Copy link
Member Author

I actually lean towards not accepting the overhead.

From learning the following function on a ProcessPoolExecutor:

class FailOnce:
    def __init__(self):
        self.n = 0

    def __call__(self, value):
        self.n += 1
        return self.n

Which will always return 1, I learn that the sequence will be serialized and send each point!

As an alternative solution, I have opened https://github.com/python-adaptive/adaptive/pull/268/files. Which appeared to be way less work than I thought initially.

@basnijholt
Copy link
Member Author

Superseded by #268.

@basnijholt basnijholt closed this Apr 23, 2020
@akhmerov
Copy link
Contributor

I'm actually not sure whether creating multiple copies of a sequence is an important limitation and I am wondering whether we should merge this anyway. I still believe this implementation of a sequence learner is simpler, and therefore deserves consideration separate of its relation to #265.

@jbweston, what do you think?

@basnijholt
Copy link
Member Author

basnijholt commented Apr 23, 2020

@akhmerov, I've benchmarked this and the overhead is completely unacceptable.

For example, at this exact moment I am doing simulations with SequenceLearners of ~1e6 items with 15 parameters.

import pickle
import numpy as np
sequence = list(np.random.rand(1_000_000, 15))
%timeit pickle.dumps(sequence)

4.92 s ± 44.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
An that happens on each function evaluation. And this does not even include unserializing and sending it over the network.

@akhmerov
Copy link
Contributor

OK, fair enough.

@basnijholt basnijholt mentioned this pull request May 19, 2020
@basnijholt basnijholt deleted the fix-sequence-learner branch May 20, 2020 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SequenceLearner with unhashable entries in sequence break Runner
3 participants