Issue 58 intermediate outputs #85

csala · 2019-05-07T15:22:43Z

Resolve #58 and #61

Allow getting intermediate outputs, and allow fitting or producing only part of the pipeline.
Also update the documentation to match the changes from the latest MLPrimitives versions.

The implementation has been done as follows:

New optional keyword arguments output_ and start_ have been added to the MLPipeline fit and predict methods.
The output_ argument can be either an int or a str, and encodes a block name and, optionally, a variable name within that block's context:
- If it is an int, it is interpreted as the block index, and the method call will return the context right after the indicated block's produce method has been called. As with all indices in Python, 0 is the first one. Negative values also work, so -1 is the last block. Indices that are either too big or too small raise an IndexError.
- If it is a str, it is expected to be the name of a block, including the counter number at the end: name.of.the.primitive#n. In this case, the method call will return the context right after the indicated block's produce method has been called. Optionally, a variable name can be added at the end, using a '.' as separator: name.of.the.primitive#n.variable_name. In this case, instead of returning the whole context, the corresponding variable will be extracted from it. If the block name does not match a block exactly, or if the indicated variable cannot be found in the context, a ValueError is raised.
The start_ argument can be either an int or a str, and it is interpreted as either the index or the name of a block. If given, the execution of the pipeline will start on that block, and all the blocks before it will be skipped. If an int is given and the index is too big or too small, an IndexError will be raised. If a str is given and it does not match the name of a block exactly, a ValueError will be raised.
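
The resolution rules above can be sketched as a small helper. This is only an illustration under assumptions: resolve_output and its signature are hypothetical and are not taken from the MLBlocks code base. The check that the variable actually exists in the context is left out, since that can only happen once the block has run.

```python
def resolve_output(block_names, output_):
    """Hypothetical helper: map an output_ spec to (block_index, variable_name).

    variable_name is None when the whole context should be returned.
    """
    if isinstance(output_, int):
        count = len(block_names)
        if not -count <= output_ < count:
            raise IndexError('Block index out of range: {}'.format(output_))
        # Normalize negative indices, so -1 becomes the last block.
        return output_ % count, None

    if isinstance(output_, str):
        # Exact block name, e.g. 'name.of.the.primitive#1'.
        if output_ in block_names:
            return block_names.index(output_), None

        # Block name plus variable, e.g. 'name.of.the.primitive#1.X'.
        # Primitive names contain dots themselves, so split on the last one.
        block_name, separator, variable_name = output_.rpartition('.')
        if separator and block_name in block_names:
            return block_names.index(block_name), variable_name

        raise ValueError('Unknown block: {}'.format(output_))

    raise TypeError('output_ must be an int or a str')


names = ['first_primitive#1', 'second_primitive#1', 'third_primitive#1']
print(resolve_output(names, -1))
print(resolve_output(names, 'second_primitive#1.X'))
```

Trying the exact block name first keeps names that contain dots from being mistaken for a block plus a variable.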

As a consequence of this development, several new possibilities arise:

Issue #58: Getting Intermediate Outputs

Once the pipeline has been fit, we can obtain an intermediate output by specifying the name or the index of the block whose output we want:

primitives = ['first_primitive', 'second_primitive', 'third_primitive']
pipeline = MLPipeline(primitives)
pipeline.fit(X_train, y_train)
output = 1  # alternatively, "second_primitive#1"
context_after_second_primitive = pipeline.predict(X_test, output_=output)

If we only need one of the variables, we can get it by specifying the block name and the variable name:

output = 'second_primitive#1.X'
X_after_primitive2 = pipeline.predict(X_test, output_=output)

Partial predict outputs can also be obtained during the fit process:

output = 'second_primitive#1'
context_after_second_primitive = pipeline.fit(X_train, y_train, output_=output)
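
To make the difference between getting the whole context and getting a single variable concrete, here is a toy sketch that does not use MLBlocks at all. The run_until function and the block names are hypothetical; each block is modeled as a plain function that returns updates to a context dict.

```python
def run_until(blocks, context, stop_index, variable=None):
    """Run (name, function) blocks in order and stop right after stop_index."""
    context = dict(context)  # work on a copy, leave the caller's dict alone
    for index, (name, function) in enumerate(blocks):
        context.update(function(context))
        if index == stop_index:
            if variable is None:
                return context  # the whole context after this block
            if variable not in context:
                raise ValueError('{} not found after {}'.format(variable, name))
            return context[variable]  # only the requested variable

    raise IndexError('Block index out of range: {}'.format(stop_index))


blocks = [
    ('shift#1', lambda ctx: {'X': [x + 1 for x in ctx['X']]}),
    ('double#1', lambda ctx: {'X': [x * 2 for x in ctx['X']]}),
    ('total#1', lambda ctx: {'y': sum(ctx['X'])}),
]

context = run_until(blocks, {'X': [1, 2, 3]}, stop_index=1)
X_only = run_until(blocks, {'X': [1, 2, 3]}, stop_index=1, variable='X')
print(context)
print(X_only)
```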

Issue #61: Partial re-fit

By using the previous feature, we can capture the state of the context after a certain block, and then use the start_ argument to fit and produce the rest of the pipeline multiple times:

output = 'second_primitive#1'
context_after_second_primitive = pipeline.fit(X_train, y_train, output_=output)
start = 'third_primitive#1'
pipeline.fit(start_=start, **context_after_second_primitive)
output = pipeline.predict(start_=start, **context_after_second_primitive)

And we can even set new hyperparameters between calls:

pipeline.set_hyperparameters({
    'third_primitive#1': {
        'some_argument': 'some_value'
    }
})
pipeline.fit(start_=start, **context_after_second_primitive)
new_output = pipeline.predict(start_=start, **context_after_second_primitive)
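
The benefit of this pattern is that the early blocks run only once while the last block is tried with several settings. The following toy sketch shows that effect. ToyPipeline is a hypothetical stand-in, not MLPipeline: it only accepts int values for start_ and output_, and swapping the last block stands in for set_hyperparameters.

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline with start_ and output_ support."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.calls = [0] * len(self.blocks)  # how many times each block ran

    def run(self, start_=0, output_=None, **context):
        for index in range(start_, len(self.blocks)):
            self.calls[index] += 1
            context.update(self.blocks[index](context))
            if index == output_:
                break  # stop early and return the context as it is now
        return context


def preprocess(context):
    # Pretend this is the expensive part of the pipeline.
    return {'X': [x * 2 for x in context['X']]}


def make_model(threshold):
    # The threshold plays the role of a hyperparameter of the last block.
    return lambda context: {'y': [x > threshold for x in context['X']]}


pipeline = ToyPipeline([preprocess, make_model(0)])
cached = pipeline.run(X=[1, 2, 3], output_=0)  # context after the first block

results = {}
for threshold in (1, 3, 5):
    pipeline.blocks[1] = make_model(threshold)
    results[threshold] = pipeline.run(start_=1, **cached)['y']

print(results)
print(pipeline.calls)
```

Because the cached context is passed with ** on every call, each run starts from a fresh dict and the cached copy is never modified.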

codecov-io · 2019-05-07T15:28:30Z

Codecov Report

Merging #85 into master will increase coverage by 17.97%.
The diff coverage is 91.52%.

@@            Coverage Diff             @@
##           master     #85       +/-   ##
==========================================
+ Coverage   51.53%   69.5%   +17.97%     
==========================================
  Files           5       5               
  Lines         423     469       +46     
==========================================
+ Hits          218     326      +108     
+ Misses        205     143       -62
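
As a sanity check on the figures above, the percentages can be recomputed from the hit and line counts. This sketch assumes the percentages are truncated, not rounded, to two decimals; that assumption is inferred from the numbers in this report and not from the Codecov documentation. The coverage_bp helper is hypothetical.

```python
def coverage_bp(hits, lines):
    """Coverage in hundredths of a percent, truncated, using integer math."""
    return hits * 10000 // lines


master = coverage_bp(218, 423)  # hits and lines on master
branch = coverage_bp(326, 469)  # hits and lines after merging #85

print(master / 100, branch / 100, (branch - master) / 100)
```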

| Impacted Files | Coverage Δ |
|---|---|
| mlblocks/mlblock.py | `91.57% <ø> (+15.78%)` ⬆️ |
| mlblocks/primitives.py | `100% <ø> (ø)` ⬆️ |
| mlblocks/datasets.py | `36.12% <50%> (+3.22%)` ⬆️ |
| mlblocks/mlpipeline.py | `79.31% <92.98%> (+40.24%)` ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e1ca77b...7112016.

CLAassistant · 2019-05-09T13:14:37Z

All committers have signed the CLA.

csala added 9 commits April 19, 2019 13:38

Initial implementation to work with intermediate outputs

d3cbee7

Update contributing guide to match the current release workflow

59fae90

Update docs config

e768037

Remove spaces

080580d

ADd docstrings

e25fa6d

Update primitive names to match the latest versions of MLPrimitives

5e9be7a

Add random state to datasets get_splits

9f0ae6a

Rename output and start arguments

5aea647

Add unit tests for partial outputs feature

4607b38

csala requested a review from ManuelAlvarezC May 7, 2019 16:45

ManuelAlvarezC approved these changes May 9, 2019

Improve docstrings and add toc in autogenerated API reference

980794b

Add missing dependency

7112016

csala merged commit abe6e25 into master May 16, 2019

csala deleted the issue_58_intermediate_outputs branch May 16, 2019 16:11

csala mentioned this pull request May 22, 2019

Implement partial re-fit #61

Closed

AlexanderGeiger mentioned this pull request Jul 23, 2019

update DB schema sintel-dev/Orion#57

Closed
