Issue 58 intermediate outputs #85

Merged 11 commits on May 16, 2019

@csala csala commented May 7, 2019

Resolve #58 and #61

Allow getting intermediate outputs, and allow fitting or predicting with only part of the pipeline.
Also update the documentation to match the changes from the latest MLPrimitives versions.

The implementation has been done as follows:

  • New optional keyword arguments output_ and start_ have been added to the MLPipeline fit and predict methods.
  • The output_ argument can be either an int or a str, and encodes a block_name and, optionally, a variable_name within it:
    • If it is an int, it is interpreted as the block index, and the method call returns the context right after the indicated block's produce method has been called. As with all indices in Python, 0 is the first one. Negative values also work, so -1 is the last block. Indices that are either too big or too small raise an IndexError.
    • If it is a str, it is expected to be the name of a block, including the counter number at the end: name.of.the.primitive#n. In this case, the method call returns the context right after the indicated block's produce method has been called. Optionally, a variable name can be added at the end, using a '.' as separator: name.of.the.primitive#n.variable_name. In this case, instead of returning the whole context, the corresponding variable is extracted from it. If the block name does not match a block exactly, or if the indicated variable cannot be found in the context, a ValueError is raised.
  • The start_ argument can be either an int or a str, and it is interpreted as either the index or the name of a block. If given, the execution of the pipeline starts at that block, and all the blocks before it are skipped. If an int is given and the index is too big or too small, an IndexError is raised. If a str is given and it does not match the name of a block exactly, a ValueError is raised.
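
The resolution rules above can be illustrated with a minimal sketch. This is not the code from this PR: resolve_output_spec and extract_output are hypothetical helper names, and block_names is assumed to be the ordered list of block names in the pipeline.

```python
def resolve_output_spec(block_names, output_):
    """Return (block_name, variable_name) for an output_ value.

    variable_name is None when the whole context is requested.
    Hypothetical helper, written only to illustrate the rules above.
    """
    if isinstance(output_, int):
        # List indexing gives the usual Python semantics: 0 is the first
        # block, -1 the last, and out-of-range values raise IndexError.
        return block_names[output_], None

    if isinstance(output_, str):
        # Exact block name: return the whole context after that block.
        if output_ in block_names:
            return output_, None

        # Otherwise the text after the last '.' is taken as the variable
        # name. Primitive names contain dots themselves, so only the
        # last separator is split off.
        block_name, separator, variable_name = output_.rpartition('.')
        if separator and variable_name and block_name in block_names:
            return block_name, variable_name

        raise ValueError('No block matches output_={!r}'.format(output_))

    raise TypeError('output_ must be an int or a str')


def extract_output(context, variable_name):
    """Return the whole context, or a single variable taken from it."""
    if variable_name is None:
        return context
    if variable_name not in context:
        raise ValueError('Variable {!r} not found'.format(variable_name))
    return context[variable_name]
```

The same lookup, without the variable part, would cover start_, since it only accepts a block index or an exact block name.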

As a consequence of this development, several new possibilities arise:

Issue #58: Getting Intermediate Outputs

Once the pipeline has been fit, we can obtain an intermediate output by specifying the name or the index of the block whose output we want:

from mlblocks import MLPipeline

# Placeholder primitive names; replace them with real primitives.
primitives = ['first_primitive', 'second_primitive', 'third_primitive']
pipeline = MLPipeline(primitives)
pipeline.fit(X_train, y_train)
output = 1  # alternatively, "second_primitive#1"
context_after_second_primitive = pipeline.predict(X_test, output_=output)

If we only need one of the variables, we can get it by specifying the block name and the variable name:

output = 'second_primitive#1.X'
X_after_primitive2 = pipeline.predict(X_test, output_=output)

Intermediate outputs can also be obtained during the fit process:

output = 'second_primitive#1'
context_after_second_primitive = pipeline.fit(X_train, y_train, output_=output)

Issue #61: Partial re-fit

By using the previous feature, we can capture the state of the context after a certain block, and then use the start_ argument to fit and predict with the rest of the pipeline multiple times:

output = 'second_primitive#1'
context_after_second_primitive = pipeline.fit(X_train, y_train, output_=output)
start = 'third_primitive#1'
pipeline.fit(start_=start, **context_after_second_primitive)
output = pipeline.predict(start_=start, **context_after_second_primitive)
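
To make the capture-and-resume flow concrete, here is a self-contained toy model of it. It is a sketch under stated assumptions, not MLBlocks code: ToyPipeline, AddConstant and MeanOffset are hypothetical classes, the context is a plain dict, and only block names (not indices) are supported for start_ and output_.

```python
class AddConstant:
    """Toy preprocessing block: adds a fixed constant to every value of X."""

    def __init__(self, constant):
        self.constant = constant

    def fit(self, X, y=None):
        pass  # nothing to learn

    def produce(self, X):
        return {'X': [x + self.constant for x in X]}


class MeanOffset:
    """Toy estimator: learns the mean of (y - X) and adds it to X."""

    def __init__(self):
        self.offset = 0.0

    def fit(self, X, y):
        self.offset = sum(t - x for x, t in zip(X, y)) / len(X)

    def produce(self, X):
        return {'y': [x + self.offset for x in X]}


class ToyPipeline:
    """Runs blocks in order over a shared context dict."""

    def __init__(self, blocks):
        self.blocks = blocks  # dict of block name -> block, in order

    def _run(self, context, fit, start_, output_):
        names = list(self.blocks)
        # list.index raises ValueError when the name matches no block.
        first = 0 if start_ is None else names.index(start_)
        for name in names[first:]:
            block = self.blocks[name]
            if fit:
                block.fit(context['X'], context.get('y'))
            context.update(block.produce(context['X']))
            if name == output_:
                break  # stop right after this block's produce call
        return context

    def fit(self, start_=None, output_=None, **context):
        context = self._run(dict(context), True, start_, output_)
        return context if output_ is not None else None

    def predict(self, start_=None, output_=None, **context):
        context = self._run(dict(context), False, start_, output_)
        return context if output_ is not None else context['y']


pipeline = ToyPipeline({
    'first#1': AddConstant(1),
    'second#1': AddConstant(10),
    'third#1': MeanOffset(),
})

# Fit the first two blocks only and capture the context after them.
context = pipeline.fit(X=[1, 2, 3], y=[20, 21, 22], output_='second#1')

# Fit and predict with the last block only, reusing the captured context.
pipeline.fit(start_='third#1', **context)
partial = pipeline.predict(start_='third#1', **context)

# Running the whole pipeline on the same data gives the same predictions.
full = pipeline.predict(X=[1, 2, 3])
```

In this toy version the skipped blocks are never touched again, so the cost of each extra fit is limited to the blocks from start_ onwards.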

And we can even set new hyperparameters between calls:

pipeline.set_hyperparameters({
    'third_primitive#1': {  # block name, including the counter
        'some_argument': 'some_value'
    }
})
pipeline.fit(start_=start, **context_after_second_primitive)
new_output = pipeline.predict(start_=start, **context_after_second_primitive)
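
Repeating this over several candidate hyperparameter sets could look like the sketch below. It relies only on the three calls shown above (set_hyperparameters, fit and predict with start_). The tune_tail function, the score callback and StubPipeline are hypothetical, and StubPipeline is a stand-in used only so that the example runs without MLBlocks installed.

```python
def tune_tail(pipeline, context, start, candidates, score):
    """Refit the pipeline from `start` once per candidate.

    Returns (best_score, best_hyperparameters); higher scores are better.
    """
    best = None
    for hyperparameters in candidates:
        pipeline.set_hyperparameters(hyperparameters)
        pipeline.fit(start_=start, **context)
        predictions = pipeline.predict(start_=start, **context)
        current = score(predictions)
        if best is None or current > best[0]:
            best = (current, hyperparameters)
    return best


class StubPipeline:
    """Stand-in exposing only the three methods used above; not MLBlocks."""

    def __init__(self):
        self.factor = 1
        self.fit_calls = 0

    def set_hyperparameters(self, hyperparameters):
        self.factor = hyperparameters['third_primitive#1']['factor']

    def fit(self, start_=None, **context):
        self.fit_calls += 1

    def predict(self, start_=None, **context):
        return [x * self.factor for x in context['X']]


# Context as it would be captured after the second block (made-up values).
context = {'X': [1, 2, 3], 'y': [2, 4, 6]}
candidates = [{'third_primitive#1': {'factor': f}} for f in (1, 2, 3)]


def negative_error(predictions):
    # Negated absolute error, so that a higher score means a better fit.
    return -sum(abs(p - t) for p, t in zip(predictions, context['y']))


stub = StubPipeline()
best_score, best_candidate = tune_tail(
    stub, context, 'third_primitive#1', candidates, negative_error)
```

Scoring against the training targets, as done here, is only for brevity; a real search would score on held-out data.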

codecov-io commented May 7, 2019

Codecov Report

Merging #85 into master will increase coverage by 17.97%.
The diff coverage is 91.52%.

@@            Coverage Diff             @@
##           master     #85       +/-   ##
==========================================
+ Coverage   51.53%   69.5%   +17.97%     
==========================================
  Files           5       5               
  Lines         423     469       +46     
==========================================
+ Hits          218     326      +108     
+ Misses        205     143       -62

Impacted Files            Coverage Δ
mlblocks/mlblock.py       91.57% <ø> (+15.78%) ⬆️
mlblocks/primitives.py    100% <ø> (ø) ⬆️
mlblocks/datasets.py      36.12% <50%> (+3.22%) ⬆️
mlblocks/mlpipeline.py    79.31% <92.98%> (+40.24%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e1ca77b...7112016.

@csala csala requested a review from ManuelAlvarezC May 7, 2019 16:45

CLAassistant commented May 9, 2019

CLA assistant check
All committers have signed the CLA.

@csala csala merged commit abe6e25 into master May 16, 2019
@csala csala deleted the issue_58_intermediate_outputs branch May 16, 2019 16:11
@csala csala mentioned this pull request May 22, 2019
Successfully merging this pull request may close these issues.

Allow getting intermediate outputs
4 participants