Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add check to determine if graph uses all avaliable parameters #86

Closed
wants to merge 3 commits into from

Conversation

jokasimr
Copy link
Contributor

Fixes #43.

I wasn't sure if this should be added to the task graph rather than to the pipeline. What do you think?

I think there is also an opportunity here to extract the functionality of retrieving all providers to a helper method, it's also used in the _repr_html_ for example, but I didn't want to complicate things for now.

def _kind_of_provider(p: Callable[..., Any]) -> str:
if _qualname(p) == f'{_qualname(Pipeline.__setitem__)}.<locals>.<lambda>':
return 'parameter'
if _qualname(p) == f'{_qualname(Pipeline.set_param_table)}.<locals>.<lambda>':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be better to make a dummy Pipeline with a parameter and a table
and retrieve their names, and use them to check these instead?

Or if you add the passing case unit test I mentioned, it might be fine like this.

Copy link
Contributor Author

@jokasimr jokasimr Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what you mean. Do you mean to compute the expected _qualnames for each of the different kinds of providers, instead of hardcoding something like f'{_qualname(Pipeline.set_param_table)}.<locals>.<lambda>'?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No sth like..

_DUMMY_PIPELINE = sl.Pipeline(params={int: 0})
_PARAMETER_PROVIDER_NAME = _DUMMY_PIPELINE._get_provider(int)[0].__qualname__

pipeline[Unused] = 'Should raise'
assert pipeline.compute(float) == 1.5
with pytest.raises(RuntimeError):
assert pipeline.compute(float, check=True) == 1.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a test that pass,
even if the compute passes with the check=True?

@@ -710,6 +725,8 @@ def get(
During development and debugging it can be helpful to use a handler that
raises an exception only when the graph is computed. This can be achieved
by passing :py:class:`HandleAsComputeTimeException` as the handler.
check:
Checks if the constructed graph uses all available parameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #43, detecting missing providers was also mentioned, so maybe we need a more specific name for unused parameters check here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need a more meaningful name.

Comment on lines +740 to +754
if check:
all_providers = chain(
(
(p[c], v)
for p in self._subproviders
for c, v in self._subproviders[p].items()
),
((p, v) for p, v in self._providers.items()),
)

for p, v in all_providers:
if p not in graph and _kind_of_provider(v) == 'parameter':
raise RuntimeError(
'Graph was expected to use all available parameters.'
)
Copy link
Member

@SimonHeybrock SimonHeybrock Dec 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we take a step back, I am not sure this is sufficiently capturing the problem. In general, I think what we want to do is check whether a graph is connected, I suppose? Raising errors about individual disconnected pieces may not be meaningful?

Instead of the check flag and handling things via exceptions, would it be better to return some list (or other data structure) of disconnected pieces of the graph? This might be a separate method, instead of extending the compute and get methods with keyword args.

@SimonHeybrock
Copy link
Member

See #74 (comment), I suggest to put this on hold so we can have a think.

@jokasimr jokasimr closed this Sep 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Detect unused parameters
3 participants