Add `compute_mapped` #170

SimonHeybrock · 2024-06-18T12:03:03Z

This fixes a big UX hurdle when computing results that use mapped nodes.

We will consider later whether this should be integrated more tightly with Pipeline.compute. For now, this should serve as a solution that is 80% there and can be used to gather more insights and experience.

jl-wynen · 2024-06-18T13:29:54Z

src/sciline/pipeline.py

Did you consider how you could integrate this functionality into compute? E.g., check whether a given key is mapped and then do the equivalent of compute_series?

Something like that, yes.

jl-wynen

Ok for a prototype. But in the long run, I would prefer merging this with compute. And I would also prefer not using pandas if possible because users might not use pandas either.

SimonHeybrock · 2024-06-19T11:47:55Z

> Ok for a prototype.

This is not meant to be a prototype.

> And I would also prefer not using pandas if possible because users might not use pandas either.

We could use a plain dict, but then we do not have index names (aka dims), so one would need to implement a custom data structure for this, including ultimately something like a multi-index (or, simpler, something N-D like Scipp, but that won't work any more as soon as Cyclebane supports groupby). I don't think users would prefer using a custom class from an unknown library over a very well-known package in widespread use. Do you have something else in mind?
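To make the trade-off concrete, here is a minimal sketch of what a plain dict loses and what a custom container would have to add. `NamedResults`, its `select` method, and the index names `run` and `detector` are hypothetical and exist only for illustration; this is not Sciline or Cyclebane code.

```python
from dataclasses import dataclass
from typing import Any, Hashable

# Plain dict keyed by index tuples: the values are all there, but nothing
# records which tuple position is "run" and which is "detector".
plain: dict[tuple[Hashable, ...], Any] = {
    (0, "a"): 1.0,
    (0, "b"): 2.0,
    (1, "a"): 3.0,
}


@dataclass(frozen=True)
class NamedResults:
    """Hypothetical stand-in for a multi-indexed container."""

    index_names: tuple[Hashable, ...]
    values: dict[tuple[Hashable, ...], Any]

    def select(self, **labels: Hashable) -> dict[tuple[Hashable, ...], Any]:
        """Return the entries whose named index levels match ``labels``."""
        # Translate each index name into its position within the key tuples.
        positions = {
            self.index_names.index(name): label for name, label in labels.items()
        }
        return {
            key: value
            for key, value in self.values.items()
            if all(key[pos] == label for pos, label in positions.items())
        }


named = NamedResults(index_names=("run", "detector"), values=plain)
by_run = named.select(run=0)
```

Even this small class reimplements a fragment of what a pandas MultiIndex already provides, which is the cost being weighed here.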

> But in the long run, I would prefer merging this with compute.

It is not yet clear to me that this is a good idea, so I refrained from expressing a preference at this point. Pipeline.compute can compute multiple keys (returning a dict, instead of a single value). So we would need to return a dict containing one or more pandas.Series (instead of a single Series). Would it be too confusing to get a dict, a Series, or a dict containing Series, depending on the args? It is starting to look like a bad interface. Maybe the methods for one target (or a single mapped target) should be separate from the one computing multiple targets? Edit: Furthermore, I had to add an optional index_names argument, in case there are multiple mapped nodes with the same name, which would be cumbersome to integrate into compute with multiple targets.
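The return-type concern can be sketched as follows. This is a toy model under stated assumptions, not Sciline's API: `merged_compute`, the target names `Sum` and `Scaled`, and the `SeriesLike` alias (a plain dict standing in for pandas.Series so the sketch runs without pandas) are all hypothetical.

```python
from typing import Any, Hashable, Iterable, Union

# Hypothetical stand-in for pandas.Series: index label -> value.
SeriesLike = dict[Hashable, Any]
Result = Union[Any, SeriesLike, dict[str, Union[Any, SeriesLike]]]

# Toy "pipeline": precomputed plain and mapped results, keyed by target name.
PLAIN = {"Sum": 6.0}
MAPPED = {"Scaled": {0: 2.0, 1: 4.0}}


def _one(target: str) -> Union[Any, SeriesLike]:
    """Return a series-like for mapped targets and a bare value otherwise."""
    return MAPPED[target] if target in MAPPED else PLAIN[target]


def merged_compute(targets: Union[str, Iterable[str]]) -> Result:
    """Hypothetical merged interface whose result shape depends on the args."""
    if isinstance(targets, str):
        return _one(targets)
    return {target: _one(target) for target in targets}


single = merged_compute("Sum")  # a bare value
series = merged_compute("Scaled")  # a series-like
mixed = merged_compute(["Sum", "Scaled"])  # a dict that contains a series-like
```

A caller has to know both the argument form and whether each target is mapped to predict which of the three shapes comes back, and the sketch does not yet have a place for a per-target index_names argument.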

jl-wynen · 2024-06-21T07:09:39Z

docs/user-guide/parameter-tables.ipynb

+    "\n",
+    "**Note**\n",
+    "\n",
+    "[compute_mapped](https://scipp.github.io/sciline/generated/functions/sciline.compute_mapped.html) depends on Pandas, which is not a dependency of Sciline and must be installed separately, e.g., using pip:\n",


This link and the link below should be relative links.


jl-wynen · 2024-06-21T07:25:22Z

src/sciline/pipeline.py

+
+    candidates = [
+        node
+        for node in graph._cbgraph.graph.nodes


I don't like that this uses a protected attribute of graph. Should get_mapped_node_names be a method of Pipeline?

This is definitely something that we need to consider for a lot of other functionality (in particular around planned graph operations) that we intend to implement. Either we need to make those properties accessible on the public interface, or potentially add a lot more methods to Pipeline. Right now I cannot say which is better. So unless you can clearly say which solution should be chosen, I'd like to keep this as it is for now.

I don't know what operations you have in mind that need direct access to the graph. But I would say that we should expose a minimal set of primitive operations that can be composed in, e.g., methods of workflow classes.

jl-wynen · 2024-06-21T07:25:37Z

src/sciline/pipeline.py

+
+
+def get_mapped_node_names(
+    graph: Pipeline, key: type, index_names: Sequence[Hashable] | None = None


Is it a graph or a pipeline? We should be consistent with names.

jl-wynen · 2024-06-21T07:25:50Z

src/sciline/pipeline.py

@@ -194,3 +199,93 @@ def bind_and_call(
    def _repr_html_(self) -> str:
        nodes = ((key, data) for key, data in self._graph.nodes.items())
        return pipeline_html_repr(nodes)
+
+
+def get_mapped_node_names(


'name' or 'key'? What is the difference?

jl-wynen · 2024-06-21T07:27:29Z

src/sciline/pipeline.py

+            node for node in candidates if set(node.indices) == set(index_names)
+        ]
+    if len(candidates) == 0:
+        raise ValueError(f"'{key}' is not a mapped node.")


Should this check be before filtering by index names? It seems that key refers to a mapped node but that is filtered out by the index names.
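A minimal sketch of the suggested ordering, assuming nodes expose a `name` and `indices` as in the diff above. `Node`, `select_mapped`, and the second and third error messages are hypothetical and are not taken from the PR.

```python
from dataclasses import dataclass
from typing import Hashable, Optional, Sequence


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a mapped graph node."""

    name: str
    indices: tuple[Hashable, ...]


def select_mapped(
    nodes: Sequence[Node],
    name: str,
    index_names: Optional[Sequence[Hashable]] = None,
) -> Node:
    """Pick the single mapped node called ``name``, optionally by its indices."""
    candidates = [node for node in nodes if node.name == name]
    # Check before filtering: no candidates here means the key is not mapped.
    if len(candidates) == 0:
        raise ValueError(f"'{name}' is not a mapped node.")
    if index_names is not None:
        candidates = [
            node for node in candidates if set(node.indices) == set(index_names)
        ]
        # The key is mapped, so an empty result is an index mismatch instead.
        if len(candidates) == 0:
            raise ValueError(
                f"'{name}' is mapped, but not over indices {list(index_names)}."
            )
    if len(candidates) > 1:
        raise ValueError(f"Multiple mapped nodes match '{name}'; pass index_names.")
    return candidates[0]
```

With the check placed first, a wrong `index_names` no longer produces the misleading "is not a mapped node" message.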

Co-authored-by: Jan-Lukas Wynen <[email protected]>

SimonHeybrock added 3 commits June 18, 2024 12:45

Add compute_series

5348538

Improve tests

6680c5c

Update docs

2041c3f

SimonHeybrock requested a review from nvaytet June 18, 2024 12:03

SimonHeybrock assigned nvaytet Jun 18, 2024

SimonHeybrock added 4 commits June 18, 2024 14:07

Satisfy mypy

07b5e07

Add nicer exception

c7f95e2

Add another test/example

e2c2ff2

Avoid FutureWarning from Pandas

eaefb27

jl-wynen reviewed Jun 18, 2024


jl-wynen reviewed Jun 19, 2024


jl-wynen approved these changes Jun 19, 2024


SimonHeybrock added 3 commits June 19, 2024 13:50

Rename to compute_mapped

194beb5

Test and catch some more errors

12442de

Add argument to disambiguate if multiple matches

0df6300

SimonHeybrock changed the title ~~Add compute_series~~ Add compute_mapped Jun 19, 2024

SimonHeybrock added 2 commits June 19, 2024 14:40

Fix mypy

303bfb7

Fix docs

1997df4

jl-wynen reviewed Jun 21, 2024


SimonHeybrock and others added 5 commits June 21, 2024 10:14

Update src/sciline/pipeline.py

4ae04f7

Co-authored-by: Jan-Lukas Wynen <[email protected]>

Use relative links

9d4bd36

More precise type hints and naming

151fd4c

Avoid mixing "key" and "name"

c88585b

Move check

f6d8058

SimonHeybrock requested a review from jl-wynen June 21, 2024 08:30

jl-wynen approved these changes Jun 21, 2024


SimonHeybrock merged commit bd8e6f3 into main Jun 21, 2024
5 checks passed

SimonHeybrock deleted the compute-series branch June 21, 2024 08:57
