Graph-based pyiron workflows #577

Merged -- 126 commits into main from graph, Mar 9, 2023

Conversation


@liamhuber (Member) commented Feb 2, 2023

As discussed on the teams page.

An example of Joerg's syntax running.

  • IO channels
  • IO interface
  • Node
  • Workflow
  • Session -- this can actually probably just stay Project
  • Content (actual nodes)
  • Serialization -- this can be a separate PR
  • QoL stuff I thought of over the weekend
    • Rename name and engine, as these may frequently conflict with channel names
    • Rename input and output to inputs and outputs to avoid conflicting with the verbs
    • Make convenient iterators for the different layers, e.g. for connection in channel; for channel in inputs; for node in workflow
      • For inputs and outputs, it might be better to just directly rebase these onto a DotDict so they inherit keys(), values(), and items() (see the sketches after this list)
    • Some sort of verification routine to display the status and connections of all the ports in an IO/node/workflow (this comes for free with a GUI, but some text-based thing that gives a dataframe or something should be easy enough)
    • A tool using the inspect module to help node designers quickly check for naming consistency between input/pre/engine/post/output (the error messages are not so opaque, but this could be faster and easier). inspect is used to generate the channels to start with now
  • Allow callables for types that can facilitate some custom type checking from the user -- opted to skip this for now, since serialization would require saving and exec-ing more raw code
  • Make the node function central and mandatory
  • Don't make people use ChannelTemplate at all (i.e. scrape types from annotations of the node function)
  • Expect node functions to return a value or tuple of values, not a dict
  • Update the nodes module so child classes are created by modifying __init__ instead of overriding class attributes
  • Update tests
  • Update docstrings
  • Refactor out ugliness that crept in during the refactor to the new paradigm
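
Since the checklist above says inspect is already used to generate the channels, the mechanism presumably looks roughly like this sketch (the function and variable names here are illustrative, not the actual API):

import inspect

def channels_from_function(fn):
    # Sketch: derive input channel names/hints from the node function's
    # signature, as the checklist describes. Names here are hypothetical.
    sig = inspect.signature(fn)
    inputs = {
        name: (p.annotation if p.annotation is not inspect.Parameter.empty else None)
        for name, p in sig.parameters.items()
    }
    output_hint = (
        sig.return_annotation
        if sig.return_annotation is not inspect.Signature.empty
        else None
    )
    return inputs, output_hint

And the DotDict idea, again only a sketch assuming a plain dict subclass with attribute access; iterating over values rather than keys gives the "for channel in inputs" behaviour wished for above:

class DotDict(dict):
    # dict with attribute access, so io.a works alongside io["a"]
    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError as e:
            raise AttributeError(item) from e

    def __setattr__(self, key, value):
        self[key] = value

class IO(DotDict):
    # iterate over the channels themselves, so `for channel in inputs:`
    # reads naturally while keys(), values(), and items() come for free
    def __iter__(self):
        return iter(self.values())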

Here:

  • All the low-level in-memory stuff, working pretty well
  • Some really good documentation, some missing docs
  • Tests throughout
  • An example

Big things not here:

  • A good treatment of job creation
    • Project is just really deeply embedded in pyiron, understandably, and I'm not happy with the janky way I "avoid" it
  • QoL feedback, especially visualization -- let's add some quality (TODOs updated)
    • Debugging the example, I realized just how badly I miss the GUI elements of ironflow!
  • Serialization
    • This is absolutely critical, but IMO it should be a different PR. Either we can merge this and then work on that, or leave this un-merged and just stack the PRs.
  • Macros
    • After thinking about it today, right now I lean towards a to_node() converter on the workflows, which dynamically generates all the code needed to make a node out of your current workflow's functionality (a rough sketch follows this list). The big advantage is that it locks in which IO is open and which IO is internally connected within the workflow, which is the sort of robustness I'd like to see. I also think it may wind up really helping with shareability, since you'd then have a light-weight python code snippet that other people can register and use in their workflows. But I didn't want to bother implementing anything until we'd had some discussion.
  • Discontinuous graph execution, e.g. nodes that need to be explicitly "run", or execution inputs
  • Any handling of using remote resources or starting subprocesses
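
To make the macro idea concrete, here is one very rough shape a to_node() converter could take. Nothing here exists in the PR; every name is hypothetical and the graph execution is hand-waved:

def to_node_source(workflow) -> str:
    # Hypothetical converter: freeze the workflow's currently-open IO into
    # a plain function signature, so the snippet can be shared and
    # re-registered as a Node elsewhere.
    params = ", ".join(workflow.inputs.keys())
    returns = ", ".join(workflow.outputs.keys())
    return (
        f"def {workflow.label}({params}):\n"
        f"    # ... execute the wrapped graph with these inputs ...\n"
        f"    return {returns}\n"
    )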

We can replace this with a real tool later
In anticipation of having Input and Output classes be entire collections of channels
And refactor the connection validity tests to be more readable and robust
Just let it catch the key or attribute error farther down the stack. We can catch it here with a more informative message later if we want
@liamhuber liamhuber marked this pull request as ready for review March 6, 2023 23:15
@samwaseda (Member) left a comment

One feature that would be really nice to have would be a nicer __repr__. Right now, what we see for a node is <pyiron_contrib.workflow.node.Node at 0x7f5366bf97b0>. Something like this:

import json

class Node:
    ...
    def to_dict(self):
        # summarize the node's label and per-channel state as plain data
        return {
            "label": self.label,
            "inputs": {
                tag: {
                    "type": str(channel.type_hint),
                    "connected": channel.ready,
                    "value": str(channel.value),
                }
                for tag, channel in self.inputs.channel_dict.items()
            },
            "outputs": {
                tag: {
                    "type": str(channel.type_hint),
                    "value": str(channel.value),
                }
                for tag, channel in self.outputs.channel_dict.items()
            }
        }

    def _repr_json_(self):
        # Jupyter rich-display hook
        return self.to_dict()

    def __repr__(self):
        return json.dumps(self.to_dict(), indent=2)

Obviously it's only a suggestion, so feel free to change it.

Comment on lines +79 to +82 (in the example notebook):

try:
    node = Node(plus_minus_one, ("p1", "m1"))
except TypeError:
    print("This won't quite work yet!")
Member:

I find it confusing that this notebook starts with an example which does not work.

Member Author:

That's fair. I don't think I'll fix it in this PR, though.

@samwaseda (Member) commented Mar 7, 2023

I presume the following behaviour is not desired?

def f(a: int) -> int:
    return a + 1

def g(b: int) -> int:
    return b - 2

nd_f = Node(f, 'f')
nd_g = Node(g, 'g')
nd_f.inputs.a = nd_g.outputs.g
print(nd_f.inputs.a.ready)

Output:

False

Is it possible that it's because connect in channels.py only appends the connection but does not update self.value?

@liamhuber (Member Author)

Is it possible that it's because def connect in channels.py only appends the connections but does not update self.value?

This is exactly why. I realized last night that that's what was bugging me about the pre-constructed nodes example, but didn't get a chance to fix it/add it to the todo list yet. I want to see if it introduces any weird side effects in the tests, but right now I plan to introduce updating-on-connection.
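
For concreteness, a minimal self-contained sketch of what updating-on-connection could look like; the class and method names here are illustrative, not the actual pyiron_contrib.workflow.channels API:

class Channel:
    def __init__(self, value=None):
        self.value = value
        self.connections = []

class OutputChannel(Channel):
    pass

class InputChannel(Channel):
    def connect(self, other: OutputChannel):
        self.connections.append(other)
        other.connections.append(self)
        # The proposed change: don't just record the connection, also pull
        # the upstream value across so `ready` reflects reality immediately
        if other.value is not None:
            self.update(other.value)

    def update(self, value):
        self.value = value  # the real method would also type-check, etc.

    @property
    def ready(self):
        return self.value is not None

out = OutputChannel(value=42)
inp = InputChannel()
inp.connect(out)
print(inp.ready)  # True, because connect() pushed the value across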

@liamhuber (Member Author)

One feature that would be really nice to have would be a nicer __repr__. Right now, what we see for a node is <pyiron_contrib.workflow.node.Node at 0x7f5366bf97b0>. Something like this:

I actually like being able to see the memory address fairly often, so I think I'll leave __repr__ alone, but I like to_dict and _repr_json_ and will add those. Maybe add some nicer representations to some other classes too.

@samwaseda (Member) commented Mar 8, 2023

I created a full-fledged example of a pseudo-Murnaghan workflow:

from pyiron_contrib.workflow.node import Node
from pyiron_contrib.workflow.workflow import Workflow
from pyiron_atomistics import Project
import numpy as np
from hashlib import sha1

from pyiron_atomistics.atomistics.structure.atoms import Atoms

pr = Project('WORKFLOW')

potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'

def get_energy(structure: Atoms, potential: str = potential, pr: Project = pr) -> float:
    # hash the structure and potential into the job name, so repeated calls
    # load the finished calculation instead of rerunning it
    lmp = pr.create.job.Lammps(('lmp', sha1((repr(structure) + potential).encode()).hexdigest()))
    if lmp.status.initialized:
        lmp.structure = structure
        lmp.potential = potential
        lmp.calc_static()
        lmp.run()
    return lmp.output.energy_pot[-1]

def get_strain(min_strain: float=-0.1, max_strain: float=0.1, n_points: int=11) -> np.ndarray:
    return np.linspace(min_strain, max_strain, n_points)

def get_min_lattice(strain: np.ndarray, pr: Project=pr) -> Atoms:
    structure = pr.create.structure.bulk('Al', cubic=True)
    e_lst = [get_energy(structure.apply_strain(s, return_box=True)) for s in strain]
    coeff = np.polyfit(strain, e_lst, 3)
    s_range = np.linspace(np.min(strain), np.max(strain), 1001)
    return structure.apply_strain(s_range[np.polyval(coeff, s_range).argmin()], return_box=True)

nd_min_lattice = Node(get_min_lattice, 'min_lattice')

nd_strain = Node(get_strain, 'strain')

nd_min_lattice.inputs.strain.connect(nd_strain.outputs.strain)

But I was not really sure how to make a workflow out of the inner loop, i.e. currently get_energy is not a node, because I didn't know how to turn a variable number of nodes (in this case 11) into a workflow. Is there a better way to do it?

@liamhuber (Member Author)

I created a full fledged example of pseudo Murnaghan:

Rad! Note that you can typically avoid passing Project as an argument; in the case of structures you just need to instantiate a structure factory, which is actually totally independent of projects anyhow (this is a good sign to me that Jan's work to pull the structure stuff entirely out of the module is on the right track). For jobs, I've been playing with totally deleting them, and the project that made them, after the calculation is finished, so the project can exist entirely under the hood there too. I've also been trying to work in a paradigm where one node creates a job and defines the engine-level properties (the potential in this case, but k-points etc. in principle), then the node that actually calculates something copies that job to do its work. We don't have to do things this way, of course, but I've found both patterns useful. I think the next step is really to focus on LAMMPS and just get things running totally HDF- and database-free with Jan's work.
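
A hedged sketch of that two-node job paradigm, reusing the names from the example above; the split into these two particular functions and the job names are illustrative, not a prescribed pattern:

def engine_setup(structure: Atoms, potential: str = potential, pr: Project = pr):
    # node 1: create a template job holding the engine-level settings
    job = pr.create.job.Lammps('engine_template')
    job.structure = structure
    job.potential = potential
    return job

def static_energy(template) -> float:
    # node 2: copy the template and do the actual calculation
    job = template.copy_to(new_job_name='static_energy')
    job.calc_static()
    job.run()
    return job.output.energy_pot[-1]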

But I was not really sure how to make a workflow out of the inner loop, i.e. currently get_energy is not a node, because I didn't know how to make a variable number of nodes (in this case 11) into a workflow. Is there a better way to do it?

Yeah, this is a great question. I see three approaches:

  1. Have a node that internally creates and runs (and IMO afterwards deletes) multiple jobs. There's no great reason a node can't be a bit heavy, and down the line we may even be able to assign multiple cores to node execution and let it parallelize under the hood (we'd need that later for, e.g. Vasp jobs anyhow, since we can't break that parallelization over multiple nodes). This is the easiest way, but maybe the least satisfying -- it works great when you're custom-building a node out of a function, but maybe results in too many very specific nodes if you try to make a new node class for each type of thing like this that you want.
  2. Make a for-loop node à la Unreal Engine's "blueprints". To keep the graph properly functional, I think we also need an "accumulator" node. I sketched these out while my kid was having a sick day, and I think we might need a NotData type that forces type checking to fail (thus influencing ready) regardless of whether the IO is typed or not; see the image and the sketch below. (He wanted to help, so sorry it's not the most readable...)
  3. Make some sort of "batching" paradigm that automatically converts list-like collections of input data marked as "batched" into a loop during node execution. I do this over in ironflow, and it's super sexy UX when you have a GUI, because you just click a button to "batch" an input and all of a sudden your graph is running loops with no additional work on your end. The downside is that it massively complicates the already complex task of type checking, as you need to introduce a whole new branch of logic to decide what to accept once your channel is batched...

(1) will always be acceptable. At the moment I lean towards trying to get really tight flow control nodes to make (2) as easy as possible rather than porting over my work in ironflow for (3), just because of the type-checking headaches.

[Image IMG_4533: hand-drawn sketch of the proposed for-loop and accumulator nodes]
Note: passing nullify (i.e. NotData) back into your own input makes the node not ready, so that it won't update again until one of the other connected inputs updates. I think this for-loop is actually busted still, but the accumulator should work.
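
A rough, self-contained sketch of the NotData idea from point (2); none of these names exist in the PR as merged, they're purely illustrative:

class NotData:
    """Sentinel that fails every type check, so a channel holding it is
    never ready, whether or not the channel has a type hint."""
    def __repr__(self):
        return "<NotData>"

NOT_DATA = NotData()

def channel_ready(value, type_hint=None) -> bool:
    # NOT_DATA is never ready; otherwise fall back to the optional hint check
    if isinstance(value, NotData):
        return False
    return type_hint is None or isinstance(value, type_hint)

def accumulate(collected: list, new_value) -> list:
    # Accumulator-node body: gather loop results until the for-loop node
    # signals completion by emitting NOT_DATA
    if isinstance(new_value, NotData):
        return collected
    return collected + [new_value]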

@liamhuber merged commit 1f2dcea into main on Mar 9, 2023
The delete-merged-branch bot deleted the graph branch on March 9, 2023 at 20:42.