Graph-based pyiron workflows #577

Merged -- 126 commits into main from graph, Mar 9, 2023

Conversation


@liamhuber (Member) commented Feb 2, 2023

As discussed on the teams page.

An example of Joerg's syntax running.

  • IO channels
  • IO interface
  • Node
  • Workflow
  • Session -- this can actually probably just stay Project
  • Content (actual nodes)
  • Serialization -- this can be a separate PR
  • QoL stuff I thought of over the weekend
    • Rename name and engine, as these may frequently conflict with channel names
    • Rename input and output to inputs and outputs to avoid conflicting with the verbs
    • Make convenient iterators for the different layers, e.g. for connection in channel; for channel in inputs; for node in workflow
      • For inputs and outputs, it might be better to just directly rebase these onto a DotDict so they inherit keys(), values(), and items() (see the sketches after this list)
    • Some sort of verification routine to display the status and connections of all the ports in an IO/node/workflow (this comes for free with a GUI, but some text-based thing that gives a dataframe or something should be easy enough)
    • A tool using the inspect module to help node designers quickly check for naming consistency between input/pre/engine/post/output (the error messages are not so opaque, but this could be faster and easier). inspect is used to generate the channels to start with now
  • Allow callables for types that can facilitate some custom type checking from the user -- opted to skip this for now, since serialization would require saving and exec-ing more raw code
  • Make the node function central and mandatory
  • Don't make people use ChannelTemplate at all (i.e. scrape types from annotations of the node function)
  • Expect node functions to return a value or tuple of values, not a dict
  • Update the nodes module so child classes are created by modifying __init__ instead of overriding class attributes
  • Update tests
  • Update docstrings
  • Refactor out ugliness that crept in during the refactor to the new paradigm
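
Since the checklist above says inspect is already used to generate the channels, the mechanism presumably looks roughly like this sketch (the function and variable names here are illustrative, not the actual API):

import inspect

def channels_from_function(fn):
    # Sketch: derive input channel names/hints from the node function's
    # signature, as the checklist describes. Names here are hypothetical.
    sig = inspect.signature(fn)
    inputs = {
        name: (p.annotation if p.annotation is not inspect.Parameter.empty else None)
        for name, p in sig.parameters.items()
    }
    output_hint = (
        sig.return_annotation
        if sig.return_annotation is not inspect.Signature.empty
        else None
    )
    return inputs, output_hint

And the DotDict idea, again only a sketch assuming a plain dict subclass with attribute access; iterating over values rather than keys gives the "for channel in inputs" behaviour wished for above:

class DotDict(dict):
    # dict with attribute access, so io.a works alongside io["a"]
    def __getattr__(self, item):
        try:
            return self[item]
        except KeyError as e:
            raise AttributeError(item) from e

    def __setattr__(self, key, value):
        self[key] = value

class IO(DotDict):
    # iterate over the channels themselves, so `for channel in inputs:`
    # reads naturally while keys(), values(), and items() come for free
    def __iter__(self):
        return iter(self.values())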

Here:

  • All the low-level in-memory stuff, working pretty well
  • Some really good documentation, some missing docs
  • Tests throughout
  • An example

Big things not here:

  • A good treatment of job creation
    • Project is just really deeply embedded in pyiron, understandably, and I'm not happy with the janky way I "avoid" it
  • QoL feedback, especially visualization -- let's add some quality (TODOs updated)
    • Debugging the example, I realized just how badly I miss the GUI elements of ironflow!
  • Serialization
    • This is absolutely critical, but IMO it should be a different PR. Either we can merge this and then work on that, or leave this un-merged and just stack the PRs.
  • Macros
    • After thinking about it today, right now I lean towards a to_node() converter on the workflows, which dynamically generates all the code needed to make a node out of your current workflow's functionality (a rough sketch follows this list). The big advantage is that it locks in which IO is open and which IO is internally connected within the workflow, which is the sort of robustness I'd like to see. I also think it may wind up really helping with shareability, since you'd then have a light-weight python code snippet that other people can register and use in their workflows. But I didn't want to bother implementing anything until we'd had some discussion.
  • Discontinuous graph execution, e.g. nodes that need to be explicitly "run", or execution inputs
  • Any handling of using remote resources or starting subprocesses
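
To make the macro idea concrete, here is one very rough shape a to_node() converter could take. Nothing here exists in the PR; every name is hypothetical and the graph execution is hand-waved:

def to_node_source(workflow) -> str:
    # Hypothetical converter: freeze the workflow's currently-open IO into
    # a plain function signature, so the snippet can be shared and
    # re-registered as a Node elsewhere.
    params = ", ".join(workflow.inputs.keys())
    returns = ", ".join(workflow.outputs.keys())
    return (
        f"def {workflow.label}({params}):\n"
        f"    # ... execute the wrapped graph with these inputs ...\n"
        f"    return {returns}\n"
    )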

We can replace this with a real tool later
In anticipation of having Input and Output classes be entire collections of channels
And refactor the connection validity tests to be more readable and robust
Just let it catch the key or attribute error farther down the stack. We can catch it here with a more informative message later if we want
@liamhuber liamhuber marked this pull request as ready for review March 6, 2023 23:15
@samwaseda (Member) left a comment

One feature that would be really nice to have would be a nicer __repr__. Right now, what we see for a node is <pyiron_contrib.workflow.node.Node at 0x7f5366bf97b0>. Something like this:

import json

class Node:
    ...
    def to_dict(self):
        # summarize the node's label and per-channel state as plain data
        return {
            "label": self.label,
            "inputs": {
                tag: {
                    "type": str(channel.type_hint),
                    "connected": channel.ready,
                    "value": str(channel.value),
                }
                for tag, channel in self.inputs.channel_dict.items()
            },
            "outputs": {
                tag: {
                    "type": str(channel.type_hint),
                    "value": str(channel.value),
                }
                for tag, channel in self.outputs.channel_dict.items()
            }
        }

    def _repr_json_(self):
        # Jupyter rich-display hook
        return self.to_dict()

    def __repr__(self):
        return json.dumps(self.to_dict(), indent=2)

Obviously it's only a suggestion, so feel free to change it.

Comment on lines +79 to +82 (in the example notebook):

try:
    node = Node(plus_minus_one, ("p1", "m1"))
except TypeError:
    print("This won't quite work yet!")
Member:

I find it confusing that this notebook starts with an example which does not work.

Member Author:

That's fair. I don't think I'll fix it in this PR, though.

@samwaseda (Member) commented Mar 7, 2023

I presume the following behaviour is not desired?

def f(a: int) -> int:
    return a + 1

def g(b: int) -> int:
    return b - 2

nd_f = Node(f, 'f')
nd_g = Node(g, 'g')
nd_f.inputs.a = nd_g.outputs.g
print(nd_f.inputs.a.ready)

Output:

False

Is it possible that it's because connect in channels.py only appends the connection but does not update self.value?

@liamhuber (Member Author)

Is it possible that it's because def connect in channels.py only appends the connections but does not update self.value?

This is exactly why. I realized last night that that's what was bugging me about the pre-constructed nodes example, but didn't get a chance to fix it/add it to the todo list yet. I want to see if it introduces any weird side effects in the tests, but right now I plan to introduce updating-on-connection.
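
For concreteness, a minimal self-contained sketch of what updating-on-connection could look like; the class and method names here are illustrative, not the actual pyiron_contrib.workflow.channels API:

class Channel:
    def __init__(self, value=None):
        self.value = value
        self.connections = []

class OutputChannel(Channel):
    pass

class InputChannel(Channel):
    def connect(self, other: OutputChannel):
        self.connections.append(other)
        other.connections.append(self)
        # The proposed change: don't just record the connection, also pull
        # the upstream value across so `ready` reflects reality immediately
        if other.value is not None:
            self.update(other.value)

    def update(self, value):
        self.value = value  # the real method would also type-check, etc.

    @property
    def ready(self):
        return self.value is not None

out = OutputChannel(value=42)
inp = InputChannel()
inp.connect(out)
print(inp.ready)  # True, because connect() pushed the value across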

@liamhuber (Member Author)

One feature that would be really nice to have would be a nicer __repr__. Right now, what we see for a node is <pyiron_contrib.workflow.node.Node at 0x7f5366bf97b0>. Something like this:

I actually like being able to see the memory address fairly often, so I think I'll leave __repr__ alone, but I like to_dict and _repr_json_ and will add those. Maybe add some nicer representations to some other classes too.

@samwaseda (Member) commented Mar 8, 2023

I created a full-fledged example of a pseudo-Murnaghan workflow:

from pyiron_contrib.workflow.node import Node
from pyiron_contrib.workflow.workflow import Workflow
from pyiron_atomistics import Project
import numpy as np
from hashlib import sha1

from pyiron_atomistics.atomistics.structure.atoms import Atoms

pr = Project('WORKFLOW')

potential = '1995--Angelo-J-E--Ni-Al-H--LAMMPS--ipr1'

def get_energy(structure: Atoms, potential: str = potential, pr: Project = pr) -> float:
    # hash the structure and potential into the job name, so repeated calls
    # load the finished calculation instead of rerunning it
    lmp = pr.create.job.Lammps(('lmp', sha1((repr(structure) + potential).encode()).hexdigest()))
    if lmp.status.initialized:
        lmp.structure = structure
        lmp.potential = potential
        lmp.calc_static()
        lmp.run()
    return lmp.output.energy_pot[-1]

def get_strain(min_strain: float=-0.1, max_strain: float=0.1, n_points: int=11) -> np.ndarray:
    return np.linspace(min_strain, max_strain, n_points)

def get_min_lattice(strain: np.ndarray, pr: Project=pr) -> Atoms:
    structure = pr.create.structure.bulk('Al', cubic=True)
    e_lst = [get_energy(structure.apply_strain(s, return_box=True)) for s in strain]
    coeff = np.polyfit(strain, e_lst, 3)
    s_range = np.linspace(np.min(strain), np.max(strain), 1001)
    return structure.apply_strain(s_range[np.polyval(coeff, s_range).argmin()], return_box=True)

nd_min_lattice = Node(get_min_lattice, 'min_lattice')

nd_strain = Node(get_strain, 'strain')

nd_min_lattice.inputs.strain.connect(nd_strain.outputs.strain)

But I was not really sure how to make a workflow out of the inner loop, i.e. currently get_energy is not a node, because I didn't know how to turn a variable number of nodes (in this case 11) into a workflow. Is there a better way to do it?

@liamhuber (Member Author)

I created a full fledged example of pseudo Murnaghan:

Rad! Note that you can typically avoid passing Project as an argument; in the case of structures you just need to instantiate a structure factory, which is actually totally independent of projects anyhow (this is a good sign to me that Jan's work to pull the structure stuff entirely out of the module is on the right track). For jobs, I've been playing with totally deleting them, and the project that made them, after the calculation is finished, so the project can exist entirely under the hood there too. I've also been trying to work in a paradigm where one node creates a job and defines the engine-level properties (the potential in this case, but k-points etc. in principle), then the node that actually calculates something copies that job to do its work. We don't have to do things this way, of course, but I've found both patterns useful. I think the next step is really to focus on LAMMPS and just get things running totally HDF- and database-free with Jan's work.
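
A hedged sketch of that two-node job paradigm, reusing the names from the example above; the split into these two particular functions and the job names are illustrative, not a prescribed pattern:

def engine_setup(structure: Atoms, potential: str = potential, pr: Project = pr):
    # node 1: create a template job holding the engine-level settings
    job = pr.create.job.Lammps('engine_template')
    job.structure = structure
    job.potential = potential
    return job

def static_energy(template) -> float:
    # node 2: copy the template and do the actual calculation
    job = template.copy_to(new_job_name='static_energy')
    job.calc_static()
    job.run()
    return job.output.energy_pot[-1]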

But I was not really sure how to make a workflow out of the inner loop, i.e. currently get_energy is not a node, because I didn't know how to make a variable number of nodes (in this case 11) into a workflow. Is there a better way to do it?

Yeah, this is a great question. I see three approaches:

  1. Have a node that internally creates and runs (and IMO afterwards deletes) multiple jobs. There's no great reason a node can't be a bit heavy, and down the line we may even be able to assign multiple cores to node execution and let it parallelize under the hood (we'd need that later for, e.g. Vasp jobs anyhow, since we can't break that parallelization over multiple nodes). This is the easiest way, but maybe the least satisfying -- it works great when you're custom-building a node out of a function, but maybe results in too many very specific nodes if you try to make a new node class for each type of thing like this that you want.
  2. Make a for-loop node à la Unreal Engine's "blueprints". To keep the graph properly functional, I think we also need an "accumulator" node. I sketched these out while my kid was having a sick day, and I think we might need a NotData type that forces type checking to fail (thus influencing ready) regardless of whether the IO is typed or not; see the image and the sketch below. (He wanted to help, so sorry it's not the most readable...)
  3. Make some sort of "batching" paradigm that automatically converts list-like collections of input data marked as "batched" into a loop during node execution. I do this over in ironflow, and it's super sexy UX when you have a GUI, because you just click a button to "batch" an input and all of a sudden your graph is running loops with no additional work on your end. The downside is that it massively complicates the already complex task of type checking, as you need to introduce a whole new branch of logic to decide what to accept once your channel is batched...

(1) will always be acceptable. At the moment I lean towards trying to get really tight flow control nodes to make (2) as easy as possible rather than porting over my work in ironflow for (3), just because of the type-checking headaches.

[Image IMG_4533: hand-drawn sketch of the proposed for-loop and accumulator nodes]
Note: passing nullify (i.e. NotData) back into your own input makes the node not ready, so that it won't update again until one of the other connected inputs updates. I think this for-loop is actually busted still, but the accumulator should work.
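
A rough, self-contained sketch of the NotData idea from point (2); none of these names exist in the PR as merged, they're purely illustrative:

class NotData:
    """Sentinel that fails every type check, so a channel holding it is
    never ready, whether or not the channel has a type hint."""
    def __repr__(self):
        return "<NotData>"

NOT_DATA = NotData()

def channel_ready(value, type_hint=None) -> bool:
    # NOT_DATA is never ready; otherwise fall back to the optional hint check
    if isinstance(value, NotData):
        return False
    return type_hint is None or isinstance(value, type_hint)

def accumulate(collected: list, new_value) -> list:
    # Accumulator-node body: gather loop results until the for-loop node
    # signals completion by emitting NOT_DATA
    if isinstance(new_value, NotData):
        return collected
    return collected + [new_value]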

@liamhuber merged commit 1f2dcea into main on Mar 9, 2023
The delete-merged-branch bot deleted the graph branch on March 9, 2023 at 20:42.