Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce DirectoryObject and FileObject in Workflow #725

Merged
merged 40 commits into from
Jun 27, 2023
Merged

Conversation

samwaseda
Copy link
Member

@samwaseda samwaseda commented Jun 19, 2023

This PR allows for working files which some softwares require.

  • FileObject handles only the file and has no information of address i.e. it must be attached to a DirectoryObject
  • The central path is known to Workflow -> a Node cannot create a directory if it is not attached to a Workflow
  • The directory is created only if workflow.working_directory is called.
  • Path: workflow_name/function_name/file_name

Example I:

from pyiron_contrib.workflow import Workflow
from pyiron_contrib.workflow.node import Node

def f(x):
    return x

wf = Workflow("TEST")
n_f = Node(f, "f")
wf.add(n_f)
my_file = n_f.working_directory.create_file('new_file')  # Creates all directories
my_file.write('something')

with open(my_file.path, 'r') as f:
    print(f.read())

wf.working_directory.delete()  # deletes everything

Output:

TEST/f/new_file
something

Example II

def f(x):
    return x

n_f = Node(f, "f")
my_file = n_f.create_file('new_file')  # Error, because not attached to a workflow

Closes #702

@samwaseda samwaseda requested review from liamhuber and JNmpi June 19, 2023 17:08
@github-actions
Copy link
Contributor

Binder 👈 Launch a binder notebook on branch pyiron/pyiron_contrib/directory

@samwaseda samwaseda added the enhancement New feature or request label Jun 19, 2023
@samwaseda samwaseda added broken compatibility Pull Requests that break API or HDF compatibility format_black Invoke a black formatting commit and removed broken compatibility Pull Requests that break API or HDF compatibility labels Jun 19, 2023
Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. One UX question.

Needs tests though. files.py is super readable and I don't see anything that could go wrong, but tests should be there at least for the coverage 😉 For the workflow and nodes I actually see tests being useful as they lock-in two promises:

  • Paths are based off labels (this is actually mis-spoken in the PR comment, it uses the node label not the function name -- these are the same by default but can change, and even do change in your example!)
  • Nodes not being able to create a file if not in a workflow (maybe we want to change that spec later, but for now I like it)

At first I thought it might be overkill to put create_file right on Node (as it wouldn't be accessible to any node that doesn't define self in the function argument), but actually (a) I don't mind the extra code that much, and (b) maybe we can exploit it when it comes time to serialize IO. So I want to keep in the back of our minds the idea of refactoring these methods away to some more specific child class, but for now it seems perfectly appropriate to have it here.

pyiron_contrib/workflow/files.py Show resolved Hide resolved
@samwaseda
Copy link
Member Author

Needs tests though.

Ah yes, I left a comment in the botched previous PR saying I was waiting for the first comments but I forgot to write the same in this PR.

  • Paths are based off labels (this is actually mis-spoken in the PR comment, it uses the node label not the function name -- these are the same by default but can change, and even do change in your example!)

Yeah it's good that we talk about it, because until I saw this PR I didn't really think that the label can be different.

  • Nodes not being able to create a file if not in a workflow (maybe we want to change that spec later, but for now I like it)

At first I thought it might be overkill to put create_file right on Node (as it wouldn't be accessible to any node that doesn't define self in the function argument), but actually (a) I don't mind the extra code that much, and (b) maybe we can exploit it when it comes time to serialize IO. So I want to keep in the back of our minds the idea of refactoring these methods away to some more specific child class, but for now it seems perfectly appropriate to have it here.

I also thought about defining it in node.working_directory.create_file, and to be honest I don't have a very strong feeling, so I can also move it there.

@JNmpi
Copy link

JNmpi commented Jun 20, 2023

Thanks, @samwaseda! I like the clear separation into two classes and the functionality very much. In addition to the present functionality, I would like to have the option for a single node to create multiple working directories. The idea is that a single node can createeach several tasks, each needing a separate working directory. A simple example is a Murnaghan node, where several atomistic jobs (with each a separate working directory) need to be created. Thus, nodes should be able to create subdirectories with the working directory of the workflow being the parent directory.

@samwaseda
Copy link
Member Author

A simple example is a Murnaghan node, where several atomistic jobs (with each a separate working directory) need to be created. Thus, nodes should be able to create subdirectories with the working directory of the workflow being the parent directory.

There, I think there's a bit of a difference in terminology with respect to the current implementation of Workflow: a Node is the smallest unit that wraps one function, while a Workflow can contain multiple nodes and itself can act like a node. This means, in the example of Murnaghan (which is a bit difficult to talk about, because parallel jobs are not fully conceptualised on our side), each of the atomistic job is a Node, but the Murnaghan job itself is a Workflow. So I think it would be sufficient to have one directory for one node.

This being said, there indeed should be a functionality that allows a workflow folder within a parent workflow. I just didn't do it because I didn't see a possibility in the current implementation how to see whether a workflow is attached to a parent workflow (i.e. I don't know if it's possible for a child workflow to know whether it has a parent workflow).

Comments from @liamhuber might be appreciated here.

@liamhuber
Copy link
Member

but the Murnaghan job itself is a Workflow

No, Joerg is correct; you can in principle have a node that executes the entire Murnaghan procedure. It doesn't even need to make sub-nodes out of the steps! The idea that nodes are functional and idempotent is a "best practice" and not strictly enforced, and even here you could have a node that starts a pyiron project, makes a bunch of lammps jobs, runs everything, scrapes the results, and finishes that is both idempotent and functional! Of course, you also could make a Murnaghan workflow where each calculation is a node, and then you don't need this.

A task is just a computational unit and a Node.run() might generate multiple of them at once. There's no reason to forbid a task from having its own working directory. This is already possible from inside the node of course, you can use self.working_directory as a root and then make new sub-dirs as part of the task. I don't have any objection to adding tools that make this subdirectory creation easier.

@liamhuber liamhuber changed the base branch from submittable_workflow to main June 26, 2023 21:07
@liamhuber
Copy link
Member

I could have sworn GitHub used to rebase stacked PRs onto main automatically when their target got merged and deleted....

@github-actions
Copy link
Contributor

github-actions bot commented Jun 26, 2023

Pull Request Test Coverage Report for Build 5387100971

  • 73 of 76 (96.05%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.5%) to 12.343%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pyiron_contrib/workflow/files.py 58 61 95.08%
Totals Coverage Status
Change from base Build 5382174367: 0.5%
Covered Lines: 1744
Relevant Lines: 14130

💛 - Coveralls

@liamhuber
Copy link
Member

Test failures on `` look like they're just typical windows stuff, although the is_mount thing is new to me:

======================================================================
ERROR: test_write (workflow.test_files.TestFiles)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\pyiron_contrib\pyiron_contrib\tests\unit\workflow\test_files.py", line 21, in test_write
    self.assertTrue("test/test.txt" in self.directory.list_content()['file'])
  File "D:\a\pyiron_contrib\pyiron_contrib\pyiron_contrib\workflow\files.py", line 48, in list_content
    return categorize_folder_items(self.path)
  File "D:\a\pyiron_contrib\pyiron_contrib\pyiron_contrib\workflow\files.py", line 31, in categorize_folder_items
    if getattr(item, f"is_{tt}")():
  File "C:\Miniconda3\envs\my-env\lib\pathlib.py", line 1461, in is_mount
    raise NotImplementedError("Path.is_mount() is unsupported on this system")
NotImplementedError: Path.is_mount() is unsupported on this system

======================================================================
FAIL: test_path (workflow.test_files.TestFiles)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\pyiron_contrib\pyiron_contrib\tests\unit\workflow\test_files.py", line 30, in test_path
    self.assertEqual(str(f.path), "test/test.txt")
AssertionError: 'test\\test.txt' != 'test/test.txt'
- test\test.txt
?     ^
+ test/test.txt
?     ^


----------------------------------------------------------------------

Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to extend the tests as here, but otherwise once the tests are passing this is looking alright to me.

@samwaseda samwaseda added format_black Invoke a black formatting commit and removed format_black Invoke a black formatting commit labels Jun 27, 2023
@samwaseda
Copy link
Member Author

Test failures on `` look like they're just typical windows stuff, although the is_mount thing is new to me

I was unnecessarily calling all functions starting with is. Now it should be fine.

Copy link
Member

@liamhuber liamhuber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super.

I think there is a way to more elegantly convert "test/tests" to look like a path from the current system than to convert all the references to "/", but what you do here is clear and effective so it's also ok with me.

@samwaseda samwaseda merged commit 5064692 into main Jun 27, 2023
@delete-merged-branch delete-merged-branch bot deleted the directory branch June 27, 2023 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request format_black Invoke a black formatting commit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concepts and ideas for a node-based pyiron applied to file-based executables
4 participants