
Concepts and ideas for a node-based pyiron applied to file-based executables #702

Closed
JNmpi opened this issue Jun 4, 2023 · 1 comment · Fixed by #725
JNmpi commented Jun 4, 2023

I have played a bit with combining our new ideas for a node-based pyiron with handling files for codes such as Lammps that rely on file-based input and output. The example works, but I have many ideas for how to extend and improve it. To keep it simple I didn't try to extract single self-contained pyiron functions (which we should do) but used the full pyiron stack.

Below is the Markdown version of my Jupyter notebook:

Toy implementation and first tests of a File Object Class for workflows

```python
from pathlib import Path
import os
```

Create directory data type/object

```python
import os
from pathlib import Path

class DirectoryObject:
    def __init__(self, directory):
        self.directory = Path(directory)
        self.create()

    def create(self):
        if not self.directory.exists():
            self.directory.mkdir(parents=True)
            print(f"Directory '{self.directory}' created successfully.")
        else:
            print(f"Directory '{self.directory}' already exists.")

    def delete(self):
        if self.directory.exists():
            # Remove all plain files within the directory (subdirectories are not handled)
            for file in os.listdir(self.directory):
                file_path = self.directory / file
                if file_path.is_file():
                    file_path.unlink()
                    print(f"File '{file_path}' deleted successfully.")

            self.directory.rmdir()
            print(f"Directory '{self.directory}' deleted successfully.")
        else:
            print(f"Directory '{self.directory}' does not exist.")

    def list_files(self):
        if self.directory.exists():
            files = os.listdir(self.directory)
            if files:
                print(f"Files in directory '{self.directory}':")
                for file in files:
                    print(file)
            else:
                print(f"No files found in directory '{self.directory}'.")
        else:
            print(f"Directory '{self.directory}' does not exist.")

    def __len__(self):
        files = []
        if self.directory.exists():
            files = os.listdir(self.directory)
        return len(files)

    def __repr__(self):
        return f"DirectoryObject(directory='{self.directory}' with {len(self)} files)"

# Example usage
directory_handler = DirectoryObject('WorkingDir_new')

directory_handler.list_files()
directory_handler.delete()
directory_handler.list_files()
directory_handler
```
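Since `delete` only unlinks plain files, a directory that contains subdirectories cannot be removed this way. A minimal sketch of a more robust variant, assuming recursive deletion is acceptable, would use `shutil.rmtree`:

```python
import shutil
import tempfile
from pathlib import Path

def delete_recursive(directory):
    """Delete a directory tree, including nested subdirectories."""
    directory = Path(directory)
    if directory.exists():
        shutil.rmtree(directory)

# quick check in a temporary location
root = Path(tempfile.mkdtemp()) / 'WorkingDir_nested'
(root / 'sub').mkdir(parents=True)
(root / 'sub' / 'file.txt').write_text('content')
delete_recursive(root)
print(root.exists())  # False
```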

Create file data type/object

```python
class FileObject:
    def __init__(self, path):
        self.path = Path(path)

    def __repr__(self):
        if not self.path.is_file():
            return 'File does not exist'
        f_str = f"File object(path='{self.path}'){os.linesep}"
        with self.path.open(mode='r') as f:
            f_str += f.read()
        return f_str

    def write(self, text, mode='w'):
        with self.path.open(mode=mode) as f:
            f.write(text)

    def write_text(self, text):
        self.path.write_text(text)

    def delete(self):
        self.path.unlink()

    def move_to_dir(self, target):
        if isinstance(target, DirectoryObject):
            target = target.directory
        else:
            target = Path(target)
        new_path = target / self.path.name

        # only update self.path if the file was actually moved
        if not target.is_dir():
            raise NotADirectoryError(f"'{target}' is not a directory")
        self.path.rename(new_path)
        self.path = new_path

    def rename(self, new_filename):
        new_path = self.path.parent / new_filename
        if new_path.is_file():
            raise ValueError('File exists already')
        self.path.rename(new_path)
        self.path = new_path
```
Application examples

```python
work_dir = DirectoryObject('WorkingDir')

fo = FileObject('./input.test')
fo.write('my first test')
fo
fo_new = DirectoryObject('WorkingDir_new')
fo.move_to_dir(fo_new)
# fo.move_to_dir('WorkingDir')
fo_new, fo
fo.delete()
fo_new.delete()
```

ListFileObject

```python
class ListFileObject:
    def __init__(self, file_list=None, directory='.'):
        # file_list=None instead of a mutable default argument
        self.files = {}
        for file in file_list or []:
            self.files[file] = FileObject(Path(directory) / Path(file))

    def __repr__(self):
        return f'ListFileObject({list(self.files.keys())})'
```
            

Create example workflow for Lammps

```python
from string import Template
import dask
```

Create some of the Lammps files using pyiron

Note: For the toy model I use pyiron to generate the structure.inp and potential files. For a more practical application/implementation this should be replaced by nodes that directly translate e.g. the pyiron atomic structure into a Lammps input.ini file.
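Such a node could be sketched without pyiron at all. The function name and the hard-coded fcc Al cell below are purely illustrative; this writes a minimal Lammps data file (`atomic` style) from plain python lists:

```python
from pathlib import Path
import tempfile

def write_lammps_structure(symbols, positions, cell, path):
    """Hypothetical node: write a minimal Lammps data file ('atomic' style)
    from plain python lists instead of going through the full pyiron job."""
    species = sorted(set(symbols))
    lines = ['# structure.inp generated without pyiron', '',
             f'{len(positions)} atoms',
             f'{len(species)} atom types', '',
             f'0.0 {cell[0]} xlo xhi',
             f'0.0 {cell[1]} ylo yhi',
             f'0.0 {cell[2]} zlo zhi', '',
             'Atoms', '']
    for i, (sym, pos) in enumerate(zip(symbols, positions), start=1):
        lines.append(f'{i} {species.index(sym) + 1} {pos[0]} {pos[1]} {pos[2]}')
    Path(path).write_text('\n'.join(lines) + '\n')
    return path

# cubic fcc Al cell, hard-coded for the sketch
a = 4.05
target = Path(tempfile.mkdtemp()) / 'structure.inp'
write_lammps_structure(
    symbols=['Al'] * 4,
    positions=[(0, 0, 0), (0, a / 2, a / 2), (a / 2, 0, a / 2), (a / 2, a / 2, 0)],
    cell=(a, a, a),
    path=target,
)
```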

```python
class WorkflowResources:
    def __init__(self, working_directory, executable, job_name, project):
        self.working_directory = working_directory
        self.executable = executable
        self.job_name = job_name
        self.project = project

@dask.delayed  # does not work with dask
def lammps_setup(structure, project_path='.', job_name='job'):
    from pyiron import Project

    pr = Project(project_path)

    lammps = pr.create.job.Lammps(job_name)
    lammps.structure = structure
    list_pot = lammps.list_potentials()
    lammps.potential = list_pot[0]
    lammps.write_input()
    return WorkflowResources(working_directory=lammps.working_directory,
                             executable=lammps.executable,
                             job_name=job_name,
                             project=pr)
```

Note: The following statement fails in dask when a function has multiple return values (they can then only be resolved via `__getitem__`):

```python
working_dir, executable = lammps_setup(structure=bulk, project_path=pr.name)
```

It is therefore better to introduce a complex data object.
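The "complex data object" pattern can be sketched with a plain dataclass (the `Resources` name and values here are hypothetical stand-ins for `WorkflowResources`). When `setup` is wrapped in `dask.delayed`, attribute access such as `res.working_directory` itself yields a delayed object that is resolved at compute time:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Resources:
    # hypothetical stand-in for WorkflowResources above
    working_directory: Path
    executable: str

def setup():
    # return a single bundled object instead of a tuple
    return Resources(working_directory=Path('WorkingDir'), executable='lmp')

res = setup()
print(res.working_directory, res.executable)
```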

```python
control_inp_MD_str = Template('''
units metal
dimension 3
boundary p p p
atom_style atomic
read_data structure.inp
include potential.inp
fix ensemble all nvt temp $temperature $temperature 0.1
variable dumptime  equal $n_print 
variable thermotime  equal $n_print 
timestep 0.001
velocity all create $temperature_x2 67525 dist gaussian
dump 1 all custom $${dumptime} dump.out id type xsu ysu zsu fx fy fz vx vy vz
dump_modify 1 sort id format line "%d %d %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g"
thermo_style custom step temp pe etotal pxx pxy pxz pyy pyz pzz vol
thermo_modify format float %20.15g
thermo $${thermotime}
run $n_ionic_steps
''')
```
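One detail worth noting in the template above: `string.Template` treats `$$` as an escaped dollar sign, so `$${dumptime}` survives substitution as the literal Lammps variable `${dumptime}`, while placeholders like `$n_print` are filled in by python:

```python
from string import Template

t = Template('dump 1 all custom $${dumptime} dump.out\nvariable dumptime equal $n_print')
text = t.substitute(n_print=100)
print(text)
```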

```python
# print(control_inp_MD_str.substitute(temperature=900, temperature_x2=2*900, n_print=100, n_ionic_steps=1000))

@dask.delayed
def LammpsMD_control(temperature=900, n_print=100, n_ionic_steps=1000, directory='.'):
    text = control_inp_MD_str.substitute(temperature=temperature,
                                         temperature_x2=2*temperature,
                                         n_print=n_print,
                                         n_ionic_steps=n_ionic_steps)

    fo = FileObject(Path(directory) / 'control.inp')
    fo.write_text(text)
    return fo

@dask.delayed
def LammpsControlFile(directory='.',
                      text=None):
    fo = FileObject(Path(directory) / 'control.inp')
    fo.write_text(text)
    return fo
```
```python
@dask.delayed
def runLammps(control,
              directory='.',
              executable=None):
    import subprocess

    subprocess.run(f'cd {directory}; {str(executable)} > error.out', shell=True)

    return ListFileObject(['dump.out', 'error.out', 'log.lammps'],
                          directory=directory)
```
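The `cd {directory}; …` shell idiom in `runLammps` can also be expressed with `subprocess.run`'s `cwd` parameter, which avoids shell-quoting issues with the directory name. A sketch, substituting a plain `echo` for the Lammps executable:

```python
import subprocess
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
# run the command inside workdir; stdout is redirected into error.out as above
with open(workdir / 'error.out', 'w') as f:
    subprocess.run(['echo', 'lammps would run here'], cwd=workdir, stdout=f)

print((workdir / 'error.out').read_text())
```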
```python
@dask.delayed
def lammps_collect_dump(dump_file: FileObject):
    import numpy as np

    data_lst = []

    n_atoms = 4
    n_shift = 9 + n_atoms

    with open(dump_file.path, 'r') as f:
        # the first frame header carries the number of atoms (line 4 of the dump)
        for i in range(8):
            line = f.readline()
            if i == 3:
                n_atoms = int(line.split()[-1])
                n_shift = 9 + n_atoms
                print('atoms:', n_atoms)
        for i, line in enumerate(f.readlines()):
            if (i % n_shift > 0) & (i % n_shift < n_shift - 8):
                data_lst.append([float(number) for number in line.split()[2:]])

        data_lst = np.array(data_lst).T

    label_lst = 'xsu ysu zsu fx fy fz vx vy vz'.split()
    out_dict = {}
    out_dict['unwrapped_positions'] = data_lst[:3].reshape([-1, n_atoms, 3])
    out_dict['forces'] = data_lst[3:6].reshape([-1, n_atoms, 3])
    return out_dict
```
```python
@dask.delayed
def plot_tensor_rank3(tensor, axis=0, ylabel=''):
    import matplotlib.pylab as plt

    plt.ylabel(ylabel)
    return plt.plot(tensor[:, :, axis])
```

Construct and run workflow

```python
from pyiron import Project

pr = Project('test')
bulk = pr.create.structure.bulk('Al', cubic=True)
```

```python
%%time
resources = lammps_setup(structure=bulk, project_path=pr.name, job_name='lammps')

control = LammpsMD_control(temperature=20,
                           n_ionic_steps=1000,
                           directory=resources.working_directory)
lammps = runLammps(control,
                   directory=resources.working_directory,
                   executable=resources.executable)

out_dict = lammps_collect_dump(lammps.files['dump.out'])

plot = plot_tensor_rank3(out_dict['unwrapped_positions'], axis=2, ylabel='Positions');
plot.compute();
```

The above warning occurs in lammps_setup and is likely related to not closing the database connection in our pyiron job object. Check how this could be done.

Visualize workflow graph

```python
plot.dask
plot.visualize(engine="cytoscape")
```

Missing in dask/present implementation:

- access to input after initialization
- construction of explicit workflows
- what is a good syntax to access the input of nodes, subnodes, etc. in a workflow
@JNmpi added the `enhancement` (New feature or request) label Jun 4, 2023
@samwaseda (Member) commented:
Thanks @JNmpi for writing this down! Especially for using the python markdown.

I'm a bit confused about the distinction between DirectoryObject and FileObject, especially by the fact that the path is defined in both of them independently. I would have suggested: the path is only stored in DirectoryObject, and FileObject can write files only when it is attached to a DirectoryObject, so that a single DirectoryObject contains multiple FileObjects.
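A minimal sketch of that suggestion, reusing the class names from the issue (the method names here are hypothetical): the directory owns the path, and files are created through it:

```python
from pathlib import Path
import tempfile

class DirectoryObject:
    """Owns the path; files can only be created through it."""
    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.files = {}

    def create_file(self, name):
        file = FileObject(name, directory=self)
        self.files[name] = file
        return file

class FileObject:
    """Knows only its name; the full path comes from the owning directory."""
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory

    @property
    def path(self):
        return self.directory.directory / self.name

    def write(self, text):
        self.path.write_text(text)

work_dir = DirectoryObject(Path(tempfile.mkdtemp()) / 'WorkingDir')
work_dir.create_file('input.test').write('my first test')
print(work_dir.files['input.test'].path.read_text())  # my first test
```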
