
Concepts and ideas for a node-based pyiron applied to file-based executables #702

Closed
JNmpi opened this issue Jun 4, 2023 · 1 comment · Fixed by #725
JNmpi commented Jun 4, 2023

I have played a bit with combining our new ideas for a node-based pyiron with handling files for codes such as Lammps that rely on file-based input and output. The example works, but I have many ideas for how to extend and improve it. To keep it simple I didn't try to extract single self-contained pyiron functions (which we should do) but used the full pyiron stack.

Below is the Markdown version of my Jupyter notebook:

Toy implementation and first tests of a File Object Class for workflows

```python
from pathlib import Path
import os
```

Create directory data type/object

```python
import os
from pathlib import Path

class DirectoryObject:
    def __init__(self, directory):
        self.directory = Path(directory)
        self.create()

    def create(self):
        if not self.directory.exists():
            self.directory.mkdir(parents=True)
            print(f"Directory '{self.directory}' created successfully.")
        else:
            print(f"Directory '{self.directory}' already exists.")

    def delete(self):
        if self.directory.exists():
            # Remove all plain files within the directory (subdirectories are not handled)
            for file in os.listdir(self.directory):
                file_path = self.directory / file
                if file_path.is_file():
                    file_path.unlink()
                    print(f"File '{file_path}' deleted successfully.")

            self.directory.rmdir()
            print(f"Directory '{self.directory}' deleted successfully.")
        else:
            print(f"Directory '{self.directory}' does not exist.")

    def list_files(self):
        if self.directory.exists():
            files = os.listdir(self.directory)
            if files:
                print(f"Files in directory '{self.directory}':")
                for file in files:
                    print(file)
            else:
                print(f"No files found in directory '{self.directory}'.")
        else:
            print(f"Directory '{self.directory}' does not exist.")

    def __len__(self):
        files = []
        if self.directory.exists():
            files = os.listdir(self.directory)
        return len(files)

    def __repr__(self):
        return f"DirectoryObject(directory='{self.directory}' with {len(self)} files)"

# Example usage
directory_handler = DirectoryObject('WorkingDir_new')

directory_handler.list_files()
directory_handler.delete()
directory_handler.list_files()
directory_handler
```
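Since `delete` only unlinks plain files, a directory that contains subdirectories cannot be removed this way. A minimal sketch of a more robust variant, assuming recursive deletion is acceptable, would use `shutil.rmtree`:

```python
import shutil
import tempfile
from pathlib import Path

def delete_recursive(directory):
    """Delete a directory tree, including nested subdirectories."""
    directory = Path(directory)
    if directory.exists():
        shutil.rmtree(directory)

# quick check in a temporary location
root = Path(tempfile.mkdtemp()) / 'WorkingDir_nested'
(root / 'sub').mkdir(parents=True)
(root / 'sub' / 'file.txt').write_text('content')
delete_recursive(root)
print(root.exists())  # False
```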

Create file data type/object

```python
class FileObject:
    def __init__(self, path):
        self.path = Path(path)

    def __repr__(self):
        if not self.path.is_file():
            return 'File does not exist'
        f_str = f"File object(path='{self.path}'){os.linesep}"
        with self.path.open(mode='r') as f:
            f_str += f.read()
        return f_str

    def write(self, text, mode='w'):
        with self.path.open(mode=mode) as f:
            f.write(text)

    def write_text(self, text):
        self.path.write_text(text)

    def delete(self):
        self.path.unlink()

    def move_to_dir(self, target):
        if isinstance(target, DirectoryObject):
            target = target.directory
        else:
            target = Path(target)
        new_path = target / self.path.name

        # only update self.path if the file was actually moved
        if not target.is_dir():
            raise NotADirectoryError(f"'{target}' is not a directory")
        self.path.rename(new_path)
        self.path = new_path

    def rename(self, new_filename):
        new_path = self.path.parent / new_filename
        if new_path.is_file():
            raise ValueError('File exists already')
        self.path.rename(new_path)
        self.path = new_path
```
Application examples

```python
work_dir = DirectoryObject('WorkingDir')

fo = FileObject('./input.test')
fo.write('my first test')
fo
fo_new = DirectoryObject('WorkingDir_new')
fo.move_to_dir(fo_new)
# fo.move_to_dir('WorkingDir')
fo_new, fo
fo.delete()
fo_new.delete()
```

ListFileObject

```python
class ListFileObject:
    def __init__(self, file_list=None, directory='.'):
        # file_list=None instead of a mutable default argument
        self.files = {}
        for file in file_list or []:
            self.files[file] = FileObject(Path(directory) / Path(file))

    def __repr__(self):
        return f'ListFileObject({list(self.files.keys())})'
```
            

Create example workflow for Lammps

```python
from string import Template
import dask
```

Create some of the Lammps files using pyiron

Note: For the toy model I use pyiron to generate the structure.inp and potential files. For a more practical application/implementation this should be replaced by nodes that directly translate e.g. the pyiron atomic structure into a Lammps input.ini file.
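Such a node could be sketched without pyiron at all. The function name and the hard-coded fcc Al cell below are purely illustrative; this writes a minimal Lammps data file (`atomic` style) from plain python lists:

```python
from pathlib import Path
import tempfile

def write_lammps_structure(symbols, positions, cell, path):
    """Hypothetical node: write a minimal Lammps data file ('atomic' style)
    from plain python lists instead of going through the full pyiron job."""
    species = sorted(set(symbols))
    lines = ['# structure.inp generated without pyiron', '',
             f'{len(positions)} atoms',
             f'{len(species)} atom types', '',
             f'0.0 {cell[0]} xlo xhi',
             f'0.0 {cell[1]} ylo yhi',
             f'0.0 {cell[2]} zlo zhi', '',
             'Atoms', '']
    for i, (sym, pos) in enumerate(zip(symbols, positions), start=1):
        lines.append(f'{i} {species.index(sym) + 1} {pos[0]} {pos[1]} {pos[2]}')
    Path(path).write_text('\n'.join(lines) + '\n')
    return path

# cubic fcc Al cell, hard-coded for the sketch
a = 4.05
target = Path(tempfile.mkdtemp()) / 'structure.inp'
write_lammps_structure(
    symbols=['Al'] * 4,
    positions=[(0, 0, 0), (0, a / 2, a / 2), (a / 2, 0, a / 2), (a / 2, a / 2, 0)],
    cell=(a, a, a),
    path=target,
)
```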

```python
class WorkflowResources:
    def __init__(self, working_directory, executable, job_name, project):
        self.working_directory = working_directory
        self.executable = executable
        self.job_name = job_name
        self.project = project

@dask.delayed  # does not work with dask
def lammps_setup(structure, project_path='.', job_name='job'):
    from pyiron import Project

    pr = Project(project_path)

    lammps = pr.create.job.Lammps(job_name)
    lammps.structure = structure
    list_pot = lammps.list_potentials()
    lammps.potential = list_pot[0]
    lammps.write_input()
    return WorkflowResources(working_directory=lammps.working_directory,
                             executable=lammps.executable,
                             job_name=job_name,
                             project=pr)
```

Note: The following statement fails in dask when a function has multiple return values (they can then only be resolved via `__getitem__`):

```python
working_dir, executable = lammps_setup(structure=bulk, project_path=pr.name)
```

It is therefore better to introduce a complex data object.
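The "complex data object" pattern can be sketched with a plain dataclass (the `Resources` name and values here are hypothetical stand-ins for `WorkflowResources`). When `setup` is wrapped in `dask.delayed`, attribute access such as `res.working_directory` itself yields a delayed object that is resolved at compute time:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Resources:
    # hypothetical stand-in for WorkflowResources above
    working_directory: Path
    executable: str

def setup():
    # return a single bundled object instead of a tuple
    return Resources(working_directory=Path('WorkingDir'), executable='lmp')

res = setup()
print(res.working_directory, res.executable)
```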

```python
control_inp_MD_str = Template('''
units metal
dimension 3
boundary p p p
atom_style atomic
read_data structure.inp
include potential.inp
fix ensemble all nvt temp $temperature $temperature 0.1
variable dumptime  equal $n_print 
variable thermotime  equal $n_print 
timestep 0.001
velocity all create $temperature_x2 67525 dist gaussian
dump 1 all custom $${dumptime} dump.out id type xsu ysu zsu fx fy fz vx vy vz
dump_modify 1 sort id format line "%d %d %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g %20.15g"
thermo_style custom step temp pe etotal pxx pxy pxz pyy pyz pzz vol
thermo_modify format float %20.15g
thermo $${thermotime}
run $n_ionic_steps
''')
```
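One detail worth noting in the template above: `string.Template` treats `$$` as an escaped dollar sign, so `$${dumptime}` survives substitution as the literal Lammps variable `${dumptime}`, while placeholders like `$n_print` are filled in by python:

```python
from string import Template

t = Template('dump 1 all custom $${dumptime} dump.out\nvariable dumptime equal $n_print')
text = t.substitute(n_print=100)
print(text)
```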

```python
# print(control_inp_MD_str.substitute(temperature=900, temperature_x2=2*900, n_print=100, n_ionic_steps=1000))

@dask.delayed
def LammpsMD_control(temperature=900, n_print=100, n_ionic_steps=1000, directory='.'):
    text = control_inp_MD_str.substitute(temperature=temperature,
                                         temperature_x2=2*temperature,
                                         n_print=n_print,
                                         n_ionic_steps=n_ionic_steps)

    fo = FileObject(Path(directory) / 'control.inp')
    fo.write_text(text)
    return fo

@dask.delayed
def LammpsControlFile(directory='.',
                      text=None):
    fo = FileObject(Path(directory) / 'control.inp')
    fo.write_text(text)
    return fo
```
```python
@dask.delayed
def runLammps(control,
              directory='.',
              executable=None):
    import subprocess

    subprocess.run(f'cd {directory}; {str(executable)} > error.out', shell=True)

    return ListFileObject(['dump.out', 'error.out', 'log.lammps'],
                          directory=directory)
```
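The `cd {directory}; …` shell idiom in `runLammps` can also be expressed with `subprocess.run`'s `cwd` parameter, which avoids shell-quoting issues with the directory name. A sketch, substituting a plain `echo` for the Lammps executable:

```python
import subprocess
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())
# run the command inside workdir; stdout is redirected into error.out as above
with open(workdir / 'error.out', 'w') as f:
    subprocess.run(['echo', 'lammps would run here'], cwd=workdir, stdout=f)

print((workdir / 'error.out').read_text())
```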
```python
@dask.delayed
def lammps_collect_dump(dump_file: FileObject):
    import numpy as np

    data_lst = []

    n_atoms = 4
    n_shift = 9 + n_atoms

    with open(dump_file.path, 'r') as f:
        # the first frame header carries the number of atoms (line 4 of the dump)
        for i in range(8):
            line = f.readline()
            if i == 3:
                n_atoms = int(line.split()[-1])
                n_shift = 9 + n_atoms
                print('atoms:', n_atoms)
        for i, line in enumerate(f.readlines()):
            if (i % n_shift > 0) & (i % n_shift < n_shift - 8):
                data_lst.append([float(number) for number in line.split()[2:]])

        data_lst = np.array(data_lst).T

    label_lst = 'xsu ysu zsu fx fy fz vx vy vz'.split()
    out_dict = {}
    out_dict['unwrapped_positions'] = data_lst[:3].reshape([-1, n_atoms, 3])
    out_dict['forces'] = data_lst[3:6].reshape([-1, n_atoms, 3])
    return out_dict
```
```python
@dask.delayed
def plot_tensor_rank3(tensor, axis=0, ylabel=''):
    import matplotlib.pylab as plt

    plt.ylabel(ylabel)
    return plt.plot(tensor[:, :, axis])
```

Construct and run workflow

```python
from pyiron import Project

pr = Project('test')
bulk = pr.create.structure.bulk('Al', cubic=True)
```

```python
%%time
resources = lammps_setup(structure=bulk, project_path=pr.name, job_name='lammps')

control = LammpsMD_control(temperature=20,
                           n_ionic_steps=1000,
                           directory=resources.working_directory)
lammps = runLammps(control,
                   directory=resources.working_directory,
                   executable=resources.executable)

out_dict = lammps_collect_dump(lammps.files['dump.out'])

plot = plot_tensor_rank3(out_dict['unwrapped_positions'], axis=2, ylabel='Positions');
plot.compute();
```

The above warning occurs in lammps_setup and is likely related to not closing the database connection in our pyiron job object. Check how this could be done.

Visualize workflow graph

```python
plot.dask
plot.visualize(engine="cytoscape")
```

Missing in dask/present implementation:

- access to input after initialization
- construction of explicit workflows
- what is a good syntax to access the input of nodes, subnodes, etc. in a workflow
@JNmpi added the `enhancement` (New feature or request) label Jun 4, 2023
@samwaseda (Member) commented:
Thanks @JNmpi for writing this down! Especially for using the python markdown.

I'm a bit confused about the distinction between DirectoryObject and FileObject, especially by the fact that the path is defined in both of them independently. I would have suggested: the path is only stored in DirectoryObject, and FileObject can write files only when it is attached to a DirectoryObject, so that a single DirectoryObject contains multiple FileObjects.
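A minimal sketch of that suggestion, reusing the class names from the issue (the method names here are hypothetical): the directory owns the path, and files are created through it:

```python
from pathlib import Path
import tempfile

class DirectoryObject:
    """Owns the path; files can only be created through it."""
    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.files = {}

    def create_file(self, name):
        file = FileObject(name, directory=self)
        self.files[name] = file
        return file

class FileObject:
    """Knows only its name; the full path comes from the owning directory."""
    def __init__(self, name, directory):
        self.name = name
        self.directory = directory

    @property
    def path(self):
        return self.directory.directory / self.name

    def write(self, text):
        self.path.write_text(text)

work_dir = DirectoryObject(Path(tempfile.mkdtemp()) / 'WorkingDir')
work_dir.create_file('input.test').write('my first test')
print(work_dir.files['input.test'].path.read_text())  # my first test
```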
