Avoid `bader_caller` from altering compressed file in place #3660

Merged

janosh merged 15 commits into materialsproject:master from DanielYang59:improve-bader-caller

Mar 1, 2024

Contributor

DanielYang59 commented Feb 28, 2024 (edited)

Summary: Prevent bader_caller from altering compressed files in place

Prevent bader_caller from altering compressed test files in place, see Deprecate _parse_atomic_densities in BaderAnalysis and fix Bader test setup #3656 (comment)
Removed bader_exe_path to encourage users to add the Bader executable to PATH for simplicity
Other format cleanups

Details

Added a utility function temp_decompress that copies a compressed file into ScratchDir and decompresses the copy. If the file is not compressed, nothing is copied, which avoids unnecessary file operations.
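The idea can be illustrated with a minimal standard-library sketch. This is not the code merged in this PR: the function name temp_decompress_sketch and the _OPENERS mapping are hypothetical, and only .gz and .bz2 are handled here because the standard library has no reader for .z files.

```python
from __future__ import annotations

import bz2
import gzip
import shutil
from pathlib import Path

# Hypothetical mapping for this sketch; the PR also lists ".z",
# which the standard library cannot decompress.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}


def temp_decompress_sketch(file: str | Path, target_dir: str | Path = ".") -> str:
    """Return the path of an uncompressed copy of `file`, leaving the original untouched."""
    file = Path(file)
    opener = _OPENERS.get(file.suffix.lower())
    if opener is None:
        # Not a recognised compressed file: return the original path, no copy is made.
        return str(file)

    # Stream the decompressed bytes into target_dir, dropping the compression suffix.
    out_path = Path(target_dir) / file.stem
    with opener(file, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return str(out_path)
```

Called from inside a scratch directory with the default target_dir, the decompressed copy would be cleaned up together with that directory, while the compressed source file keeps its original bytes.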

DanielYang59 added 4 commits

February 28, 2024 11:20


          add type hint and fix mypy error

ade7207


          Merge branch 'materialsproject:master' into improve-bader-caller

e4e445f


          fix calling of class

819c66e


          format cleanups

302307c

DanielYang59 commented

tests/command_line/test_bader_caller.py (outdated, resolved)

DanielYang59 commented

pymatgen/command_line/bader_caller.py

                       self.parse_atomic_densities = parse_atomic_densities
                       with ScratchDir("."):
                           if chgcar_filename:
-                              self.is_vasp = True

Contributor Author

DanielYang59 Feb 28, 2024 (edited)

Didn't see self.is_vasp being used.

DanielYang59 commented

pymatgen/command_line/bader_caller.py

                       return paths[0]
                   chgcar_path = _get_filepath("CHGCAR", "Could not find CHGCAR!")
-                  chgcar = Chgcar.from_file(chgcar_path)
+                  if chgcar_path is not None:
+                      chgcar = Chgcar.from_file(chgcar_path)

Contributor Author

DanielYang59 Feb 28, 2024 (edited)

Fixed mypy error:

Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str"

DanielYang59 added 3 commits

February 28, 2024 13:55


          add type hint and format tweak

cba9a17


          remove unnecessary var

de679d1


          simplify Bader executable fetch

e53c383

DanielYang59 commented

pymatgen/command_line/bader_caller.py (resolved)

DanielYang59 added 3 commits

February 28, 2024 16:50


          clean up __init__

50f0352


          revert change of BaderAnalysis

ae60ff0


          avoid changing compressed file in place

7c1df63

DanielYang59 changed the title ~~Improve bader_caller and its tests~~ Avoid bader_caller from altering compressed file in place

DanielYang59 marked this pull request as ready for review

February 28, 2024 12:03

DanielYang59 requested review from shyuep and mkhorton as code owners

February 28, 2024 12:03

Contributor Author

DanielYang59 commented Feb 28, 2024

Please review @janosh. Thanks.

Maybe relocate the utility function to a more generally accessible location? I noticed that other tests seem to alter compressed files in place as well. I cannot remember which ones at the moment, but I will tag them once I find any.

The test_cod tests seem to fail, but this should not be related to this PR: ERROR 2003 (HY000): Can't connect to MySQL server on 'www.crystallography.net:3306' (10060)

DanielYang59 requested a review from janosh

February 28, 2024 12:15

DanielYang59 commented

pymatgen/command_line/bader_caller.py (resolved)

DanielYang59 and others added 3 commits

February 28, 2024 21:36


          revert removal of arg bader_path

06d1663


          revise docstring


          fix chgcar_path possibly unbound pyright error

b3a989a

janosh reviewed

pymatgen/command_line/bader_caller.py (resolved)

DanielYang59 commented

pymatgen/command_line/bader_caller.py (outdated)

    
    @@ -100,14 +98,14 @@ def temp_decompress(file: str | Path, target_dir: str = ".") -> str:
         """
         file = Path(file)
    -    if file.suffix.lower() in [".bz2", ".gz", ".z"]:
    +    if file.suffix.lower() in (".bz2", ".gz", ".z"):

Contributor Author

DanielYang59 Mar 1, 2024

My oversight, I should use a set instead: {".bz2", ".gz", ".z"}

Contributor Author

DanielYang59 Mar 4, 2024 (edited)

A quick note for myself: Sourcery recommends using a set over a list or tuple for membership checks, see here.

The advantage of a set or tuple over a list seems quite obvious, as discussed for example here.

However, I still don't quite know the difference between a tuple and a set in this scenario; some claim it is related to performance.
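To make the list/tuple/set question concrete, here is a small sketch that is not taken from the PR; the constant names and the is_compressed helper are made up for illustration. All three containers give the same answer. A list or tuple is scanned element by element, whereas a set does a hash lookup, so the set mainly pays off as the collection grows. With only three suffixes any timing difference is likely negligible, which is why the script only prints timings rather than claiming a winner.

```python
from pathlib import Path
from timeit import timeit

SUFFIXES_LIST = [".bz2", ".gz", ".z"]
SUFFIXES_TUPLE = (".bz2", ".gz", ".z")
SUFFIXES_SET = {".bz2", ".gz", ".z"}


def is_compressed(name: str, suffixes) -> bool:
    """Same membership test as in the diff, with the container passed in."""
    return Path(name).suffix.lower() in suffixes


if __name__ == "__main__":
    containers = [("list", SUFFIXES_LIST), ("tuple", SUFFIXES_TUPLE), ("set", SUFFIXES_SET)]
    for label, container in containers:
        # A miss is the worst case for list/tuple: every element gets compared.
        seconds = timeit(lambda: ".vasp" in container, number=200_000)
        print(f"{label:5s} {seconds:.4f} s")
```

For a fixed handful of suffixes, choosing a set is arguably more about signalling "this is an unordered collection used for lookup" than about measurable speed.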

DanielYang59 commented

pymatgen/command_line/bader_caller.py (outdated)

-                  if chgcar_path is not None:
-                      chgcar = Chgcar.from_file(chgcar_path)
+                  if chgcar_path is None:
+                      raise FileNotFoundError("Could not find CHGCAR!")

Contributor Author

DanielYang59 Mar 1, 2024

The _get_filepath method already issued a warning in line 529, maybe this is not necessary?

Member

janosh Mar 1, 2024

i guess Chgcar.from_file(chgcar_path) will already raise if the file is missing?

Contributor Author

DanielYang59 Mar 1, 2024

Maybe not... Looking at Chgcar.from_file:

pymatgen/pymatgen/io/vasp/outputs.py

Lines 3661 to 3672 in d07164f

    
    @classmethod
    def from_file(cls, filename: str):
        """Read a CHGCAR file.
        Args:
            filename (str): Path to CHGCAR file.
        Returns:
            Chgcar
        """
        poscar, data, data_aug = VolumetricData.parse_file(filename)
        return cls(poscar, data, data_aug=data_aug)

And then VolumetricData.parse_file:

pymatgen/pymatgen/io/vasp/outputs.py

Lines 3431 to 3535 in d07164f

    
    @staticmethod
    def parse_file(filename: str) -> tuple[Poscar, dict, dict]:
        """
        Convenience method to parse a generic volumetric data file in the vasp
        like format. Used by subclasses for parsing file.
        Args:
            filename (str): Path of file to parse
        Returns:
            tuple[Poscar, dict, dict]: Poscar object, data dict, data_aug dict
        """
        poscar_read = False
        poscar_string: list[str] = []
        dataset: np.ndarray = np.zeros((1, 1, 1))
        all_dataset: list[np.ndarray] = []
        # for holding any strings in input that are not Poscar
        # or VolumetricData (typically augmentation charges)
        all_dataset_aug: dict[int, list[str]] = {}
        dim: list[int] = []
        dimline = ""
        read_dataset = False
        ngrid_pts = 0
        data_count = 0
        poscar = None
        with zopen(filename, mode="rt") as file:
            for line in file:
                original_line = line
                line = line.strip()
                if read_dataset:
                    for tok in line.split():
                        if data_count < ngrid_pts:
                            # This complicated procedure is necessary because
                            # vasp outputs x as the fastest index, followed by y
                            # then z.
                            no_x = data_count // dim[0]
                            dataset[data_count % dim[0], no_x % dim[1], no_x // dim[1]] = float(tok)
                            data_count += 1
                    if data_count >= ngrid_pts:
                        read_dataset = False
                        data_count = 0
                        all_dataset.append(dataset)
                elif not poscar_read:
                    if line != "" or len(poscar_string) == 0:
                        poscar_string.append(line)
                    elif line == "":
                        poscar = Poscar.from_str("\n".join(poscar_string))
                        poscar_read = True
                elif not dim:
                    dim = [int(i) for i in line.split()]
                    ngrid_pts = dim[0] * dim[1] * dim[2]
                    dimline = line
                    read_dataset = True
                    dataset = np.zeros(dim)
                elif line == dimline:
                    # when line == dimline, expect volumetric data to follow
                    # so set read_dataset to True
                    read_dataset = True
                    dataset = np.zeros(dim)
                else:
                    # store any extra lines that were not part of the
                    # volumetric data so we know which set of data the extra
                    # lines are associated with
                    key = len(all_dataset) - 1
                    if key not in all_dataset_aug:
                        all_dataset_aug[key] = []
                    all_dataset_aug[key].append(original_line)
            if len(all_dataset) == 4:
                data = {
                    "total": all_dataset[0],
                    "diff_x": all_dataset[1],
                    "diff_y": all_dataset[2],
                    "diff_z": all_dataset[3],
                }
                data_aug = {
                    "total": all_dataset_aug.get(0),
                    "diff_x": all_dataset_aug.get(1),
                    "diff_y": all_dataset_aug.get(2),
                    "diff_z": all_dataset_aug.get(3),
                }
                # construct a "diff" dict for scalar-like magnetization density,
                # referenced to an arbitrary direction (using same method as
                # pymatgen.electronic_structure.core.Magmom, see
                # Magmom documentation for justification for this)
                # TODO: re-examine this, and also similar behavior in
                # Magmom - @mkhorton
                # TODO: does CHGCAR change with different SAXIS?
                diff_xyz = np.array([data["diff_x"], data["diff_y"], data["diff_z"]])
                diff_xyz = diff_xyz.reshape((3, dim[0] * dim[1] * dim[2]))
                ref_direction = np.array([1.01, 1.02, 1.03])
                ref_sign = np.sign(np.dot(ref_direction, diff_xyz))
                diff = np.multiply(np.linalg.norm(diff_xyz, axis=0), ref_sign)
                data["diff"] = diff.reshape((dim[0], dim[1], dim[2]))
            elif len(all_dataset) == 2:
                data = {"total": all_dataset[0], "diff": all_dataset[1]}
                data_aug = {
                    "total": all_dataset_aug.get(0),
                    "diff": all_dataset_aug.get(1),
                }
            else:
                data = {"total": all_dataset[0]}
                data_aug = {"total": all_dataset_aug.get(0)}
            return poscar, data, data_aug  # type: ignore[return-value]

Then zopen: https://github.com/materialsvirtuallab/monty/blob/4d60fd4745f840354e7b3f48346eaf3cd68f2b35/monty/io.py#L19-L45

But open would certainly complain (though maybe not very descriptively). _get_filepath would issue a warning anyway, so maybe don't bother issuing another one here 😄

Contributor Author

DanielYang59 Mar 1, 2024 (edited)

By the way, I added the if chgcar_path is not None check to avoid a mypy error: _get_filepath could return None, which triggers mypy's incompatible type error:

error: Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str"  [arg-type]
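The two options discussed in this thread (guarding with is not None versus raising FileNotFoundError) both narrow str | None to str for the type checker. The sketch below shows the pattern in isolation; find_file and read_text are hypothetical stand-ins for _get_filepath and Chgcar.from_file, not pymatgen code.

```python
from __future__ import annotations

import warnings
from pathlib import Path


def find_file(directory: str, name: str) -> str | None:
    """Hypothetical stand-in for _get_filepath: warn and return None if nothing matches."""
    matches = sorted(str(p) for p in Path(directory).glob(f"{name}*"))
    if not matches:
        warnings.warn(f"Could not find {name}!")
        return None
    return matches[0]


def read_text(path: str) -> str:
    """Hypothetical stand-in for a loader that only accepts str."""
    return Path(path).read_text()


def load_or_skip(directory: str) -> str | None:
    path = find_file(directory, "CHGCAR")
    if path is not None:  # mypy narrows `str | None` to `str` inside this branch
        return read_text(path)
    return None


def load_or_raise(directory: str) -> str:
    path = find_file(directory, "CHGCAR")
    if path is None:
        raise FileNotFoundError("Could not find CHGCAR!")
    return read_text(path)  # after the raise, mypy treats `path` as `str`
```

The first variant silently continues when the file is missing (only the warning is emitted), while the second fails early with an explicit message; which one fits depends on whether the caller can proceed without the file.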

DanielYang59 commented

pymatgen/command_line/bader_caller.py (outdated, resolved)

DanielYang59 commented

pymatgen/command_line/bader_caller.py (resolved)


          format tweaks

022889f


          Merge branch 'master' into improve-bader-caller

c648e14

janosh approved these changes

Member

janosh left a comment

thanks @DanielYang59! 👍

janosh merged commit 83806df into materialsproject:master

22 checks passed

janosh added ux cli charge labels

Contributor Author

DanielYang59 commented Mar 2, 2024

Thanks for reviewing, @janosh. Glad I could help.

DanielYang59 deleted the improve-bader-caller branch

March 2, 2024 03:51
