-
Notifications
You must be signed in to change notification settings - Fork 876
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid bader_caller
from altering compressed file in place
#3660
Avoid bader_caller
from altering compressed file in place
#3660
Conversation
self.parse_atomic_densities = parse_atomic_densities | ||
|
||
with ScratchDir("."): | ||
if chgcar_filename: | ||
self.is_vasp = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't see self.is_vasp
being used.
@@ -485,7 +482,8 @@ def _get_filepath(filename: str, msg: str = "") -> str | None: | |||
return paths[0] | |||
|
|||
chgcar_path = _get_filepath("CHGCAR", "Could not find CHGCAR!") | |||
chgcar = Chgcar.from_file(chgcar_path) | |||
if chgcar_path is not None: | |||
chgcar = Chgcar.from_file(chgcar_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed mypy
error:
Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str"
bader_caller
and its testsbader_caller
from altering compressed file in place
Please review @janosh. Thanks. Maybe relocate the utility function to a more general accessible location (as I notice there are other tests seem to alter compressed file in place as well, cannot remember which one at the moment, would tag them in the future once I find any)? Tests for |
@@ -100,14 +98,14 @@ def temp_decompress(file: str | Path, target_dir: str = ".") -> str: | |||
""" | |||
file = Path(file) | |||
|
|||
if file.suffix.lower() in [".bz2", ".gz", ".z"]: | |||
if file.suffix.lower() in (".bz2", ".gz", ".z"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My oversight, should use set instead {".bz2", ".gz", ".z"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a quick note for myself, Sourcery
would recommend using set
over list
or tuple
for membership check, see here.
The advantage of set/tuple
over list
seems quite obvious, for example discussed here.
However I still don't quite know the difference between tuple
and set
in this scenario, some claims it's related to performance.
if chgcar_path is not None: | ||
chgcar = Chgcar.from_file(chgcar_path) | ||
if chgcar_path is None: | ||
raise FileNotFoundError("Could not find CHGCAR!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _get_filepath
method already issued a warning in line 529, maybe this is not necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess Chgcar.from_file(chgcar_path)
will already raise if the file is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe no... By Chgcar.from_file
pymatgen/pymatgen/io/vasp/outputs.py
Lines 3661 to 3672 in d07164f
@classmethod | |
def from_file(cls, filename: str): | |
"""Read a CHGCAR file. | |
Args: | |
filename (str): Path to CHGCAR file. | |
Returns: | |
Chgcar | |
""" | |
poscar, data, data_aug = VolumetricData.parse_file(filename) | |
return cls(poscar, data, data_aug=data_aug) |
And then VolumetricData.parse_file
:
pymatgen/pymatgen/io/vasp/outputs.py
Lines 3431 to 3535 in d07164f
@staticmethod | |
def parse_file(filename: str) -> tuple[Poscar, dict, dict]: | |
""" | |
Convenience method to parse a generic volumetric data file in the vasp | |
like format. Used by subclasses for parsing file. | |
Args: | |
filename (str): Path of file to parse | |
Returns: | |
tuple[Poscar, dict, dict]: Poscar object, data dict, data_aug dict | |
""" | |
poscar_read = False | |
poscar_string: list[str] = [] | |
dataset: np.ndarray = np.zeros((1, 1, 1)) | |
all_dataset: list[np.ndarray] = [] | |
# for holding any strings in input that are not Poscar | |
# or VolumetricData (typically augmentation charges) | |
all_dataset_aug: dict[int, list[str]] = {} | |
dim: list[int] = [] | |
dimline = "" | |
read_dataset = False | |
ngrid_pts = 0 | |
data_count = 0 | |
poscar = None | |
with zopen(filename, mode="rt") as file: | |
for line in file: | |
original_line = line | |
line = line.strip() | |
if read_dataset: | |
for tok in line.split(): | |
if data_count < ngrid_pts: | |
# This complicated procedure is necessary because | |
# vasp outputs x as the fastest index, followed by y | |
# then z. | |
no_x = data_count // dim[0] | |
dataset[data_count % dim[0], no_x % dim[1], no_x // dim[1]] = float(tok) | |
data_count += 1 | |
if data_count >= ngrid_pts: | |
read_dataset = False | |
data_count = 0 | |
all_dataset.append(dataset) | |
elif not poscar_read: | |
if line != "" or len(poscar_string) == 0: | |
poscar_string.append(line) | |
elif line == "": | |
poscar = Poscar.from_str("\n".join(poscar_string)) | |
poscar_read = True | |
elif not dim: | |
dim = [int(i) for i in line.split()] | |
ngrid_pts = dim[0] * dim[1] * dim[2] | |
dimline = line | |
read_dataset = True | |
dataset = np.zeros(dim) | |
elif line == dimline: | |
# when line == dimline, expect volumetric data to follow | |
# so set read_dataset to True | |
read_dataset = True | |
dataset = np.zeros(dim) | |
else: | |
# store any extra lines that were not part of the | |
# volumetric data so we know which set of data the extra | |
# lines are associated with | |
key = len(all_dataset) - 1 | |
if key not in all_dataset_aug: | |
all_dataset_aug[key] = [] | |
all_dataset_aug[key].append(original_line) | |
if len(all_dataset) == 4: | |
data = { | |
"total": all_dataset[0], | |
"diff_x": all_dataset[1], | |
"diff_y": all_dataset[2], | |
"diff_z": all_dataset[3], | |
} | |
data_aug = { | |
"total": all_dataset_aug.get(0), | |
"diff_x": all_dataset_aug.get(1), | |
"diff_y": all_dataset_aug.get(2), | |
"diff_z": all_dataset_aug.get(3), | |
} | |
# construct a "diff" dict for scalar-like magnetization density, | |
# referenced to an arbitrary direction (using same method as | |
# pymatgen.electronic_structure.core.Magmom, see | |
# Magmom documentation for justification for this) | |
# TODO: re-examine this, and also similar behavior in | |
# Magmom - @mkhorton | |
# TODO: does CHGCAR change with different SAXIS? | |
diff_xyz = np.array([data["diff_x"], data["diff_y"], data["diff_z"]]) | |
diff_xyz = diff_xyz.reshape((3, dim[0] * dim[1] * dim[2])) | |
ref_direction = np.array([1.01, 1.02, 1.03]) | |
ref_sign = np.sign(np.dot(ref_direction, diff_xyz)) | |
diff = np.multiply(np.linalg.norm(diff_xyz, axis=0), ref_sign) | |
data["diff"] = diff.reshape((dim[0], dim[1], dim[2])) | |
elif len(all_dataset) == 2: | |
data = {"total": all_dataset[0], "diff": all_dataset[1]} | |
data_aug = { | |
"total": all_dataset_aug.get(0), | |
"diff": all_dataset_aug.get(1), | |
} | |
else: | |
data = {"total": all_dataset[0]} | |
data_aug = {"total": all_dataset_aug.get(0)} | |
return poscar, data, data_aug # type: ignore[return-value] |
But certainly open
would complain (maybe not so descriptive). But _get_filepath
would issue a warning anyway, so maybe don't bother issure another here 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I added the if chgcar_path is not None
check to avoid mypy
error, because _get_filepath
could return None
and trigger mypy
incompatible type error:
error: Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str" [arg-type]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @DanielYang59! 👍
Thanks for viewing @janosh . Glad I could help. |
Summary: Avoid bader_caller from altering compressed file in place
bader_caller
from altering compressed test in place, see Deprecate_parse_atomic_densities
inBaderAnalysis
and fixBader
test setup #3656 (comment)bader_exe_path
to encourage users to addBader
executable to PATH for simplicityDetails
Added a utility function
temp_decompress
to copy compressed file and decompress the copied file inScratchDir
. If the file is not compressed, would just pass to avoid unnecessary file OPs.