Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid bader_caller from altering compressed file in place #3660

Merged
merged 15 commits into from
Mar 1, 2024

Conversation

DanielYang59
Copy link
Contributor

@DanielYang59 DanielYang59 commented Feb 28, 2024

Summary: Avoid bader_caller from altering compressed file in place

Details

Added a utility function temp_decompress to copy compressed file and decompress the copied file in ScratchDir. If the file is not compressed, would just pass to avoid unnecessary file OPs.

self.parse_atomic_densities = parse_atomic_densities

with ScratchDir("."):
if chgcar_filename:
self.is_vasp = True
Copy link
Contributor Author

@DanielYang59 DanielYang59 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't see self.is_vasp being used.

@@ -485,7 +482,8 @@ def _get_filepath(filename: str, msg: str = "") -> str | None:
return paths[0]

chgcar_path = _get_filepath("CHGCAR", "Could not find CHGCAR!")
chgcar = Chgcar.from_file(chgcar_path)
if chgcar_path is not None:
chgcar = Chgcar.from_file(chgcar_path)
Copy link
Contributor Author

@DanielYang59 DanielYang59 Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed mypy error:

Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str"

@DanielYang59 DanielYang59 changed the title Improve bader_caller and its tests Avoid bader_caller from altering compressed file in place Feb 28, 2024
@DanielYang59 DanielYang59 marked this pull request as ready for review February 28, 2024 12:03
@DanielYang59
Copy link
Contributor Author

Please review @janosh. Thanks.

Maybe relocate the utility function to a more general accessible location (as I notice there are other tests seem to alter compressed file in place as well, cannot remember which one at the moment, would tag them in the future once I find any)?

Tests for test_cod seems to fail but this should not be related to this PR, ERROR 2003 (HY000): Can't connect to MySQL server on 'www.crystallography.net:3306' (10060)

@DanielYang59 DanielYang59 requested a review from janosh February 28, 2024 12:15
@@ -100,14 +98,14 @@ def temp_decompress(file: str | Path, target_dir: str = ".") -> str:
"""
file = Path(file)

if file.suffix.lower() in [".bz2", ".gz", ".z"]:
if file.suffix.lower() in (".bz2", ".gz", ".z"):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My oversight, should use set instead {".bz2", ".gz", ".z"}

Copy link
Contributor Author

@DanielYang59 DanielYang59 Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a quick note for myself, Sourcery would recommend using set over list or tuple for membership check, see here.

The advantage of set/tuple over list seems quite obvious, for example discussed here.

However I still don't quite know the difference between tuple and set in this scenario, some claims it's related to performance.

if chgcar_path is not None:
chgcar = Chgcar.from_file(chgcar_path)
if chgcar_path is None:
raise FileNotFoundError("Could not find CHGCAR!")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _get_filepath method already issued a warning in line 529, maybe this is not necessary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess Chgcar.from_file(chgcar_path) will already raise if the file is missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe no... By Chgcar.from_file

@classmethod
def from_file(cls, filename: str):
"""Read a CHGCAR file.
Args:
filename (str): Path to CHGCAR file.
Returns:
Chgcar
"""
poscar, data, data_aug = VolumetricData.parse_file(filename)
return cls(poscar, data, data_aug=data_aug)

And then VolumetricData.parse_file:

@staticmethod
def parse_file(filename: str) -> tuple[Poscar, dict, dict]:
"""
Convenience method to parse a generic volumetric data file in the vasp
like format. Used by subclasses for parsing file.
Args:
filename (str): Path of file to parse
Returns:
tuple[Poscar, dict, dict]: Poscar object, data dict, data_aug dict
"""
poscar_read = False
poscar_string: list[str] = []
dataset: np.ndarray = np.zeros((1, 1, 1))
all_dataset: list[np.ndarray] = []
# for holding any strings in input that are not Poscar
# or VolumetricData (typically augmentation charges)
all_dataset_aug: dict[int, list[str]] = {}
dim: list[int] = []
dimline = ""
read_dataset = False
ngrid_pts = 0
data_count = 0
poscar = None
with zopen(filename, mode="rt") as file:
for line in file:
original_line = line
line = line.strip()
if read_dataset:
for tok in line.split():
if data_count < ngrid_pts:
# This complicated procedure is necessary because
# vasp outputs x as the fastest index, followed by y
# then z.
no_x = data_count // dim[0]
dataset[data_count % dim[0], no_x % dim[1], no_x // dim[1]] = float(tok)
data_count += 1
if data_count >= ngrid_pts:
read_dataset = False
data_count = 0
all_dataset.append(dataset)
elif not poscar_read:
if line != "" or len(poscar_string) == 0:
poscar_string.append(line)
elif line == "":
poscar = Poscar.from_str("\n".join(poscar_string))
poscar_read = True
elif not dim:
dim = [int(i) for i in line.split()]
ngrid_pts = dim[0] * dim[1] * dim[2]
dimline = line
read_dataset = True
dataset = np.zeros(dim)
elif line == dimline:
# when line == dimline, expect volumetric data to follow
# so set read_dataset to True
read_dataset = True
dataset = np.zeros(dim)
else:
# store any extra lines that were not part of the
# volumetric data so we know which set of data the extra
# lines are associated with
key = len(all_dataset) - 1
if key not in all_dataset_aug:
all_dataset_aug[key] = []
all_dataset_aug[key].append(original_line)
if len(all_dataset) == 4:
data = {
"total": all_dataset[0],
"diff_x": all_dataset[1],
"diff_y": all_dataset[2],
"diff_z": all_dataset[3],
}
data_aug = {
"total": all_dataset_aug.get(0),
"diff_x": all_dataset_aug.get(1),
"diff_y": all_dataset_aug.get(2),
"diff_z": all_dataset_aug.get(3),
}
# construct a "diff" dict for scalar-like magnetization density,
# referenced to an arbitrary direction (using same method as
# pymatgen.electronic_structure.core.Magmom, see
# Magmom documentation for justification for this)
# TODO: re-examine this, and also similar behavior in
# Magmom - @mkhorton
# TODO: does CHGCAR change with different SAXIS?
diff_xyz = np.array([data["diff_x"], data["diff_y"], data["diff_z"]])
diff_xyz = diff_xyz.reshape((3, dim[0] * dim[1] * dim[2]))
ref_direction = np.array([1.01, 1.02, 1.03])
ref_sign = np.sign(np.dot(ref_direction, diff_xyz))
diff = np.multiply(np.linalg.norm(diff_xyz, axis=0), ref_sign)
data["diff"] = diff.reshape((dim[0], dim[1], dim[2]))
elif len(all_dataset) == 2:
data = {"total": all_dataset[0], "diff": all_dataset[1]}
data_aug = {
"total": all_dataset_aug.get(0),
"diff": all_dataset_aug.get(1),
}
else:
data = {"total": all_dataset[0]}
data_aug = {"total": all_dataset_aug.get(0)}
return poscar, data, data_aug # type: ignore[return-value]

Thenzopen: https://github.com/materialsvirtuallab/monty/blob/4d60fd4745f840354e7b3f48346eaf3cd68f2b35/monty/io.py#L19-L45

But certainly open would complain (maybe not so descriptive). But _get_filepath would issue a warning anyway, so maybe don't bother issure another here 😄

Copy link
Contributor Author

@DanielYang59 DanielYang59 Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I added the if chgcar_path is not None check to avoid mypy error, because _get_filepath could return None and trigger mypy incompatible type error:

error: Argument 1 to "from_file" of "Chgcar" has incompatible type "str | None"; expected "str"  [arg-type]

Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @DanielYang59! 👍

@janosh janosh merged commit 83806df into materialsproject:master Mar 1, 2024
22 checks passed
@janosh janosh added ux User experience cli Command line interface charge Electric charge analysis, partitioning, integrations, etc. labels Mar 1, 2024
@DanielYang59
Copy link
Contributor Author

Thanks for viewing @janosh . Glad I could help.

@DanielYang59 DanielYang59 deleted the improve-bader-caller branch March 2, 2024 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
charge Electric charge analysis, partitioning, integrations, etc. cli Command line interface ux User experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants