Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add capability for Vasprun to read KPOINTS_OPT data #3509

Merged
merged 37 commits into from
Feb 23, 2024

Conversation

bfield1
Copy link
Contributor

@bfield1 bfield1 commented Dec 10, 2023

Summary

This pull request resolves Issue #3455 , implementing parsing functionality for band structures generated using KPOINTS_OPT.
KPOINTS_OPT is a new VASP feature which provides an alternative method for specifying k-points for hybrid band structures.
The KPOINTS_OPT-related data is now read in by the Vasprun class, and Vasprun.get_band_structure defaults to reading the KPOINTS_OPT-based eigenvalues if present.

Unfinished tasks

  • Implement KPOINTS_OPT functionality for density of states. As the DOS is accessed using a property rather than a function, it is unclear what the best interface would be. The current pull request reads and saves the data from vasprun.xml, but it doesn't have a convenient getter. Regardless, the other features of this pull request stand without this feature.

Checklist

  • Google format doc strings added. Check with ruff.
  • Type annotations included. Check with mypy.
  • Tests added for new features/fixes.
  • If applicable, new classes/functions/modules have duecredit @due.dcite decorators to reference relevant papers by DOI (example)

bfield1 and others added 15 commits December 7, 2023 18:50
Allow reading in KPOINTS_OPT data from vasprun.xml. Also get band structure from KPOINTS_OPT data, if requested. (May change that to the default behaviour.)

I've tested it separately, but still need to test integration into the pymatgen code, once I get a working development build of pymatgen.
projected_magnetism_kpoints_opt -> projected_magnetisation_kpoints_opt
From issue materialsproject#3455 , dataset supplied by Jeff-oakly materialsproject#3455 (comment)
Although, for some reason, even though LORBIT = 11, vasprun.xml doesn't have projected eigenvalues for kpoints_opt. Wierd.
It was with VASP 6.4 rather than 6.3, if that makes a difference. And more CPUs so it snuck in more bands, which slightly changed some energy.
if KPOINTS_OPT data is present.
Prior name was not descriptive of the tests being done.
in both Vasprun and BSVasprun
@bfield1 bfield1 marked this pull request as ready for review December 11, 2023 01:19
@janosh janosh added enhancement A new feature or improvement to an existing one io Input/output functionality vasp Vienna Ab initio Simulation Package labels Dec 11, 2023
@bfield1 bfield1 changed the title Vasprun kpoints opt Add capability for Vasprun to read KPOINTS_OPT data Dec 11, 2023
@janosh janosh force-pushed the master branch 2 times, most recently from 3c23114 to 36e289c Compare December 19, 2023 02:10
bfield1 and others added 2 commits December 20, 2023 11:38
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! All good except for the large new file tests/files/kpoints_opt/vasprun.xml.gz. Can you check if we can use it in place of one of the existing vasprun.xml files?

tests/files/kpoints_opt/vasprun.xml.gz Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@esoteric-ephemera If you're not too busy, could you take a look and let me know if the changes in this file all make sense to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sorry I will soon!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These overall look pretty reasonable to me. @bfield1, are the changes to the structure of the XML parsing needed? It seems like you've added explicit triggers for when XML blocks are read.

Also, if a user specifies a KPOINTS_OPT file, it might make more sense to put the various *_opt quantities without the suffix?

For many use cases, where KPOINTS_OPT isn't used, there would be 5 new top-level attrs that default to None, which is OK. Maybe it makes more sense to have underscored attrs that default to None and store the quantities computed with KPOINTS/KSPACING when KPOINTS_OPT is used?

For example, when no KPOINTS_OPT file is used, or LKPOINTS_OPT = False:

    self.eigenvalues_kpoints = < eigenvalues using KPOINTS / KSPACING >
    self._eigenvalues_kpoints = None

and when KPOINTS_OPT is used:

    self.eigenvalues_kpoints = < eigenvalues using KPOINTS_OPT >
    self._eigenvalues_kpoints = < eigenvalues using KPOINTS / KSPACING >

Just to keep things more organized and hopefully reflect what a user expects to get from a Vasprun.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bfield1, are the changes to the structure of the XML parsing needed? It seems like you've added explicit triggers for when XML blocks are read.

Yes, we do need those changes. The prior 'end' condition for reading XML elements doesn't tell you which element it is nested within until after you have left that element. The 'start' condition is used to identify when we are entering certain blocks (but we have to wait until 'end' to read the data).
When using KPOINTS_OPT, some of the blocks can only be distinguished from their regular KPOINTS counterparts by knowing which block they are nested within. As such, I had to change the parser to read both start and end tags, and add a flag to track when we have entered (or left) certain blocks.

Regarding how to structure the data:

My choice of convention was mainly based on how the data is structured within the vasprun.xml file itself, as well as the calculation outputs. KPOINTS and KPOINTS_OPT are separate files, as are PROCAR and PROCAR_OPT. The vasprun.xml files stores the conventional KPOINTS data in the standard location and the KPOINTS_OPT data under special flags. The KPOINTS_OPT calculation is an extra non-self-consistent calculation run on top of the regular self-consistent calculation.
I also didn't want to change what the current variables in Vasprun were pointing to without proper consultation.

That said, I am not opposed to making the KPOINTS_OPT data the default when it is present. You probably have a better idea of what the average user "expects" from a Vasprun than I do. I only recently started using KPOINTS_OPT too, so different people may have different conventions.

One use-case of KPOINTS_OPT I was considering was doing a regular grid with normal KPOINTS (which you have to do) and a band structure with KPOINTS_OPT. In principle, one could get DOS data from the regular grid and band structure data from the KPOINTS_OPT data. But maybe this is a niche use case?

I would guess that typically someone cares most about the KPOINTS_OPT data when they specify it, so making that the prime data then using extra flags to get the underlying data probably makes sense. But also, I would expect the Vasprun class to reflect the data structure of the vasprun.xml file, so listing KPOINTS_OPT separately also makes sense. I suppose it depends which we value more.

For what it's worth, I counted 10 new top-level attributes rather than 5. tdos_opt, idos_opt, pdos_opt, efermi_opt, eigenvalues_kpoints_opt, projected_eigenvalues_kpoints_opt, projected_magnetisation_kpoints_opt, kpoints_opt, actual_kpoints_opt, and actual_kpoints_weights_opt, which are all the KPOINTS_OPT equivalents of their non-opt counterparts and are listed in separate fields in vasprun.xml. I can appreciate that having lots of usually-empty attributes floating around might be a bit messy, though. If we wanted to tidy up, maybe we could store these attributes inside a dictionary so we only have one new top-level attribute?

Let me know your thoughts.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using KPOINTS_OPT, some of the blocks can only be distinguished from their regular KPOINTS counterparts by knowing which block they are nested within. As such, I had to change the parser to read both start and end tags, and add a flag to track when we have entered (or left) certain blocks.

this is very helpful information to understand why the code is as it is. could you insert this as a comment along with a date?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth, I counted 10 new top-level attributes rather than 5. [...] I can appreciate that having lots of usually-empty attributes floating around might be a bit messy, though. If we wanted to tidy up, maybe we could store these attributes inside a dictionary so we only have one new top-level attribute?

would be interested to hear @utf opinion on this and also if it affects atomate2 in some way?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bfield1! This is super helpful context and I appreciate the work you've put into this.

Re:

In principle, one could get DOS data from the regular grid and band structure data from the KPOINTS_OPT data. But maybe this is a niche use case?

This is actually the default behavior of the Materials Project's bandstructure workflows, which means we should probably save some compute effort and revise these after this PR is merged. As they stand currently, two separate self-consistent calculations are performed to get the DOS from a uniform k-mesh, and the bandstructure from a line-mode k-mesh

If we wanted to tidy up, maybe we could store these attributes inside a dictionary so we only have one new top-level attribute?

I might lean towards this approach, or perhaps defining a small container class (e.g., electronic_properties) that contains these 10 attrs (electronic_properties.tdos, electronic_properties.idos,...). The container class would be a top-level attr of Vasprun. Hope that makes sense

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very helpful information to understand why the code is as it is. could you insert this as a comment along with a date?

Done.

This is actually the default behavior of the Materials Project's bandstructure workflows, which means we should probably save some compute effort and revise these after this PR is merged.

You'd probably want to double-check that VASP prints its output to vasprun.xml as expected when using KPOINTS_OPT, as it's a new feature so may have some bugs (the compilation of VASP I'm using doesn't write the projected eigenvalues to vasprun.xml properly, for instance). But if VASP behaves you might be able to save some effort, yes.

I might lean towards this approach, or perhaps defining a small container class (e.g., electronic_properties) that contains these 10 attrs (electronic_properties.tdos, electronic_properties.idos,...). The container class would be a top-level attr of Vasprun. Hope that makes sense

I have now implemented such a solution. A simple container class (_ElectronicPropertiesContainer) (which I made MSONable because it seemed straightforward) which is used in the attribute Vasprun.kpoints_opt_properties. It holds the electronic properties with their normal names (without the suffixes). The names might be a little wordy, but it's probably better to be more verbose than less.

In principle, one could use such a container for the regular electronic properties too, but that's probably an unnecessary extra layer with no obvious benefits. Unless, maybe, we have setters/getters to the top-level attributes which intelligently decide whether to access the kpoints_opt or regular data, but that's getting ahead of ourselves. People can request such a feature if demand arises.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these changes @bfield1! I'm happy with the current state of this PR. Re the "non-opt" electronic properties: just to avoid breaking changes, I prefer the way you've done things by not storing them in an _ElectronicPropertiesContainer.

The option to transition to a container class for these is there, but not necessary now.

@bfield1 bfield1 requested a review from janosh January 31, 2024 23:08
@@ -140,6 +140,50 @@ def _vasprun_float(flt: float | str) -> float:
raise exc


class _ElectronicPropertiesContainer(MSONable):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm confused on what the consensus re _ElectronicPropertiesContainer is in this PR. to me it looks like boilerplate. not sure what the benefits are. if i understand @esoteric-ephemera's last comment, he's also (no longer?) in favor of keeping it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I think this is good to keep things organized. The alternative is to have 10 new attrs on Vasprun.*_opt that are None whenever a KPOINTS_OPT file isn't used. That's fine but (in my view) messy for most use cases.

Either solution (removing the container class or having the *_opt attrs directly on Vasprun) works tho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janosh As @esoteric-ephemera said, this container is useful for storing these extra variables.

The earlier remark about not storing things in the container was regarding the pre-existing electronic properties. I had briefly floated the idea of putting all the electronic properties into these containers (and using getters to choose them intelligently), but decided against it to avoid breaking changes and unnecessary complexity. But the container is still good for the KPOINTS_OPT attributes.

@janosh janosh force-pushed the master branch 4 times, most recently from d725325 to dca98be Compare February 2, 2024 11:47
Copy link
Member

@janosh janosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm a bit concerned here that the tests pass even if you remove all attributes from

@dataclass
class KpointOptProps:
    """Simple container class to store KPOINTS_OPT data in a separate namespace. Used by Vasprun."""

    tdos: Dos | None = None
    idos: Dos | None = None
    pdos: list | None = None
    efermi: float | None = None
    eigenvalues: dict | None = None
    projected_eigenvalues: dict | None = None
    projected_magnetisation: np.ndarray | None = None
    kpoints: Kpoints | None = None
    actual_kpoints: list | None = None
    actual_kpoints_weights: list | None = None
    dos_has_errors: bool | None = None

i.e.

@dataclass
class KpointOptProps:
    """Simple container class to store KPOINTS_OPT data in a separate namespace. Used by Vasprun."""

works fine too. the attributes help with IDE auto-complete and type checking of course but maybe @bfield1 can think about more stringent tests for this?

either way, merging because of the very long wait time. sorry about that.

@janosh janosh merged commit 53141c1 into materialsproject:master Feb 23, 2024
22 checks passed
@bfield1
Copy link
Contributor Author

bfield1 commented Feb 23, 2024

i'm a bit concerned here that the tests pass even if you remove all attributes from

@dataclass
class KpointOptProps:
    """Simple container class to store KPOINTS_OPT data in a separate namespace. Used by Vasprun."""

    tdos: Dos | None = None
    idos: Dos | None = None
    pdos: list | None = None
    efermi: float | None = None
    eigenvalues: dict | None = None
    projected_eigenvalues: dict | None = None
    projected_magnetisation: np.ndarray | None = None
    kpoints: Kpoints | None = None
    actual_kpoints: list | None = None
    actual_kpoints_weights: list | None = None
    dos_has_errors: bool | None = None

i.e.

@dataclass
class KpointOptProps:
    """Simple container class to store KPOINTS_OPT data in a separate namespace. Used by Vasprun."""

works fine too. the attributes help with IDE auto-complete and type checking of course but maybe @bfield1 can think about more stringent tests for this?

either way, merging because of the very long wait time. sorry about that.

My rationale was that some vasprun.xml files would not have all that data in them (e.g. if your calculation doesn't have magnetism, it wouldn't have a projected_magnetisation field, I don't think. Or if your calculation didn't use LORBIT, it won't have any projected_* fields). I felt it was safer for the end user if the blank fields returned None rather than throwing an AttributeError.

But I can indeed consider some way to test this behaviour and ensure it acts as expected. I'll open another pull request once I get around to it.

Anyway, thanks for working with me on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement to an existing one io Input/output functionality vasp Vienna Ab initio Simulation Package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants