Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missing structure files #3874

Closed
shyuep opened this issue Jun 10, 2024 · 10 comments
Closed

Missing structure files #3874

shyuep opened this issue Jun 10, 2024 · 10 comments
Labels

Comments

@shyuep
Copy link
Member

shyuep commented Jun 10, 2024

Python version

any

Pymatgen version

2024.6.4

Operating system version

No response

Current behavior

PymatgenTest.get_structure fail if tests is not are not present, e.g., if it is installed from pip.

@DanielYang59 @janosh You cannot do this kind of thing. PymatgenTest is used not just within Pymatgen but a lot of downstream packages. Some of those packages rely on having PymatgenTest to generate a structure. I have to ask that you stop with all these irrelevant moving of files around and causing breakages. STRICTLY keep backwards compatiblity! Revert this change and issue a release immediately.

Expected Behavior

PymatgenTest.get_structure actually does not result in a failure.

Minimal example

No response

Relevant files to reproduce this bug

No response

@shyuep shyuep added the bug label Jun 10, 2024
@janosh
Copy link
Member

janosh commented Jun 10, 2024

please stop calling other people's contributions irrelevant (i assume you're referring to #3831).

PymatgenTest is used not just within Pymatgen but a lot of downstream packages

in my opinion, having a publicly exported PymatgenTest class was an antipattern to begin with. it's below your current coding standards. imagine someone proposed to add such a class now. you wouldn't accept such a PR.
pymatgen shipped several dozen CIF and JSON files to thousands of users even though downstream usage is likely limited to a handful of call sites (a dev script in atomate is the only one i could find from a github search)

if you really want those structures in the package, there should be a clearly user-facing API like pymatgen.core.get_example_structure function that loads these files as opposed to pymatgen.util.testing.PymatgenTest.get_structure. then more people would use them, justifying their inclusion in the package.

if it was important that external packages can load these structures, there should have been tests for that. you often told me yourself, if people don't write tests for things they added to pymatgen, they should not expect others to maintain that untested functionality for them.

finally, i would like to point out that you were the one to do a massive (in your own words apparently irrelevant) migration of test files into the tests directory in 77f89dfbb. you're attacking others for your own behavior

@shyuep
Copy link
Member Author

shyuep commented Jun 11, 2024

  1. Moving the structure files is irrelevant - it improves none of pymatgen's main functionality.
  2. Tests were actually written for PymatgenTest - it is everywhere that PymatgenTest is used. What are you expecting - a test for a code that implements a testing class? Or something that checks for existence of files? Moving the structure files does not cause issues within pymatgen tests because the CI checkouts the Github version, but it does cause issues in downstream code that uses PymatgenTest when installed via pip.
  3. Just because you don't understand the reason why the structure files were included in pymatgen (21 very small json files, which added at best a few kb to the pymatgen wheel) doesn't mean there is no good reason for it. ASK. Apparently this is very difficult for you to do.

@shyuep shyuep closed this as completed Jun 11, 2024
@DanielYang59
Copy link
Contributor

DanielYang59 commented Jun 11, 2024

Hi thanks for pinging me.

Yes indeed I relocated those structure files in #3831, and I was not expecting them to be needed outside pymatgen tests (just like any other pymatgen test files).

Moving such files might not be a major concern/improvement here, but from my side, PymatgenTest being needed outside pymatgen tests is truly unexpected. That's the reason I didn't ask in the first place, also as you might have noticed, I tried to explicitly tag and ask for approval for any potentially breaking changes already :)

As per the module docstring (and everything else I could infer from), it should not be accessed externally:

"""Common test support for pymatgen test scripts.
This single module should provide all the common functionality for pymatgen
tests in a single location, so that test scripts can just import it and work
right away.
"""

@janosh
Copy link
Member

janosh commented Jun 11, 2024

What are you expecting - a test for a code that implements a testing class? Or something that checks for existence of files?

@shyuep yes, that's exactly what i'm expecting for any external API! PymatgenTest certainly doesn't sound like an external API which is why in my opinion, the issue is not the lack of tests but the fact that PymatgenTest wasn't marked as private. but given you see it as an external API, i'm surprised is not not self-evident to you that it then requires test coverage. you yourself have always emphasized the importance of tests for anything you want to stay unchanged over time.

Moving the structure files does not cause issues within pymatgen tests because the CI checkouts the Github version, but it does cause issues in downstream code that uses PymatgenTest when installed via pip.

no need to explain, i'm well aware of that.

doesn't mean there is no good reason for it

so what is the good reason? like i said, those files were there to support an antipattern which is hardly a good reason. and because it was an antipattern neither i nor @DanielYang59 suspected that there was a reason at all. given there were no tests or documentation to inform us otherwise, i think you should tone down your language

@shyuep
Copy link
Member Author

shyuep commented Jun 11, 2024

@DanielYang59 Thanks for the thoughtful response. The intent is for PymatgenTest to be an public utility class to be used for testing materials science code, not just within pymatgen. There are some packages that use PymatgenTest to load example structures to simplify testing. E.g., matcalc, maml, pymatgen.analysis.diffusion. That change broke the CI of multiple downstream codes. It is really no different from something like the built-in datasets for something like scikit-learn or seaborn.

It is a good practice to find out who the original author of the code is (which is available from Git) and then clarify intent first before making breaking changes. I am not a newbie. I know the difference between a private class and a public class. I know the difference between test files and code files. There are reasons why I left the json files in pymatgen/utils/files and not pymatgen/utils/tests/files. When I consolidated the test files, I did not move them like all the other test files.

Perhaps let me outline the guidelines:

  1. Backwards compatibility wherever possible.
  2. Absolutely no breakages for purely aesthetic or housekeeping changes unless followed by a very long deprecation period. This is not on you, but there was a similar change to variable names just for spelling which caused breakages for no good reason.
  3. For actual new functionality/bug fixes, breaking changes can be tolerated if gains > costs. But again, preferable if there is a deprecation period.

I am not going to dignify the rest of the comments from others with a response.

@DanielYang59
Copy link
Contributor

DanielYang59 commented Jun 12, 2024

Perhaps let me outline the guidelines:

1. Backwards compatibility wherever possible.

2. Absolutely no breakages for purely aesthetic or housekeeping changes unless followed by a very long deprecation period. This is not on you, but there was a similar change to variable names just for spelling which caused breakages for no good reason.

3. For actual new functionality/bug fixes, breaking changes can be tolerated if gains > costs. But again, preferable if there is a deprecation period.

I appreciate your sharing this, which certainly would help me become a better contributor :) Also I'm beginning to feel the same as I gain more experience in open-source.

The intent is for PymatgenTest to be an public utility class to be used for testing materials science code, not just within pymatgen. There are some packages that use PymatgenTest to load example structures to simplify testing. E.g., matcalc, maml, pymatgen.analysis.diffusion.

Thanks for explaining this, I was not expecting this in the first place. As per its name PymatgenTest, it should be a unit test class for pymatgen (not a more general name like EnhancedUnitTest). Therefore, I skipped the following consultation phases (ask for approval/deprecate with a deprecation period and such). As such, thanks for clarifying this in c83d5d5

If I were to realize (or even just suspect) it might be access externally, I would certainly follow the guidelines you mentioned above :)

With all these said, perhaps those packages should consider trying to get structures (even for unit test) through the mp-api (just because it can get structure from PymatgenTest.get_structure, doesn't mean it should, right?)? This sounds like a more rational choice to me...

@janosh
Copy link
Member

janosh commented Jun 12, 2024

I am not going to dignify the rest of the comments from others with a response.

sounds more like you lost the argument on its merit...

shyuep added a commit that referenced this issue Jun 12, 2024
@shyuep
Copy link
Member Author

shyuep commented Jun 12, 2024

@DanielYang59 Sure, perhaps naming can be modified (with a deprecation period because this is purely aesthetic change), but even just for Pymatgen add-on packages, it is expected that PymatgenTest would be useful.

As for getting structures, you can use the mp-api of course but it is more involved. You need to set up an API key as a Github secret. And online access is always needed. Why not simply have some example files within code? I would note for example that codes like seaborn, scikit-learn, all have example data within the code for ease of testing, tutorials, etc. There was a method called get_mp_structure in PymatgenTest previously. Anyway, I added a Structure.from_id, which is a more robust solution for other applications.

@DanielYang59
Copy link
Contributor

DanielYang59 commented Jun 13, 2024

Sure, perhaps naming can be modified (with a deprecation period because this is purely aesthetic change), but even just for Pymatgen add-on packages, it is expected that PymatgenTest would be useful.

That's a fair point to me!

As for getting structures, you can use the mp-api of course but it is more involved. You need to set up an API key as a Github secret. And online access is always needed.

Yes it's less "convenient", but for CI environment, Internet access is usually available anyway (installing pymatgen requires Internet access too).

Why not simply have some example files within code? I would note for example that codes like seaborn, scikit-learn, all have example data within the code for ease of testing, tutorials, etc.

I have a feeling that this is not "best practice", but I'm not quite certain yet. Perhaps we can do better than others :)

There was a method called get_mp_structure in PymatgenTest previously. Anyway, I added a Structure.from_id, which is a more robust solution for other applications.

Thanks for adding this! It sounds much more reasonable :)

@mkhorton
Copy link
Member

Hi everyone - I need to add a note to this thread. I am glad to have discussions and disagreements, but we have to maintain a standard of civil and kind communication throughout these disagreements. There are some examples in this thread where this standard is not met.

I want the pymatgen community to be a welcoming place to all contributors, without whom pymatgen would not be what it is today. We also want to encourage voices to speak up when they think something is done badly, or a mistake has been made.

We will only encourage useful, critical feedback if people feel safe to make it, and know that we can collectively tackle the problem and not the person (no "you" statements that are used to attack an individual please).

Thank you everyone for your work here on the underlying issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants