Subsampling #79

Closed
msoroush opened this issue May 2, 2019 · 3 comments

@msoroush
Contributor

msoroush commented May 2, 2019

The subsampling functions require the data frame index to include a level named time. In the GOMC parser I use step as the index name, so I get an error when I call subsampling functions such as slicing:

KeyError: Index(['time'], dtype='object')

One possible solution is to change the index name to time in the GOMC parser. However, generalizing the subsampling module to work with any index name might be useful.
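A minimal sketch of the rename workaround, using plain pandas only. The data frame and the `has_duplicate_times` helper are hypothetical stand-ins: the helper only assumes that a time-based check looks up a column named `time` after resetting the index, which is enough to produce the same kind of `KeyError`. It is not meant as a copy of the alchemlyb code.

```python
import pandas as pd

# Hypothetical stand-in for a parsed GOMC data frame whose first index
# level is called "step"; the values are made up for illustration.
index = pd.MultiIndex.from_product(
    [[0, 1000, 2000, 3000], [0.0]], names=["step", "fep-lambda"]
)
dHdl = pd.DataFrame({"fep": [1.0, 1.2, 0.9, 1.1]}, index=index)


def has_duplicate_times(df):
    """Hypothetical check that assumes an index level named 'time'."""
    return df.sort_index(level=0).reset_index(level=0).duplicated("time").any()


# With "step" as the level name, the lookup of "time" fails.
try:
    has_duplicate_times(dHdl)
    raised = False
except KeyError:
    raised = True

# Renaming the index level leaves the data untouched and makes the check work.
renamed = dHdl.rename_axis(index={"step": "time"})
duplicates = has_duplicate_times(renamed)
```

Note that the rename only changes the label: the values are still MC step numbers, so anything downstream that interprets `time` in physical units would need to be aware of that.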

Is it possible to have a function that performs slicing on a concatenated data frame?
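One way such a function could look, as a sketch in plain pandas rather than an existing alchemlyb function. The name `slice_concatenated`, its parameters, and the `time_level` argument are all assumptions made for illustration; the idea is to group the concatenated frame by every index level except the time-like one and slice each group separately.

```python
import numpy as np
import pandas as pd


def make_frame(lam):
    """Build a small made-up frame for one lambda value."""
    index = pd.MultiIndex.from_product(
        [[0.0, 10.0, 20.0, 30.0], [lam]], names=["time", "fep-lambda"]
    )
    return pd.DataFrame({"fep": [1.0, 2.0, 3.0, 4.0]}, index=index)


def slice_concatenated(df, lower=None, upper=None, step=None, time_level="time"):
    """Slice each lambda window of a concatenated frame (hypothetical helper).

    Rows with lower <= time <= upper are kept, then every step-th row.
    Requires at least one index level besides the time-like one.
    """
    other_levels = [name for name in df.index.names if name != time_level]
    pieces = []
    for _, group in df.groupby(level=other_levels, sort=False):
        group = group.sort_index(level=time_level)
        times = group.index.get_level_values(time_level)
        keep = np.ones(len(group), dtype=bool)
        if lower is not None:
            keep &= times >= lower
        if upper is not None:
            keep &= times <= upper
        pieces.append(group[keep].iloc[::step])
    return pd.concat(pieces)


combined = pd.concat([make_frame(0.0), make_frame(0.5), make_frame(1.0)])
sliced = slice_concatenated(combined, lower=10.0, step=2)
```

Because the time-like level is a parameter, the same sketch would also accept `time_level="step"` for a frame indexed by MC step.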

@orbeckst
Member

orbeckst commented May 3, 2019

The dataframes are standardized as described in the documentation on standard forms of raw data, and that standard requires "time" to be included. The standardization is crucial to make all parts of alchemlyb work seamlessly together, and changing it might not be easy.

I see two options:

  1. You could rename step to time in PR #78 (Add gomc parser). You should add tests that show that the parsed data can be processed with subsampling and estimators.
  2. You could propose a change to all the subsampling and estimator modules to be more flexible with respect to the "time" column. For that, submit a PR that shows that this change does not break everything else, and we will review it then. I won't promise that it will get merged: it looks as if it could be a big change with a narrow range of applicability. But that's a discussion we would have on the PR when we see actual code and tests.
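For the tests mentioned in option 1, a first sanity check could look like the sketch below. The helper name `check_time_index` is hypothetical, and it only verifies the one requirement discussed in this thread (a leading index level named `time` with numeric values); real tests would also run the parsed data through the subsampling functions and estimators.

```python
import pandas as pd


def check_time_index(df):
    """Return True if the first index level is named 'time' and is numeric."""
    names = list(df.index.names)
    if not names or names[0] != "time":
        return False
    times = df.index.get_level_values("time")
    return bool(pd.api.types.is_numeric_dtype(times))


# Made-up frames: one indexed by "step", one indexed by "time".
step_index = pd.MultiIndex.from_product(
    [[0, 1000, 2000], [0.0]], names=["step", "fep-lambda"]
)
by_step = pd.DataFrame({"fep": [1.0, 1.2, 0.9]}, index=step_index)
by_time = by_step.rename_axis(index={"step": "time"})

step_ok = check_time_index(by_step)
time_ok = check_time_index(by_time)
```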

(You could argue that we should have included a column frame or step in addition to time in the raw data frame format. However, so far the lack of this column has not prevented the library from doing what it's supposed to. One would need examples where this information would be necessary.)

@orbeckst
Member

orbeckst commented May 3, 2019

Just to add: it would be great to make alchemlyb also work with MC data.

@orbeckst
Member

Is this issue still relevant or can it be closed?
