Data validation #3

Open
vkhodygo opened this issue Jun 20, 2022 · 0 comments
Labels
enhancement New feature or request question Further information is requested

Comments

@vkhodygo
Member

This code needs some data validation.

@dkopasker, please describe here what you expect from every variable in the raw data files: its valid range, whether NA or NaN values can occur, and so on. We also need to state clearly how the code handles such values. Common options include dropping the affected entries, telling the aggregate functions to ignore them, or replacing them with imputed values (for example, a mean or a median).
This approach should make the data analysis much more reproducible.
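
To make the three options above concrete, here is a minimal sketch in plain Python. The helper names (`is_missing`, `drop_missing`, `mean_ignoring_missing`, `impute_median`) and the `wages` sample are hypothetical and not taken from this repository. Which strategy is right for each variable still has to be decided and documented.

```python
import math
import statistics


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_missing(values):
    """Option 1: drop the affected entries."""
    return [v for v in values if not is_missing(v)]


def mean_ignoring_missing(values):
    """Option 2: let the aggregate skip missing entries."""
    present = drop_missing(values)
    return statistics.fmean(present) if present else math.nan


def impute_median(values):
    """Option 3: replace missing entries with the median of the rest."""
    present = drop_missing(values)
    if not present:
        return list(values)  # nothing to impute from
    fill = statistics.median(present)
    return [fill if is_missing(v) else v for v in values]


# Hypothetical sample column with one NaN and one None.
wages = [10.0, float("nan"), 14.0, None, 12.0]
```

Note that the options change the sample size differently: dropping shortens the column, while imputation keeps its length but reduces its variance, so the choice should be recorded next to each variable.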

We should also treat LABsim output as potentially corrupted, because the code itself is not properly tested, and constant changes to the code do not help either. This script must therefore notify the user whenever an input value falls outside its expected range.
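
A range check with user notification could look roughly like the sketch below. It is only an illustration: the variable names and bounds in `EXPECTED_RANGES` are made up, as are the function names, and the real expectations have to come from the variable descriptions requested above.

```python
import math
import warnings

# Hypothetical expectations, NOT the real LABsim variables or bounds.
EXPECTED_RANGES = {
    "age": (0, 120),
    "hours_worked": (0, 168),
}


def find_violations(rows, expected=EXPECTED_RANGES):
    """Return (row index, variable, value, reason) for every bad value."""
    violations = []
    for index, row in enumerate(rows):
        for name, (low, high) in expected.items():
            value = row.get(name)
            if value is None or (isinstance(value, float) and math.isnan(value)):
                violations.append((index, name, value, "missing"))
            elif not low <= value <= high:
                violations.append((index, name, value, "out of range"))
    return violations


def validate(rows, expected=EXPECTED_RANGES):
    """Warn the user about every violation, then return the full list."""
    violations = find_violations(rows, expected)
    for index, name, value, reason in violations:
        warnings.warn(f"row {index}: {name}={value!r} is {reason}")
    return violations
```

Emitting warnings rather than raising lets a run finish while still flagging every suspicious value. If corrupted input should stop the analysis instead, the caller can raise an exception when the returned list is non-empty.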

@vkhodygo vkhodygo added the enhancement New feature or request label Jun 20, 2022
@vkhodygo vkhodygo added the question Further information is requested label Jun 20, 2022