How to filter data using assertr? #86
Comments
OK, my solution/kludge looks like this (please let me know if there's a better way):
Wow, this would be a really great feature!! Thanks for suggesting it! |
My solution looks like this now:
Not sure if that's actually better, but that's what I'm using. Basically, if an assertion fails, the row is filtered, processing continues without interruption, I get a message from tidylog telling me that rows were removed, and I get a data.frame attached as an attribute to the data that contains a full description of what failed, and why. |
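For readers following along, here is a minimal base-R sketch of the filter-and-record idea described above. It is not the poster's code: `filter_failures` and the `removed_rows` attribute are hypothetical names chosen for illustration, and it uses `message()` where the poster relied on tidylog.

```r
# Hypothetical helper (not part of assertr): drop rows whose `column` fails
# `predicate`, and append one record per dropped row to a "removed_rows"
# attribute so the pipeline can keep running and report at the end.
filter_failures <- function(data, column, predicate, description) {
  ok <- predicate(data[[column]])
  ok[is.na(ok)] <- FALSE                  # treat NA results as failures
  previous <- attr(data, "removed_rows")  # NULL on the first call

  failed <- data.frame(
    row    = rownames(data)[!ok],         # row names survive subsetting
    column = rep(column, sum(!ok)),
    value  = as.character(data[[column]][!ok]),
    reason = rep(description, sum(!ok)),
    stringsAsFactors = FALSE
  )
  if (any(!ok)) message("Removed ", sum(!ok), " row(s): ", description)

  out <- data[ok, , drop = FALSE]
  attr(out, "removed_rows") <- rbind(previous, failed)
  out
}

df <- data.frame(id = 1:4,
                 age = c(25, -3, 40, 200),
                 score = c(0.5, 0.7, NA, 0.9))

clean <- filter_failures(df, "age", function(x) x >= 0 & x <= 120,
                         "age within 0-120")
clean <- filter_failures(clean, "score", function(x) !is.na(x),
                         "score not missing")

removed <- attr(clean, "removed_rows")    # full record of what was dropped
```

Row names are used as the row identifier because positional indices shift once earlier rows have been removed.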
Quick question... |
Hmm, not sure. For my specific use case, it was better to filter in a single step, but I could have filtered at each step in the chain and built the error data.frame as the chain progressed. The advantage of the strategy I used is that different assertions may overlap in the rows they remove: a row might be bad for more than one reason, and it is good to know this. If you filter immediately after each step, you won't know that a row has more than one problem, and the reason recorded for its removal will depend on the order of the assertions in the code. Personal preference, but I prefer to know everything that was wrong with the data. |
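To make the trade-off concrete, here is a small base-R sketch of the "check everything first, filter once at the end" approach. The `checks` list and its labels are made up for illustration; nothing here is assertr API.

```r
df <- data.frame(id = 1:4,
                 age = c(25, -3, 40, 200),
                 score = c(0.5, 1.7, 0.2, 0.9))

# Hypothetical named checks: each returns TRUE for rows that pass.
checks <- list(
  "age within 0-120" = function(d) d$age >= 0 & d$age <= 120,
  "score within 0-1" = function(d) d$score >= 0 & d$score <= 1
)

# Run every check on the full, unfiltered data:
# a logical matrix with one row per data row and one column per check.
passed <- vapply(checks, function(check) check(df), logical(nrow(df)))

# One report line per (row, failed check) pair, so a row that fails
# several checks shows up several times.
failed <- which(!passed, arr.ind = TRUE)
report <- data.frame(
  id     = df$id[failed[, "row"]],
  reason = colnames(passed)[failed[, "col"]],
  stringsAsFactors = FALSE
)

# Filter once, at the end.
problems_per_row <- rowSums(!passed)
clean <- df[problems_per_row == 0, , drop = FALSE]
```

Here the row with `id` 2 fails both checks and appears twice in `report`. With step-by-step filtering it would be removed by whichever check ran first, and the second problem would go unrecorded.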
That makes total sense. I think it's important that there's an option to do it last. |
After further thought, it's best to filter offending rows in each step rather than all at the end. I encountered some situations in which a crash occurred because the input values to an assertion were nonsensical. For example, in a situation in which values are tested to see if they satisfy |
I made a bugfix to ensure that "values" are coerced to a consistent type before row binding. My code looks like this now:
|
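A minimal sketch of the coercion idea, not the poster's actual code: offending values from different columns can have different types (numeric, character, and so on), and stricter row-binding functions such as `dplyr::bind_rows` refuse to combine columns of mismatched types. Converting the value column to character first sidesteps that. The data frames and the `normalise_values` helper below are hypothetical.

```r
# Two hypothetical error records whose `value` columns differ in type.
errors_age  <- data.frame(column = "age",  index = 2L, value = -3,
                          stringsAsFactors = FALSE)
errors_name <- data.frame(column = "name", index = 5L, value = "??",
                          stringsAsFactors = FALSE)

# Coerce `value` to character so every record has the same column types.
normalise_values <- function(errors) {
  errors$value <- as.character(errors$value)
  errors
}

all_errors <- do.call(
  rbind,
  lapply(list(errors_age, errors_name), normalise_values)
)
```

Character is a reasonable common type for a report because any offending value can be displayed as text, at the cost of losing the original numeric type.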
Heads up. I think the latest version of assertr might be able to help you.
Let me know if this works for you. If not, please re-open the issue |
Hi, quick question: I want a different way to handle errors in my data; instead of halting execution when an error is detected (such as with error_stop and error_warn) or adding a special "assertr_errors" attribute to the data and continuing execution (such as with error_append), I want to filter rows (remove the rows with bad data) and report the errors so they can be displayed at the end of the pipeline. My use case is that I have a huge data.frame in a complex pipeline that typically takes days to run, so I need the pipeline to react dynamically and recover -- removing rows is acceptable -- instead of going crunch-bang after several hours of running. And it would be nice to have some record of which rows were removed, and why. Any ideas on how I could do this with assertr?
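One possible starting point, sketched under assumptions rather than offered as the recommended approach: run the assertion with `error_fun = error_append`, then read the "assertr_errors" attribute afterwards and drop the offending rows yourself. The sketch assumes each entry in that attribute carries an `error_df` data.frame with an `index` column giving the failing row positions; check this against your installed assertr version.

```r
library(assertr)

df <- data.frame(id = 1:5, x = c(1, -2, 3, 40, 5))

# error_append records failures on the data instead of stopping.
checked <- assert(df, within_bounds(0, 10), x, error_fun = error_append)

errs <- attr(checked, "assertr_errors")

# Assumption: each error object has an `error_df` with an `index` column.
bad_rows <- unique(unlist(lapply(errs, function(e) e$error_df$index)))

# Record of what failed and why, kept for reporting at the end.
report <- do.call(rbind, lapply(errs, function(e) e$error_df))

# Remove the offending rows and carry on with the pipeline.
clean <- df[setdiff(seq_len(nrow(df)), bad_rows), , drop = FALSE]
```

Because the indices refer to row positions at the time of the assertion, this works most simply when the rows are removed right after the check that flagged them.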