-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a validator for src => transform => dst #2866
Comments
@pdeffebach - can you give me please an example when we produce something that is not clear? I have checked and for incorrect selectors we produce:
|
I think the best suggestion for this was a function to help construct pairs.
There should also be functionality added to make This might also help with #2870 if the function syntax would make it easier to programatically create a list of |
Could you please explain what is the advantage of:
over
? For me it seems more verbose and less readable.
I am not fully clear what you mean here? Could you please provide an example of assumed input and expected output? Thank you! |
It is only advantageous if the new function (or package) can provide additional error checking, type checking, and validation. Ideally, it could tell you what part of your pair is invalid or needs broadcasted. |
OK - now I understand. The problem is that it is extremely hard (if not impossible). The problem is that Julia is very flexible what you can define to work (e.g. consider https://docs.julialang.org/en/v1/manual/methods/#Function-like-objects-1 which are objects, but are callable - you cannot detect such things by checking types unfortunately) But I will think what can be done. |
I'll call on @pdeffebach and @CameronBieganek to add their thoughts too. |
I'm afraid I don't know enough to be helpful with the implementation details. Could you start by supporting more basic types and just print a warning "Validation not yet supported" for functors and such? I imagine advanced users would construct the pairs themselves anyway. |
I think another problem is that because the rules are so complicated, having a validator be separate from the place where the error is would lead to code duplication and, at worst, divergent behavior. One thing we could also do is export a prettier version of |
We would just need to allow passing a data frame, or grouped data frame instead of an index. This is fully feasible. In short - I agree that it is hard. I have added one answer on SO https://stackoverflow.com/questions/69194267/failing-to-execute-column-transformation-in-dataframes-jl discussing the problem that is impossible to solve but is pretty common. |
Did |
No. I haven't gotten around to adding more features to |
An example what I mean. If you gave:
which is most likely not what the user wanted but rather My answer would be: If you gave:
which is most likely not what the user wanted but rather My answer would be: |
The best examples I have are linked in the discourse post which prompted this issue. I am currently writing a documentation PR for the "Basic Usage of Transformation Functions" section which I hope will also help. |
I have found in this thread only one example that you gave that uses
The answer for this case is: it is impossible to give a better error message. By the time As commented above - if you have any other problematic examples please let me know and I will check if they are fixable. E.g. #2867 was a concrete request and we have already fixed it. |
iirc OP had confusion about
which should fail on |
What is confusing here:
? For me the error seems pretty explicit: you passed a column selector |
One thing that is confusing is that the first argument of the pair is not what is passed to the function. In Another thing that is confusing, is how
The first example under
I am trying to help suggest fixes, but honestly I am still kind of confused on how everything works. I don't know the best way to fix all the hurdles I am presenting, but I doubt that rewording the existing error messages is the full solution to simplifying understanding of the mini-language syntax. |
Sidenote
But I got an error:
What is the correct way to submit my changes as a PR? Clicking "New Pull Request" on GitHub didn't seem to do anything. |
That might be a good idea, or at least a link. Although, even after reading the Indexing Doc, I would not have known it is meant to list what can be used as the first part of any transformation |
I was thinking that |
Better documentation - for sure we should do it. If I understand correctly you are working on this in your PR - right? Error checking - it is impossible unfortunately to do anything better than what we do now + additionally adding information on what is accepted.
You have to push changes to your clone of the repository on GitHub not to the master repository. You do not have permissions to write to it.
These functions are referenced here in the manual: https://github.com/JuliaData/DataFrames.jl/blob/main/docs/src/man/working_with_dataframes.md#subsetting-functions. As is explained in the last paragraph of this section The
It is enough for me that you report the problematic cases and we can work out how to fix them either by:
Regarding your
This is not possible the reason is that the syntax is. If
if
The crucial thing here why it is not possible is that when you pass a non-table input the columns are splatted. However, in short one can think of that the function gets the columns of
Now the question to you is how do you think we should explain it to the users so that it would be easily understandable. Thank you! |
I think it's important to remembers that new users don't know how to read errors and might not understand that there are multiple possible points of failure in a call.
This is a good error message, but at the end of the day it's still red text users don't read. We could give them a single function that lets them pinpoint at least one point of failure, the selection process which throws an error with
We could return a But if a new user can't read an error message in a I think this conversation definitely reveals actionable items about documentation. People need to know that |
Both
|
That is why we keep |
In order to improve readability of error messages
The text was updated successfully, but these errors were encountered: