-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling column names directly? #6
Comments
Hi @Docbiz ! There will be a lot of corner cases:
BTW maybe the root problem why we may need this feature is because it is hard to associate |
It's certainly true that always and only referencing columns by their header row (where one exists) would be very error-prone. But I think it would be a nice option—I stress 'option'—to permit queries that reference columns by their header name. (The header names could be restricted to, say, not include the file's separator or The advantage of queries that use the names of the columns is that they can be understood on their own without the user having to keep flicking back and forth between the query and the table(s). That is: yes, it can be a bit frustrating to associate (My personal use case is wanting to casually reshape data without firing up something like R, but I think this would be a generally useful feature.) If you want, I'll try to find some time over the next few weeks to come up with a pull request? |
I like your idea with wrapping column names in backtick chars, it is better than underscore replacement which I had in mind. |
Please don't use some weird mysql-specific stuff. |
There are actually some aspects of this problem that I didn't think about before.
Anyway it looks like any possible implementation of this feature would be very tricky because of all these considerations. |
Maybe there is actually an interesting trick to achieve this, currently RBQL uses a1, a2, ... notation. But if we declare special
|
Another solution could be to translate the query string with column names into a query string with a1, a2, an. Then feed that into the current engine. select name, age where age > 2 By reading header you can build dictionary {name:1,age:2} No core engine requirements changed. |
@peheje This will not work because we can't reliably "translate" the query string. Think about name collisions between column names and built-in python/js keywords, identifiers and popular modules e.g. columns named "if", "sys", "os", "or", "and" etc. So we can't just replace all tokens in a query that match column names in the csv file. One way to work around this is to require all column names to be wrapped in backticks as @Docbiz proposed, then we can make the translation work reliably. But I think it is possible to totally avoid the translation by combining notations 2 and 3. |
I think you misunderstood. This would all be string manipulation. No
collisions chance regardless of language.
If a column was named "null" we would look at the header string. Find the
column with name null, count the number let's says 2 and 'intelligently'
replace that "null" from the query string with a2.
Let me know if I should make an example.
…On Sat, Jun 29, 2019, 15:56 Dmitry Ignatovich ***@***.***> wrote:
@peheje <https://github.com/peheje> This will not work because we can't
reliably "translate" the query string. Think about name collisions between
column names and built-in python/js keywords, identifiers and popular
modules e.g. columns named "if", "sys", "os", "or", "and" etc. So we can't
just replace all tokens in a query that match column names in the csv file.
One way to work around this is to require all column names to be wrapped in
backticks as @Docbiz <https://github.com/docbiz> proposed, then we can
make the translation work reliably. But I think it is possible to totally
avoid the translation by combining notations 2 and 3.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#6?email_source=notifications&email_token=ABXPXFSATDI3LOBHKMEWPATP45SXFA5CNFSM4GJHIA72YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODY3ZHHA#issuecomment-506958748>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABXPXFW3R4OJ4Z32KFQCTA3P45SXFANCNFSM4GJHIA7Q>
.
|
OK, That's how I understand your proposal (and problems associated with it): let's say we have a table:
And the query (in javascript) is:
which is obviously an error, a3. stringify(...) will throw an exception because a3 is a string that doesn't have stringify() function. So how do you suggest to alter that algorithm to avoid the problem? |
I admit, I didn't think of that. However the user could be forced to specify where the column is coming from: a or b. Required in the case of join anyway. Maybe this could be a solution?
I'm know sure how to handle spaces in column names though, maybe unsupported? |
Yep, I think that a.name, a.age, b.name, etc notations are the way to go.
The output would be So to implement this "a.name" etc notations I suggest to initialize special "a" and "b" objects (if either "a." or "b." were detected somewhere in the query). This should be relatively easy to do - almost the same code that currently initializes a1, a2, b1 etc variables. "a" and "b" can also support square bracket access like a["person name"]. Those who think that this is super ugly and non-SQLish can pretend that this feature doesn't exist and only columns without whitespaces/special characters are supported. I don't have 100% guarantee that this approach would work, but I currently don't see why it wouldn't, and the only way to be sure is to try and implement this. |
Doesn't address this issue directly, but something that might be worth considering is to simply allow skipping of the header row when the query is performed. |
@fahnzmode I was planning to do this at some point. Implementing an additional "--skip-header" option should be easy, but it has to be supported in all user interfaces - this is the hard part. Also I would like to add support to NL variable simultaneously - which would be at least NR + 1 in this case (or even greater if some records span multiple lines). BTW until it's implemented you can use |
Thanks for the |
So, I wrote the code to support PS I rewrote JS implementation: replaced all callbacks with async/await so it should be much easier to read now. |
Column name support was added for all RBQL apps that I am supporting. Many thanks to everyone who participated in this discussion! I am not sure if RBQL would ever support column names without "a." and "b." prefixes, still we can open a new issue for this. |
Typically, a csv file will have a header row (e.g.
carat,cut,color,clarity,depth,table,price,x,y,z
for the diamonds data set fromggplot2
).Is there a chance of adding an option to perform RBQL queries directly with the columns (e.g.
carat
) instead of having to reference them by their index (e.g.a1
)?The text was updated successfully, but these errors were encountered: