-
Notifications
You must be signed in to change notification settings - Fork 748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Hive Query language as an SQL dialect #880
Conversation
Just found time to implement your suggestions and for fixing the omissions you found. Thanks again for the review @dblessing ! Let me know what you think of these latest commits. |
Hi @dblessing , is there anything I can do to help this get merged? (Commenting from my personal account -- hope that doesn't lead to confusion.) |
Any update on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, I'm trying to assist in triaging pull requests to see if we can get the backlog down. For the most part your pull request looks fine, but see the comments below. Thanks for your submission...
@tkluck-booking @tkluck I've recently joined the maintainers group of Rouge and have been working my way chronologically backwards through the outstanding PRs. I'm sorry that's meant I'm only just getting to this now :( Are you still able to submit changes to this? @vidarh raises a couple of good points that are outstanding and I'm personally a little unclear on why some of the keywords were removed from the SQL lexer. Just wanted to check you're still in a position to respond. |
Hi @pyrmont congrats on stepping up as a maintainer! I'll be glad to update the PR with these comments. I'll aim for later this week. |
@tkluck-booking Thanks for the quick respond! Look forward to hearing more fully from you soon! :) |
I don't understand the difference between the files in `lib/rouge/demos` and `spec/visual/samples`, but I'm just cargo-culting having the same file in both.
I added the distinction between keywords_type and keywords to HQL which subclasses SQL. @dblessing makes the excellent suggestion to move that up to the SQL class, which might benefit too. This commit does that. That also means adding a list of type keywords to the SQL class. I opted to take MySQL's list, which I'm more familiar with than other dialects. Choice of dialect is an arbitrary choice anyway. This *mostly* adds keywords but a few keywords were moved from keywords to keywords_type. SET is a tricky one, as it's also a statement, but highlighting it as a type works for what's probably the more prevalent occurrence.
As discussed in rouge-ruby#880, we're guessing it wasn't a conscious decision to highlight a literal string as Name::Variable, so we may as well fix this in SQL as opposed to fixing it by an override in the subclass.
This is a change that has been made more generally in the SQL lexer since this feature branch was started. Applying it here for consistency.
As noticed by @vidarh, this better represents the existence of many different interpretations in different SQL dialects. The suggestion in the code review was to maybe make this an optional argument (e.g. `dialect`) to the lexer, but since we only have two dialects at the moment, I feel it's clearer to stick with subclassing as a way to define dialects for now.
cfc3ceb
to
b39605e
Compare
Just force-pushed a branch with the conflicts resolved and the suggestions implemented. Let me know if anything else is needed to merge this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if anything is unclear!
As suggested by @pyrmont in rouge-ruby#880 (comment) and as is also done in Pygments and Chroma.
Both of these are (part of) type specifications as well, and so we have to choose where to put them. In f3311f1, I let the interpretation as a type always take precedence. This commit applies a subjective rule-of-least-suprise instead: - Most people will run into COLLATE as a function, not as part of a unicode collation rule for a column - Most people will run into SET as part of an UPDATE statement, not as part of a CHARACTER SET specification I chose not to move YEAR and TIMESTAMP from that same commit. They are functions as well as types, but their use as a function is so closely related to the type that there's no confusion to be feared. I didn't move NATIONAL (from NATIONAL CHAR), NCHAR and PRECISION. They were moved to be types in the same commit, but that's their only interpretation.
Will fix&push failing CI tomorrow (cest). |
Travis was failing because the lexer was using the SQL lexer's rules in the |
@tkluck-booking I think this is good to merge. Let me know if you think anything is missing. |
@pyrmont thanks for collaborating on this with me. I approve of & love your changes. Let's merge! |
Thanks for all your work, @tkluck-booking! It's good to support an additional language and thank you for improving the SQL lexer :) |
This adds syntax support for Hive Query Language (HQL), an sql dialect that is heavily used on Hadoop clusters. For reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual
This is a very naive attempt, extending the SQL lexer only by adding interpolation variables and by updating the list of reserved words/keywords. I'll be happy to hear your thoughts!