Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hive Query language as an SQL dialect #880

Merged
merged 13 commits into from
Aug 14, 2019

Conversation

tkluck-booking
Copy link
Contributor

This adds syntax support for Hive Query Language (HQL), an sql dialect that is heavily used on Hadoop clusters. For reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

This is a very naive attempt, extending the SQL lexer only by adding interpolation variables and by updating the list of reserved words/keywords. I'll be happy to hear your thoughts!

lib/rouge/lexers/hql.rb Show resolved Hide resolved
lib/rouge/lexers/hql.rb Outdated Show resolved Hide resolved
@tkluck-booking
Copy link
Contributor Author

tkluck-booking commented Apr 16, 2018

Just found time to implement your suggestions and for fixing the omissions you found. Thanks again for the review @dblessing ! Let me know what you think of these latest commits.

@tkluck
Copy link

tkluck commented Jul 26, 2018

Hi @dblessing , is there anything I can do to help this get merged?

(Commenting from my personal account -- hope that doesn't lead to confusion.)

@zoidyzoidzoid
Copy link
Contributor

Any update on this?

Copy link
Contributor

@vidarh vidarh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I'm trying to assist in triaging pull requests to see if we can get the backlog down. For the most part your pull request looks fine, but see the comments below. Thanks for your submission...

lib/rouge/lexers/sql.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/sql.rb Outdated Show resolved Hide resolved
@pyrmont pyrmont added the needs-review The PR needs to be reviewed label Jul 18, 2019
@pyrmont
Copy link
Contributor

pyrmont commented Aug 5, 2019

@tkluck-booking @tkluck I've recently joined the maintainers group of Rouge and have been working my way chronologically backwards through the outstanding PRs. I'm sorry that's meant I'm only just getting to this now :(

Are you still able to submit changes to this? @vidarh raises a couple of good points that are outstanding and I'm personally a little unclear on why some of the keywords were removed from the SQL lexer. Just wanted to check you're still in a position to respond.

@pyrmont pyrmont added author-action The PR has been reviewed but action by the author is needed and removed needs-review The PR needs to be reviewed labels Aug 5, 2019
@tkluck-booking
Copy link
Contributor Author

Hi @pyrmont congrats on stepping up as a maintainer! I'll be glad to update the PR with these comments. I'll aim for later this week.

@pyrmont
Copy link
Contributor

pyrmont commented Aug 5, 2019

@tkluck-booking Thanks for the quick respond! Look forward to hearing more fully from you soon! :)

I don't understand the difference between the files in `lib/rouge/demos` and
`spec/visual/samples`, but I'm just cargo-culting having the same file in both.
I added the distinction between keywords_type and keywords to
HQL which subclasses SQL. @dblessing makes the excellent suggestion to
move that up to the SQL class, which might benefit too.

This commit does that. That also means adding a list of type keywords
to the SQL class. I opted to take MySQL's list, which I'm more familiar
with than other dialects. Choice of dialect is an arbitrary choice
anyway.

This *mostly* adds keywords but a few keywords were moved from
keywords to keywords_type. SET is a tricky one, as it's also a
statement, but highlighting it as a type works for what's probably the
more prevalent occurrence.
As discussed in rouge-ruby#880, we're guessing it wasn't a conscious decision
to highlight a literal string as Name::Variable, so we may as well
fix this in SQL as opposed to fixing it by an override in the subclass.
This is a change that has been made more generally in the SQL
lexer since this feature branch was started. Applying it
here for consistency.
As noticed by @vidarh, this better represents the existence of many
different interpretations in different SQL dialects.

The suggestion in the code review was to maybe make this an optional
argument (e.g. `dialect`) to the lexer, but since we only have two
dialects at the moment, I feel it's clearer to stick with subclassing
as a way to define dialects for now.
@tkluck-booking
Copy link
Contributor Author

Just force-pushed a branch with the conflicts resolved and the suggestions implemented. Let me know if anything else is needed to merge this!

Copy link
Contributor

@pyrmont pyrmont left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if anything is unclear!

lib/rouge/lexers/sql.rb Outdated Show resolved Hide resolved
lib/rouge/lexers/sql.rb Outdated Show resolved Hide resolved
spec/visual/samples/hql Outdated Show resolved Hide resolved
This should have been part of b39605e. Thanks to @pyrmont's
code review.
As suggested by @pyrmont in rouge-ruby#880 (comment)
and as is also done in Pygments and Chroma.
Both of these are (part of) type specifications as well, and so
we have to choose where to put them. In  f3311f1, I let
the interpretation as a type always take precedence.

This commit applies a subjective rule-of-least-suprise instead:

 - Most people will run into COLLATE as a function, not as part
   of a unicode collation rule for a column
 - Most people will run into SET as part of an UPDATE statement,
   not as part of a CHARACTER SET specification

I chose not to move YEAR and TIMESTAMP from that same commit.
They are functions as well as types, but their use as a function
is so closely related to the type that there's no confusion to be
feared.

I didn't move NATIONAL (from NATIONAL CHAR), NCHAR and PRECISION. They
were moved to be types in the same commit, but that's their only
interpretation.
@tkluck
Copy link

tkluck commented Aug 13, 2019

Will fix&push failing CI tomorrow (cest).

@pyrmont
Copy link
Contributor

pyrmont commented Aug 14, 2019

Travis was failing because the lexer was using the SQL lexer's rules in the :double_string state and those tokenise " as Name::Variable. I added a rule to override this but also took the opportunity to change the tokens to Str::Double. Visually, these should almost always look the same, but it's more semantically correct. (If you think it's better to have single-quoted and double-quoted strings use the same token, the Str would be more correct.)

@pyrmont
Copy link
Contributor

pyrmont commented Aug 14, 2019

@tkluck-booking I think this is good to merge. Let me know if you think anything is missing.

@tkluck-booking
Copy link
Contributor Author

@pyrmont thanks for collaborating on this with me. I approve of & love your changes. Let's merge!

@pyrmont pyrmont merged commit 30e1e8c into rouge-ruby:master Aug 14, 2019
@pyrmont pyrmont removed the author-action The PR has been reviewed but action by the author is needed label Aug 14, 2019
@pyrmont
Copy link
Contributor

pyrmont commented Aug 14, 2019

Thanks for all your work, @tkluck-booking! It's good to support an additional language and thank you for improving the SQL lexer :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants