-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) Initial work on nimble_parsec generated lexer #448
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also scan for (and reject) reserved words?
elixir-thrift/src/thrift_lexer.xrl
Lines 39 to 52 in 1610f61
RESERVED1 = BEGIN|END|__CLASS__|__DIR__|__FILE__|__FUNCTION__|__LINE__ | |
RESERVED2 = __METHOD__|__NAMESPACE__|abstract|alias|and|args|as|assert|begin | |
RESERVED3 = break|case|catch|class|clone|continue|declare|def|default|del | |
RESERVED4 = delete|do|dynamic|elif|else|elseif|elsif|end|enddeclare|endfor | |
RESERVED5 = endforeach|endif|endswitch|endwhile|ensure|except|exec|finally | |
RESERVED6 = float|for|foreach|from|function|global|goto|if|implements|import | |
RESERVED7 = in|inline|instanceof|interface|is|lambda|module|native|new|next | |
RESERVED8 = nil|not|or|package|pass|public|print|private|protected|raise|redo | |
RESERVED9 = rescue|retry|register|return|self|sizeof|static|super|switch | |
RESERVED10 = synchronized|then|this|throw|transient|try|undef|unless|unsigned | |
RESERVED11 = until|use|var|virtual|volatile|when|while|with|xor|yield | |
RESERVED12 = {RESERVED1}|{RESERVED2}|{RESERVED3}|{RESERVED4}|{RESERVED5} | |
RESERVED13 = {RESERVED6}|{RESERVED7}|{RESERVED8}|{RESERVED9}|{RESERVED10} | |
RESERVED = {RESERVED11}|{RESERVED12}|{RESERVED13} |
|> label("literal") | ||
end | ||
|
||
defp literal_with(char) do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps quoted_literal(quote_char)
?
|> choice([ | ||
unsigned_number(), | ||
empty() | ||
|> error("expected number") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this reads a little better as (like you do above):
error(empty(), "expected number")
def identifier() do | ||
ascii_char([?a..?z, ?A..?Z, ?_]) | ||
|> repeat(ascii_char([?a..?z, ?A..?Z, ?_, ?0..?9])) | ||
|> reduce({List, :to_atom, []}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider extracting these three lines into their own function because we repeat them below.
@@ -0,0 +1,189 @@ | |||
defmodule Thrift.Parser.Nimble do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we'll have multiple parsecs in here? Otherwise, just Thrift.Parser.Lexer
(lib/thrift/parser/lexer.ex
) seems better.
@@ -83,6 +83,10 @@ defmodule Thrift.Mixfile do | |||
{:credo, "~> 1.0", only: :dev, runtime: false}, | |||
{:dialyxir, "~> 0.5", only: :dev, runtime: false}, | |||
|
|||
# Compile | |||
{:nimble_parsec, "~> 0.4", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.5 was just released.
|> ignore() | ||
|> concat( | ||
choice([ | ||
utf8_char([?\\]) |> ignore() |> concat(delim), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this handle embedded newlines, etc. like we support in the current lexer?
elixir-thrift/src/thrift_lexer.xrl
Lines 91 to 105 in 1610f61
% Process a quoted string by stripping its surrounding quote characters and | |
% expanding any escape sequences (prefixed by a \). To keep things simple, | |
% we're very lenient in that we allow any character to be escaped, and if the | |
% character isn't "special" (like \n), we just return the unescaped character. | |
% It might be nicer in the future to report "bad" escape characters, but that | |
% would involve complicating this logic to allow a top-level {error, Reason} | |
% result that could be returned to leex above. | |
process_string(S,Len) -> process_chars(lists:sublist(S, 2, Len-2)). | |
process_chars([$\\,$n|Chars]) -> [$\n|process_chars(Chars)]; | |
process_chars([$\\,$r|Chars]) -> [$\r|process_chars(Chars)]; | |
process_chars([$\\,$t|Chars]) -> [$\t|process_chars(Chars)]; | |
process_chars([$\\,C|Chars]) -> [C|process_chars(Chars)]; | |
process_chars([C|Chars]) -> [C|process_chars(Chars)]; | |
process_chars([]) -> []. |
@jparise I'm curious why those reserved words are reserved. What was the idea behind them? |
This is borrowed from Apache Thrift. The idea is that Thrift IDL shouldn't contain words that would clash with an output language's keywords when code generated. There are two possible approaches:
Apache Thrift generally does (1) (at least for C++'s reserved words), so we copied that behavior in our lexer for compatibility. In practice, we should only be concerned with Elixir's reserved words (which aren't much of a problem) and could take approach (2). In this project, reserved word rejection was added in #295 by @thecodeboss. |
Makes sense, it was a bit perplexing to see keywords for other languages in here. |
Initial work on a nimble parsec based lexer. Tried to design for nice errors and to avoid back tracking on happy path.