Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) Initial work on nimble_parsec generated lexer #448

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fishcakez
Copy link
Member

Initial work on a nimble parsec based lexer. Tried to design for nice errors and to avoid back tracking on happy path.

Copy link
Collaborator

@jparise jparise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also scan for (and reject) reserved words?

RESERVED1 = BEGIN|END|__CLASS__|__DIR__|__FILE__|__FUNCTION__|__LINE__
RESERVED2 = __METHOD__|__NAMESPACE__|abstract|alias|and|args|as|assert|begin
RESERVED3 = break|case|catch|class|clone|continue|declare|def|default|del
RESERVED4 = delete|do|dynamic|elif|else|elseif|elsif|end|enddeclare|endfor
RESERVED5 = endforeach|endif|endswitch|endwhile|ensure|except|exec|finally
RESERVED6 = float|for|foreach|from|function|global|goto|if|implements|import
RESERVED7 = in|inline|instanceof|interface|is|lambda|module|native|new|next
RESERVED8 = nil|not|or|package|pass|public|print|private|protected|raise|redo
RESERVED9 = rescue|retry|register|return|self|sizeof|static|super|switch
RESERVED10 = synchronized|then|this|throw|transient|try|undef|unless|unsigned
RESERVED11 = until|use|var|virtual|volatile|when|while|with|xor|yield
RESERVED12 = {RESERVED1}|{RESERVED2}|{RESERVED3}|{RESERVED4}|{RESERVED5}
RESERVED13 = {RESERVED6}|{RESERVED7}|{RESERVED8}|{RESERVED9}|{RESERVED10}
RESERVED = {RESERVED11}|{RESERVED12}|{RESERVED13}

|> label("literal")
end

defp literal_with(char) do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps quoted_literal(quote_char)?

|> choice([
unsigned_number(),
empty()
|> error("expected number")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this reads a little better as (like you do above):

error(empty(), "expected number")

def identifier() do
ascii_char([?a..?z, ?A..?Z, ?_])
|> repeat(ascii_char([?a..?z, ?A..?Z, ?_, ?0..?9]))
|> reduce({List, :to_atom, []})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider extracting these three lines into their own function because we repeat them below.

@@ -0,0 +1,189 @@
defmodule Thrift.Parser.Nimble do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we'll have multiple parsecs in here? Otherwise, just Thrift.Parser.Lexer (lib/thrift/parser/lexer.ex) seems better.

@@ -83,6 +83,10 @@ defmodule Thrift.Mixfile do
{:credo, "~> 1.0", only: :dev, runtime: false},
{:dialyxir, "~> 0.5", only: :dev, runtime: false},

# Compile
{:nimble_parsec, "~> 0.4",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.5 was just released.

|> ignore()
|> concat(
choice([
utf8_char([?\\]) |> ignore() |> concat(delim),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this handle embedded newlines, etc. like we support in the current lexer?

% Process a quoted string by stripping its surrounding quote characters and
% expanding any escape sequences (prefixed by a \). To keep things simple,
% we're very lenient in that we allow any character to be escaped, and if the
% character isn't "special" (like \n), we just return the unescaped character.
% It might be nicer in the future to report "bad" escape characters, but that
% would involve complicating this logic to allow a top-level {error, Reason}
% result that could be returned to leex above.
process_string(S,Len) -> process_chars(lists:sublist(S, 2, Len-2)).
process_chars([$\\,$n|Chars]) -> [$\n|process_chars(Chars)];
process_chars([$\\,$r|Chars]) -> [$\r|process_chars(Chars)];
process_chars([$\\,$t|Chars]) -> [$\t|process_chars(Chars)];
process_chars([$\\,C|Chars]) -> [C|process_chars(Chars)];
process_chars([C|Chars]) -> [C|process_chars(Chars)];
process_chars([]) -> [].

@scohen
Copy link
Collaborator

scohen commented Dec 13, 2018

@jparise I'm curious why those reserved words are reserved. What was the idea behind them?

@jparise
Copy link
Collaborator

jparise commented Dec 13, 2018

@jparise I'm curious why those reserved words are reserved. What was the idea behind them?

This is borrowed from Apache Thrift. The idea is that Thrift IDL shouldn't contain words that would clash with an output language's keywords when code generated. There are two possible approaches:

  1. Prevent the IDL from ever containing languages' keywords (across a large set of supported languages).
  2. Let the IDL contain close to anything, and deal with reserved words on a per-generator basis. The generator would have to recognize its own reserved words and renamed them accordingly.

Apache Thrift generally does (1) (at least for C++'s reserved words), so we copied that behavior in our lexer for compatibility. In practice, we should only be concerned with Elixir's reserved words (which aren't much of a problem) and could take approach (2).

In this project, reserved word rejection was added in #295 by @thecodeboss.

@scohen
Copy link
Collaborator

scohen commented Dec 14, 2018

Makes sense, it was a bit perplexing to see keywords for other languages in here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants