-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added buffer in get_ch() for line continuation lookahead #136
Added buffer in get_ch() for line continuation lookahead #136
Conversation
pnut.c
Outdated
int ch; | ||
int chbuf[CHBUF_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how a buffer larger than 1 would be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure - I implemented it as the concept it was meant to be such that it was future proof if the tokenizer needed to do longer lookahead eventually, but it could be simplified down to just an integer variable and without the tail pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to keep it as simple as possible. We can always add the buffer and tail pointer back if we need it. Also, this should be in a #ifdef SUPPORT_LINE_CONTINUATION
block.
pnut.c
Outdated
chbuf_head = 0; // Set the pointer to the character buffer | ||
ch = '\\'; // Restore the character that was read on this call | ||
} else { // The character is a newline, so this is a line continuation which we want to bypass | ||
ch = fgetc(fp); // Consume yet another character, the next one for logical parsing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to increment the line and column number if INCLUDE_LINE_NUMBER_ON_ERROR
or the line count will get out of sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done - as a side note, I haven't looked extensively at the debug error messages, but it seems the debug output always says the error is one line up from where it actually is - but I'm guessing that's because the tokenizer is behind where the input is being consumed? Is that right/ expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error location can be off by a few lines when the code generator throws the error. That's because we parse each declaration fully before passing it to the code generator, so the line_number
and column_number
point to the last token of the declaration (the last }
for function declarations) which is often incorrect.
I've been thinking of annotating the AST objects with the values of line_number
and column_number
when the object is created. That would allow the code generator to throw a more precise error, it would probably still be off by a few characters but it would give a relatively good idea of what line caused the error.
When the error is thrown by the parser however, I haven't noticed the error location to be wrong. Do you have an example?
We need to check how much it costs to check |
Hi - I have ran the benchmark with and without the #ifdef guarding the check on The first run is without, the second run is with.
|
we don't want to slow down the bootstrap of `pnut.exe`, which is in
a certain sense the only benchmark we care about. There is also
something to be said for a simpler tokenizer so that the auditing of
Indeed, supporting \ at EOL is important for macros, but I've never seen
it used anywhere else than between tokens (and it should be trivial to
change the source code in the few cases where it might not be so
already), so I'm not sure this feature pays for itself.
|
pnut.c
Outdated
@@ -757,7 +763,35 @@ void output_declaration_c_code(bool no_header) { | |||
#endif | |||
|
|||
void get_ch() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could simplify the whole thing by separating get_ch
in 2:
- One function (let's call it
get_ch_
) that callsfgetc
, does the EOF handling and updates the location (likeget_ch
frommain
). - Another function (called
get_ch
) that callsget_ch_
that loops when it encounters a\\n
sequence.
That way we wouldn't need to repeat the location logic.
And when SUPPORT_LINE_CONTINUATION
is off, get_ch_
would simply be called get_ch
.
cfa2032
to
5eaa69f
Compare
9e00727
to
391e186
Compare
zsh doesn't handle octal and hex literals properly, so use a decimal literal instead.
Problem statement: #135
This PR aims to allow pnut.c::get_ch() as simply as possible to walk over line continuations while being compatible with the existing code for parsing escape sequences.
This support is optional and can be disabled by removing line 48:
#define SUPPORT_LINE_CONTINUATION