Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parse timing by separating consume() into fetch() and consume() #2054

Merged
merged 4 commits into from
Aug 21, 2019

Conversation

edemaine
Copy link
Member

@edemaine edemaine commented Jul 19, 2019

Fix #1989, and generally cleanup parse timing (many fewer hoops to jump through) by defining two methods on parser:

  1. consume() marks the current token (nextToken) as "done", but doesn't yet fetch the next token (setting nextToken to null).
  2. fetch() fetches the next token if we don't already have one (e.g., if last token was consume()d).

Before this change, consume() did both actions. By separating them, and allowing the parser to live in a state for a while where nextToken is null, it is far easier to change settings (in particular, math/text mode and catcodes) before reading the next token, in a way that depends on what we're parsing. For example, if an argument should be treated in text mode, we can just set the mode in the argument parser, instead of when the previous token was consumed. Similarly, if an argument should be treated as a URL, we can just set the catcode of % in the URL argument parser, and reset it after. We no longer have to take care to reset things before calling consume().

This change mostly involves changing this.nextToken to this.fetch(). In a perfect world, we could use slightly fewer calls to this.fetch(), but Flow doesn't realize that this.nextToken will be non-null after a
call to this.fetch(), so we need to use a few more calls to this.nextToken() or a few more local nextToken variables.

I'm kind of impressed how such a major change could be relatively easy to do. It's been extremely helpful to have so many tests and type checking in our codebase. For reference, I previously tried to change all the methods to load the token when they need them, and not bother marking used tokens as consumed, so that nextToken is never null. This proved very difficult (maybe impossible) because we sometimes get nested deep within parseExpression and a single token needs to trigger all of them to terminate (e.g., \fbox{$\displaystyle x$}, which is what \boxed{x} expands to, the last $ token ends the \displaystyle implicit group and the math mode). So I opted for this approach, which is a much smaller (and more obviously correct) change to the code.

Fix KaTeX#1989, and generally cleanup parse timing (many fewer hoops to jump
through) by defining two methods on parser:

1. `consume()` marks the current token (`nextToken`) as "done", but
   doesn't yet fetch the next token (setting `nextToken` to `null`).
2. `fetch()` fetches the next token if we don't already have one
   (e.g., if last token was `consume()`d).

Before this change, `consume()` did both actions.  By separating them,
and allowing the parser to live in a state for a while where `nextToken`
is `null`, it is far easier to change settings (in particular, math/text
mode and catcodes) before reading the next token, in a way that depends
on what we're parsing.  For example, if an argument should be treated in
text mode, we can just set the mode in the argument parser, instead of
when the previous token was consumed.  Similarly, if an argument should
be treated as a URL, we can just set the catcode of `%` in the URL
argument parser, and reset it after.  We no longer have to take care to
reset things before calling `consume()`.

This change mostly involves changing `this.nextToken` to `this.fetch()`.
In a perfect world, we could use slightly fewer calls to `this.fetch()`,
but Flow doesn't realize that `this.nextToken` will be non-null after a
call to `this.fetch()`, so we need to use a few more calls to
`this.nextToken()` or a few more local `nextToken` variables.
@codecov-io
Copy link

codecov-io commented Jul 19, 2019

Codecov Report

Merging #2054 into master will decrease coverage by 0.02%.
The diff coverage is 98.27%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #2054      +/-   ##
=========================================
- Coverage   94.83%   94.8%   -0.03%     
=========================================
  Files          81      81              
  Lines        5281    5277       -4     
  Branches      923     924       +1     
=========================================
- Hits         5008    5003       -5     
- Misses        249     250       +1     
  Partials       24      24
Flag Coverage Δ
#screenshotter 89.68% <93.1%> (+0.01%) ⬆️
#test 88.53% <98.27%> (-0.03%) ⬇️
Impacted Files Coverage Δ
src/functions/text.js 100% <ø> (ø) ⬆️
src/defineFunction.js 93.75% <ø> (ø) ⬆️
src/environments/array.js 98.43% <100%> (ø) ⬆️
src/functions/math.js 100% <100%> (ø) ⬆️
src/Parser.js 96.56% <98.14%> (-0.32%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 081b44a...3e3bf84. Read the comment docs.

@kevinbarabash
Copy link
Member

This is great! I'll try to review this sometime this week.

@ylemkimon
Copy link
Member

Thank you for the PR. This would allow manipulation of nextToken on the Parser level, which is needed fixing issues like #924!

src/Parser.js Outdated Show resolved Hide resolved
@ylemkimon ylemkimon merged commit 2a3013d into KaTeX:master Aug 21, 2019
@edemaine
Copy link
Member Author

@ylemkimon Thanks for the review! I do think this will be a helpful step for parsing. I hadn't thought about #924 in a while, but it would be great if it helps with that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

First character of array doesn't get its own group
4 participants