Fix parse timing by separating consume() into fetch() and consume() #2054

edemaine · 2019-07-19T20:38:28Z

Fix #1989, and generally cleanup parse timing (many fewer hoops to jump through) by defining two methods on parser:

consume() marks the current token (nextToken) as "done", but doesn't yet fetch the next token (setting nextToken to null).
fetch() fetches the next token if we don't already have one (e.g., if last token was consume()d).

Before this change, consume() did both actions. By separating them, and allowing the parser to live in a state for a while where nextToken is null, it is far easier to change settings (in particular, math/text mode and catcodes) before reading the next token, in a way that depends on what we're parsing. For example, if an argument should be treated in text mode, we can just set the mode in the argument parser, instead of when the previous token was consumed. Similarly, if an argument should be treated as a URL, we can just set the catcode of % in the URL argument parser, and reset it after. We no longer have to take care to reset things before calling consume().

This change mostly involves changing this.nextToken to this.fetch(). In a perfect world, we could use slightly fewer calls to this.fetch(), but Flow doesn't realize that this.nextToken will be non-null after a
call to this.fetch(), so we need to use a few more calls to this.nextToken() or a few more local nextToken variables.

I'm kind of impressed how such a major change could be relatively easy to do. It's been extremely helpful to have so many tests and type checking in our codebase. For reference, I previously tried to change all the methods to load the token when they need them, and not bother marking used tokens as consumed, so that nextToken is never null. This proved very difficult (maybe impossible) because we sometimes get nested deep within parseExpression and a single token needs to trigger all of them to terminate (e.g., \fbox{$\displaystyle x$}, which is what \boxed{x} expands to, the last $ token ends the \displaystyle implicit group and the math mode). So I opted for this approach, which is a much smaller (and more obviously correct) change to the code.

Fix KaTeX#1989, and generally cleanup parse timing (many fewer hoops to jump through) by defining two methods on parser: 1. `consume()` marks the current token (`nextToken`) as "done", but doesn't yet fetch the next token (setting `nextToken` to `null`). 2. `fetch()` fetches the next token if we don't already have one (e.g., if last token was `consume()`d). Before this change, `consume()` did both actions. By separating them, and allowing the parser to live in a state for a while where `nextToken` is `null`, it is far easier to change settings (in particular, math/text mode and catcodes) before reading the next token, in a way that depends on what we're parsing. For example, if an argument should be treated in text mode, we can just set the mode in the argument parser, instead of when the previous token was consumed. Similarly, if an argument should be treated as a URL, we can just set the catcode of `%` in the URL argument parser, and reset it after. We no longer have to take care to reset things before calling `consume()`. This change mostly involves changing `this.nextToken` to `this.fetch()`. In a perfect world, we could use slightly fewer calls to `this.fetch()`, but Flow doesn't realize that `this.nextToken` will be non-null after a call to `this.fetch()`, so we need to use a few more calls to `this.nextToken()` or a few more local `nextToken` variables.

codecov-io · 2019-07-19T20:40:43Z

Codecov Report

Merging #2054 into master will decrease coverage by 0.02%.
The diff coverage is 98.27%.

@@            Coverage Diff            @@
##           master   #2054      +/-   ##
=========================================
- Coverage   94.83%   94.8%   -0.03%     
=========================================
  Files          81      81              
  Lines        5281    5277       -4     
  Branches      923     924       +1     
=========================================
- Hits         5008    5003       -5     
- Misses        249     250       +1     
  Partials       24      24

Flag	Coverage Δ
#screenshotter	`89.68% <93.1%> (+0.01%)`	⬆️
#test	`88.53% <98.27%> (-0.03%)`	⬇️

Impacted Files	Coverage Δ
src/functions/text.js	`100% <ø> (ø)`	⬆️
src/defineFunction.js	`93.75% <ø> (ø)`	⬆️
src/environments/array.js	`98.43% <100%> (ø)`	⬆️
src/functions/math.js	`100% <100%> (ø)`	⬆️
src/Parser.js	`96.56% <98.14%> (-0.32%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 081b44a...3e3bf84. Read the comment docs.

kevinbarabash · 2019-08-20T11:03:30Z

This is great! I'll try to review this sometime this week.

ylemkimon · 2019-08-21T03:08:56Z

Thank you for the PR. This would allow manipulation of nextToken on the Parser level, which is needed fixing issues like #924!

src/Parser.js

edemaine · 2019-08-21T11:41:12Z

@ylemkimon Thanks for the review! I do think this will be a helpful step for parsing. I hadn't thought about #924 in a while, but it would be great if it helps with that!

Remove now-unnecessary consumeMath

2bec6db

Merge branch 'master' into fetch-consume

9f8c0c5

ylemkimon reviewed Aug 21, 2019

View reviewed changes

src/Parser.js Outdated Show resolved Hide resolved

Update Parser.js

3e3bf84

ylemkimon approved these changes Aug 21, 2019

View reviewed changes

ylemkimon merged commit 2a3013d into KaTeX:master Aug 21, 2019

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade katex from 0.9.0 to 0.11.1 saurabharch/Rocket.Chat#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix parse timing by separating consume() into fetch() and consume() #2054

Fix parse timing by separating consume() into fetch() and consume() #2054

edemaine commented Jul 19, 2019 •

edited

Loading

codecov-io commented Jul 19, 2019 •

edited

Loading

kevinbarabash commented Aug 20, 2019

ylemkimon commented Aug 21, 2019

edemaine commented Aug 21, 2019

Fix parse timing by separating consume() into fetch() and consume() #2054

Fix parse timing by separating consume() into fetch() and consume() #2054

Conversation

edemaine commented Jul 19, 2019 • edited Loading

codecov-io commented Jul 19, 2019 • edited Loading

Codecov Report

kevinbarabash commented Aug 20, 2019

ylemkimon commented Aug 21, 2019

edemaine commented Aug 21, 2019

edemaine commented Jul 19, 2019 •

edited

Loading

codecov-io commented Jul 19, 2019 •

edited

Loading