Complete Indent Block Parsing
Related: Blocks-Instead-of-Brackets
CaffeineScript is founded on the idea that indent-block parsing can be done consistently and universally throughout the language. Other indentation-based languages (Python, CoffeeScript) resort to hacks to parse indent-blocks, because indent-blocks are inherently context-sensitive and LALR parsers can only parse context-free syntax. The hack, essentially, is to insert "{" and "}" brackets (or equivalent INDENT/OUTDENT tokens) around the detected blocks during the lexer pass. This approach is fundamentally incompatible with indent-based comments, indent-based strings and other constructs that change how the contents of a block are parsed: it wouldn't work to insert "{" and "}" around a string-block.
- Parsing Python
- CoffeeScript's Lexer (search for 'INDENT' and 'OUTDENT')
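To make the lexer-pass hack concrete, here is a minimal JavaScript sketch of the idea. `insertBrackets` is a hypothetical helper written only for illustration, not code from CPython or CoffeeScript; real lexers emit INDENT/OUTDENT (or DEDENT) tokens rather than literal braces, and they also reject inconsistent dedents, which this sketch does not check.

```javascript
// Sketch of "insert brackets in the lexer pass": track a stack of indent
// widths and emit "{" when indentation grows and "}" when it shrinks,
// without knowing what kind of content the block is supposed to hold.
function insertBrackets(source) {
  const out = [];
  const indents = [0]; // stack of currently open indentation widths
  for (const line of source.split("\n")) {
    if (line.trim() === "") continue; // blank lines don't open or close blocks
    const width = line.length - line.trimStart().length;
    if (width > indents[indents.length - 1]) {
      indents.push(width);
      out.push("{"); // INDENT
    }
    while (width < indents[indents.length - 1]) {
      indents.pop();
      out.push("}"); // OUTDENT
    }
    out.push(line.trim());
  }
  // Close any blocks still open at end of input.
  while (indents.length > 1) {
    indents.pop();
    out.push("}");
  }
  return out.join(" ");
}

const src = ["if foo", "  bar()", '  baz ""', "    boom()", "  bam()"].join("\n");
console.log(insertBrackets(src));
// → if foo { bar() baz "" { boom() } bam() }
```

The lexer has no way to know that `baz ""` opens a string-block, so the string's content `boom()` gets bracketed exactly like a block of statements.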
It took me several months to figure out how to achieve "complete indent-block parsing" efficiently. My answer was to combine parsing expression grammars (PEGs) with "sub-parsing." Basically, while parsing, when a block-start is expected and detected, a new parser is instantiated and run over the deindented source text of that block. Subparsing is relatively straightforward, but it only works with PEGs, which combine lexing and parsing into one step.
Subparsing Example:
```
# input:
if foo
  bar()
  baz ""
    boom()
  bam()

# deindented, subparsed block #1, parsing rule: statements
bar()
baz ""
  boom()
bam()

# deindented, subparsed block #2, parsing rule: string
boom()
```
Output:
```javascript
if (foo) {
  bar();
  baz("boom()");
  bam();
}
```
Because a new subparser is started for each block, each block can be parsed with whatever rule it needs. CaffeineScript uses this for string-blocks, comment-blocks and regexp-blocks.
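To illustrate, here is a toy JavaScript sketch that reproduces the subparsing example above. The tiny grammar (`if <name>` blocks, `<name>()` calls, and `<name> ""` string-blocks) and the helper names `extractBlock`, `parseStatements` and `parseString` are hypothetical, made up for this illustration. It is hand-written recursive descent with regular expressions, not a PEG and not BabelBridgeJS's API; it only shows the key move of deindenting each block and handing it to a fresh call of whichever rule that block needs.

```javascript
// Toy grammar (hypothetical): `if <name>` + block, `<name>()`, `<name> ""` + string-block.
const indentOf = (line) => line.length - line.trimStart().length;

// Collect the lines indented deeper than lines[openerIndex], strip their common
// indentation, and return the block text plus the index of the next line.
// In this sketch a blank line ends the block.
function extractBlock(lines, openerIndex) {
  const openerIndent = indentOf(lines[openerIndex]);
  let end = openerIndex + 1;
  while (end < lines.length && indentOf(lines[end]) > openerIndent) end++;
  const body = lines.slice(openerIndex + 1, end);
  const blockIndent = body.length ? Math.min(...body.map(indentOf)) : 0;
  return { text: body.map((l) => l.slice(blockIndent)).join("\n"), next: end };
}

// Rule "string": the deindented block text is taken verbatim.
function parseString(text) {
  return JSON.stringify(text);
}

// Rule "statements": one statement per line; block-opening statements
// subparse their block with the rule that block requires.
function parseStatements(text) {
  const lines = text.split("\n");
  const out = [];
  let i = 0;
  while (i < lines.length) {
    const line = lines[i];
    let m;
    if ((m = /^if (\w+)$/.exec(line))) {
      const block = extractBlock(lines, i);
      const body = parseStatements(block.text) // subparse with rule: statements
        .split("\n")
        .map((l) => "  " + l)
        .join("\n");
      out.push(`if (${m[1]}) {\n${body}\n}`);
      i = block.next;
    } else if ((m = /^(\w+) ""$/.exec(line))) {
      const block = extractBlock(lines, i);
      out.push(`${m[1]}(${parseString(block.text)});`); // subparse with rule: string
      i = block.next;
    } else if ((m = /^(\w+)\(\)$/.exec(line))) {
      out.push(`${m[1]}();`);
      i++;
    } else {
      throw new SyntaxError(`cannot parse line: ${line}`);
    }
  }
  return out.join("\n");
}

const input = ["if foo", "  bar()", '  baz ""', "    boom()", "  bam()"].join("\n");
console.log(parseStatements(input));
// → if (foo) {
//     bar();
//     baz("boom()");
//     bam();
//   }
```

Because the string rule receives the raw deindented text, a string-block containing `if foo` stays a plain string instead of being parsed as code.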
The result is my BabelBridgeJS parser library. This library stands on its own. You can use it to write your own parsers, optionally with complete-indent-block-parsing support.