Complete Indent Block Parsing
Related: Blocks-Instead-of-Brackets, Significant-Whitespace-Design
CaffeineScript is founded on the idea that indent-block parsing can be done consistently and universally throughout the language. Other indent-based languages (Python, CoffeeScript) run a pre-pass that essentially inserts "{" and "}" brackets around every detected block. The problem is that the parser does not actually understand indentation, and the pre-pass does not understand grammatical structure. I found this approach complex and error-prone, particularly for constructs such as string-blocks with interpolation.
- Parsing Python
- CoffeeScript's Lexer (search for 'INDENT' and 'OUTDENT')
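To make the pre-pass idea concrete, here is a minimal sketch in Python. It is a simplified illustration, not the actual Python or CoffeeScript lexer: the function name `insert_block_tokens` is hypothetical, the `INDENT`/`OUTDENT` marker names are borrowed from the CoffeeScript lexer mentioned above, and the sketch assumes space-only, consistent indentation.

```python
def insert_block_tokens(source: str) -> list[str]:
    """Line-based pre-pass: wrap every indented block in INDENT/OUTDENT markers.

    Hypothetical helper for illustration; assumes spaces only and that every
    dedent returns to a width that was previously opened.
    """
    out = []
    stack = [0]  # indentation widths of the currently open blocks
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines do not affect block structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the current block: open a new one.
            stack.append(width)
            out.append("INDENT")
        while width < stack[-1]:
            # Shallower than the current block: close blocks until we match.
            stack.pop()
            out.append("OUTDENT")
        out.append(line.strip())
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        out.append("OUTDENT")
    return out


tokens = insert_block_tokens('if foo\n  bar()\n  baz ""\n    boom()\n  bam()\n')
```

Note that the pre-pass wraps `boom()` in `INDENT`/`OUTDENT` exactly as it would a block of code. It has no way of knowing that the block under `baz ""` is meant to be string content, which is the kind of mismatch between pre-pass and grammar described above.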
It took me several months to figure out how to achieve "complete indent-block parsing" efficiently. My answer was to combine parsing expression grammars (PEGs) with "subparsing." While parsing, when a block start is expected and detected, a new parser is instantiated and run over the deindented source text of that block. Subparsing itself is relatively straightforward, but it only works with PEGs: because they combine lexing and parsing into one step, a subparser can be started directly on raw source text.
Subparsing Example:
# input:
if foo
  bar()
  baz ""
    boom()
  bam()
# deindented, subparsed block #1, parsing rule: statements
bar()
baz ""
  boom()
bam()
# deindented, subparsed block #2, parsing rule: string
boom()
Output:
if (foo) {
  bar();
  baz("boom()");
  bam();
}
Because a new subparser is started for each block, each block can be parsed with whatever rule fits its context, independently of the rule used for the surrounding code. CaffeineScript uses this for string-blocks, comment-blocks and regexp-blocks.
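The subparsing example above can be reproduced with a toy sketch in Python. This is not how CaffeineScript or the parser library is implemented: the names `split_blocks`, `parse_statements` and `parse_string` are hypothetical, the "grammar" only knows `if` statements, calls ending in an empty string literal, and plain statements, and it assumes space-only indentation. What it does show is the core move: each indented block is deindented and handed to a fresh parse with a rule chosen by the enclosing context.

```python
import json
import textwrap


def split_blocks(source: str):
    """Yield (header_line, deindented_block_text) for each top-level line.

    Assumes `source` is already deindented, so top-level lines start in column 0.
    """
    lines = [line for line in source.splitlines() if line.strip()]
    i = 0
    while i < len(lines):
        j = i + 1
        # Every following line that is still indented belongs to this header.
        while j < len(lines) and lines[j].startswith(" "):
            j += 1
        # Deindent the block so the subparser sees it as standalone source.
        yield lines[i], textwrap.dedent("\n".join(lines[i + 1:j]))
        i = j


def parse_string(source: str) -> str:
    """Rule 'string': the whole block is literal text, never code."""
    return json.dumps(source)


def parse_statements(source: str) -> str:
    """Rule 'statements': one statement per top-level line."""
    out = []
    for header, block in split_blocks(source):
        if header.startswith("if "):
            # The block under an `if` is subparsed with the statements rule.
            body = textwrap.indent(parse_statements(block), "  ")
            out.append(f"if ({header[3:]}) {{\n{body}\n}}")
        elif header.endswith(' ""'):
            # The block under an empty string literal is subparsed as a string.
            out.append(f"{header[:-3]}({parse_string(block)});")
        else:
            out.append(f"{header};")
    return "\n".join(out)


print(parse_statements('if foo\n  bar()\n  baz ""\n    boom()\n  bam()'))
```

The same indented text, `boom()`, would have been emitted as a statement had it appeared under the `if`; under `baz ""` it becomes string content, because the rule is picked by the context that started the block rather than by a separate indentation pre-pass.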
The result is my Caffeine8 parser library. This library stands on its own. You can use it to write your own parsers, optionally with complete-indent-block-parsing support.