Complete Indent Block Parsing

Related: Blocks-Instead-of-Brackets

CaffeineScript is founded on the idea that indent-block parsing can be done consistently and universally throughout the language. Other indent-based languages (Python, CoffeeScript) resort to a hack: indent-blocks are inherently context-sensitive, but conventional parsers (LALR, for example) can only handle context-free syntax. So the lexer pass essentially inserts "{" and "}" brackets around each detected block, in the form of tokens such as INDENT and OUTDENT. This approach is fundamentally incompatible with indent-based comments, indent-based strings and other constructs that change how the contents of a block are parsed. It wouldn't work to insert "{" and "}" around a string-block, because by then the lexer has already tokenized the string's contents as code.

Parsing Python
CoffeeScript's Lexer (search for 'INDENT' and 'OUTDENT')
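
To make the lexer hack concrete, here is a minimal sketch of turning indentation into bracket-like tokens. It is not Python's or CoffeeScript's actual lexer; the function name tokenizeIndentation and the LINE token type are made up for illustration, and inconsistent dedents are ignored.

```javascript
// Hypothetical sketch: convert indentation changes into INDENT / OUTDENT
// tokens so that a context-free parser can treat them like "{" and "}".
function tokenizeIndentation(source) {
  const tokens = [];
  const indentStack = [0]; // indentation widths of the currently open blocks

  for (const line of source.split("\n")) {
    if (line.trim() === "") continue; // blank lines do not affect block structure

    const indent = line.length - line.trimStart().length;

    if (indent > indentStack[indentStack.length - 1]) {
      // Deeper than the current block: open a new one.
      indentStack.push(indent);
      tokens.push({ type: "INDENT" });
    }
    while (indent < indentStack[indentStack.length - 1]) {
      // Shallower: close blocks until the widths match again.
      // (A dedent to a width that was never opened is not reported here.)
      indentStack.pop();
      tokens.push({ type: "OUTDENT" });
    }
    // Every line is handed on as code, whatever block it sits in.
    tokens.push({ type: "LINE", text: line.trim() });
  }

  // Close any blocks still open at the end of the input.
  while (indentStack.length > 1) {
    indentStack.pop();
    tokens.push({ type: "OUTDENT" });
  }
  return tokens;
}
```

Note that every line comes out as the same kind of token. A line inside a string-block would be tokenized as code just like any other, which is exactly the limitation described above.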

It took me several months to figure out how to achieve "complete indent-block parsing" efficiently. My answer was to combine parsing expression grammars (PEGs) with 'sub-parsing.' Basically, while parsing, when a block start is expected and detected, a new parser is instantiated and run over the deindented source text of that block. While sub-parsing is relatively straightforward, it only works with PEGs, which combine lexing and parsing into one step, so the block's text is still raw when the sub-parser receives it.

Subparsing Example:

# input:
if foo
  bar()
  baz ""
    boom()
  bam()

# deindented, subparsed block #1, parsing rule: statements
bar()
baz ""
  boom()
bam()

# deindented, subparsed block #2, parsing rule: string
boom()

Output:

if (foo) {
  bar();
  baz("boom()");
  bam();
}

Because a new subparser is started for each block, each block can be parsed with whatever rule its context calls for. CaffeineScript uses this for string-blocks, comment-blocks and regexp-blocks.
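
The sub-parsing idea can be sketched in a few lines of plain JavaScript. This is a toy under stated assumptions, not BabelBridgeJS's API: the names extractBlock, indentText and rules are hypothetical, and the "grammar" only understands the three line shapes used in the example above (an if line, a call ending in "", and a plain statement).

```javascript
// Hypothetical toy, not BabelBridgeJS: each block is cut out of the source,
// deindented, and handed to a fresh parse with the rule the context expects.

const indentOf = (line) => line.length - line.trimStart().length;
const indentText = (text) => text.split("\n").map((l) => "  " + l).join("\n");

// Collect the lines nested under lines[startIndex] and strip their common
// indentation, so the block reads as standalone source text.
function extractBlock(lines, startIndex) {
  const parentIndent = indentOf(lines[startIndex]);
  const blockLines = [];
  let i = startIndex + 1;
  while (i < lines.length &&
         (lines[i].trim() === "" || indentOf(lines[i]) > parentIndent)) {
    blockLines.push(lines[i]);
    i++;
  }
  // Trailing blank lines are not part of the block's content.
  while (blockLines.length && blockLines[blockLines.length - 1].trim() === "") {
    blockLines.pop();
  }
  const nonBlank = blockLines.filter((l) => l.trim() !== "");
  const blockIndent = nonBlank.length ? Math.min(...nonBlank.map(indentOf)) : 0;
  return {
    source: blockLines.map((l) => l.slice(blockIndent)).join("\n"),
    nextIndex: i, // first line after the block
  };
}

// Each rule maps a block's source text to JavaScript output.
const rules = {
  // Rule "statements": parse line by line, sub-parsing nested blocks.
  statements(source) {
    const lines = source.split("\n");
    const out = [];
    let i = 0;
    while (i < lines.length) {
      const line = lines[i].trim();
      if (line === "") {
        i++;
      } else if (line.startsWith("if ")) {
        // The nested block is sub-parsed as statements.
        const block = extractBlock(lines, i);
        const body = rules.statements(block.source);
        out.push(`if (${line.slice(3)}) {\n${indentText(body)}\n}`);
        i = block.nextIndex;
      } else if (line.endsWith(' ""')) {
        // The nested block is sub-parsed as a string: its text is literal.
        const block = extractBlock(lines, i);
        out.push(`${line.slice(0, -3)}(${rules.string(block.source)});`);
        i = block.nextIndex;
      } else {
        out.push(`${line};`);
        i++;
      }
    }
    return out.join("\n");
  },

  // Rule "string": the whole block becomes one string literal.
  string(source) {
    return JSON.stringify(source);
  },
};
```

Calling rules.statements on the input from the example above goes through the same two sub-parses shown there: block #1 is parsed with the statements rule and block #2 with the string rule. The design point is that the choice of rule is made by the parent at the moment it detects the block, before anything has tokenized the block's contents.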

The result is my BabelBridgeJS parser library. This library stands on its own. You can use it to write your own parsers, optionally with complete-indent-block-parsing support.
