Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: annotated code blocks #65

Merged
merged 1 commit into from
Nov 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 127 additions & 44 deletions corpus/codeblock.txt
Original file line number Diff line number Diff line change
Expand Up @@ -24,13 +24,15 @@ block3:
(block
(line
(codeblock
(line))))
(code
(line)))))
Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm admittedly going in circles, but: I guess it makes sense for (code) to be named (block). Because the block+lines here (nested in codeblock) is analogous to non-code (block (line...)).

I'm not sure what the best practice/convention is for grammars: is it better to re-use names for similar concepts, or should we use globally unique names like (code (codeline)) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it makes sense for (code) to be named (block).

But you can't, since (block) is already defined at this point (as something different) -- you'll get a conflict in the grammar you need to resolve somehow. And a different name is by far the easiest way to do that.

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow. It can be aliased, as done elsewhere (e.g. we already alias to (line) here). And conflicts are based on the "fully qualified node path", irrespective of the node names.

Anyways let's go with (code (codeline)) I guess.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead, try it.

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried this, it works fine (no conflicts):

alias(repeat1(alias($.line_code, $.line)), $.block),

But this is just academic. If you're ok with (code (codeline)) , I think I favor that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do realize that the alias is to lines (plural) of code?

Copy link
Member

@justinmk justinmk Nov 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But clearly I'm missing something or we're miscommunicating. Note that (codeline) should be understood as any number of children, e.g.:

(code
    (codeline)
    (codeline)
    …)

If I still misunderstood, sorry. Like I said, my comments aren't blockers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat, yes, because I don't get what the end goal is here, so it feels like stabbing in the dark, which is frustrating.

(I do admit that my last comment was sheer pedantry, born out of said feeling.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I started this thread by literally saying "I'm admittedly going in circles". The goal was just to explore options.

Copy link
Member Author

@clason clason Nov 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just tried this, it works fine (no conflicts):

alias(repeat1(alias($.line_code, $.line)), $.block),

Interesting, that's what I tried. Looks like switching from optional to choice avoids the conflict then; that's good. Feel free to rename as you wish before tagging a release so I can downstream the queries. (And when you do, may want to update the description in the README, which I missed -- sorry!)

Note that (codeline) should be understood as any number of children, e.g.:
...
But this is just academic. If you're ok with (code (codeline)) , I think I favor that too.

There seems to be indeed some miscommunication. My point is that the contents must be a single block that you pass to the injected parser. Otherwise every single line gets parsed in isolation which a) is wasteful and b) leads to errors with constructs spanning multiple lines (as in my -- admittedly academic -- example above).

(block
(line
(word))
(line
(codeblock
(line))))
(code
(line)))))
(block
(line
(word)))
Expand All @@ -39,7 +41,8 @@ block3:
(word))
(line
(codeblock
(line))))
(code
(line)))))
(block
(line
(word))))
Expand Down Expand Up @@ -73,9 +76,10 @@ text
(word)
(word)
(codeblock
(line)
(line)
(line))))
(code
(line)
(line)
(line)))))
(block
(line
(word)
Expand All @@ -89,13 +93,51 @@ text
(word))
(line
(codeblock
(line)
(line)
(line))))
(code
(line)
(line)
(line)))))
(block
(line
(word))))

================================================================================
codeblock with language annotations
================================================================================
This code is in Lua: >lua
local foo = 'bar'
<

This code is in Vimscript: >vim
let foo = "bar"
<

--------------------------------------------------------------------------------

(help_file
(block
(line
(word)
(word)
(word)
(word)
(word)
(codeblock
(language)
(code
clason marked this conversation as resolved.
Show resolved Hide resolved
(line)))))
(block
(line
(word)
(word)
(word)
(word)
(word)
(codeblock
(language)
(code
(line))))))

================================================================================
codeblock with implicit stop
================================================================================
Expand Down Expand Up @@ -127,9 +169,10 @@ H3 HEADLINE *foo*
(block
(line
(codeblock
(line)
(line)
(line)))
(code
(line)
(line)
(line))))
(line
(h1
(word)
Expand All @@ -143,8 +186,9 @@ H3 HEADLINE *foo*
(block
(line
(codeblock
(line)
(line)))
(code
(line)
(line))))
(line
(h2
(word)
Expand All @@ -153,8 +197,9 @@ H3 HEADLINE *foo*
(block
(line
(codeblock
(line)
(line)))
(code
(line)
(line))))
(line
(h3
(uppercase_name)
Expand Down Expand Up @@ -185,15 +230,16 @@ x
(line
(word)
(codeblock
(line)
(line)
(line)
(line)
(line)
(line)
(line)
(line)
(line)))
(code
(line)
(line)
(line)
(line)
(line)
(line)
(line)
(line)
(line))))
(line
(word))))

Expand Down Expand Up @@ -224,19 +270,21 @@ tricky
(block
(line
(codeblock
(line)
(line)
(line))))
(code
(line)
(line)
(line)))))
(block
(line
(word)
(codeblock
(line)
(line)
(line)
(line)
(line)
(line))))
(code
(line)
(line)
(line)
(line)
(line)
(line)))))
(block
(line
(word))))
Expand Down Expand Up @@ -266,8 +314,9 @@ To test for a non-empty string, use empty(): >
(word)
(word)
(codeblock
(line)
(line)))
(code
(line)
(line))))
(line
(word)
(word)
Expand All @@ -290,8 +339,9 @@ To test for a non-empty string, use empty(): >
(word)
(word)
(codeblock
(line)
(line)))
(code
(line)
(line))))
(line
(word)
(word)
Expand All @@ -303,7 +353,8 @@ To test for a non-empty string, use empty(): >
(word)
(word)
(codeblock
(line)))))
(code
(line))))))

================================================================================
codeblock stop and start on same line
Expand All @@ -329,12 +380,14 @@ codeblock stop and start on same line
(line
(word)
(codeblock
(line))))
(code
(line)))))
(block
(line
(word)
(codeblock
(line))))
(code
(line)))))
(block
(line
(tag
Expand All @@ -350,8 +403,9 @@ codeblock stop and start on same line
(line
(word)
(codeblock
(line)
(line)))))
(code
(line)
(line))))))

================================================================================
NOT codeblock: whitespace after ">"
Expand All @@ -362,6 +416,11 @@ x >
x x>
line

Not a language annotation: > lua
line

Not a language annotation: >lua is not at EOL
line


--------------------------------------------------------------------------------
Expand All @@ -378,4 +437,28 @@ x x>
(word)
(word))
(line
(word))))
(word)))
(block
(line
(word)
(word)
(word)
(word)
(word)
(word))
(line
(word)))
(block
(line
(word)
(word)
(word)
(word)
(word)
(word)
(word)
(word)
(word)
(word))
(line
(word))))
3 changes: 2 additions & 1 deletion corpus/heading3-column_heading.txt
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,8 @@ column_heading should NOT parse atoms (links, tags, etc.) (FIXME)
(line
(word)
(codeblock
(line))))
(code
(line)))))
(block
(line
(column_heading
Expand Down
10 changes: 6 additions & 4 deletions corpus/line_block.txt
Original file line number Diff line number Diff line change
Expand Up @@ -201,7 +201,8 @@ listitem with codeblock
(word)
(word))
(codeblock
(line)))
(code
(line))))
(line_li
(line
(word)
Expand All @@ -215,9 +216,10 @@ listitem with codeblock
(line
(word)
(codeblock
(line)
(line)
(line))))
(code
(line)
(line)
(line)))))
(line_li
(line
(word))
Expand Down
8 changes: 5 additions & 3 deletions grammar.js
Original file line number Diff line number Diff line change
Expand Up @@ -121,11 +121,13 @@ module.exports = grammar({
repeat($._blank),
),

// Codeblock: preformatted block of lines starting with ">".
// Codeblock: preformatted block of lines starting with ">" or ">language" at EOL
codeblock: ($) => prec.right(seq(
'>',
token.immediate('\n'),
repeat1(alias($.line_code, $.line)),
choice(
alias(token.immediate(/[a-z0-9]+\n/), $.language),
token.immediate('\n')),
alias(repeat1(alias($.line_code, $.line)), $.code),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this for semantics or does it have a functional purpose? Maybe $.codeline fits better with the existing naming patterns.

Copy link
Member Author

@clason clason Nov 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary (as far as I can tell) so the whole block gets punted to the injected parser; you can't parse, e.g.,

for k,v, in pairs(foo) do
  print(k,
    v,
  )
end

purely line by line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this aliases the whole set of repeated lines as a single node, not each single line.

Copy link
Member

@justinmk justinmk Nov 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we do that with (codeblock) itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that includes the markers, which are not part of the injected code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(But I'm not married to the name, of course; I just need a single node that I can capture as @contents; see the injections.scm query.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because that includes the markers, which are not part of the injected code.

I want to believe that's solvable because they're anonymous :) Any hope?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not unless you give me some ;) Honestly, it would help me if you could explain the nature of the objection to doing it the way I was able to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just elegance/aesthetics (unnecessary nesting)

// Codeblock ends if a line starts with non-whitespace.
// Terminating "<" is consumed in other rules.
)),
Expand Down
2 changes: 1 addition & 1 deletion queries/help/highlights.scm
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
text: (_) @text.literal)
(codeblock) @text.literal
(codeblock
">" @conceal (#set! conceal ""))
[">" (language)] @conceal (#set! conceal ""))
(block
"<" @conceal (#set! conceal ""))
(argument) @parameter
Expand Down
3 changes: 3 additions & 0 deletions queries/help/injections.scm
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
(codeblock
(language) @language
(code) @contents)
Loading