Regex: start line character ignored when in group #790

Closed
MischaDeNola opened this issue Sep 27, 2017 · 6 comments
Labels
stale-issue There has been no activity for a year.

Comments

@MischaDeNola

MischaDeNola commented Sep 27, 2017

The start-of-line marker has to be moved outside of a group in order for it to get the respect it deserves.

-        rule /(^!)(.*$\n?)/ do #doesn't work
+        rule /^(!)(.*$\n?)/ do #totally works
          groups Generic::Subheading, Generic::Strong
        end

Even without groups, just putting the entire regex in a single group, as in rule /(^!.*$\n?)/, Generic::Strong, will cause it to disrespect the ^. I couldn't find anything about Ruby regex that explains this, which is why I'm reporting it as either a bug in Ruby or in my ability to Google.
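For comparison, here is a small check outside Rouge. It suggests that plain Ruby matching via Regexp#match does honor ^ inside a group, which fits the observation that nothing in the regex documentation explains the behavior. The sample strings are made up for illustration.

```ruby
# Plain Ruby matching (no Rouge involved): ^ inside a capture group
# still only matches at the start of a line.
GROUPED = /(^!)(.*$\n?)/

mid_line   = "domain!whatever(\n"    # the "!" sits in the middle of a line
line_start = "foo\n!highlight me\n"  # the "!" opens the second line

mid_line_match   = GROUPED.match(mid_line)
line_start_match = GROUPED.match(line_start)

puts mid_line_match.inspect       # the mid-line bang is not matched
puts line_start_match[1].inspect  # the bang itself
puts line_start_match[2].inspect  # the rest of the highlighted line
```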


Potentially useful longer post

@jneen Hi, I'm a writer, not a developer, so my apologies for sounding like an idiot:

A couple years ago I wrote a custom lexer for rouge for my company's own proprietary language. I recently added a diff rule (for strong) per revathskumar's example so that we could draw attention to particular lines in code blocks when writing instructional docs.

To keep customers from copying the ! from our code blocks, I split the labeling with the groups thing:

        rule /(^!)(.*$\n?)/ do
          groups Generic::Subheading, Generic::Strong
        end

I then just set the style for Subheading to display: none; and it worked like a charm. We highlight the line while excluding the bang. Yay!

However, I recently discovered a bug with this in a couple of rare situations where the rule would trigger inappropriately--not respecting the start-of-line assertion.

function(
domain!whatever(
string: "hello"
)
)

Would render that as:

function(
domainwhatever(
string: "hello"
)
)

When I removed the groups, it did NOT trigger on domain!whatever:

rule /^!.*$\n?/, Generic::Strong

I don't know why using groups makes it act like this. I'm working around the problem by just adding a rule to catch domain!whatever.

        rule /(^!)(.*$\n?)/ do
          groups Generic::Subheading, Generic::Strong
        end
+        rule /(domain![a-zA-Z_][a-zA-Z0-9_]*)/, Str # this stops weird bug

Just in case this is an actual bug, I'm letting you know.

Thank you for your time (and for rouge!)

@jneen
Member

jneen commented Sep 27, 2017

Ah, yes. It is a known problem that traces back to a StringScanner bug that I opened a loooong time ago: https://bugs.ruby-lang.org/issues/7092. Most of the time we work around it by detecting \n and pushing a :bol state that has a default :pop! at the end. See the C lexer for an example (preprocessor macros like #pragma use this approach).
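A minimal sketch of the StringScanner behavior described above, assuming a scanner created with plain StringScanner.new(string) and no extra options. The input string is borrowed from the report. Once the scan pointer has moved past "domain", ^ appears to be evaluated relative to the pointer rather than the real line start.

```ruby
require 'strscan'

# Assumption: default StringScanner construction, no options passed.
scanner = StringScanner.new("domain!whatever(\n")

scanner.scan(/domain/)                # advance to just before the "!"
at_bol = scanner.beginning_of_line?   # the pointer is mid-line here

# The anchor is checked against the scan pointer, so this matches even
# though the "!" is not at the start of a line in the original string.
bang = scanner.scan(/(^!)/)

puts at_bol.inspect
puts bang.inspect
```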

@jneen
Member

jneen commented Sep 27, 2017

We sort of work around the above by detecting if a regex's source begins with ^, and if so manually checking #beginning_of_line?, but as you have found it doesn't work if you put it anywhere else in the regex.

See https://github.com/jneen/rouge/blob/master/lib/rouge/regex_lexer.rb#L19 and https://github.com/jneen/rouge/blob/master/lib/rouge/regex_lexer.rb#L297.
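The leading-caret check described in this comment can be sketched roughly as follows. This is not Rouge's actual code: scan_respecting_bol and bang_after_domain are hypothetical helpers written for illustration, and only StringScanner#scan, StringScanner#beginning_of_line? and Regexp#source are real library calls.

```ruby
require 'strscan'

# Hypothetical helper (not Rouge's implementation): try `regex` at the
# scanner's current position, enforcing a *leading* ^ by hand.
def scan_respecting_bol(scanner, regex)
  # The guard only fires when the regex source literally starts with "^".
  return nil if regex.source.start_with?('^') && !scanner.beginning_of_line?

  scanner.scan(regex)
end

# Hypothetical driver: move past "domain", then try the rule's regex.
def bang_after_domain(regex)
  scanner = StringScanner.new("domain!whatever(\n")
  scanner.scan(/domain/)
  scan_respecting_bol(scanner, regex)
end

leading = bang_after_domain(/^(!)(.*$\n?)/)  # caret first: guard applies
grouped = bang_after_domain(/(^!)(.*$\n?)/)  # caret inside a group: guard skipped

puts leading.inspect
puts grouped.inspect
```

Under these assumptions, the regex whose source starts with ^ is rejected mid-line, while the one that hides ^ inside a group slips past the guard and matches the rest of the line. That mirrors the difference between the two rules in the original report.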

@MischaDeNola
Author

Ok, good to know, thanks! I was losing confidence in my regex mastery; as a writer, it's the only thing I can do better than the engineers here.

@jneen
Member

jneen commented Sep 27, 2017

> @jneen Hi, I'm a writer, not a developer, so my apologies for sounding like an idiot:

Also, just so you know you didn't sound like an idiot at any point :]. This is a weird and subtle bug that I'm sure we'll have to reference many times in the future.

@MischaDeNola
Author

Thanks! Not to toot my own horn too much, but I have copy-and-pasted others' code for many years now. With enough time and fiddling, something usually happens. As I like to say around here, "Whatever you can do in an hour, I can do in two days."

I'm just waiting for the right dev job at this point--"Looking for someone with extraordinary regex skills, mind-boggling functional spreadsheets, but just a faint glimmer of Groovy, Python, and Ruby."

@stale

stale bot commented Jun 19, 2019

This contribution has been automatically marked as stale because it has not had any activity for more than a year. It will be closed if no additional activity occurs within the next 14 days.

stale bot added the stale-issue label Jun 19, 2019
stale bot closed this as completed Jul 3, 2019
2 participants