Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix very slow regexp #87

Closed
wants to merge 1 commit into from
Closed

Fix very slow regexp #87

wants to merge 1 commit into from

Conversation

caub
Copy link

@caub caub commented May 12, 2018

No description provided.

@caub caub changed the title deprecation warnings + indent deprecation warnings and other cleaning May 12, 2018
@caub caub force-pushed the c branch 3 times, most recently from b5c3daa to 95acb27 Compare May 20, 2018 15:01
@caub
Copy link
Author

caub commented May 20, 2018

@formulahendry I did a very slight change to fix this 'infinite' running regex, (just [^<>]*? to [^<>]?)

the meaning is slightly different, but it's necessary to avoid the exponential backtracking that comes with [^<>]*[^\s/<>=]*

some hints from: http://www.rexegg.com/regex-explosive-quantifiers.html

Could you merge it?

@caub
Copy link
Author

caub commented Jun 8, 2018

@formulahendry Can you check it please?

@garygreen
Copy link

garygreen commented Jul 31, 2018

I did a very slight change to fix this 'infinite' running regex, (just [^<>]*? to [^<>]?)

Yet the PR has hundreds of lines changed with formatting stuff and merge problems. You might have more luck getting this merged if you changed a small amount of code... 😄

@caub caub force-pushed the c branch 6 times, most recently from d72e74a to a6de765 Compare July 31, 2018 15:35
@caub
Copy link
Author

caub commented Jul 31, 2018

right, done

Can @formulahendry see this?

also single or double quotes? there is a mix of them in the sources

@caub caub force-pushed the c branch 2 times, most recently from 9dab3ae to 7ca2272 Compare August 4, 2018 14:04
@caub
Copy link
Author

caub commented Aug 4, 2018

I think I'll make a quick cron script that gently ping @formulahendry every week, to hopefully merge this

@@ -78,13 +78,30 @@
},
"auto-close-tag.excludedTags": {
"type": "array",
"default": ["area", "base", "br", "col", "command", "embed", "hr", "img", "input", "keygen", "link", "meta", "param", "source", "track", "wbr"],
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

npm i has reformatted it

package.json Outdated
"description": "Set the tag list that would not be auto closed.",
"scope": "resource"
},
"auto-close-tag.SublimeText3Mode": {
"type": "boolean",
"default": false,
"default": true,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd not change the default setting if we could not make sure most of users want this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want this!

@@ -0,0 +1,27 @@
export default function getCloseTag(text: string, excludedTags: string[]): string {
const regex = /<(\/?[a-zA-Z][a-zA-Z0-9:\-_.]*)(?:\s+[^<>]?[^\s/<>=]*)*\s?>/g;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<abc key=value> would not work correctly.

Copy link
Author

@caub caub Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes indeed it's broken, need to add many more tests, thanks for noticing

// original regex:
'lol <a href="/ok"></a><abc key="value"> </'.match(/<(\/?[a-zA-Z][a-zA-Z0-9:\-_.]*)(?:\s+[^<>]*?[^\s/<>=]+?)*?\s?>/g)
    ["<a href="/ok">", "</a>", "<abc key="value">"]
// with my change:
'lol <a href="/ok"></a><abc key="value"> </'.match(/<(\/?[a-zA-Z][a-zA-Z0-9:\-_.]*)(?:\s+[^<>]?[^\s/<>=]*)*\s?>/g)
    ["</a>"]

@@ -0,0 +1,27 @@
export default function getCloseTag(text: string, excludedTags: string[]): string {
const regex = /<(\/?[a-zA-Z][a-zA-Z0-9:\-_.]*)(?:\s+[^<>]*?[^\s/<>=]*)*\s?>/g;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'<abc => should not get close tag.

Copy link
Owner

@formulahendry formulahendry Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest concern is that: although this may improve the performance, but might break several kinds of scenario,

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?:\s+[^<>]*?[^\s/<>=]*)* would make [^\s/<>=]* useless
(?:\s+[^<>]*?[^\s/<>=]*)* is similar to (?:\s+[^<>]*?)*

Copy link
Author

@caub caub Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advices
<abc => is a tag with an empty attribute key and value? or something else?

currently I hit again the slowness issue in the last test case, I think I'll try a different approach than the big regex, and search backward with smaller regexes and a loop

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow function shouldn't trigger auto-close: #5

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

search backward would be great idea! 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh you meant '<abc =>' like in a string? (or a comment, ..). Honestly I don't think this extension can be perfect, with jsx expressions we'd need to parse them to be 100% correct, and it's not even possible when we are in an unfinished one. So we might need to accept false-positiives like <foo><bar x=1 y={'<lol'} ></ that gives lol currently

Copy link

@joeysino joeysino Nov 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if a few rare edge cases are broken, if that means the performance issues are fixed. Performance a big concern for this extension.

(I just wish we could add a timeout to regexps so they will fail if they take too long!)

(Or is it really a regexp problem? I wonder if a time check inside the while might be able to break out of problematic loops.)

@@ -0,0 +1,61 @@
import { readFileSync } from "fs";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@caub caub Aug 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will try to add tests, (code-runner would need tests too, if I figure it, I'd update that project too)

@caub
Copy link
Author

caub commented Aug 25, 2018

Ok, so I made a new approach by searching tags backward, it's not 100% robust, we can try to imrpove it more, like the current one anyway (jsx expressions can be very tricky), but no more performance issues at least

it's tested like the pointers you showed and run in CI

@garethedwards-tass
Copy link
Contributor

This has helped me a bit, thank you, but also your code wasn't catering for excludedTags, and I think this does the job.

`const TAG_RE = /<(/?[a-zA-Z][a-zA-Z0-9:_.-])(?![\s\S]</?[a-zA-Z])/;

function getCloseTag(text: string, excludedTags: string[]): string {
const s = text[text.length - 1] === '/' && text[text.length - 2] === '<' ? text.slice(0, -2) : text[text.length - 1] === '<' ? text.slice(0, -1) : text;
let m = s.match(TAG_RE);
// while we catch a closing tag, we jump directly to the matching opening tag
while (m && ( m[1][0] === '/' || excludedTags.indexOf(m[1].toLowerCase()) !== -1 )) {
const s2 = s.slice(0, m.index);
if ( m[1][0] === '/' ) {
// Already Closed Tags
const m2 = s2.match(RegExp(<${m[1].slice(1)}.*$, 'm'));
if (!m2) return '';
m = s.slice(0, m2.index).match(TAG_RE);
} else {
// Excluded Tags
m = s.slice(0, m.index).match(TAG_RE);
}
}
if (!m) return null;
return (text[text.length - 1] === '/' && text[text.length - 2] === '<' ? m[1] : text[text.length - 1] === '<' ? '/' + m[1] : '</' + m[1]) + '>';
}
`

@caub
Copy link
Author

caub commented Jan 12, 2024

Sorry guys I can't maintain this PR anymore, please go ahead with a new PR

@caub caub closed this Jan 12, 2024
@garethedwards-tass
Copy link
Contributor

@caub curious, are you using different tools or just not this extension anymore ? I've built in some of this functionality into our internal tools to avoid devs having to use this extension.

@caub
Copy link
Author

caub commented Jan 17, 2024

I went back to sublime-text

@garethedwards-tass
Copy link
Contributor

@caub we took this route a few times in the past, curious what languages you're working with? we're writing CFML and I've made some headway using your code, its not perfect but it works ok.

@caub
Copy link
Author

caub commented Jan 18, 2024

@garethedwards-tass you can definitely make progress, try to make you own PR and the maintainer may merge it

I'm only working on JS (node+react), so sublimetext support things decently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants