Slow compilation for big bytes pattern #657

Alphare · 2020-03-24T18:16:11Z

As promised in #524, I'm opening this issue because I found a difference in performance between Re2 (with custom bindings) and regex when compiling the regex itself.

Here is the example:

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=eb83255739973d40d16997574ba9dd35

These are the exact bytes that are constructed after parsing Netbeans' old .hgignore and this is the exact way I am using them.

regex takes about 5ms to build the regex while Re2 takes about 1ms on a very stable, 4 core machine.

I cannot offer you an easy reproduction for Re2, however should you want to try to see the difference within Mercurial I can help.

I am using rustc version 1.34.2 and regex version 1.3.5 on a Linux amd64 machine.

Note: I've also seen a small performance difference when using is_match in Mercurial's working directory traversal code, but that is a separate issue probably.

The text was updated successfully, but these errors were encountered:

BurntSushi · 2020-03-24T19:45:08Z

Thanks. It would be useful to just get the actual regex next time. I had to spend a bit of time making it look nicer:

(?x)
.*~$
|.*/build$
|.*/dist$
|.*/nbproject/private$
|.*/nbproject/sqe.properties$
|.*/nbproject/findbugs.settings$
|.*/nbproject/.*\.mk$
|.*/nbproject/.*\.bash$
|.*/test/(coverage
|lib
|results
|work)$
|.*derby\.log
|.*\.class$
|.*\.DS_Store$
|.*\.orig$
|.*\.rej$
|.*\.tmp\.$
|.*\.swp$
|^tags$
|.*/gensrc$
|.*\.ant-targets-build\.xml$
|^\.classpath$
|^\.project$
|^\.externalToolBuilders/.*$
|^hs_err_pid[0-9]*\.log
|^[^/]+/external/[^/]+\.(zip
|jar
|gz
|bz2
|gem
|dll
|xpi
|dylib
|so)$
|^apisupport.osgidemo/osgidemo/netbinox/.*$
|^o.apache.tools.ant.module/external/lib$
|^classfile/\.nbintdb$
|^classfile/classfile\.jar$
|^cnd.apt/src/org/netbeans/modules/cnd/apt/impl/support/generated$
|^cnd.apt/src/org/netbeans/modules/cnd/apt/support/APTTokenTypes\.java$
|^asm/src/org/netbeans/modules/asm/base/generated$
|^cnd.asm/src/org/netbeans/modules/cnd/asm/base/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/impl/services/evaluator/parser/generated$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranLexicalPrepass.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParser.g$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParser2.g$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranParserActionFactory.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranStream.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranToken.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/FortranTokenStream.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/IActionEnums.java$
|^cnd.modelimpl/src/org/netbeans/modules/cnd/modelimpl/parser/IFortranParserAction.java$
|^db/javahelp/org/netbeans/modules/usersguide$
|^dbschema/src/org/netbeans/modules/dbschema/jdbcimpl/resources/templates\.jar$
|^db.sample/test/unit/.*\.properties$
|^db.metadata.model/test/unit/data/mysql-connector-java-5.1.5-bin.jar$
|^db.sql.visualeditor/src/org/netbeans/modules/db/sql/visualeditor/parser/.*\.java$
|^editor.lib2/test/\.ant-targets-build\.xml$
|^editor/test/unit/src/org/netbeans/modules/editor/java/data/.*\.jar$
|^performance/enterprise/test/.*\.sh$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData/nbproject/build-impl\.xml$
|^performance/enterprise/test/qa-functional/data/PerformanceTestData/nbproject/genfiles\.properties$
|^performance/enterprise/test/qa-functional/data/PerformanceTestWebApplication/nbproject/build-impl\.xml$
|^performance/enterprise/test/qa-functional/data/PerformanceTestWebApplication/nbproject/genfiles\.properties$
|^performance/enterprise/test/qa-functional/src/gui/TempTest\.java$
|^performance/enterprise/test/unit/data/PerformanceTestData$
|^swingapp/src/org/netbeans/modules/swingapp/resources/BasicShellApp\.zip$
|^swingapp/src/org/netbeans/modules/swingapp/resources/CRUDShellApp\.zip$
|^ide/launcher/windows/netbeans\.res$
|^ide.kit/test/qa-functional/data/tomcat6/
|^ide.kit/test/whitelist/temp/
|^identity.ant/nbproject/suite\.properties$
|^identity.profile.api/src/org/netbeans/modules/identity/profile/api/configurator/impl/file/jaxb$
|^identity.samples/src/org/netbeans/modules/identity/samples/resources/.*\.zip$
|^installer/default$
|^installer/.*-subst\.xml$
|^installer/asbundle/customcode/lib$
|^installer/cdc/customcode/lib$
|^installer/cnd/customcode/lib$
|^installer/coreide/customcode/lib$
|^installer/infra/build/nbi_all$
|^installer/j2ee/customcode/lib$
|^installer/jbossbundle/customcode/lib$
|^installer/jdkbundle/customcode/lib$
|^installer/jnlp/release$
|^installer/jnlp/default\.keystore$
|^installer/jnlp/jars$
|^installer/lib/packjars/classes$
|^installer/mobility/customcode/lib$
|^installer/packages/solaris/i386$
|^installer/packages/solaris/sparc$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1\.tar\.gz$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1-platform\.tar\.gz$
|^installer/packages/ubuntu/netbeans-5.5/nb55-jumbodocpack\.zip$
|^installer/packages/ubuntu/netbeans-5.5/netbeans-5_5_1-javadoc\.tar\.gz$
|^installer/profiler/customcode/lib$
|^j2ee\.dd/src/.+/impl/.+/model(_[0-9]+)*(_frag)?$
|^j2ee.dd/test/unit/data/web\.xml$
|^j2ee.dd/test/unit/data/web\.diff$
|^j2ee.dd/test/unit/data/invalid/web\.xml$
|^j2ee.toplinklib/external/toplink$
|^java.editor/test/unit/src/org/netbeans/modules/editor/java/data/.*\.jar$
|^java.helpset/javahelp/org/netbeans/modules/java/helpset/docs/gui/gui_template\.html$
|^java.navigation/test/run\.sh$
|^jellytools/test/qa-functional/data/SampleProject/build\.xml$
|^jellytools/test/qa-functional/data/SampleProject/nbproject/build-impl\.xml$
|^jellytools/test/qa-functional/data/SampleProject/nbproject/genfiles\.properties$
|^jemmy/run$
|^lib.terminalemulator/demosrc/telnet/de$
|^lexer/gen/build$
|^mvd/lib$
|^mvd/graphlib/classes$
|^vmd.game/\.classpath$
|^vmd.game/\.project$
|^vmd.components.midp/netbeans_midp_components_basic/dist/doc$
|^vmd.components.midp/netbeans_midp_components_basic/dist/lib$
|^vmd.components.midp/netbeans_midp_components_basic/dist/netbeans_midp_components_basic\.jad$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/doc$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/lib$
|^vmd.components.midp.pda/netbeans_midp_components_pda/dist/netbeans_midp_components_pda\.jad$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/doc$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/lib$
|^vmd.components.midp.wma/netbeans_midp_components_wma/dist/netbeans_midp_components_wma\.jad$
|^performance/mobility/test/.*\.sh$
|^mobility.project/test/x$
|^nbbuild/netbeans$
|^nbbuild/NetBeans-.*\.zip$
|^nbbuild/testuserdir$
|^nbbuild/NetBeans-.*\.log$
|^nbbuild/l10ndist$
|^nbbuild/nbms$
|^nbbuild/user\.build\.properties$
|^nbbuild/moduledefs-tmp\.properties$
|^nbbuild/site\.build\.properties$
|^nbbuild/external/findbugs/lib$
|^nbbuild/external/findbugs/plugin$
|^nbi/engine/native/jnilib/solaris-sparc/\.make\.state\.Debug$
|^nbi/engine/native/jnilib/solaris-x86/\.make\.state\.Debug$
|^nbi/engine/native/jnilib/windows/vc80\.idb$
|^nbi/engine/native/jnilib/windows/dist/windows-x86\.exp$
|^nbi/engine/native/jnilib/windows/dist/windows-x86\.lib$
|^nbi/engine/native/jnilib/windows/dist/windows-x64\.exp$
|^nbi/engine/native/jnilib/windows/dist/windows-x64\.lib$
|^nbi/infra/build/engine/\.ant-lib$
|^nbi/infra/build/jvm/cache$
|^nbi/infra/build/jvm/build-private\.sh$
|^nbi/infra/sandbox/src$
|^performance/lib$
|^performance/jtrace/win32/jtrace\.dll$
|^performance/jtrace/win32/jtrace\.exp$
|^performance/jtrace/win32/jtrace\.lib$
|^performance/performancetestutilities/nbproject/ide-file-targets\.xml$
|^performance/test/.*\.sh$
|^performance/test/perf/lib$
|^performance/test/perf/report$
|^performance/test/perf/baseline$
|^performance/test/qa-functional/src/gui/.*Temp.*$
|^performance/test/qa-functional/src/gui/memory_usage/.*Temp.*$
|^performance/test/reports/classes$
|^performance/test/reports/output\.xml$
|^performance/test/reports/perfresults\.jar$
|^performance/threaddemo/user\.build\.properties$
|^php.doc/libdoc/doc\.gzip$
|^php.doc/libdoc/doc\.html$
|^j2ee.sun.appsrv81/\.keystore$
|^tomcat5/\.swo$
|^performance/uml/test/.*\.sh$
|^performance/uml/test/qa-functional/src/gui/TempTest\.java$
|^versioning/clearcase/JavaApplication21/\.hgignore$
|^versioning/clearcase/JavaApplication21/\.hg$
|^versioning/test/qa-functional/data/tck\.properties$
|^web\.client\.tools\.internetexplorer/nativesrc/(activescript
|Release
|Debug
|VSIP8.0)$
|^web\.client\.tools\.internetexplorer/nativesrc/NetBeansExtension\.(h
|ncb
|suo
|vcproj\..+)$
|^web\.client\.tools\.internetexplorer/nativesrc/NetBeansExtension_(i
|p)\.c$
|^web\.client\.tools\.internetexplorer/nativesrc/dlldata\.c$
|^web.jsf.navigation/\.externalToolBuilders$
|^web.jsf.navigation/\.project$
|^web.debug/test/run\.sh$
|^websvc.jaxrpc/src/org/netbeans/modules/websvc/jaxrpc/dev/dd/gen$
|^websvc.jaxrpc/src/org/netbeans/modules/websvc/wsdl/config/impl$
|^websvc.restkit/nbms$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/jaxb$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/oauth$
|^websvc.saas.api/src/org/netbeans/modules/websvc/saas/model/wadl$
|^websvc.wsitmodelext/coverage$
|^xml.tax/lib/test/qa-functional/src/org/netbeans/tax/data/Delme\.xml$
|^xml/test/qa-functional/src$
|^xml/test/qa-lib/xml-test-lib\.jar$
|^xml.tools/test/qa-functional/src/org/netbeans/modules/xsl/action/data/out/document\.html$
|^hibernate/test/unit/data/db-derby-10\.2\.2\.0-bin$
|.*\.orig\..*$
|.*\.chg\..*$
|.*\.conflict\~$
|^nbbuild/jna\.keystore$
|^nbi/engine/testData$
|^o\.n\.bootstrap/launcher/windows/nbexec_exe\.res$
|^installer/mac/newbuild/dist_dmg$
|^cordova\.platforms\.ios/external/ios-sim$
|^cnd\.discovery/tools/BuildTrace/\.dep\.inc$

This fixes a pretty bad performance bug in the NFA compiler. In particular, c_char was implemented by diverting to c_class, which is correct, but rather costly to do for every single character in a regex. This causes way more things than necessary to go through the class compilation infrastructure, which includes the suffix caching. We fix this by just special casing c_char. This speeds up regex compilation in #657 by around 30%. Fixes #657

BurntSushi · 2020-03-24T21:11:03Z

Once I profiled it, the problem was pretty clear: the compiler was using the class compilation infrastructure to handle even plain literal characters. In big regexes like this, that adds up quite a bit. I opened #658 to fix that. In my test, it got about a 30% improvement, so I think that should narrow the gap considerably for this particular case.

Once that was fixed, I didn't see any other obvious hotspots that were easy to fix. Interestingly, a good chunk of the time was actually spend in the parsing and translation steps. I haven't spent much (if any) time optimizing that, so that may be the next place to go.

I ported your benchmark to regex-automata's NFA compiler, and it does seem to be a bit faster there. Although I don't think I did a completely fair comparison.

This fixes a pretty bad performance bug in the NFA compiler. In particular, c_char was implemented by diverting to c_class, which is correct, but rather costly to do for every single character in a regex. This causes way more things than necessary to go through the class compilation infrastructure, which includes the suffix caching. We fix this by just special casing c_char. This speeds up regex compilation in #657 by around 30%. Fixes #657

BurntSushi · 2020-03-24T22:59:36Z

The fix for this is on crates.io in regex 1.3.6.

Alphare · 2020-03-25T08:54:11Z

It would be useful to just get the actual regex next time. I had to spend a bit of time making it look nicer
Sorry, I was trying to make sure that I didn't accidentally escape something. Net time I'll give you the pattern directly.

Once I profiled it, the problem was pretty clear: the compiler was using the class compilation infrastructure to handle even plain literal characters. In big regexes like this, that adds up quite a bit. I opened #658 to fix that. In my test, it got about a 30% improvement, so I think that should narrow the gap considerably for this particular case.

I am seeing the same performance improvement on my side.

Once that was fixed, I didn't see any other obvious hotspots that were easy to fix. Interestingly, a good chunk of the time was actually spend in the parsing and translation steps. I haven't spent much (if any) time optimizing that, so that may be the next place to go.

Ok, thank you for the quick fix. I will be keeping my eye on new releases of regex. :)

The fix for this is on crates.io in regex 1.3.6.

regex was introduced as the default option in the experimental Rust extensions of Mercurial for ignore mechanism in the current default branch. It should in the next (5.4) release.

BurntSushi mentioned this issue Mar 24, 2020

compile: add fast path for c_char #658

Merged

BurntSushi closed this as completed in #658 Mar 24, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slow compilation for big bytes pattern #657

Slow compilation for big bytes pattern #657

Alphare commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

Alphare commented Mar 25, 2020

Slow compilation for big bytes pattern #657

Slow compilation for big bytes pattern #657

Comments

Alphare commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

BurntSushi commented Mar 24, 2020

Alphare commented Mar 25, 2020