-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvement of dvcignore #3967
Conversation
1. add new command `dvc remote rename <old> <new>`. 2. add tests for this new command.
Co-authored-by: Peter Rowlands (λ³κΈ°νΈ) <[email protected]>
fix#3869 1.Use big regex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though I would not close #3869 just yet.
@karajan1001 you mentioned the flashtext library, and I could not find the description of the experiment conducted, that gave the author conclusion that it makes sense to use trie
-based approach only above some number of keywords. What about matching few gitignore
patterns against hundreds of files? Maybe there could be gain too.
Also, I wonder whether re.compile
has limit of pattern size. Checked against 100k, seems to not be a problem. I did not find resources pointing to the limit.
@karajan1001 Windows build is failing. |
Maybe this is because |
@pared @Suor
This is the result after this submit.
The new version took 24ms, and the old one took 417ms. (17.35 times faster) And In the worst case where the rules in the order that ignore rules and include rules are separated from each other. And there is no big regex exists(Hardly could this happened in real). The new version took 84ms, and the old one took 416ms. (4.95 times faster) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one looks good. We should merge it.
The thing is bothering me is that this doesn't help with many ignore files. Can we make a bench for that? How many ignore files do we need to be in trouble?
@karajan1001, thanks for this PR! Now, that implementation of |
Fix #3869
β I have followed the Contributing to DVC checklist.
π If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
β I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. π