Avoid object instantiation in lexers #1147
Comments
Yes! And also, as far as possible, reducing the number of giant regexes we generate by interpolating large arrays would go a long way.
@pyrmont I have no issues undertaking this task. Optimizing memory will help everyone.
Thanks @ashmaroli! I've been playing around with seeing the impact of deleting certain lexers. I reduced memory allocation by almost 1.2MB (!!!) just by deleting all the lexers starting with … I haven't looked at which lexer in particular was causing problems, but if it was because of unnecessary object instantiation, this could be a huge help :)
Deleting …
I see a lot of inconsistencies in the lexers. Some have …
Great. Here's my plan: Convert all …
As a first pass, my preferred option would be to put all local variables (be they arrays of special words or single regular expressions) behind class methods. This has the benefit of not requiring any further editing of lexers that were using local variables (although, as you noted, they weren't all doing that, so it's not going to be true universally). Once that's done, further optimisation could involve replacing some of the individual regular expressions with references to commonly used ones that are centrally defined (as you proposed in #1139). I know @jneen expressed concern about unnecessary obfuscation with that approach, though, and I think it would be worth testing how much it improves things. If the early indications are correct, 'wrapping' variables should drastically reduce memory allocation for most use cases. Users are rarely going to be invoking Rouge to syntax highlight more than a handful of languages, and so the actual number of regular expressions instantiated in practice, even without centrally defined expressions, should be low (I think).
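As a rough illustration of the two steps described above, here is a minimal sketch in plain Ruby; the module, class, and method names are hypothetical and this is not Rouge's actual API:

```ruby
require 'set'

# Step 1: a local variable in the class body becomes a memoized class method,
# so the array/regex is only allocated the first time a rule asks for it.
class SomeLexer
  def self.keywords
    @keywords ||= Set.new %w(if else while return)
  end
end

# Step 2 (the #1139 idea): commonly used expressions live in one shared module
# that many lexers reference instead of each interpolating its own copy.
module CommonRegexes
  def self.identifier
    @identifier ||= /[A-Za-z_][A-Za-z0-9_]*/
  end
end

class AnotherLexer
  def self.id
    CommonRegexes.identifier
  end
end
```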
@pyrmont I won't be able to finish this task. The various lexers are structured differently. Some specs fail on changing …
No problem at all! Thanks for taking a look at it!
So I kept playing around and now have this code as a kind of hacky proof of concept.

Approach

The important stuff is all in …
All tests pass.

Results

Here are the stats for memory on master:
and on this branch:
The memory for just loading the library has been reduced to:
Limitations

The information from the lexer files is extracted using a simple regular expression. If a lexer file does not express the information required (class name, tag, aliases) in the format expected, this won't work.

Future Work

A better approach may be to formally split lexers into two files: one that contains the details necessary for selection of the appropriate lexer and the other that contains the lexing logic. This might be something appropriate for version 4.0.
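As a very rough sketch of what that extraction-plus-lazy-loading could look like (the file layout, regexes, and names below are assumptions for illustration, not the code on the branch):

```ruby
# Build a small index of tag/alias => lexer file by scanning the lexer source
# files with simple regular expressions, so that individual lexer classes are
# only required when one of them is actually requested.
module LexerIndex
  TAG_RE     = /^\s*tag\s+['"]([^'"]+)['"]/
  ALIASES_RE = /^\s*aliases\s+(.+)$/

  def self.index
    @index ||= Dir[File.join(__dir__, 'lexers', '*.rb')].each_with_object({}) do |path, idx|
      source = File.read(path)
      names  = []
      names << source[TAG_RE, 1]
      if (list = source[ALIASES_RE, 1])
        names.concat(list.scan(/['"]([^'"]+)['"]/).flatten)
      end
      names.compact.each { |name| idx[name] = path }
    end
  end

  # Require (at most once) only the file whose tag or alias was asked for;
  # everything else stays unloaded.
  def self.load(name)
    path = index[name]
    require path if path
    path
  end
end
```

The obvious limitation is exactly the one noted above: this only works if every lexer declares its tag and aliases in the expected textual form.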
Rebased for ease of comparison. Memory usage is now:
and for vanilla loading:
@ashmaroli Vanilla loading means without actually running any highlighting. I introduced a rake task that just loads the library, and that's what I mean by vanilla loading.
Another option that does away with that scary regex: keep the …
@jneen I feel like I've just been galaxy brained.
lol maybe. it might be a bit much. also make sure to update …
I spent some time doing more experimentation today and I'm sorry to say that the 'don't save the block' approach didn't greatly reduce memory usage. It does reduce the total number of objects retained, but only by a little, and does not really reduce the number of objects allocated. Here are the results from wrapping the …

Current Approach
Unsaved Block Approach
Other things I tried:
Short of deferring parsing of the code, I can't see a way to substantially reduce memory usage with respect to the lexers.
The last resort is to use a Rake task to generate an index, then use …
This issue has been automatically marked as stale because it has not had any activity for more than a year. It will be closed if no additional activity occurs within the next 14 days.
One way to reduce the memory usage of Rouge is to avoid instantiation of objects in individual lexers. Based on the report generated by rake profile_memory, it would seem the biggest causes of unnecessary object instantiation in a lexer are:

- a failure to wrap arrays of keywords in class methods, and
- a failure to wrap regexes (especially those involving interpolation) in class methods.

(The wrapped and unwrapped patterns are sketched below.)
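A minimal sketch of the two patterns, using a placeholder keyword list and regex rather than any particular lexer's actual rules:

```ruby
require 'set'

# "Do this": the keyword list and the interpolated regex live behind memoized
# class methods, so they are only allocated the first time a rule needs them.
class WrappedLexer
  def self.keywords
    @keywords ||= Set.new %w(begin end def class module)
  end

  def self.keywords_re
    @keywords_re ||= /\b(?:#{keywords.to_a.join('|')})\b/
  end
end

# "Rather than this": a local variable (or constant) in the class body builds
# the array and the large interpolated regex as soon as the file is loaded,
# whether or not the lexer is ever used.
class UnwrappedLexer
  keywords    = %w(begin end def class module)
  KEYWORDS_RE = /\b(?:#{keywords.join('|')})\b/
end
```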
Initial testing suggested this can lead to a substantial reduction in memory used by the lexers. The AppleScript lexer (the largest lexer by memory usage) can be reduced from 65KB to 2.22KB by simply putting the regular expressions defined as local variables behind class methods.
Do you have any interest in doing this thankless task, @ashmaroli? (I promise I will thank you!)