Repeated annotation of large files is slow #256
Comments
After a bit more investigation, it seems repeated annotation is actually faster, because most of the existing annotations can be reused and nothing needs to change in the repository. The problem is saving new annotations. In MPP, there are 4386 term occurrences, and since each occurrence usually has two selectors, this gives over 7800 instances to be saved. Asynchronous saving of term occurrences could be used to improve the performance of text analysis as a whole.
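To illustrate the asynchronous saving idea from the comment above, here is a minimal sketch that hands batches of occurrences to a background executor so the caller does not block on persistence. All names (`Occurrence`, `AsyncOccurrenceSaver`, `saveAll`) are hypothetical, and the in-memory queue only stands in for the repository; this is not the project's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical stand-in for a term occurrence; not the project's class. */
final class Occurrence {
    final String termIri;
    final String selector;

    Occurrence(String termIri, String selector) {
        this.termIri = termIri;
        this.selector = selector;
    }
}

/** Hypothetical saver that persists occurrences in batches off the calling thread. */
class AsyncOccurrenceSaver implements AutoCloseable {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Assumption: this queue stands in for the repository; real code would write to the database.
    final ConcurrentLinkedQueue<Occurrence> saved = new ConcurrentLinkedQueue<>();

    /** Returns immediately; the future completes with the number of occurrences once all batches are stored. */
    CompletableFuture<Integer> saveAll(List<Occurrence> occurrences, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<CompletableFuture<Void>> batches = new ArrayList<>();
        for (int i = 0; i < occurrences.size(); i += batchSize) {
            int end = Math.min(i + batchSize, occurrences.size());
            // Copy the slice so later changes to the input list cannot affect the batch.
            List<Occurrence> batch = new ArrayList<>(occurrences.subList(i, end));
            batches.add(CompletableFuture.runAsync(() -> saved.addAll(batch), executor));
        }
        int total = occurrences.size();
        return CompletableFuture.allOf(batches.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> total);
    }

    @Override
    public void close() {
        // Already submitted batches still run; no new ones are accepted.
        executor.shutdown();
    }
}
```

With this shape, the text analysis result can be returned to the user while the caller keeps the future around to report completion or failures later. A single-threaded executor is used here only to keep the write order predictable; whether parallel writes are safe depends on the underlying repository.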
…parate class. This way an alternative implementation using asynchronous processing can be introduced.
…n processing performance. Helps mainly when no occurrences existed originally.
Should decrease number of iterations over occurrences in annotated source.
…currences in analyzed file. Since the same terms are likely to occur multiple times in a file, it makes sense to cache existence check results, thus improving performance of term occurrence resolution.
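The caching of existence checks described above could look roughly like the following sketch. It assumes the repository lookup can be expressed as a `Predicate<String>` over a term identifier; `CachingExistenceChecker` and its members are hypothetical names chosen for illustration, not the project's API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical cache around a term existence check, scoped to one analyzed file. */
class CachingExistenceChecker {
    // Assumption: the expensive repository lookup is supplied as a predicate.
    private final Predicate<String> repositoryCheck;
    private final Map<String, Boolean> cache = new HashMap<>();
    // Counts how many times the repository was actually queried.
    int repositoryCalls = 0;

    CachingExistenceChecker(Predicate<String> repositoryCheck) {
        this.repositoryCheck = repositoryCheck;
    }

    /** Queries the repository only the first time a given term identifier is seen. */
    boolean exists(String termIri) {
        return cache.computeIfAbsent(termIri, iri -> {
            repositoryCalls++;
            return repositoryCheck.test(iri);
        });
    }
}
```

Creating one checker per analyzed file keeps the cache short-lived, so a term added or removed between analyses does not leave a stale entry behind. The class is not thread-safe as written.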
When text analysis is invoked on an already annotated larger file (approx. 1 MB) containing many term occurrences, processing of its results can take minutes to finish. This makes it practically unusable, as the user is unsure whether it is normal that the application shows "Please wait..." for several minutes and may leave or attempt to refresh.
Analysis of repeated annotation of the metropolitan plan shows the following times:
The goal should be to get the whole process under a minute, preferably well below that.