Repeated annotation of large files is slow #256

ledsoft · 2024-02-12T14:38:31Z

When text analysis is invoked on an already annotated larger file (approx. 1 MB) containing many term occurrences, processing of its results can take minutes to finish. This makes it practically unusable, because the user cannot tell whether it is normal for the application to show "Please wait..." for several minutes and may leave or attempt to refresh the page.

Analysis of repeated annotation of the metropolitan plan shows the following times:

Invocation of text analysis: 8.5s
Resolution of occurrences in the file: 47s
Saving occurrences: 5min 31s

These three steps add up to roughly six and a half minutes. The goal should be to bring the total under one minute, preferably lower.

ledsoft · 2024-02-12T14:49:08Z

After a bit more investigation, it seems repeated annotation is actually faster, because most of the existing annotations can be reused and nothing needs to change in the repository. The problem is saving new annotations. In MPP, there are 4386 term occurrences, and since each occurrence usually has two selectors, this gives over 7800 instances to be saved.

Asynchronous saving of term occurrences could be used to improve performance of text analysis as a whole.
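
A minimal sketch of what asynchronous saving could look like, written in Python purely for illustration. All names here (`InMemorySaver`, `AsyncOccurrenceSaver`, `save_all`) are hypothetical and do not come from the project's code; the in-memory saver stands in for the real repository. The idea is that the caller hands a batch of occurrences to a background worker and gets control back immediately.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Iterable, List


class InMemorySaver:
    """Hypothetical stand-in for the repository; a real one would write to storage."""

    def __init__(self) -> None:
        self.saved: List[str] = []

    def save_all(self, occurrences: Iterable[str]) -> None:
        self.saved.extend(occurrences)


class AsyncOccurrenceSaver:
    """Wraps a synchronous saver and runs its save_all on a background worker."""

    def __init__(self, delegate: InMemorySaver) -> None:
        self._delegate = delegate
        # A single worker keeps batches in the order they were submitted.
        self._executor = ThreadPoolExecutor(max_workers=1)

    def save_all(self, occurrences: Iterable[str]) -> Future:
        # Copy the input so later changes by the caller do not affect the save.
        batch = list(occurrences)
        return self._executor.submit(self._delegate.save_all, batch)

    def close(self) -> None:
        # Blocks until every queued batch has been saved.
        self._executor.shutdown(wait=True)


repository = InMemorySaver()
saver = AsyncOccurrenceSaver(repository)
pending = saver.save_all(["occurrence-1", "occurrence-2"])
pending.result()  # wait only at the point where the result is actually needed
saver.close()
```

Keeping the synchronous saver behind a wrapper means either variant can be used by the annotation code, at the cost of the caller having to decide when (or whether) to wait for the returned future.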

…currences in analyzed file. Since the same terms are likely to occur multiple times in a file, it makes sense to cache existence check results, thus improving performance of term occurrence resolution.
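
The caching idea can be sketched as follows, again in Python for illustration only. `CachingExistenceChecker` and the `exists_in_repository` callback are hypothetical names, and the callback is assumed to be the expensive lookup. Each distinct term is checked once, and repeated occurrences of the same term reuse the stored answer.

```python
from typing import Callable, Dict


class CachingExistenceChecker:
    """Remembers the result of an existence check per term identifier."""

    def __init__(self, exists_in_repository: Callable[[str], bool]) -> None:
        self._lookup = exists_in_repository
        self._cache: Dict[str, bool] = {}

    def exists(self, term_id: str) -> bool:
        # Ask the repository only the first time a term identifier is seen.
        if term_id not in self._cache:
            self._cache[term_id] = self._lookup(term_id)
        return self._cache[term_id]


lookups = []


def slow_lookup(term_id: str) -> bool:
    # Hypothetical expensive check; records each call so the saving is visible.
    lookups.append(term_id)
    return term_id in {"term-a", "term-b"}


checker = CachingExistenceChecker(slow_lookup)
results = [checker.exists(t) for t in ["term-a", "term-x", "term-a", "term-b", "term-x"]]
```

In this sketch the cache lives only as long as the checker object, so creating one checker per analyzed file would avoid serving stale results across runs.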

ledsoft added the performance label (Performance issue) Feb 12, 2024

ledsoft added a commit that referenced this issue Feb 12, 2024

[Perf #256] Extract term occurrence saving after annotation into a separate class. This way an alternative implementation using asynchronous processing can be introduced.

49ba559

ledsoft added a commit that referenced this issue Feb 12, 2024

[Perf #256] Save term occurrences asynchronously to improve annotation processing performance. Helps mainly when no occurrences existed originally.

8586246

ledsoft added a commit that referenced this issue Feb 12, 2024

[Perf #256] Simplify term occurrence resolution to improve performance.

c26bb01

Should decrease number of iterations over occurrences in annotated source.

ledsoft mentioned this issue Feb 13, 2024

Perf#256 annotation is slow #257

Merged

ledsoft linked a pull request Feb 13, 2024 that will close this issue

Perf#256 annotation is slow #257

Merged

ledsoft closed this as completed Feb 13, 2024
