
Improve tagger: Return iterators over WordData, remove groups, parallelize deserialization #70

Merged: 7 commits into main, Apr 24, 2021

Conversation

bminixhofer (Owner)

I had another look at the tagger today. This PR:

  • Changes all the get_tags_* methods to return iterators instead of Vec.
  • Removes the groups. These were only used by nlprule in the PosReplacer, which wasn't used anywhere since it is not fully implemented. Some of the currently unimplemented rules might still need the groups in some form, but we can probably get away with search + caching since the groups are only needed if a rule actually matches there.
  • Iterates over the FST in parallel in chunks with disjoint words, which allows populating the tags without synchronization.
  • Replaces the word HashMap with a Vec: since the IDs go from zero to n_words anyway, we don't need to hash anything. (A rough sketch of this and of the iterator-returning lookup follows after this list.)
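
A minimal sketch of the first and last points. WordIdInt, WordData and Tagger here are simplified stand-ins, not the actual nlprule definitions:

// Stand-in types for illustration only.
#[derive(Clone, Copy)]
struct WordIdInt(u32);

struct WordData {
    lemma: String,
    pos: String,
}

struct Tagger {
    // One slot per word id; ids run from 0..n_words, so no hashing is needed.
    tags: Vec<Vec<WordData>>,
}

impl Tagger {
    // Returns an iterator over the tags of a word instead of a freshly
    // allocated Vec, so callers that only inspect a few entries allocate nothing.
    fn get_tags(&self, id: WordIdInt) -> impl Iterator<Item = &WordData> + '_ {
        self.tags
            .get(id.0 as usize)
            .into_iter()
            .flat_map(|entries| entries.iter())
    }
}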

I see another ~30% speedup in loading the Tokenizer. This could also have a positive impact on rule checking speed, but there's some weird behavior in the local benchmark on my PC, so I have to double-check.

@drahnr you might be interested in this PR. It would also be great if you could double check the speedup.

@drahnr (Contributor) commented Apr 22, 2021

Could you provide some instructions on how you run your benches? I just copied the .bin files from the last release, but to no avail - I assume the structure of the .bins changed as well.

Benchmarking load tokenizer: Warming up for 3.0000 sthread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Serialization(Io(Custom { kind: UnexpectedEof, error: "failed to fill whole buffer" }))', nlprule/benches/load.rs:6:76
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@bminixhofer (Owner, Author)

Oh, I forgot that the binaries have changed. You can rebuild them with:

./scripts/build_and_test.sh en xxx

@drahnr (Contributor) commented Apr 22, 2021

Alright, it was a bit more than that, but it's running now.

Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 36.1s, or reduce sample count to 10.
load tokenizer          time:   [350.09 ms 350.94 ms 351.92 ms]                           

load rules              time:   [32.718 ms 32.819 ms 32.929 ms]

A few observations (again, cargo-spellcheck PoV):

  • Numbers look really good, getting closer to the point where I would no longer consider launching another process just to avoid re-parsing - so this is a huge 👍

  • Since the data is constant, a 3.0 second warmup does not represent what a common user of cargo-spellcheck will face (talking mostly about caches here; black_box only prevents the compiler from prematurely optimizing, the actual CPU doesn't care) - see more details in chore: improve benches a bit #71 - EDIT: even for the case of a long-running service, you are mostly interested in the one-off execution; both rules and tokenizer loading are not common hot paths, but rather one-off operations.

  • criterion has options for this, so I would recommend not using much of a warmup, so that the worst case represents what CLI programs will see (a sketch of such a config follows below).
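
For illustration, a hypothetical version of the load benchmark with a short warm-up and a small sample count; the bench name, .bin path, and Tokenizer::new call are assumptions based on the public nlprule API, not the exact contents of nlprule/benches/load.rs:

use std::time::Duration;

use criterion::{black_box, criterion_group, criterion_main, Criterion};
use nlprule::Tokenizer;

fn load_tokenizer(c: &mut Criterion) {
    c.bench_function("load tokenizer", |b| {
        // Loading from disk on every iteration is the interesting (one-off) case.
        b.iter(|| Tokenizer::new(black_box("storage/en_tokenizer.bin")).unwrap())
    });
}

criterion_group! {
    name = benches;
    // Short warm-up and few samples, so the numbers stay close to what a
    // one-off CLI invocation sees instead of a fully cache-warmed steady state.
    config = Criterion::default()
        .warm_up_time(Duration::from_millis(100))
        .sample_size(10);
    targets = load_tokenizer
}
criterion_main!(benches);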

Thanks for working on the perf ❤️

@drahnr (Contributor) commented Apr 22, 2021

Benches were all done with

vendor_id	: AuthenticAMD
cpu family	: 23
model		: 113
model name	: AMD Ryzen 7 3700X 8-Core Processor
stepping	: 0
microcode	: 0x8701013
cpu MHz		: 2200.000
cache size	: 512 KB
...

(for perspective).

@bminixhofer (Owner, Author)

> Alright, it was a bit more than that

I should document that, I guess you had to download the build dirs too.

Thanks for checking the perf! And thanks for the PR. I agree on the point with the warmup, I'll merge it.

@bminixhofer (Owner, Author)

Turns out the Chunker deserialization also contributed a significant amount now that the Tokenizer is faster. There was a fairly easy fix: storing the parameters in one large vector and accessing them with (offset, length) tuples instead of having one vector for each feature. This shaves off another ~12% of Tokenizer loading time (on top of the already improved speed above).
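
A minimal sketch of that layout, with made-up names (FlatFeatures is not the actual Chunker type, and the f32 value type is an assumption):

// Instead of one Vec per feature, all values live in a single backing Vec and
// each feature only stores an (offset, length) pair into it, so deserialization
// reconstructs one large allocation rather than many small ones.
struct FlatFeatures {
    values: Vec<f32>,
    spans: Vec<(usize, usize)>, // (offset, length) into `values`, one per feature
}

impl FlatFeatures {
    fn from_nested(features: Vec<Vec<f32>>) -> Self {
        let mut values = Vec::new();
        let mut spans = Vec::with_capacity(features.len());

        for feature in features {
            spans.push((values.len(), feature.len()));
            values.extend(feature);
        }

        FlatFeatures { values, spans }
    }

    // Returns the values of one feature as a slice into the shared buffer.
    fn feature(&self, index: usize) -> &[f32] {
        let (offset, len) = self.spans[index];
        &self.values[offset..offset + len]
    }
}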

Unfortunately I think that's it for the reasonably low-hanging speed improvements 😉

3 review comments on nlprule/src/compile/impls.rs (resolved)
@bminixhofer merged commit 602aaf1 into main on Apr 24, 2021
@bminixhofer deleted the faster-tagger-load branch on Apr 27, 2021