Thinking more about this - assume the default kmer size of 12. A kmer Vec<u8> (or ByteStr) is 16 bytes (12 u8s + a usize length), while a slice is 8 bytes (a usize pointer + a usize length). That means, with 2 or more threads, it consumes as much or more memory to build a separate set of target hashes for each thread, rather than build one and share it (behind an Arc). It would also make the API cleaner to use Vec<u8>/ByteStr keys as we can get rid of the lifetime parameter. I'm not sure if there will be any impact on performance.
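To make the shared option concrete, here is a minimal sketch of building one table with owned `Vec<u8>` keys and sharing it across worker threads behind an `Arc`. The names `KmerTable`, `build_table`, and `count_hits_parallel` are hypothetical and not taken from the crate; the point is only that the table type carries no lifetime parameter.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical table type: owned k-mer bytes -> start positions in the target.
/// Because the keys are owned, the type needs no lifetime parameter.
pub type KmerTable = HashMap<Vec<u8>, Vec<u32>>;

/// Builds the table once from the target sequence.
pub fn build_table(target: &[u8], k: usize) -> KmerTable {
    let mut table = KmerTable::new();
    if k == 0 {
        return table; // `windows(0)` would panic, so bail out early
    }
    for (pos, kmer) in target.windows(k).enumerate() {
        table.entry(kmer.to_vec()).or_default().push(pos as u32);
    }
    table
}

/// Counts target positions matched by the query k-mers, splitting the queries
/// across `threads` workers that all read the same shared table.
pub fn count_hits_parallel(
    table: Arc<KmerTable>,
    queries: Vec<Vec<u8>>,
    threads: usize,
) -> usize {
    let threads = threads.max(1);
    let queries = Arc::new(queries);
    let handles: Vec<_> = (0..threads)
        .map(|t| {
            // Cloning an `Arc` copies a pointer, not the table itself.
            let table = Arc::clone(&table);
            let queries = Arc::clone(&queries);
            thread::spawn(move || {
                queries
                    .iter()
                    .skip(t)
                    .step_by(threads)
                    .map(|q| table.get(q.as_slice()).map_or(0, |v| v.len()))
                    .sum::<usize>()
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker thread panicked"))
        .sum()
}
```

Lookups take a `&[u8]`, so callers can still query with borrowed slices even though the stored keys are owned.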
We could also do something fancier - compute the key size (threads * 8 vs k + 4) to decide whether to use a single shared set of target hashes or one per thread, and put it behind e.g. a Cow (although then we need to keep the lifetime parameter).
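The size comparison could look roughly like the sketch below. It uses the per-key byte counts from this comment as-is (8 bytes per slice key per thread, `k + 4` bytes per owned key); actual sizes depend on the target's pointer width and on allocator overhead, so treat the constants as assumptions. `TableLayout` and `choose_layout` are hypothetical names.

```rust
/// Hypothetical choice between one shared table and one table per thread.
#[derive(Debug, PartialEq, Eq)]
pub enum TableLayout {
    /// One table with owned keys, shared by all threads.
    Shared,
    /// One table with slice keys, built separately for each thread.
    PerThread,
}

/// Estimated bytes per k-mer key when every thread holds its own slice key
/// (assumption from the comment above: 8 bytes per slice).
pub fn per_thread_key_bytes(threads: usize) -> usize {
    threads * 8
}

/// Estimated bytes per k-mer key for a single owned key
/// (assumption from the comment above: k bytes of data plus a 4-byte length).
pub fn shared_key_bytes(k: usize) -> usize {
    k + 4
}

/// Picks the shared layout when it needs no more key memory than
/// the per-thread layout.
pub fn choose_layout(threads: usize, k: usize) -> TableLayout {
    if shared_key_bytes(k) <= per_thread_key_bytes(threads) {
        TableLayout::Shared
    } else {
        TableLayout::PerThread
    }
}
```

With the default k of 12 this selects the shared layout from 2 threads upward, which matches the estimate above; a larger k pushes the break-even point to more threads.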
Furthermore, TargetHash should switch from using std::collections::HashMap with FxHash to hashbrown (which is inherently faster, and also uses AHash, which is faster than FxHash for string keys).
One cool feature of hashbrown is the ability to dynamically compute keys using RawEntryBuilder. This means that the hash table could store keys as u32 (the index into the sequence). When a new key is added, the string slice associated with that index (for a fixed length k) is used to compute the hash (and to compare keys in the case of a collision). In this case, I'd suggest moving the hash tables into the TargetSeq struct.
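A rough illustration of the index-as-key idea follows. To keep it standalone it does not use hashbrown's raw entry API; it swaps in a small hand-rolled open-addressing table with linear probing and the standard library's `DefaultHasher`. The `TargetSeq` shown here is a hypothetical stand-in, not the crate's actual struct. Each slot stores only a `u32` start index, and both the hash and the equality check are computed from the k-mer slice at that index.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a `TargetSeq` that owns its own k-mer index.
pub struct TargetSeq {
    seq: Vec<u8>,
    k: usize,
    /// Open-addressing slots; each holds the start index of a distinct k-mer.
    slots: Vec<Option<u32>>,
}

impl TargetSeq {
    pub fn new(seq: Vec<u8>, k: usize) -> Self {
        assert!(k > 0, "k must be positive");
        let n_kmers = (seq.len() + 1).saturating_sub(k);
        // Keep the load factor at or below 50% so probing always finds a free slot.
        let capacity = (n_kmers * 2).max(2).next_power_of_two();
        let mut target = TargetSeq { seq, k, slots: vec![None; capacity] };
        for start in 0..n_kmers {
            target.insert(start as u32);
        }
        target
    }

    /// The k-mer that a stored index stands for.
    fn kmer_at(&self, start: u32) -> &[u8] {
        let s = start as usize;
        &self.seq[s..s + self.k]
    }

    fn hash_bytes(bytes: &[u8]) -> u64 {
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the slot holding `kmer`, or the empty slot where it would go.
    fn probe(&self, kmer: &[u8]) -> usize {
        let mask = self.slots.len() - 1;
        let mut i = (Self::hash_bytes(kmer) as usize) & mask;
        loop {
            match self.slots[i] {
                // Occupied by a different k-mer: keep probing.
                Some(start) if self.kmer_at(start) != kmer => i = (i + 1) & mask,
                // Empty slot or matching k-mer: stop here.
                _ => return i,
            }
        }
    }

    /// Stores `start` unless an equal k-mer is already present.
    fn insert(&mut self, start: u32) {
        let i = self.probe(self.kmer_at(start));
        if self.slots[i].is_none() {
            self.slots[i] = Some(start);
        }
    }

    /// Start index of the first occurrence of `kmer` in the sequence, if any.
    pub fn first_occurrence(&self, kmer: &[u8]) -> Option<u32> {
        if kmer.len() != self.k {
            return None;
        }
        self.slots[self.probe(kmer)]
    }

    /// Number of distinct k-mers stored.
    pub fn distinct_kmers(&self) -> usize {
        self.slots.iter().filter(|slot| slot.is_some()).count()
    }
}
```

Because the table lives next to the sequence it indexes, no key borrows from outside the struct, which is one reason moving the tables into `TargetSeq` fits this design.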