
is pyo3_branchwater much slower than sra_search?? #9

Closed
ctb opened this issue Aug 25, 2023 · 13 comments

ctb commented Aug 25, 2023

branchwater benchmarking for 5 searches of 1,000 genomes x 10,000 metagenomes

old code, sra_search

time (minutes): mean 24.2, std 1.7
RSS/mem: mean 16.4 GB, std 1.6 GB
I/O: mean 93.4 GB, std 1.9 GB

new code, pyo3_branchwater v~0.5.1

time (minutes): mean 130.2, std 32.7
RSS/mem: mean 10.4 GB, std 146.9 MB
I/O: mean 66 GB, std 13.1 GB

first cut observations:

time is... a LOT slower. Not sure why that would be!? Maybe I need to look at other stats too, around CPU efficiency and so on.

why would memory decrease?!
and especially, why would I/O decrease?

maybe threading isn't working properly in pyo3_branchwater?


ctb commented Aug 25, 2023

At first cut, I see that mean load looks substantially lower in pyo3_branchwater than in sra_search - see the first large dataframe in each notebook:

original: https://github.com/dib-lab/2022-branchwater-benchmarking/blob/main/01.basic-benchmarks.ipynb
pyo3: https://github.com/dib-lab/2022-branchwater-benchmarking/blob/plugin/01.basic-benchmarks.ipynb

Left is sra_search, right is pyo3_branchwater. cpu_time is roughly comparable (lower in pyo3_branchwater!) but mean load is MUCH smaller in the newer code base. Sigh.

(screenshots: benchmark summary dataframes for sra_search and pyo3_branchwater)


ctb commented Aug 25, 2023

ok, with @mr-eyes' latest changes:

/usr/bin/time -v sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv            data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 32
...
...manysearch is done! results in 'outputs/output_a_vs_a.csv'
        Command being timed: "sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 32"
        User time (seconds): 57905.38
        System time (seconds): 99202.38
        Percent of CPU this job got: 2068%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:06:33
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 31175788
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 551
        Minor (reclaiming a frame) page faults: 9168376074
        Voluntary context switches: 61561612
        Involuntary context switches: 384530
        Swaps: 0
        File system inputs: 196666568
        File system outputs: 195288
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
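A little arithmetic helps interpret the run above: /usr/bin/time's "Percent of CPU" is just total CPU seconds over wall-clock seconds. Here's a minimal sketch (numbers taken from the -c 32 output above; this is illustrative only, not branchwater code):

```rust
/// Effective parallelism from /usr/bin/time -v output:
/// (user + system CPU seconds) / wall-clock seconds.
fn effective_parallelism(user_s: f64, system_s: f64, wall_s: f64) -> f64 {
    (user_s + system_s) / wall_s
}

fn main() {
    // Wall clock 2:06:33 = 7593 seconds, from the -c 32 run above.
    let load = effective_parallelism(57905.38, 99202.38, 7593.0);
    println!("effective parallelism: {:.1}x", load); // ~20.7x
    println!("per-thread efficiency: {:.0}%", 100.0 * load / 32.0); // ~65%
}
```

So only about 20.7 of the 32 requested threads are effectively busy - and most of that CPU is system time, not user time, which already hints at kernel-side overhead rather than useful work.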


ctb commented Aug 25, 2023

with sourmash-bio/sourmash_plugin_branchwater@ae0b26a, multithreading also seemed to work fine - I accidentally ran it with 64 threads, though ;)

% /usr/bin/time -v sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv             data/gtdb-list-a.sigs.txt data/wort-list-a.txt
...
        Command being timed: "sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt"
        User time (seconds): 100630.36
        System time (seconds): 165828.01
        Percent of CPU this job got: 3074%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:24:26
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 58909660
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 282
        Minor (reclaiming a frame) page faults: 9490768717
        Voluntary context switches: 105107175
        Involuntary context switches: 3457801
        Swaps: 0
        File system inputs: 186895752
        File system outputs: 194880
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


ctb commented Aug 25, 2023

hmm. definitely something funky going on up above, though. User time seems way too high - is the parallelization not working well?
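One concrete signal in the timings above: system time (165828 s) exceeds user time (100630 s), i.e. the job spends most of its CPU in the kernel (page faults, context switches) rather than in the search itself. A tiny helper to make that fraction explicit (illustrative only, not part of the plugin):

```rust
/// Fraction of total CPU time spent in the kernel. A healthy compute-bound
/// job should be close to 0; the 64-thread run above is around 0.62.
fn system_time_fraction(user_s: f64, system_s: f64) -> f64 {
    system_s / (user_s + system_s)
}

fn main() {
    let frac = system_time_fraction(100630.36, 165828.01);
    println!("system-time fraction: {:.2}", frac); // ~0.62
}
```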


mr-eyes commented Aug 25, 2023

Could you please get the same 64-core run output using sourmash-bio/pyo3_branchwater@d7086d3 (#57)?


ctb commented Aug 26, 2023

with sourmash-bio/sourmash_plugin_branchwater@d7086d3:

% /usr/bin/time -v sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv             data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 64  
...
ksize: 31 / scaled: 1000 / threshold: 0.01
searching all sketches in 'data/gtdb-list-a.sigs.txt' against 'data/wort-list-a.txt' using 64 threads
Reading list of queries from: 'data/gtdb-list-a.sigs.txt'
Loaded 1000 query signatures
Reading search file paths from: 'data/wort-list-a.txt'
Loaded 10000 sig paths to search.
notify doneme
DONE. Processed 10000 search sigs
...manysearch is done! results in 'outputs/output_a_vs_a.csv'
        Command being timed: "sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 64"
        User time (seconds): 98165.52
        System time (seconds): 170954.78
        Percent of CPU this job got: 3085%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:25:22
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 60492340
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 441
        Minor (reclaiming a frame) page faults: 9591422200
        Voluntary context switches: 105558503
        Involuntary context switches: 3213541
        Swaps: 0
        File system inputs: 174266152
        File system outputs: 194872
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


ctb commented Aug 27, 2023

I now have a strong suspicion that the above benchmarks were done with unoptimized, dev-mode builds - I wasn't installing wheels from maturin build -r, because I (wrongly) thought pip install -e . would do release-mode builds.

So I'm rerunning the last benchmark to see if there's a difference :)
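For what it's worth, one generic way to catch this class of mistake is to have the extension report its own build profile: cfg!(debug_assertions) is on for cargo's dev profile and off under maturin build -r / cargo build --release by default. This is a hypothetical helper, not something pyo3_branchwater actually exposes:

```rust
/// Hypothetical sketch: expose the build profile from the extension, so a
/// benchmark script can refuse to run against a dev-mode (unoptimized) build.
/// `debug_assertions` is enabled in the dev profile and disabled in release
/// builds, unless the Cargo profile explicitly overrides it.
pub fn built_in_release_mode() -> bool {
    !cfg!(debug_assertions)
}

fn main() {
    println!("release build: {}", built_in_release_mode());
}
```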


ctb commented Aug 27, 2023

hmm. no difference I think. hmm.

        Command being timed: "sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 64"
        User time (seconds): 59784.59
        System time (seconds): 178823.98
        Percent of CPU this job got: 3167%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:05:33
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 55803492
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 108
        Minor (reclaiming a frame) page faults: 9297596640
        Voluntary context switches: 116233294
        Involuntary context switches: 4092034
        Swaps: 0
        File system inputs: 0
        File system outputs: 194936
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


ctb commented Aug 28, 2023

so frustrating - the original code is super fast! and low memory! 😭

I feel like this has to be at least partly an optimization thing.

        Command being timed: "/home/ctbrown/scratch/magsearch/bin/searcher -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt"
        User time (seconds): 27442.66
        System time (seconds): 4188.67
        Percent of CPU this job got: 4186%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 12:35.46
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 27440076
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 8704
        Minor (reclaiming a frame) page faults: 304905625
        Voluntary context switches: 936048
        Involuntary context switches: 9548437
        Swaps: 0
        File system inputs: 408992
        File system outputs: 2496
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0


ctb commented Aug 28, 2023

ok, I think I figured it out - after I got cargo flamegraph working in sourmash-bio/sourmash_plugin_branchwater#70, it showed me that the merge operation needed to calculate Jaccard was consuming a lot of time. So I removed the Jaccard code completely and gave it a try -

...manysearch is done! results in 'outputs/output_a_vs_a.csv'
        Command being timed: "sourmash scripts manysearch -k 31 --scaled=1000 -o outputs/output_a_vs_a.csv data/gtdb-list-a.sigs.txt data/wort-list-a.txt -c 64"
        User time (seconds): 23487.51
        System time (seconds): 883.01
        Percent of CPU this job got: 4881%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:19.27
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 37619264
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 602
        Minor (reclaiming a frame) page faults: 294906448
        Voluntary context switches: 830322
        Involuntary context switches: 8060687
        Swaps: 0
        File system inputs: 40
        File system outputs: 171176
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

we are now faster than the original searcher binary, but consume a bit more memory.

anyway, I think this means we should not be calculating Jaccard 😆.
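To spell out why dropping Jaccard helps: containment only needs the intersection of each subject's hashes with the (fixed) query, while Jaccard also needs the union size - which is where the expensive per-subject merge came from. A minimal sketch with plain HashSets (the real code works on MinHash sketches, so this only shows the shape of the computation):

```rust
use std::collections::HashSet;

// Illustrative only, not the branchwater code.
fn intersection_size(query: &HashSet<u64>, subject: &HashSet<u64>) -> usize {
    query.iter().filter(|h| subject.contains(h)).count()
}

/// Containment of the query in the subject: needs only the intersection.
fn containment(query: &HashSet<u64>, subject: &HashSet<u64>) -> f64 {
    intersection_size(query, subject) as f64 / query.len() as f64
}

/// Jaccard similarity: additionally needs |A ∪ B| = |A| + |B| - |A ∩ B|,
/// i.e. the extra "merge" work that showed up in the flamegraph.
fn jaccard(query: &HashSet<u64>, subject: &HashSet<u64>) -> f64 {
    let inter = intersection_size(query, subject);
    let union = query.len() + subject.len() - inter;
    inter as f64 / union as f64
}

fn main() {
    let q: HashSet<u64> = [1, 2, 3, 4].into_iter().collect();
    let s: HashSet<u64> = [3, 4, 5, 6, 7, 8].into_iter().collect();
    println!("containment = {}", containment(&q, &s)); // 0.5
    println!("jaccard = {}", jaccard(&q, &s));         // 0.25
}
```

If the search only thresholds on containment, the union computation - and hence the merge - can be skipped entirely.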


ctb commented Aug 29, 2023

old code, sra_search

time (minutes): mean 24.2, std 1.7
RSS/mem: mean 16.4 GB, std 1.6 GB
I/O: mean 93.4 GB, std 1.9 GB

new code, pyo3_branchwater v0.6.1 or thereabouts

time (minutes): mean 13.4, std 0.9
RSS/mem: mean 16.8 GB, std 1.06 GB
I/O (???): mean 0.004, std 0.005


ctb commented Aug 29, 2023

(no! it is faster!)

ctb closed this as completed Aug 29, 2023

ctb commented Sep 2, 2023

with pyo3_branchwater 0.7.0:

mean time (minutes) 15.1, std 1.0
mean RSS (GB) 16.3, std 1.8
mean I/O (kb) 0.0, std 0.0
