Evaluate Profile-Guided Optimization (PGO) #2354

Open
zamazan4ik opened this issue Jul 29, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@zamazan4ik

Hi!

I think Profile-Guided Optimization (PGO) could improve Qdrant's performance even further! PGO has brought large performance gains to many different projects (you can check the list here), including several databases such as PostgreSQL and ClickHouse.

There are several options:

  1. Provide an easy way to build Qdrant with PGO, so that experienced users can optimize it for their own usage scenarios.
  2. Optimize the Qdrant build with a generic-enough profile.
  3. Provide PGO-optimized binaries. This is trickier, since it requires preparing a good-enough profile, but it would be great to see too.
  4. Use PGO to optimize your own cloud-based Qdrant installation.

As an additional optimization, I suggest taking a look at LLVM BOLT. From my experience, though, it is better to start with PGO and try BOLT afterwards.

For Rust projects, I recommend starting with cargo-pgo.
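A minimal sketch of the usual cargo-pgo workflow for a binary crate like Qdrant, run from the repository root. It assumes cargo-pgo is installed (cargo install cargo-pgo) together with the llvm-tools-preview rustup component. The helper name `qdrant_pgo_build` is hypothetical, and the training workload is left to the user, because choosing a representative one is the hard part.

```shell
#!/usr/bin/env bash
# Sketch of the cargo-pgo workflow (https://github.com/Kobzol/cargo-pgo).
# Assumes `cargo install cargo-pgo` and `rustup component add llvm-tools-preview`.
# qdrant_pgo_build is a hypothetical helper; call it from the repository root.

qdrant_pgo_build() {
    # 1. Build an instrumented release binary (cargo-pgo prints where it put it).
    cargo pgo build || return 1

    # 2. Gather profiles: run the instrumented binary under a representative
    #    workload, then stop it normally so the profile files get written.
    echo "Run the instrumented binary with a realistic workload, stop it, then press Enter."
    read -r _

    # 3. Merge the gathered profiles and rebuild an optimized binary with them.
    cargo pgo optimize
}
```

Calling `qdrant_pgo_build` performs the instrumented build, pauses for the manual training run, and then produces the optimized binary.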

@agourlay
Member

This sounds interesting, thank you for raising this issue 👍

What are you missing, or what would you need, to try cargo-pgo?

@zamazan4ik
Author

> What are you missing, or what would you need, to try cargo-pgo?

I created this issue as an idea to try PGO on Qdrant. If the benchmarks in https://github.com/qdrant/qdrant/tree/master/benches cover Qdrant's functionality well, I think we can use them to test PGO.

The only thing I miss is free time to do it :) (since I am working on enabling PGO for multiple projects).

@agourlay
Member

> The only thing I miss is free time to do it

I did not mean to put you under pressure to get it done 👍

> since I am working on enabling PGO for multiple projects

It seems you have much more experience with PGO than we do at the moment, so your input is very valuable.

To prioritize this work, I think it would help to have a basic experiment showing how much PGO costs to set up and maintain, and what performance gain it could bring.

@zamazan4ik
Author

> To prioritize this work, I think it would help to have a basic experiment showing how much PGO costs to set up and maintain, and what performance gain it could bring.

Well, if we are going to use benchmarks as a PGO training and evaluation set, I need to understand how to run the benchmarks with Qdrant. Are they integrated via cargo bench?

@agourlay
Member

We use criterion, but cargo bench should work as well AFAIK.

@zamazan4ik
Author

Well, at least cargo bench in the root of the qdrant repo runs nothing:

```
    Finished bench [optimized] target(s) in 4m 43s
     Running unittests src/main.rs (target/release/deps/qdrant-e69b7ac971f11c78)

running 16 tests
test actix::api::collections_api::tests::timeout_is_deserialized ... ignored
test actix::api::read_params::test::deserialize_empty_string ... ignored
test actix::api::read_params::test::deserialize_empty_value ... ignored
test actix::api::read_params::test::deserialize_factor ... ignored
test actix::api::read_params::test::deserialize_type ... ignored
test actix::api::read_params::test::try_deserialize_factor_0 ... ignored
test actix::tests::test_version ... ignored
test common::helpers::tests::test_is_ready ... ignored
test common::metrics::tests::test_endpoint_whitelists_sorted ... ignored
test consensus::tests::collection_creation_passes_consensus ... ignored
test greeting::tests::test_welcome ... ignored
test settings::tests::test_custom_config ... ignored
test settings::tests::test_default_config ... ignored
test settings::tests::test_no_config_files ... ignored
test settings::tests::test_runtime_development_config ... ignored
test tonic::api::tests::test_validation ... ignored

test result: ok. 0 passed; 0 failed; 16 ignored; 0 measured; 0 filtered out; finished in 0.00s
```

As far as I can see in https://github.com/qdrant/qdrant/blob/master/benches/search-points/search-points.sh, these benchmarks have to be run differently:

  1. Compile and run a Qdrant instance.
  2. Run the search-points.sh script.
  3. Collect the benchmark data.
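The first two steps can be scripted roughly as follows, so that the same workload can also serve as a PGO training run; the script's output is the data to collect in step 3. This is a sketch under several assumptions: `QDRANT_BIN` points at an already built (possibly PGO-instrumented) binary, the REST API listens on the default port 6333, search-points.sh needs no extra arguments, and Qdrant exits cleanly on SIGINT so an instrumented build can write its profile. All variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start Qdrant, run search-points.sh against it, then stop Qdrant.
# Assumptions (all names hypothetical): QDRANT_BIN is an already built, possibly
# PGO-instrumented binary; the REST API listens on the default port 6333;
# search-points.sh needs no extra arguments; Qdrant exits cleanly on SIGINT.

QDRANT_BIN="${QDRANT_BIN:-./target/release/qdrant}"
BENCH_SCRIPT="${BENCH_SCRIPT:-./benches/search-points/search-points.sh}"
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"

run_search_benchmark() {
    # Step 1: start Qdrant in the background and remember its PID.
    "$QDRANT_BIN" &
    local pid=$!

    # Wait up to ~30 s until the REST API answers.
    local ready=0
    for _ in $(seq 1 30); do
        if curl -sf "$QDRANT_URL/" > /dev/null; then ready=1; break; fi
        sleep 1
    done
    if [ "$ready" -ne 1 ]; then
        echo "Qdrant did not become ready at $QDRANT_URL" >&2
        kill "$pid"; wait "$pid"
        return 1
    fi

    # Step 2: run the benchmark script against the running instance.
    bash "$BENCH_SCRIPT"
    local status=$?

    # Stop Qdrant and wait for it to exit, so an instrumented build
    # gets the chance to write its profile data on shutdown.
    kill -INT "$pid"
    wait "$pid"
    return "$status"
}
```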

Maybe, I will try to do it a bit later :)

@agourlay
Member

Then cargo criterion --all should unleash the kraken 🐙

@zamazan4ik
Author

> Then cargo criterion --all should unleash the kraken

Yeah, it definitely should :) The problem is that https://github.com/Kobzol/cargo-pgo relies on the cargo bench integration. So if we want to build a PGO-optimized Qdrant, we need to do a little more work. It's certainly possible (no rocket science at all), just a bit more effort.
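Without the cargo bench integration, PGO can still be applied with plain rustc flags, as described in the rustc book's chapter on profile-guided optimization. A minimal sketch, assuming llvm-profdata (shipped with the llvm-tools-preview rustup component) is on PATH; `PGO_DIR` and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: PGO with plain rustc flags, without cargo-pgo.
# Assumes llvm-profdata (from `rustup component add llvm-tools-preview`) is on PATH.
# PGO_DIR and all function names are hypothetical.

PGO_DIR="${PGO_DIR:-/tmp/qdrant-pgo-data}"

host_triple() {
    # Host target triple, e.g. aarch64-apple-darwin on an M1 Mac.
    rustc -vV | sed -n 's/^host: //p'
}

pgo_instrument() {
    # Instrumented build; each run of the binary writes .profraw files into PGO_DIR.
    RUSTFLAGS="-Cprofile-generate=$PGO_DIR" \
        cargo build --release --target "$(host_triple)"
}

pgo_merge() {
    # Merge all raw profiles into a single indexed profile.
    llvm-profdata merge -o "$PGO_DIR/merged.profdata" "$PGO_DIR"
}

pgo_optimize() {
    # Optimized build that uses the merged profile.
    RUSTFLAGS="-Cprofile-use=$PGO_DIR/merged.profdata" \
        cargo build --release --target "$(host_triple)"
}
```

A typical sequence is pgo_instrument, then a training run of the instrumented qdrant binary from target/<host triple>/release/, then pgo_merge and pgo_optimize. Passing --target makes Cargo leave RUSTFLAGS out of build scripts and proc macros, so only the program itself is instrumented.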

@agourlay
Member

If you step into the lib crates, you can run benchmarks directly.

```shell
cd lib/segment
cargo bench --bench hnsw_build_graph
```
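Since cargo-pgo does integrate with cargo bench, these crate-level benchmarks can double as a PGO training and evaluation set. A sketch, assuming cargo-pgo is installed; `SEGMENT_DIR` and `segment_pgo_bench` are hypothetical names. The instrumented run is slow by design, so only the plain run and the optimized run are worth comparing.

```shell
#!/usr/bin/env bash
# Sketch: train and evaluate PGO on the segment crate's benchmarks.
# Assumes cargo-pgo is installed; SEGMENT_DIR and segment_pgo_bench are hypothetical.

SEGMENT_DIR="${SEGMENT_DIR:-lib/segment}"

segment_pgo_bench() {
    (   # subshell, so the cd does not leak into the caller
        cd "$SEGMENT_DIR" || exit 1
        cargo bench || exit 1        # baseline: regular optimized build
        cargo pgo bench || exit 1    # instrumented run: gathers profiles, ignore its timings
        cargo pgo optimize bench     # benchmarks rebuilt with the merged profiles
    )
}
```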

@agourlay agourlay added the enhancement New feature or request label Aug 31, 2023
@zamazan4ik
Author

@agourlay While testing on a MacBook M1 with macOS 13.4 (Ventura) and Qdrant from the master branch (commit 0f102a5575ac33df03f06563844198f3ea26136b), I get the following error:

```
cd lib/segment
RUST_BACKTRACE=1 cargo bench --bench hnsw_build_asymptotic
    Finished bench [optimized] target(s) in 0.24s
     Running benches/hnsw_build_asymptotic.rs (/Users/zamazan4ik/open_source/qdrant/target/release/deps/hnsw_build_asymptotic-215d8cb651e6cb54)
Gnuplot not found, using plotters backend
hnsw-index-build-asymptotic/build-n-search-hnsw
                        time:   [26.704 µs 26.792 µs 26.902 µs]
                        change: [-0.6981% -0.2924% +0.1105%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe
hnsw-index-build-asymptotic/build-n-search-hnsw-10x
                        time:   [31.965 µs 32.401 µs 32.844 µs]
                        change: [-1.5955% +0.5111% +2.6498%] (p = 0.64 > 0.05)
                        No change in performance detected.
hnsw-index-build-asymptotic/build-n-search-hnsw-10x-score-point
                        time:   [33.176 µs 33.408 µs 33.735 µs]
                        change: [-0.5501% +0.1561% +0.9129%] (p = 0.67 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe

Benchmarking scoring-vector/score-point: Warming up for 3.0000 s
thread 'main' panicked at 'internal error: entered unreachable code: FakeMetric::distance', lib/segment/benches/hnsw_build_asymptotic.rs:113:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
   1: core::panicking::panic_fmt
             at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
   2: <hnsw_build_asymptotic::FakeMetric as segment::spaces::metric::Metric>::distance
   3: segment::vector_storage::raw_scorer::raw_scorer_impl
   4: criterion::bencher::Bencher<M>::iter
   5: <criterion::routine::Function<M,F,T> as criterion::routine::Routine<M,T>>::warm_up
   6: criterion::routine::Routine::sample
   7: criterion::analysis::common
   8: criterion::benchmark_group::BenchmarkGroup<M>::bench_function
   9: hnsw_build_asymptotic::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

error: bench failed, to rerun pass `--bench hnsw_build_asymptotic`
```

@zamazan4ik
Author

zamazan4ik commented Sep 1, 2023

I benchmarked the segment crate with PGO on my MacBook M1 Pro (macOS Ventura 13.4), with the same background noise in all runs. The only disabled benchmark is hnsw_build_asymptotic, since it doesn't work with the current Qdrant version on my machine (see the panic in the comment above).

The results are the following (in cargo bench format):

According to these microbenchmarks, PGO improves performance in almost all cases. However, it would be much more interesting to test Qdrant itself with PGO.
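To condense a baseline run and a PGO run into one number per benchmark, the relative change of the two time estimates is (pgo − baseline) / baseline × 100; this is the same kind of figure criterion prints on its change: line, where it additionally reports a confidence interval. A tiny hypothetical helper, assuming both times are given in the same unit:

```shell
#!/usr/bin/env bash
# Relative change between two benchmark time estimates, in percent:
#   (pgo - baseline) / baseline * 100; a negative value means the PGO build is faster.
# pct_change is a hypothetical helper; both arguments must use the same time unit.
pct_change() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%+.2f\n", (new - base) / base * 100 }'
}
```

For example, pct_change 100 90 prints -10.00, meaning the second run takes 10% less time.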

Additionally, I want to highlight that some third-party dependencies used by Qdrant could be optimized with PGO as well. For example, RocksDB is a C++ dependency, so cargo pgo will not optimize it, but according to my tests RocksDB benefits from PGO too (by somewhere around 10% in performance).

@agourlay
Member

Thanks for the deep investigation 👍

FYI we have finally fixed the panic in the hnsw_build_asymptotic bench :)
