Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"cargo test --benches" runs unit tests #6454

Closed
asomers opened this issue Dec 17, 2018 · 17 comments
Closed

"cargo test --benches" runs unit tests #6454

asomers opened this issue Dec 17, 2018 · 17 comments
Labels
A-cargo-targets Area: selection and definition of targets (lib, bins, examples, tests, benches) Command-test S-propose-close Status: A team member has nominated this for closing, pending further input from the team

Comments

@asomers
Copy link

asomers commented Dec 17, 2018

Problem
Running cargo test --benches is supposed to build all benchmarks in debug mode and run them. It does that. But if there are unit tests, then it runs them, too. It shouldn't do that. In a workspace the problem is more complicated, because it's possible that there are unit tests in subcrates that have no benchmarks. In that case, cargo test --benches --all doesn't build those unit tests. But if they've already been built, then it will run them too.

Steps

  1. git clone https://github.com/carllerche/bytes.git
  2. cd bytes
  3. cargo +nightly test --benches --all

I would expect two executables to run, containing 36 and 20 tests with names like get_f32 and alloc_big. Those are the benchmarks. But a third executable gets built and run, with two tests named bytes::test_original_capacity_from_repr and bytes::test_original_capacity_to_repr. Those are unit tests.

Here is another example, with a workspace repository.

Steps

  1. git clone https://github.com/tokio-rs/tokio.git
  2. cd tokio
  3. cargo +nightly test --all
  4. cargo +nightly test --benches --all

I expect that eighteen benchmarks would run and the unit tests would be ignored, just as if I had typed cargo +nightly bench --all. But the unit tests don't get ignored, so a total of 26 tests and benchmarks get run.

Notes

Output of cargo version:
On FreeBSD x86_64:
cargo 1.33.0-nightly (28fb200 2018-12-09)
And on Linux x86_64:
cargo 1.31.0-nightly (2d0863f 2018-10-20)

@ehuss
Copy link
Contributor

ehuss commented Dec 17, 2018

--benches is a generic target selection flag. cargo test --benches is building all benchmarkable targets and running them as tests. Running the unit-tests is to be expected.

Can you explain more about what you are trying to do? If you are trying to build/run benchmarks in debug mode, then there is an open PR to add that (#6446). As a workaround, you can do cargo test --benches -- --bench which will ignore tests.

@asomers
Copy link
Author

asomers commented Dec 17, 2018

--benches is a generic target selection flag. cargo test --benches is building all benchmarkable targets and running them as tests. Running the unit-tests is to be expected.

I disagree. There's no way that this behavior is intended. For one thing, the help text is "Test all benches". Unit tests aren't benches; they aren't in the benches directory and don't have a #[bench] directive. For another, the behavior is very inconsistent. In a workspace, whether cargo test --benches --all runs a unit test depends on whether that test has been previously built. And in what sense are unit tests benchmarkable but functional tests aren't? cargo test --benches doesn't run the functional tests.

Can you explain more about what you are trying to do? If you are trying to build/run benchmarks in debug mode, then there is an open PR to add that (#6446). As a workaround, you can do cargo test --benches -- --bench which will ignore tests.

Yeah, I submitted that PR :). But then I discovered cargo test --benches, and I don't want to introduce redundant functionality. If cargo test --benches is supposed to run the benchmarks in debug mode, then I should fix this issue and close the other PR. But if cargo test --benches is supposed to run the benchmarks and some of the unit tests too, then I think it's a useless command and I should move forward with the PR.

@ehuss
Copy link
Contributor

ehuss commented Dec 17, 2018

The lib is considered multiple things: a lib target, a test target (for unittests) and a bench target (for benchmarks). When a benchmark (either standalone, or a "lib") is run as a test, it runs the benchmark once just to verify that it works (see https://github.com/rust-lang/rust/blob/adbfec229ce07ff4b2a7bf2d6dec2d13cb224980/src/libtest/lib.rs#L1377-L1397). You can set bench=false if you don't have any benchmarks to prevent something from being included.

I'm actually not able to reproduce the issue with --all changing behavior based on what has been previously built. tokio doesn't build due to warnings on nightly. bytes only builds two executables, not three. Can you put together a minimal repro workspace?

And in what sense are unit tests benchmarkable but functional tests aren't?

lib and bins can contain both unit tests and benchmarks. So to rephrase this, libs are benchmarkable. Integration tests aren't enabled for benchmarks by default because that probably isn't useful for most people. If you really want to benchmark an integration test, then you can set bench = true in the [[test]] section.

Yeah, I submitted that PR :).

Oh, haha, oops, sorry wasn't paying attention.

If cargo test --benches is supposed to run the benchmarks in debug mode, then I should fix this issue and close the other PR. But if cargo test --benches is supposed to run the benchmarks and some of the unit tests too, then I think it's a useless command and I should move forward with the PR.

cargo test --benches should run #[test] and #[bench] functions as tests in all benchmarkable targets (as it does today). I think your PR is going in the right direction.

@asomers
Copy link
Author

asomers commented Dec 17, 2018

@ehuss you seem to be saying that cargo test --benches rightly runs all of a lib's unit tests because any one of those might be a benchmark. But there are three problems with that:

  1. It's not what cargo test --help says.
  2. It's inconsistent, because cargo bench doesn't run unit tests; only benchmarks.
  3. It's inconsistent, because cargo test --bench <BENCHMARK> cannot be used to run unit tests; only benchmarks.

I think there are two alternatives that make sense:

  1. The --bench could be a test selection flag that operates at the program level. In this case cargo test --benches would execute any program that's marked as [[bench]] in Cargo.toml, as well as everything in the benches directory.
  2. Or, the --bench flag could be a test selection flag that operates at the testcase level. In this case cargo test --benches would execute any function that's marked in the source with #[bench], but not stuff marked as #[test]. Cargo obviously has the ability to discriminate between the two, because the cargo bench command does it.

I like option 1 better. I think it would be more intuitive. Plus, it would work for crates that use alternative benchmarking libraries, like criterion, easybench, etc. Option 2 would not.

@ehuss
Copy link
Contributor

ehuss commented Dec 17, 2018

It's not what cargo test --help says.

Depends on how you interpret it. "lib" is a benchmark by default, so it is running tests on lib.

It's inconsistent, because cargo bench doesn't run unit tests; only benchmarks.

I think that inconsistency makes sense. I definitely would not want cargo bench to also run tests. And it's a convenience that cargo test will run benchmarks in test mode. This helps prevent benchmarks from bitrotting since they are often not run in CI.

It's inconsistent, because cargo test --bench <BENCHMARK> cannot be used to run unit tests; only benchmarks.

It will run anything marked #[test] within that benchmark. I'm not sure I see the inconsistency.

would execute any program that's marked as [[bench]] in Cargo.toml

Cargo is not capable of doing that kind of introspection, and I doubt it ever will. You can always change the default behavior by setting bench flag.

Cargo obviously has the ability to discriminate between the two, because the cargo bench command does it.

Cargo is not making any distinction here. When libtest is running, it runs #[test] and #[bench] functions as tests. When you run libtest with the --bench flag, it only runs #[bench] functions as benchmarks. That behavior seems reasonable and useful to me.

@asomers
Copy link
Author

asomers commented Dec 17, 2018

It's not what cargo test --help says.

Depends on how you interpret it. "lib" is a benchmark by default, so it is running tests on lib.

There aren't many ways to interpret it. The help text says it will "Test all benches". Perhaps "lib" is a benchmark for purposes of this command, but that isn't documented anywhere in the help text or in the cargo reference guide. The fact that cargo does this doesn't mean that cargo should do this.

It's inconsistent, because cargo bench doesn't run unit tests; only benchmarks.

I think that inconsistency makes sense. I definitely would not want cargo bench to also run tests.

Nor would I. I also wouldn't want cargo test --benches to run tests.

And it's a convenience that cargo test will run benchmarks in test mode. This helps prevent benchmarks from bitrotting since they are often not run in CI.

It is a convenience that cargo test runs any benchmarks in the src directory. But it does that anyway even without the --benches flag. Running the lib tests both with and without --benches isn't a convenience, it's a redundancy.

It's inconsistent, because cargo test --bench <BENCHMARK> cannot be used to run unit tests; only benchmarks.

It will run anything marked #[test] within that benchmark. I'm not sure I see the inconsistency.

cargo test --bench <BENCHMARK> won't even invoke the lib tests' binary. There's no argument it will accept that will run anything in the src dir. It only accepts the names of files in the benches dir (and probably anything else marked [[benchmark]] in Cargo.toml).

would execute any program that's marked as [[bench]] in Cargo.toml

Cargo is not capable of doing that kind of introspection, and I doubt it ever will. You can always change the default behavior by setting bench flag.

Yes it is. It already does. That kind of introspection is what cargo test --bench <BENCHMARK> is already doing.

Cargo obviously has the ability to discriminate between the two, because the cargo bench command does it.

Cargo is not making any distinction here. When libtest is running, it runs #[test] and #[bench] functions as tests. When you run libtest with the --bench flag, it only runs #[bench] functions as benchmarks. That behavior seems reasonable and useful to me.

Without making such a distinction, cargo's --benches flag isn't very useful. After all, plain cargo test already runs any benchmark functions in the src dir. That's why I favor alternative 1. It makes sense to the user, and

@ehuss I think you understand very well what cargo is doing. But you aren't thinking about what cargo should be doing. Its existing behavior is not very useful.

Here's an informal survey. I have 988 crates in my ~/.cargo/registry/src/jackfan.us.kg-1ecc6299db9ec823 directory (including multiple versions of the same crate). Only 43 of them include the regex #\[bench\] anywhere outside of the benches directory, and none of those run cargo test --benches in their Travis-CI configurations. I also have three crates that do use cargo test --benches in Travis (arc-swap, http, and rand) and all three also run plain cargo test. It's as if the authors don't realize that cargo test --benches also runs their unit tests. If cargo adopts my first suggestion, then those three crates' Travis configurations will suddenly make sense.

@ehuss
Copy link
Contributor

ehuss commented Dec 17, 2018

that isn't documented anywhere in the help text or in the cargo reference guide

As of this morning, a significant amount of new documentation was added. The most relevant part is probably https://github.com/rust-lang/cargo/blob/master/src/doc/man/options-targets.adoc which describes the --benches flag.

cargo test --bench <BENCHMARK> won't even invoke the lib tests' binary. There's no argument it will accept that will run anything in the src dir.

--bench is for running things in the benches directory. cargo test --lib will run the lib, or --bin can be used for binaries.

That kind of introspection is what cargo test --bench <BENCHMARK> is already doing.

Ah, I misread what you wrote. I thought it said #[bench] not [[bench]].

cargo's --benches flag isn't very useful.

It depends on what you are trying to accomplish. cargo test --benches is an unusual thing to do, but not unreasonable. If you just want to test everything, including benchmarks (as quick checks), cargo test --all-targets would be the preferred command. If you want to run benchmarks, cargo bench is usually what you want. I'm still not sure I understand what would be the use case for cargo test --benches.

I hear what you are saying, though, and this is useful to work through. Others have pointed out confusion with --tests also including libs/bins by default, and there is no way to say "run all integration tests only". I think it can be improved. But it needs to be distilled to a use case ("I want to verify all my tests compile" — use cargo check --tests). Often it's already possible to do it, sometimes not. For example, I think adding some kind of glob matching could be useful, and could allow one to select just [[bench]] or [[test]] targets. I think the current behavior for --benches makes sense considering that it builds everything with bench=true set, and that it is the mirror to --tests which does everything with test=true.

It might also be helpful to distinguish between a target and a mode. [lib] is a target which can be built in different modes (library, test, bench). --bench is only for selecting [[bench]] targets, it is not related to the mode. The command (test, build, bench) will affect the mode.

@najamelan
Copy link

najamelan commented Jan 18, 2019

Sorry to muddle the waters, but on cargo 1.33.0-nightly (2b4a5f1f0 2019-01-12), cargo bench runs unit tests.

@ehuss
Copy link
Contributor

ehuss commented Jan 19, 2019

@najamelan Can you show a sample project? It should say ignored when you run cargo bench for anything marked #[test].

@najamelan
Copy link

najamelan commented Jan 19, 2019 via email

@ehuss
Copy link
Contributor

ehuss commented Jan 19, 2019

There may not be a specific reason why. That is part of libtest. I would suggest opening an issue on rust-lang/rust and discuss possible improvements there. The code for displaying that particular message is here. Tests are force-ignored here, and then benchmarks are run later below. I don't know why it does it that way.

asomers added a commit to bfffs/bfffs that referenced this issue Sep 20, 2019
@ehuss ehuss added the A-cargo-targets Area: selection and definition of targets (lib, bins, examples, tests, benches) label Apr 6, 2020
@jrose-signal
Copy link
Contributor

jrose-signal commented May 24, 2023

Bumping this because lib tests are most definitely not ignored as of Rust 1.69.0. (We're running our tests and benches separately because we're passing libtest harness options to our regular tests.) You can see this by checking out https://github.com/signalapp/libsignal/tree/v0.25.0 and running cargo +stable test --benches -p libsignal-protocol; the benches run as tests, but the test target is doctests lib tests are also run.

Sample output
running 50 tests
test crypto::test::aes_ctr_test ... ok
test crypto::test::aes_cbc_test ... ok
test fingerprint::test::fingerprint_encodings ... ok
test curve::tests::test_decode_size ... ok
test curve::curve25519::tests::test_agreement ... ok
test curve::curve25519::tests::test_signature ... ok
test curve::curve25519::tests::test_random_signatures ... ok
test fingerprint::test::fingerprint_mismatching_identifiers ... ok
test fingerprint::test::fingerprint_matching_identifiers ... ok
test identity_key::tests::test_identity_key_from ... ok
test identity_key::tests::test_serialize_identity_key_pair ... ok
test identity_key::tests::test_alternate_identity_signing ... ok
test incremental_mac::test::chunk_sizes ... ok
test fingerprint::test::fingerprint_mismatching_fingerprints ... ok
test incremental_mac::test::chunk_size_is_never_larger_than_data_size ... ok
test curve::curve25519::tests::test_random_agreements ... ok
test incremental_mac::test::simple_test ... ok
test incremental_mac::test::total_digest_size_is_never_too_big ... ok
test incremental_mac::test::validating_simple_test ... ok
test kem::tests::test_kyber1024_kem ... ok
test kem::tests::test_kyber1024_keypair ... ok
test kem::tests::test_kyber768_keypair ... ok
test kem::tests::test_raw_kem ... ok
test kem::tests::test_serialize ... ok
test protocol::tests::test_decryption_error_message ... ok
test protocol::tests::test_decryption_error_message_for_plaintext ... ok
test protocol::tests::test_pre_key_signal_message_serialize_deserialize ... ok
test protocol::tests::test_sender_key_message_serialize_deserialize ... ok
test protocol::tests::test_signal_message_serialize_deserialize ... ok
test ratchet::keys::tests::test_chain_key_derivation ... ok
test sealed_sender::sealed_sender_v1::test_agreement_and_authentication ... ok
test sealed_sender::sealed_sender_v2::test_agreement_and_authentication ... ok
test sealed_sender::test_lossless_round_trip ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::add_second_state ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::add_single_state ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::when_exceed_maximum_states_then_oldest_is_ejected ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::when_second_state_with_different_public_key_but_same_chain_id_added_then_it_gets_replaced ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::when_second_state_with_same_public_key_and_chain_id_added_then_it_becomes_the_most_recent ... ok
test sender_keys::sender_key_record_add_sender_key_state_tests::when_second_state_with_same_public_key_and_chain_id_added_then_it_keeps_first_data ... ok
test incremental_mac::test::final_result_should_be_equal_to_non_incremental_hmac ... ok
test utils::tests::test_ct_is_eq ... ok
test utils::tests::test_ct_is_lt ... ok
test utils::tests::test_ct_is_zero ... ok
test incremental_mac::test::produce_and_validate ... ok
test utils::tests::test_constant_time_cmp ... ok
test incremental_mac::test::incremental_macs_are_valid ... ok
test fingerprint::test::fingerprint_mismatching_versions ... ok
test fingerprint::test::fingerprint_test_v1 ... ok
test fingerprint::test::fingerprint_test_v2 ... ok
test curve::tests::test_large_signatures ... ok

test result: ok. 50 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 1.03s

     Running benches/curve.rs (target/debug/deps/curve-bf6dde112796adde)
Testing generation
Success

Testing key agreement
Success

Testing generate signature
Success

Testing verify signature
Success

     Running benches/kem.rs (target/debug/deps/kem-2bf7b56f6bb80a46)
Testing Kyber768_generate
Success

Testing Kyber768_encapsulate
Success

Testing Kyber768_decapsulate
Success

Testing Kyber1024_generate
Success

Testing Kyber1024_encapsulate
Success

Testing Kyber1024_decapsulate
Success

     Running benches/ratchet.rs (target/debug/deps/ratchet-ecf669b5877f733f)
Testing ratchet/ratchet 100
Success
Testing ratchet/ratchet 1000
Success

     Running benches/sealed_sender.rs (target/debug/deps/sealed_sender-3e1540bb219b7e13)
Testing v1/encrypt
Success

Testing v1/decrypt
Success

Testing v2/encrypt
Success

Testing v2/decrypt
Success

Testing v2/encrypt/multi-recipient/2
Success
Testing v2/encrypt/multi-recipient/5
Success
Testing v2/encrypt/multi-recipient/10
Success

Testing v2/encrypt/multi-device/2
Success
Testing v2/encrypt/multi-device/5
Success
Testing v2/encrypt/multi-device/10
Success

     Running benches/session.rs (target/debug/deps/session-2db415533238f5ab)
Testing session decrypt first message
Success

Testing session encrypt
Success

Testing session decrypt
Success

Testing session decrypt with archived state
Success

Testing session decrypt using previous state
Success

Testing session encrypt+decrypt 1 way
Success

Testing session encrypt+decrypt ping pong
Success

@Marcono1234
Copy link
Contributor

Marcono1234 commented Sep 23, 2023

(as reply to #6454 (comment))

If you just want to test everything, including benchmarks (as quick checks), cargo test --all-targets would be the preferred command. If you want to run benchmarks, cargo bench is usually what you want. I'm still not sure I understand what would be the use case for cargo test --benches.

Except that unfortunately --all-targets does not include doc tests at the moment (#6669). That is why in my use case I wanted to use cargo test and cargo test --benches separately and was surprised that this re-runs regular tests which had already been run by cargo test before.
(A workaround might be #6669 (comment); haven't tested that yet)

Another use case might be for CI setups where you want to split the execution of the different test targets into separate steps so it is easier to see which of them failed.
(Not sure though if this is a bad idea and if it is more efficient to let Cargo build and run all of them at once.)

Also the workaround suggested in #6454 (comment) (... -- --bench) does not seem to work for me with Criterion; it seems to run them as regular benchmarks (performing large number of iterations) instead of as regular tests.

@epage
Copy link
Contributor

epage commented Oct 30, 2023

Considering that --benches is intended as a build-target selection flag and the current behavior is that build-targets are additive to the lib, this is working as expected. The only thing I can think of to change would be to not assume that the lib target should run when specifying a build-targets flag but we are likely stuck with that due to compatibility.

As for doctests, the help output has improved over time to clarify that those aren't normal build targets.

With all of the prior documentation improvements, I'm not seeing anything more we can do with this issue. I'm going to propose to the cargo team that we close it.

Bumping this because lib tests are most definitely not ignored as of Rust 1.69.0.

I believe that sub-thread was referring to having a #[test] inside of your bench. I just verified and these do get ignored.

@epage epage added the S-propose-close Status: A team member has nominated this for closing, pending further input from the team label Oct 30, 2023
@jrose-signal
Copy link
Contributor

jrose-signal commented Oct 30, 2023

Even if this is working as intended (by the Cargo maintainers), could we have a --skip-lib-tests option to let it work as desired? (by the commenters on this issue)

@weihanglo
Copy link
Member

weihanglo commented Oct 30, 2023

This is definitely a documentation issue, a similar one like #10936. I am not sure if the doc is great enough to close this issue. Should we keep one of them?

BTW the new testing-devex team might want to take care of the general confusion of test organization.

@epage
Copy link
Contributor

epage commented Oct 30, 2023

Ah, if we have #10936 for the running of libs generally, then I would recommend we close in favor of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cargo-targets Area: selection and definition of targets (lib, bins, examples, tests, benches) Command-test S-propose-close Status: A team member has nominated this for closing, pending further input from the team
Projects
Development

No branches or pull requests

7 participants