
fix: slow regression tests tests #4117

Merged
merged 3 commits into main from kpr5 on Nov 13, 2024

Conversation

@kostasrim (Contributor) commented Nov 12, 2024

We have a few very slow tests on the CI. Specifically:

241.30s call     dragonfly/replication_test.py::test_replication_all[df_factory0-mode0-8-t_replicas8-seeder_config8-50000-True]
240.80s call     dragonfly/replication_test.py::test_replication_all[df_factory0-mode0-8-t_replicas7-seeder_config7-50000-False]
239.92s call     dragonfly/replication_test.py::test_replication_all[df_factory0-mode1-8-t_replicas7-seeder_config7-50000-False]
236.87s call     dragonfly/replication_test.py::test_replication_all[df_factory0-mode1-8-t_replicas8-seeder_config8-50000-True]
178.18s call     dragonfly/snapshot_test.py::test_big_value_serialization_memory_limit[HSET-df_factory0]
175.97s call     dragonfly/snapshot_test.py::test_big_value_serialization_memory_limit[SADD-df_factory0]
175.79s call     dragonfly/snapshot_test.py::test_big_value_serialization_memory_limit[ZSET-df_factory0]
172.35s call     dragonfly/snapshot_test.py::test_big_value_serialization_memory_limit[LIST-df_factory0]
60.21s call     dragonfly/connection_test.py::test_pubsub_busy_connections[df_factory0]

This PR makes two changes:

First, it refactors test_big_value_serialization. The test was incorrect; in fact, for some reason it did not even run with the big value serialization flag set. It also used execute_command in a loop, which took a substantial amount of time. On my machine, the changes reduce the total running time from roughly 4 × 170 seconds to a staggering 1 minute and 10 seconds for all 4 variants.
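
For illustration, here is a minimal sketch of the pattern behind the refactor (hypothetical names, not the PR's exact code): the old test issued one execute_command round-trip per element, while a single server-side DEBUG POPULATE call (the Dragonfly command quoted later in this thread) builds the same entry in one round-trip.

import asyncio

import redis.asyncio as aioredis


async def populate_slow(client: aioredis.Redis, elements: int, element_size: int):
    # Old pattern: one network round-trip per element.
    for i in range(elements):
        await client.execute_command("HSET", "key:1", f"f{i}", "x" * element_size)


async def populate_fast(client: aioredis.Redis, elements: int, element_size: int):
    # New pattern: one DEBUG POPULATE call generates the whole entry
    # server-side (count, key prefix, value size, TYPE, RAND, ELEMENTS).
    await client.execute_command(
        "DEBUG", "POPULATE", 1, "key", element_size,
        "TYPE", "hash", "RAND", "ELEMENTS", elements,
    )


async def main():
    client = aioredis.Redis()  # assumes a local Dragonfly on the default port
    await populate_fast(client, elements=1000, element_size=1_000_000)
    await client.aclose()  # redis-py 5.x


asyncio.run(main())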

Second, notice that we don't really need to test big value serialization in a stress test. It's redundant: when we set serialization_max_chunk_size to a small number (4096, etc.), it has the same effect regardless of the stress load, as the toy sketch below illustrates. I removed that one test case, saving roughly (235 × 2) seconds.
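
To make the threshold argument concrete, here is a toy sketch (plain Python, not Dragonfly code) of why a small serialization_max_chunk_size exercises the chunked, big-value path with any value larger than the threshold, no stress-sized dataset required:

def chunked_serialize(value: bytes, max_chunk_size: int) -> list[bytes]:
    # Values at or below the threshold are emitted whole; anything larger
    # takes the chunked path, even with a modest dataset.
    if len(value) <= max_chunk_size:
        return [value]
    return [value[i : i + max_chunk_size] for i in range(0, len(value), max_chunk_size)]


# A 10 KB value with a 4096-byte threshold is already split into 3 chunks.
assert len(chunked_serialize(b"x" * 10_000, 4096)) == 3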

  • refactor test_big_value_serialization
  • remove unneeded replication tests for big value serialization

@kostasrim (Contributor, Author) commented:

Impact on a full run: https://github.com/dragonflydb/dragonfly/actions/runs/11796502461

(I will post the results once it completes)

@kostasrim kostasrim self-assigned this Nov 12, 2024
Signed-off-by: kostas <[email protected]>
@kostasrim (Contributor, Author) commented:

x86 debug: 28 minutes, now takes 22
arm debug: 42 minutes, now takes 34
x86 release: 28 minutes, now takes 19
arm release: 46 minutes, now takes 28

Savings on a full run: 24 minutes (because the arm tests run sequentially)

https://github.com/dragonflydb/dragonfly/actions/runs/11797723885/job/32862315798
vs
https://github.com/dragonflydb/dragonfly/actions/runs/11793906726/job/32850430451

@kostasrim kostasrim requested a review from adiholden November 12, 2024 15:51
@kostasrim (Contributor, Author) commented:

At some point I will take care of:

180.20s call dragonfly/replication_test.py::test_replicaof_reject_on_load[df_seeder_factory0-df_factory0]

It's the next low-hanging fruit.

await asyncio.sleep(0.01)

checker = asyncio.create_task(check_memory_usage(instance))
await client.execute_command(
@adiholden (Collaborator) commented:
I would add a comment here explaining what it exactly does, i.e. add 1 db entry of the given type, with elements_num elements, each of size element_size.

Now my question is: why do we end up with more than 2 GB RSS if we have one entry of 1000 elements, each 1 MB in size?

@kostasrim (Contributor, Author) commented Nov 13, 2024:

RSS grows during DEBUG POPULATE. For example, run:

debug populate 1 prefix 1000000 TYPE hash RAND ELEMENTS 1000

This creates a hash table with 1 GB total size. Now run INFO MEMORY: RSS is 2062901248 (~2 GB, roughly doubled).
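
A minimal reproduction sketch of that observation (assuming a local Dragonfly instance on the default port; exact numbers vary by machine):

import asyncio

import redis.asyncio as aioredis


async def main():
    client = aioredis.Redis()
    # One hash entry, 1000 elements, each ~1 MB of random data => ~1 GB payload.
    await client.execute_command(
        "DEBUG", "POPULATE", 1, "prefix", 1_000_000,
        "TYPE", "hash", "RAND", "ELEMENTS", 1000,
    )
    info = await client.info("memory")
    print(info["used_memory_rss"])  # observed above: 2062901248, ~2x the payload
    await client.aclose()  # redis-py 5.x


asyncio.run(main())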

@adiholden (Collaborator) commented:

This does not answer the question of why, but we can continue with this PR. Please create another GitHub ticket so we can follow up and investigate why we have this overhead.

@kostasrim (Contributor, Author) commented:

Ah, you mean what is causing the RSS spike. I don't know; it was just an observation. I created issue #4124.

@@ -566,12 +566,12 @@ async def test_tiered_entries_throttle(async_client: aioredis.Redis):
assert await StaticSeeder.capture(async_client) == start_capture


@dfly_args({"proactor_threads": 1})
@dfly_args({"serialization_max_chunk_size": 4096, "proactor_threads": 1})
@pytest.mark.parametrize(
"query",
@adiholden (Collaborator) commented:

Can you please rename query to container_type?
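
For illustration, the rename might look like this (a hypothetical sketch, not the PR's code; the values mirror the container types in the slow-test list above):

import pytest


@pytest.mark.parametrize("container_type", ["HSET", "SADD", "ZSET", "LIST"])
def test_big_value_serialization_memory_limit(container_type):
    # The parameter now names what it actually is: a container type.
    assert container_type in ("HSET", "SADD", "ZSET", "LIST")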

ten_mb = 10_000_000
one_gb = 1_000_000_000
elements = 1000
one_mb = 1_000_000
@adiholden (Collaborator) commented:

Rename one_mb to element_size.


info = await client.info("ALL")
# RSS doubles because of DEBUG POPULATE
@adiholden (Collaborator) commented:

What do you mean? Why does it double because of DEBUG POPULATE?

@kostasrim (Contributor, Author) commented:

see my other comment

@kostasrim kostasrim requested a review from adiholden November 13, 2024 07:59
@kostasrim kostasrim merged commit 91c236a into main Nov 13, 2024
12 checks passed
@kostasrim kostasrim deleted the kpr5 branch November 13, 2024 08:32
@kostasrim kostasrim changed the title from "fix: slow CI tests" to "fix: slow regression tests tests" Nov 13, 2024