Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(transaction): Idempotent callbacks (immediate runs) #2453

Merged
merged 6 commits into from
Apr 3, 2024

Conversation

dranikpg
Copy link
Contributor

@dranikpg dranikpg commented Jan 22, 2024

This PR generalizes the machanism of running transaction callbacks during scheduling, removing the need for specialized ScheduleUniqueShard/RunQuickie. Instead, transactions can be run now during ScheduleInShard - called "immediate" runs - if the transaction is concluding and either only a single shard is active or the operation can be safely repeated if scheduling failed (idempotent commands, like MGET).

Updates transaction stats to mirror the new changes more closely.

@dranikpg
Copy link
Contributor Author

I want to rebase it on #2455 because currently the code is hard to deal with

@dranikpg dranikpg force-pushed the tx-opt-inline-ooo branch 3 times, most recently from 327d63d to 4e9338a Compare February 20, 2024 06:37
@dranikpg
Copy link
Contributor Author

I removed ScheduleSingleHop 😈

@dranikpg dranikpg force-pushed the tx-opt-inline-ooo branch 6 times, most recently from d02a6d4 to 307a9bb Compare March 21, 2024 16:27
Comment on lines +719 to +714
// This is a contention point for all threads - avoid using it unless necessary.
// Single shard operations can assign txid later if the immediate run failed.
if (unique_shard_cnt_ > 1)
txid_ = op_seq.fetch_add(1, memory_order_relaxed);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning txid unconditionally was costing about ~5% on 8 threads (and will do more on more threads)

The results are (8 threads):

basic, pipeline 20

new: 1.77M
old: 1.76M

mget 3 keys, pipeline 20

new: 950k
old: 670k, txq warnings ~100 len

so its +- the same for single key and +40% for mget 😎

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@dranikpg dranikpg force-pushed the tx-opt-inline-ooo branch from f574338 to 3a917b0 Compare March 21, 2024 17:23
@dranikpg dranikpg changed the title WIP feat(transaction): Idempotent callbacks (immediate runs) feat(transaction): Idempotent callbacks (immediate runs) Mar 21, 2024
@dranikpg dranikpg requested a review from romange March 21, 2024 17:42
src/server/transaction.cc Outdated Show resolved Hide resolved
src/server/transaction.cc Outdated Show resolved Hide resolved
shard->stats().tx_immediate_total++;

RunCallback(shard);
if (coordinator_state_ & COORD_CONCLUDING) // could've been cleared by AVOID_CONCLUDING
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok that you access coordinator_state_ from the shard thread?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just reading is fine. I write only during "avoid concluding" optimization that is issued only for single shard operations (checking)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not fully understand your comment but please add a comment to the code because it's an exception to the rule about coordinator_state_ access.

@romange
Copy link
Collaborator

romange commented Mar 21, 2024

please add a extensive commit description for the change. It's pretty fundamental. You change a two year old tx design, so worth at good explanation. You can squash commits too

@dranikpg dranikpg force-pushed the tx-opt-inline-ooo branch 2 times, most recently from c664213 to ba591c9 Compare March 22, 2024 13:16
Signed-off-by: Vladislav Oleshko <[email protected]>

chore: Replace Schedule and ScheduleSingleHop with no-op

chore: fix blocking concluding

chore: Allow inlined runs and optimize txid

chore: Move more stuff into RunCallback

Signed-off-by: Vladislav Oleshko <[email protected]>

chore: Enable metrics for new tx
Signed-off-by: Vladislav Oleshko <[email protected]>
@dranikpg dranikpg requested a review from romange March 23, 2024 07:04
@dranikpg dranikpg marked this pull request as ready for review March 23, 2024 07:10
Copy link
Collaborator

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

shard_set->RunBriefInParallel(std::move(cb), is_active);

run_barrier_.Start(unique_shard_cnt_);
if (CanRunInlined()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not?

if (CanRunInlined()) {
  CHECK(ScheduleInShard(EngineShard::tlocal(), can_run_immediately));
} else {
  auto cb = ..
  run_barrier_.Start(unique_shard_cnt_);
  IterateActiveShards([cb](const auto& sd, ShardId i) { shard_set->Add(i, cb); });
  run_barrier_.Wait();
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, good idea... I just thought you don't like such checks 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It fails our DCHECK logic for the barrier being active, I changed it a little to make it fit

src/server/transaction.cc Outdated Show resolved Hide resolved
src/server/transaction.cc Outdated Show resolved Hide resolved
shard->stats().tx_immediate_total++;

RunCallback(shard);
if (coordinator_state_ & COORD_CONCLUDING) // could've been cleared by AVOID_CONCLUDING
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not fully understand your comment but please add a comment to the code because it's an exception to the rule about coordinator_state_ access.

@dranikpg dranikpg requested a review from romange March 24, 2024 18:33
romange
romange previously approved these changes Mar 24, 2024
Copy link
Collaborator

@romange romange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Please wait until we release 1.16

@dranikpg dranikpg merged commit fbc55bb into dragonflydb:main Apr 3, 2024
7 checks passed
szinn referenced this pull request in szinn/k8s-homelab Apr 16, 2024
…nfly ( v1.16.1 → v1.17.0 ) (#3473)

This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
|
[docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly)
| minor | `v1.16.1` -> `v1.17.0` |

---

### Release Notes

<details>
<summary>dragonflydb/dragonfly
(docker.dragonflydb.io/dragonflydb/dragonfly)</summary>

###
[`v1.17.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.17.0)

[Compare
Source](https://togithub.com/dragonflydb/dragonfly/compare/v1.16.1...v1.17.0)

##### Dragonfly v1.17.0

Some prominent changes include:

- Improved performance for MGET operations
([#&#8203;2453](https://togithub.com/dragonflydb/dragonfly/issues/2453))
- Fix argument parsing in json.objkeys
([#&#8203;2872](https://togithub.com/dragonflydb/dragonfly/issues/2872))
- Fix ipv6 support for replication
([#&#8203;2889](https://togithub.com/dragonflydb/dragonfly/issues/2889))
- Support serialisation of bloom filters - saving to and loading from
snapshots
([#&#8203;2846](https://togithub.com/dragonflydb/dragonfly/issues/2846))
- Support of HLL PFADD
([#&#8203;2761](https://togithub.com/dragonflydb/dragonfly/issues/2761))
- Support bullmq workloads that do not have `{}` hashtags in their queue
names
([#&#8203;2890](https://togithub.com/dragonflydb/dragonfly/issues/2890))

##### What's Changed

- fix:
[#&#8203;2745](https://togithub.com/dragonflydb/dragonfly/issues/2745)
don't start migration process again after apply the same the same config
is applied by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2822](https://togithub.com/dragonflydb/dragonfly/pull/2822)
- feat(transaction): Idempotent callbacks (immediate runs) by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2453](https://togithub.com/dragonflydb/dragonfly/pull/2453)
- refactor(cluster): replace sync_id with node_id for slot migration
[#&#8203;2835](https://togithub.com/dragonflydb/dragonfly/issues/2835)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2838](https://togithub.com/dragonflydb/dragonfly/pull/2838)
- feat(tiering): Simple OpManager by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2781](https://togithub.com/dragonflydb/dragonfly/pull/2781)
- chore: implement path mutation for JsonFlat by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2805](https://togithub.com/dragonflydb/dragonfly/pull/2805)
- feat(cluster): add migration removing by config
[#&#8203;2835](https://togithub.com/dragonflydb/dragonfly/issues/2835)
by [@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2844](https://togithub.com/dragonflydb/dragonfly/pull/2844)
- chore: expose direct API on Bloom objects by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2845](https://togithub.com/dragonflydb/dragonfly/pull/2845)
- chore: generalize CompactObject::AllocateMR by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2847](https://togithub.com/dragonflydb/dragonfly/pull/2847)
- feat(tiering): Simplest small bins by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2810](https://togithub.com/dragonflydb/dragonfly/pull/2810)
- refactor: clean cluster slot migration code by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2848](https://togithub.com/dragonflydb/dragonfly/pull/2848)
- fix(tests): Fix numsub test by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2852](https://togithub.com/dragonflydb/dragonfly/pull/2852)
- fix: healthcheck for docker containers by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2853](https://togithub.com/dragonflydb/dragonfly/pull/2853)
- fix: possible crash in tls code by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2854](https://togithub.com/dragonflydb/dragonfly/pull/2854)
- fix(server): Do not block admin-port commands by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[https://github.com/dragonflydb/dragonfly/pull/2842](https://togithub.com/dragonflydb/dragonfly/pull/2842)
- fix(pytest): make pytests fail if server crash on shutdown by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2827](https://togithub.com/dragonflydb/dragonfly/pull/2827)
- feat(server): add prints on takeover timeout by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2856](https://togithub.com/dragonflydb/dragonfly/pull/2856)
- fix(pytest): dont check process return code on kill by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2862](https://togithub.com/dragonflydb/dragonfly/pull/2862)
- fix: authorize the http connection to call commands by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2863](https://togithub.com/dragonflydb/dragonfly/pull/2863)
- feat(cluster): Send number of keys for incoming and outgoing
migrations. by [@&#8203;chakaz](https://togithub.com/chakaz) in
[https://github.com/dragonflydb/dragonfly/pull/2858](https://togithub.com/dragonflydb/dragonfly/pull/2858)
- feat(tiering): TieredStorageV2 by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2849](https://togithub.com/dragonflydb/dragonfly/pull/2849)
- bug(server): set connection flags block/pause flag on all blocking
commands by [@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2816](https://togithub.com/dragonflydb/dragonfly/pull/2816)
- chore: serialize SBF by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2846](https://togithub.com/dragonflydb/dragonfly/pull/2846)
- fix: test_replicaof_reject_on_load crash on stop by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2818](https://togithub.com/dragonflydb/dragonfly/pull/2818)
- feat(dbslice): Add self-laundering iterator in `DbSlice` by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[https://github.com/dragonflydb/dragonfly/pull/2815](https://togithub.com/dragonflydb/dragonfly/pull/2815)
- chore: License update by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2767](https://togithub.com/dragonflydb/dragonfly/pull/2767)
- fix(acl): incompatibilities with acl load by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2867](https://togithub.com/dragonflydb/dragonfly/pull/2867)
- fix(json): make path optional in json.objkeys by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2872](https://togithub.com/dragonflydb/dragonfly/pull/2872)
- fix: return wrong type errors for SET...GET command by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2874](https://togithub.com/dragonflydb/dragonfly/pull/2874)
- fix(redis replication): remove partial sync flow ,not supported yet by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2865](https://togithub.com/dragonflydb/dragonfly/pull/2865)
- chore: limit traffic logger only to the main interface by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2877](https://togithub.com/dragonflydb/dragonfly/pull/2877)
- chore: relax repltakeover constraints to only exclude write commands
by [@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2873](https://togithub.com/dragonflydb/dragonfly/pull/2873)
- chore(replayer): Roll back to go1.18 by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2881](https://togithub.com/dragonflydb/dragonfly/pull/2881)
- fix: brpoplpush single shard to wake up blocked transactions by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2875](https://togithub.com/dragonflydb/dragonfly/pull/2875)
- chore: LockTable tracks fingerprints of keys by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2839](https://togithub.com/dragonflydb/dragonfly/pull/2839)
- chore: reject TLS handshake when our listener is plain TCP by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2882](https://togithub.com/dragonflydb/dragonfly/pull/2882)
- Add support for Sparse HLL PFADD by
[@&#8203;azuredream](https://togithub.com/azuredream) in
[https://github.com/dragonflydb/dragonfly/pull/2761](https://togithub.com/dragonflydb/dragonfly/pull/2761)
- feat server: bring visibility to script errors by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2879](https://togithub.com/dragonflydb/dragonfly/pull/2879)
- chore: clean up REPLTAKEOVER flow by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2887](https://togithub.com/dragonflydb/dragonfly/pull/2887)
- chore(tiering): Move files and move kb literal to common by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2868](https://togithub.com/dragonflydb/dragonfly/pull/2868)
- chore(interpreter): Support object replies by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2885](https://togithub.com/dragonflydb/dragonfly/pull/2885)
- fix(ci/helm): Stick to v0.73.0 version of prom operator by
[@&#8203;Pothulapati](https://togithub.com/Pothulapati) in
[https://github.com/dragonflydb/dragonfly/pull/2893](https://togithub.com/dragonflydb/dragonfly/pull/2893)
- fix(acl): authentication with UDS socket by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2895](https://togithub.com/dragonflydb/dragonfly/pull/2895)
- feat(cluster): add repeated ACK if an error is happened by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2892](https://togithub.com/dragonflydb/dragonfly/pull/2892)
- chore(blocking): Remove faulty DCHECK by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2898](https://togithub.com/dragonflydb/dragonfly/pull/2898)
- chore: add a clear link on how to build dragonfly from source by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2884](https://togithub.com/dragonflydb/dragonfly/pull/2884)
- feat(server): Allow configuration of hashtag extraction by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[https://github.com/dragonflydb/dragonfly/pull/2890](https://togithub.com/dragonflydb/dragonfly/pull/2890)
- fix: fix build under macos by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2901](https://togithub.com/dragonflydb/dragonfly/pull/2901)
- fix(cluster_replication): replicate redis cluster node bug fix by
[@&#8203;adiholden](https://togithub.com/adiholden) in
[https://github.com/dragonflydb/dragonfly/pull/2876](https://togithub.com/dragonflydb/dragonfly/pull/2876)
- fix(acl): skip http and add check on connection traversals by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2883](https://togithub.com/dragonflydb/dragonfly/pull/2883)
- fix(zset): Better memory consumption calculation by
[@&#8203;chakaz](https://togithub.com/chakaz) in
[https://github.com/dragonflydb/dragonfly/pull/2900](https://togithub.com/dragonflydb/dragonfly/pull/2900)
- fix: fix ld for num converting by
[@&#8203;BorysTheDev](https://togithub.com/BorysTheDev) in
[https://github.com/dragonflydb/dragonfly/pull/2902](https://togithub.com/dragonflydb/dragonfly/pull/2902)
- chore: add help string for memory_fiberstack_vms_bytes by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2903](https://togithub.com/dragonflydb/dragonfly/pull/2903)
- fix(sanitizers): false positive fail on multi_test::Eval by
[@&#8203;kostasrim](https://togithub.com/kostasrim) in
[https://github.com/dragonflydb/dragonfly/pull/2896](https://togithub.com/dragonflydb/dragonfly/pull/2896)
- chore: pull helio and add ipv6 replication test by
[@&#8203;dranikpg](https://togithub.com/dranikpg) in
[https://github.com/dragonflydb/dragonfly/pull/2889](https://togithub.com/dragonflydb/dragonfly/pull/2889)
- chore: add ipv6 support for native linux release by
[@&#8203;romange](https://togithub.com/romange) in
[https://github.com/dragonflydb/dragonfly/pull/2908](https://togithub.com/dragonflydb/dragonfly/pull/2908)

##### Huge thanks to all the contributors! ❤️

**Full Changelog**:
dragonflydb/dragonfly@v1.16.0...v1.17.0

</details>

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zMDEuNSIsInVwZGF0ZWRJblZlciI6IjM3LjMwMS41IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6WyJyZW5vdmF0ZS9jb250YWluZXIiLCJ0eXBlL21pbm9yIl19-->

Co-authored-by: repo-jeeves[bot] <106431701+repo-jeeves[bot]@users.noreply.github.com>
@dranikpg dranikpg deleted the tx-opt-inline-ooo branch May 25, 2024 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants