feat(transaction): Idempotent callbacks (immediate runs) #2453
I want to rebase it on #2455 because currently the code is hard to deal with
327d63d to 4e9338a
I removed ScheduleSingleHop 😈
d02a6d4 to 307a9bb
// This is a contention point for all threads - avoid using it unless necessary.
// Single shard operations can assign txid later if the immediate run failed.
if (unique_shard_cnt_ > 1)
  txid_ = op_seq.fetch_add(1, memory_order_relaxed);
Assigning txid unconditionally was costing about 5% on 8 threads (and will cost more with more threads).
The results are (8 threads):
basic, pipeline 20
new: 1.77M
old: 1.76M
mget 3 keys, pipeline 20
new: 950k
old: 670k, txq warnings ~100 len
So it's roughly the same for single key and about +40% for mget 😎
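For readers following the thread, the lazy txid assignment can be sketched as below. This is a minimal, hypothetical model and not the Dragonfly code: `TxSketch`, `AssignTxidOnSchedule` and `EnsureTxid` are invented names, and treating `0` as "no id yet" is an assumption. Only `op_seq`, the `unique_shard_cnt_ > 1` condition and the relaxed `fetch_add` come from the diff.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for the shared sequence counter (`op_seq` in the diff).
std::atomic<uint64_t> op_seq{1};

// Hypothetical, simplified transaction with only the fields this sketch needs.
struct TxSketch {
  unsigned unique_shard_cnt = 0;
  uint64_t txid = 0;  // Assumption: 0 means "no id assigned yet".

  // Mirrors the condition in the diff: only multi-shard transactions touch
  // the shared counter up front.
  void AssignTxidOnSchedule() {
    assert(unique_shard_cnt > 0);
    if (unique_shard_cnt > 1)
      txid = op_seq.fetch_add(1, std::memory_order_relaxed);
  }

  // Assumed fallback: a single-shard transaction whose immediate run failed
  // takes its id late, right before it has to be queued.
  void EnsureTxid() {
    if (txid == 0)
      txid = op_seq.fetch_add(1, std::memory_order_relaxed);
  }
};
```

In this model a single-shard transaction that succeeds on its immediate run never increments the counter, which is where the saving on the contended atomic would come from.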
nice!
f574338 to 3a917b0
src/server/transaction.cc
Outdated
shard->stats().tx_immediate_total++;

RunCallback(shard);
if (coordinator_state_ & COORD_CONCLUDING)  // could've been cleared by AVOID_CONCLUDING
Is it OK that you access coordinator_state_ from the shard thread?
Yes, just reading is fine. I write only during the "avoid concluding" optimization, which is issued only for single-shard operations (checking).
I did not fully understand your comment but please add a comment to the code because it's an exception to the rule about coordinator_state_ access.
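A minimal sketch of the check under discussion, assuming the state is a plain bitmask and that the callback itself may clear the concluding bit. The flag values, `ImmediateRunSketch`, `AvoidConcluding()` and `RunImmediately()` are hypothetical; only `COORD_CONCLUDING` and the "check after the callback" order come from the diff. The sketch is single-threaded and does not model the cross-thread access the question is about.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical flag values; only the name COORD_CONCLUDING is from the diff.
enum CoordState : uint32_t {
  COORD_SCHED = 1u << 0,
  COORD_CONCLUDING = 1u << 1,
};

struct ImmediateRunSketch {
  uint32_t coordinator_state = 0;
  bool concluded = false;

  // Hypothetical helper for the "avoid concluding" optimization: the callback
  // asks to keep the transaction open for another hop.
  void AvoidConcluding() { coordinator_state &= ~COORD_CONCLUDING; }

  // Runs the callback, then re-reads the flag because the callback may have
  // cleared it in the meantime.
  void RunImmediately(const std::function<void(ImmediateRunSketch&)>& cb) {
    assert(cb != nullptr);
    cb(*this);
    if (coordinator_state & COORD_CONCLUDING)
      concluded = true;
  }
};
```

The point of reading the flag after the callback, and not caching it before, is that the cached value would be stale whenever the callback opted out of concluding.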
Please add an extensive commit description for the change. It's pretty fundamental: you are changing a two-year-old tx design, so it's worth a good explanation. You can squash the commits too.
c664213 to ba591c9
Signed-off-by: Vladislav Oleshko <[email protected]>
chore: Replace Schedule and ScheduleSingleHop with no-op
chore: fix blocking concluding
chore: Allow inlined runs and optimize txid
chore: Move more stuff into RunCallback
Signed-off-by: Vladislav Oleshko <[email protected]>
chore: Enable metrics for new tx
Signed-off-by: Vladislav Oleshko <[email protected]>
Great work!
shard_set->RunBriefInParallel(std::move(cb), is_active);

run_barrier_.Start(unique_shard_cnt_);
if (CanRunInlined()) {
Why not?
if (CanRunInlined()) {
  CHECK(ScheduleInShard(EngineShard::tlocal(), can_run_immediately));
} else {
  auto cb = ..
  run_barrier_.Start(unique_shard_cnt_);
  IterateActiveShards([cb](const auto& sd, ShardId i) { shard_set->Add(i, cb); });
  run_barrier_.Wait();
}
True, good idea... I just thought you don't like such checks 😄
It fails our DCHECK logic for the barrier being active, so I changed it a little to make it fit.
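The shape of the suggested branch (run inline on the caller's shard, otherwise dispatch to every active shard and wait) can be illustrated with standard threads. This is a hedged sketch, not the Dragonfly implementation: `ScheduleSketch` and its parameters are hypothetical, one `std::thread` per shard stands in for `shard_set->Add`, and joining the threads stands in for `run_barrier_.Wait()`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical dispatcher. `schedule_in_shard(sid)` returns true if the
// transaction was scheduled (or ran) successfully in shard `sid`.
void ScheduleSketch(bool can_run_inlined, unsigned local_shard,
                    const std::vector<unsigned>& active_shards,
                    const std::function<bool(unsigned)>& schedule_in_shard) {
  if (can_run_inlined) {
    // The only active shard belongs to the calling thread: no hop, no barrier.
    bool ok = schedule_in_shard(local_shard);
    assert(ok);
    (void)ok;  // Keeps release builds (NDEBUG) free of unused warnings.
    return;
  }

  // Otherwise hop to every active shard and wait for all of them.
  // The per-shard result is ignored here; real code would have to handle
  // a failed scheduling attempt.
  std::vector<std::thread> workers;
  workers.reserve(active_shards.size());
  for (unsigned sid : active_shards)
    workers.emplace_back([&schedule_in_shard, sid] { schedule_in_shard(sid); });
  for (auto& w : workers)
    w.join();  // Plays the role of the barrier wait.
}
```

Note that the call is made outside of `assert` so that it still executes when assertions are compiled out, which matches the always-evaluated `CHECK` in the suggestion.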
lgtm. Please wait until we release 1.16
Signed-off-by: Vladislav <[email protected]>
…nfly ( v1.16.1 → v1.17.0 ) (#3473)
This PR contains the following updates:
| Package | Update | Change |
|---|---|---|
| [docker.dragonflydb.io/dragonflydb/dragonfly](https://togithub.com/dragonflydb/dragonfly) | minor | `v1.16.1` -> `v1.17.0` |
Release notes for [`v1.17.0`](https://togithub.com/dragonflydb/dragonfly/releases/tag/v1.17.0), entries that mention this PR:
- Improved performance for MGET operations ([#2453](https://togithub.com/dragonflydb/dragonfly/issues/2453))
- feat(transaction): Idempotent callbacks (immediate runs) by [@dranikpg](https://togithub.com/dranikpg) in [https://github.com/dragonflydb/dragonfly/pull/2453](https://togithub.com/dragonflydb/dragonfly/pull/2453)
This PR generalizes the mechanism of running transaction callbacks during scheduling, removing the need for the specialized ScheduleUniqueShard/RunQuickie. Instead, transactions can now be run during ScheduleInShard (called "immediate" runs) if the transaction is concluding and either only a single shard is active or the operation can be safely repeated if scheduling failed (idempotent commands, like MGET).
Updates transaction stats to mirror the new changes more closely.
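The eligibility rule from the description can be written down as a small predicate. This is only a sketch of the stated condition: `SchedView` and `CanRunImmediately` are hypothetical names, and the real code may check more state than these three fields.

```cpp
// Hypothetical, simplified view of a transaction at scheduling time.
struct SchedView {
  bool concluding;         // This hop is the transaction's last one.
  unsigned active_shards;  // Number of shards the transaction touches.
  bool idempotent;         // Callback is safe to repeat (e.g. MGET).
};

// Condition as stated in the description: the transaction must be concluding,
// and either a single shard is active or a repeated run is harmless.
constexpr bool CanRunImmediately(const SchedView& v) {
  return v.concluding && (v.active_shards == 1 || v.idempotent);
}

// A concluding single-key command qualifies regardless of idempotency.
static_assert(CanRunImmediately({true, 1, false}), "single shard, concluding");
// A concluding multi-shard MGET qualifies because it can be repeated.
static_assert(CanRunImmediately({true, 3, true}), "multi shard, idempotent");
// A multi-shard write does not: a failed schedule could not be undone.
static_assert(!CanRunImmediately({true, 3, false}), "multi shard, not idempotent");
```

The idempotency requirement matters only in the multi-shard case: if scheduling fails on one shard after the callback already ran on another, the whole attempt has to be repeated, so the callback must tolerate running twice.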