Possible aborted reads with process crashes #1072

Closed
aphyr opened this issue Jun 14, 2022 · 2 comments

aphyr commented Jun 14, 2022

Versions

  • etcd: 3.5.3
  • jetcd: 0.7.1
  • java: openjdk version "17.0.3" 2022-04-19 (Debian)

Describe the bug

I've got a Jepsen test case for etcd which appears to show something like an aborted read in response to process crashes in five-node clusters running on Debian stable. A jetcd client submits a transaction with a modified-revision compare clause on some number of keys, and a success branch which performs some writes. Occasionally etcd returns a TxnResponse which (per Wireshark) does not contain a succeeded field. When we ask jetcd for TxnResponse.isSucceeded(), it returns false; a reasonable client would assume that this transaction did not execute its success branch. However, the effects of the success branch are visible to later reads. If this occurred in an SQL database (and we treated succeeded for a txn with only a success branch as meaning commit/abort), I'd be inclined to call this an aborted read.

I've filed this in detail on the main etcd repo, but they've repeatedly informed me this must be a client issue--either something in jetcd or in the Jepsen test itself. I'd be delighted to find out this is the case, but based on the Wireshark dissection, I can't see how this could be the client's fault: at the wire level, I can't find any way to distinguish these "not-succeeded-but-actually-succeeded" transactions from "not-succeeded-and-actually-not-succeeded" ones. I was hoping you might be able to share some insight!
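
For readers unfamiliar with the wire format, here is a minimal sketch of why a missing succeeded field and succeeded = false look the same to any client. It assumes TxnResponse declares succeeded as a proto3 bool with field number 2 (as I understand etcd's rpc.proto to do); proto3 encoders omit scalar fields holding their default value, so false is normally not written at all. The class name TxnWireSketch and its helper methods are hypothetical and are not part of jetcd or protobuf-java.

```java
// Hypothetical helper, not jetcd code: scans a serialized proto3 message
// for a varint field and reports what a reader would conclude.
final class TxnWireSketch {
    // Assumption: succeeded is field 2 of TxnResponse, encoded as a varint.
    static final int SUCCEEDED_FIELD = 2;

    // Reads a base-128 varint at pos[0] and advances pos[0] past it.
    static long readVarint(byte[] buf, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns the encoded value of field 2, or null if it never appears.
    static Boolean succeededOnWire(byte[] msg) {
        int[] pos = {0};
        Boolean seen = null;
        while (pos[0] < msg.length) {
            long tag = readVarint(msg, pos);
            int field = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            switch (wireType) {
                case 0: // varint
                    long v = readVarint(msg, pos);
                    if (field == SUCCEEDED_FIELD) {
                        seen = (v != 0);
                    }
                    break;
                case 1: // fixed 64-bit, skipped
                    pos[0] += 8;
                    break;
                case 2: // length-delimited (e.g. header, responses), skipped
                    pos[0] += (int) readVarint(msg, pos);
                    break;
                case 5: // fixed 32-bit, skipped
                    pos[0] += 4;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
        }
        return seen;
    }

    // What a proto3 reader reports: an absent bool decodes as false.
    static boolean isSucceeded(byte[] msg) {
        Boolean onWire = succeededOnWire(msg);
        return onWire != null && onWire;
    }
}
```

Under these assumptions, a response whose bytes carry only a header yields succeededOnWire(...) == null and isSucceeded(...) == false, which is exactly what an ordinary failed compare looks like. The client therefore has nothing on the wire to tell the two cases apart.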

To Reproduce

Clone https://github.com/jepsen-io/etcd at a1bf380a1c09d62bf6bf2e7b97bd02a35902ed36, and run:

lein run test-all -w append --concurrency 2n --time-limit 1000 --rate 1000 --test-count 5 --nemesis kill

Expected behavior

I expect that transactions which return TxnResponse.isSucceeded() = false would not, in fact, appear to execute their success branches.
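
The expectation above can be sketched as a small history check. This is an illustration only, not the checker the Jepsen test actually uses: the Txn class and the abortedReads method are hypothetical, and the sketch assumes an append-style workload in which every appended value is unique per key, so a value seen by a read can be traced back to the one transaction that wrote it.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the expected property; not Jepsen's checker.
final class AbortedReadCheck {
    // One completed transaction as the client observed it.
    static final class Txn {
        final boolean succeeded;                 // TxnResponse.isSucceeded()
        final Map<String, Integer> appends;      // key -> value appended in the success branch
        final Map<String, List<Integer>> reads;  // key -> list observed by a read

        Txn(boolean succeeded, Map<String, Integer> appends, Map<String, List<Integer>> reads) {
            this.succeeded = succeeded;
            this.appends = appends;
            this.reads = reads;
        }
    }

    // Returns "key=value" for each observed element whose only writer
    // reported succeeded = false. Assumes values are unique per key.
    static List<String> abortedReads(List<Txn> history) {
        Set<String> failedWrites = new HashSet<>();
        for (Txn t : history) {
            if (!t.succeeded) {
                t.appends.forEach((k, v) -> failedWrites.add(k + "=" + v));
            }
        }
        List<String> found = new ArrayList<>();
        for (Txn t : history) {
            t.reads.forEach((k, values) -> {
                for (int v : values) {
                    String kv = k + "=" + v;
                    if (failedWrites.contains(kv) && !found.contains(kv)) {
                        found.add(kv);
                    }
                }
            });
        }
        return found;
    }
}
```

An empty result means no read observed a write from a transaction that reported failure; any entry in the list is an instance of the anomaly described in this issue.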

Additional context

@lburgazzoli
Collaborator

Honestly, it's been a long time since I looked at the transaction code, so it is quite difficult for me to give any hints at this stage. I'd be happy to include a patch if this turns out to be an issue in jetcd.

@github-actions

This issue is stale because it has been open 60 days with no activity.
Remove stale label or comment or this will be closed in 7 days.
