[POC] Add Robustness test to reproduce issue 18089 #19169

Draft: wants to merge 3 commits into main

jmao-dd commented Jan 11, 2025

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

This is just an idea about how to reproduce issue 18089 using robustness tests.

With a few tweaks, I was able to reproduce the issue by issuing more Delete requests and running compaction more frequently.

make test-robustness-issue18089
...
logger.go:146: 2025-01-11T00:36:07.292Z     ERROR   Broke watch guarantee   {"guarantee": "resumable", "client": 9, "request": {"Key":"/registry/pods/","Revision":539,"WithPrefix":true,"WithProgressNotify":true,"WithPrevKV":true}, "got-event": {"Type":"put-operation","Key":"/registry/pods/default/jzIW7","Value":{"Value":"563","Hash":0},"Revision":540,"IsCreate":false,"PrevValue":{"Value":{"Value":"560","Hash":0},"ModRevision":537}}, "want-event": {"Type":"delete-operation","Key":"/registry/pods/default/DOj8f","Value":{"Value":"","Hash":0},"Revision":539,"IsCreate":false}}
    validate.go:48: Failed validating watch history, err: broke Resumable - A broken watch can be resumed by establishing a new watch starting after the last revision received in a watch event before the break, so long as the revision is in the history window
    logger.go:146: 2025-01-11T00:36:07.292Z     INFO    Validating serializable operations
    logger.go:146: 2025-01-11T00:36:07.292Z     INFO    Saving robustness test report   {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721"}
    logger.go:146: 2025-01-11T00:36:07.292Z     INFO    Saving member data dir  {"member": "TestRobustnessRegressionIssue18089-test-0", "path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/server-TestRobustnessRegressionIssue18089-test-0"}
    logger.go:146: 2025-01-11T00:36:07.292Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-1/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.293Z     INFO    no KV operations for client, skip persisting    {"client-id": 1}
    logger.go:146: 2025-01-11T00:36:07.294Z     INFO    no watch operations for client, skip persisting {"client-id": 2}
    logger.go:146: 2025-01-11T00:36:07.294Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-2/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.294Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-3/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.294Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-3/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.294Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-4/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.295Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-4/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.295Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-5/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.296Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-5/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.296Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-6/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.297Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-6/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.297Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-7/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.297Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-7/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.298Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-8/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.298Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-8/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.299Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-9/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.299Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-9/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.299Z     INFO    Saving watch operations {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-10/watch.json"}
    logger.go:146: 2025-01-11T00:36:07.300Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-10/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.301Z     INFO    no watch operations for client, skip persisting {"client-id": 11}
    logger.go:146: 2025-01-11T00:36:07.301Z     INFO    Saving operation history        {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/client-11/operations.json"}
    logger.go:146: 2025-01-11T00:36:07.301Z     INFO    Saving visualization    {"path": "/tmp/TestRobustnessRegression_Issue18089/1736555767292791721/history.html"}
    logger.go:146: 2025-01-11T00:36:08.015Z     INFO    killing server...       {"name": "TestRobustnessRegressionIssue18089-test-0"}
    logger.go:146: 2025-01-11T00:36:08.015Z     INFO    stopping server...      {"name": "TestRobustnessRegressionIssue18089-test-0"}
    logger.go:146: 2025-01-11T00:36:08.018Z     INFO    stopped server. {"name": "TestRobustnessRegressionIssue18089-test-0"}
--- FAIL: TestRobustnessRegression (6.24s)
    --- FAIL: TestRobustnessRegression/Issue18089 (6.21s)
FAIL
FAIL    go.etcd.io/etcd/tests/v3/robustness     54.537s
FAIL
FAIL: (code:1):
  % (cd tests && 'env' 'ETCD_VERIFY=all' 'go' 'test' 'go.etcd.io/etcd/tests/v3/robustness' '-timeout=30m' '-v' '--run=TestRobustnessRegression/Issue18089' '-timeout' '1h' '--count' '100' '--failfast' '--bin-dir=/tmp/etcd-v3.5.15-failpoints/bin')
ERROR: Tests for following packages failed:
   go.etcd.io/etcd/tests/v3/robustness
FAIL: 'robustness' FAILED at Sat 11 Jan 2025 12:36:08 AM UTC
make[1]: *** [Makefile:59: test-robustness] Error 255
make[1]: Leaving directory '/home/---/jmao/etcd'
Successful reproduction
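
The failure in the log comes down to one comparison: a watch opened at revision 539 delivered an event at revision 540 first, while the recorded history holds a delete at revision 539 under the same prefix. A minimal sketch of that kind of check follows. The `Event` type, `firstExpected`, and `checkResumable` are hypothetical names for illustration, the history is assumed to be sorted by revision, and this is not the validation code used by the robustness tests.

```go
package main

import (
	"fmt"
	"strings"
)

// Event is a simplified, hypothetical stand-in for a recorded watch event.
type Event struct {
	Type     string
	Key      string
	Revision int64
}

// firstExpected returns the first event in history (assumed sorted by
// revision) at or after startRev whose key has the given prefix.
func firstExpected(history []Event, prefix string, startRev int64) (Event, bool) {
	for _, e := range history {
		if e.Revision >= startRev && strings.HasPrefix(e.Key, prefix) {
			return e, true
		}
	}
	return Event{}, false
}

// checkResumable returns an error when the first event delivered to a watch
// started at startRev differs from the first matching event in history.
func checkResumable(history []Event, prefix string, startRev int64, got Event) error {
	want, ok := firstExpected(history, prefix, startRev)
	if !ok {
		return nil // nothing in history to compare against
	}
	if got != want {
		return fmt.Errorf("broke resumable: got %+v, want %+v", got, want)
	}
	return nil
}

func main() {
	// Keys and revisions mirror the error line in the log; values are omitted.
	history := []Event{
		{Type: "delete-operation", Key: "/registry/pods/default/DOj8f", Revision: 539},
		{Type: "put-operation", Key: "/registry/pods/default/jzIW7", Revision: 540},
	}
	// The watch started at revision 539 but received the revision 540 event first.
	fmt.Println(checkResumable(history, "/registry/pods/", 539, history[1]))
}
```

Comparing only the first delivered event keeps the sketch short; a fuller check would walk every event received after the resume point.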

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jmao-dd
Once this PR has been reviewed and has the lgtm label, please assign ahrtr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot

Hi @jmao-dd. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@@ -37,7 +37,7 @@ var (
RequestTimeout = 200 * time.Millisecond
WatchTimeout = time.Second
MultiOpTxnOpCount = 4
-CompactionPeriod = 200 * time.Millisecond
+CompactionPeriod = 20 * time.Millisecond
Author

Will refactor this to be controlled by a traffic parameter in a formal PR.

{Choice: List, Weight: 5},
{Choice: Delete, Weight: 30},
{Choice: Put, Weight: 30},
{Choice: CompareAndSet, Weight: 20},
Author

Will add a new type of Traffic for high Delete traffic
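
To illustrate how weights like the ones listed above shift the request mix, here is a minimal sketch of weighted random selection. The `choiceWeight` type and the `pick` and `countPicks` helpers are hypothetical and are not the traffic code from the PR. The weight list contains only the four entries visible in this diff; the full list may include others.

```go
package main

import (
	"fmt"
	"math/rand"
)

// choiceWeight is a hypothetical pairing of a request type with its weight.
type choiceWeight struct {
	Choice string
	Weight int
}

// pick returns one choice with probability proportional to its weight.
// It returns "" when the total weight is not positive.
func pick(r *rand.Rand, choices []choiceWeight) string {
	total := 0
	for _, c := range choices {
		total += c.Weight
	}
	if total <= 0 {
		return ""
	}
	n := r.Intn(total)
	for _, c := range choices {
		if n < c.Weight {
			return c.Choice
		}
		n -= c.Weight
	}
	return ""
}

// countPicks draws n choices with a fixed seed and tallies the results.
func countPicks(seed int64, n int, choices []choiceWeight) map[string]int {
	r := rand.New(rand.NewSource(seed))
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[pick(r, choices)]++
	}
	return counts
}

func main() {
	// Only the entries visible in the diff; an assumption, not the full list.
	choices := []choiceWeight{
		{Choice: "List", Weight: 5},
		{Choice: "Delete", Weight: 30},
		{Choice: "Put", Weight: 30},
		{Choice: "CompareAndSet", Weight: 20},
	}
	fmt.Println(countPicks(1, 10000, choices))
}
```

If these four entries were the whole list, Delete would account for 30 of every 85 requests on average.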

@@ -208,7 +208,7 @@ func RunCompactLoop(ctx context.Context, c *client.RecordingClient, period time.
}

// Range allows for both revision has been compacted and future revision errors
-compactRev := random.RandRange(lastRev, resp.Header.Revision+5)
+compactRev := random.RandRange(lastRev, resp.Header.Revision)
Author

Will refactor this to be controlled by a traffic parameter in a formal PR.
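
The code comment in this hunk says the range allows for future-revision errors, which the `+5` makes possible. The sketch below estimates how often a candidate compaction revision lands beyond the current revision with and without that slack. It relies on a hypothetical `randRange` helper assumed to return a value in the half-open range [min, max); the diff does not show whether the real `random.RandRange` behaves this way. The revision numbers are made up for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// randRange is a hypothetical helper returning a value in [min, max).
// The bounds of the real random.RandRange are an assumption here.
func randRange(r *rand.Rand, min, max int64) int64 {
	if max <= min {
		return min
	}
	return min + r.Int63n(max-min)
}

// futureShare estimates the fraction of candidate compaction revisions that
// exceed currentRev when the upper bound is currentRev+slack.
func futureShare(seed, lastRev, currentRev, slack int64, n int) float64 {
	if n <= 0 {
		return 0
	}
	r := rand.New(rand.NewSource(seed))
	future := 0
	for i := 0; i < n; i++ {
		if randRange(r, lastRev, currentRev+slack) > currentRev {
			future++
		}
	}
	return float64(future) / float64(n)
}

func main() {
	// Illustrative revisions only: last compaction at 530, current revision 540.
	fmt.Println("with +5:   ", futureShare(1, 530, 540, 5, 10000))
	fmt.Println("without +5:", futureShare(1, 530, 540, 0, 10000))
}
```

Under these assumptions, dropping the `+5` keeps every candidate at or below the current revision, so no compaction request is spent on a future-revision error.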
