-
Notifications
You must be signed in to change notification settings - Fork 725
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various cleanups for Elasticsearch test #51
Conversation
This increases the default logging to DEBUG, and sets TRACE logging for gateway and discovery packages.
This allows someone to collect the ES logs and data *after* a test run. Otherwise the logs and data is removed and ES is stopped, making further debugging impossible.
After an index is created, clients should always wait for the index to be fully created (the request returns immediately) before starting the test.
No need for the `query_then_fetch` setting, use ten seconds instead of one minute, and a more reasonable size of 20 rather than 2.
@aphyr also, how would you feel about me replacing Elastisch with vanilla clj-http? Elastisch uses clj-http internally anyway and it would reduce the number of moving parts in this test. I'm happy to submit another PR if you are interested. |
(throw (RuntimeException. err)))))) | ||
|
||
:settings {"index" {"refresh_interval" "1s"}}) | ||
(catch Throwable t)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really prefer to know about connection errors etc that happen here; the only reason it's appropriate to noop is if the index already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I will remove this to only handle IndexAlreadyExistsException
as you previously did. I do think using basic clj-http
would be easier, as it'd allow you to use:
(try+
...
(catch [:status 400]
;; ignore
Instead of catching any throwable during index creation, catch a specific exception and ensure it was only because the index already existed. Additionally, this changes the hardcoded node count of 5 to `(count (:nodes test))` so a dynamic number of nodes can be used.
Pushed another commit addressing your feedback, thanks for taking a look! |
Excellent, thanks @dakrone :) |
Various cleanups for Elasticsearch test
Do these tests run for you? Elasticsearch doesn't even start on my nodes any more; times out waiting for cluster recovery. |
@aphyr I just double-checked this and the tests are still running for me |
Various cleanups for Elasticsearch test
This change makes four changes:
In order to actually determine the root cause of issues, more verbose logging is needed. This defaults to more verbose logging for Elasticsearch and adds the ability to change it from Jepsen in the future (instead of manually by hand).
nuke!
to before testsWithout this, Jepsen deletes all traces of itself after running, which makes debugging much more difficult (no logs and no data left).
An optional change, but I figured I would make this change anyway.
This is a key part of testing Elasticsearch, and clients should always do this when creating indices.
Note that I was able to reproduce the failures in elastic/elasticsearch#10426 (about half of the time) without these changes, however after the change which waits for green after index creation, I am no longer able to reproduce data loss with the
create-pause
test (still evaluating the other tests).