Capture 'ForceMerge' task list exceptions #1804
Hmm, it actually turns out there's some scenario where the request succeeds, but the response body doesn't contain an empty …
So I've got a reproduction environment set up and am re-running with the …
Stack trace that triggered this PR:
The exception is raised here and caught here. Based on this, we know the exception is …
I should have left this in draft because I've been trying to reproduce the issue, but I agree, and I came to the same conclusion as you did: I just couldn't see how a successful response would be missing …
But I've now managed to reproduce the issue, and I'm squarely back in the ❓ camp: is it possible the API response has changed?
It looks like somehow the force merge API response (and not the tasks list API response) is returned:
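One quick way to tell the two bodies apart is by their top-level keys. The sketch below assumes the tasks list API (`GET _tasks`) returns tasks under `nodes` by default (or under `tasks` when `group_by` is `parents` or `none`), while the force merge API reports shard results under `_shards`; `classify_response` is a hypothetical helper, not part of Rally or the client.

```python
from typing import Any, Mapping


def classify_response(body: Mapping[str, Any]) -> str:
    """Guess which Elasticsearch API produced a response body from its top-level keys."""
    # Tasks list API: grouped under "nodes" by default, or a flat "tasks"
    # collection when group_by is "parents" or "none".
    if "nodes" in body or "tasks" in body:
        return "tasks_list"
    # Force merge API: shard results are reported under "_shards".
    if "_shards" in body:
        return "force_merge"
    return "unknown"


# Usage: a body shaped like a force merge response is flagged immediately.
print(classify_response({"_shards": {"total": 2, "successful": 2, "failed": 0}}))  # force_merge
print(classify_response({"nodes": {}}))  # tasks_list
```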
I'm becoming increasingly convinced we have a bug in …

```python
import asyncio
import ssl
from datetime import datetime, timezone

import certifi
from elasticsearch import AsyncElasticsearch

# Reproduction cluster uses self-signed certs, so hostname and cert checks are disabled.
ssl_context = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=certifi.where())
ssl_context.check_hostname = False
ssl_context.verify_mode = ssl.CERT_NONE

es = AsyncElasticsearch(
    hosts=["https://10.132.0.63:9200"],
    ssl_context=ssl_context,
    verify_certs=False,
    basic_auth=("elastic", "changeme"),
)


async def poll_force_merge_tasks():
    # Poll the tasks list API in a tight loop and print every response,
    # to see whether a force merge response ever comes back instead.
    while True:
        resp = await es.tasks.list(actions="indices:admin/forcemerge")
        print(datetime.now(timezone.utc), resp)


async def main():
    try:
        await poll_force_merge_tasks()
    finally:
        await es.close()


if __name__ == "__main__":
    asyncio.run(main())
```

I ran this script concurrently with the Rally reproduction and didn't see this behaviour.

Standalone `AsyncElasticsearch`:
Rally logs:
It's definitely looking like a client-side issue. I've had mixed luck reproducing the issue with a standalone script using both the vanilla … I enabled … Setting …
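Client-side debug logging is one way to see exactly what the transport received for each request. A minimal sketch using only the standard `logging` module, assuming elastic-transport emits its records under child loggers of the `elastic_transport` namespace (an assumption; the exact setting used in this investigation isn't recorded here). `enable_transport_debug_logging` is a hypothetical helper.

```python
import logging


def enable_transport_debug_logging() -> logging.Logger:
    """Send DEBUG records from the (assumed) elastic_transport loggers to stderr."""
    # Assumption: elastic-transport logs under children of "elastic_transport",
    # so configuring the parent covers them via level inheritance and propagation.
    logger = logging.getLogger("elastic_transport")
    logger.setLevel(logging.DEBUG)
    # Attach only one handler, even if this is called more than once.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()  # defaults to sys.stderr
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Configuring the parent logger rather than each child keeps this working even if the library adds new child loggers.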
What Elasticsearch reports:
If we then decode those responses:
What the Python client's `elastic-transport` package logged:
Talk about convenient timing! I've confirmed we're hitting this bug: aio-libs/aiohttp#7864.
I think I'll raise a PR to pin us to …
No longer needed; #1806 is the fix.
Apparently it's possible that we just swallow the exception here. In hundreds of runs we haven't seen this before, but I'd like to properly capture the error if it happens again; then we can decide how (or if) we add any resiliency.
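Capturing the error rather than swallowing it could look roughly like the sketch below. It is not Rally's actual code: `force_merge_running`, `UnexpectedTasksResponse`, and the injected `list_tasks` coroutine (a stand-in for something like `es.tasks.list(actions="indices:admin/forcemerge")`) are all hypothetical. It assumes a tasks list body keeps per-node tasks under `nodes.<node_id>.tasks`.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Mapping

logger = logging.getLogger(__name__)


class UnexpectedTasksResponse(Exception):
    """Hypothetical error type that keeps the body we failed to parse."""

    def __init__(self, body: Mapping[str, Any]) -> None:
        super().__init__(f"tasks list response has no 'nodes' key: {body!r}")
        self.body = body


async def force_merge_running(
    list_tasks: Callable[[], Awaitable[Mapping[str, Any]]],
) -> bool:
    """Return True while any force merge task is still listed."""
    # Transport or HTTP errors raised by the call itself propagate unchanged.
    body = await list_tasks()
    try:
        nodes = body["nodes"]
    except KeyError as e:
        # Log the traceback and the raw body instead of swallowing the error,
        # so the next occurrence tells us exactly what came back.
        logger.exception("unexpected tasks list response: %r", body)
        raise UnexpectedTasksResponse(body) from e
    return any(node.get("tasks") for node in nodes.values())


# Usage: a force merge body returned in place of a tasks list now surfaces loudly.
async def _demo() -> None:
    async def wrong_response() -> Mapping[str, Any]:
        return {"_shards": {"total": 2, "successful": 2, "failed": 0}}

    try:
        await force_merge_running(wrong_response)
    except UnexpectedTasksResponse as err:
        print("captured:", err.body)


if __name__ == "__main__":
    asyncio.run(_demo())
```

Re-raising a dedicated exception (with the original `KeyError` chained) keeps the decision about retries or other resiliency separate from the detection itself.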