RuntimeError: can't add a new key into hash during iteration #1331
Comments
I don't see an obvious case where we are adding keys to the hash during iteration. That said, I can rewrite the code to avoid using the default block for assigning the list. This should be a simple change that I can ship with the next release so you can test the change.
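For readers following along, here is a minimal, generic sketch of the hazard a Hash default block creates (this is not seahorse's code, and the endpoint names are made up): reading a missing key from a Hash with a default block inserts that key, so a read in one thread can mutate the hash while another thread is iterating it. Because it is a race, it may not fire on every run.

pool = Hash.new { |hash, key| hash[key] = [] } # default block assigns a list on first read
pool['seed-endpoint'] = ['session']

iterator = Thread.new do
  loop { pool.each_pair { |_endpoint, _sessions| } } # iterate the hash forever
end

begin
  # Each read of an unseen key triggers the default block, which *adds* a key.
  100_000.times { |i| pool["endpoint-#{i}"] }
  puts 'race not hit on this run'
rescue RuntimeError => e
  puts e.message # => can't add a new key into hash during iteration
ensure
  iterator.kill
end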
This change should go out with our next release. Once this is live, can you update and let me know if this resolves the issue you've been experiencing?
Thanks for looking into this! I will update as soon as a new release is cut. Unfortunately, since the issue is very intermittent (most recently we saw ~2 weeks between occurrences on our production traffic; we logged 692 errors on 11/4, then 139 errors on 11/17), it will be difficult to let you know if it definitively resolves the issue. However, I will be sure to update here if the error ever occurs again after updating.
Going to go ahead and close this for now, but feel free to reopen if you see this again after upgrading to the latest version.
Hi there! I'm a colleague of @wjordan. This issue has recurred for us.
Let's reopen and take a look...
Is there any additional information we can provide to help investigate the error? Thanks!
It's just interesting because Trevor's original comment about the mutexes seems to still apply. If you have some sort of scenario that can reliably recreate this (if possible), that may help. Additionally, how frequently are you seeing this issue currently?
It occurs periodically in bursts. For example, it did not happen at all for the two weeks leading up to June 13. Then it happened 252 times on June 13 and 119 times on June 14. Then not at all for a few days, then 511 times on June 19. When this does happen, it appears to be in a concentrated burst at a rate of about 1-2 faults per minute. They have all been on the same front-end server within a burst as far as I can tell, so it seems to be something about the state of the gem or the server, rather than some externality such as the network. (This is running on an EC2 instance within an AWS VPC, btw.) Even with the intermittency, this happens enough that it is presently our top operational issue in terms of faults/week.
This is an issue where I'd encourage an upgrade to V3 of the SDK. It is generally better with concurrency, and while I can't guarantee it fixes this issue, it should make it a lot easier to narrow down with the more detailed stack traces.
Closing soon if this remains inactive; I'd love to know if V3 doesn't resolve this.
We haven't upgraded to V3 yet, but I will follow up here when we do complete an upgrade. This is still affecting our production application; the last burst was 1,232 instances of this error on Apr 29, running version 2.10.79.
Hi there! This issue just occurred again for us. We're currently investigating available gem upgrades to see if they help.
Here's a recent backtrace:
Firehose showing up in the backtrace is new, and we're investigating to see if there's something we're doing wrong in our application layer that might cause this, or any infrastructure-level events that might explain it. In the meantime, we figured a backtrace from the v3 SDK might be helpful. Thank you!
Let's reopen and take a look then. One other thing that may be relevant: can you tell us a bit more about how you're doing concurrency in your application? It may help us to attempt to reproduce your issue.
Do you have any additional details that would help us reproduce the issue?
Closing this issue. Happy to re-open if there are additional details to reproduce.
We had the same issue.
Hi, I just started seeing this issue as well.
Are you still seeing this issue? Are there any ideas on how to fix it?
We encountered the same issue. After some investigation, I was able to write a small reproduction script:
require 'aws-sdk-s3'
require 'aws-sdk-dynamodb'
Aws::S3::Client.new.list_buckets
puts Seahorse::Client::NetHttp::ConnectionPool.pools.first.instance_variable_get(:@pool)
# {"https://s3.ap-northeast-1.amazonaws.com"=>[#<Net::HTTP s3.ap-northeast-1.amazonaws.com:443 open=true>]}
Thread.new do
pool = Seahorse::Client::NetHttp::ConnectionPool.pools.first
loop { pool.size } # iterate Seahorse::Client::NetHttp::ConnectionPool @pool forever
end
sleep 0.1 # wait a moment for the thread to start iterating
fork do
Aws::DynamoDB::Client.new.list_tables
# => can't add a new key into hash during iteration (RuntimeError)
# error occurs when trying `@pool["https://dynamodb.ap-northeast-1.amazonaws.com"] = []`
end

This program raises the "can't add a new key into hash during iteration" RuntimeError. This may be due to a race condition between the parent and child process when using the fork(2) system call.

How to fix? Please add a line at the beginning of the forked process to clear the Seahorse connection pools, like the following:

fork do
  Aws.empty_connection_pools!
  ...
end
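For anyone applying the suggested workaround under a preforking app server, here is a sketch of where that call could go. Assumptions: Puma in clustered mode with preload_app! (other servers have equivalent hooks, such as Unicorn's after_fork). The reproduction above suggests the child inherits a pool hash that was mid-iteration in the parent while the iterating thread itself is not copied, so clearing the inherited pools in the fork hook sidesteps that state.

# config/puma.rb: sketch of the workaround for a preforking server (assumed setup)
workers 2
preload_app!   # the parent loads the app (and aws-sdk) before forking workers

on_worker_boot do
  # Drop Net::HTTP sessions and pool state inherited from the parent process;
  # each worker will lazily open its own connections afterwards.
  Aws.empty_connection_pools!
end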
Thanks for adding a reproduction for this. I understand this is a long-standing bug. I was able to reproduce it, and then fix it, by changing the size method to not iterate. If it helps, I can push a fix to that size method, but I can't guarantee you will stop seeing this.
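The exact change isn't shown in the thread, but as an illustration of "a size that does not iterate the live hash", here is a hypothetical, self-contained sketch. It mirrors the endpoint-to-session-list shape of the pool seen in the reproduction above; it is not the shipped seahorse patch.

class TinyPool
  def initialize
    @pool_mutex = Mutex.new
    @pool = {} # endpoint => array of sessions; note: no default block
  end

  def add(endpoint, session)
    @pool_mutex.synchronize { (@pool[endpoint] ||= []) << session }
  end

  # Snapshot the value lists under the mutex (a C-level copy), then count the
  # snapshot, so no Ruby block runs while walking the shared hash itself.
  def size
    @pool_mutex.synchronize { @pool.values }.map(&:size).inject(0, :+)
  end
end

pool = TinyPool.new
pool.add('https://s3.ap-northeast-1.amazonaws.com', :session_a)
pool.add('https://dynamodb.ap-northeast-1.amazonaws.com', :session_b)
puts pool.size # => 2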
On our production application we are receiving intermittent stack traces on S3 requests. The errors arrive in bursts of a couple hundred over an hour or two, then nothing for a few days, then another burst (6 over the last 30 days). I suspect this might be some sort of concurrency issue related to network-error retries somewhere deep in the seahorse stack, but could use some help figuring out what exactly is going on and how to fix. Using aws-sdk-core 2.6.1.

Relevant part of the stack trace (net_http):

block in initialize
yield
block in session_for
synchronize
session_for
session
transmit
call

Full aws-sdk stack trace with all handlers:

Reproducing the issue with debug logging would be challenging (since the issue only occurs intermittently in production), so I'm hoping to track this down by inspecting the codebase directly.
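Since the report notes that debug logging is hard to arrange for an intermittent, production-only fault, one low-overhead option is to capture the pool state at the moment the error fires. This is a sketch only: the wrapper name and bucket/key values are made up, and it relies on the same pool introspection used in the reproduction earlier in this thread (Seahorse::Client::NetHttp::ConnectionPool.pools and the @pool instance variable).

require 'aws-sdk-s3'
require 'logger'

# Hypothetical wrapper: run an SDK call and, if the iteration RuntimeError
# fires, log which endpoints the Seahorse connection pools held, then re-raise.
def with_pool_diagnostics(logger)
  yield
rescue RuntimeError => e
  raise unless e.message.include?('during iteration')
  Seahorse::Client::NetHttp::ConnectionPool.pools.each do |pool|
    endpoints = pool.instance_variable_get(:@pool).keys # endpoint => sessions map
    logger.error("connection pool endpoints at failure: #{endpoints.inspect}")
  end
  raise
end

# Usage sketch (bucket/key names are placeholders):
# with_pool_diagnostics(Logger.new($stderr)) do
#   Aws::S3::Client.new.get_object(bucket: 'example-bucket', key: 'example-key')
# end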