Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RuntimeError: can't add a new key into hash during iteration #1331

Closed
wjordan opened this issue Nov 2, 2016 · 20 comments · Fixed by #3015
Closed

RuntimeError: can't add a new key into hash during iteration #1331

wjordan opened this issue Nov 2, 2016 · 20 comments · Fixed by #3015

Comments

@wjordan
Copy link

wjordan commented Nov 2, 2016

On our production application we are receiving intermittent stack traces on S3 requests- the errors arrive in bursts of a couple hundred over an hour or two, then nothing for a few days, then another burst (6 over the last 30 days). I suspect this might be some sort of concurrency issue related to network-error retries somewhere deep in the seahorse stack, but could use some help figuring out what exactly is going on and how to fix.

Using aws-sdk-core 2.6.1.

Relevant part of the stack trace (net_http):

Full aws-sdk stack trace with all handlers:

[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:40 :in `block in initialize`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:89 :in `yield`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:89 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:87 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/connection_pool.rb:87 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:116 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:68 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/net_http/handler.rb:42 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/xml/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_request_signer.rb:65 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_redirects.rb:15 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/retry_errors.rb:87 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_dualstack.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_accelerate.rb:42 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_md5s.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_bucket_dns.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/rest/handler.rb:7 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/user_agent.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/endpoint.rb:41 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_url_encoded_keys.rb:39 :in `manage_keys`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_url_encoded_keys.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/param_validator.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_dualstack.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/s3_accelerate.rb:33 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/param_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/aws-sdk-core/plugins/response_paging.rb:26 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/plugins/response_target.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-core-2.6.1/lib/seahorse/client/base.rb:207 :in `block (2 levels) in define_operation_methods`
[...Application code stacktrace...]

Reproducing the issue with debug logging would be challenging (since the issue only occurs intermittently in production), so I'm hoping to track this down by inspecting the codebase directly.

@trevorrowe
Copy link
Member

I don't see an obvious care where we are adding keys to the hash during iteration. The @pool hash is protected by a mutex and is only enumerated or indexed behind that mutex. The stack trace is terminating inside the hash's block that yields for missing keys.

I can rewrite the code to avoid using the default block for assigning the list. This should be a simple change that I can ship with the next release so you can test the change.

trevorrowe added a commit that referenced this issue Nov 17, 2016
@trevorrowe
Copy link
Member

This change should go out with our next release. Once this is live, can you update and let me know if this resolves the issue you've been experiencing?

@wjordan
Copy link
Author

wjordan commented Nov 17, 2016

Thanks for looking into this! I will update as soon as a new release is cut.

Unfortunately since the issue is very intermittent (most recently we saw ~2 weeks between occurrences on our production traffic- we logged 692 errors on 11/4, then 139 errors on 11/17), It will be difficult to let you know if it definitively resolves the issue. However I will be sure to update here if the error ever occurs again after updating.

@awood45
Copy link
Member

awood45 commented Nov 29, 2016

Going to go ahead and close this for now, but feel free to reopen if you see this again after upgrading to the latest version.

@awood45 awood45 closed this as completed Nov 29, 2016
@islemaster
Copy link

Hi there! I'm a colleague of @wjordan. This issue has recurred for us using aws-sdk-core 2.9.14. Here's a recent stacktrace:

RuntimeError: can't add a new key into hash during iteration
101             # No error raised? Good, check the session into the pool.
102             @pool_mutex.synchronize do
103               @pool[endpoint] = [] unless @pool.key?(endpoint)
104               @pool[endpoint] << session
105             end
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:103 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:102 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/connection_pool.rb:102 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:119 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:71 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/net_http/handler.rb:45 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/xml/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_request_signer.rb:65 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_redirects.rb:15 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/retry_errors.rb:88 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_dualstack.rb:32 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_accelerate.rb:49 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_md5s.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_expect_100_continue.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_bucket_name_restrictions.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_bucket_dns.rb:31 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/rest/handler.rb:7 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/user_agent.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/endpoint.rb:41 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/param_validator.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_sse_cpk.rb:19 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_dualstack.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/s3_accelerate.rb:34 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/idempotency_token.rb:18 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/aws-sdk-core/plugins/param_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/plugins/response_target.rb:21 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-core-2.9.14/lib/seahorse/client/base.rb:207 :in `block (2 levels) in define_operation_methods`
[...Application code stacktrace...]

@awood45
Copy link
Member

awood45 commented May 25, 2017

Let's reopen and take a look...

@awood45 awood45 reopened this May 25, 2017
@joshlory
Copy link

Is there any additional information we can provide to help investigate the error? Thanks!

@awood45
Copy link
Member

awood45 commented Jun 19, 2017

It's just interesting because Trevor's original comment about the mutexes seems to still apply. If you have some sort of scenario that can reliably recreate this (if possible) that may help.

Additionally, how frequently are you seeing this issue currently?

@jeremydstone
Copy link

It occurs periodically in bursts. For example, we had this happen not at all for the 2 weeks leading up to June 13. Then it happened 252 times on June 13 and 119 times on June 14. Then not at all for a few days, then 511 times on June 19.

When this does happen, it appears to be in a concentrated burst at a rate of about 1-2 faults per minute. They have all been on the same front end server within a burst as far as I can tell, so something about state of the gem or the server, rather than some externality such as network. (This is running on an EC2 instance within AWS VPC, btw.)

Even with the intermittency, this happens enough that this is presently our top operational issue in terms of faults/week.

@awood45
Copy link
Member

awood45 commented Jan 2, 2018

This is an issue where I'd encourage an upgrade to V3 of the SDK. It is generally better with concurrency, and while I can't guarantee it fixes this issue, it should make it a lot easier to narrow down with the more detailed stack traces.

@awood45
Copy link
Member

awood45 commented May 18, 2018

Closing soon if this remains inactive, I'd love to know if V3 doesn't resolve this.

@awood45 awood45 added the closing-soon This issue will automatically close in 4 days unless further comments are made. label May 18, 2018
@wjordan
Copy link
Author

wjordan commented May 18, 2018

We haven't upgraded to V3 yet, but I will follow up here when we do complete an upgrade. This is still affecting our production application- the last burst was 1,232 instances of this error on Apr 29, running version 2.10.79.

@awood45 awood45 closed this as completed Jun 22, 2018
@islemaster
Copy link

islemaster commented Sep 17, 2018

Hi there! This issue just occurred again for us using aws-sdk-core 3.22.1, aws-sdk-firehose 1.4.0. We saw a burst of 600 instances of this error in a 24-hour window this weekend, September 15-16.

We're currently investigating available gem upgrades to see if they help:

  • aws-sdk-firehose from 1.4.0 to 1.6.0
  • aws-sdk-core from 3.22.1 to 3.27.0

Here's a recent backtrace:

[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:107 :in `block in session_for`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:106 :in `synchronize`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/connection_pool.rb:106 :in `session_for`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:121 :in `session`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:73 :in `transmit`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/net_http/handler.rb:47 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/content_length.rb:12 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/json/error_handler.rb:8 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/signature_v4.rb:66 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/retry_errors.rb:138 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/json/handler.rb:11 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/user_agent.rb:13 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/endpoint.rb:45 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/param_validator.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/raise_response_errors.rb:14 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:20 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/idempotency_token.rb:17 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/param_converter.rb:24 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/aws-sdk-core/plugins/response_paging.rb:10 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/plugins/response_target.rb:23 :in `call`
[GEM_ROOT]/gems/aws-sdk-core-3.22.1/lib/seahorse/client/request.rb:70 :in `send_request`
[GEM_ROOT]/gems/aws-sdk-firehose-1.4.0/lib/aws-sdk-firehose/client.rb:1040 :in `put_record`
[Application repo]/lib/cdo/firehose.rb:52 :in `put_record`

Firehose showing up in the backtrace is new, and we're investigating to see if there's something we're doing wrong in our application layer that might cause this, or any infrastructure-level events that might explain it. In the meantime we figured a backtrace from the v3 sdk might be helpful. Thank you!

@awood45 awood45 added Version 3 and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Sep 20, 2018
@awood45
Copy link
Member

awood45 commented Sep 20, 2018

Let's reopen and take a look then. One other thing that may be relevant, can you tell us a bit more about how you're doing concurrency in your application? It may help us to attempt to reproduce your issue.

@awood45 awood45 reopened this Sep 20, 2018
@srchase srchase added the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 5, 2018
@srchase
Copy link
Contributor

srchase commented Oct 5, 2018

@islemaster

Do you have any additional details that would help us reproduce the issue?

@srchase
Copy link
Contributor

srchase commented Oct 22, 2018

Closing this issue. Happy to re-open if there are additional details to reproduce.

@srchase srchase closed this as completed Oct 22, 2018
@srchase srchase removed the closing-soon This issue will automatically close in 4 days unless further comments are made. label Oct 22, 2018
@n-ikhil
Copy link

n-ikhil commented Jun 16, 2022

We had the same issue, in aws-sdk-core (3.104.4)
We used passenger as web server with smart spawning (which is default). We suspected this issue is due to copy-on-write used in forking process. You can refer this link on how passenger does this. You can try using direct spawning if you have similar setup. In our case we made changes in our code, cleared all connection pool in fork hook and these errors stopped completely

@rottgoth
Copy link

Hi, I just started seeing this issue with version aws-sdk-core (3.180.3) while ramping up concurrency with puma.
This is the stack trace that we got:

at block in session_for(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:111)
at synchronize(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:110)
at session_for(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/connection_pool.rb:110)
at session(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:128)
at transmit(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:76)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/net_http/handler.rb:50)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/content_length.rb:24)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/request_callback.rb:87)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/crc32_validation.rb:32)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/json/error_handler.rb:10)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/sign.rb:49)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/transfer_encoding.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/helpful_socket_errors.rb:12)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/retry_errors.rb:360)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/user_agent.rb:37)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/http_checksum.rb:19)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/endpoint_pattern.rb:30)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/checksum_algorithm.rb:136)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/request_compression.rb:94)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/json/handler.rb:13)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/recursion_detection.rb:18)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/endpoints.rb:41)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/endpoint_discovery.rb:84)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/endpoint.rb:47)
at block in call(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/contrib/aws/instrumentation.rb:21)
at trace(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/tracer.rb:283)
at call(/usr/local/bundle/ruby/3.0.0/gems/ddtrace-0.54.2/lib/ddtrace/contrib/aws/instrumentation.rb:20)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/param_validator.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/logging.rb:41)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/raise_response_errors.rb:16)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/plugins/simple_attributes.rb:119)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/checksum_algorithm.rb:111)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/jsonvalue_converter.rb:16)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/idempotency_token.rb:19)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/param_converter.rb:26)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/request_callback.rb:71)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/aws-sdk-core/plugins/response_paging.rb:12)
at call(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/plugins/response_target.rb:24)
at send_request(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-core-3.180.3/lib/seahorse/client/request.rb:72)
at describe_table(/usr/local/bundle/ruby/3.0.0/gems/aws-sdk-dynamodb-1.93.1/lib/aws-sdk-dynamodb/client.rb:3008)

Are you still seeing this issue? Are there any ideas on how to fix it?

@yokonao
Copy link

yokonao commented Apr 29, 2024

We had encountered the same issue.

As a result of our investigation, I succeeded in writing a small reproduction code.

  • aws-sdk-core 3.170.0
  • aws-sdk-s3 1.106.0
  • aws-sdk-dynamodb 1.84.0
require 'aws-sdk-s3'
require 'aws-sdk-dynamodb'

Aws::S3::Client.new.list_buckets
puts Seahorse::Client::NetHttp::ConnectionPool.pools.first.instance_variable_get(:@pool)
# {"https://s3.ap-northeast-1.amazonaws.com"=>[#<Net::HTTP s3.ap-northeast-1.amazonaws.com:443 open=true>]}

Thread.new do
  pool = Seahorse::Client::NetHttp::ConnectionPool.pools.first
  loop { pool.size } # iterate Seahorse::Client::NetHttp::ConnectionPool @pool forever
end

sleep 0.1 # wait a miniute for the thread to start

fork do
  Aws::DynamoDB::Client.new.list_tables
  # => can't add a new key into hash during iteration (RuntimeError)
  # error occurs when trying `@pool["https://dynamodb.ap-northeast-1.amazonaws.com"] = []`
end

This program raises RuntimeError: can't add a new key into hash during iteration!

This may be due to a race condition between the parent and child process when using fork(2) system call.

How to fix? Please add a line the begging of the forked process for clearing the seahorse connection pools. Like following:

fork do
  Aws.empty_connection_pools!
  
  ...
end

@mullermp
Copy link
Contributor

Thanks for adding a reproduction for this. I understand this is some long standing bug. I was able to reproduce, but then fix, when I change the size method to not iterate (just make a call to @pool.values.flatten.size) However I assume this would still happen because there are other methods that iterate the pool with each_pair, such as empty! and _clean. My understanding is, since fork makes a copy of the mutex, synchronize does not help here. Fork may be copying the state during the iteration.

If it helps, I can push a fix to that size method, but I can't guarantee you will stop seeing this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.