Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite ConnectionPool for efficiency #328

Merged
merged 1 commit into from
Feb 23, 2016

Conversation

mattbrictson
Copy link
Member

This is a rewrite of SSHKit's ConnectionPool. The advantages of this version are:

  • Cleaner API
  • Less mutex locking
  • Asynchronous connection eviction/closing
  • Faster (hopefully)

Cleaner API

# Old API
entry = pool.checkout(host, username, &Net::SSH.method(:start))
begin
  # do stuff with entry.connection
ensure
  pool.checkin(entry)
end

# New API
pool.with(Net::SSH.method(:start), host, username) do |connection|
  # do stuff with connection
end

Less mutex locking

Previously, every checkin and checkout would use a mutex to lock the entire pool every time. Now, each unique connection key has its own separate mutex, so any locking is done per host, and not for the entire pool.

Asynchronous connection eviction/closing

Previously, every checkin and checkout would trigger an expiration check for all connections in the pool. Thus every thread must potentially wait for connections used by other threads to be cleaned up and closed.

Now this logic is removed. Eviction happens in a separate background thread periodically sweeps the connections to identify and close stale ones.

Faster (hopefully)

In my testing the new pool is faster, but only marginally so. However this is in a artificial environment using sleep instead of real SSH commands.

@byroot – Can you test this branch and let me know how it performs against real servers?

TODO:

  • Fix skipped unit tests
  • CHANGELOG

@mattbrictson
Copy link
Member Author

This PR is related to #326.


private
def cache_enabled?
idle_timeout && idle_timeout > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we're back to either disable the pooling entirely, or to have to prune connection?

I don't know if I'm alone, but in our case a full deploy take about 4 to 5 minutes and we do it from inside the datacenter, so pruning idle connections is really not a concern.

Any way we could disable it without disabling pooling?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My goal was to improve efficiency across the board, rather than add conditionals as workarounds. The point is that If you just set idle_timeout to a high value, you shouldn't pay any performance penalty, because a mutex is no longer being used for pruning.

But, like I said, the proof will be in the numbers.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we'll see. I didn't have time to test yet.

@byroot
Copy link
Contributor

byroot commented Feb 8, 2016

So the benchmark is fairly similar to before the change:
capture d ecran 2016-02-07 a 20 23 40

As for the profiles, cold:

==================================
  Mode: wall(1000)
  Samples: 2642 (8.61% miss rate)
  GC: 278 (10.52%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       312  (11.8%)         304  (11.5%)     SSHKit::Runner::Parallel#execute
       228   (8.6%)         228   (8.6%)     block (3 levels) in Net::SSH::KnownHosts#keys_for
       413  (15.6%)         163   (6.2%)     block (2 levels) in Net::SSH::KnownHosts#keys_for
       173   (6.5%)         111   (4.2%)     Net::SSH::Transport::Kex::DiffieHellmanGroup1SHA1#send_kexinit
       193   (7.3%)         106   (4.0%)     OpenSSL::PKey::DH#valid?
        89   (3.4%)          89   (3.4%)     Net::SSH::Compat.io_select
        87   (3.3%)          87   (3.3%)     block in OpenSSL::PKey::DH#valid?
        94   (3.6%)          62   (2.3%)     block in Net::SSH::Config.load
        59   (2.2%)          59   (2.2%)     block (2 levels) in Net::SSH::Transport::ServerVersion#negotiate!
        62   (2.3%)          52   (2.0%)     Net::SSH::Buffer#read
        75   (2.8%)          40   (1.5%)     Net::SSH::BufferedIo#fill
        39   (1.5%)          39   (1.5%)     block (2 levels) in Net::SSH::Connection::Channel#forward_local_env
       127   (4.8%)          33   (1.2%)     Net::SSH::Config.load
        32   (1.2%)          32   (1.2%)     Logger#debug?
        38   (1.4%)          32   (1.2%)     Net::SSH::Transport::State#update_next_iv
        55   (2.1%)          31   (1.2%)     Net::SSH::BufferedIo#send_pending
        29   (1.1%)          29   (1.1%)     Net::SSH::Buffer#initialize
        24   (0.9%)          24   (0.9%)     Net::SSH::Transport::HMAC::Abstract.mac_length
        24   (0.9%)          24   (0.9%)     Net::SSH::Transport::HMAC::Abstract.digest_class
       447  (16.9%)          22   (0.8%)     Net::SSH::KnownHosts#keys_for
        22   (0.8%)          22   (0.8%)     Net::SSH::Buffer#write_long
        22   (0.8%)          22   (0.8%)     Net::SSH::BufferedIo#input
        21   (0.8%)          21   (0.8%)     Net::SSH::Buffer#length
       186   (7.0%)          21   (0.8%)     Net::SSH::Transport::PacketStream#poll_next_packet
        20   (0.8%)          20   (0.8%)     Net::SSH::Config.pattern2regex
        19   (0.7%)          19   (0.7%)     Net::SSH::KnownHosts#known_host_hash?
        97   (3.7%)          19   (0.7%)     Net::SSH::Buffer.from
        19   (0.7%)          19   (0.7%)     Addrinfo#connect_internal
        45   (1.7%)          18   (0.7%)     Net::SSH::Buffer#read_long
        74   (2.8%)          18   (0.7%)     block in Net::SSH::Buffer.from

Warm:

==================================
  Mode: wall(1000)
  Samples: 3260 (1.81% miss rate)
  GC: 350 (10.74%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
       383  (11.7%)         383  (11.7%)     block (2 levels) in Net::SSH::Connection::Channel#forward_local_env
       331  (10.2%)         331  (10.2%)     Net::SSH::Compat.io_select
       116   (3.6%)         102   (3.1%)     Net::SSH::Buffer#read
       100   (3.1%)         100   (3.1%)     Net::SSH::Transport::HMAC::Abstract.digest_class
       100   (3.1%)         100   (3.1%)     Net::SSH::Transport::HMAC::Abstract.mac_length
       133   (4.1%)          80   (2.5%)     Net::SSH::BufferedIo#send_pending
       139   (4.3%)          74   (2.3%)     Net::SSH::BufferedIo#fill
        71   (2.2%)          71   (2.2%)     Net::SSH::Transport::State#update_next_iv
        80   (2.5%)          71   (2.2%)     SSHKit::Runner::Parallel#execute
        57   (1.7%)          57   (1.7%)     Net::SSH::Buffer#initialize
        55   (1.7%)          55   (1.7%)     Net::SSH::BufferedIo#output
        50   (1.5%)          50   (1.5%)     Logger#debug?
       628  (19.3%)          48   (1.5%)     Net::SSH::Transport::PacketStream#poll_next_packet
        94   (2.9%)          47   (1.4%)     Net::SSH::Transport::State#update_cipher
        46   (1.4%)          46   (1.4%)     block in Net::SSH::Transport::PacketStream#enqueue_packet
        45   (1.4%)          45   (1.4%)     Net::SSH::Buffer#write_long
       909  (27.9%)          44   (1.3%)     block in Net::SSH::Transport::Session#poll_message
        43   (1.3%)          43   (1.3%)     Net::SSH::BufferedIo#input
        41   (1.3%)          41   (1.3%)     block (3 levels) in <class:Digest>
        80   (2.5%)          40   (1.2%)     Net::SSH::Buffer#read_long
        40   (1.2%)          40   (1.2%)     Net::SSH::Buffer#length
        38   (1.2%)          38   (1.2%)     #<Module:0x007fad251d6c60>.reporter
       170   (5.2%)          36   (1.1%)     block in Net::SSH::Connection::Session#postprocess
        34   (1.0%)          34   (1.0%)     SSHKit::Color#colorize?
       107   (3.3%)          32   (1.0%)     block in Net::SSH::Buffer.from
        31   (1.0%)          31   (1.0%)     Net::SSH::Buffer#to_s
        31   (1.0%)          31   (1.0%)     Net::SSH::Buffer#append
        30   (0.9%)          30   (0.9%)     Net::SSH::Connection::Channel#active?
      2739  (84.0%)          29   (0.9%)     Net::SSH::Connection::Session#dispatch_incoming_packets
        28   (0.9%)          28   (0.9%)     Net::SSH::Connection::Channel#[]

So I guess there wasn't much contention on that mutex in the first place. I would 👍 that change though, because if someone happen to use cap with an alternative implementation that might make a big difference.

@leehambley
Copy link
Member

👍

@mattbrictson
Copy link
Member Author

Thanks for the review! It is nice to see that SSHKit is no longer a hot spot in the profile output, even if this doesn't make much difference in terms of wall time.

There are two skipped unit tests that I need to fix before this can be merged, as well as edits to the CHANGELOG. I'll try to get those wrapped up this week.

@mattbrictson mattbrictson force-pushed the connection-pool-rewrite branch from e76e556 to 69665a9 Compare February 12, 2016 22:08
@mattbrictson mattbrictson changed the title WIP: rewrite ConnectionPool for efficiency Rewrite ConnectionPool for efficiency Feb 12, 2016
@mattbrictson mattbrictson force-pushed the connection-pool-rewrite branch from 69665a9 to 9dd807c Compare February 12, 2016 22:09
@mattbrictson
Copy link
Member Author

I've rebased and squash this to a single commit, added a detailed commit message, and updated CHANGELOG and README.

@leehambley This is ready for final review and merge.

@mattbrictson mattbrictson force-pushed the connection-pool-rewrite branch from 9dd807c to 72cf0b4 Compare February 22, 2016 22:24
@mattbrictson
Copy link
Member Author

@leehambley This has been rebased and is ready to merge.

This is a rewrite of SSHKit's ConnectionPool. The advantages of this version
are:

* Cleaner API
* Less mutex locking
* Asynchronous connection eviction/closing
* Faster (slightly)

Cleaner API
===========

 # Old API
entry = pool.checkout(host, username, &Net::SSH.method(:start))
begin
  # do stuff with entry.connection
ensure
  pool.checkin(entry)
end

 # New API
pool.with(Net::SSH.method(:start), host, username) do |connection|
  # do stuff with connection
end

Less mutex locking
==================

Previously, every checkin and checkout would use a mutex to lock the entire pool
every time. Now, each unique connection key has its own separate mutex, so any
locking is done per host, and not for the entire pool.

Asynchronous connection eviction/closing
========================================

Previously, every checkin and checkout would trigger an expiration check for all
connections in the pool. Thus every thread must potentially wait for connections
used by other threads to be cleaned up and closed.

Now this logic is removed. Eviction happens in a separate background thread
periodically sweeps the connections to identify and close stale ones.

Faster (slightly)
=================

Profile before rewrite:

TOTAL    (pct)     SAMPLES    (pct)     FRAME
   54  (12.0%)          54  (12.0%)     SSHKit::Backend::ConnectionPool::Entry#expired?
   50  (11.1%)          50  (11.1%)     Net::SSH::Compat.io_select
   28   (6.2%)          28   (6.2%)     block (2 levels) in Net::SSH::Connection::Channel#forward_local_env
   11   (2.4%)          11   (2.4%)     Net::SSH::Transport::HMAC::Abstract.mac_length
   11   (2.4%)          10   (2.2%)     Net::SSH::Buffer#read
   22   (4.9%)          10   (2.2%)     Net::SSH::BufferedIo#fill
    9   (2.0%)           9   (2.0%)     Net::SSH::Transport::State#update_next_iv
   12   (2.7%)           9   (2.0%)     Net::SSH::BufferedIo#send_pending
   63  (14.0%)           8   (1.8%)     block (2 levels) in SSHKit::Backend::ConnectionPool#prune_expired
    8   (1.8%)           8   (1.8%)     Net::SSH::Buffer#initialize
    7   (1.6%)           7   (1.6%)     Net::SSH::Buffer#append

After rewrite:

TOTAL    (pct)     SAMPLES    (pct)     FRAME
  383  (11.7%)         383  (11.7%)     block (2 levels) in Net::SSH::Connection::Channel#forward_local_env
  331  (10.2%)         331  (10.2%)     Net::SSH::Compat.io_select
  116   (3.6%)         102   (3.1%)     Net::SSH::Buffer#read
  100   (3.1%)         100   (3.1%)     Net::SSH::Transport::HMAC::Abstract.digest_class
  100   (3.1%)         100   (3.1%)     Net::SSH::Transport::HMAC::Abstract.mac_length
  133   (4.1%)          80   (2.5%)     Net::SSH::BufferedIo#send_pending
  139   (4.3%)          74   (2.3%)     Net::SSH::BufferedIo#fill
   71   (2.2%)          71   (2.2%)     Net::SSH::Transport::State#update_next_iv
   80   (2.5%)          71   (2.2%)     SSHKit::Runner::Parallel#execute
   57   (1.7%)          57   (1.7%)     Net::SSH::Buffer#initialize
   55   (1.7%)          55   (1.7%)     Net::SSH::BufferedIo#output

Note that ConnectionPool is no longer listed.
@mattbrictson mattbrictson force-pushed the connection-pool-rewrite branch from 72cf0b4 to 6de8614 Compare February 22, 2016 22:27
leehambley added a commit that referenced this pull request Feb 23, 2016
@leehambley leehambley merged commit 9249e7e into capistrano:master Feb 23, 2016
@leehambley
Copy link
Member

Thanks @mattbrictson

jsonn pushed a commit to jsonn/pkgsrc that referenced this pull request Oct 17, 2016
## [1.11.3][] (2016-09-16)

  * Fix known_hosts caching to match on the entire hostlist
    [PR #364](capistrano/sshkit#364) @byroot

## [1.11.2][] (2016-07-29)

### Bug fixes

  * Fixed a crash occurring when `Host@keys` was set to a non-Enumerable.
    @xavierholt [PR #360](capistrano/sshkit#360)

## [1.11.1][] (2016-06-17)

### Bug fixes

  * Fixed a regression in 1.11.0 that would cause
    `ArgumentError: invalid option(s): known_hosts` in some older versions of
    net-ssh. @byroot [#357](capistrano/sshkit#357)

## [1.11.0][] (2016-06-14)

### Bug fixes

  * Fixed colorized output alignment in Logger::Pretty. @xavierholt
    [PR #349](capistrano/sshkit#349)
  * Fixed a bug that prevented nested `with` calls
    [#43](capistrano/sshkit#43)

### Other changes

  * Known hosts lookup optimization is now enabled by default. @byroot

## 1.10.0 (2016-04-22)

  * You can now opt-in to caching of SSH's known_hosts file for a speed boost
    when deploying to a large fleet of servers. Refer to the
    [README](https://github.com/capistrano/sshkit/tree/v1.10.0#known-hosts-caching) for
    details. We plan to turn this on by default in a future version of SSHKit.
    [PR #330](capistrano/sshkit#330) @byroot
  * SSHKit now explicitly closes its pooled SSH connections when Ruby exits;
    this fixes `zlib(finalizer): the stream was freed prematurely` warnings
    [PR #343](capistrano/sshkit#343) @mattbrictson
  * Allow command map entries (`SSHKit::CommandMap#[]`) to be Procs
    [PR #310](capistrano/sshkit#310)
    @mikz

## 1.9.0

**Refer to the 1.9.0.rc1 release notes for a full list of new features, fixes,
and potentially breaking changes since SSHKit 1.8.1.** There are no changes
since 1.9.0.rc1.

## 1.9.0.rc1

### Potentially breaking changes

  * The SSHKit DSL is no longer automatically included when you `require` it.
    **This means you  must now explicitly `include SSHKit::DSL`.**
    See [PR #219](capistrano/sshkit#219) for details.
    @beatrichartz
  * `SSHKit::Backend::Printer#test` now always returns true
    [PR #312](capistrano/sshkit#312) @mikz

### New features

  * `SSHKit::Formatter::Abstract` now accepts an optional Hash of options
    [PR #308](capistrano/sshkit#308) @mattbrictson
  * Add `SSHKit::Backend.current` so that Capistrano plugin authors can refactor
    helper methods and still have easy access to the currently-executing Backend
    without having to use global variables.
  * Add `SSHKit.config.default_runner` options that allows to override default command runner.
    This option also accepts a name of the custom runner class.
  * The ConnectionPool has been rewritten in this release to be more efficient
    and have a cleaner internal API. You can still completely disable the pool
    by setting `SSHKit::Backend::Netssh.pool.idle_timeout = 0`.
    @mattbrictson @byroot [PR #328](capistrano/sshkit#328)

### Bug fixes

  * make sure working directory for commands is properly cleared after `within` blocks
    [PR #307](capistrano/sshkit#307)
    @steved
  * display more accurate string for commands with spaces being output in `Formatter::Pretty`
    [PR #304](capistrano/sshkit#304)
    @steved
    [PR #319](capistrano/sshkit#319) @mattbrictson
  * Fix a race condition experienced in JRuby that could cause multi-server
    deploys to fail. [PR #322](capistrano/sshkit#322)
    @mattbrictson
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants