Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Use attributes from the active PG connection #185

Merged
merged 5 commits into from
Nov 29, 2022

Conversation

cbandy
Copy link
Contributor

@cbandy cbandy commented Nov 10, 2022

As discovered and addressed in #142, the pg gem and its underlying libpq library allow multiple hosts and network addresses in the connection string. The solution applied there, however, does not always return accurate information about the connected peer.

When there are multiple hosts or addresses, the library tries each in order until one succeeds. The target_session_attrs setting allows both primary and replica servers to be specified in the connection string, and the library will figure which is which. Always using the first value in the connection string ignores this stuff.

I don't see an explanation for why conninfo_hash was used in 463c14c, so perhaps I'm missing some benefit to it.

The separate methods (host, hostaddr, port, user, and db) describe the singular active connection from the resolved connection string (IPv4, IPv6, primary, replica, etc). I'm pretty sure we should use these when they're available.

There are a couple of older versions of libpq that allow multiple hosts/addresses but lack a hostaddr method. For those, conninfo_hash is accurate when it returns a single address. When it returns multiple addresses, we can use the first like #142 or omit net.peer.ip. The latter seems better to me, so that's what I've done in this PR.

TODO:

'net.peer.ip' => conninfo_hash[:hostaddr]&.to_s,
'net.peer.port' => first_in_list(conninfo_hash[:port]&.to_s)
'net.peer.ip' => transport_addr&.to_s,
'net.peer.port' => port&.to_s
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I'm curious about this line. Semantic Conventions say that peer.port should be an integer. Is to_s appropriate here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't actually recall what the reasons were for stringification here; which suggests that perhaps there was no valid reason and it was just a mistake. If you want to include a fix for this in the PR, that would be a welcome change. 😄

@ahayworth
Copy link
Contributor

This looks like a good change, I will happily review it once you're ready. I agree with everything outlined in the PR description, including the TODO you added. Thank you so much for this!


I don't see an explanation for why conninfo_hash was used in 463c14c, so perhaps I'm missing some benefit to it.

For context, this was contributed before we split off the -contrib repo, so the original PR that added this instrumentation is here: open-telemetry/opentelemetry-ruby#659

I don't recall having any specific reason for using conninfo_hash; if there are better ways to get that information than we should use it. I think it was just my best attempt at getting the right information since I am not an expert on the pg gem itself. We are most grateful for contributions that improve the quality of instrumentation in this way - we are certainly not experts and we're covering a very wide surface area in this repo. 😍

@cbandy
Copy link
Contributor Author

cbandy commented Nov 17, 2022

I haven't forgotten about this! let(:port) { nil } exposed a bug in the pg gem I'm waiting on now: ged/ruby-pg#491

The implementation here will consider versions of that gem with and without whatever fix comes.

@ahayworth
Copy link
Contributor

@cbandy looks like they just merged a fix, too!

@cbandy cbandy force-pushed the pg-attributes branch 2 times, most recently from 9c9ddd3 to 7cf8c7f Compare November 19, 2022 15:13
@cbandy
Copy link
Contributor Author

cbandy commented Nov 19, 2022

Rebased and ready to go. I've added a test verifying that active connection information is returned when using "multiple hosts."

IP address and port depend on the version of the gem and the underlying driver. The original implementation was correct to use conninfo_hash in the past. Recent versions have the desired behavior, so this detects and uses those.

I made three other changes to the transport attributes after reading the semantic conventions:

  1. net.peer.port is now an integer.
  2. net.transport is now ip_tcp when not using a Unix socket.
  3. net.transport is not set when using a Unix socket. Instead, net.sock.family is set to unix.

I wanted to fill out net.sock.peer.addr for the Unix socket case, but there isn't a good way to get that information from the gem or driver. I left net.peer.name set to the directory containing the socket, as we've done before.

@cbandy
Copy link
Contributor Author

cbandy commented Nov 19, 2022

Amended for Rubocop.

Copy link
Contributor

@ahayworth ahayworth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No major issues, and I'm grateful that this PR is so thorough. And because you've been so thorough and helpful I have rewarded you with a little finicky comment to address before we merge it. 😆 ❤️ Great job catching the semantic convention changes, as well!

As a meta question, should we test multiple versions of the PG gem via the Appraisals file? I don't think I have a strong opinion either way.

def transport_addr
# The hostaddr method is available when the gem is built against
# a recent version of libpq.
return hostaddr if defined?(hostaddr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. I actually didn't know you could use defined? in that way (I thought this was maybe calling defined?(result-of-unintentional-hostaddr-call) which wouldn't be a good thing to rely on). But wouldn't you know it, it's right there in the docs 🤦

https://ruby-doc.org/docs/keywords/1.9/Object.html#method-i-defined-3F

Thank you, I learned something new today. 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I learned it during this PR as well. I saw it as part of module prepending: super if defined? super

In this case, we can use respond_to?. I wonder if one is definitely faster? 🤷🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean; either is probably fine. But my first meeting of the morning cancelled so that means its time for MicroBenchmark, Thanksgiving Edition 🦃

Looks like defined? may be a tiny bit faster? It probably doesn't have to go through things like method_missing and whatever. Here are the results:

/-------------------------------------------------------------\
|                  user     system      total        real     |
|    defined?:   0.094393   0.000956   0.095349 (  0.095374)  |
| respond_to?:   0.099900   0.001003   0.100903 (  0.100944)  |
\-------------------------------------------------------------/
   \
    \                                  ,+*^^*+___+++_
     \                           ,*^^^^              )
      \                       _+*                     ^**+_
       \                    +^       _ _++*+_+++_,         )
                _+^^*+_    (     ,+*^ ^          \+_        )
               {       )  (    ,(    ,_+--+--,      ^)      ^\
              { (@)    } f   ,(  ,+-^ __*_*_  ^^\_   ^\       )
             {:;-/    (_+*-+^^^^^+*+*<_ _++_)_    )    )      /
            ( /  (    (        ,___    ^*+_+* )   <    <      \
             U _/     )    *--<  ) ^\-----++__)   )    )       )
              (      )  _(^)^^))  )  )\^^^^^))^*+/    /       /
            (      /  (_))_^)) )  )  ))^^^^^))^^^)__/     +^^
           (     ,/    (^))^))  )  ) ))^^^^^^^))^^)       _)
            *+__+*       (_))^)  ) ) ))^^^^^^))^^^^^)____*^
            \             \_)^)_)) ))^^^^^^^^^^))^^^^)
             (_             ^\__^^^^^^^^^^^^))^^^^^^^)
               ^\___            ^\__^^^^^^))^^^^^^^^)\\
                    ^^^^^\uuu/^^\uuu/^^^^\^\^\^\^\^\^\^\
                       ___) >____) >___   ^\_\_\_\_\_\_\)
                      ^^^//\\_^^//\\_^       ^(\_\_\_\)
                        ^^^ ^^ ^^^ ^

@cbandy
Copy link
Contributor Author

cbandy commented Nov 23, 2022

As a meta question, should we test multiple versions of the PG gem via the Appraisals file? I don't think I have a strong opinion either way.

I'll try that out. I can use the RubyGems dataset at Honeycomb to pick some useful versions.

The pg gem and its underlying libpq library allow multiple hosts and
network addresses in the connection string. The conninfo_hash method
describes that parsed string, not the actual host or address that is
presently connected.

The separate methods (host, hostaddr, port, user, and db) describe the
singular connection that is active from that connection string.

The hostaddr method is not available in older versions of libpq, so we
continue to use the conninfo_hash method when it can reliably identify
the network address of the active connection.

The port method can fail in older versions of the gem, so we again
fallback to the parsed connection string when appropriate.
@cbandy
Copy link
Contributor Author

cbandy commented Nov 23, 2022

Amended host variable and added appraisals for some interesting pg versions:

>> BUNDLE_GEMFILE=/app/instrumentation/pg/gemfiles/pg_1.2.gemfile bundle list pg
/bundle/gems/pg-1.2.3
>> BUNDLE_GEMFILE=/app/instrumentation/pg/gemfiles/pg_1.3.gemfile bundle list pg
/bundle/gems/pg-1.3.5
>> BUNDLE_GEMFILE=/app/instrumentation/pg/gemfiles/pg_latest.gemfile bundle list pg
/bundle/gems/pg-1.4.5

@ahayworth
Copy link
Contributor

@cbandy Thanks so much for the back-and-forth on this. One more ✅ and I think we're good to merge. (cc @ericmustin since you've been looking)

@ericmustin
Copy link
Contributor

kicked the ci, i think you have to pull in #211 for things to pass. :shakes-fist-at-jruby:

@ahayworth ahayworth merged commit 207369a into open-telemetry:main Nov 29, 2022
@cbandy cbandy deleted the pg-attributes branch November 30, 2022 01:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants