-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add transport socket wrapper to collect OS-level TCP stats and export as envoy metrics #17682
Conversation
Signed-off-by: Greg Greenway <[email protected]>
This is useful for things like percents, which should be in the range 0.0-1.0. Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
cc @jmarantz. I found that integer histograms made it ugly to deal with a percent metric. I had to make it in the range 0-10000 to get enough precision in the 0-5% range, where retransmissions are most of the time, which requires the receiver of the metric to know what the range is, and multiply/divide by a known factor to put it into human-readable range. Do you think adding a floating-point histogram type is the right way to solve this, or is there some better way to approach this. Also, if we go this route, should I split out the histogram changes into a separate PR? |
cc @mattklein123. WDYT of the changes I had to make on |
* amount of boilerplate. | ||
*/ | ||
template <class ConfigProto, class ProtocolOptionsProto = ConfigProto> | ||
class UpstreamFactoryBase : public Server::Configuration::NamedUpstreamNetworkFilterConfigFactory { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the first upstream network filter; adding the boiler-plate-reducer, similar to the one for downstream network filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this. I dropped a few high level comments for thinking.
/wait
option (udpa.annotations.file_status).package_version_status = ACTIVE; | ||
|
||
// [#protodoc-title: Linux Network Stats] | ||
// Echo :ref:`configuration overview <config_network_filters_linuxnetworkstats>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
envoy/stats/histogram.h
Outdated
@@ -117,6 +117,8 @@ class Histogram : public Metric { | |||
* Records an unsigned value in the unit specified during the construction. | |||
*/ | |||
virtual void recordValue(uint64_t value) PURE; | |||
|
|||
virtual void recordFloatValue(double value) PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc comments please
envoy/network/connection.h
Outdated
/** | ||
* Calls `getsockopt` on the underlying socket. | ||
*/ | ||
virtual Api::SysCallIntResult getSocketOption(int level, int optname, void* optval, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I don't love this either. A few options for thought:
- At minimum, I would make this a const method, even if it requires const_cast internally to implement, as I agree this is logically a const method.
- One other option would to actually return some type of generic connection info struct. Internally this would be mapped from the linux TCP_INFO struct, and in all other cases just return an empty absl::optional<>. If you went down this path, you could call the filter
connection_info
and just have it be documented that it currently only works on Linux. I like this option as eventually I could see this getting implemented on Windows and it would avoid a bunch of churn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the not-too-distant future, I want to implement support in envoy for MPTCP (which is currently being added to newer linux kernels). It has a similar API for getsockopt(fd, MPTCP_INFO, ...)
, which could use the same API if we leave it like this.
I agree with making it const if we use this api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference is still to provide a generic info API that maps between Linux specific data to a generic struct, and just leave it not implemented on non-Linux.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea that just occurred to me: maybe this should be a transport_socket instead of a network filter. It could use PassthroughSocket
, wrap another "real" transport socket, and I think it gets all the correct calls (such as connection being closed before it is closed). And it has direct access to the ioHandle() which already exposes getOption()
which wraps getsockopt()
. That would avoid any changes to the connection interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure that would work also. As I said I still think some level of abstraction here would not be very difficult and make it easy to support on other OS later, but ultimately that's up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked at it briefly, looking at the stats supported by windows. They are similar, but different enough that it would be hard to abstract. For instance, linux tracks retransmissions by segments, and windows tracks them by bytes. Mac does both. Linux gives a metric for the minimum RTT over some undefined period of time (some complicated window algorithm). Mac gives the average RTT. Windows gives the minRTT over the lifetime of the connection.
So similar information can be derived, but I'm worried that it wouldn't be close enough to call them the same stat name, because a time series containing multiple platforms wouldn't make sense then. For all of these metrics, they're being measured in an OS-dependent way, so I'm worried that trying to make this cross-platform just ends up with weirdness in the platform differences.
On the other hand, they might be close enough to be useful in the same way. The metrics are at least advertised to be the same in some cases.
Either way, I think doing a transport socket causes less ugliness on the Connection interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to make it something abstracted, should it include quic? All the same metrics roughly apply to quic as well logically (though I haven't looked at what metrics quiche gives us access to).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it would be optimal to include QUIC. I guess my advise to abstract only as much as you need for your current case and we can go from there?
RE histograms: if we can get enough dynamic range without changing the basic data type that seems better. I'm not sure what multipliers you are thinking of, but it certainly seems reasonable to add new Unit choices if that helps. +1 for coming up with some platform-agnostic abstraction for the sys-call results. |
Can definitely get enough dynamic range from uint64. The problem is pushing the value out in meaingful units. If I multiple the 0-1 value by 10,000, I think that's enough dynamic range. But then I need a way when publishing to divide by 10,000 and convert to float, to publish a sane range. I guess one way of doing that would be to add a new unit of Percent_10000, and when publishing a Percent_10000, convert to float and divide by 10000 at that time to publish in the correct range. Would that be more desirable? Does any of that code look at the units when writing out the underlying stat format (statsd, prometheus, etc)? |
@jmarantz one more question: should we publish a percent metric as 0-1, or 0-100? I don't have a strong preference. I searched and it looks like there is only 1 other metric that is a percent (percent of timeout budget used by http request), and that publishes 0-100. Prometheus docs say to use 0-1 for percents, but those docs say to use base units for everything (like seconds instead of milliseconds), and we don't follow that guidance hardly anywhere. I don't have any preference on 0-1 vs 0-100, as long as there is enough significant digits in whatever we choose to get the data we want. But I am against publishing a percent as 0-10000 in order to get the precision. That's making consuming the metric more difficult than it should be. |
I don't know that much about the various stats sinks deal with units, but I could do a survey and render an opinion. I am not altogether against changing the rep to use a float or double but feel like that might have unintended consequence we might not be sufficiently testing for. FWIW the term "basis points" is used in finance to denote 100ths of a percent: https://www.investopedia.com/terms/b/basispoint.asp -- would using that somewhat standard terminology help? |
Prometheus natively supports fractional values for most things, and this is encouraged (eg everything is supposed to be in seconds, even if the expected values are much less than 1 second). Statsd seems to encode just enough information for the database to know how to store them, and has some strange conventions, but the docs seem to say that fractional values are supported. I think it's generally up to the receiver to know what the units are. No idea for the others.
I certainly don't want to change existing stats to use floating point numbers; it is substantially slower to deal with them. I think it may make the most sense to encode some units in the metric, have the raw stored value be fixed-point, and convert to something sensible for the stat-sink in the output path. There will be some complication in that: for prometheus, we support configurable histogram buckets. We use those to call
I'm familiar with the term; not sure how familiar others would be. |
Re: cross-platform things
|
Kind of. The problem is when the different platforms don't measure the same thing. If a platform doesn't measure something at all, we can just omit the stat. But what if two platforms measure similar things, but they're not quite the same? Or they're trying to measure the same thing, but with different methodology that gives meaningfully different results? Do we create different stat names for the two platforms, with very similar names, or just use the same name? |
I think that putting a prefix with the platform name in the stats might be useful regardless. I think we can add the prefix for all the stats. I dont have any customer data at this point but I could see how someone who has a mixed OS cluster might be interested in TCP perf metric per platform. |
This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions! |
I'll get back to this eventually; got busy with other things for awhile. |
envoy/stats/histogram.h
Outdated
@@ -104,8 +104,16 @@ class Histogram : public Metric { | |||
Bytes, | |||
Microseconds, | |||
Milliseconds, | |||
Percent, // A percent value stored as fixed-point, where the stored value is divided by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we go with this can we split this out into an independent change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. But I wanted to iterate on these together at first to make sure the stats API we add is right for the use case.
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool. LGTM with just a couple of small questions.
/wait-any
|
||
// Period to update stats while the connection is open. If unset, updates only happen when the | ||
// connection is closed. Stats are always updated one final time when the connection is closed. | ||
google.protobuf.Duration update_period = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we should have some minimum sensible time here if this is set? wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should be a minimum of 1ms, because we can't set timers for less than that. Other than that, if someone wants to burn CPU by querying this almost constantly, and getting really high granularity, I'm fine with letting them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK maybe add an annotation for this then? Otherwise LGTM.
/wait
void closeSocket(Network::ConnectionEvent event) override; | ||
|
||
private: | ||
absl::optional<struct tcp_info> querySocketInfo(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably missing something but how does this compile? Doesn't this require knowing the size of tcp_info? I'm concerned that somehow this is using the wrong version with the missing fields? Otherwise is the forward decl needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. It may be that the compiler doesn't care if there's no caller of the function. And since it is private, the only caller is in tcp_stats.cc, which has the definition before the function is called. I don't know. C++ is weird and complicated.
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
Signed-off-by: Greg Greenway <[email protected]>
Signed-off-by: Greg Greenway <[email protected]>
Commit Message: This adds a new network filter for collecting linux-specific TCP stats and publishing them as envoy metrics, with a configurable stat_prefix so that stats can be attributed to a specific listener or cluster.
Commit Message: This adds a new transport socket wrapper for collecting OS-level TCP stats and publishing them as envoy metrics, in either the cluster or listener stats namespace (depending on upstream or downstream use). This allows attributing the TCP stats to specific listeners and clusters.
Additional Description:
Risk Level: Low
Testing: Added unit and integration tests; manually tested to make sure results look sane.
Docs Changes: Added docs for new stats and transport socket
Release Notes: Added
Platform Specific Features: Currently only supports Linux; other platform support can be added as needed
[Optional Runtime guard:]
Fixes #17617
[Optional Deprecated:]
[Optional API Considerations:]