Add transport socket wrapper to collect OS-level TCP stats and export as envoy metrics #17682

ggreenway · 2021-08-11T18:34:08Z

Commit Message: This adds a new network filter for collecting linux-specific TCP stats and publishing them as envoy metrics, with a configurable stat_prefix so that stats can be attributed to a specific listener or cluster.
Commit Message: This adds a new transport socket wrapper for collecting OS-level TCP stats and publishing them as envoy metrics, in either the cluster or listener stats namespace (depending on upstream or downstream use). This allows attributing the TCP stats to specific listeners and clusters.
Additional Description:
Risk Level: Low
Testing: Added unit and integration tests; manually tested to make sure results look sane.
Docs Changes: Added docs for new stats and transport socket
Release Notes: Added
Platform Specific Features: Currently only supports Linux; other platform support can be added as needed
[Optional Runtime guard:]
Fixes #17617
[Optional Deprecated:]
[Optional API Considerations:]

Signed-off-by: Greg Greenway <[email protected]>

This is useful for things like percents, which should be in the range 0.0-1.0. Signed-off-by: Greg Greenway <[email protected]>

Signed-off-by: Greg Greenway <[email protected]>

repokitteh-read-only · 2021-08-11T18:34:12Z

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #17682 was opened by ggreenway.

see: more, trace.

repokitteh-read-only · 2021-08-11T18:34:17Z

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
envoyproxy/api-shepherds assignee is @mattklein123
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #17682 was opened by ggreenway.

see: more, trace.

ggreenway · 2021-08-11T18:36:28Z

cc @jmarantz. I found that integer histograms made it ugly to deal with a percent metric. I had to make it in the range 0-10000 to get enough precision in the 0-5% range, where retransmissions are most of the time, which requires the receiver of the metric to know what the range is, and multiply/divide by a known factor to put it into human-readable range. Do you think adding a floating-point histogram type is the right way to solve this, or is there some better way to approach this. Also, if we go this route, should I split out the histogram changes into a separate PR?

ggreenway · 2021-08-11T18:37:57Z

cc @mattklein123. WDYT of the changes I had to make on Connection to make this work? I'm not sure I love exposing getsockopt on every Connection, but it's const (logically) so hopefully shouldn't cause any issues?

ggreenway · 2021-08-11T18:43:16Z

source/extensions/filters/network/common/factory_base.h

+ * amount of boilerplate.
+ */
+template <class ConfigProto, class ProtocolOptionsProto = ConfigProto>
+class UpstreamFactoryBase : public Server::Configuration::NamedUpstreamNetworkFilterConfigFactory {


I think this is the first upstream network filter; adding the boiler-plate-reducer, similar to the one for downstream network filters.

mattklein123

Thanks for working on this. I dropped a few high level comments for thinking.

/wait

mattklein123 · 2021-08-12T16:58:07Z

api/envoy/extensions/filters/network/linux_network_stats/v3/linux_network_stats.proto

+option (udpa.annotations.file_status).package_version_status = ACTIVE;
+
+// [#protodoc-title: Linux Network Stats]
+// Echo :ref:`configuration overview <config_network_filters_linuxnetworkstats>`.


mattklein123 · 2021-08-12T16:59:22Z

envoy/stats/histogram.h

@@ -117,6 +117,8 @@ class Histogram : public Metric {
   * Records an unsigned value in the unit specified during the construction.
   */
  virtual void recordValue(uint64_t value) PURE;
+
+  virtual void recordFloatValue(double value) PURE;


doc comments please

mattklein123 · 2021-08-12T17:09:20Z

envoy/network/connection.h

+  /**
+   * Calls `getsockopt` on the underlying socket.
+   */
+  virtual Api::SysCallIntResult getSocketOption(int level, int optname, void* optval,


Yeah I don't love this either. A few options for thought:

At minimum, I would make this a const method, even if it requires const_cast internally to implement, as I agree this is logically a const method.

One other option would to actually return some type of generic connection info struct. Internally this would be mapped from the linux TCP_INFO struct, and in all other cases just return an empty absl::optional<>. If you went down this path, you could call the filter connection_info and just have it be documented that it currently only works on Linux. I like this option as eventually I could see this getting implemented on Windows and it would avoid a bunch of churn.

In the not-too-distant future, I want to implement support in envoy for MPTCP (which is currently being added to newer linux kernels). It has a similar API for getsockopt(fd, MPTCP_INFO, ...), which could use the same API if we leave it like this.

I agree with making it const if we use this api.

My preference is still to provide a generic info API that maps between Linux specific data to a generic struct, and just leave it not implemented on non-Linux.

Another idea that just occurred to me: maybe this should be a transport_socket instead of a network filter. It could use PassthroughSocket, wrap another "real" transport socket, and I think it gets all the correct calls (such as connection being closed before it is closed). And it has direct access to the ioHandle() which already exposes getOption() which wraps getsockopt(). That would avoid any changes to the connection interface.

Sure that would work also. As I said I still think some level of abstraction here would not be very difficult and make it easy to support on other OS later, but ultimately that's up to you.

I looked at it briefly, looking at the stats supported by windows. They are similar, but different enough that it would be hard to abstract. For instance, linux tracks retransmissions by segments, and windows tracks them by bytes. Mac does both. Linux gives a metric for the minimum RTT over some undefined period of time (some complicated window algorithm). Mac gives the average RTT. Windows gives the minRTT over the lifetime of the connection.

So similar information can be derived, but I'm worried that it wouldn't be close enough to call them the same stat name, because a time series containing multiple platforms wouldn't make sense then. For all of these metrics, they're being measured in an OS-dependent way, so I'm worried that trying to make this cross-platform just ends up with weirdness in the platform differences.

On the other hand, they might be close enough to be useful in the same way. The metrics are at least advertised to be the same in some cases.

Either way, I think doing a transport socket causes less ugliness on the Connection interface.

If we want to make it something abstracted, should it include quic? All the same metrics roughly apply to quic as well logically (though I haven't looked at what metrics quiche gives us access to).

Yes I think it would be optimal to include QUIC. I guess my advise to abstract only as much as you need for your current case and we can go from there?

jmarantz · 2021-08-12T20:34:25Z

RE histograms: if we can get enough dynamic range without changing the basic data type that seems better. I'm not sure what multipliers you are thinking of, but it certainly seems reasonable to add new Unit choices if that helps.

+1 for coming up with some platform-agnostic abstraction for the sys-call results.

ggreenway · 2021-08-12T20:57:20Z

RE histograms: if we can get enough dynamic range without changing the basic data type that seems better. I'm not sure what multipliers you are thinking of, but it certainly seems reasonable to add new Unit choices if that helps.

+1 for coming up with some platform-agnostic abstraction for the sys-call results.

Can definitely get enough dynamic range from uint64. The problem is pushing the value out in meaingful units. If I multiple the 0-1 value by 10,000, I think that's enough dynamic range. But then I need a way when publishing to divide by 10,000 and convert to float, to publish a sane range.

I guess one way of doing that would be to add a new unit of Percent_10000, and when publishing a Percent_10000, convert to float and divide by 10000 at that time to publish in the correct range. Would that be more desirable? Does any of that code look at the units when writing out the underlying stat format (statsd, prometheus, etc)?

ggreenway · 2021-08-12T21:28:32Z

@jmarantz one more question: should we publish a percent metric as 0-1, or 0-100? I don't have a strong preference. I searched and it looks like there is only 1 other metric that is a percent (percent of timeout budget used by http request), and that publishes 0-100. Prometheus docs say to use 0-1 for percents, but those docs say to use base units for everything (like seconds instead of milliseconds), and we don't follow that guidance hardly anywhere.

I don't have any preference on 0-1 vs 0-100, as long as there is enough significant digits in whatever we choose to get the data we want.

But I am against publishing a percent as 0-10000 in order to get the precision. That's making consuming the metric more difficult than it should be.

jmarantz · 2021-08-12T22:42:02Z

I don't know that much about the various stats sinks deal with units, but I could do a survey and render an opinion.

I am not altogether against changing the rep to use a float or double but feel like that might have unintended consequence we might not be sufficiently testing for.

FWIW the term "basis points" is used in finance to denote 100ths of a percent: https://www.investopedia.com/terms/b/basispoint.asp -- would using that somewhat standard terminology help?

ggreenway · 2021-08-12T23:05:56Z

I don't know that much about the various stats sinks deal with units, but I could do a survey and render an opinion.

Prometheus natively supports fractional values for most things, and this is encouraged (eg everything is supposed to be in seconds, even if the expected values are much less than 1 second).

Statsd seems to encode just enough information for the database to know how to store them, and has some strange conventions, but the docs seem to say that fractional values are supported. I think it's generally up to the receiver to know what the units are.

No idea for the others.

I am not altogether against changing the rep to use a float or double but feel like that might have unintended consequence we might not be sufficiently testing for.

I certainly don't want to change existing stats to use floating point numbers; it is substantially slower to deal with them.

I think it may make the most sense to encode some units in the metric, have the raw stored value be fixed-point, and convert to something sensible for the stat-sink in the output path.

There will be some complication in that: for prometheus, we support configurable histogram buckets. We use those to call hist_approx_count_below(). We'll need to scale the buckets here too; not unmanageable, but not free.

FWIW the term "basis points" is used in finance to denote 100ths of a percent: https://www.investopedia.com/terms/b/basispoint.asp -- would using that somewhat standard terminology help?

I'm familiar with the term; not sure how familiar others would be.

davinci26 · 2021-08-12T23:26:44Z

Re: cross-platform things

Config/AP: I would rather if the config was cross platform and it was network_stats instead of linux_network_stats because that way a service mesh provider such Istio or OSM can add this filter to their config and then Envoy will know which one to use.
Implementation/Wise: Can we have an internal struct (basically expand) EnvoyTcpInfo and each platform implements this class and internally we abstract the platform details. So we can define another function os_sys_calls or in IoHandle which just returns EnvoyTcpInfo.

ggreenway · 2021-08-12T23:35:08Z

Re: cross-platform things

Config/AP: I would rather if the config was cross platform and it was network_stats instead of linux_network_stats because that way a service mesh provider such Istio or OSM can add this filter to their config and then Envoy will know which one to use.

Implementation/Wise: Can we have an internal struct (basically expand) EnvoyTcpInfo and each platform implements this class and internally we abstract the platform details. So we can define another function os_sys_calls or in IoHandle which just returns EnvoyTcpInfo.

Kind of. The problem is when the different platforms don't measure the same thing. If a platform doesn't measure something at all, we can just omit the stat.

But what if two platforms measure similar things, but they're not quite the same? Or they're trying to measure the same thing, but with different methodology that gives meaningfully different results? Do we create different stat names for the two platforms, with very similar names, or just use the same name?

davinci26 · 2021-08-13T16:22:24Z

I think that putting a prefix with the platform name in the stats might be useful regardless. I think we can add the prefix for all the stats.

I dont have any customer data at this point but I could see how someone who has a mixed OS cluster might be interested in TCP perf metric per platform.

github-actions · 2021-09-12T20:01:22Z

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions · 2021-09-20T00:02:11Z

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

ggreenway · 2021-09-20T15:28:46Z

I'll get back to this eventually; got busy with other things for awhile.

ggreenway · 2021-10-13T00:40:07Z

envoy/stats/histogram.h

@@ -104,8 +104,16 @@ class Histogram : public Metric {
    Bytes,
    Microseconds,
    Milliseconds,
+    Percent, // A percent value stored as fixed-point, where the stored value is divided by


@jmarantz wdyt about this for a fixed-point scaled percent representation for histograms? Changes can be viewed isolated here: 6b1f8aa

Assuming we go with this can we split this out into an independent change?

Yep. But I wanted to iterate on these together at first to make sure the stats API we add is right for the use case.

Signed-off-by: Greg Greenway <[email protected]>

mattklein123

Very cool. LGTM with just a couple of small questions.

/wait-any

mattklein123 · 2021-10-21T18:50:20Z

api/envoy/extensions/transport_sockets/tcp_stats/v3/tcp_stats.proto

+
+  // Period to update stats while the connection is open. If unset, updates only happen when the
+  // connection is closed. Stats are always updated one final time when the connection is closed.
+  google.protobuf.Duration update_period = 2;


Do you think we should have some minimum sensible time here if this is set? wdyt?

Probably should be a minimum of 1ms, because we can't set timers for less than that. Other than that, if someone wants to burn CPU by querying this almost constantly, and getting really high granularity, I'm fine with letting them.

OK maybe add an annotation for this then? Otherwise LGTM.

/wait

mattklein123 · 2021-10-21T18:57:14Z

source/extensions/transport_sockets/tcp_stats/tcp_stats.h

+  void closeSocket(Network::ConnectionEvent event) override;
+
+private:
+  absl::optional<struct tcp_info> querySocketInfo();


I'm probably missing something but how does this compile? Doesn't this require knowing the size of tcp_info? I'm concerned that somehow this is using the wrong version with the missing fields? Otherwise is the forward decl needed?

That's a good question. It may be that the compiler doesn't care if there's no caller of the function. And since it is private, the only caller is in tcp_stats.cc, which has the definition before the function is called. I don't know. C++ is weird and complicated.

Signed-off-by: Greg Greenway <[email protected]>

mattklein123

LGTM, thanks.

Signed-off-by: Greg Greenway <[email protected]>

ggreenway added 4 commits August 10, 2021 12:55

network filter: New filter for linux-specific TCP network stats

680393d

Signed-off-by: Greg Greenway <[email protected]>

Add support for floating point values to histograms.

848c69a

This is useful for things like percents, which should be in the range 0.0-1.0. Signed-off-by: Greg Greenway <[email protected]>

Support upstream network filter

3e660b1

Signed-off-by: Greg Greenway <[email protected]>

fixes

aa4bbbb

Signed-off-by: Greg Greenway <[email protected]>

repokitteh-read-only bot added the api label Aug 11, 2021

repokitteh-read-only bot assigned mattklein123 Aug 11, 2021

ggreenway commented Aug 11, 2021

View reviewed changes

mattklein123 requested changes Aug 12, 2021

View reviewed changes

repokitteh-read-only bot added the waiting label Aug 12, 2021

github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 12, 2021

github-actions bot closed this Sep 20, 2021

ggreenway reopened this Oct 13, 2021

repokitteh-read-only bot removed the waiting label Oct 13, 2021

ggreenway commented Oct 13, 2021

View reviewed changes

ggreenway assigned jmarantz Oct 13, 2021

ggreenway added 4 commits October 20, 2021 10:40

Merge remote-tracking branch 'upstream/main' into linux-stats

105050e

alphabetize release note

2753b2e

Signed-off-by: Greg Greenway <[email protected]>

proto format

ef53e11

Signed-off-by: Greg Greenway <[email protected]>

Merge remote-tracking branch 'upstream/main' into linux-stats

ae18dcf

Signed-off-by: Greg Greenway <[email protected]>

mattklein123 reviewed Oct 21, 2021

View reviewed changes

repokitteh-read-only bot added waiting:any waiting and removed waiting:any labels Oct 21, 2021

ggreenway added 2 commits October 21, 2021 17:08

proto validation of duration

77af5e7

Signed-off-by: Greg Greenway <[email protected]>

Merge remote-tracking branch 'upstream/main' into linux-stats

f90cd6f

repokitteh-read-only bot removed the waiting label Oct 22, 2021

ggreenway added 2 commits October 22, 2021 07:16

clang tidy

dc7e353

Signed-off-by: Greg Greenway <[email protected]>

more test coverage

cec0abc

Signed-off-by: Greg Greenway <[email protected]>

mattklein123 previously approved these changes Oct 22, 2021

View reviewed changes

repokitteh-read-only bot removed the api label Oct 22, 2021

ggreenway added 2 commits October 22, 2021 08:30

more coverage

1399acf

Signed-off-by: Greg Greenway <[email protected]>

Merge remote-tracking branch 'upstream/main' into linux-stats

d718b4a

ggreenway dismissed mattklein123’s stale review via d718b4a October 22, 2021 15:31

repokitteh-read-only bot added the api label Oct 22, 2021

mattklein123 previously approved these changes Oct 22, 2021

View reviewed changes

repokitteh-read-only bot removed the api label Oct 22, 2021

fix windows build by moving #include

d2ce0e6

Signed-off-by: Greg Greenway <[email protected]>

ggreenway dismissed mattklein123’s stale review via d2ce0e6 October 22, 2021 18:02

repokitteh-read-only bot added the api label Oct 22, 2021

mattklein123 approved these changes Oct 25, 2021

View reviewed changes

repokitteh-read-only bot removed the api label Oct 25, 2021

ggreenway merged commit b959a46 into envoyproxy:main Oct 25, 2021

ericvn mentioned this pull request Oct 26, 2021

Automator: update envoy@ in istio/proxy@master istio/proxy#3547

Merged

lambdai mentioned this pull request Oct 26, 2021

Disable envoy tcp_stats transport extension by default istio/istio#35742

Closed

XuehaiPan mentioned this pull request Jan 13, 2022

envoy 1.21.0 Homebrew/homebrew-core#90921

Closed

Add transport socket wrapper to collect OS-level TCP stats and export as envoy metrics #17682

Add transport socket wrapper to collect OS-level TCP stats and export as envoy metrics #17682

Conversation

ggreenway commented Aug 11, 2021 • edited Loading

repokitteh-read-only bot commented Aug 11, 2021

repokitteh-read-only bot commented Aug 11, 2021

ggreenway commented Aug 11, 2021

ggreenway commented Aug 11, 2021

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jmarantz commented Aug 12, 2021

ggreenway commented Aug 12, 2021

ggreenway commented Aug 12, 2021

jmarantz commented Aug 12, 2021

ggreenway commented Aug 12, 2021

davinci26 commented Aug 12, 2021

ggreenway commented Aug 12, 2021

davinci26 commented Aug 13, 2021

github-actions bot commented Sep 12, 2021

github-actions bot commented Sep 20, 2021

ggreenway commented Sep 20, 2021

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mattklein123 left a comment

Choose a reason for hiding this comment

ggreenway commented Aug 11, 2021 •

edited

Loading