Configure a lower scrape interval for Project Contour Envoy service monitor #1194

niels-s · 2020-11-18T16:02:14Z

Current situation
At the moment the Envoy service monitor doesn't configure a scrape interval, so it falls back to the default Prometheus scrape interval of 30 seconds.

Impact
Because the scrape interval is so large, we suspect it smooths out our latency over that period, and we are unable to see any latency spikes. Our own application-specific monitoring shows p99 latency spikes between 50ms and 100ms, and occasionally 1s. However, the Envoy latency metrics don't show any p99 spikes at all.

Ideal future situation
Lower the scrape interval of the Envoy service monitor so we can notice spikes in our latency graphs.

Implementation options

add a hardcoded lower scrape interval (1s, 5s, or 10s) to the Envoy service monitor,
or make the scrape interval configurable for the Project Contour component
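Either option comes down to setting `spec.endpoints[].interval` on the ServiceMonitor; when that field is omitted, Prometheus uses its global scrape interval for the target. The sketch below renders such a manifest as a Python dict. The default value, object name, namespace, selector labels and port name are hypothetical placeholders, not what Lokomotive actually ships.

```python
import json
from typing import Optional

# Hypothetical hardcoded default (option 1); the issue lists 1s, 5s and 10s as candidates.
DEFAULT_ENVOY_SCRAPE_INTERVAL = "10s"


def envoy_service_monitor(namespace: str, scrape_interval: Optional[str] = None) -> dict:
    """Build a ServiceMonitor manifest for the Envoy metrics endpoint.

    Passing scrape_interval models option 2 (configurable); leaving it as None
    falls back to the hardcoded default (option 1). Omitting "interval" from the
    endpoint entirely would make Prometheus use its global scrape interval.
    """
    interval = scrape_interval or DEFAULT_ENVOY_SCRAPE_INTERVAL
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": "envoy", "namespace": namespace},  # hypothetical name
        "spec": {
            "selector": {"matchLabels": {"app": "envoy"}},  # hypothetical labels
            "endpoints": [
                {
                    "port": "metrics",  # hypothetical port name
                    "path": "/stats/prometheus",  # Envoy's Prometheus stats path
                    "interval": interval,
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(envoy_service_monitor("projectcontour", "5s"), indent=2))
```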

Additional information
Currently using Lokomotive v0.4.1; however, v0.5.0 contains no changes related to this request.

invidian · 2020-11-18T16:48:14Z

@niels-s if you change the scrape interval manually in the ServiceMonitor, does it show up latency spikes?

niels-s · 2020-11-19T09:23:30Z

That's a fair question 👍, I should have thought of that myself 😅. I've adjusted the interval manually for now; I'll give it some time to gather data with the new interval and report back.

niels-s · 2020-11-19T10:47:44Z

To test, I lowered the interval from the default 30s to an extreme 1s scrape interval. Below are some screenshots using a 1m and a 2s range in the query to display the Envoy data.

This one displays the 1m range query; the spikes only go up to 4000ms.

In the example below, where we can use a smaller time range of 2s, we see spikes up to 10000ms; the previous chart smoothed those spikes to less than half the actual latency.
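A rough way to see why the 1m range hides spikes is to mimic what `histogram_quantile(0.99, rate(..._bucket[window]))` does. The sketch below is a simplified Python re-implementation of PromQL's linear interpolation inside buckets, not Prometheus's own code; the bucket bounds, traffic rate and 2-second spike are hypothetical numbers chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical bucket upper bounds ("le") in milliseconds; the last one is +Inf.
BOUNDS_MS = [50.0, 100.0, 1000.0, 5000.0, 10000.0, math.inf]


def histogram_quantile(q: float, buckets: Sequence[Tuple[float, float]]) -> float:
    """Simplified PromQL-style quantile estimate for 0 < q < 1.

    `buckets` holds (le, cumulative_count) pairs sorted by le, ending with +Inf.
    The rank q * total is located by linear interpolation inside its bucket;
    the lowest bucket is assumed to start at 0.
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if math.isinf(le):
                # Like Prometheus, report the highest finite bound instead of +Inf.
                return prev_le
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return prev_le


def increments_for_second(second: int) -> List[int]:
    """Non-cumulative bucket increments for one simulated second of traffic."""
    counts = [0] * len(BOUNDS_MS)
    if second in (30, 31):  # hypothetical 2-second spike
        counts[0] = 80  # 80 fast requests (<= 50ms)
        counts[4] = 20  # 20 slow requests (5000ms < x <= 10000ms)
    else:
        counts[0] = 100  # 100 fast requests per second
    return counts


def window_buckets(end: int, width: int) -> List[Tuple[float, float]]:
    """Sum increments over the `width` seconds ending at `end`, made cumulative.

    rate() divides every bucket by the same window length, which does not
    change the quantile, so raw increases are used here.
    """
    totals = [0] * len(BOUNDS_MS)
    for s in range(end - width + 1, end + 1):
        for i, c in enumerate(increments_for_second(s)):
            totals[i] += c
    cumulative: List[Tuple[float, float]] = []
    running = 0
    for le, c in zip(BOUNDS_MS, totals):
        running += c
        cumulative.append((le, float(running)))
    return cumulative


if __name__ == "__main__":
    # 60s window containing the spike: 40 slow out of 6000 requests (< 1%).
    print(round(histogram_quantile(0.99, window_buckets(59, 60)), 1))  # -> 49.8
    # 2s window exactly covering the spike: 40 slow out of 200 requests.
    print(round(histogram_quantile(0.99, window_buckets(31, 2)), 1))  # -> 9750.0
```

Because the quantile is computed over every observation in the window, a spike that affects less than 1% of the requests in a 1m window can vanish from the p99 entirely, while a 2s window around the same spike reports it.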

Perhaps a scrape interval of 1s is too aggressive, but the default of 30s smooths out spikes, not only for latency but for all the metrics. So, in general, it would be great if we could lower the scrape interval.
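The 2s range query only works with a very short scrape interval, because PromQL's `rate()` needs at least two samples inside its range window. The sketch below assumes perfectly regular scrapes (real scrapes jitter slightly) and computes how many samples a window is guaranteed to contain, which makes the trade-off concrete: the shorter the range you want to query, the shorter the scrape interval has to be. The function names are illustrative.

```python
import math


def guaranteed_samples(scrape_interval_s: float, range_s: float) -> int:
    """Fewest scrapes that any range window of length range_s can contain.

    Assumes scrapes happen exactly every scrape_interval_s seconds. Any window
    of length w covers at least floor(w / interval) scrape timestamps,
    whatever its alignment.
    """
    return math.floor(range_s / scrape_interval_s)


def min_safe_range(scrape_interval_s: float, samples_needed: int = 2) -> float:
    """Shortest range guaranteed to hold `samples_needed` scrapes.

    rate() needs at least two samples in the range to produce a value.
    """
    return samples_needed * scrape_interval_s


if __name__ == "__main__":
    for interval in (30.0, 10.0, 5.0, 1.0):
        print(
            f"interval={interval}s  "
            f"samples guaranteed in a 2s range={guaranteed_samples(interval, 2.0)}  "
            f"min safe range={min_safe_range(interval)}s"
        )
```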

invidian · 2020-11-19T11:13:21Z

Thanks for checking it @niels-s. I wonder if there is some other way to find out about those latency spikes without lowering the scrape interval 🤔 Maybe tuning the Envoy histogram buckets?
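Histogram buckets act on a different axis than the scrape interval: `histogram_quantile` only knows which bucket an observation fell into, so its estimate can land anywhere inside that bucket. The sketch below finds the bucket enclosing a value for two hypothetical sets of bounds (these are not Envoy's actual defaults); the bucket width is the worst-case error of a quantile estimate that falls there.

```python
import bisect
import math
from typing import List, Tuple

# Hypothetical bucket upper bounds in milliseconds (not Envoy's real defaults).
COARSE_MS = [50, 100, 1000, 5000, 10000]
FINE_MS = [50, 100, 250, 500, 1000, 2500, 5000, 7500, 10000]


def enclosing_bucket(value_ms: float, bounds_ms: List[float]) -> Tuple[float, float]:
    """Return the (lower, upper] bucket that an observation falls into.

    Uses Prometheus "le" semantics: a value equal to a bound belongs to that
    bound's bucket. The first bucket starts at 0; values above the last bound
    go to the +Inf bucket.
    """
    i = bisect.bisect_left(bounds_ms, value_ms)
    if i == len(bounds_ms):
        return float(bounds_ms[-1]), math.inf
    lower = float(bounds_ms[i - 1]) if i > 0 else 0.0
    return lower, float(bounds_ms[i])


if __name__ == "__main__":
    for name, bounds in (("coarse", COARSE_MS), ("fine", FINE_MS)):
        lower, upper = enclosing_bucket(4000, bounds)
        print(f"{name}: 4000ms lies in ({lower}, {upper}], uncertainty {upper - lower}ms")
```

Bucket bounds only change how precisely a quantile is located; the range window still decides which observations are aggregated together.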

invidian added the area/components and kind/enhancement labels Nov 18, 2020

surajssd added the proposed/next-sprint label Nov 25, 2020

iaguis removed the proposed/next-sprint label Nov 25, 2020

surajssd self-assigned this Dec 1, 2020

surajssd mentioned this issue Dec 2, 2020

contour: parameterise envoy scraping interval #1229

Merged

surajssd closed this as completed in #1229 Dec 4, 2020
