Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(statsd): Use symbolicator-style aggregation #4446

Merged
merged 14 commits into from
Jan 16, 2025
Merged

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Jan 15, 2025

Replace the existing MetricsClient that uses statsdproxy and rely on a modified version of the aggregator used in symbolicator.

Note: This PR is a proof of concept and takes a few shortcuts (see below). It therefore should not get merged into a calendar release as-is.

  1. Aggregation cannot be disabled.
  2. The sample rate is hard-coded to 1.
  3. Metrics are not testable using with_capturing_test_client for the time being.

Might fix https://github.com/getsentry/team-ingest/issues/613

["requests.body_read.duration:0|ms|#route:unknown,size:lt10KB,status:completed"]
)
}
// #[test]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will restore all tests that currently use with_capturing_test_client after we've tested the PoC.

@@ -67,7 +67,7 @@ impl RelayStats {
worker = &worker_name,
);
metric!(
gauge(RuntimeGauges::WorkerMeanPollTime) =
gauge_f(RuntimeGauges::WorkerMeanPollTime) =
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super elegant, but this was the easiest way to support both integer and float based gauges.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost completely replaced the contents of this file with https://github.com/getsentry/symbolicator/blob/master/crates/symbolicator-service/src/metrics.rs, with some modifications necessary to support our use case.

@@ -17,13 +17,12 @@
//! or the [`metric!`] macro will become a noop. Only when configured, metrics will actually be
//! collected.
//!
//! To initialize the client, either use [`set_client`] to pass a custom client, or use
//! [`init`] to create a default client with known arguments:
//! To initialize the client use [`init`] to create a default client with known arguments:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left the original doc comment intact, but the rest of this file is a modified copy of https://github.com/getsentry/symbolicator/blob/1873ebad53b96edb8ba1380e3d9cff824ae3de86/crates/symbolicator-service/src/metrics.rs.

@@ -0,0 +1,221 @@
/// A metric for capturing timings.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbolicator does not have enum types to declare metrics, so I copied the contents of this file from

/// A metric for capturing timings.
///
/// Timings are a positive number of milliseconds between a start and end time. Examples include
/// time taken to render a web page or time taken for a database call to return.
///
/// ## Example
///
/// ```
/// use relay_statsd::{metric, TimerMetric};
///
/// enum MyTimer {
/// ProcessA,
/// ProcessB,
/// }
///
/// impl TimerMetric for MyTimer {
/// fn name(&self) -> &'static str {
/// match self {
/// Self::ProcessA => "process_a",
/// Self::ProcessB => "process_b",
/// }
/// }
/// }
///
/// # fn process_a() {}
///
/// // measure time by explicitly setting a std::timer::Duration
/// # use std::time::Instant;
/// let start_time = Instant::now();
/// process_a();
/// metric!(timer(MyTimer::ProcessA) = start_time.elapsed());
///
/// // provide tags to a timer
/// metric!(
/// timer(MyTimer::ProcessA) = start_time.elapsed(),
/// server = "server1",
/// host = "host1",
/// );
///
/// // measure time implicitly by enclosing a code block in a metric
/// metric!(timer(MyTimer::ProcessA), {
/// process_a();
/// });
///
/// // measure block and also provide tags
/// metric!(
/// timer(MyTimer::ProcessB),
/// server = "server1",
/// host = "host1",
/// {
/// process_a();
/// }
/// );
///
/// ```
pub trait TimerMetric {
/// Returns the timer metric name that will be sent to statsd.
fn name(&self) -> &'static str;
}
/// A metric for capturing counters.
///
/// Counters are simple values incremented or decremented by a client. The rates at which these
/// events occur or average values will be determined by the server receiving them. Examples of
/// counter uses include number of logins to a system or requests received.
///
/// ## Example
///
/// ```
/// use relay_statsd::{metric, CounterMetric};
///
/// enum MyCounter {
/// TotalRequests,
/// TotalBytes,
/// }
///
/// impl CounterMetric for MyCounter {
/// fn name(&self) -> &'static str {
/// match self {
/// Self::TotalRequests => "total_requests",
/// Self::TotalBytes => "total_bytes",
/// }
/// }
/// }
///
/// # let buffer = &[(), ()];
///
/// // add to the counter
/// metric!(counter(MyCounter::TotalRequests) += 1);
/// metric!(counter(MyCounter::TotalBytes) += buffer.len() as i64);
///
/// // add to the counter and provide tags
/// metric!(
/// counter(MyCounter::TotalRequests) += 1,
/// server = "s1",
/// host = "h1"
/// );
///
/// // subtract from the counter
/// metric!(counter(MyCounter::TotalRequests) -= 1);
///
/// // subtract from the counter and provide tags
/// metric!(
/// counter(MyCounter::TotalRequests) -= 1,
/// server = "s1",
/// host = "h1"
/// );
/// ```
pub trait CounterMetric {
/// Returns the counter metric name that will be sent to statsd.
fn name(&self) -> &'static str;
}
/// A metric for capturing histograms.
///
/// Histograms are values whose distribution is calculated by the server. The distribution
/// calculated for histograms is often similar to that of timers. Histograms can be thought of as a
/// more general (not limited to timing things) form of timers.
///
/// ## Example
///
/// ```
/// use relay_statsd::{metric, HistogramMetric};
///
/// struct QueueSize;
///
/// impl HistogramMetric for QueueSize {
/// fn name(&self) -> &'static str {
/// "queue_size"
/// }
/// }
///
/// # use std::collections::VecDeque;
/// let queue = VecDeque::new();
/// # let _hint: &VecDeque<()> = &queue;
///
/// // record a histogram value
/// metric!(histogram(QueueSize) = queue.len() as u64);
///
/// // record with tags
/// metric!(
/// histogram(QueueSize) = queue.len() as u64,
/// server = "server1",
/// host = "host1",
/// );
/// ```
pub trait HistogramMetric {
/// Returns the histogram metric name that will be sent to statsd.
fn name(&self) -> &'static str;
}
/// A metric for capturing sets.
///
/// Sets count the number of unique elements in a group. You can use them to, for example, count the
/// unique visitors to your site.
///
/// ## Example
///
/// ```
/// use relay_statsd::{metric, SetMetric};
///
/// enum MySet {
/// UniqueProjects,
/// UniqueUsers,
/// }
///
/// impl SetMetric for MySet {
/// fn name(&self) -> &'static str {
/// match self {
/// MySet::UniqueProjects => "unique_projects",
/// MySet::UniqueUsers => "unique_users",
/// }
/// }
/// }
///
/// # use std::collections::HashSet;
/// let users = HashSet::new();
/// # let _hint: &HashSet<()> = &users;
///
/// // use a set metric
/// metric!(set(MySet::UniqueUsers) = users.len() as i64);
///
/// // use a set metric with tags
/// metric!(
/// set(MySet::UniqueUsers) = users.len() as i64,
/// server = "server1",
/// host = "host1",
/// );
/// ```
pub trait SetMetric {
/// Returns the set metric name that will be sent to statsd.
fn name(&self) -> &'static str;
}
/// A metric for capturing gauges.
///
/// Gauge values are an instantaneous measurement of a value determined by the client. They do not
/// change unless changed by the client. Examples include things like load average or how many
/// connections are active.
///
/// ## Example
///
/// ```
/// use relay_statsd::{metric, GaugeMetric};
///
/// struct QueueSize;
///
/// impl GaugeMetric for QueueSize {
/// fn name(&self) -> &'static str {
/// "queue_size"
/// }
/// }
///
/// # use std::collections::VecDeque;
/// let queue = VecDeque::new();
/// # let _hint: &VecDeque<()> = &queue;
///
/// // a simple gauge value
/// metric!(gauge(QueueSize) = queue.len() as u64);
///
/// // a gauge with tags
/// metric!(
/// gauge(QueueSize) = queue.len() as u64,
/// server = "server1",
/// host = "host1"
/// );
/// ```
pub trait GaugeMetric {
/// Returns the gauge metric name that will be sent to statsd.
fn name(&self) -> &'static str;
}
.

@jjbayer jjbayer requested a review from Litarnus January 15, 2025 15:26
Comment on lines +102 to +105
/// The interval in which to flush out metrics.
/// NOTE: In particular for timer metrics, we have observed that for some reason, *some* of the timer
/// metrics are getting lost (interestingly enough, not all of them) and are not being aggregated into the `.count`
/// sub-metric collected by `veneur`. Lets just flush a lot more often in order to emit less metrics per-flush.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a random thought but I wonder if having sub-second intervals for the current solution would have helped.

@jjbayer jjbayer marked this pull request as ready for review January 16, 2025 08:12
@jjbayer jjbayer requested a review from a team as a code owner January 16, 2025 08:12
@jjbayer jjbayer merged commit 2f82dea into master Jan 16, 2025
26 checks passed
@jjbayer jjbayer deleted the feat/local-aggregator branch January 16, 2025 12:18
Dav1dde added a commit that referenced this pull request Jan 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants