diagnostics: add datadriven testing to diagnostics output to detect regressions/changes #134450

dhartunian · 2024-11-06T19:52:48Z

Today, there is no easy way to detect changes in the diagnostic output for a cluster.

The tricky part is that the diagnostics output is non-deterministic. We can probably filter the output lightly to produce a deterministic payload for testing. The filtering will likely need to target the SQL and schema stats data, which can contain arbitrary numbers.
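As a rough illustration of the "filter lightly" idea, here is a minimal Go sketch that normalizes a JSON-encoded payload before comparing it against a golden file. Everything in it is an assumption for illustration: the real diagnostics report is not necessarily JSON, and the field names in `volatileKeys`, the placeholder strings, and the helper names `scrub` and `normalizeDiagnostics` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// volatileKeys lists hypothetical field names whose values differ between
// runs. The real diagnostics payload may use entirely different names.
var volatileKeys = map[string]bool{
	"node_id":    true,
	"started_at": true,
	"uuid":       true,
}

// scrub walks a decoded JSON value and replaces run-dependent data with fixed
// placeholders: values under volatile keys, and every number.
func scrub(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			if volatileKeys[k] {
				out[k] = "REDACTED"
				continue
			}
			out[k] = scrub(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = scrub(val)
		}
		return out
	case float64:
		// encoding/json decodes every JSON number into float64 when the
		// target is interface{}.
		return "NUMBER"
	default:
		return v
	}
}

// normalizeDiagnostics returns a stable textual form of the payload.
// json.MarshalIndent emits map keys in sorted order, so key order in the
// input does not affect the result.
func normalizeDiagnostics(raw []byte) (string, error) {
	var decoded interface{}
	if err := json.Unmarshal(raw, &decoded); err != nil {
		return "", err
	}
	b, err := json.MarshalIndent(scrub(decoded), "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := normalizeDiagnostics([]byte(`{"node_id": 3, "schema": {"tables": 12}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The normalized string could then serve as the expected output in a datadriven test file, so that any structural change to the payload shows up as a diff while arbitrary counters do not.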

Jira issue: CRDB-44083

blathers-crl · 2024-11-06T19:52:49Z

Hi @dhartunian, please add branch-* labels to identify which branch(es) this C-bug affects.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

blathers-crl · 2024-11-06T20:01:26Z

Hi @exalate-issue-sync[bot], please add branch-* labels to identify which branch(es) this C-bug affects.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

This change adds test coverage to the diagnostic reporter that's meant to catch situations where schema or statement scrubbing is accidentally turned off.

In the course of adding tests for SQL Stats it was discovered that the diagnostics reporter would include statements that were in internal applications (`$ internal` prefix) so a change was made to omit those from the reports.

Resolves: cockroachdb#134450

Release note: None
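The omission of internal-application statements described above can be pictured with a small Go sketch. It is not the code from the change: the `stmtStat` type, the `omitInternal` helper, and the constant name are hypothetical, and only the `$ internal` prefix comes from the description.

```go
package main

import (
	"fmt"
	"strings"
)

// internalAppPrefix is the application-name prefix mentioned in the change
// description. The constant name itself is made up for this sketch.
const internalAppPrefix = "$ internal"

// stmtStat is a hypothetical, simplified stand-in for a collected
// statement-statistics entry.
type stmtStat struct {
	AppName string
	Query   string
}

// omitInternal returns only the entries whose application name does not
// start with the internal prefix. The input slice is left unmodified.
func omitInternal(stats []stmtStat) []stmtStat {
	kept := make([]stmtStat, 0, len(stats))
	for _, s := range stats {
		if strings.HasPrefix(s.AppName, internalAppPrefix) {
			continue
		}
		kept = append(kept, s)
	}
	return kept
}

func main() {
	stats := []stmtStat{
		{AppName: "$ internal-example", Query: "SELECT _"},
		{AppName: "myapp", Query: "SELECT _ FROM t"},
	}
	for _, s := range omitInternal(stats) {
		fmt.Println(s.AppName, s.Query)
	}
}
```

A test for the reporter could then assert that no reported entry carries an application name with that prefix.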

139062: server,sql: increase redaction coverage of diagnostics tests r=angles-n-daemons a=dhartunian

This change adds test coverage to the diagnostic reporter that's meant to catch situations where schema or statement scrubbing is accidentally turned off.

In the course of adding tests for SQL Stats it was discovered that the diagnostics reporter would include statements that were in internal applications (`$ internal` prefix) so a change was made to omit those from the reports.

Resolves: #134450

Release note: None

139066: db-console: update rac v2 overload dashboard charts r=sumeerbhola a=kvoli

Update the db console overload dashboard to:

- remove metrics associated with v1 replication admission control
- rename metrics associated with v2 replication admission control to remove the version reference
- add a chart containing the per-node send queue size in bytes

<details><summary>Screenshots</summary>
<p>

![image](https://github.com/user-attachments/assets/5ce5b9eb-4f87-4a4b-a6a5-185c688f199e)
![image](https://github.com/user-attachments/assets/faea8862-0f90-415c-8ce1-0ece9b40f988)
![image](https://github.com/user-attachments/assets/9667f41b-607c-4b17-b3c4-dceba6e77ccb)

</p>
</details>

Resolves: #128039

Release note (ui change): The overload dashboard on DB Console now shows only the v2 replication admission control metrics, where previously it displayed both v1 and v2 metrics. Additionally, the aggregate size of queued replication entries is now shown.

139171: sql: use parsed statements for persistedsqlstats r=fqazi a=fqazi

Previously, we would re-parse SQL statements used to upsert statement and txn stats. To address this, this patch parses these statements once and uses ExecParsed to reduce CPU usage. This patch also adds a simple benchmark for this code path, which shows a small 1% delta.

Before:
BenchmarkSQLStatsFlush 100 1415926687 ns/op 319339313 B/op 2302002 allocs/op

After:
BenchmarkSQLStatsFlush 100 1396673170 ns/op 319003310 B/op 2298192 allocs/op

Fixes: #134583

Release note: None

139273: roachtest: collect qps metrics over longer window in gracefuldrain test r=arulajmani a=arulajmani

The gracefuldrain test was modernized in cf30717. Prior to that commit, QPS metrics were collected over a 10s interval, whereas the modernization refactor changed this to 1 second intervals. Looking at a few recent test failures, I see QPS metrics above the failure threshold, which makes me suspect that this 1s interval is causing the sorts of inaccuracies MeasureQPS warns against. Also see #133020 (comment).

One thing that doesn't line up is the timeline of this test's failures and cf30717. Still, this patch changes the metric's interval back to 10s.

References #133020

Release note: None

Co-authored-by: David Hartunian <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>

dhartunian added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-postmortem Originated from a Postmortem action item. P-1 Issues/test failures with a fix SLA of 1 month T-observability labels Nov 6, 2024

dhartunian removed the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 6, 2024

exalate-issue-sync bot added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 6, 2024

exalate-issue-sync bot removed the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 7, 2024

exalate-issue-sync bot assigned dhartunian Nov 26, 2024

dhartunian mentioned this issue Jan 14, 2025

server,sql: increase redaction coverage of diagnostics tests #139062

Merged

craig bot closed this as completed in f876eb6 Jan 17, 2025
