
feat: Change frequency column in cloudquery_table_frequency to milliseconds #1428

Merged: 1 commit, Feb 5, 2025

Conversation

@akash1810 (Member) commented Feb 4, 2025

Note

These database changes have been deployed to CODE, so the database schema there is only compatible with this branch. Before testing another branch on CODE, we'll need to apply another migration to revert the schema change, or rebase the feature branch onto this one.

What does this change and why?

Updates the cloudquery_table_frequency table to store frequency as milliseconds. Using milliseconds allows for a higher degree of accuracy. For example, we can identify tables that are updated multiple times a day.

To understand the daily rate, we can do:

SELECT table_name
       , 86400000 / frequency AS daily_rate -- 86400000 = milliseconds in a day
FROM   cloudquery_table_frequency;
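The same daily-rate arithmetic can be sketched in Python for checking values by hand (a minimal sketch; the 30-minute figure is just an illustration):

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86_400_000, matching the comment in the SQL above

def daily_rate(frequency_ms: int) -> float:
    """How many times per day a table refreshes, given its frequency in milliseconds."""
    return MS_PER_DAY / frequency_ms

# A table refreshed every 30 minutes has a daily rate of 48.
print(daily_rate(30 * 60 * 1000))  # → 48.0
```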

How has it been verified?

I've deployed to CODE and seen the migration apply:

[screenshot: migration applied on CODE]

Using the above SQL, we can see the effective daily rate of each table:

[screenshot: effective daily rate of each table]

@akash1810 akash1810 marked this pull request as ready for review February 4, 2025 21:47
@akash1810 akash1810 requested review from a team as code owners February 4, 2025 21:47
@adamnfish (Contributor) left a comment


Looks great to me, I agree this is the simpler approach overall.

I hadn't had a chance to go through the rest of the catalogue (and Grafana dashboards) to check whether other jobs would be affected by this change, but on the assumption that we've at least checked there's nothing critical that will break with these type changes, I think we could merge this and see.

I'll update the cloudquery paid rows dashboard to match the new format once this is merged and the table has been repopulated.

Before:
    table_name String @id
    frequency  String?

After:
    table_name String @id
    frequency  BigInt
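A minimal sqlite sketch of the shape of this schema change (assumptions: the real service targets Postgres via Prisma, and recreating the table is acceptable because each task repopulates its own row on its next run; the real Prisma-generated migration may alter the column in place instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Old shape: frequency was an optional string (String? in Prisma).
conn.execute(
    "CREATE TABLE cloudquery_table_frequency"
    " (table_name TEXT PRIMARY KEY, frequency TEXT)"
)

# New shape: frequency is a required integer holding milliseconds
# (BigInt in Prisma).
conn.executescript("""
    DROP TABLE cloudquery_table_frequency;
    CREATE TABLE cloudquery_table_frequency (
        table_name TEXT    PRIMARY KEY,
        frequency  INTEGER NOT NULL  -- milliseconds
    );
""")

# The NOT NULL constraint now rejects rows without a frequency.
try:
    conn.execute(
        "INSERT INTO cloudquery_table_frequency (table_name) VALUES ('demo')"
    )
except sqlite3.IntegrityError as err:
    print(err)
```

Making the column required matches the comment below: there is no longer an optional field to special-case in queries.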
@adamnfish (Contributor) commented Feb 5, 2025


I like that this is no longer an optional field in the schema 👍

@akash1810 (Member, Author)

> I hadn't had a chance to go through the rest of the catalogue (and Grafana dashboards) to check whether other jobs would be affected by this change, but on the assumption that we've at least checked there's nothing critical that will break with these type changes, I think we could merge this and see.

I think some of the Grafana alerts will break. However, we've been ignoring those alerts for a while, so this won't make a significant difference.

@akash1810 akash1810 merged commit 3240bd6 into main Feb 5, 2025
7 checks passed
@akash1810 akash1810 deleted the aa/frequency branch February 5, 2025 10:35
@akash1810 (Member, Author) commented Feb 5, 2025

I'll orchestrate this on PROD by:

  1. Disabling CD
  2. Stopping any running tasks (there were none running)
  3. Deploying the database migration
  4. Deploying the rest of the changes
  5. Restarting the stopped tasks
  6. Re-enabling CD

This is because each ECS task performs an INSERT into the cloudquery_table_frequency table; changing the schema while tasks are mid-run could cause those writes to fail.
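The per-task write can be pictured as an upsert keyed on table_name (a sqlite sketch: the table and column names come from this PR's schema, while the helper name and row values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cloudquery_table_frequency"
    " (table_name TEXT PRIMARY KEY, frequency INTEGER NOT NULL)"
)

def record_frequency(table_name: str, frequency_ms: int) -> None:
    """Each ECS task records its own row; re-running a task just refreshes it."""
    conn.execute(
        "INSERT INTO cloudquery_table_frequency (table_name, frequency)"
        " VALUES (?, ?)"
        " ON CONFLICT(table_name) DO UPDATE SET frequency = excluded.frequency",
        (table_name, frequency_ms),
    )

record_frequency("aws_elbv2_load_balancers", 30 * 60 * 1000)  # every 30 minutes
record_frequency("aws_elbv2_load_balancers", 30 * 60 * 1000)  # idempotent re-run
print(conn.execute("SELECT COUNT(*) FROM cloudquery_table_frequency").fetchone()[0])  # → 1
```

Because every task writes on startup, a schema change landing between two such writes would leave some tasks inserting against the old shape, hence the ordering of the steps above.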

@akash1810 (Member, Author) commented Feb 5, 2025

Confirming this is working as expected. On PROD we can see the aws_elbv2_% and aws_elbv1_% tables are updated 48 times a day, or every 30 minutes.

Each task will need to run at least once before this table is completely accurate. Alternatively, we could seed the table with data ourselves.

[screenshot: PROD daily rates, showing the aws_elbv1_% and aws_elbv2_% tables at 48 per day]
