Reverted email analytics jobs commits #20835

Merged 5 commits on Aug 27, 2024

9larsons
Contributor

ref https://linear.app/tryghost/issue/ENG-1518

After releasing the analytics job improvements, large sites appear to be very close to missing some Mailgun events. The cause is unexpected behavior in the aggregateStats call for the opened events job only: it takes 2-5x (or more) as long as the aggregate queries for the other jobs, even though it does not depend on the events.

To err on the side of caution, we're going to roll this back and optimize the aggregation queries before re-implementing. When we do, we may be more cautious and give the opened events some, but not all, of the priority.

@9larsons 9larsons merged commit 8f3985b into main Aug 27, 2024
22 checks passed
@9larsons 9larsons deleted the revert-analytics-jobs-commits branch August 27, 2024 21:15
9larsons added a commit that referenced this pull request Aug 27, 2024
9larsons added a commit that referenced this pull request Sep 3, 2024
9larsons added a commit that referenced this pull request Sep 5, 2024
ref #20835
- reimplemented email analytics changes that prioritized opened events
over other events in order to speed up open analytics
- added DB persistence to the fetch missing job to ensure we re-fetch every
window of events, which is especially important if we restart following
a large email batch
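
As a rough illustration of the persistence idea in the second bullet, here is a minimal sketch, not the code from this commit. It assumes the job records the end of the last window it fetched in a database table and resumes from that point after a restart. The table name `job_state`, the helper names, the 30-minute default lookback, and the use of SQLite are all hypothetical stand-ins.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def open_store(path=":memory:"):
    """Open (or create) the hypothetical table that persists job progress."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_state ("
        "name TEXT PRIMARY KEY, last_fetched_until TEXT NOT NULL)"
    )
    return conn


def load_window_start(conn, job, fallback):
    """Return the persisted end of the last window, or fallback if none exists."""
    row = conn.execute(
        "SELECT last_fetched_until FROM job_state WHERE name = ?", (job,)
    ).fetchone()
    return datetime.fromisoformat(row[0]) if row else fallback


def save_window_end(conn, job, until):
    """Persist how far the job has fetched so a restart can resume from here."""
    conn.execute(
        "INSERT OR REPLACE INTO job_state (name, last_fetched_until) VALUES (?, ?)",
        (job, until.isoformat()),
    )
    conn.commit()


def run_fetch_missing(conn, fetch_events, now, job="fetch-missing",
                      default_lookback=timedelta(minutes=30)):
    """Fetch the window from the last persisted point up to now, then record it."""
    start = load_window_start(conn, job, now - default_lookback)
    events = fetch_events(start, now)
    # Only advance the stored marker after the fetch has returned.
    save_window_end(conn, job, now)
    return start, events
```

With an in-memory marker, a restart two hours after a large send would fall back to the default lookback and skip most of the gap. With the persisted marker, the next run starts exactly where the previous one ended.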

We learned a few things from the previous trial run. Namely, event
throughput for particularly large databases is not as high as the data
initially suggested. This set of changes is more conservative, if a touch
more complicated, and ensures we capture edge cases for really large
newsletter sends (100k+ members).

In general, we want to make sure we're fetching new open events at least
every 5 minutes, and often much more frequently than that, unless it's a
quiet period (suggesting we haven't had a newsletter send or much
outstanding event data).
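
The scheduling policy described above could be sketched roughly as follows. This is an illustrative assumption, not the logic in this commit: the function name, the "last fetch returned nothing" test for a quiet period, and the streak cap of 3 are hypothetical. Only the 5-minute ceiling comes from the text.

```python
from datetime import datetime, timedelta, timezone

# From the commit message: open events should be fetched at least this often.
OPENED_MAX_INTERVAL = timedelta(minutes=5)


def pick_next_job(now, last_opened_run, last_opened_count, opened_streak,
                  max_streak=3):
    """Decide whether the next run fetches opened events or the other events.

    last_opened_count is the number of events the previous opened fetch
    returned; opened_streak is how many opened runs happened back to back.
    """
    # Hard ceiling: never let opened events wait longer than 5 minutes.
    if now - last_opened_run >= OPENED_MAX_INTERVAL:
        return "opened"
    # Quiet period (assumed heuristic): the last opened fetch found nothing,
    # so spend this run on the other event types.
    if last_opened_count == 0:
        return "other"
    # Some but not all priority: cap consecutive opened runs so the other
    # jobs are not starved during a large send.
    if opened_streak >= max_streak:
        return "other"
    return "opened"
```

The streak cap is one simple way to give opened events some but not all of the priority. A weighted or time-budgeted scheme would serve the same purpose.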