BCI-3492 [LogPoller]: Allow withObservedExecAndRowsAffected to report non-zero rows affected #14057
Conversation
Also:
- Change behavior of DeleteExpiredLogs to delete logs which don't match any filter
- Add a test case to ensure the dataset size is published properly during pruning
@@ -285,7 +285,7 @@ func withObservedExecAndRowsAffected(o *ObservedORM, queryName string, queryType
		WithLabelValues(o.chainId, queryName, string(queryType)).
		Observe(float64(time.Since(queryStarted)))

	if err != nil {
Nice catch!
// DeleteExpiredLogs removes any logs which either:
// - don't match any currently registered filters, or
// - have a timestamp older than any matching filter's retention, UNLESS there is at
//   least one matching filter with retention=0
func (o *DSORM) DeleteExpiredLogs(ctx context.Context, limit int64) (int64, error) {
Is it possible to accidentally remove logs when a job is updated? Imagine the following scenario:
- We have a job that registered multiple LogPoller filters during init.
- CLO proposes a job replacement: a slight change to the job definition, but the contracts stay the same, so the filters would be the same, as would the logs required for the plugin to work.
- The NOP accepts the job, which triggers removing the old job and creating a new one. That includes unregistering the old filters and registering new ones.
- In the meantime, LogPoller's PruneExpiredLogs kicks in, trying to remove stale logs. It sees a new set of "abandoned" logs, because the filters were just removed, and removes these logs immediately.
- Plugin init registers new filters (which are the same), but the logs are gone, so we might need to run a replay.

I wonder if this is something that could happen or impact the system. How does it work for other products? What if a job is removed and added in a two-step process (via API)? AFAIK CLO does this entire flow in a single large db transaction, but I guess that might change at some point because of the loopps architecture, right?
Good question! I'm pretty sure it's okay for now, since as you say CLO does the whole flow in a single db transaction. Someone could manually remove a job, and then add a similar one later... but I think it would be a natural user expectation to assume that there are no guarantees about events the old job cared about still being there for the new job to use. If CLO changes so that it's no longer in a single db transaction, that seems more problematic since it's presented to the user as an "Update" rather than a "Remove" followed by a separate "Create". But I think this would cause similar problems with or without this change. One way or another, we'll have to solve some issues if we can no longer guarantee the ability for CLO to make atomic job updates.
Plugin init registers new filters (which are the same), but logs are gone so we might need to run replay
Any plugin which cares about past logs already does a replay to the appropriate block when it's initialized. (For example, going back to pick up an OCR2 configuration event.) So I think the only issue is with missing logs that get emitted around the time the old job is getting swapped out with the new job. But even without this change, there is a potential for missing some logs while that's happening. This would just increase the likelihood of it happening, since it would apply not just to events emitted between the delete and create operations but also those emitted shortly before the delete where the old job hadn't had time to process them before being deleted.
One way I can think of for how we might be able to solve this issue is with ChainReader. If it can expose a method for updating a filter that does UnregisterFilter and then RegisterFilter within the same db transaction, that should solve it I think. We might already even be doing this with Bind(), I'm not sure but @ilija42 should know.
Sounds good. One thing that comes to mind that might work as an additional safety check is to wait for some "sane" threshold before deleting logs without a filter: when retention is null, we match only logs older than X days. Not sure what that threshold should be (maybe configurable?), but it might prevent us from accidentally losing logs right after deleting the job in some weird way. WDYT?
Yes, I was thinking that might be what we have to do, if we can't solve things with ChainReader.
Actually, I just thought of a simple solution which I think would be pretty robust: in UnregisterFilter(), we could return success but delay the actual removal of the filter by 1 hour (or any long-enough time; it could even be 24 hours, but that seems like overkill). As long as a call to RegisterFilter() matching the logs that would have been deleted arrives within that 1-hour grace period, there is no time at which the pruner would be allowed to delete them.
* develop:
  - CRIB CI integration (#13924)
  - fix: refactor sonarqube scan args (#13875)
  - BCI-3492 [LogPoller]: Allow withObservedExecAndRowsAffected to report non-zero rows affected (#14057)
  - Add error mapping for Astar (#13990)
  - [BCI-3862][chainlink] - Change DSL Block primitive to string instead of int (#14033)
  - [KS-412] Validate called DON membership in TriggerPublisher (#14040)
  - [TT-1429] notifying guardian on some failures (#14073)
  - Add Mantle errors (#14053)
  - Fix write target nil dereferences (#14059)
  - Allow retrying failed transmissions (#14017)
  - auto-11214: migrate more tests to foundry (#13934)
Addresses this bug: https://smartcontract-it.atlassian.net/browse/BCI-3492
Previously, rows affected was only getting reported for log & block pruning when there was an error with the query, so it was always 0.
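The shape of the bug and fix can be shown with a minimal, self-contained sketch. The types and the `observed` slice here are hypothetical simplifications (the real wrapper records into Prometheus metrics keyed by chain ID and query name); the point is that the rows-affected observation must sit on the success path, not the error path:

```go
package main

import "fmt"

// queryResult mimics the outcome of an Exec call: rows affected plus an error.
type queryResult struct {
	rowsAffected int64
	err          error
}

// observed stands in for the dataset-size metric the wrapper publishes.
var observed []int64

// withObservedExecAndRowsAffected is a simplified sketch of the wrapper in
// this PR. Before the fix, the observation happened only when err != nil,
// so successful pruning queries always reported 0 rows affected.
func withObservedExecAndRowsAffected(query func() queryResult) (int64, error) {
	r := query()
	if r.err != nil {
		return 0, r.err
	}
	observed = append(observed, r.rowsAffected) // fixed: report on success
	return r.rowsAffected, nil
}

func main() {
	n, err := withObservedExecAndRowsAffected(func() queryResult {
		return queryResult{rowsAffected: 42}
	})
	fmt.Println(n, err) // 42 <nil>
}
```

With the reporting moved to the success path, the pruning test added in this PR can assert that the published dataset size matches the number of deleted rows.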