Add constraint to prevent duplicate headers #150
Conversation
Would it be better for the unique constraint to be on the hash and block number alone, so that we don't get duplicate headers from different nodes and don't have to worry about duplicate logs as a result? Is there any case where we would want duplicate headers from different nodes? Otherwise,
I'm torn about this. I think we do in theory want to be able to sync the same header from different nodes, but there's probably a good bit of further work to be done to enable that in the context of our various operations. One core thing to consider is that the "node" differs not just between instances (e.g. geth vs parity) but also with the same instance pointed at different networks (e.g. mainnet vs kovan). That's just another place where we probably need more work before we can truly say we support running against multiple networks at one time, but it's worth considering. Perhaps it makes sense to lock down to one instance/network for now, but I'd be inclined to kick that over to another issue/PR. If we go that route, then we can probably strip out a lot of the code around
That's also an interesting constraint (
This would definitely be a separate PR, but what about changing
Hmm, I'm probably missing something bigger picture, but I don't see why we would upsert on a
You're totally right about that constraint. For some reason I was thinking of it as a uniqueness constraint on the block number, or block number + node fingerprint, either of which would efficiently upsert reorged headers but (I think) fail to cascade delete the now-invalidated nested data. Unique by block number + hash would not trigger problems on cascade delete, since a constraint violation there means literal data duplication rather than a reorg.
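To make that failure mode concrete, here's a minimal sketch in Go (assuming a Postgres `headers` table whose child tables are declared with `ON DELETE CASCADE`, plus a hypothetical `UNIQUE` constraint on `block_number` alone, which is not what this PR adds). The function and column names are illustrative, not the repository's actual API:

```go
package repositories

import "database/sql"

// upsertByBlockNumber illustrates the problematic strategy: ON CONFLICT ...
// DO UPDATE rewrites the existing row in place, so its id is preserved,
// ON DELETE CASCADE never fires, and stale nested data derived from the
// reorged header survives.
func upsertByBlockNumber(db *sql.DB, blockNumber int64, hash, raw string) error {
	_, err := db.Exec(`
		INSERT INTO headers (block_number, hash, raw)
		VALUES ($1, $2, $3)
		ON CONFLICT (block_number)
		DO UPDATE SET hash = EXCLUDED.hash, raw = EXCLUDED.raw`,
		blockNumber, hash, raw)
	return err
}
```

A delete-then-insert, by contrast, removes the old row first, so the cascade clears the invalidated nested data before the replacement header goes in.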
I like the idea of storing node id and fingerprint as an array on the header! I'd want to think a bit about how we're using those fields in queries like
Last thing I'll say is that I think it'd be totally awesome if we could boil all of this down to a single SQL query to: (1) try to insert a header

Feels like it could be pretty ugly and tricky, but looking at this issue has me pretty skeptical of trying to manage header uniqueness in application code.
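For what it's worth, a single-statement version looks possible in Postgres with a data-modifying CTE. A rough sketch, assuming this PR's schema and an `ErrValidHeaderExists` convention like the one below; the function and column names are illustrative, not the repository's actual API:

```go
package repositories

import (
	"database/sql"
	"errors"
)

var ErrValidHeaderExists = errors.New("valid header already exists")

// createOrReplaceHeader deletes any reorged header at this height for this
// node (cascading to its nested data) and inserts the new one, all in a
// single statement.
func createOrReplaceHeader(db *sql.DB, blockNumber int64, hash, raw, fingerprint string) (int64, error) {
	var id int64
	// Note: the INSERT cannot see the CTE's delete, but because the unique
	// key includes the hash, the deleted (different-hash) row could not
	// have conflicted with it anyway.
	err := db.QueryRow(`
		WITH reorged AS (
			-- Same height and node but a different hash: a reorg.
			-- ON DELETE CASCADE clears the nested data derived from it.
			DELETE FROM headers
			WHERE block_number = $1 AND hash <> $2 AND eth_node_fingerprint = $4
		)
		INSERT INTO headers (block_number, hash, raw, eth_node_fingerprint)
		VALUES ($1, $2, $3, $4)
		ON CONFLICT DO NOTHING
		RETURNING id`,
		blockNumber, hash, raw, fingerprint).Scan(&id)
	if err == sql.ErrNoRows {
		// ON CONFLICT DO NOTHING suppressed the insert, so RETURNING
		// produced no rows: an identical header is already stored.
		return 0, ErrValidHeaderExists
	}
	return id, err
}
```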
```diff
@@ -8,7 +8,8 @@ CREATE TABLE public.headers
     block_timestamp NUMERIC,
     check_count INTEGER NOT NULL DEFAULT 0,
     eth_node_id INTEGER NOT NULL REFERENCES eth_nodes (id) ON DELETE CASCADE,
-    eth_node_fingerprint VARCHAR(128)
+    eth_node_fingerprint VARCHAR(128),
+    UNIQUE (block_number, hash, eth_node_fingerprint)
```
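For reference, here's roughly what tripping this constraint looks like from Go, assuming the lib/pq driver (a sketch, not code from this PR): a second insert of the same (block_number, hash, eth_node_fingerprint) triple fails with Postgres's unique_violation error, SQLSTATE 23505.

```go
package repositories

import "github.com/lib/pq"

// isDuplicateHeader reports whether an error came from violating the
// UNIQUE (block_number, hash, eth_node_fingerprint) constraint above.
func isDuplicateHeader(err error) bool {
	pqErr, ok := err.(*pq.Error)
	return ok && pqErr.Code == "23505" // unique_violation
}
```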
I'm always super anxious about us half-supporting syncing with several clients, since I'm not certain we actually account for that everywhere ._.
Yeah, agreed. Really what this is accomplishing, though, is just making sure that running against multiple chains (e.g. mainnet/kovan) won't cause destructive operations against one another. Definitely would like to do more to make multi-client support for one chain robust.
```go
if err != nil {
	log.Error("insertHeader: error inserting header: ", err)
	if err == sql.ErrNoRows {
		return 0, ErrValidHeaderExists
```
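For context on the `sql.ErrNoRows` check: if the insert is issued as `INSERT ... ON CONFLICT DO NOTHING RETURNING id` through `QueryRow`, an already-present header suppresses the insert, `RETURNING` yields zero rows, and `Scan` reports `sql.ErrNoRows`. A minimal sketch of that shape (the actual query in the PR may differ):

```go
var id int64
err := db.QueryRow(`
	INSERT INTO headers (block_number, hash, raw, eth_node_fingerprint)
	VALUES ($1, $2, $3, $4)
	ON CONFLICT DO NOTHING
	RETURNING id`,
	blockNumber, hash, raw, fingerprint).Scan(&id)
if err == sql.ErrNoRows {
	// Zero rows back means an identical valid header is already stored.
	return 0, ErrValidHeaderExists
}
```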
Any corresponding caller error matching, since we're returning an error instead of just logging now?
Yep! Header sync will continue on this error: https://github.com/vulcanize/vulcanizedb/blob/staging/pkg/history/populate_headers.go#L56
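A sketch of what that caller-side matching can look like; the names here (`HeaderRepository`, `populateHeaders`) approximate the linked code but are assumptions, not its exact signatures:

```go
package history

import "errors"

// ErrValidHeaderExists mirrors the repository-level sentinel error discussed
// above (an assumption about its definition, not the project's exact code).
var ErrValidHeaderExists = errors.New("valid header already exists")

type Header struct {
	BlockNumber int64
	Hash        string
}

type HeaderRepository interface {
	CreateOrUpdateHeader(header Header) (int64, error)
}

func populateHeaders(repo HeaderRepository, headers []Header) error {
	for _, header := range headers {
		_, err := repo.CreateOrUpdateHeader(header)
		if err == ErrValidHeaderExists {
			// Another process (or an earlier run) already synced this
			// header; skip it and continue with the next block.
			continue
		}
		if err != nil {
			return err
		}
	}
	return nil
}
```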
- Disallow inserts of headers with the same number, hash, and node fingerprint, since allowing them enables duplicate log fetching for the same header
force-pushed from 8f4343d to e252229
Resolves #148
Minimal change set to prevent concurrent processes from inserting duplicate headers. Would be cool to revisit how we can simplify the application code while preserving our cascade delete functionality (that handles reorgs).
Also important to note that we can still end up with duplicate logs, since headers with the same number and hash can still be inserted by different nodes 🤔