Cache statediffs when DB connection is dropped #129
@i-norden - I will get started on this task; I think it's a good way for me to build up my understanding of how we connect to Postgres and how we perform writes to it.

Starting Point

I am going to browse through the codebase to get a general sense of understanding as it relates to the following:
Proposed Changes

The following is what I had in mind:
Input

What should we do for …

Testing

Testing this component will be critical. Here are a few thoughts:
Writing Integration Tests

I am not sure that we currently have a method for writing integration tests and running them against a live instance of Geth. With that in mind, it might make sense for me to work on getting Foundry integrated into our existing stack. The best way to proceed might be:
Looks good. The main remark I have is that this is going to be trickier than perhaps conveyed here at a high level, because of how tightly coupled our statediffing process is to Postgres. When operating in the direct-writing mode there is no independent "statediff object"; the statediff object is all the INSERT statements piling up inside the pending Postgres transaction. This is for performance reasons: if we first create a statediff object as some distinct Go type, then we have to iterate over that entire object a second time when we go to map all the relations and insert into the DB. We have vestigial methods (e.g. …)
Actually, I think the better approach, from both a performance and an engineering-complexity perspective, is to switch over to using the file-writing indexer if/when we lose our Postgres connection. The cache then simply becomes a set of SQL files we can load into Postgres once the connection is restored.
If we take that approach, one thing we need to do is add a configuration option to the file-writing indexer that tells it to write out SQL statements that include …
Most of this makes a lot of sense; the last bit doesn't just yet (I am sure it will once I get a better understanding of what is going on). At this point, I also agree that simply dumping some SQL files and loading them upon "reconnection" is easier from a testing perspective, and probably from an engineering perspective too. Thank you for the rich insight. I'm going to continue diving into the codebase and try to understand what is happening.
@i-norden - I believe we can close this issue since the …
If the DB connection is dropped or fails, the geth client should continue to create statediff objects and cache them up to some configurable limit (in memory and/or written out to files, i.e. journaling analogous to mempool journaling). When the connection is regained or the node is restarted, the client will write these cached diffs to the database first, and then go back to tracking diffs at head.