# Epic: Implement alternative indexing modalities #2313
Comments
Indeed, it's a liveness issue but not a safety issue.
@nambrot can't we just upload a cache of the RocksDB?
Fetching the DB ^ is interesting. I'm skeptical of the Etherscan / Postgres DB indexing: it feels like a lot of effort for a questionable improvement in a post-DTM world. We've spent a lot of time working through the quirks of indexing, and I'm wary that interacting with Etherscan APIs will increase the surface area we need to maintain and be more confusing to configure. As for the Postgres DB, we would now need to take our scraper / DB SLA much more seriously. Would we lose the flexibility to mess with the DB / scraper and just point the explorer to a new DB without affecting anyone else?
Just to be clear, this is not an either/or; this is the equivalent of a fallback provider for indexed data. I totally agree that with DTM the benefit of this becomes smaller, but I also think the cost is pretty small. This doesn't have to run perfectly and can fall back to the current mechanism, but it can IMO provide relief for PI deployers so they don't have to pay for indexing from scratch (even if it no longer blocks) on fat-head chains.
Yes we could. Would you prefer that over just querying the explorer API?
Another source that we IMO have not considered enough is the Graph protocol. It provides very solid indexed data with great performance and is made for querying from agents like ours. Default subgraphs can be deployed and run by anybody, and then queried by anybody. In fact, that is what the Khalani folks did (@serejke or @samtvlabs if you want to chime in).
We have a bunch of Graph subgraphs running in our Kubernetes backend. It works fairly well, although some of the debugging tools are quite poor and fixing things has caused us a lot of issues. I believe we have a bunch of updated Helm charts, as we needed to make quite a few changes for some of the chains we're live on. Happy to help with any Graph queries, as we've been using them for over two years.

Also, as an aside: is anyone working on stress testing theoretical validator counts and the impact on latency? This is something I was quite interested in doing. Spin up a local testing environment and then create 50 validators. Run a bunch of messages. Check the latency of the messages. Then increase to 100. Check latency. Repeat. Compare results and then trawl through the logs to identify bottlenecks. We're probably 8-12 weeks away from having our infra ready for this, but at some point I'd like to do it.
Not really. For most of the use cases we have been talking with folks about, 50 validators would actually be quite gas intensive; that's where using alternate signature schemes would probably be worth it.
I believe the Graph protocol could be a good replacement for the in-house Hyperlane indexing solution, or an alternative for the brave ones. Setting up and maintaining a subgraph is quite time-consuming, but well documented. A subgraph also consumes a lot of RPC quota, though, especially if misconfigured or if there are unnecessary subgraphs. We faced an issue with Hyperlane indexing where, during RPC outages (Infura recently had a Mumbai downtime), Hyperlane ate up all of the daily quota. With a subgraph we can hope that a bigger community has handled indexing edge cases more thoroughly.

Spinning up a Graph node only for the purpose of fallback indexing data seems like a bit of overkill. Alternatively, we could simply post batches of messages to S3.

We run subgraphs for Balancer and our Axon chain (indexing block data for the explorer) and are mostly fine with it. We've recently moved the deployment to EKS and are using RDS for storing subgraph data.
I think @aroralanuk and I are very aligned with this direction.
### Description
- Adding "InsertedIntoEvent" for standalone indexing of events by the agents

### Drive-by changes
None

### Related issues
#2313

### Backward compatibility
Yes

### Testing
Unit tests
https://www.streamingfast.io/ has support for Ethereum, Solana, and Cosmos.
I think we should turn this into an epic and not have this be a small task or a bounty. There are structural changes to the agent codebase, and the solution for this is very unclear.
I had some thoughts about trying to do it without needing to make changes, or at least significant ones, to the rest of the codebase. If it were the case that all instantiations of the indexer need to be updated and tested, I would think that it probably should be an epic. But if, as @nambrot mentioned above, the modular behavior could be added as an option, with the indexer falling back to its previously defined functionality when passed no args about the log source, hopefully one could avoid significant refactors. Here's a proposed solution. Define the LogFetcher trait:
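A rough sketch of what that trait might look like; `Log`, `LogFetcher`, and the use of `eyre` are illustrative assumptions here, not the actual hyperlane-core API:

```rust
// A hedged sketch: `Log` stands in for whatever event type the
// agents actually index.
use async_trait::async_trait;

pub struct Log {
    pub block_number: u64,
    pub data: Vec<u8>,
}

#[async_trait]
pub trait LogFetcher: Send + Sync {
    /// Fetch all logs emitted in the inclusive block range [from, to].
    async fn fetch_logs(&self, from: u64, to: u64) -> eyre::Result<Vec<Log>>;
}
```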
Define a default implementation for the fetch_logs method of the Indexer trait:
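Continuing the sketch under the same assumptions, the default implementation could delegate to whichever `LogFetcher` the concrete indexer exposes:

```rust
// Continues the sketch above; `Indexer` here stands in for the real
// trait in the agent codebase.
#[async_trait]
pub trait Indexer: Send + Sync {
    /// The log source this indexer reads from.
    fn log_fetcher(&self) -> &dyn LogFetcher;

    /// Default implementation: delegate to whatever fetcher the
    /// concrete indexer exposes, so implementors get this for free.
    async fn fetch_logs(&self, from: u64, to: u64) -> eyre::Result<Vec<Log>> {
        self.log_fetcher().fetch_logs(from, to).await
    }
}
```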
For specific log fetchers:
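For example, a hypothetical RPC-backed fetcher (`RpcLogFetcher` is not an existing type; the request body is elided):

```rust
// A hypothetical RPC-backed fetcher; a GraphLogFetcher or
// EtherscanLogFetcher would implement the same LogFetcher trait.
pub struct RpcLogFetcher {
    pub endpoint: String,
}

#[async_trait]
impl LogFetcher for RpcLogFetcher {
    async fn fetch_logs(&self, from: u64, to: u64) -> eyre::Result<Vec<Log>> {
        // Would issue an eth_getLogs-style request against
        // `self.endpoint` for [from, to]; body elided in this sketch.
        let _ = (&self.endpoint, from, to);
        Ok(Vec::new())
    }
}

// Implementing Indexer then only requires naming the fetcher; the
// default fetch_logs from the trait comes along automatically.
impl Indexer for RpcLogFetcher {
    fn log_fetcher(&self) -> &dyn LogFetcher {
        self
    }
}
```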
By doing this, RpcLogFetcher automatically gets the fetch_logs behavior of the Indexer trait, and existing implementations of the Indexer trait would only need to point at their current RPC fetcher. I'm not very experienced with Rust, so this is primarily GPT output, but is this the right general direction / are there any glaring issues that pop up to more seasoned eyeballs, in terms of keeping the scope small enough for a bounty? And I guess the better question is whether the preferred implementation is one that does require a significant refactor, rather than making compromises on the design pattern of the new log fetcher to accommodate the existing agent structure. Either way, I'd be interested in seeing if I can help out on this one if there are 'bountyable' parts of a larger project; idk why but it seems to have grabbed my attention lol. edit: GPT thought a trait was a struct lol
@twilwa what kind of indexer do you think could be used instead of the RPC one once you make this change? IIUC, you still need to make all the RPC-like requests then. IMO what would be most valuable is to have something closer to the sequence-based indexer which is able to fetch already-indexed events from the Graph, an existing scraper DB, or Etherscan.
Actually a pretty salient point -- it does seem like the LogFetcher redesign essentially modularizes for the sake of modularizing and doesn't really provide new flexibility if it's still just firing RPC-like requests at the end of the day. Dug around a bit more and came up with this:

**Proposal: Modularized Fetch Mechanism for Pre-indexed Data Sources**
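The core of the proposal would be a fetch interface that doesn't care where the indexed data lives. A minimal sketch, with every name here (`PreindexedLogSource`, `LogQuery`, `Log`) a hypothetical placeholder:

```rust
use async_trait::async_trait;

// Placeholder event type, as in the earlier sketch; `LogQuery` is
// defined just below.
pub struct Log {
    pub block_number: u64,
    pub data: Vec<u8>,
}

/// Any source of already-indexed logs: the Graph, an existing
/// scraper DB, Etherscan, and so on.
#[async_trait]
pub trait PreindexedLogSource: Send + Sync {
    async fn fetch_logs(&self, query: &LogQuery) -> eyre::Result<Vec<Log>>;
}
```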
Where LogQuery can be a struct that captures all necessary parameters to fetch logs from an external source.
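For instance, a generic shape (fields are illustrative; the Etherscan-specific variant discussed further down would specialize this):

```rust
/// Generic query parameters for fetching logs from an external source.
pub struct LogQuery {
    pub contract_address: String,
    pub from_block: u64,
    pub to_block: u64,
    pub topics: Vec<String>,
}
```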
This will act as a bridge between our system and any external source. The example below sketches a hypothetical integration with one such source.
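Using Etherscan's getLogs endpoint as the example; `EtherscanLogSource` and its fields are assumptions, and decoding the response is elided:

```rust
/// Hypothetical Etherscan-backed source implementing the trait above.
pub struct EtherscanLogSource {
    pub base_url: String, // e.g. "https://api.etherscan.io/api"
    pub api_key: String,
}

#[async_trait]
impl PreindexedLogSource for EtherscanLogSource {
    async fn fetch_logs(&self, query: &LogQuery) -> eyre::Result<Vec<Log>> {
        let url = format!(
            "{}?module=logs&action=getLogs&address={}&fromBlock={}&toBlock={}&apikey={}",
            self.base_url,
            query.contract_address,
            query.from_block,
            query.to_block,
            self.api_key
        );
        // Fetch the JSON response; parsing it into Vec<Log> is elided
        // in this sketch.
        let _body = reqwest::get(&url).await?.text().await?;
        Ok(Vec::new())
    }
}
```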
Introduce a new configuration variant for the external log source.
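Something like the following, shown as a standalone enum since the actual agent config types may differ:

```rust
/// Hypothetical connection config: alongside the usual RPC URL,
/// allow pointing an agent at an external pre-indexed source.
pub enum ChainConnectionConf {
    /// The existing RPC-based connection.
    HttpUrl { url: String },
    /// A pre-indexed external source (Etherscan, the Graph, scraper DB).
    ExternalLogSource {
        kind: String,
        endpoint: String,
        api_key: Option<String>,
    },
}
```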
Modify the relevant methods in ChainConf to use this new connection configuration:
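Roughly along these lines; ChainConf's real fields and constructors differ, so this only illustrates the dispatch:

```rust
pub struct ChainConf {
    pub connection: ChainConnectionConf,
}

impl ChainConf {
    /// Pick a log source based on the connection config, keeping the
    /// RPC path as the default / fallback.
    pub fn build_log_source(&self) -> Box<dyn PreindexedLogSource> {
        match &self.connection {
            ChainConnectionConf::ExternalLogSource { endpoint, api_key, .. } => {
                Box::new(EtherscanLogSource {
                    base_url: endpoint.clone(),
                    api_key: api_key.clone().unwrap_or_default(),
                })
            }
            ChainConnectionConf::HttpUrl { url } => {
                // Fall back to the existing RPC-backed implementation
                // (construction elided in this sketch).
                unimplemented!("build the current RPC-based source for {url}")
            }
        }
    }
}
```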
Flexibility: the LogQuery struct should be designed with the possible variations in querying mechanisms across different external platforms in mind. The LogQuery struct for an external source like Etherscan could look like the following:
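Specializing the generic sketch above toward Etherscan's getLogs parameters (again, everything here is hypothetical):

```rust
/// Etherscan-flavored query; field names mirror the getLogs
/// endpoint's parameters.
pub struct LogQuery {
    pub module: String, // "logs"
    pub action: String, // "getLogs"
    pub address: String,
    pub from_block: u64,
    pub to_block: u64,
    pub api_key: String,
}

impl LogQuery {
    /// Keep the query-to-URL transformation encapsulated in the
    /// struct itself.
    pub fn to_url(&self, base_url: &str) -> String {
        format!(
            "{base_url}?module={}&action={}&address={}&fromBlock={}&toBlock={}&apikey={}",
            self.module, self.action, self.address,
            self.from_block, self.to_block, self.api_key
        )
    }
}
```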
With this design, constructing and using a LogQuery instance would be intuitive:
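For example (the address, block range, and API key are made up):

```rust
fn main() {
    let query = LogQuery {
        module: "logs".to_string(),
        action: "getLogs".to_string(),
        // Hypothetical contract address and block range.
        address: "0x0000000000000000000000000000000000000000".to_string(),
        from_block: 17_000_000,
        to_block: 17_000_500,
        api_key: "YOUR_API_KEY".to_string(),
    };
    let url = query.to_url("https://api.etherscan.io/api");
    println!("{url}");
}
```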
This structure provides a clear representation of the query parameters needed to fetch logs from Etherscan, and it can be expanded or adapted as necessary for other external platforms. The to_url method makes it easy to convert the query into a request URL, encapsulating the transformation logic within the struct itself. A bit more comprehensive, I think, but I'm not sure what impact something like that might have on the rest of the codebase / whether the scope is still appropriate for bounty-tier. edit: make prettier :3
https://github.com/google/leveldb/blob/main/doc/index.md#snapshots

I think publishing archives of the scraper/relayer DB and having the binaries accept a (partial) archive upon startup would be pretty impactful here.
Problem:
Solution:

This tracks building out these alternative indexing modalities.
Tasks
Related