
Epic: Implement alternative indexing modalities #2313

Open · 5 of 6 tasks
nambrot opened this issue Jun 1, 2023 · 18 comments

Comments

nambrot (Contributor) commented Jun 1, 2023

Problem:

  • All agents need to index messages (relayer also indexes IGP payments)
  • Death to Merkle Tree #2192 allowed relayer indexing to no longer block message processing
  • Indexing inherently costs RPC requests: polling for messages consumes RPC quota

Solution:

  • Following Easily configurable log fetchers #3268, we'll be able to modularize log fetching so that other sources of logs can be used in addition to the default RPC-based method. Note that the current indexing relies on a configurable max window that is relatively small (1999 blocks by default). Other sources can presumably support much larger windows, which would make indexing even faster.
  • Potential sources:
    • Block explorers like etherscan (can use the block explorer struct from Audit config shapes #1867)
    • Scraper Postgres Database
    • Graph protocol subgraphs
    • Sharing RocksDB data with agent operators, though this doesn't fit our paradigm as well
    • Putting messages indexed by validators in the checkpoint format that's stored in S3, and allowing relayers to rely on this
  • This could allow more frequent polling of cheaper sources while staying independent by still polling the RPC, perhaps less frequently (a rough sketch of this fallback layering follows below)
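
As a rough sketch of that fallback layering, assuming hypothetical LogSource and FallbackLogSource types (none of these names exist in the agent codebase today, and Vec<String> stands in for the real log type):

use std::ops::RangeInclusive;
use async_trait::async_trait;

/// Hypothetical abstraction over anywhere logs can come from
/// (RPC, block explorer, scraper DB, subgraph, S3 checkpoints).
#[async_trait]
pub trait LogSource: Send + Sync {
    async fn fetch_logs(&self, range: RangeInclusive<u32>) -> anyhow::Result<Vec<String>>;
}

/// Tries cheap pre-indexed sources first and falls back to the RPC source,
/// which sits last in the list so agents stay independent of third parties.
pub struct FallbackLogSource {
    sources: Vec<Box<dyn LogSource>>,
}

#[async_trait]
impl LogSource for FallbackLogSource {
    async fn fetch_logs(&self, range: RangeInclusive<u32>) -> anyhow::Result<Vec<String>> {
        let mut last_err = anyhow::anyhow!("no log sources configured");
        for source in &self.sources {
            match source.fetch_logs(range.clone()).await {
                Ok(logs) => return Ok(logs),
                // Record the failure and try the next (more expensive) source.
                Err(e) => last_err = e,
            }
        }
        Err(last_err)
    }
}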

This issue tracks building out these alternative indexing modalities.

Tasks

  1. (1 of 1) dx, permissionless (tkporter)

Related

  1. agent, bug, indexing (daniel-savu)
  2. agent, bug, indexing, tech-debt (tkporter)
  3. (0 of 1) agent, indexing (tkporter)

nambrot (Contributor, Author) commented Jun 1, 2023

@asaj @yorhodes As far as I know, log fetching can be untrusted, since agents can build the merkle tree and compare roots before using the indexed data?
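
For intuition, a toy sketch of that check: rebuild a merkle root from the fetched message IDs and accept the logs only if it matches the root a trusted RPC reports onchain. This uses a simplified pairwise tree, not Hyperlane's actual 32-depth incremental tree:

use sha3::{Digest, Keccak256};

fn hash_pair(a: &[u8; 32], b: &[u8; 32]) -> [u8; 32] {
    let mut h = Keccak256::new();
    h.update(a);
    h.update(b);
    h.finalize().into()
}

/// Computes a simple binary merkle root over message IDs; odd levels are
/// padded by duplicating the last node (toy scheme only).
fn merkle_root(mut leaves: Vec<[u8; 32]>) -> Option<[u8; 32]> {
    if leaves.is_empty() {
        return None;
    }
    while leaves.len() > 1 {
        if leaves.len() % 2 == 1 {
            leaves.push(*leaves.last().unwrap());
        }
        leaves = leaves.chunks(2).map(|p| hash_pair(&p[0], &p[1])).collect();
    }
    Some(leaves[0])
}

/// Accept logs from an untrusted source only if they reproduce the root
/// reported onchain (fetched via a trusted RPC).
fn verify_untrusted_logs(message_ids: Vec<[u8; 32]>, onchain_root: [u8; 32]) -> bool {
    merkle_root(message_ids) == Some(onchain_root)
}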

asaj (Contributor) commented Jun 1, 2023

Indeed, it's a liveness but not safety thing

asaj (Contributor) commented Jun 6, 2023

@nambrot can't we just upload a cache of the rocks DB?

tkporter (Collaborator) commented Jun 7, 2023

Fetching the DB ^ is interesting

I'm skeptical of the etherscan / postgres DB indexing. Feels like a lot of effort for a questionable improvement in a post-DTM world

We've spent a lot of time working on the quirks of indexing and I'm wary that interacting with Etherscan APIs will increase the surface area we need to maintain & be more confusing to configure

& for postgres DB, now we would need to take our scraper / DB SLA much more seriously. Would we lose the flexibility to mess with the DB / scraper and just point the explorer to a new DB without affecting anyone else?

nambrot (Contributor, Author) commented Jun 7, 2023

Just to be clear, this is not an either/or; it's the equivalent of a fallback provider for indexed data. I totally agree that with DTM the benefit becomes smaller, but I also think the cost is pretty small. This doesn't have to run perfectly and can fall back to the current mechanism, but imo it can provide relief for PI deployers so they don't have to pay up for indexing from scratch (even if it no longer blocks) on fat-head chains.

@nambrot can't we just upload a cache of the rocks DB?

Yes we could, would you prefer that over just querying the explorer API?

nambrot (Contributor, Author) commented Jul 3, 2023

Another source that we imo have not considered enough is the Graph protocol. It provides very solid indexed data with great performance and is made for querying by agents like ours. Default subgraphs can be deployed and run by anybody, and then queried by anybody. In fact, that is what the Khalani folks did (@serejke or @samtvlabs if you want to chime in).
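
For a sense of what querying a subgraph could look like from an agent, a hedged sketch posting GraphQL over HTTP; the endpoint, the dispatches entity, and its fields are purely illustrative, not an existing Hyperlane subgraph schema:

use serde_json::json;

/// Queries a hypothetical Hyperlane subgraph for messages dispatched in a
/// block range. Entity and field names are illustrative only.
async fn fetch_dispatches(endpoint: &str, from: u64, to: u64) -> anyhow::Result<serde_json::Value> {
    let query = json!({
        "query": r#"
            query($from: Int!, $to: Int!) {
                dispatches(where: { blockNumber_gte: $from, blockNumber_lte: $to }) {
                    id
                    message
                    blockNumber
                }
            }
        "#,
        "variables": { "from": from, "to": to },
    });
    let resp = reqwest::Client::new()
        .post(endpoint)
        .json(&query)
        .send()
        .await?
        .error_for_status()?
        .json::<serde_json::Value>()
        .await?;
    Ok(resp)
}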

HariSeldon23 commented

We have a bunch of Graph Subgraphs running in our Kubernetes backend. It works fairly well, although some of the debugging tools are quite poor and fixing things has caused us a lot of issues. I believe we have a bunch of updated Helm charts as we needed to make quite a few changes for some of the chains we're live on.

Happy to help with any Graph queries, as we've been using them for over 2 years.

Also, as an aside, is anyone working on stress testing theoretical validator counts and the impact on latency? This is something I was quite interested in doing: spin up a local testing environment, create 50 validators, run a bunch of messages, and check message latency; then increase to 100, check latency again, repeat, compare results, and trawl through the logs to identify bottlenecks. We're probably 8-12 weeks away from having our infra ready for this, but at some point I'd like to do it.

nambrot (Contributor, Author) commented Jul 4, 2023

Not really. For most of the use cases we have been discussing with folks, 50 validators would actually be quite gas-intensive; that's where alternate signature schemes would probably be worth it.

serejke (Contributor) commented Jul 4, 2023

I believe the Graph protocol could be a good replacement for the in-house Hyperlane indexing solution, or an alternative for the brave ones. Setting up and maintaining a subgraph is quite time-consuming, but well documented. A subgraph also consumes a lot of RPC quota, though, especially if misconfigured or running unnecessary subgraphs.

We faced an issue with Hyperlane indexing where, during RPC outages (Infura recently had Mumbai downtime), the Hyperlane agents ate up the entire daily quota. With subgraphs, we can hope that a bigger community has handled indexing edge cases more thoroughly.

Spinning up a Graph node only for the purpose of fallback indexing data seems a bit of overkill. Alternatively, we could simply post batches of messages to S3.
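
A minimal sketch of that post-batches-to-S3 idea, assuming the aws-sdk-s3 crate and an illustrative key scheme (this does not match the validators' existing checkpoint layout):

use aws_sdk_s3::{primitives::ByteStream, Client};

/// Uploads a JSON-serialized batch of raw messages under a key derived
/// from the block range it covers, e.g. "messages/100-199.json".
/// Messages are serialized as JSON byte arrays here purely for simplicity.
async fn upload_message_batch(
    client: &Client,
    bucket: &str,
    from_block: u64,
    to_block: u64,
    messages: &[Vec<u8>],
) -> anyhow::Result<()> {
    let body = serde_json::to_vec(messages)?;
    client
        .put_object()
        .bucket(bucket)
        .key(format!("messages/{from_block}-{to_block}.json"))
        .body(ByteStream::from(body))
        .send()
        .await?;
    Ok(())
}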

We run subgraphs for Balancer and for our Axon chain (indexing block data for the explorer) and are mostly fine with it. We've recently moved the deployment to EKS and use RDS for storing subgraph data.

avious00 moved this to Backlog in Hyperlane Tasks Sep 14, 2023
avious00 changed the title from "Agent Indexing takes time and can be costy" to "Agent Indexing takes time and can be costly" Sep 14, 2023
yorhodes (Member) commented Sep 15, 2023

With subgraphs, we can hope that a bigger community has handled indexing edge cases more thoroughly

I think @aroralanuk and I are very aligned with this direction

aroralanuk added a commit that referenced this issue Sep 18, 2023
### Description

- Adding "InsertedIntoEvent" for standalone indexing of events by the
agents

### Drive-by changes

None

### Related issues

#2313 

### Backward compatibility

Yes

### Testing

Unit tests

yorhodes (Member) commented

https://www.streamingfast.io/ has support for Ethereum, Solana, and Cosmos.
I did a quick weekend hack (https://github.com/yorhodes/substreams-template) and it seems like it has the right combination of Rust support, accessible configuration, and tip-streaming optimizations.

tkporter (Collaborator) commented

I think we should turn this into an epic rather than keep it small or a bounty.

There are structural changes needed to the agent codebase, and the solution here is very unclear.

twilwa commented Sep 20, 2023

i had some thoughts about trying to do it without needing to make changes, or at least significant ones, to the rest of the codebase. if all instantiations of the indexer needed to be updated and tested, i would think it probably should be an epic. but if, as @nambrot mentioned above, the modular behavior could be added as an option, with the indexer falling back to its previously defined functionality when passed no args about the log source, hopefully one could avoid significant refactors.

here's a proposed solution:

Define the LogFetcher trait:

use std::ops::RangeInclusive;
use async_trait::async_trait;

#[async_trait] // async so the blanket Indexer impl below can await it
pub trait LogFetcher<T: Sized>: Send + Sync {
    async fn fetch_logs(&self, range: RangeInclusive<u32>) -> ChainResult<Vec<(T, LogMeta)>>;
}

Define a default implementation for the fetch_logs method of the Indexer trait:
This will serve as the fallback method for any type that implements the Indexer trait but does not override the fetch_logs method.

#[async_trait]
impl<T, F> Indexer<T> for F
where
    T: Sized,
    F: LogFetcher<T> + Send + Sync + Debug,
{
    async fn fetch_logs(&self, range: RangeInclusive<u32>) -> ChainResult<Vec<(T, LogMeta)>> {
        LogFetcher::fetch_logs(self, range).await
    }
    // ... Keep the other methods as is.
}

For specific log fetchers:
For types that want to use a specific log fetching mechanism, they can implement the LogFetcher trait and leverage the default Indexer trait implementation.

pub struct RpcLogFetcher;

#[async_trait]
impl LogFetcher<MyType> for RpcLogFetcher {
    async fn fetch_logs(&self, range: RangeInclusive<u32>) -> ChainResult<Vec<(MyType, LogMeta)>> {
        // Implement log fetching using the RPC-based method.
        todo!()
    }
}

By doing this, RpcLogFetcher automatically gets the fetch_logs behavior of the Indexer trait.
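
For illustration, usage through the blanket impl (inside an async context) would look something like this; MyType and the surrounding types are the placeholders from the sketch above:

// Hypothetical usage: the concrete fetcher is treated as an Indexer
// without any extra code, via the blanket impl.
let fetcher = RpcLogFetcher;
let logs = Indexer::<MyType>::fetch_logs(&fetcher, 0u32..=100).await?;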

For existing implementations of the Indexer trait:
These will continue to use their current fetch_logs logic since they've already provided an implementation. New implementations can choose whether to provide their own logic or use a specific log fetching mechanism via the LogFetcher trait.

I'm not very experienced with rust, so this is primarily GPT output, but is this the right general direction / are there any glaring issues that pop up to more seasoned eyeballs, in terms of keeping the scope small enough for a bounty? And I guess the better question is whether the preferred implementation is one that does require a significant refactor, rather than making compromises on the design pattern of the new log fetcher to accommodate the existing agent structure?

either way, would be interested in seeing if i can help out on this one if there are 'bountyable' parts of a larger project, idk why but it seems to have grabbed my attention lol

edit: gpt thought a trait was a struct lol

avious00 (Contributor) commented

hey @twilwa / yikes, just saw your comment. let me get @nambrot and one of the engineers to weigh in here on your approach, open to a bounty if it makes sense! always appreciate you diving in

nambrot (Contributor, Author) commented Sep 28, 2023

@twilwa what kind of indexer do you think could be used instead of the RPC one once you make this change? IIUC, you'd still need to make all the RPC-like requests. IMO what would be most valuable is something closer to the sequence-based indexer, which could fetch already-indexed events from the Graph, an existing scraper DB, or Etherscan.

twilwa commented Sep 28, 2023

actually a pretty salient point -- it does seem like the logfetcher redesign essentially modularizes for the sake of modularizing/doesn't really provide new flexibility if it's still just firing RPC-like requests at the end of the day. Dug around a bit more, came up with this:

Proposal: Modularized Fetch Mechanism for Pre-indexed Data Sources
Objective: Extend our current chain interaction mechanism to fetch logs from external pre-indexed data sources, making our system more adaptable and reducing dependency on direct RPC calls.

  1. Designing the ExternalLogProvider Interface:
// Interface for any external log provider.
#[async_trait]
pub trait ExternalLogProvider: Send + Sync + Debug {
    async fn fetch_logs(&self, query: LogQuery) -> ChainResult<Vec<Log>>;
}

Where LogQuery can be a struct that captures all necessary parameters to fetch logs from an external source.

  2. Implementing the Generic External Log Provider:

This will act as a bridge between our system and any external source. This example demonstrates a hypothetical integration with an external source.

#[derive(Debug)] // the ExternalLogProvider trait requires Debug
struct GenericExternalProvider {
    endpoint: String,
    // ... other configuration options
}

#[async_trait]
impl ExternalLogProvider for GenericExternalProvider {
    async fn fetch_logs(&self, query: LogQuery) -> ChainResult<Vec<Log>> {
        // Make an HTTP request (or use another transport) to fetch logs
        // from the external source using `self.endpoint` and `query`,
        // then return them.
        todo!()
    }
}

  3. Integrating with the System:

Introduce a new configuration variant for the external log source.

pub enum ChainConnectionConf {
    // ... existing variants
    External(GenericExternalProvider),
}

Modify the relevant methods in ChainConf to use this new connection configuration:

impl ChainConf {
    // ...

    /// Try to convert the chain settings into a message indexer.
    pub async fn build_message_indexer(
        &self,
        metrics: &CoreMetrics,
    ) -> Result<Box<dyn SequenceIndexer<HyperlaneMessage>>> {
        let ctx = "Building message indexer";
        let locator = self.locator(self.addresses.mailbox);

        match &self.connection {
            // ... existing match arms
            ChainConnectionConf::External(provider) => {
                // A SequenceIndexer implementation here would call
                // `provider.fetch_logs(/* relevant query params */)` per
                // range and process the returned logs as necessary.
                todo!()
            }
        }
        .context(ctx)
    }

    // ...
}

  4. Future Considerations:

  • Flexibility: The LogQuery struct should be designed with the variations in querying mechanisms across different external platforms in mind.
  • Performance: While this draft focuses on functionality, performance should be addressed later, possibly via caching mechanisms.
  • Trust: Verification mechanisms should be put in place to ensure the authenticity of fetched logs.

Conclusion: This proposal provides a modularized mechanism to fetch logs from any pre-indexed data source, reducing our system's dependency on specific chains or platforms. By introducing a generic interface and provider, we can easily plug in different data sources as needed, enhancing flexibility and adaptability.

the LogQuery struct for an external source like Etherscan could look like the following:

struct LogQuery {
    /// The module, for Etherscan it will be "logs".
    module: String,

    /// The action, for Etherscan it will be "getLogs".
    action: String,

    /// Contract address to fetch logs for.
    address: H256,

    /// Block number to start fetching logs from.
    from_block: u32,

    /// Block number to end fetching logs at.
    to_block: u32,

    /// Pagination details: Page number.
    page: u32,

    /// Pagination details: How many logs to fetch in one request.
    offset: u32,

    /// The API key for the external service.
    api_key: String,

    // ... (any other necessary fields)
}

impl LogQuery {
    /// For Etherscan, this method can return a URL with the query parameters set.
    fn to_url(&self, base_url: &str) -> String {
        format!(
            "{base_url}?module={module}&action={action}&address={address}&fromBlock={from_block}&toBlock={to_block}&page={page}&offset={offset}&apikey={api_key}",
            base_url=base_url,
            module=self.module,
            action=self.action,
            address=self.address,
            from_block=self.from_block,
            to_block=self.to_block,
            page=self.page,
            offset=self.offset,
            api_key=self.api_key
        )
    }
}

With this design, constructing and using a LogQuery instance would be intuitive:

let query = LogQuery {
    module: "logs".to_string(),
    action: "getLogs".to_string(),
    // H256 is 32 bytes, so the 20-byte address is left-padded with zeros.
    address: "0x000000000000000000000000bd3531da5cf5857e7cfaa92426877b022e612cf8"
        .parse()
        .expect("valid H256"),
    from_block: 12878196,
    to_block: 12878196,
    page: 1,
    offset: 1000,
    api_key: "YourApiKeyToken".to_string(),
};

let url = query.to_url("https://api.etherscan.io/api");
// Now use this URL to fetch logs from Etherscan.

This structure provides a clear representation of the query parameters needed to fetch logs from Etherscan. It can be expanded or adapted as necessary for other external platforms. The to_url method makes it easy to convert the query into a request URL, encapsulating the transformation logic within the struct itself.

a bit more comprehensive, i think, but not sure what impact something like that might have on the rest of the codebase/whether scope's still appropriate for bounty-tier

edit: make prettier :3

tkporter mentioned this issue Oct 4, 2023

yorhodes (Member) commented

https://github.com/google/leveldb/blob/main/doc/index.md#snapshots I think publishing archives of the scraper/relayer DB and having the binaries accept a (partial) archive upon startup would be pretty impactful here
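
A hedged sketch of what accepting a published archive on startup could look like: download, unpack into the agent's DB path before opening the database, and let normal indexing resume from the snapshot's height (the URL, layout, and function name are all hypothetical):

use std::path::Path;

/// Downloads a published database archive and unpacks it into `db_path`
/// before the agent opens its database, so indexing resumes from the
/// snapshot instead of from scratch.
async fn bootstrap_from_snapshot(archive_url: &str, db_path: &Path) -> anyhow::Result<()> {
    let bytes = reqwest::get(archive_url).await?.error_for_status()?.bytes().await?;
    // Assumes a gzipped tarball; adjust for the actual archive format.
    let decoder = flate2::read::GzDecoder::new(bytes.as_ref());
    tar::Archive::new(decoder).unpack(db_path)?;
    Ok(())
}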

tkporter removed the small bounty size label Feb 5, 2024
tkporter changed the title from "Agent Indexing takes time and can be costly" to "Alternative indexing modalities" Feb 22, 2024
tkporter changed the title from "Alternative indexing modalities" to "Implement alternative indexing modalities" Feb 22, 2024

tkporter (Collaborator) commented

See #3281 for changes to indexing - I'm going to change this issue to track specifically implementing alternative indexing modalities (as the groundwork for supporting this is covered in issues tracked by #3281, like #3268)

nambrot changed the title from "Implement alternative indexing modalities" to "Epic: Implement alternative indexing modalities" Feb 27, 2024
nambrot added the epic label Feb 27, 2024
Projects
Status: Backlog
Development

No branches or pull requests

8 participants