Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agent docs/config improvements after EthTokyo #2113

Closed
nambrot opened this issue Apr 18, 2023 · 8 comments · Fixed by #2148
Closed

Agent docs/config improvements after EthTokyo #2113

nambrot opened this issue Apr 18, 2023 · 8 comments · Fixed by #2148
Assignees
Labels
agent permissionless permissionless deployments PI

Comments

@nambrot
Copy link
Contributor

nambrot commented Apr 18, 2023

Problems:
- A lot of configuration fatigue
- missing field validator is too vague
- No such file or directory (os error 2) is a frustrating error
- Machine specs section on starting a validator looks out of place
- Error: Destination chain base does not have a configured signer feels not as actionable as we can be
- Missing chain configuration for sepolia; this includes protocol type and the connection information is quite vague
- in the relayer guide, similar to the validator guide, the specific sequence is a bit unclear. Going page by page gets me to message filtering which i should not need to check out. The Env page does again not distinguish between required and optional vars and does not tell me what the next step is, when i am done
- Way too easy to miss HYP_RELAYER_ALLOWLOCALCHECKPOINTSYNCERS
- Not mounting the DB_PATH will result in relayers needing to index from genesis every time which is very annoying
- When running two relayers locally, it is possible to actually have both read/write to the same DB which is very hard to debug and obviously does not work
-  "no name found" - typo results in weird message
- When restarting anvil, previous artifacts are no longer valid, but very hard to debug

Possible Solutions (not exhaustive):
- Collapse running validator/relayer into a single page as much as possible
- Default validator interval to something like 5s
- Default HYP_BASE_METRICS (maybe already done)
- Default HYP_BASE_TRACING_LEVEL and FORMAT (maybe already done)
- Segment the environment variables/config into required and optional ones (i.e. ones that have reasonable defaults should be further down)
- “missing field validator” should have a more useful message indicate to specify the validator key
- When using localStorage checkpoint synced, just create the directory if it doesn’t exist
- When signers are missing, point to default signer as much as possible
- If an RPC url is missing, call it out as such, don’t say chain configuration is missing with protocol type and connection information
- HYP_RELAYER_ALLOWLOCALCHECKPOINTSYNCERS needs to be made more obvious. When it’s not enabled but we discover a local storage checkpoint synced, we should log an info log line to indicate that this is the issue why metadata could not be fetched
- Indicate that relayers when using docker should mount external storage for DB_PATH
- Detect when the agent runs on a DB for which it was not configured (i.e. its the DB of some other agents)
- "no name found" - should have a more clear error message that they probably have a typo in the RPC url env var

@nambrot nambrot converted this from a draft issue Apr 18, 2023
@nambrot nambrot added agent permissionless permissionless deployments PI labels Apr 18, 2023
@nambrot nambrot moved this from On Deck to Sprint in Hyperlane Tasks Apr 19, 2023
@mattiekat mattiekat moved this from Sprint to In Progress in Hyperlane Tasks Apr 20, 2023
@mattiekat
Copy link
Contributor

Current default if the metrics port is not defined is to just not host the metrics webserver at all which we could change but then there might be conflicts if they try to run multiple agents at once. I think this also fine because I would assume metrics are more of an "Advanced configuration" piece that is not really relevant for getting something up and running at first.

I am also generally cutting down the configuration that is "required" since we now have the reference page for anyone who wants get more into the weeds.

tracing config (level and fmt) already default so removing them from the guide (still in the reference).

Not sure where "Missing field validator" came from, I think that is an old message now.

With the most recent release, the default database directory now includes the chain name, so hopefully that will prevent the conflicts between relayers. Adding docs though to help clarify for people that specify it manually.

@mattiekat
Copy link
Contributor

mattiekat commented Apr 26, 2023

TODO: update metric port to default to 9090 and log a warning if we cannot start the metrics server due to a conflict.

@mattiekat
Copy link
Contributor

Not sure where "no name found" comes from; any additional context?

When restarting anvil, previous artifacts are no longer valid, but very hard to debug
What is the desired change for this? I can add a message somewhere but not sure how to detect this case.

@mattiekat
Copy link
Contributor

TODO: Indicate that relayers when using docker should mount external storage for DB_PATH
TODO: Detect when the agent runs on a DB for which it was not configured (i.e. its the DB of some other agents)

@nambrot
Copy link
Contributor Author

nambrot commented Apr 28, 2023

Not sure where "Missing field validator" came from, I think that is an old message now.

I believe this happens when you don't specify the validator signer? Or what is the error message now?

With the most recent release, the default database directory now includes the chain name, so hopefully that will prevent the conflicts between relayers. Adding docs though to help clarify for people that specify it manually.

Oh I like that

@nambrot
Copy link
Contributor Author

nambrot commented Apr 28, 2023

Not sure where "no name found" comes from; any additional context?

If you specify a connection URL with a typo in the chain name, imo that is the error message that (used to) pop up (which indeed is hard to tell where it comes from :))

When restarting anvil, previous artifacts are no longer valid, but very hard to debug
What is the desired change for this? I can add a message somewhere but not sure how to detect this case.

You can make an eth_getCode call at the contract addresses, and if it doesnt return anything it means the contract does not exist, and I would probably log it as an error. You can just do this maybe when the call to the mailbox root reverts

@mattiekat
Copy link
Contributor

Went ahead and made it so the database officially works with multiple domains (technically it already did but it was poorly documented and was implemented as a hack (before my time)). I figure this will also help with #1899

@mattiekat
Copy link
Contributor

Not sure where "Missing field validator" came from, I think that is an old message now.

I believe this happens when you don't specify the validator signer? Or what is the error message now?

It should now just default to SignerConf::Node

Not sure where "no name found" comes from; any additional context?

If you specify a connection URL with a typo in the chain name, imo that is the error message that (used to) pop up (which indeed is hard to tell where it comes from :))

Not going to worry about this for the moment with #1867 changing how that works anyway. Not honestly sure what I can do anyway since if they put in the wrong chain name it will have the wrong path. I can't fix that but it should at least list the path it tried to follow.

When restarting anvil, previous artifacts are no longer valid, but very hard to debug
What is the desired change for this? I can add a message somewhere but not sure how to detect this case.

You can make an eth_getCode call at the contract addresses, and if it doesnt return anything it means the contract does not exist, and I would probably log it as an error. You can just do this maybe when the call to the mailbox root reverts

I do not see a good place to change this in the code. The mailbox is constructed without checking the address it looks like and then there's just a bunch of different calls it supports. And at any of the points the error could be something other than an invalid mailbox.

@mattiekat mattiekat moved this from In Progress to In Review in Hyperlane Tasks Apr 28, 2023
mattiekat added a commit that referenced this issue May 10, 2023
### Description

Bunch of minor tweaks and updates based on what we learned from
EthTokyo.

- Updated Rust readme and the root readme to align with the hyperlane
docs
- Defaults to a 5 sec validator interval now
- Makes the metrics port default to `9090` and have a soft failure if it
can't start
- Now will automatically create the checkpointsyncer directory if it is
missing
- Hyperlane database now officially uses the domain name and id as a
prefix internally to support multiple relayers using the same database
- 

### Drive-by changes

Refactored and cleaned up the database code. Removed a lot of
duplicate/unnecessary/redundant functions.

### Related issues

- Fixes #2113 

### Backward compatibility

_Are these changes backward compatible?_

Yes

_Are there any infrastructure implications, e.g. changes that would
prohibit deploying older commits using this infra tooling?_

Yes - This changes the database prefixes internally to use domain name
and domain id; also prepares us for #1899


### Testing

_What kind of testing have these changes undergone?_

Manual
Unit Tests
@github-project-automation github-project-automation bot moved this from In Review to Done in Hyperlane Tasks May 10, 2023
vinhyenvodoi98 pushed a commit to vinhyenvodoi98/hyperlane-monorepo that referenced this issue May 10, 2023
### Description

Bunch of minor tweaks and updates based on what we learned from
EthTokyo.

- Updated Rust readme and the root readme to align with the hyperlane
docs
- Defaults to a 5 sec validator interval now
- Makes the metrics port default to `9090` and have a soft failure if it
can't start
- Now will automatically create the checkpointsyncer directory if it is
missing
- Hyperlane database now officially uses the domain name and id as a
prefix internally to support multiple relayers using the same database
- 

### Drive-by changes

Refactored and cleaned up the database code. Removed a lot of
duplicate/unnecessary/redundant functions.

### Related issues

- Fixes hyperlane-xyz#2113 

### Backward compatibility

_Are these changes backward compatible?_

Yes

_Are there any infrastructure implications, e.g. changes that would
prohibit deploying older commits using this infra tooling?_

Yes - This changes the database prefixes internally to use domain name
and domain id; also prepares us for hyperlane-xyz#1899


### Testing

_What kind of testing have these changes undergone?_

Manual
Unit Tests
vinhyenvodoi98 pushed a commit to vinhyenvodoi98/hyperlane-monorepo that referenced this issue May 11, 2023
### Description

Bunch of minor tweaks and updates based on what we learned from
EthTokyo.

- Updated Rust readme and the root readme to align with the hyperlane
docs
- Defaults to a 5 sec validator interval now
- Makes the metrics port default to `9090` and have a soft failure if it
can't start
- Now will automatically create the checkpointsyncer directory if it is
missing
- Hyperlane database now officially uses the domain name and id as a
prefix internally to support multiple relayers using the same database
-

### Drive-by changes

Refactored and cleaned up the database code. Removed a lot of
duplicate/unnecessary/redundant functions.

### Related issues

- Fixes hyperlane-xyz#2113

### Backward compatibility

_Are these changes backward compatible?_

Yes

_Are there any infrastructure implications, e.g. changes that would
prohibit deploying older commits using this infra tooling?_

Yes - This changes the database prefixes internally to use domain name
and domain id; also prepares us for hyperlane-xyz#1899

### Testing

_What kind of testing have these changes undergone?_

Manual
Unit Tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent permissionless permissionless deployments PI
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants