Benchmark nv23 migration #12128

Closed
4 of 5 tasks
Tracked by #11939
jennijuju opened this issue Jun 20, 2024 · 5 comments

Comments

@jennijuju
Member

jennijuju commented Jun 20, 2024

  • Running the migration in "offline" mode.
  • Running the migration in "online" mode -- this means letting the premigration happen while the lotus daemon is syncing the chain, making sure the time and memory usage are okay
  • Param finalization: based on time and memory usage of the experiments, we need to settle on the correct number of and epochs for premigration
  • Expected durations (premigration and migration) and memory requirements are added to Lotus CHANGELOG.
  • Get an archival node to run the migration. (Moved to the master tracking doc)
@BigLep
Member

BigLep commented Jun 21, 2024

@jennijuju : below are some questions for a newbie. I'm fine to get a verbal brain dump if that is quicker for you (and I can then expand the text). Here are the questions I have:

Update go-state-types

Does that mean updating lotus to use the latest go-state-types release (as part of updating lotus dependencies)?

Migration Integration

Is there an example PR of this in the past to refer to?

rerunning the migration using lotus-shed with the latest RC

Do we have docs on this (e.g., sample commands that have been run in the past)? It's fine if not, but I would want to capture this. My thinking is we need to capture what we do manually today so we can have multiple people do it and potentially automate.

Ensuring correctness (run state invariants check)

What are the commands to run?

and that it's fast enough

Do we have more guidance on what "fast enough" means?

Running the migration in "online" mode

What network do we do this on, and what hardware?

making sure the time and memory usage are okay

Does this mean that there should be no degradation when the migration is happening?

Param finalization

Where does this get proposed/discussed? (Is there an example post from the past?)
Where does this get announced? (Is there an example post from the past?)


General questions:

  1. When in the network rollout should this happen? Is it the Butterfly phase, the Calibration phase?

@jennijuju
Member Author

Migration Integration

The basic one is part of the go-state-types upgrade skeleton, and different FIPs may have different needs. We have already done this for nv23.

rerunning the migration using lotus-shed

No docs, it's a CLI: lotus-shed migrate-state --repo=<> <network version>

What are the commands to run?

./lotus-shed migrate-state --check-invariants (probably need to update the invariants first)

Do we have more guidance on what "fast enough" means?

For this upgrade I expect it to take less than 5 sec, if not ~1 sec. For more complicated migrations we would like the final migration to be within the 10-20 sec range, because anything beyond that is very close to the block time: miners may lose blocks, or network-wise we get null blocks.

What network do we do this on, and what hardware?

Mainnet, because it has the most state. Whatever machine folks are using to sync the chain is sufficient for this (some may be slower, some may be faster, but that is part of the benchmark gathering, imho).

Does this mean that there should be no degradation when the migration is happening?

For this one, yes.

When in the network rollout should this happen? Is it the Butterfly phase, the Calibration phase?

We can do one as soon as we have most of the code needed; we definitely should do one before the final calibration release to catch any state invariant mismatch that may point to bugs.
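To make the "fast enough" guidance above concrete, here is a minimal sketch that compares a measured final-migration duration against a time budget. The 30-second value is Filecoin's epoch (block) time. The function name `classify_migration` and the 20-second comfort threshold (the upper end of the range quoted above) are illustrative assumptions for this thread, not anything Lotus defines.

```python
# Hypothetical helper: classify a final-migration duration against the
# block-time budget discussed above. Names and thresholds are assumptions.

BLOCK_TIME_SECONDS = 30.0      # Filecoin epoch duration
COMFORT_BUDGET_SECONDS = 20.0  # upper end of the 10-20 sec range (assumed cut-off)


def classify_migration(duration_seconds: float) -> str:
    """Return a coarse label for how a migration duration relates to block time."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if duration_seconds <= COMFORT_BUDGET_SECONDS:
        return "ok"
    if duration_seconds < BLOCK_TIME_SECONDS:
        return "close-to-block-time"
    return "exceeds-block-time"


if __name__ == "__main__":
    for seconds in (1.0, 15.0, 25.0, 45.0):
        print(f"{seconds:5.1f}s -> {classify_migration(seconds)}")
```

A label of "exceeds-block-time" corresponds to the risk described above: miners may lose blocks or the network may see null blocks.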

@rjan90
Contributor

rjan90 commented Jul 18, 2024

Running the migration benchmark on a lower powered machine. (Spec: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz, 128GiB RAM, SSD)

In offline mode following the tutorial here:

./lotus-shed migrate-state --repo=/mnt/lotuschain/.lotus 23 bafy2bzaced52k4dqvalwtrqramxvzwsgfcyqwqlipaj6s7zsdd2mcuvmq3y4c
-----
completed round actual (without cache), took  54.465324442s
-----
completed premigration, took  41.727872627s
completed round actual (with cache), took  28.820865108s

Max memory usage observed during the migration was 23GiB.
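For anyone collecting these numbers across several machines, a small sketch like the following could pull the per-stage durations out of the output above. It assumes the lines keep the `completed <stage>, took <seconds>s` shape shown in this comment and that durations are printed as plain seconds; a run longer than a minute might be printed in a different form (for example with a minutes component), which this sketch does not handle. The function name `parse_durations` is made up for illustration.

```python
import re

# Matches lines such as:
#   completed premigration, took  41.727872627s
# Only plain-seconds durations are handled (assumption based on the output above).
TOOK_RE = re.compile(
    r"^completed (?P<stage>.+?), took\s+(?P<seconds>\d+(?:\.\d+)?)s\s*$"
)


def parse_durations(output: str) -> dict[str, float]:
    """Map each reported stage name to its duration in seconds."""
    durations: dict[str, float] = {}
    for line in output.splitlines():
        match = TOOK_RE.match(line.strip())
        if match:
            durations[match.group("stage")] = float(match.group("seconds"))
    return durations


# Output copied from the offline run reported in this comment.
SAMPLE = """\
-----
completed round actual (without cache), took  54.465324442s
-----
completed premigration, took  41.727872627s
completed round actual (with cache), took  28.820865108s
"""

if __name__ == "__main__":
    for stage, seconds in parse_durations(SAMPLE).items():
        print(f"{stage}: {seconds:.1f}s")
```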

In online mode following the tutorial here:

Pre-Migration in "online-mode" on the same hardware as mentioned above:

tail -10000 lotusdaemon.log | grep migration
2024-07-22T08:05:00.242+0200	WARN	statemgr	stmgr/forks.go:250	STARTING pre-migration
2024-07-22T08:05:00.397+0200	INFO	fil-consensus	filcns/upgrades.go:2740	Creating migration jobs
2024-07-22T08:05:46.714+0200	INFO	fil-consensus	filcns/upgrades.go:2740	Done creating 3157950 migration jobs after 46.317066454s
2024-07-22T08:06:03.240+0200	WARN	statemgr	stmgr/forks.go:263	COMPLETED pre-migration	{"duration": 62.997402954}

And the actual migration in "online-mode":

2024-07-22T09:05:32.320+0200	WARN	statemgr	stmgr/forks.go:202	STARTING migration	{"height": "4110850", "from": "bafy2bzacea6oelpohhmscmqwkko5pmus2sccne6fht4sfeqyvui22qkf7hqc6"}
2024-07-22T09:05:32.321+0200	INFO	fil-consensus	filcns/upgrades.go:2740	Creating migration jobs
2024-07-22T09:06:07.012+0200	INFO	fil-consensus	filcns/upgrades.go:2740	Done creating 3158117 migration jobs after 34.691519164s
2024-07-22T09:06:13.894+0200	WARN	statemgr	stmgr/forks.go:211	COMPLETED migration	{"height": "4110850", "from": "bafy2bzacea6oelpohhmscmqwkko5pmus2sccne6fht4sfeqyvui22qkf7hqc6", "to": "bafy2bzacecshskvmz5yrpezdzlxmcti36jqdzu255nzv6pakod6xscevgexrg", "duration": 41.574212928}

Max memory usage observed during the migration was 30GiB.
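The daemon log lines above can also be cross-checked mechanically. The sketch below assumes each line is tab-separated as timestamp, level, logger, caller, message, and an optional trailing JSON object, which is how the pasted lines appear; that layout is inferred from this comment rather than taken from Lotus documentation. The helper names `parse_log_line` and `elapsed_seconds` are hypothetical.

```python
import json
from datetime import datetime


def parse_log_line(line: str) -> dict:
    """Split one tab-separated log line into its parts (layout is an assumption)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 5:
        raise ValueError(f"unexpected log line layout: {line!r}")
    return {
        "time": datetime.strptime(parts[0], "%Y-%m-%dT%H:%M:%S.%f%z"),
        "level": parts[1],
        "logger": parts[2],
        "caller": parts[3],
        "message": parts[4],
        # The trailing JSON object is only present on some lines.
        "fields": json.loads(parts[5]) if len(parts) > 5 else {},
    }


def elapsed_seconds(start_line: str, end_line: str) -> float:
    """Wall-clock seconds between two log lines, from their timestamps."""
    start = parse_log_line(start_line)["time"]
    end = parse_log_line(end_line)["time"]
    return (end - start).total_seconds()


# Two lines copied from the pre-migration log above.
SAMPLE = [
    "2024-07-22T08:05:00.242+0200\tWARN\tstatemgr\tstmgr/forks.go:250\tSTARTING pre-migration",
    '2024-07-22T08:06:03.240+0200\tWARN\tstatemgr\tstmgr/forks.go:263\tCOMPLETED pre-migration\t{"duration": 62.997402954}',
]

if __name__ == "__main__":
    reported = parse_log_line(SAMPLE[1])["fields"]["duration"]
    measured = elapsed_seconds(SAMPLE[0], SAMPLE[1])
    print(f"reported={reported:.3f}s measured={measured:.3f}s")
```

Comparing the reported `duration` field with the timestamp difference is a cheap sanity check that the STARTING and COMPLETED lines belong to the same run.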

@rjan90 rjan90 moved this from 🥞 Todo to 🏃 In Progress in nv23 Track Board Jul 22, 2024
@rjan90
Contributor

rjan90 commented Jul 22, 2024

Expected durations (premigration and migration) and memory requirements are added to Lotus CHANGELOG.

Updated numbers/memory requirements in the changelog here: 33d0861

I have moved the "Get an archival node to run the migration" task to a separate step in https://docs.google.com/document/d/11jN9E4IcgcbU_6acAIHuh29v7yGFj7OHiiGYpoUuSEs/edit, as we will have to ask one of our partners to run this until ArchiOz is up and running.

@rjan90 rjan90 moved this from 🏃 In Progress to ✅ Done in nv23 Track Board Jul 22, 2024
@rjan90
Contributor

rjan90 commented Jul 22, 2024

Closing as completed.

@rjan90 rjan90 closed this as completed Jul 22, 2024