
ProveCommit aggregate SysErrOutOfGas when publishing and no aggregate #7002

Closed
7 tasks done
f8-ptrk opened this issue Aug 7, 2021 · 16 comments
Labels
area/sealing kind/bug Kind: Bug P1 P1: Must be resolved
Milestone

Comments

@f8-ptrk
Contributor

f8-ptrk commented Aug 7, 2021

Checklist

  • This is not a security-related bug/issue. If it is, please follow the security policy.
  • This is not a question or a support request. If you have any lotus-related questions, please ask in the lotus forum.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not an enhancement request. If it is, please file an improvement suggestion instead.
  • I have searched on the issue tracker and the lotus forum, and there is no existing related issue or discussion.
  • I am running the latest release, the most recent RC (release candidate) for the upcoming release, or the dev branch (master), or have an issue updating to any of these.
  • I did not make any code changes to lotus.

Lotus component

lotus miner/worker - sealing

Lotus Version

Daemon:  1.11.1-rc2+mainnet+git.40449f1cc+api1.2.0
Local: lotus-miner version 1.11.1-rc2+mainnet+git.40449f1cc

Describe the Bug

When aggregating ProveCommits is turned on, the miner manually publishes them via lotus-miner sectors batching commit --publish-now, and the baseFee is under the threshold for aggregating, some of the ProveCommit messages sent to the chain run into a SysErrOutOfGas error.

https://filfox.info/en/address/f0418632

See the messages sent at height 999926.


lotus-miner sectors batching commit --publish-now
Batch 0:
	Message: bafy2bzacecbvw2lcwrdmi55capbjzlciit3xkqxx5ixvp7irees4ygi262jx4
	Sectors:
		10587	OK
Batch 1:
	Message: bafy2bzaceausw5bjdki7vudz4b75bqtqxwcy4peakmgachhcvroi2kwqmychk
	Sectors:
		10605	OK
Batch 2:
	Message: bafy2bzaced3idm4niplsbgfg3x6vkb3pdt4gx64366dvu33newxoejxjlas56
	Sectors:
		10586	OK
Batch 3:
	Message: bafy2bzacedw6ulmmh47mzirpck4jicceijhe4h7z26znaatumpnogrvp4brps
	Sectors:
		10594	OK
Batch 4:
	Message: bafy2bzaceanamcrmpalwrpnejp6fdhhiamlgemzpvb65w2ylfixivoqfzophu
	Sectors:
		10568	OK
Batch 5:
	Message: bafy2bzaceb4ahkxghi6x23bid6hjiqkf6pj4lyaduh5pbtduicktekr2q375k
	Sectors:
		10585	OK
Batch 6:
	Message: bafy2bzaceaxrd4bmrhfxwvk26sf7n5z4vlo3mbi3jgynhrwiq5rb4g6sceoqo
	Sectors:
		10618	OK
Batch 7:
	Message: bafy2bzacedvymkvuqipc3t3gwunvs5ymbhjpfpps7zt756vlw4gazqzfpo3da
	Sectors:
		10611	OK
Batch 8:
	Message: bafy2bzacebjznrap4kou5uvpczs25wsvtgkhcmgha24thooh6uvty7au2c4n6
	Sectors:
		10596	OK
Batch 9:
	Message: bafy2bzacea3ey56iweqkqtnn7fsfuctewejfj6ykvraydynpevx5vgo4j6vtk
	Sectors:
		10608	OK
Batch 10:
	Message: bafy2bzacea5wsceekjzk3oennndzz4hvmcfjnzc7tudks33nzpawlge5lc7z2
	Sectors:
		10589	OK
Batch 11:
	Message: bafy2bzacedl2rpaecb6hxrmemtqsrqcql4fbcug7salcbzucyluegfxytkgku
	Sectors:
		10590	OK
Batch 12:
	Message: bafy2bzacedqjujqlygjy4bupbe5goxmipi7rizvidvnx6jjgpy5mzxxtyafom
	Sectors:
		10602	OK
Batch 13:
	Message: bafy2bzacedcnzjc4eh3skh67jinv4fnrza4xvkxkxqxx7puppmczjzltdug4y
	Sectors:
		10548	OK
Batch 14:
	Message: bafy2bzacecnaz4k3dynwv3d54c4yviy2rifo7vd3fzwxyilfjp7jya347jne6
	Sectors:
		10561	OK
Batch 15:
	Message: bafy2bzacebkdzvxhcidq52ngg3w353mi2zmnfhudh3u4bbj73j7eug7k46tfe
	Sectors:
		10573	OK
Batch 16:
	Message: bafy2bzaceb4opnqjfic6ddmaxaedz3w3fs6hzj3r2ng355uzzt42uea5qglk4
	Sectors:
		10616	OK
Batch 17:
	Message: bafy2bzacedaykihvdy3wd5h2n5efzfqddvdri464fazgeq7rm7mf4akimrkzw
	Sectors:
		10591	OK
Batch 18:
	Message: bafy2bzaced6u5gvhqu75atp536sm5npvi4ika6tiukwhvaphfr3344qfjjl5c
	Sectors:
		10599	OK
Batch 19:
	Message: bafy2bzacedfo56yuap2n5o6ufhc2hef7h2avo2megjolwsymgugqkj2lela6k
	Sectors:
		10598	OK
Batch 20:
	Message: bafy2bzacea75vbvrnnofykdluska27fij32r7cr7ahqarcblqpvkeqbzkkpni
	Sectors:
		10613	OK
Batch 21:
	Message: bafy2bzaceazzm5iigpis2v7dftjrzbwqvh65u6rfiomn4xyg6ed37ybh5gp3w
	Sectors:
		10574	OK
Batch 22:
	Message: bafy2bzacecuqz4d2g3gthrd6ykzibbl4apjqc222xll4r6bg7io4yh63j4cja
	Sectors:
		10569	OK
Batch 23:
	Message: bafy2bzacec4wkrlegbrkqf5azfstbgwchwk34w2rr3xy4xfoi7qq4c3veycno
	Sectors:
		10559	OK
Batch 24:
	Message: bafy2bzaced5w7a4qgmebqn7g2vox3xllrzfldq2mh4jirhhyxoz5p7p3wgxmq
	Sectors:
		10610	OK
Batch 25:
	Message: bafy2bzacebjbkdjvgxok7azpuc3rhjldb5u27d7hs7mjx7rk5jozpt4fe4ipc
	Sectors:
		10621	OK
Batch 26:
	Message: bafy2bzaceahp6iuuczhl3tnrcfrudmewmh2lez3obhvvj7o42tt22hyjy5wi2
	Sectors:
		10626	OK
Batch 27:
	Message: bafy2bzacedok6kx5o7rlkliyu6zcdx6kcuocm2k3hejw4lmns6u5i2oggrbjm
	Sectors:
		10553	OK
Batch 28:
	Message: bafy2bzaced5a4nuk6modktbxrygsvgvycn3wt7ebzr6mdsk3un2fqn3dzfokm
	Sectors:
		10588	OK
Batch 29:
	Message: bafy2bzacedjtztxsaenfyl5ttasyl5jiqdkqzifc2mg3es4mhypj23kfpmp5g
	Sectors:
		10600	OK
Batch 30:
	Message: bafy2bzaced5uvpwzsz7lsixprfwu7oh7q5l2ctii4aoydmlhyspkak42s5iyc
	Sectors:
		10612	OK
Batch 31:
	Message: bafy2bzacebn5y4ds4hlhmdvcbpotrvmtxipvurd36zb5hhocfonzqg7nei4ni
	Sectors:
		10607	OK
Batch 32:
	Message: bafy2bzacea2phj4nphsm2aejabezsskbkv3rqqs3w4wpt5sktkosf63tesjw2
	Sectors:
		10615	OK
Batch 33:
	Message: bafy2bzaceagtiquv7bo57yp2anf7jlbp7pzvhncgddnijguhadchgmsasitge
	Sectors:
		10592	OK
Batch 34:
	Message: bafy2bzaceay3b3bxt6dh6nyjett4wvz2rkasdeexlzrn3gwifrgngvkykyu26
	Sectors:
		10601	OK
Batch 35:
	Message: bafy2bzacech6ualuuuwkx66b6mczmhs7dbj3hy2eq2yiof7iuibwf7epxlfzq
	Sectors:
		10624	OK
Batch 36:
	Message: bafy2bzacedgpimyposeagnftteqoax7aie5onqcxmrawtba6mqvy25bh4v574
	Sectors:
		10556	OK
Batch 37:
	Message: bafy2bzacebxq2nvfcaclzts5vcfm2kox2i5wetuxnism62xjry3yzgjhkfvww
	Sectors:
		10595	OK
Batch 38:
	Message: bafy2bzacedr6uqc33yfk3qodydiixxymmrrygqdnhebg3s2crmms2findp6zc
	Sectors:
		10620	OK
Batch 39:
	Message: bafy2bzaceb5ax2x5jucp5meumrxric4heutusw2dgr66sxxdiarwnpzkn4t2g
	Sectors:
		10614	OK
Batch 40:
	Message: bafy2bzaceaj7fij3cxrq5qeqv4c6ydaoc5y3duhdtiz3fnta5btbikwkvilz6
	Sectors:
		10606	OK
Batch 41:
	Message: bafy2bzacedy5rycnvpnah4ptggocwqmhdekkedttzj4b6uobf7u2ei34zchqm
	Sectors:
		10552	OK
Batch 42:
	Message: bafy2bzacebrubuir46dbjwdncjm3t7i56oexhe6lhqlgd3tmxtwhjraqyabtw
	Sectors:
		10629	OK
Batch 43:
	Message: bafy2bzacebxpbfuxvunzlosetwbs43tpvtacssdzmdzl5jdj5eo2f3q36yqka
	Sectors:
		10597	OK
Batch 44:
	Message: bafy2bzacec6upc33psrvfnc5sbuxi3bo5l6pu374ucs4kywe5jhlcomh4cbb2
	Sectors:
		10609	OK
Batch 45:
	Message: bafy2bzaceckwvnahgbbfnsx5ykzku7op7gbnznvvj2nk652pcgciudzmyrpd6
	Sectors:
		10604	OK
Batch 46:
	Message: bafy2bzacecltscji6zcspyw6wz2nebfog3vpzzd3a2pf6ckhcd4cwfflxpwfm
	Sectors:
		10617	OK
Batch 47:
	Message: bafy2bzaceavti3secdx2dmp2p62yrgsxzv7xiweyoqicu7f5n5g44gjzk2bse
	Sectors:
		10580	OK
Batch 48:
	Message: bafy2bzacebs67odmmpedno7bnoklgwirsoch22r4owjgnf3c5o2qq7z6jnwxw
	Sectors:
		10577	OK
Batch 49:
	Message: bafy2bzaced6b6pxcnj63z73ll7ucso2e2mp2jvqiuuxypwovbtgdcqxvk7h5k
	Sectors:
		10579	OK
Batch 50:
	Message: bafy2bzaceawblbnttcm5gboeahlmjvqy3lfenguw3nbmmm7tdvgaq764f2ow6
	Sectors:
		10593	OK
Batch 51:
	Message: bafy2bzacedoi3qm2mz4bcod5kkyx4fl4r73ihjbfhughcaaw5nzxz6mhwizes
	Sectors:
		10603	OK
Batch 52:
	Message: bafy2bzacedbyxb2g5jx7l67hucuhyzk44u6mtfgqxlg6z4l7kdejkl6mfhdmq
	Sectors:
		10622	OK

Logging Information

Please let us know if you need detailed logs, and what exactly you need.

Repo Steps

Aggregate sectors for some time, then manually push them when the baseFee is below the threshold for batching.

@jennijuju
Member

Logs around the failure and the message CID, please?

@jennijuju
Member

Also, is there a reason why you are manually pushing aggregation when the basefee is low, instead of setting AggregateAboveBaseFee in the config?
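The setting being referred to lives in the [Sealing] section of the miner's config.toml. A minimal sketch of what that looks like — the threshold value here is illustrative, taken from the reporter's own config shown later in this thread, not an official recommendation:

```toml
[Sealing]
  AggregateCommits = true
  # Below this basefee, batched ProveCommits are published as individual
  # messages rather than as one aggregate (value is illustrative).
  AggregateAboveBaseFee = "0.00000000015 FIL"
```

With this set, the miner decides automatically when aggregation pays off, instead of relying on a manual --publish-now at a low-basefee moment.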

@jennijuju jennijuju added need/author-input Hint: Needs Author Input and removed need/triage kind/bug Kind: Bug labels Aug 7, 2021
@f8-ptrk
Contributor Author

f8-ptrk commented Aug 7, 2021

We want them on chain before wdPoSt; that's why I pushed them manually. The fee is set to the default. Manually publishing then does not aggregate them, but sends a lot of single ProveCommit messages when the baseFee is below the threshold.

It needs one manual publish to set the rhythm so they are published before wdPoSts, to optimize the aggregate publishing and storage power gains.

cat ./main/miner/config.toml 
  ListenAddress = "/ip4/10.0.0.10/tcp/2345/http"
  RemoteListenAddress = "10.0.0.10:2345"
#  Timeout = "30s"
#
[Backup]
  DisableMetadataLog = true
#
[Libp2p]
  ListenAddresses = ["/ip4/1.2.3.4/tcp/15001"]
  AnnounceAddresses = ["/ip4/1.2.3.4/tcp/15001"]
#  NoAnnounceAddresses = []
#  ConnMgrLow = 180
#  ConnMgrHigh = 540
#  ConnMgrGrace = "20s"
#
[Pubsub]
#  Bootstrapper = false
#  RemoteTracer = "/dns4/pubsub-tracer.filecoin.io/tcp/4001/p2p/QmTd6UvR47vUidRNZ1ZKXHrAFhqTJAD27rKL9XYghEKgKX"
#
[Dealmaking]
  ConsiderOnlineStorageDeals = true
  ConsiderOfflineStorageDeals = false
  ConsiderOnlineRetrievalDeals = true
  ConsiderOfflineRetrievalDeals = false
  ConsiderVerifiedStorageDeals = true
  ConsiderUnverifiedStorageDeals = true
#  PieceCidBlocklist = []
  ExpectedSealDuration = "24h0m0s"
  PublishMsgPeriod = "4h0m0s"
  MaxDealsPerPublishMsg = 25
#  MaxProviderCollateralMultiplier = 2
#  Filter = ""
#  RetrievalFilter = ""
#
[Sealing]
  MaxWaitDealsSectors = 20
  MaxSealingSectors = 250
  MaxSealingSectorsForDeals = 150
  WaitDealsDelay = "2h0m0s"
  AlwaysKeepUnsealedCopy = true
  FinalizeEarly = true
  BatchPreCommits = true
  MaxPreCommitBatch = 8
  PreCommitBatchWait = "8h0m0s"
  PreCommitBatchSlack = "3h0m0s"
  AggregateCommits = true
  AggregateAboveBaseFee = "0.00000000015 FIL"
  MinCommitBatch = 4
  MaxCommitBatch = 819
  CommitBatchWait = "8h0m0s"
  CommitBatchSlack = "1h0m0s"
#  TerminateBatchMax = 100
#  TerminateBatchMin = 1
#  TerminateBatchWait = "5m0s"
#
[Storage]
  ParallelFetchLimit = 10
  AllowAddPiece = false
  AllowPreCommit1 = false
  AllowPreCommit2 = false
  AllowCommit = false
  AllowUnseal = true
#
[Fees]
  MaxPreCommitGasFee = "0.04 FIL"
  MaxCommitGasFee = "0.22 FIL"
  MaxTerminateGasFee = "0.5 FIL"
  MaxWindowPoStGasFee = "3 FIL"
  MaxPublishDealsFee = "0.05 FIL"
  MaxMarketBalanceAddFee = "0.007 FIL"
#
[Addresses]
  PreCommitControl = ["f0418577"]
  CommitControl = ["f0418578"]
#  DisableOwnerFallback = false
#  DisableWorkerFallback = false
#


@f8-ptrk
Contributor Author

f8-ptrk commented Aug 7, 2021

The message CIDs are in the command output above.

Log files are downloading.

@f8-ptrk
Contributor Author

f8-ptrk commented Aug 7, 2021

All times in UTC. Sectors were sealed on 06-08 and 07-08, and the error in committing/publishing manually happened on 07-08.

miner_log-07-08-21.txt
miner_log-06-08-21.txt

@f8-ptrk
Contributor Author

f8-ptrk commented Aug 7, 2021

This happens too if the batch gets pushed non-manually and is not aggregated!

bafy2bzacedvib236qpb7otegwshvroyoogz5ktyuipahxnaonlq2ugongwtlo
bafy2bzaceadnsdmtr4urz2sxpxah4x2kfcuadcujgk7rj4kqlecglakqnzbss

and others

@f8-ptrk
Contributor Author

f8-ptrk commented Aug 9, 2021

Quick update on the fallout of this:

  • the messages that failed left sectors hanging in the CommitFinalize state
  • the sectors did not resume the natural flow of states after miner restarts (I think we restarted twice or more)
  • last night they resumed, 10+ at a time, clogging the network severely (10+ FINs at the same time)
  • we missed a block due to the slow network
  • we missed all wdPoSts due to the clogged network

Conclusion: no fun at all. We turned off aggregation, as it is a severe risk in our eyes.

@f8-ptrk
Contributor Author

f8-ptrk commented Aug 9, 2021

Note to self:

Logs for this resuming will be Mon 09 Aug 2021 10:12:21 AM UTC and the 24h before that, if someone asks.

@f8-ptrk f8-ptrk changed the title ProveCommit aggregate SysErrOutOfGas when manually publishing and no aggregate ProveCommit aggregate SysErrOutOfGas when publishing and no aggregate Aug 9, 2021
@supriya-premkumar

I am running into the same issue as well. Additionally, the sectors are still in the batching commit queue even after publishing.

@holodnak

holodnak commented Sep 3, 2022

I am experiencing this SysErrOutOfGas when manually sending the committed sectors using the lotus-miner sectors batching commit --publish-now command. It is usually about 10% of the sectors submitted; today, for example, I had 112 sectors I wanted on chain before WindowPoSt, and after running the command, 11 of them had the out-of-gas error.

@shrenujbansal
Contributor

@f8-ptrk Are you able to resubmit the prove commits and have those messages succeed on subsequent tries?
Do you notice this happen when doing prove commits for a smaller number of sectors?

@f8-ptrk
Contributor Author

f8-ptrk commented Mar 21, 2023

Resubmit, yes. Yeah, maybe 64 or so; after that they succeeded.

We turned the feature off after the first failure, as it is most likely far more expensive to use than not (paying double for messages is hard to offset, even with aggregation).

@shrenujbansal
Contributor

We turned the feature off after the first failure, as it is most likely far more expensive to use than not (paying double for messages is hard to offset, even with aggregation).

Which feature? ProveCommit aggregation?
From what you pointed out above and what I've seen in my local testing, this happens even without ProveCommit aggregation. It might even be more likely in the unaggregated case, since this error tends to happen when submitting several ProveCommit messages all at once.

This is because gas estimation for submitting PoReps is not deterministic, due to the underlying AMT used to store per-miner batched proofs. See filecoin-project/specs-actors#1319 for further context.

For now, I would recommend staggering your prove commits in smaller batches so you're less likely to run into this issue.
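One way to read that staggering advice in terms of the [Sealing] batching knobs from the reporter's config: cap each aggregate well below the 819 maximum so fewer PoRep proofs land on chain at once. A sketch — the 64 here is only an assumption, echoing the batch size the reporter mentioned above, not an official recommendation:

```toml
[Sealing]
  AggregateCommits = true
  MinCommitBatch = 4
  # Down from the 819 maximum, so each published aggregate carries fewer
  # proofs and a gas-estimation miss affects fewer sectors (illustrative value).
  MaxCommitBatch = 64
```

Smaller batches also mean that resubmitting after a SysErrOutOfGas failure re-proves fewer sectors per attempt.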

@rjan90 rjan90 moved this from ⭐️ In Scope to 🏗 In Progress in Lotus-Miner-V2 Mar 22, 2023
@f8-ptrk
Contributor Author

f8-ptrk commented Mar 22, 2023

We saw the out-of-gas errors when the individual messages were sent with aggregation turned on.

This is because gas estimation for submitting PoReps is not deterministic, due to the underlying AMT used to store per-miner batched proofs. See filecoin-project/specs-actors#1319 for further context.

For now, I would recommend staggering your prove commits in smaller batches so you're less likely to run into this issue.

What's the recommended number here? We had actually planned, when considering the feature again, to go straight to 819 in a batch!

@rjan90 rjan90 moved this from 🏗 In Progress to 🧪In Testing in Lotus-Miner-V2 Mar 23, 2023
@rjan90 rjan90 moved this from 🧪In Testing to ✅ Done - ready for v1.23.0 in Lotus-Miner-V2 Apr 3, 2023
@f8-ptrk
Contributor Author

f8-ptrk commented May 27, 2023

It still fails, terribly. ProveCommit aggregate, even on CC sectors, is a waste of gas with this "feature".

Will this be fixed? Definitely not in 1.23.0.

@rjan90
Contributor

rjan90 commented May 27, 2023

This has been fixed in v1.23.1. A follow-up ticket to get friendlier UX output when the messages are staggered across multiple epochs is here: #10708

@rjan90 rjan90 closed this as completed May 27, 2023
@github-project-automation github-project-automation bot moved this from ✅ Done to 👀 In Review in Lotus-Miner-V2 May 27, 2023