
/stewardship doesn't check final leaves #4696

Closed
ldeffenb opened this issue May 28, 2024 · 3 comments
ldeffenb commented May 28, 2024

Context

2.1.0 (and all earlier, I suspect)

Summary

I have a file reference on the swarm mainnet that is missing a chunk, yet the /stewardship API insists that the reference is retrievable. I have the reference locally pinned, but due to earlier bugs in the swarm, one of the BMT chunks is no longer local and, in fact, no longer in the swarm.

Mainnet reference: a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
Composed of BMT chunks:

a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
9204ce19779429a0c37f0f2eb23d539dd9226d84c4bd287a2aede3a2940177a1
cf973b6794a7d1dd5c8df9c2085f31643f4da935df791d0bb077800641600dfb
d71d0a8405546fe15228edecc4f6efc0c56f336f81ec48f6377fc89ad84c0f5c
e1cc5a6c1e9b270bde443f392cfa9fa8b0cef168d147fe23828790a8521b4584
ec9bc0f22a84e2e4af4c3ddcd5213b3114ea60441a94ca14486e7db8c91c93f1 <- Missing
fc7f913696b3d06756cfaee1d84db4b0e1a7c041f43bd9456174b7bd0816c136

If you try to retrieve the missing chunk with the /chunks API, you will see that it is truly missing, even though /stewardship says the entire main reference is retrievable. Pulling the file contents with /bytes will give you an incomplete file.

curl http://localhost:1633/bytes/a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
curl: (18) transfer closed with 22592 bytes remaining to read
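
To double-check, here is a minimal Go sketch (illustrative only; it assumes a node API at localhost:1633) that probes each of the chunks listed above via the /chunks endpoint:

```go
// Probe each BMT chunk of the reference against a local Bee node's API.
// Assumes the API listens on localhost:1633; the exact error status for a
// missing chunk may vary, but it clearly won't be 200 OK.
package main

import (
	"fmt"
	"net/http"
)

func main() {
	chunks := []string{
		"a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d",
		"9204ce19779429a0c37f0f2eb23d539dd9226d84c4bd287a2aede3a2940177a1",
		"cf973b6794a7d1dd5c8df9c2085f31643f4da935df791d0bb077800641600dfb",
		"d71d0a8405546fe15228edecc4f6efc0c56f336f81ec48f6377fc89ad84c0f5c",
		"e1cc5a6c1e9b270bde443f392cfa9fa8b0cef168d147fe23828790a8521b4584",
		"ec9bc0f22a84e2e4af4c3ddcd5213b3114ea60441a94ca14486e7db8c91c93f1", // <- missing
		"fc7f913696b3d06756cfaee1d84db4b0e1a7c041f43bd9456174b7bd0816c136",
	}
	for _, c := range chunks {
		resp, err := http.Get("http://localhost:1633/chunks/" + c)
		if err != nil {
			fmt.Println(c, "error:", err)
			continue
		}
		resp.Body.Close()
		fmt.Println(c, "->", resp.Status)
	}
}
```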

Expected behavior

If /stewardship says a reference is retrievable, I'd expect to be able to retrieve it.

Actual behavior

/stewardship says it is retrievable
/bytes is short
/chunks of the missing chunk fails.

Steps to reproduce

Just try the references shown above on the current mainnet swarm.

Possible solution

The stewardship API invokes s.steward.IsRetrievable

res, err := s.steward.IsRetrievable(r.Context(), paths.Address)

IsRetrievable simply uses the prebuilt netTraverser which uses an internal netGetter to retrieve chunks from the swarm. But the issue is in the actual traverser.

switch err := s.netTraverser.Traverse(ctx, root, noop); {
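
Condensed, the path from the API down to the traverser looks roughly like this (a paraphrase assembled from the lines quoted above, not the verbatim bee source; error wrapping elided):

```go
// Paraphrase of steward.IsRetrievable, assembled from the quoted lines.
func (s *steward) IsRetrievable(ctx context.Context, root swarm.Address) (bool, error) {
	// noop never touches the network, so the traversal alone decides the answer.
	noop := func(leaf swarm.Address) error { return nil }
	switch err := s.netTraverser.Traverse(ctx, root, noop); {
	case errors.Is(err, storage.ErrNotFound):
		return false, nil
	case err != nil:
		return false, err
	default:
		return true, nil
	}
}
```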

traversal.Traverse determines that the reference is not an SOC and also not a manifest node, so it simply invokes processBytes on the address.

if err := processBytes(addr); err != nil {

processBytes creates a joiner and then invokes j.IterateChunkAddresses with the stewardship callback function, which is actually noop.

j, _, err := joiner.New(ctx, s.getter, s.putter, ref)

err = j.IterateChunkAddresses(iterFn)

noop := func(leaf swarm.Address) error { return nil }

But we still haven't seen the issue, which is actually in the joiner iteration itself. The first thing IterateChunkAddresses does is invoke the callback on the original address. Note that the joiner has not actually fetched this address from the swarm before invoking the noop callback. We're lucky because Traverse actually DID retrieve the root chunk from the swarm to perform the SOC and manifest tests.

err := fn(j.addr)
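
Assembled, the start of the iteration looks roughly like this (a paraphrase of the quoted line; field names are approximate):

```go
// Paraphrase of joiner.IterateChunkAddresses: the root address goes straight
// to the callback; no Get is issued by the joiner at this level.
func (j *joiner) IterateChunkAddresses(fn swarm.AddressIterFunc) error {
	if err := fn(j.addr); err != nil { // callback only, no retrieval here
		return err
	}
	return j.processChunkAddresses(j.ctx, fn, j.rootData, j.span, j.rootParity)
}
```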

ch, err := s.getter.Get(ctx, addr)

switch mf, err := manifest.NewDefaultManifestReference(addr, ls); {

So the root chunk has actually been retrieved with steward's netGetter. Now the joiner invokes processChunkAddresses, which traverses the BMT for the reference. For each of the addresses, the callback is invoked; note that this is BEFORE any retrieval has been done on those child chunks.

if err := fn(addr); err != nil {

I suspect that the case where a callback chunk is never retrieved is when sec (the subtrie span) is at most swarm.ChunkSize. The continue avoids actually checking whether the chunk is retrievable.

if sec <= swarm.ChunkSize {

The child chunk is retrieved inside a new goroutine and recursively passed back into processChunkAddresses.

ch, err := g.Get(ectx, addr)

return j.processChunkAddresses(ectx, fn, chunkData, subtrieSpan, parities)
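
To make the control flow concrete, here is a heavily simplified, self-contained model of that loop (illustrative only; the real joiner also handles spans, parities, and encryption):

```go
package model

import "context"

// Minimal stand-ins for the bee types involved (illustrative only).
const ChunkSize = 4096

type Address string

type Getter interface {
	Get(ctx context.Context, addr Address) ([]byte, error)
}

type child struct {
	addr Address
	span int64 // bytes covered by this child's subtrie
}

// walk models joiner.processChunkAddresses: fn fires on every child address,
// but only children spanning more than one chunk are actually fetched, so a
// leaf whose chunk has vanished passes unnoticed when fn is a noop.
func walk(ctx context.Context, g Getter, parse func([]byte) []child,
	fn func(Address) error, children []child) error {
	for _, c := range children {
		if err := fn(c.addr); err != nil { // callback invoked BEFORE any retrieval
			return err
		}
		if c.span <= ChunkSize {
			continue // leaf: never fetched, missing data goes undetected
		}
		data, err := g.Get(ctx, c.addr) // only intermediate chunks hit the getter
		if err != nil {
			return err
		}
		if err := walk(ctx, g, parse, fn, parse(data)); err != nil {
			return err
		}
	}
	return nil
}
```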

I leave these references to the powers-that-bee that understand the BMT better than I do, to figure out where the traverser/joiner is not actually checking this one single chunk of this BMT file reference. The file is supposed to be a 256x256 PNG map tile, approximately this one (from a different map version):

8-98-138

The following mainnet swarm references are other /bytes versions of this same map tile. They may or may not be retrievable in the current swarm (at least until I manage to re-push them after 2.1.0 settles down).

 95d26ccba7998a49a6aa6534459158f0e434d335a2ef1cd70ec29cc89c8ab7d9
 0d7354af390889a45472a5687be0ea9c02540d7a668a441703868af038e2bd2d
 0d7354af390889a45472a5687be0ea9c02540d7a668a441703868af038e2bd2d
 dc948df0e9b0db6874dc12f2bf6d89e0258c2cd0165e53974430f642228198ea
 6a1d92983b028aead3299459309ae8878472ab6fb5591b930cf5c2b543f17038
 2ec6d614ac1ce99dbf5bb54aea72194bc6c3f7ecd952ea68295b691eb3497310
 310a5e4d0a0e6240823cf1530958c70ab43411bdc4e9d8eee81f0c129301e3ba
 a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
 bbf893626a95b8609d174c18010404e06e5fd4c6bfad46ebba2a86d47b28ad99
 6132026206b9068d02231bd9af20dfac6e52ab80f4770742cd232dd350d85c91
 c1144defcbe7dd3ef229a2d0d21cdbc2032c0f5c66edfd3f240c02fd88de24b4
 2238885700a6da70e82cac353decb6b9d6a1ccd9e5e284c4b4a4fd869d34e827
 8bf48f37175dd99988dc5bda5608509967dfb3d7d41e6bc93c027c336b13e667
 269ede36d00fdf8cccb32e7006d433cdad9c81af32e65d344ebeff522dd30f31
 a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
 ca5f0729d0a7c5a7e27b228466bff311d5d36b2c14220dab7268cca85aeadf64
 e0446a201c59b6fba65898225a05cabbfb31c4d848b08aa7a13406c85f651f09

Another example of this issue is mainnet reference f6e186f14ecd3ae0f9a61ce8c23348c8d27d23398ff7f24e224b53f2b5c488b6, which /stewardship reports as retrievable but which is composed of the following BMT chunks, one of which is missing from the swarm.

f6e186f14ecd3ae0f9a61ce8c23348c8d27d23398ff7f24e224b53f2b5c488b6
d402ec274357db00d332015dfc3a2e43416f287736344a69fa877aa176730082 <- Missing
6410d648329a71a8a94b1db2db6f737057d7f2ef654c11c807bbabfe35429be5
6dc42ec40f054bacffc8cd590ee4b2eb2f4fd85f5216adaf964c67cda0968149
c0d0897c4ec900b8d325306bfe445082a7c9f8e7dcff2f18254f7a6b83634dc8
644b5701bef2ef45e2583babd9cc4b9abc5c7150ea468058c2109cf605bda221

@ldeffenb ldeffenb added the needs-triaging new issues that need triaging label May 28, 2024
@bee-runner bee-runner bot added the issue label May 28, 2024
ldeffenb (Collaborator, Author) commented:
An interesting side effect of this issue is that for the references whose retrievability /stewardship lies about, a -X PUT /stewardship of that same reference will fail. This is because the Traverse/Joiner callback in steward's Reupload attempts to fetch each chunk before pushing it back out into the swarm.

fn := func(addr swarm.Address) error {

c, err := getter.Get(ctx, addr)

return uploaderSession.Put(ctx, c.WithStamp(stamp))
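
Assembled from the quoted lines, the Reupload callback is roughly this (a paraphrase; getter, uploaderSession, and stamp come from the surrounding Reupload function):

```go
// Unlike IsRetrievable's noop, Reupload's callback fetches EVERY address it
// is handed, so a missing leaf surfaces here and makes the PUT fail.
fn := func(addr swarm.Address) error {
	c, err := getter.Get(ctx, addr)
	if err != nil {
		return err // storage.ErrNotFound for the missing leaf
	}
	return uploaderSession.Put(ctx, c.WithStamp(stamp))
}
```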

istae commented Jun 5, 2024

noop := func(leaf swarm.Address) error { return nil }

this has to be changed to fetch the leaf addresses from the network; probably ideal to have some parallel processing here so that we don't fetch each chunk one by one.
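
One possible shape for that, as a sketch only (errgroup is golang.org/x/sync/errgroup; the getter, context, and root are assumed to be the steward's existing ones):

```go
// Sketch: replace noop with a leaf-fetching callback, with bounded
// parallelism via errgroup. Fetch errors surface at Wait rather than
// inside each callback invocation.
var eg errgroup.Group
eg.SetLimit(16) // assumed concurrency limit, tune as needed

fetch := func(leaf swarm.Address) error {
	eg.Go(func() error {
		_, err := s.getter.Get(ctx, leaf)
		return err
	})
	return nil
}

err := s.netTraverser.Traverse(ctx, root, fetch)
if werr := eg.Wait(); err == nil {
	err = werr
}
```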

ldeffenb commented Jun 5, 2024

Yeah, unfortunately the stewardship callback is only handed an address, with no indication of what that address represents. So if the callback fetches from the network, it'll redundantly fetch everything but the leaves, which are only fetched once. And the traverser itself, which is actually doing most of the network fetching by virtue of the getter, really shouldn't need to invoke the getter on those leaves, as that would penalize other traverser-using code by fetching chunks that may not need to be fetched.
