/stewardship doesn't check final leaves #4696
Comments
An interesting side effect of this issue is that, for references whose retrievability /stewardship misreports, a -X PUT /stewardship of that same reference will fail. This is because the Traverse/Joiner callback in steward's Reupload attempts to fetch the chunk before pushing it out into the swarm.
Line 56 in 97e7ee6
Line 57 in 97e7ee6
Line 67 in 97e7ee6
Line 82 in 97e7ee6
This has to be changed to fetch the leaf addresses from the network, ideally with some parallel processing so that chunks are not fetched one by one.
Yeah, unfortunately the stewardship callback is only handed an address, with no indication of what that address represents. So if the callback fetches from the network, it will redundantly fetch every chunk except the leaves, which will be fetched only once. And the traverser itself, which does most of the network fetching by virtue of the getter, really shouldn't need to invoke the getter on those leaves, as that would penalize other code that uses the traverser by fetching chunks that may not need to be fetched.
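The parallel fetching suggested in the comments above could be sketched roughly as follows. This is a toy illustration, not bee code: `getFunc`, `findMissing`, and the in-memory `present` map are hypothetical stand-ins for a network getter and for addresses gathered by a traversal callback. It checks every address handed to it, so it would also re-fetch intermediate chunks, which is exactly the redundancy mentioned above.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// errNotFound stands in for a failed network retrieval (hypothetical).
var errNotFound = errors.New("chunk not found")

// getFunc is a hypothetical stand-in for a network getter: it returns nil
// when the chunk at addr can be fetched and an error otherwise.
type getFunc func(addr string) error

// findMissing fetches every address with at most `workers` requests in
// flight and returns the sorted addresses whose fetch failed. Any error is
// treated as "missing" in this sketch.
func findMissing(get getFunc, addrs []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	var (
		mu      sync.Mutex
		missing []string
		wg      sync.WaitGroup
	)
	sem := make(chan struct{}, workers) // bounds the number of concurrent fetches
	for _, addr := range addrs {
		wg.Add(1)
		sem <- struct{}{}
		go func(a string) {
			defer wg.Done()
			defer func() { <-sem }()
			if err := get(a); err != nil {
				mu.Lock()
				missing = append(missing, a)
				mu.Unlock()
			}
		}(addr)
	}
	wg.Wait()
	sort.Strings(missing)
	return missing
}

func main() {
	// Toy "swarm": address d is gone.
	present := map[string]bool{"a": true, "b": true, "c": true}
	get := func(addr string) error {
		if present[addr] {
			return nil
		}
		return errNotFound
	}

	// A traversal callback could simply collect addresses for a later check.
	var collected []string
	collect := func(addr string) error {
		collected = append(collected, addr)
		return nil
	}
	for _, a := range []string{"a", "b", "c", "d"} {
		_ = collect(a)
	}

	fmt.Println(findMissing(get, collected, 2))
}
```

Collecting first and fetching afterwards keeps the callback cheap, at the cost of holding all addresses in memory. Whether that trade-off suits the steward is an open question.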
Context
2.1.0 (and all earlier, I suspect)
Summary
I have a file reference on the swarm mainnet that is missing a chunk, yet the /stewardship API insists that the reference is retrievable. I have the reference locally pinned, but due to earlier bugs in the swarm, one of the BMT chunks is no longer local and, in fact, no longer in the swarm.
Mainnet reference:
a3ad3e5ea830b930c62ddf11d198c6492261bbdb612fe17fabe7585b5aedd52d
Composed of BMT chunks:
If you try to retrieve the missing chunk with the /chunks API, you will see that it is truly missing, even though /stewardship says the entire main reference is retrievable. Pulling the file contents with /bytes will give you an incomplete file.
Expected behavior
If /stewardship says a reference is retrievable, I'd expect to be able to retrieve it.
Actual behavior
/stewardship says it is retrievable
/bytes is short
/chunks of the missing chunk fails.
Steps to reproduce
Just try the references shown above on the current mainnet swarm.
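The three observations can be checked together with a small program along these lines. It is a sketch under assumptions: the node's API is taken to listen on http://localhost:1633, GET /stewardship/{reference} is taken to answer with a JSON body containing an `isRetrievable` boolean, and `probe`, `report`, and `getFunc` are names made up for this example. The address of the missing chunk has to be supplied on the command line.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
)

// getFunc fetches a URL and returns the status code and whatever body was
// read. It is injectable so the probe can be exercised without a node.
type getFunc func(url string) (int, []byte, error)

// httpGet is the real getFunc. On a read error it still returns the bytes
// received so far, so a truncated /bytes download can be measured.
func httpGet(url string) (int, []byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, body, err
}

// report summarises what the node says about one reference.
type report struct {
	StewardRetrievable bool // what /stewardship claims
	BytesLen           int  // bytes actually delivered by /bytes
	BytesComplete      bool // /bytes returned 200 with no read error
	ChunkFound         bool // /chunks returned 200 for the suspect chunk
}

// probe queries /stewardship, /bytes and /chunks on the node at base.
func probe(get getFunc, base, ref, chunk string) (report, error) {
	var r report

	status, body, err := get(base + "/stewardship/" + ref)
	if err != nil {
		return r, err
	}
	if status == http.StatusOK {
		// Assumed response shape: {"isRetrievable": true}
		var s struct {
			IsRetrievable bool `json:"isRetrievable"`
		}
		if err := json.Unmarshal(body, &s); err != nil {
			return r, err
		}
		r.StewardRetrievable = s.IsRetrievable
	}

	status, body, err = get(base + "/bytes/" + ref)
	r.BytesLen = len(body)
	r.BytesComplete = err == nil && status == http.StatusOK

	status, _, err = get(base + "/chunks/" + chunk)
	r.ChunkFound = err == nil && status == http.StatusOK
	return r, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: probe <reference> <suspect-chunk-address>")
		return
	}
	r, err := probe(httpGet, "http://localhost:1633", os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```

A report with StewardRetrievable true and ChunkFound false would match the behavior described in this issue.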
Possible solution
The stewardship API invokes s.steward.IsRetrievable.
bee/pkg/api/stewardship.go
Line 94 in 97e7ee6
IsRetrievable simply uses the prebuilt netTraverser, which uses an internal netGetter to retrieve chunks from the swarm. But the issue is in the actual traverser.
bee/pkg/steward/steward.go
Line 83 in 97e7ee6
traversal.Traverse determines that the reference is not an SOC and also not a manifest node, so it simply invokes processBytes on the address.
bee/pkg/traversal/traversal.go
Line 88 in 97e7ee6
processBytes creates a joiner and then invokes j.IterateChunkAddresses with the stewardship callback function, which is actually noop.
bee/pkg/traversal/traversal.go
Line 45 in 97e7ee6
bee/pkg/traversal/traversal.go
Line 49 in 97e7ee6
bee/pkg/steward/steward.go
Line 82 in 97e7ee6
But we still haven't seen the issue, which is actually in the joiner iteration itself. The first thing IterateChunkAddresses does is invoke the callback on the original address. Note that the joiner has not actually fetched this address from the swarm, but invokes the noop callback. We're lucky because Traverse actually DID retrieve the root chunk from the swarm to perform the SOC and manifest tests.
bee/pkg/file/joiner/joiner.go
Line 353 in 97e7ee6
bee/pkg/traversal/traversal.go
Line 58 in 97e7ee6
bee/pkg/traversal/traversal.go
Line 69 in 97e7ee6
So the root chunk has actually been retrieved with steward's netGetter. Now the joiner invokes processChunkAddresses, which traverses the BMT for the reference. For each of the addresses, the callback is invoked. Note that this is BEFORE any retrieval has been done on those child chunks.
bee/pkg/file/joiner/joiner.go
Line 384 in 97e7ee6
I suspect that a chunk passed to the callback is never retrieved when sec is less than a chunkSize: the continue skips actually checking whether the chunk is retrievable.
bee/pkg/file/joiner/joiner.go
Line 392 in 97e7ee6
The child chunk is retrieved inside a new goroutine and recursively passed back into processChunkAddresses.
bee/pkg/file/joiner/joiner.go
Line 403 in 97e7ee6
bee/pkg/file/joiner/joiner.go
Line 412 in 97e7ee6
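The effect of the walk described above can be reproduced with a toy model. This is not the joiner's code: `chunk`, `store`, `walk`, and the `levels` parameter are hypothetical simplifications (the real joiner works from spans, not an explicit height). The sketch only shows that when the callback is a noop and leaf children are skipped before any fetch, a missing leaf goes unnoticed.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for a failed network retrieval.
var errNotFound = errors.New("chunk not found")

// chunk is a toy chunk: intermediate chunks list child addresses, leaves
// have none.
type chunk struct {
	children []string
}

// store is a toy getter backed by a map; an absent key models a chunk that
// is gone from the swarm.
type store map[string]chunk

func (s store) get(addr string) (chunk, error) {
	c, ok := s[addr]
	if !ok {
		return chunk{}, errNotFound
	}
	return c, nil
}

// walk calls fn on every child address of c, but only fetches children that
// head a sub-tree. levels is the height below c; 1 means c's children are
// leaves.
func walk(s store, c chunk, levels int, fn func(string) error) error {
	for _, addr := range c.children {
		if err := fn(addr); err != nil {
			return err
		}
		if levels <= 1 {
			continue // leaf: the callback ran, but nothing was fetched
		}
		child, err := s.get(addr)
		if err != nil {
			return err
		}
		if err := walk(s, child, levels-1, fn); err != nil {
			return err
		}
	}
	return nil
}

// isRetrievable models the reported behavior: fetch the root, then walk
// with a noop callback.
func isRetrievable(s store, root string, levels int) bool {
	rc, err := s.get(root)
	if err != nil {
		return false
	}
	noop := func(string) error { return nil }
	return walk(s, rc, levels, noop) == nil
}

// isRetrievableStrict uses a callback that fetches every address it is
// handed, so intermediate chunks end up being fetched twice.
func isRetrievableStrict(s store, root string, levels int) bool {
	rc, err := s.get(root)
	if err != nil {
		return false
	}
	fetch := func(addr string) error {
		_, err := s.get(addr)
		return err
	}
	return walk(s, rc, levels, fetch) == nil
}

// demoStore builds a two-level tree in which leafD is missing.
func demoStore() store {
	return store{
		"root":  {children: []string{"mid1", "mid2"}},
		"mid1":  {children: []string{"leafA", "leafB"}},
		"mid2":  {children: []string{"leafC", "leafD"}},
		"leafA": {},
		"leafB": {},
		"leafC": {},
	}
}

func main() {
	s := demoStore()
	fmt.Println("noop callback:", isRetrievable(s, "root", 2))           // true
	fmt.Println("fetching callback:", isRetrievableStrict(s, "root", 2)) // false
}
```

In this model the noop walk reports the tree as retrievable even though leafD cannot be fetched, while a fetching callback catches it at the price of fetching mid1 and mid2 twice.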
I leave these references to the powers-that-bee that understand the BMT better than I do to figure out where the traverser/joiner is not actually checking this one single chunk of this BMT file reference. It is supposed to be a 256x256 PNG file that should be approximately (from a different map version):
The following mainnet swarm references are other /bytes versions of this same map tile. They may or may not be retrievable in the current swarm (at least, until I manage to re-push them after 2.1.0 settles down).
Another example of this issue is mainnet reference
f6e186f14ecd3ae0f9a61ce8c23348c8d27d23398ff7f24e224b53f2b5c488b6
which is retrievable, but is composed of the following BMT chunks, one of which is missing from the swarm.