From 0b7c6cb44cf8da5dafa6556739332cc957f09c43 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Thu, 24 Aug 2023 17:59:40 -0400
Subject: [PATCH 1/7] frc(0069): change v2 piece multihashes to enable arbitrarily sized data

---
 FRCs/frc-0069.md | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index 566b3193..1a5c3994 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -24,7 +24,7 @@ For example, in much of the relevant portions of the Filecoin spec and lotus' Go
 
 This makes it much more natural to work with new concepts like [Deal Aggregates](https://pkg.go.dev/github.com/filecoin-project/go-data-segment@v0.0.0-20230605095649-5d01fdd3e4a1/datasegment#NewAggregate), as proposed in [FRC-0058](https://github.com/filecoin-project/FIPs/blob/7e499523c9c7ed2c48c6a36967f7f011cee1fefd/FRCs/frc-0058.md).
 
-To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://link.tld) which combines the root hash with the tree height (which in the full and balanced binary trees used in Piece commitments is equivalent to size).
+To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://link.tld) which combines the root hash with the tree height and the amount of padding of the data with zeros such that the result is a full and balanced binary tree after the fr32 padding is applied. For Piece commitments the amount of added padding is zero.
 
 ## Specification
 
@@ -38,9 +38,16 @@ The core component introduce in this specification is a new multihash type fr32-
 
 The multihash code for this type is 0x1011 as identifier in the [multicodec code table](https://link.tld).
 
-The digest for the multihash is 33 bytes. The first byte defines the height of the tree. For example if the first byte is 0 the tree is a single 32-byte leaf. Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.
-
-Note that the data processed by this hash function must be of size `N = 2^i * 127/128` where `i` is any positive integer >=7 and <=255. This means that the minimum hashable amount of data is 127 bytes.
+The digest for the multihash is a variable number of bytes.
+- The first byte defines the height of the tree
+  - For example if the first byte is 0 the tree is a single 32-byte leaf
+  - Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.
+- The next bytes are a [uvarint](https://github.com/multiformats/unsigned-varint) of the number of bytes needed to pad the underlying data such that after FR32 padding it will be a full binary tree
+  - If the data is < 127 bytes then it is padded to 127 bytes. For example, if the data is of size 12 the padding will be `127-12 = 115` (TODO: Is this necessary or would even 32 bytes be ok?)
+  - For example, if the data is of size 254 (i.e. 2^i * 127/128, where i is a positive integer >= 7) then this will be 0
+  - For example, if the data is of size 256 then the padding is `508-256=252`
+  - Note: because the unsigned-varint spec currently has a maximum representable size of 2^63-1 (and 9 bytes to represent the varint) this puts a cap on the maximum size of data representable by this multihash as well
+  - Note: because this is padding data it must be less than the size of the underlying data
 
 ### Raw codec
 
@@ -63,11 +70,17 @@ A tuple of (v1 Piece CID, Piece size) can be converted into a valid v2 Piece CID
 
 ## Test Cases
 
-Take data of size 127*4 bytes where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, and the last 127 are 3. The v1 piece CID of this data would be `baga6ea4seaqes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. The multihash-based piece CID would be `bafkzcibbarew3lqmzhrgl37fuadoqbrguxofyqe6luyvlqjzqtfpnsgvz7lak`. With a base16 multibase this would be `f015591202104496dae0cc9e265efe5a006e80626a5dc5c409e5d3155c13984caf6c8d5cfd605` or equivalently (`(multibase = f) | (CIDv1 prefix = 0x01) | (Raw codec = 0x55) | (fr32-sha2-256-trunc254-padded-binary-tree multihash encoded varint = 0x9120) | (length of digest = 0x21) | (tree height = 0x04) | (underlying hash digest = 496...605)`)
+Take data of size 127*4 bytes where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, and the last 127 are 3. The v1 piece CID of this data would be `baga6ea4seaqes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. The multihash-based piece CID would be `bafkzcibcaqaes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. With a base16 multibase this would be `f01559120220400496dae0cc9e265efe5a006e80626a5dc5c409e5d3155c13984caf6c8d5cfd605` or equivalently (`(multibase = f) | (CIDv1 prefix = 0x01) | (Raw codec = 0x55) | (fr32-sha2-256-trunc254-padded-binary-tree multihash encoded varint = 0x9120) | (length of digest = 0x22) | (tree height = 0x04) | (amount of data padding = 0x00) | (underlying hash digest = 496...605)`)
+
+Given the piece CID v1 of the empty 32 GiB piece (i.e. 32 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq` the corresponding mutlihash piece CID would be `bafkzcibcdyaao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq`
+
+Given the piece CID v1 of the empty 64 GiB piece (i.e. 64 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq` the corresponding piece mutlihash piece CID would be `bafkzcibcd4aomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq`
+
+Take data of size 127*8 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining `127*4` bytes are 0. There is no padding so the v1 piece CID would be `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` and the v2 would be `bafkzcibcauan42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa`.
 
-Given the piece CID v1 of the empty 32 GiB piece `baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq` the corresponding mutlihash piece CID would be `bafkzcibbdydx4x66gxcqveyduviaty2jrjhl5x7ttrbloefxgdmoy6whv6td4`
+Take data of size 128*4 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining 127+4 bytes are 0. There is `377` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibdax4qfxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
-Given the piece CID v1 of the empty 64 GiB piece `baga6ea4seaqomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq` the corresponding piece mutlihash piece CID would be `bafkzcibbd7teabngx7rxo6ktxcww56j7b7fbasnsaqlfj4vech3xaj4zz3hae`
+Take the data above and append one more zero (i.e. 127 0s, 127 1s, 127 2s, 127 3s, 132 0s). There are `376` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibdax4afxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
 ## Security Considerations
 Does not impact core Filecoin security.
@@ -82,7 +95,7 @@ This also enables the use of IPFS-based tooling for moving around Pieces, with t
 
 ## Implementation
 
-- There is a Go implementation in a [fork of go-fil-commcid](https://github.com/filecoin-project/go-fil-commcid/pull/5) which can get merged [upstream](https://github.com/filecoin-project/go-fil-commcid) upon acceptance of the FRC.
+- There is a Go implementation in a [fork of go-fil-commcid](https://github.com/filecoin-project/go-fil-commcid/pull/6) which can get merged [upstream](https://github.com/filecoin-project/go-fil-commcid) upon acceptance of the FRC.
 - There is a JavaScript implementation in https://github.com/web3-storage/data-segment/
 
 ## Copyright

From 31ca3dd6397298e28d17786d516356ca9bb5406e Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Tue, 19 Sep 2023 02:44:11 -0400
Subject: [PATCH 2/7] be more explicit about zero padding being needed for the pieces used in Filecoin consensus

---
 FRCs/frc-0069.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index 1a5c3994..ee09e4b3 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -24,7 +24,7 @@ For example, in much of the relevant portions of the Filecoin spec and lotus' Go
 
 This makes it much more natural to work with new concepts like [Deal Aggregates](https://pkg.go.dev/github.com/filecoin-project/go-data-segment@v0.0.0-20230605095649-5d01fdd3e4a1/datasegment#NewAggregate), as proposed in [FRC-0058](https://github.com/filecoin-project/FIPs/blob/7e499523c9c7ed2c48c6a36967f7f011cee1fefd/FRCs/frc-0058.md).
 
-To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://link.tld) which combines the root hash with the tree height and the amount of padding of the data with zeros such that the result is a full and balanced binary tree after the fr32 padding is applied. For Piece commitments the amount of added padding is zero.
+To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://link.tld) which combines the root hash with the tree height and the amount of padding of the data with zeros such that the result is a full and balanced binary tree after the fr32 padding is applied. If this multihash were to be used to reference the full piece as understood by the Filecoin on-chain consensus mechanism, this would involve the special case where the padding is zero is used.
 
 ## Specification
 

From 16c519350d44fcaaf1f761ec74ed12e4926df2ba Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Tue, 10 Oct 2023 16:14:06 -0400
Subject: [PATCH 3/7] frc(0069): update draft to put the padding before the height. draft cleanup

---
 FRCs/frc-0069.md | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index ee09e4b3..c24ac357 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -12,7 +12,7 @@ created: 2023-07-26
 
 ## Simple Summary
 
-Introduces an alternative CID representation for the FR32 padded sha256-trunc254-padded binary merkle trees used in Filecoin Piece Commitments (i.e. CommP). In it we use the [Raw codec](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L41) and a new [sha2-256-trunc254-padded-binary-tree multihash (or piece multihash)](https://link.tld) rather than the [Fil-commitment-unsealed codec](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L517) and the [sha2-256-trunc254-padded multihash](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L149).
+Introduces an alternative CID representation for the FR32 padded sha256-trunc254-padded binary merkle trees used in Filecoin Piece Commitments (i.e. CommP). In it we use the [Raw codec](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L41) and a new [sha2-256-trunc254-padded-binary-tree multihash (or piece multihash)](https://github.com/multiformats/multicodec/pull/331) rather than the [Fil-commitment-unsealed codec](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L517) and the [sha2-256-trunc254-padded multihash](https://github.com/multiformats/multicodec/blob/566eaf857a9d20573d3910221db7b34d98e8a0fc/table.csv#L149).
 
 ## Abstract
 
@@ -24,7 +24,7 @@ For example, in much of the relevant portions of the Filecoin spec and lotus' Go
 
 This makes it much more natural to work with new concepts like [Deal Aggregates](https://pkg.go.dev/github.com/filecoin-project/go-data-segment@v0.0.0-20230605095649-5d01fdd3e4a1/datasegment#NewAggregate), as proposed in [FRC-0058](https://github.com/filecoin-project/FIPs/blob/7e499523c9c7ed2c48c6a36967f7f011cee1fefd/FRCs/frc-0058.md).
 
-To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://link.tld) which combines the root hash with the tree height and the amount of padding of the data with zeros such that the result is a full and balanced binary tree after the fr32 padding is applied. If this multihash were to be used to reference the full piece as understood by the Filecoin on-chain consensus mechanism, this would involve the special case where the padding is zero is used.
+To resolve this we introduce a new multihash type [fr32-sha2-256-trunc254-padded-binary-tree multihash](https://github.com/multiformats/multicodec/pull/331) which combines the root hash with the tree height and the amount of padding of the data with zeros such that the result is a full and balanced binary tree after the fr32 padding is applied. If this multihash were to be used to reference the full piece as understood by the Filecoin on-chain consensus mechanism, this would involve the special case where a padding of zero is used.
 
 ## Specification
 
@@ -36,18 +36,25 @@ A CIDv1 requires a:
 
 The core component introduce in this specification is a new multihash type fr32-sha2-256-trunc254-padded-binary-tree multihash.
 
-The multihash code for this type is 0x1011 as identifier in the [multicodec code table](https://link.tld).
+The multihash code for this type is 0x1011 as identified in the [multicodec code table](https://github.com/multiformats/multicodec/pull/331).
 
 The digest for the multihash is a variable number of bytes.
-- The first byte defines the height of the tree
-  - For example if the first byte is 0 the tree is a single 32-byte leaf
-  - Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.
-- The next bytes are a [uvarint](https://github.com/multiformats/unsigned-varint) of the number of bytes needed to pad the underlying data such that after FR32 padding it will be a full binary tree
+
+It can be roughly described as `uvarint padding | uint8 height | 32 byte root data` where `|` means concatenation.
+
+- The first bytes are a [uvarint](https://github.com/multiformats/unsigned-varint) of the number of bytes needed to pad the underlying data such that after FR32 padding it will be a full binary tree
   - If the data is < 127 bytes then it is padded to 127 bytes. For example, if the data is of size 12 the padding will be `127-12 = 115` (TODO: Is this necessary or would even 32 bytes be ok?)
   - For example, if the data is of size 254 (i.e. 2^i * 127/128, where i is a positive integer >= 7) then this will be 0
   - For example, if the data is of size 256 then the padding is `508-256=252`
   - Note: because the unsigned-varint spec currently has a maximum representable size of 2^63-1 (and 9 bytes to represent the varint) this puts a cap on the maximum size of data representable by this multihash as well
-  - Note: because this is padding data it must be less than the size of the underlying data
+  - Note: because this is padding data it must be less than the size of the underlying data
+  - (TODO: this is not true for data less than 64 bytes if padding up to 127)
+- The next byte defines the height of the tree
+  - For example if the first byte is 0 the tree is a single 32-byte leaf
+  - Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.
+- The last 32 bytes are the value at the root of the binary tree
+
+Note: This structure is such that the last 33 bytes of the multihash are `uint 8 height | 32 byte root data` which is the data that the proofs underlying Filecoin consensus can attest to (i.e. the proofs don't know how much of the padding is padding vs zeros that are part of the user data).
 
 ### Raw codec
 
@@ -70,17 +77,17 @@ A tuple of (v1 Piece CID, Piece size) can be converted into a valid v2 Piece CID
 
 ## Test Cases
 
-Take data of size 127*4 bytes where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, and the last 127 are 3. The v1 piece CID of this data would be `baga6ea4seaqes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. The multihash-based piece CID would be `bafkzcibcaqaes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. With a base16 multibase this would be `f01559120220400496dae0cc9e265efe5a006e80626a5dc5c409e5d3155c13984caf6c8d5cfd605` or equivalently (`(multibase = f) | (CIDv1 prefix = 0x01) | (Raw codec = 0x55) | (fr32-sha2-256-trunc254-padded-binary-tree multihash encoded varint = 0x9120) | (length of digest = 0x22) | (tree height = 0x04) | (amount of data padding = 0x00) | (underlying hash digest = 496...605)`)
+Take data of size 127*4 bytes where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, and the last 127 are 3. The v1 piece CID of this data would be `baga6ea4seaqes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. The multihash-based piece CID would be `bafkzcibcaaces3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. With a base16 multibase this would be `f01559120220004496dae0cc9e265efe5a006e80626a5dc5c409e5d3155c13984caf6c8d5cfd605` or equivalently (`(multibase = f) | (CIDv1 prefix = 0x01) | (Raw codec = 0x55) | (fr32-sha2-256-trunc254-padded-binary-tree multihash encoded varint = 0x9120) | (length of digest = 0x22) | (amount of data padding = 0x00) | (tree height = 0x04) | (underlying hash digest = 496...605)`)
 
-Given the piece CID v1 of the empty 32 GiB piece (i.e. 32 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq` the corresponding mutlihash piece CID would be `bafkzcibcdyaao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq`
+Given the piece CID v1 of the empty 32 GiB piece (i.e. 32 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq` the corresponding mutlihash piece CID would be `bafkzcibcaapao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq`
 
-Given the piece CID v1 of the empty 64 GiB piece (i.e. 64 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq` the corresponding piece mutlihash piece CID would be `bafkzcibcd4aomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq`
+Given the piece CID v1 of the empty 64 GiB piece (i.e. 64 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq` the corresponding piece mutlihash piece CID would be `bafkzcibcaap6mqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq`
 
 Take data of size 127*8 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining `127*4` bytes are 0. There is no padding so the v1 piece CID would be `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` and the v2 would be `bafkzcibcauan42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa`.
 
-Take data of size 128*4 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining 127+4 bytes are 0. There is `377` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibdax4qfxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
+Take data of size 128*4 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining 127+4 bytes are 0. There is `377` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd7ebalxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
-Take the data above and append one more zero (i.e. 127 0s, 127 1s, 127 2s, 127 3s, 132 0s). There are `376` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibdax4afxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
+Take the data above and append one more zero (i.e. 127 0s, 127 1s, 127 2s, 127 3s, 132 0s). There are `376` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd7abalxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
 ## Security Considerations
 Does not impact core Filecoin security.

From 345a9bd6dfa9f76197ef1baf65189871e64e83bb Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Mon, 16 Oct 2023 11:30:56 -0400
Subject: [PATCH 4/7] update draft test cases

---
 FRCs/frc-0069.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index c24ac357..f04ebb43 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -85,9 +85,9 @@ Given the piece CID v1 of the empty 64 GiB piece (i.e. 64 * 2^30 * 127/128 bytes
 
 Take data of size 127*8 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining `127*4` bytes are 0. There is no padding so the v1 piece CID would be `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` and the v2 would be `bafkzcibcauan42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa`.
 
-Take data of size 128*4 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining 127+4 bytes are 0. There is `377` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd7ebalxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
+Take data of size 128*4 bytes where the first where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, the next 127 are 3 and the remaining 4 bytes are 0. There is `504` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd7abqlxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
-Take the data above and append one more zero (i.e. 127 0s, 127 1s, 127 2s, 127 3s, 132 0s). There are `376` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd7abalxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
+Take the data above and append one more zero (i.e. 127 0s, 127 1s, 127 2s, 127 3s, 5 0s). There are `503` bytes of padding needed and so the v1 piece CID is `baga6ea4seaqn42av3szurbbscwuu3zjssvfwbpsvbjf6y3tukvlgl2nf5rha6pa` (notice it's the same as above) and the v2 is `bafkzcibd64bqlxticxolgseegik2stpfgkkuwyf6kufex3doorkvmzpjuxwe4dz4`.
 
 ## Security Considerations
 Does not impact core Filecoin security.

From 55f2958bdddefab6ba5bcc856bc4a23483c5c741 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Mon, 16 Oct 2023 14:17:00 -0400
Subject: [PATCH 5/7] assert that if the data is smaller than 127 bytes it must be padded up to 127 bytes

---
 FRCs/frc-0069.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index f04ebb43..7db01110 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -43,12 +43,11 @@ The digest for the multihash is a variable number of bytes.
 It can be roughly described as `uvarint padding | uint8 height | 32 byte root data` where `|` means concatenation.
 
 - The first bytes are a [uvarint](https://github.com/multiformats/unsigned-varint) of the number of bytes needed to pad the underlying data such that after FR32 padding it will be a full binary tree
-  - If the data is < 127 bytes then it is padded to 127 bytes. For example, if the data is of size 12 the padding will be `127-12 = 115` (TODO: Is this necessary or would even 32 bytes be ok?)
+  - If the data is < 127 bytes then it is padded to 127 bytes. For example, if the data is of size 12 the padding will be `127-12 = 115`
   - For example, if the data is of size 254 (i.e. 2^i * 127/128, where i is a positive integer >= 7) then this will be 0
   - For example, if the data is of size 256 then the padding is `508-256=252`
   - Note: because the unsigned-varint spec currently has a maximum representable size of 2^63-1 (and 9 bytes to represent the varint) this puts a cap on the maximum size of data representable by this multihash as well
-  - Note: because this is padding data it must be less than the size of the underlying data
-  - (TODO: this is not true for data less than 64 bytes if padding up to 127)
+  - Note: because this is padding data it must be less than the size of the underlying data (with the exception of data less than 127 bytes which is always padded up to 127 bytes)
 - The next byte defines the height of the tree
   - For example if the first byte is 0 the tree is a single 32-byte leaf
   - Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.

From 9d8f2490deea4d9cddd169cff45da5c6f16c8823 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Fri, 20 Oct 2023 17:55:09 -0400
Subject: [PATCH 6/7] add test fixture for 0 size payload

Co-authored-by: Irakli Gozalishvili
---
 FRCs/frc-0069.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index 7db01110..e4539d38 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -77,7 +77,11 @@ A tuple of (v1 Piece CID, Piece size) can be converted into a valid v2 Piece CID
 ## Test Cases
 
 Take data of size 127*4 bytes where the first 127 bytes are 0, the next 127 are 1, the next 127 are 2, and the last 127 are 3. The v1 piece CID of this data would be `baga6ea4seaqes3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. The multihash-based piece CID would be `bafkzcibcaaces3nobte6ezpp4wqan2age2s5yxcatzotcvobhgcmv5wi2xh5mbi`. With a base16 multibase this would be `f01559120220004496dae0cc9e265efe5a006e80626a5dc5c409e5d3155c13984caf6c8d5cfd605` or equivalently (`(multibase = f) | (CIDv1 prefix = 0x01) | (Raw codec = 0x55) | (fr32-sha2-256-trunc254-padded-binary-tree multihash encoded varint = 0x9120) | (length of digest = 0x22) | (amount of data padding = 0x00) | (tree height = 0x04) | (underlying hash digest = 496...605)`)
+Take payload of size 0 bytes. There MUST be `127` bytes of padding. The v2 piece CID MUST be `bafkzcibcp4bdomn3tgwgrh3g532zopskstnbrd2n3sxfqbze7rxt7vqn7veigmy`.
 
+Take payload of 127 bytes where all bytes are 0. There MUST be `0` bytes of padding. The height MUST be `2`. The v2 piece CID MUST be `bafkzcibcaabdomn3tgwgrh3g532zopskstnbrd2n3sxfqbze7rxt7vqn7veigmy`.
+
+Take payload of 128 bytes where all bytes are 0. There MUST be `126` bytes of padding. The height MUST be `3`. The v2 piece CID MUST be `bafkzcibcpybwiktap34inmaex4wbs6cghlq5i2j2yd2bb2zndn5ep7ralzphkdy`
 Given the piece CID v1 of the empty 32 GiB piece (i.e. 32 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq` the corresponding mutlihash piece CID would be `bafkzcibcaapao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq`
 
 Given the piece CID v1 of the empty 64 GiB piece (i.e. 64 * 2^30 * 127/128 bytes of zeros)`baga6ea4seaqomqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq` the corresponding piece mutlihash piece CID would be `bafkzcibcaap6mqafu276g53zko4k23xzh4h4uecjwicbmvhsuqi7o4bhthhm4aq`

From d931caa7c2bdaae11f5f37d2e157e97448e657d6 Mon Sep 17 00:00:00 2001
From: Adin Schmahmann
Date: Fri, 20 Oct 2023 17:55:34 -0400
Subject: [PATCH 7/7] wording change

Co-authored-by: Peter Rabbitson
---
 FRCs/frc-0069.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/FRCs/frc-0069.md b/FRCs/frc-0069.md
index e4539d38..731c978f 100644
--- a/FRCs/frc-0069.md
+++ b/FRCs/frc-0069.md
@@ -49,7 +49,8 @@ It can be roughly described as `uvarint padding | uint8 height | 32 byte root da
   - Note: because the unsigned-varint spec currently has a maximum representable size of 2^63-1 (and 9 bytes to represent the varint) this puts a cap on the maximum size of data representable by this multihash as well
   - Note: because this is padding data it must be less than the size of the underlying data (with the exception of data less than 127 bytes which is always padded up to 127 bytes)
 - The next byte defines the height of the tree
-  - For example if the first byte is 0 the tree is a single 32-byte leaf
+  - For example if the first byte is 0 the tree is a single 32-byte node
+
   - Similarly, if the first byte is 30 then the tree is 30 levels deep which, since the leaves must be 32 bytes, represents a piece of size 32*2^30 bytes = 32GiB.
 - The last 32 bytes are the value at the root of the binary tree
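
Reviewer note: the digest layout this series converges on (`uvarint padding | uint8 height | 32 byte root data`) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the go-fil-commcid or data-segment implementation: the function names (`uvarint`, `padding_and_height`, `piece_digest`, `piece_cid_v2_base16`) are hypothetical, and the 32-byte root is taken as an input because computing it requires the FR32 padding and the sha2-256-trunc254 tree hashing, which are out of scope here. The padding and height rule is inferred from the examples in the patch (payloads are zero-padded up to the next `127 * 2^k` bytes, with a minimum of 127, and a 127-byte payload has height 2).

```python
from typing import Tuple

# Constants taken from the patch text.
MULTIHASH_CODE = 0x1011  # fr32-sha2-256-trunc254-padded-binary-tree
RAW_CODEC = 0x55
CID_V1 = 0x01


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def padding_and_height(payload_size: int) -> Tuple[int, int]:
    """Return (zero padding in bytes, tree height) for an unpadded payload size.

    Assumption inferred from the test cases: the payload is padded up to the
    next 127 * 2^k bytes (minimum 127). After FR32 padding that is 128 * 2^k
    bytes, i.e. 4 * 2^k leaves of 32 bytes, so the height is k + 2.
    """
    padded = 127
    height = 2
    while padded < payload_size:
        padded *= 2
        height += 1
    return padded - payload_size, height


def piece_digest(payload_size: int, root: bytes) -> bytes:
    """Build the multihash digest: uvarint padding | uint8 height | 32-byte root."""
    if len(root) != 32:
        raise ValueError("root must be exactly 32 bytes")
    padding, height = padding_and_height(payload_size)
    return uvarint(padding) + bytes([height]) + root


def piece_cid_v2_base16(payload_size: int, root: bytes) -> str:
    """Assemble the CID bytes and render them with the base16 ('f') multibase."""
    digest = piece_digest(payload_size, root)
    cid = (
        bytes([CID_V1, RAW_CODEC])
        + uvarint(MULTIHASH_CODE)
        + uvarint(len(digest))
        + digest
    )
    return "f" + cid.hex()
```

With the root from the first test case (`496dae0c...d605`) and a payload size of `127*4`, `piece_cid_v2_base16` reproduces the base16 CID quoted in the Test Cases section, and `padding_and_height` gives `(127, 2)`, `(0, 2)`, `(126, 3)` and `(504, 5)` for payloads of 0, 127, 128 and 512 bytes, matching the fixtures added in patches 4 and 6.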