Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elmattic/actors review c1 #1368

Merged
merged 6 commits into from
Jan 21, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions utils/bitfield/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,10 @@ rand_xorshift = "0.2.0"
rand = "0.7.3"
criterion = "0.3"
serde_json = "1.0"
# TODO remove fork in future (allowing non utf8 strings to be cbor deserialized)
serde_cbor = { package = "cs_serde_cbor", version = "0.12", features = [
"tags"
] }

[features]
json = []
Expand Down
6 changes: 6 additions & 0 deletions utils/bitfield/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,12 @@ use std::{

type Result<T> = std::result::Result<T, &'static str>;

// MaxEncodedSize is the maximum encoded size of a bitfield. When expanded into
// a slice of runs, a bitfield of this size should not exceed 2MiB of memory.
//
// This bitfield can fit at least 3072 sparse elements.
const MAX_ENCODED_SIZE: usize = 32 << 10;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not 1 << 15? i assume there is a reason, just curious!

Copy link
Contributor Author

@elmattic elmattic Jan 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy pasta of the way go-bitfield defines the constant.
filecoin-project/go-bitfield@3478657

I guess they write it like this to make 32KiB more visible :)


/// A bit field with buffered insertion/removal that serializes to/from RLE+. Similar to
/// `HashSet<usize>`, but more memory-efficient when long runs of 1s and 0s are present.
#[derive(Debug, Default, Clone)]
Expand Down
16 changes: 9 additions & 7 deletions utils/bitfield/src/rleplus/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -67,15 +67,11 @@ mod writer;
pub use reader::BitReader;
pub use writer::BitWriter;

use super::{BitField, Result};
use super::{BitField, Result, MAX_ENCODED_SIZE};
use serde::{Deserialize, Deserializer, Serialize, Serializer};
use std::borrow::Cow;

// MaxEncodedSize is the maximum encoded size of a bitfield. When expanded into
// a slice of runs, a bitfield of this size should not exceed 2MiB of memory.
//
// This bitfield can fit at least 3072 sparse elements.
const MAX_ENCODED_SIZE: usize = 32 << 10;
pub const VERSION: u8 = 0;

impl Serialize for BitField {
fn serialize<S>(&self, serializer: S) -> std::result::Result<S::Ok, S::Error>
Expand All @@ -99,6 +95,12 @@ impl<'de> Deserialize<'de> for BitField {
D: Deserializer<'de>,
{
let bytes: Cow<'de, [u8]> = serde_bytes::deserialize(deserializer)?;
if bytes.len() > MAX_ENCODED_SIZE {
return Err(serde::de::Error::custom(format!(
"decoded bitfield was too large {}",
bytes.len()
)));
}
Self::from_bytes(&bytes).map_err(serde::de::Error::custom)
}
}
Expand All @@ -115,7 +117,7 @@ impl BitField {
let mut reader = BitReader::new(bytes);

let version = reader.read(2);
if version != 0 {
if version != VERSION {
return Err("incorrect version");
}

Expand Down
12 changes: 11 additions & 1 deletion utils/bitfield/src/unvalidated.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
// Copyright 2019-2022 ChainSafe Systems
// SPDX-License-Identifier: Apache-2.0, MIT

use super::{BitField, Result};
use super::rleplus::VERSION;
use super::{BitField, Result, MAX_ENCODED_SIZE};
use encoding::serde_bytes;
use serde::{Deserialize, Deserializer, Serialize};

Expand Down Expand Up @@ -62,6 +63,15 @@ impl<'de> Deserialize<'de> for UnvalidatedBitField {
D: Deserializer<'de>,
{
let bytes: Vec<u8> = serde_bytes::deserialize(deserializer)?;
if bytes.len() > MAX_ENCODED_SIZE {
return Err(serde::de::Error::custom(format!(
"decoded bitfield was too large {}",
bytes.len()
)));
}
if !bytes.is_empty() && bytes[0] & 3 != VERSION {
return Err(serde::de::Error::custom("invalid RLE+ version".to_string()));
}
Ok(Self::Unvalidated(bytes))
}
}
59 changes: 58 additions & 1 deletion utils/bitfield/tests/bitfield_tests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,10 @@
// SPDX-License-Identifier: Apache-2.0, MIT

use ahash::AHashSet;
use forest_bitfield::{bitfield, BitField};
use forest_bitfield::{bitfield, BitField, UnvalidatedBitField};
use rand::{Rng, SeedableRng};
use rand_xorshift::XorShiftRng;
use serde_cbor::ser::Serializer;
use std::iter::FromIterator;

fn random_indices(range: usize, seed: u64) -> Vec<usize> {
Expand Down Expand Up @@ -223,3 +224,59 @@ fn padding() {
let deserialized: BitField = encoding::from_slice(&cbor).unwrap();
assert_eq!(deserialized, bf);
}

#[test]
fn bitfield_deserialize() {
// Set alternating bits for worst-case size performance
let mut bf = BitField::new();
let mut i = 0;
while i < 262_142 {
bf.set(i);
i += 2;
}
let bytes = bf.to_bytes();
let mut cbor = Vec::new();
serde_bytes::serialize(&bytes, &mut Serializer::new(&mut cbor)).unwrap();
let res: Result<BitField, _> = encoding::from_slice(&cbor);
assert!(res.is_ok());

// Set alternating bits for worst-case size performance
let mut bf = BitField::new();
let mut i = 0;
while i < 262_143 {
bf.set(i);
i += 2;
}
let bytes = bf.to_bytes();
let mut cbor = Vec::new();
serde_bytes::serialize(&bytes, &mut Serializer::new(&mut cbor)).unwrap();
let res: Result<BitField, _> = encoding::from_slice(&cbor);
assert!(res.is_err());
}

#[test]
fn unvalidated_deserialize() {
// Set alternating bits for worst-case size performance
let mut bf = BitField::new();
let mut i = 0;
while i < 262_143 {
bf.set(i);
i += 2;
}
let bytes = bf.to_bytes();
let mut cbor = Vec::new();
serde_bytes::serialize(&bytes, &mut Serializer::new(&mut cbor)).unwrap();
let res: Result<UnvalidatedBitField, _> = encoding::from_slice(&cbor);
assert!(res.is_err());
}

#[test]
fn unvalidated_deserialize_version() {
let bf = bitfield![1, 1, 1, 1, 1, 1, 1, 1];
let mut bytes = bf.to_bytes();
bytes[0] |= 0x1; // flip bit to corrupt version
let mut cbor = Vec::new();
serde_bytes::serialize(&bytes, &mut Serializer::new(&mut cbor)).unwrap();
let res: Result<UnvalidatedBitField, _> = encoding::from_slice(&cbor);
assert!(res.is_err());
}