Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use core and alloc crates for no_std compatibility #980

Closed
wants to merge 18 commits into from

Conversation

soareschen
Copy link
Contributor

Part of informalsystems/hermes#1158

This follows similar work in informalsystems/hermes#1156 to move closer to supporting no_std in tendermint-rs.

  • Referenced an issue explaining the need for the change
  • Updated all relevant documentation in docs
  • Updated all code comments where relevant
  • Wrote tests
  • Added entry in .changelog/

@codecov-commenter
Copy link

codecov-commenter commented Sep 17, 2021

Codecov Report

Merging #980 (766dfdb) into master (9d83b54) will increase coverage by 0.2%.
The diff coverage is 69.3%.

❗ Current head 766dfdb differs from pull request most recent head 701f028. Consider uploading reports for the commit 701f028 to get more accurate results
Impacted file tree graph

@@           Coverage Diff            @@
##           master    #980     +/-   ##
========================================
+ Coverage    72.5%   72.7%   +0.2%     
========================================
  Files         203     203             
  Lines       16595   16585     -10     
========================================
+ Hits        12032   12062     +30     
+ Misses       4563    4523     -40     
Impacted Files Coverage Δ
light-client/src/components/verifier.rs 87.9% <ø> (ø)
light-client/src/predicates/errors.rs 50.0% <ø> (ø)
proto/src/serializers/bytes.rs 80.0% <ø> (ø)
proto/src/serializers/optional_from_str.rs 0.0% <ø> (ø)
proto/src/serializers/part_set_header_total.rs 100.0% <ø> (ø)
proto/src/serializers/time_duration.rs 87.5% <ø> (ø)
proto/src/serializers/timestamp.rs 96.3% <ø> (ø)
proto/src/serializers/txs.rs 64.5% <ø> (ø)
rpc/src/client.rs 16.9% <ø> (ø)
rpc/src/client/bin/main.rs 0.6% <ø> (ø)
... and 124 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9d83b54...701f028. Read the comment docs.

@soareschen soareschen marked this pull request as ready for review September 27, 2021 08:48
@@ -107,9 +117,15 @@ pub mod serializers;
/// // We expect a validation error here
/// assert!(MyDomainType::decode(invalid_raw_bytes.as_ref()).is_err());
/// ```
pub trait Protobuf<T: Message + From<Self> + Default>
pub trait Protobuf<T>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have modified the Protobuf trait such that it allows the message type to implement TryFrom instead of From. This allows safe parsing of the TimeStamp type, to prevent undefined behavior caused by integer overflow in DOS attack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally the idea here was that we wanted conversions from SerializationType -> DomainType to be fallible to allow for validation, and then from DomainType -> SerializationType to be infallible because DomainType should be a subset of SerializationType.

Changing this would imply that we could possibly have DomainTypes that possibly do not have valid serialized representations, which implies to me that those DomainTypes are incorrectly modeled, does it not?

So then the practical consequences of this architectural change would be:

  1. All From<DomainType> for SerializationType impls now need to become TryFrom<DomainType> for SerializationType, touching all domain types
  2. We need custom serde Serialize trait impls, as per https://github.com/informalsystems/tendermint-rs/pull/980/files#r717530415

This seems like a lot of work/code just to be able to cater for the DomainTimestamp -> SerializedTimestamp conversion, does it not? Is there no way to just implement a better DomainTimestamp whose conversion to SerializedTimestamp is infallible (as it should actually be)?


SystemTime::from(prost_value).into()
fn try_from(value: Timestamp) -> Result<Self, Error> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implement TryFrom instead of From to allow safe conversion of the raw protobuf Timestamp to the Time type. This is necessary as one is using i32 while the other use u32. Although the overflow is unlikely to occur, for security purpose it is still possible that an attacker may try to provide overflow time values to cause undefined behavior on the relayer or the chain. So it is better to return an error in case that happens.

As a result of this change, a number of From instances for various data structures that use Time have to be updated to implement TryFrom instead.

fn from(value: Time) -> Self {
// prost_types::Timestamp has a SystemTime converter but
// tendermint_proto::Timestamp can be JSON-encoded
let prost_value = prost_types::Timestamp::from(SystemTime::from(value));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove the use of SystemTime here, as this is the only place where the tendermint crate is still using std.

}
}

impl Serialize for Block {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By changing from TryFrom to Try, it is not possible to auto derive the Serialize trait through type attributes. (serde-rs/serde#1524) So I just implement the trait manually.

I'm not sure why there is a need to auto derive the Serialize trait other than for use in testing. In any case, I see that there are many custom impl blocks for Serialize in tendermint, so it should be fine to add a few more.

// TODO: Either add this check in verification or remove this test because otherwise there's no
// point of it
// TODO: Either add this check in verification or remove this test because otherwise there's
// no point of it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there are some changes the Rust nightly that cause comments to be formatted slightly differently by cargo fmt.


fn add(self, rhs: Duration) -> Self::Output {
let st: SystemTime = self.into();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remove the use of SystemTime for addition/substration here to support no_std. To perform safe addition in chrono, we have to take care of integer overflows by returning error. This is important as the overflow result may cause computation such as trusted period to return incorrect result.

@@ -91,7 +93,7 @@ hyper = { version = "0.14", optional = true, features = ["client", "http1", "htt
hyper-proxy = { version = "0.9", optional = true }
hyper-rustls = { version = "0.22.1", optional = true }
structopt = { version = "0.3", optional = true }
tokio = { version = "1.0", optional = true }
tokio = { version = "1.0", optional = true, features = ["rt-multi-thread"] }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature's already specified for the websocket-client feature, which I think is the only one that needs it. #990 fixes the mock client's (incorrect) dependency on this Tokio feature.

@@ -107,9 +117,15 @@ pub mod serializers;
/// // We expect a validation error here
/// assert!(MyDomainType::decode(invalid_raw_bytes.as_ref()).is_err());
/// ```
pub trait Protobuf<T: Message + From<Self> + Default>
pub trait Protobuf<T>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally the idea here was that we wanted conversions from SerializationType -> DomainType to be fallible to allow for validation, and then from DomainType -> SerializationType to be infallible because DomainType should be a subset of SerializationType.

Changing this would imply that we could possibly have DomainTypes that possibly do not have valid serialized representations, which implies to me that those DomainTypes are incorrectly modeled, does it not?

So then the practical consequences of this architectural change would be:

  1. All From<DomainType> for SerializationType impls now need to become TryFrom<DomainType> for SerializationType, touching all domain types
  2. We need custom serde Serialize trait impls, as per https://github.com/informalsystems/tendermint-rs/pull/980/files#r717530415

This seems like a lot of work/code just to be able to cater for the DomainTimestamp -> SerializedTimestamp conversion, does it not? Is there no way to just implement a better DomainTimestamp whose conversion to SerializedTimestamp is infallible (as it should actually be)?

@soareschen
Copy link
Contributor Author

Originally the idea here was that we wanted conversions from SerializationType -> DomainType to be fallible to allow for validation, and then from DomainType -> SerializationType to be infallible because DomainType should be a subset of SerializationType.
Changing this would imply that we could possibly have DomainTypes that possibly do not have valid serialized representations, which implies to me that those DomainTypes are incorrectly modeled, does it not?

Just because the type signature of a function does not return an error, does not mean that there isn't actually an error. The purpose of defining type signatures is to reflect the actual behavior of the implementation as accurate as possible. In the case of the timestamp conversion, defining a From instance for it is simply ignoring the fact that there is a chance for that to fail. This kind of error-ignorance is especially common in languages like Go, where even when the code fails, a "valid" value is still always returned and the caller can pretend that the error never happens.

If we really want to ignore the failure and define a From instance for Timestamp using the underlying chrono::DateTime, there are a few choices:

  1. Raise panic when conversion between unsigned and signed integers fail, allowing the relayer to potentially be DOS'ed with attacker sending an overflow timestamp.
  2. Allow silent integer overflow using as coercion, allowing an attacker to send an overly large timestamp to go back in time and cause potential security vulnerabilities.
  3. Default to 0 or max seconds/nanoseconds value when conversion fails.
  4. Use an alternative library other than chrono so that it can process the seconds and nanoseconds in the same precision as the protobuf fields. (I am actually tempted on this, but couldn't find better alternatives)

I originally chose option 3 as the solution. But given that there aren't too many code changes, I do think that it is worth handling the error properly. I also had similar issue in ibc-rs, where I couldn't use the Protobuf trait because of the lack of From trait, and had to copy-paste the decode function with minor modification to workaround that.

Btw note that all types implementing From also automatically implement TryFrom, with the error type being Infallible.

@thanethomson
Copy link
Contributor

thanethomson commented Sep 28, 2021

Raise panic when conversion between unsigned and signed integers fail, allowing the relayer to potentially be DOS'ed with attacker sending an overflow timestamp.

Is this a real possibility? How would an attacker trigger an incorrect conversion from Time -> Timestamp (which is really a conversion from chrono::DateTime<Utc> -> Timestamp) from the outside?

The only way I could see that being a possibility is if we somehow constructed an incorrect chrono::DateTime<Utc> in the first place, right?

Here's my reasoning:

  • chrono::DateTime::timestamp_subsec_nanos() returns a u32 (max value 2^32 > 4x10^9)
  • The Protobuf Timestamp::nanos field is i32 (max value 2^31 - 1, which is > 2x10^9)
  • There are only 1x10^9 nanoseconds in a second, so any nanosecond value >= 1x10^9 should be invalid

Hence if we want to prevent invalid nanosecond values, we should check that the timestamp_subsec_nanos() value of the constructed chrono::DateTime<Utc> is < 10^9 when we're deserializing it, correct?

If we do somehow end up with a chrono::DateTime<Utc>::timestamp_subsec_nanos() value > 999_999_999 (as per the docs in the case of a leap second), should we not rather increment the seconds and use the nanoseconds modulo 1000_000_000? How are leap seconds usually handled?

Copy link
Contributor

@xla xla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit of a pattern here where change-sets are proposed which often sneak in changes not directly pertaining to the set goal of the change-set. If we want to change the semantics of the Prototype trait this should be reasoned about and proposed in isolation instead of in the context of switching to core and alloc. It would be warranted to preface such a change with a small ADR so the design and rational can be discussed instead of it being intermingled with the review of the implementation.

@soareschen
Copy link
Contributor Author

If we do somehow end up with a chrono::DateTime<Utc>::timestamp_subsec_nanos() value > 999_999_999 (as per the docs in the case of a leap second), should we not rather increment the seconds and use the nanoseconds modulo 1000_000_000? How are leap seconds usually handled?

Yeah, in that particular case it shouldn't happen unless there is a bug in chrono itself. I suppose in case there is a conversion error, we could use 0 as the default subsecond nanos.

For the case of TryFrom<Timestamp>, we would still have to do safe conversion, since the protobuf can be supplied by an attacker. Btw I also just notice that we are supposed to use Utc.timestamp_opt instead of Utc.timestamp so that chrono does not panic in the case of an overflow input.

I'll check whether it is fine to have TryFrom<Timestamp> but From<Time>. The two asymmetric requirements still always confuse me.

There is a bit of a pattern here where change-sets are proposed which often sneak in changes not directly pertaining to the set goal of the change-set. If we want to change the semantics of the Prototype trait this should be reasoned about and proposed in isolation instead of in the context of switching to core and alloc. It would be warranted to preface such a change with a small ADR so the design and rational can be discussed instead of it being intermingled with the review of the implementation.

Yeah the changes get a bit large as I tried to get rid of the use of std::time::SystemTime ASAP. I will break this into two separate PRs instead.

@soareschen
Copy link
Contributor Author

Closing this PR in favor of #993 and #994.

It seems like we do able to avoid modifying the Protobuf trait if only TryFrom<Timestamp> is implemented. Thanks @thanethomson for pointing that out!

@soareschen soareschen closed this Sep 29, 2021
@xla xla deleted the soares/use-core-alloc branch September 29, 2021 11:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no_std no_std compatibility
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants