Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Style guide: Add error test coverage, integer overflow, and no_std. #154

Merged
merged 2 commits into from
Jun 30, 2020
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 88 additions & 3 deletions docs/style-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -367,13 +367,13 @@ Call non-panicking data access APIs whenever data is not guaranteed to be safe.

This should not include the contract of code in a different Crate. I.e. if a function in a different Crate promises to return a valid map key, but it's not a compile time checked type (like an enum), then the calling code must allow for it to fail.

### Don't Handle Errors :: required
### Don't Handle Errors :: suggested

Functions which can error for any reason must return a `Result`, and APIs should be designed such that you should never need to recover from an [Err](https://doc.rust-lang.org/std/result/enum.Result.html#variant.Err) internally (which should always be immediately propagated up to the user by using the [`?` operator](https://doc.rust-lang.org/edition-guide/rust-2018/error-handling-and-panics/the-question-mark-operator-for-easier-error-handling.html)). I.e. Never write library code which recovers from its own "errors", since if it can be recovered from, then it wasn't an "error".
Functions which can error for any reason must return a `Result`, and APIs should be designed such that you should not generally need to recover from an [Err](https://doc.rust-lang.org/std/result/enum.Result.html#variant.Err) internally (which should normally be immediately propagated up to the user by using the [`?` operator](https://doc.rust-lang.org/edition-guide/rust-2018/error-handling-and-panics/the-question-mark-operator-for-easier-error-handling.html)). I.e. don't generally write library code which recovers from its own "errors", since if it can be recovered from, then it wasn't an "error".

This approach should mean that error handling and the design of functions which can propagate errors is consistent everywhere. For non-error cases, where different types of result are possible, use a normal enum.

The only time you might need to handle an **Err** is if you call APIs outside the library which return Result rather than Option (e.g. allowing a retry for data loading).
Since an `Err` in `Result` is more expressive than a `None` in `Option`, there may be cases in which it is appropriate to handle a recoverable error. For example, you may call a function with one set of inputs, and if that call fails, you attempt to call it with a second set of inputs, before propagating the error.

Finally, and fairly obviously, **never turn an error into a panic by unconditionally unwrapping the result in the library**.

Expand Down Expand Up @@ -406,6 +406,91 @@ Examples:
* Try to open a file - Result
* Try to parse a string into a valid Language Identifier - Result

### Test all error cases :: required

You should write unit tests that cover all code paths that can generate an error.

If you find a bit of error-handling code that is unreachable by a unit test, you should consider replacing that code with `unreachable!()`.

## Integer Overflow

### Do not have unchecked overflow :: required

Malformed user input should not be able to cause integer overflow inside ICU4X implementation code. You should return an error result if the user's input is too big and may cause integer overflow.

#### Use appropriately-sized integer types

You don't need a `usize` to represent a decimal digit 0-9. By using the smallest possible integer type, you can reason better about cases where overflow can or cannot occur.

#### Checked Arithmetic

In cases where a hard limit on input is not possible, you can use methods such as [checked_add](https://doc.rust-lang.org/std/primitive.usize.html#method.checked_add).

#### Bounds Testing

By default, the `+` operator in Rust will panic upon overflow in debug mode. You should employ thorough testing of boundaries to ensure that your arithmetic is never able to overflow.

### Don't accept infinite iterators :: suggested

Consider using `enumerate` to avoid an infinite iterator causing integer overflow by checking that the index does not exceed a bound that you set on the input.

Please do not mind that the following example could be written with `fold` instead of `enumerate`.

```rust
#[non_exhaustive]
#[derive(Debug, PartialEq)]
pub enum Error {
Limit,
}

/// iter: must contain fewer than usize::MAX elements
pub fn count_zeros<T>(iter: T) -> Result<usize, Error>
where
T: Iterator<Item=u8>
{
let mut result: usize = 0;
for (i, v) in iter.enumerate() {
if i == std::usize::MAX {
return Err(Error::Limit)
}
if v == 0 {
result += 1;
}
}
return Ok(result)
}
```

# Imports and Configurations

## Features

### Use no_std :: suggested

Main issues: [#77](https://github.com/unicode-org/icu4x/issues/77), [#151](https://github.com/unicode-org/icu4x/issues/151)

Most ICU4X code will not work in [no_std](https://rust-embedded.github.io/book/intro/no-std.html), since memory allocation is very often required to handle edge cases. Even our most fundamental type, Locale, requires memory allocation.

However, when designing traits and interfaces, we should make them `no_std`-friendly, such that we can more easily expand in this direction more easily in the future.

## Dependencies

### Avoid heavy dependencies :: suggested

Code size is an important factor for portability. One of the easiest ways to accidentally bloat your code size is to pull in a heavy dependency.

When possible, write your code in such a way as to reduce dependencies, especially dependencies on heavier libraries. If you need to add a dependency, consider putting it behind a feature flag.

### Avoid `std::collections::HashMap` :: suggested

The standard library HashMap makes bloated binaries. In order to reduce code size, consider one of these options:

1. If the keys are known ahead of time, use a `struct` or `enum_map`.
2. If the set is small, use a vector (with linear-time lookup).
3. If the set is larger, use a sorted vector (with logarithmic-time lookup).

For option 3, note that sorting algorithms are also bloated. If possible, you should perform sorting offline (e.g., by requiring that data returned by the data provider is already sorted).
sffc marked this conversation as resolved.
Show resolved Hide resolved

# Advanced Features

## Operator Overloading
Expand Down