Memory safety guideline

Motivation

Tempesta FW, like some of our clients' projects, is mission-critical software working at the Internet edge. Security and reliability are its key properties.

The CISA Product Security Bad Practices demand a memory safety roadmap. The Case for Memory Safe Roadmaps suggests particular steps and technologies to make C and C++ code safe(-er). This document provides safe and secure coding guidelines for C and C++.

This guideline is mandatory for all projects that process untrusted user input. Tempesta FW and TLS are examples of such code. Tempesta DB and the user-space logger are not.

This page is incomplete and is supposed to be extended and/or fixed in almost all sections.

General practices

This section describes programming practices common to both Linux kernel C and C++.

Use address sanitizers

Use KASAN for the Linux kernel or Clang AddressSanitizer for C or C++ user-space code. The Clang address sanitizer doesn't reveal as many issues as valgrind.

The address sanitizers, especially valgrind, impose significant performance overhead, so they must be used with long-running test suites, not with the resulting production code.

Use static analyzers

Static analyzers must also be integrated with CI. We have had good experience with Coverity Scan, which is free for open source projects, and cppcheck. The Clang static analyzer misses too many code problems.

Code coverage

gcov(1) can be used for the Linux kernel and user-space to measure the code coverage by tests.

The Case for Memory Safe Roadmaps recommends 80% coverage, but the absolute value doesn't mean much in practice. If the coverage is 80%, then we analyze the 20% of the code that is not covered, and it may contain only trivial code (e.g. wrappers). On the other hand, if the coverage is 95%, the remaining 5% may contain a quite crucial piece of code.

Fuzzing

We aim for deterministic fuzzing, which mutates some data corpus, possibly indefinitely. The mutations are not random: they obey particular rules so that the tested system does not reject every input early, at the same check. For example, if we generated HTTP requests randomly, the tested HTTP server would reject most of them at the first character of the method, since only a small set of strings is allowed as an HTTP method.

Also, if a deterministic fuzzer discovers a problem, it can be run again to reproduce the problem.

Use assertions only if a disaster is inevitable

The Linux kernel patch verification script shows warnings for assertions (BUG_ON() and family in the kernel). The problem with assertions is that they crash the program. Assertions should be used only if a program crash or a similar disaster is inevitable. A good example could be:

    assert(p != NULL);
    f(p->foo);

If p is NULL, then the program is going to crash anyway. But if we crash early, we know where the crash happens and we avoid possible exploitation of the code flaw.

On the other hand, a bad example is:

    assert(a > b);
    f(a);

The relation between a and b doesn't necessarily lead to a crash or data leakage. If it does in f(), then the assertion must be placed close to the code that crashes. Also, f() and the calling code may change in such a way that a <= b becomes a valid condition.
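One possible way to express this distinction in C++ is sketched below. Item, read_foo() and checked_diff() are hypothetical names used only for illustration: the null check stays an assertion because the dereference would crash anyway, while the relation between a and b is reported to the caller as a recoverable error.

```cpp
#include <cassert>
#include <optional>

struct Item {
	int foo;
};

// Dereferencing a null pointer crashes anyway, so crashing early and
// loudly in the assertion is acceptable.
int
read_foo(const Item *p)
{
	assert(p != nullptr);
	return p->foo;
}

// A violated relation between a and b is not a disaster by itself:
// report it to the caller instead of aborting the whole program.
std::optional<int>
checked_diff(int a, int b)
{
	if (a <= b)
		return std::nullopt;
	return a - b;
}
```

The caller of checked_diff() decides how to handle the empty result, for example by dropping the request, which keeps the rest of the program running.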

C++

There are guidelines for safe C++ programming, and C++ provides a lot of tools for safer programming in comparison with plain C.

Use new standards

C++ is getting better in terms of safety and security, so newer standards introduce safety features, which should be employed in our code.

Follow the C++ Guidelines

There are at least two notable C++ guidelines:

The guidelines provide many rules accompanied by examples of safe and unsafe code. If you do C++ edge server development, then you must read and follow the guidelines.

Some of the rules are summarized briefly here and covered in more detail in the following sections:

Avoid bound errors
Use const by default. In Rust all variables are immutable by default and the mut keyword is used to declare a variable as mutable. With this rule, we aim to achieve the same level of control over unwanted memory changes.

Avoid raw memory operations

Typically, data plane code (e.g. network packet processing) is performance-critical, so we do use custom memory allocators, which require raw memory operations.

In such cases Rust programs must use unsafe blocks, which is equivalent to the default C++ mode. The "default" C++ is fast, but unsafe (see Herb Sutter's keynote).

Wherever performance isn't crucial, at least in the control plane (e.g. configuration processing), safe, yet slower, C++ techniques must be used.

For example this unsafe C-like code:

char buf[1024];
unsigned size = sizeof(buf) - SOME_CONSTANT;

buf[size] = '\0';

read_json_config(buf, size);

should be replaced with the safer:

constexpr auto size = 1024;

std::array<char, size> buf = { 0 };

read_json_config(buf, size - SOME_CONSTANT);

One problem with the original code is that it involves address arithmetic, in which it is easy to make a mistake. Another problem is that it leaves areas of uninitialized memory: if the JSON document is shorter than size, then there could be uninitialized data between the end of the read string and the written \0.

Use hardened libc++ and compiler options

To enforce that raw (and unsafe) pointers are avoided, use Clang++ Safe Buffers and hardened libc++. The hardened libc++ should be used in the fast mode for production builds, and there should be a CI job that builds in the debug mode.

Use std::unique_ptr instead of raw pointers

Wherever you use * for a raw pointer, make sure that you can't use std::unique_ptr or a reference (&) instead. In general, for code that is not performance-critical and doesn't need to work with raw memory, use std::unique_ptr or std::shared_ptr. E.g. instead of

tasks[i].client = new Client(foo);

use

tasks[i].client = std::make_unique<Client>(foo);

Also read C++ Core Guidelines: R.3: A raw pointer is non-owning.
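The ownership split can be sketched as follows. Client, Task, name_length() and make_tasks() are hypothetical stand-ins for the types in the snippet above: the std::unique_ptr member owns the object, while a raw pointer parameter only observes it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Client {
	explicit Client(std::string n) : name(std::move(n)) {}
	std::string name;
};

struct Task {
	std::unique_ptr<Client> client; // owns the Client
};

// A raw pointer parameter only observes the object and never deletes it.
std::size_t
name_length(const Client *c)
{
	return c ? c->name.size() : 0;
}

std::vector<Task>
make_tasks(std::size_t n)
{
	std::vector<Task> tasks(n);
	for (auto &t : tasks)
		t.client = std::make_unique<Client>("foo");
	return tasks; // Clients are freed automatically with the vector
}
```

A caller passes tasks[i].client.get() to name_length(); no delete appears anywhere, and an early return or exception cannot leak a Client.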

Avoid C-style arrays

For example, instead of char buf[100] use std::string, std::array or std::vector.

If you still need a C-style array, use std::span or std::string_view to work safely with its length. Consider an example serialization function (inspired by the blog post and C++ Core Guidelines: Catch run-time errors early):

void
serialize(const char *str, size_t len)
{
	std::cout << len << ": ";
	for (size_t i = 0; i < len; ++i)
		std::cout << str[i] << " ";
	std::cout << std::endl;
}

You can call the function as

char str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str));

If you define str as a C-string, then you need to adjust the len computation:

const char str[] = "abc";
serialize(str, sizeof(str) - 1);

Next, if you change the type to int, then you need other len computation:

int str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str) / sizeof(str[0]));

The point is that it's easy to make a bug in length computation. C++ STL provides span and string_view to safely pass C strings and arrays with correct length computation:

void
serialize(std::span<char> array)
{
	std::cout << array.size() << ": ";
	for (const auto c: array)
		std::cout << c << " ";
	std::cout << std::endl;
}

void
print(std::string_view str)
{
	std::cout << str.size() << ": " << str << std::endl;
}

int
main()
{
	char array[] = {'a', 'b', 'c'};
	serialize(array);

	const char *str = "abc";
	print(str);

	return 0;
}

Or, better, use std::array and std::string (note that serialize() and print() aren't changed and work just the same way):

std::array<char, 3> a{'a', 'b', 'c'};
std::string s("abc");

serialize(a);
print(s);

For this rule, see also C++ Core Guidelines: Prefer using STL array or vector instead of a C array.

Access containers with bounds checking

Prefer bounds-checked access to containers, e.g. prefer std::vector::at() to std::vector::operator[].
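A short sketch of the difference, using a hypothetical helper get_or(): at() reports a bad index as a std::out_of_range exception that the caller can handle, whereas operator[] with the same index is undefined behavior.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Return the element at idx, or fallback if idx is out of range.
int
get_or(const std::vector<int> &v, std::size_t idx, int fallback)
{
	try {
		return v.at(idx); // throws std::out_of_range on a bad index
	} catch (const std::out_of_range &) {
		return fallback;
	}
}
```

For a vector holding {10, 20, 30}, get_or(v, 1, -1) returns 20 and get_or(v, 3, -1) returns the fallback -1 instead of reading past the end of the buffer.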

Restrict the code

Use the const and noexcept specifiers wherever possible. This makes the code faster and easier to review, and allows the compiler and static analyzers to do their work better.
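A small illustration with a hypothetical Header class: accessors that don't modify the object are marked const, functions that cannot throw are marked noexcept, and local values that never change are declared const.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

class Header {
public:
	Header(std::string name, std::string value)
		: name_(std::move(name)), value_(std::move(value)) {}

	// const: the accessor cannot modify the object;
	// noexcept: it neither allocates nor throws.
	std::string_view name() const noexcept { return name_; }
	std::size_t value_size() const noexcept { return value_.size(); }

private:
	std::string name_;
	std::string value_;
};

// Takes the header by const reference and cannot throw.
std::size_t
total_size(const Header &h) noexcept
{
	const auto n = h.name().size();
	const auto v = h.value_size();
	return n + v;
}
```

With these specifiers a reviewer can see from the signature alone that total_size() neither changes the header nor needs exception handling around the call.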
