Memory safety guideline
Tempesta FW, like some of our clients' projects, is mission-critical software operating at the Internet edge. Security and reliability are the key properties of such software.
The CISA Product Security Bad Practices demand a memory safety roadmap. The Case for Memory Safe Roadmaps suggests particular steps and technologies to make C and C++ code safe(-er). This document provides safe and secure coding guidelines for C and C++.
This guideline is mandatory for all projects that process untrusted user input. Tempesta FW and TLS are examples of such code. Tempesta DB and the user-space logger aren't.
This page is incomplete and is supposed to be extended and/or fixed in almost all sections.
This section describes programming practices common to both Linux kernel C and C++.
Use KASAN for the Linux kernel or Clang AddressSanitizer for C or C++ user-space code. The Clang address sanitizer doesn't reveal as many issues as valgrind.
These tools, especially valgrind, impose significant performance overhead, so they must be used with long-running test suites, not with the resulting production code.
Static analyzers must also be integrated with CI. We have good experience with Coverity Scan, which is free for open source projects, and cppcheck. The Clang static analyzer misses too many code problems.
gcov(1) can be used for the Linux kernel and user-space to measure the code coverage by tests.
The Case for Memory Safe Roadmaps recommends 80% coverage, but the absolute value doesn't mean much in practice. If the coverage is 80%, then we analyze the 20% of code that is not covered, and it may contain only trivial code (e.g. wrappers). On the other hand, if the coverage is 95%, then the remaining 5% may contain a quite crucial piece of code.
We aim for deterministic fuzzing, which mutates some data corpus, possibly indefinitely. The mutations are non-random and obey particular rules, so that the tested system doesn't reject them early, on the same code check. For example, if we generated HTTP requests randomly, the tested HTTP server would reject most of them on the first character of the method, since only a small set of strings is allowed as an HTTP method.
Also, if a deterministic fuzzer discovers a problem, it can be run again to reproduce the problem.
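A minimal sketch of the idea, not our actual fuzzer: a seeded std::mt19937 engine stands in for the mutation rules, the request line is always kept valid, and only one header value is mutated. The function names, the X-Fuzz header and the alphabet are assumptions made for illustration.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Pick an index in [0, n) directly from the engine output. The std::mt19937
// sequence is fully specified by the standard, so the same seed gives the
// same picks on every platform.
static std::size_t
pick(std::mt19937 &rng, std::size_t n)
{
    return rng() % n;
}

// Build one request. The method always comes from a valid set, so the
// parser isn't expected to reject the request on its first character.
// Only the value of the (hypothetical) X-Fuzz header is mutated.
std::string
mutate_request(std::mt19937 &rng)
{
    static const std::vector<std::string> methods = {"GET", "POST", "HEAD"};
    static const std::string alphabet = "abcXYZ019 \t;=,\"";

    std::string value;
    const std::size_t len = pick(rng, 32);
    for (std::size_t i = 0; i < len; ++i)
        value += alphabet[pick(rng, alphabet.size())];

    return methods[pick(rng, methods.size())] + " / HTTP/1.1\r\n"
           "Host: example.com\r\n"
           "X-Fuzz: " + value + "\r\n\r\n";
}

// Generate `count` requests from a seed. The same seed always yields the
// same sequence, so a failing case can be replayed.
std::vector<std::string>
gen_requests(unsigned seed, std::size_t count)
{
    std::mt19937 rng(seed);
    std::vector<std::string> out;
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(mutate_request(rng));
    return out;
}
```

To reproduce a failure, it should be enough to log the seed and the index of the failing request and call gen_requests() with the same seed again.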
The Linux kernel patch verification script shows warnings for assertions (BUG_ON() and family in the kernel).
The problem with assertions is that they crash the program.
Assertions should be used only if a program crash or similar disaster is inevitable.
A good example could be:
assert(p != NULL);
f(p->foo);
If p is NULL, then the program is going to crash anyway. But if we crash early, then we know that a crash happens and we avoid possible security exploitation of the code flaw.
On the other hand, a bad example is:
assert(a > b);
f(a);
The relation between a and b doesn't necessarily lead to a crash or data leakage. If it does in f(), then the assertion must be placed close to the code that crashes. Also, f() and the calling code may change in such a way that a <= b becomes a valid condition.
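One possible way to apply the rule, sketched under assumptions: the Item type and both function names are hypothetical, and returning std::optional is just one of several ways to report an invalid input without crashing.

```cpp
#include <cassert>
#include <optional>

struct Item {   // hypothetical type
    int foo;
};

// A NULL pointer would crash on the dereference anyway, so assert early.
int
read_foo(const Item *p)
{
    assert(p != nullptr);
    return p->foo;
}

// a <= b is only an invalid input for this function, not an inevitable
// crash, so it is reported to the caller instead of being asserted.
std::optional<int>
distance(int a, int b)
{
    if (a <= b)
        return std::nullopt;
    return a - b;
}
```

The caller decides what to do with an empty result, e.g. drop the request and log an error, while the process keeps running.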
There are guidelines for safe C++ programming, and C++ provides a lot of tools for safer programming in comparison with plain C.
C++ is getting better in terms of safety and security, so newer standards introduce safety features, which should be employed in our code.
There are at least two notable C++ guidelines:
The guidelines provide many rules accompanied by examples of safe and unsafe code. If you do C++ edge server development, then you must read and follow the guidelines.
We consider some of the rules briefly here and in more detail in the following sections:
- Avoid bound errors
- By default use const. In Rust all variables are constant by default and the mut keyword is used to declare a variable as mutable. With this rule, we aim to achieve the same level of control over unwanted memory changes.
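A small illustration of the const-by-default rule for parameters and local variables. The function is hypothetical; the point is that the only mutable name is the one that really has to change.

```cpp
#include <vector>

int
sum_positive(const std::vector<int> &values)    // the input can't be modified
{
    int total = 0;                  // the only mutable variable, on purpose
    for (const int v : values) {    // the loop variable is read-only
        if (v > 0)
            total += v;
    }
    return total;
}
```

An accidental write such as `v = 0` inside the loop is rejected by the compiler instead of silently changing data.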
Typically, data plane code (e.g. network packet processing) is performance crucial, so we do use custom memory allocators, which require raw memory operations.
For such cases Rust programs must use unsafe blocks, which is equivalent to the default C++ mode. The "default" C++ is fast, but unsafe (see Herb Sutter's keynote).
Wherever performance isn't crucial, at least in the control plane (such as configuration processing), safe, yet slower, C++ techniques must be used.
For example this unsafe C-like code:
char buf[1024];
unsigned size = sizeof(buf) - SOME_CONSTANT;
buf[size] = '\0';
read_json_config(buf, size);
should be replaced with the safer:
constexpr auto size = 1024;
std::array<char, size> buf = { 0 };
read_json_config(buf, size - SOME_CONSTANT);
One problem with the original code is that it involves address arithmetic, which is easy to get wrong. Another problem is that it leaves areas of uninitialized memory: if the JSON document is shorter than size, then there could be uninitialized data between the end of the read string and the written \0.
To enforce avoidance of using raw (and unsafe) pointers, use Clang++ Safe Buffers and hardened libc++. The hardened libc++ should be used in the fast mode for production builds and there should be a CI job for the build with debug mode.
Wherever you use * for a raw pointer, make sure that you can't use std::unique_ptr or a reference (&) instead. In general, for code that isn't performance crucial and doesn't need to work with raw memory, use std::unique_ptr or std::shared_ptr. E.g. instead of
tasks[i].client = new Client(foo);
use
tasks[i].client = std::make_unique<Client>(foo);
Also read C++ Core Guidelines: R.3: A raw pointer is non-owning.
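A slightly fuller sketch of the same idea, with hypothetical Client and Task types: the task owns its client through std::unique_ptr, while a function that only reads the client takes a raw, non-owning pointer in the sense of R.3.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Client {     // hypothetical type
    explicit Client(std::string n) : name(std::move(n)) {}
    std::string name;
};

struct Task {       // hypothetical type
    std::unique_ptr<Client> client;     // the task owns its client
};

// Non-owning access: the callee reads the client and must not delete it.
std::size_t
name_length(const Client *c)
{
    return c ? c->name.size() : 0;
}

std::vector<Task>
make_tasks(std::size_t n)
{
    std::vector<Task> tasks(n);
    for (std::size_t i = 0; i < n; ++i)
        tasks[i].client = std::make_unique<Client>("client" + std::to_string(i));
    return tasks;   // clients are freed automatically together with the tasks
}
```

No delete appears anywhere, so an early return or an exception can't leak a Client.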
For example, instead of char buf[100] use std::string, std::array or std::vector.
If you still need a C-style array, use std::span or std::string_view to safely work with its length. Consider an example serialization function (inspired by the blog post and C++ Core Guidelines: Catch run-time errors early):
void
serialize(const char *str, size_t len)
{
    std::cout << len << ": ";
    // The index has the same type as len to avoid a signed/unsigned comparison.
    for (size_t i = 0; i < len; ++i)
        std::cout << str[i] << " ";
    std::cout << std::endl;
}
You can call the function as
char str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str));
If you define str as a C-string, then you need to adjust the len computation to skip the terminating \0:
char str[] = "abc";
serialize(str, sizeof(str) - 1);
Next, if you change the type to int, then you need another len computation:
int str[] = {'a', 'b', 'c'};
serialize(str, sizeof(str) / sizeof(str[0]));
The point is that it's easy to make a bug in length computation.
The C++ STL provides std::span and std::string_view to safely pass C strings and arrays with correct length computation:
#include <iostream>
#include <span>
#include <string_view>

void
serialize(std::span<char> array)
{
    // The span carries the array length, so no separate len argument is needed.
    std::cout << array.size() << ": ";
    for (const auto c: array)
        std::cout << c << " ";
    std::cout << std::endl;
}

void
print(std::string_view str)
{
    std::cout << str.size() << ": " << str << std::endl;
}

int
main()
{
    char array[] = {'a', 'b', 'c'};
    serialize(array);

    const char *str = "abc";
    print(str);

    return 0;
}
Or, better, use std::array and std::string (note that serialize() and print() aren't changed and work just the same way):
std::array<char, 3> a{'a', 'b', 'c'};
std::string s("abc");
serialize(a);
print(s);
See also C++ Core Guidelines: Prefer using STL array or vector instead of a C array for this rule.
Prefer access to containers with bounds checking, e.g. prefer std::vector::at() to std::vector::operator[].
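A short sketch of the difference: at() throws std::out_of_range on a bad index, which can be handled, while operator[] with the same index is undefined behavior. The checked_get() helper is hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <vector>

// Return the element at `idx`, or nothing if the index is out of range.
std::optional<int>
checked_get(const std::vector<int> &v, std::size_t idx)
{
    try {
        return v.at(idx);       // throws std::out_of_range on a bad index
    } catch (const std::out_of_range &) {
        return std::nullopt;    // v[idx] here would be undefined behavior
    }
}
```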
Use const and noexcept specifiers wherever possible. This makes the code faster and easier to review, and allows the compiler and static analyzers to do their work better.
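As an illustration for member functions, consider a hypothetical Session class. The const accessors tell the reviewer and the compiler that they don't modify the object, and the noexcept move constructor lets std::vector move elements instead of copying them when it reallocates.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

class Session {     // hypothetical class
public:
    explicit Session(std::string id) : id_(std::move(id)) {}

    // A noexcept move constructor makes the type cheap to relocate.
    Session(Session &&other) noexcept = default;

    // Accessors neither modify the object nor throw.
    const std::string &id() const noexcept { return id_; }
    std::size_t id_length() const noexcept { return id_.size(); }

private:
    std::string id_;
};

// The property is checked at compile time, so a regression is caught by
// the build, not by a benchmark.
static_assert(std::is_nothrow_move_constructible_v<Session>,
              "Session must be nothrow-movable");
```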