[Docs] Add some new info on performance tuning
Signed-off-by: Dmitrii Kuvaiskii <[email protected]>
Dmitrii Kuvaiskii committed Feb 23, 2021
1 parent 2decd11 commit 9e0ac10
Showing 1 changed file with 33 additions and 14 deletions.
47 changes: 33 additions & 14 deletions Documentation/devel/performance.rst
@@ -332,13 +332,13 @@ Modern Icelake machines remove many of the hardware bottlenecks of Intel SGX. If
you must use an older machine (Skylake, Kaby Lake, Mehlow), you should be aware
that they have severe SGX-hardware limitations. In particular:

#. EPC size. You can think of EPC as a physical cache (just like L3 cache) for
enclave pages. On all currently available machines, EPC is only 128-256MB in
size. This means that if the application has a working set size of more than
100-200MB, enclave pages will be evicted from EPC into RAM. Eviction of
enclave pages (also called EPC swapping or paging) is a very expensive
hardware operation. Some applications have a working set size of MBs/GBs of
data, so performance will be significantly impaired.
#. :term:`EPC` size. You can think of EPC as a physical cache (just like L3
cache) for enclave pages. On all currently available machines, EPC is only
128-256MB in size. This means that if the application has a working set size
of more than 100-200MB, enclave pages will be evicted from EPC into RAM.
Eviction of enclave pages (also called EPC swapping or paging) is a very
expensive hardware operation. Some applications have a working set size of
MBs/GBs of data, so performance will be significantly impaired.

#. RDTSC/RDTSCP instructions. These instructions are forbidden to execute in an
SGX enclave on older machines. Unfortunately, many applications and runtimes
@@ -362,6 +362,13 @@ that they have severe SGX-hardware limitations. In particular:
version that performs function calls inside Graphene instead of raw SYSCALL
instructions and thus avoids this overhead).

#. CPU topology. The CPU topology may negatively affect performance of Graphene.
For example, if the machine has several NUMA domains, it is important to
restrict Graphene runs to only one NUMA domain, e.g., via the command
``numactl --cpunodebind=0 --membind=0``. Otherwise Graphene may spread
enclave threads and enclave memory across several NUMA domains, which will
lead to higher memory access latencies and overall worse performance.
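
As a minimal sketch of the NUMA pinning described above, the shell function below wraps an arbitrary command in the ``numactl`` invocation from the text. The name ``run_on_numa_node`` is hypothetical and not part of Graphene, and the fallback to an unpinned run when ``numactl`` is missing is an assumption made for illustration only.

```shell
# Hypothetical helper (not part of Graphene): run a command pinned to one NUMA domain.
# Usage: run_on_numa_node <node> <command> [args...]
run_on_numa_node() {
    numa_node="$1"
    shift
    if command -v numactl >/dev/null 2>&1; then
        # Bind both CPU scheduling and memory allocation to the same domain.
        numactl --cpunodebind="$numa_node" --membind="$numa_node" "$@"
    else
        # Assumption for this sketch: fall back to an unpinned run with a warning.
        echo "numactl not found; running without NUMA pinning" >&2
        "$@"
    fi
}
```

For example, ``run_on_numa_node 0 <your usual Graphene command>`` would keep enclave threads and enclave memory on NUMA domain 0. Running ``numactl --hardware`` first lists the domains available on the machine.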

Other considerations
--------------------

@@ -372,6 +379,17 @@ non-debug configuration (again, typically simple ``make SGX=1`` is sufficient).
Finally, disable the debug log of Graphene by specifying the manifest option
``loader.log_level = "none"``.

There are several manifest options that may improve performance of some
workloads. The manifest options include:

- ``libos.check_invalid_pointers = 0`` -- disable checks of invalid pointers on
system call invocations. Most real-world applications never provide invalid
arguments to system calls, so there is no need for additional checks.
- ``sgx.preheat_enclave = 1`` -- pre-fault all enclave pages during enclave
initialization. This shifts the overhead of page faults on non-present enclave
pages from runtime to enclave startup time. Using this option makes sense only
if the whole enclave memory fits into :term:`EPC`.
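
One way to try these options is to append them to an existing manifest, sketched below. The helper name ``append_tuning_options`` is hypothetical. The sketch assumes that the manifest uses top-level dotted keys and does not already define any of these options, since repeated keys may be rejected. Whether each option is appropriate depends on the workload, as described above.

```shell
# Hypothetical helper (not part of Graphene): append the tuning options
# discussed above to a manifest file.
# Usage: append_tuning_options <manifest-file>
append_tuning_options() {
    manifest_file="$1"
    # Quoted heredoc delimiter: the lines are written verbatim, no expansion.
    cat >> "$manifest_file" <<'EOF'

# Performance tuning options (assumed not to be set elsewhere in this file)
loader.log_level = "none"
libos.check_invalid_pointers = 0
sgx.preheat_enclave = 1
EOF
}
```

Keep ``sgx.preheat_enclave = 1`` only if the whole enclave memory fits into EPC; otherwise drop that line from the appended block.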

If your application periodically fails and complains about seemingly irrelevant
things, it may be due to insufficient enclave memory. Please try to increase
enclave size by tweaking ``sgx.enclave_size = "512M"``,
@@ -510,8 +528,8 @@ Further reading
SGX profiling
-------------

There is some experimental support for profiling the code inside the SGX
enclave. Here is how to use it:
There is support for profiling the code inside the SGX enclave. Here is how to
use it:

#. Compile Graphene with ``SGX=1 DEBUG=1``.

@@ -527,11 +545,12 @@ enclave. Here is how to use it:

#. Run ``perf report -i <data file>`` (see :ref:`perf` above).

*Note*: The accuracy of this tool is unclear. The SGX profiling works by
measuring the value of instruction pointer on each asynchronous enclave exit
(AEX), which happens on Linux scheduler interrupts, as well as other events such
as page faults. While we attempt to measure time (and not only count
occurrences), the results might be inaccurate.
*Note*: The accuracy of this tool is unclear (though we have had positive
experiences with the tool so far). The SGX profiling works by measuring the value
of instruction pointer on each asynchronous enclave exit (AEX), which happens on
Linux scheduler interrupts, as well as other events such as page faults. While
we attempt to measure time (and not only count occurrences), the results might be
inaccurate.

.. _sgx-profile-ocall:
