1.3. Memory FAQ
This page is a must-read for all LWJGL users.
LWJGL requires the use of off-heap memory when passing data to native libraries. Likewise, any buffers returned from native libraries are always backed by off-heap memory. This is not an LWJGL limitation. There are two issues with Java objects and arrays that live on the JVM heap:
- It is not possible to control the layout of Java objects. Different JVMs and different JVM settings produce very different field layouts. Native libraries on the other hand expect data with very precisely defined layouts.
- Any Java object or array may be moved by the GC at any time, concurrently with the execution of a native method call. All JNI methods are executed at a safepoint, so, by definition, they must not access heap data.
The standard approaches are:
- Using JNI functions to access Java objects, which is painfully slow.
- Using JNI functions to "pin" Java arrays (Get/ReleasePrimitiveArrayCritical or HotSpot Critical Natives), which is also inefficient for several reasons.
LWJGL, on the other hand, is designed to be used with direct (off-heap) java.nio buffer classes for passing data to and from native code. ByteBuffer and the other buffer classes are not the best possible abstraction for off-heap data, and their API is not ideal, but they are the only officially supported way to access off-heap data in Java.
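To see the difference in plain JDK terms, the sketch below (JDK only, no LWJGL classes; `HeapVsDirect` and `isOffHeap` are hypothetical names) compares a buffer that wraps a Java array with one created by `ByteBuffer.allocateDirect`. Both expose the same `ByteBuffer` API, but only the direct one reports `isDirect() == true`, meaning it is backed by off-heap memory that the GC does not move.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: heap-backed vs direct (off-heap) NIO buffers.
final class HeapVsDirect {
    // Only a direct buffer refers to native memory with a stable address.
    static boolean isOffHeap(ByteBuffer buffer) {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.wrap(new byte[16]);   // view over a Java byte[] that the GC may move
        ByteBuffer direct = ByteBuffer.allocateDirect(16); // off-heap memory, not moved by the GC
        System.out.println(isOffHeap(heap));   // false
        System.out.println(isOffHeap(direct)); // true
        System.out.println(heap.capacity() == direct.capacity()); // true: same API, different storage
    }
}
```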
The easiest way to think of a ByteBuffer is as a wrapper over a native C pointer, plus the array length (buffer.capacity()). LWJGL maps pointers to C primitive types to the corresponding buffer class in java.nio (for example, int * maps to IntBuffer and float * to FloatBuffer). Arrays of pointers are mapped to the org.lwjgl.PointerBuffer class. Pointers to structs are mapped to the corresponding struct class. Pointers to struct arrays are mapped to the corresponding <StructClass>.Buffer class. PointerBuffer and the struct Buffer classes have an API very similar to java.nio buffers.
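As a minimal, JDK-only sketch of the "pointer plus length" model, the snippet below represents a C `int values[4]` as an off-heap `IntBuffer`. The helper `offHeapInts` is hypothetical and only approximates what LWJGL's own allocation APIs hand you. The point is that the `IntBuffer` is a typed view over the same native memory, and its `capacity()` is the element count, not the byte count.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical helper: models a C `int values[count]` as an off-heap IntBuffer.
final class BufferViewDemo {
    static IntBuffer offHeapInts(int count) {
        ByteBuffer bytes = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                     .order(ByteOrder.nativeOrder());
        // The IntBuffer is a view over the same native memory; it inherits the byte order.
        return bytes.asIntBuffer();
    }

    public static void main(String[] args) {
        IntBuffer values = offHeapInts(4);
        System.out.println(values.isDirect()); // true: a view of a direct buffer is direct
        System.out.println(values.capacity()); // 4: the "array length", in ints
        values.put(0, 42);                     // absolute put, like values[0] = 42 in C
        System.out.println(values.get(0));     // 42
    }
}
```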
The buffer byte order must be set to ByteOrder.nativeOrder(). This is effectively required for correct cross-platform behavior, and it also results in the best performance. All buffer instances created by LWJGL are always set to the native byte order.
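The sketch below (JDK only; `ByteOrderDemo` and `lowestByteOfOne` are illustrative names) shows why the order has to be set explicitly when you allocate buffers yourself: every `ByteBuffer` created by the JDK starts out `BIG_ENDIAN`, while native code reads memory in the platform's own order, which is little-endian on common targets such as x86-64. Writing the `int` 1 and inspecting the byte at the lowest address makes the difference visible.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: how byte order changes the in-memory layout of an int.
final class ByteOrderDemo {
    // Writes the int 1 at offset 0 and returns the byte stored at the lowest address.
    static byte lowestByteOfOne(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocateDirect(Integer.BYTES).order(order);
        buf.putInt(0, 1);
        return buf.get(0);
    }

    public static void main(String[] args) {
        System.out.println(ByteBuffer.allocateDirect(4).order());     // BIG_ENDIAN, on every platform
        System.out.println(ByteOrder.nativeOrder());                  // platform-dependent
        System.out.println(lowestByteOfOne(ByteOrder.BIG_ENDIAN));    // 0
        System.out.println(lowestByteOfOne(ByteOrder.LITTLE_ENDIAN)); // 1
    }
}
```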
After getting familiar with the above mappings, the next step is learning how to handle allocation of such buffers. This is a critical issue, and LWJGL offers several options. The options are listed below, ordered from most to least efficient. Every time you decide how to handle an allocation, consider the first option first. If it is not applicable, consider the second option, and so on.
Java does not support explicit stack allocation of Java objects and obviously does not support off-heap stack allocation either. In C it's very simple: you declare a variable inside a function and it's stack allocated. When the function returns, the variable's memory is reclaimed automatically (and without overhead). There's no such equivalent in Java.
Similarly, it is not possible in Java to call a native function that expects or returns a struct by value. Such functions in LWJGL bindings are wrapped and exposed with a pointer-to-struct parameter or return value.
This is a problem because it's very common to need small, short-lived allocations when calling native functions. For example, creating a vertex buffer object in OpenGL in C:
```c
GLuint vbo;
glGenBuffers(1, &vbo); // very simple
```
and with LWJGL:
```java
IntBuffer ip = ...; // need a 4-byte buffer here
glGenBuffers(1, ip);
int vbo = ip.get(0);
```
A real IntBuffer allocation in the above example, regardless of the implementation, would be vastly less efficient than the stack allocation in the equivalent C code.
The usual answer to this problem, in LWJGL 2 and other Java libraries, is to allocate the buffer once, cache it and reuse it in many method calls. This is an incredibly unsatisfying solution:
- It leads to ugly code and wastes memory.
- To avoid wasting memory, static buffers are usually used.
- Using static buffers leads to either concurrency bugs or less than ideal performance (due to synchronization).
The LWJGL 3 answer is the org.lwjgl.system.MemoryStack API. It's been designed to be used with static imports and try-with-resources blocks. The above example becomes:
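```java
import static org.lwjgl.opengl.GL15.glGenBuffers;
import static org.lwjgl.system.MemoryStack.stackPush;
import java.nio.IntBuffer;
import org.lwjgl.system.MemoryStack;

int vbo;
try (MemoryStack stack = stackPush()) {
    IntBuffer ip = stack.callocInt(1); // 4 bytes, zeroed, on the thread-local stack
    glGenBuffers(1, ip);
    vbo = ip.get(0);
} // stack automatically popped, ip memory automatically reclaimed
```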
It is obviously more verbose, but has the following advantages:
- More than one allocation is usually required, but the try-with-resources boilerplate remains the same.
- The semantics of the above code perfectly match the requirements. The stack memory is thread-local, just like a real C thread stack.
- Performance is ideal. The stack push and pop are simple bumps of a pointer, and the IntBuffer instance allocation is either eliminated with escape analysis or handled by the next minor/eden GC cycle (super efficiently).
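To make the "simple bumps of a pointer" claim concrete, here is a heavily simplified, hypothetical bump-pointer stack written against the JDK only. `ToyStack`, `used()` and the fixed frame depth of 16 are illustrative assumptions, not LWJGL's `MemoryStack` implementation, which also handles alignment and struct allocations. It mirrors the usage pattern of the MemoryStack example: one off-heap block per thread, `stackPush()` saves the current pointer, allocations advance it, and closing the try-with-resources block restores it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical toy in the spirit of MemoryStack. NOT LWJGL's implementation:
// no alignment handling, at most 16 nested frames, int allocations only.
final class ToyStack implements AutoCloseable {
    // One stack per thread, each backed by a single 32 KB off-heap block.
    private static final ThreadLocal<ToyStack> TLS =
            ThreadLocal.withInitial(() -> new ToyStack(32 * 1024));

    private final ByteBuffer memory;          // allocated once per thread
    private final int[] frames = new int[16]; // saved pointers, one per open frame
    private int frameIndex;                   // number of open frames
    private int pointer;                      // bytes currently in use

    private ToyStack(int size) {
        memory = ByteBuffer.allocateDirect(size);
    }

    // Opens a frame on the calling thread's stack by remembering the current pointer.
    static ToyStack stackPush() {
        ToyStack s = TLS.get();
        s.frames[s.frameIndex++] = s.pointer;
        return s;
    }

    // Bytes in use on the calling thread's stack.
    static int used() {
        return TLS.get().pointer;
    }

    // Bump allocation: reserve the next count * 4 bytes and zero them (calloc semantics).
    IntBuffer callocInt(int count) {
        int bytes = count * Integer.BYTES;
        if (pointer + bytes > memory.capacity()) {
            throw new OutOfMemoryError("ToyStack overflow");
        }
        ByteBuffer view = memory.duplicate(); // same memory, independent position/limit
        view.position(pointer);
        view.limit(pointer + bytes);
        pointer += bytes;
        // slice() resets the byte order to BIG_ENDIAN, so set native order again.
        IntBuffer ints = view.slice().order(ByteOrder.nativeOrder()).asIntBuffer();
        for (int i = 0; i < count; i++) {
            ints.put(i, 0); // the memory may still hold data from an earlier, popped frame
        }
        return ints;
    }

    // Closing the frame "frees" everything allocated in it by restoring the saved pointer.
    @Override
    public void close() {
        pointer = frames[--frameIndex];
    }

    public static void main(String[] args) {
        int handle;
        try (ToyStack stack = stackPush()) {
            IntBuffer ip = stack.callocInt(1);
            ip.put(0, 42); // stand-in for a native call writing through the pointer
            handle = ip.get(0);
            System.out.println(used()); // 4
        }
        System.out.println(handle); // 42
        System.out.println(used()); // 0
    }
}
```

In this toy, push and pop are each a single `int` store; the only heap allocations are the small buffer wrapper objects, which is the part that escape analysis or the next minor GC cycle takes care of.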
Note 1: The default stack size is 32kb. It can be changed with -Dorg.lwjgl.system.stackSize or Configuration.STACK_SIZE.
Note 2: Structs and struct buffers can also be allocated on the MemoryStack.
Note 3: The static, thread-local MemoryStack API is just a convenience. There's additional API that lets you create and/or use MemoryStack instances as you see fit.
Sometimes stack allocation cannot be used: the memory that must be allocated is too big, or the allocation is long-lived. In such cases, the next best option is explicit memory management, either via the org.lwjgl.system.MemoryUtil API or a specific memory allocator (currently available in LWJGL: stdlib, jemalloc). Example:
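```java
import static org.lwjgl.system.MemoryUtil.*;
import java.nio.ByteBuffer;

ByteBuffer buffer = memAlloc(2 * 1024 * 1024); // 2MB, contents not initialized
// use buffer...
memFree(buffer); // free when no longer needed
```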
Note 1: Just like in C, the user is responsible for deallocating memory allocated with malloc using free.
Note 2: API for the standard functions calloc, realloc and aligned_alloc is also available.
Note 3: The Java objects allocated with the explicit memory management functions are also subject to escape analysis.
Sometimes the explicit memory management API cannot be used either. Maybe a particular allocation is hard to track without complicating the code, or it might not be possible to know exactly when it is no longer required. Such cases are legitimate candidates for using org.lwjgl.BufferUtils.
This class existed in older LWJGL versions with the same API. It uses ByteBuffer.allocateDirect to do the allocations, which has one major advantage: the user does not need to deallocate the off-heap memory explicitly; the GC does it automatically.
On the other hand, it has the following disadvantages:
- It is slow, much slower than the raw malloc call: a lot of overhead on top of a function that is already slow.
- It scales badly with concurrent allocations.
- It arbitrarily limits the amount of allocated memory (-XX:MaxDirectMemorySize).
- Like Java arrays, the allocated memory is always zeroed out. This is not necessarily bad, but having the option to skip zeroing would be better.
- There's no way to deallocate the allocated memory on demand (without JDK-specific reflection hacks). Instead, a reference queue is used that usually requires two GC cycles to free the native memory. This may lead to OOM errors under pressure.
An example of LWJGL using BufferUtils internally is allocating the memory that backs the thread-local MemoryStack instances. It is a long-lived allocation that must be deallocated when the thread dies, so we let the GC take care of it.
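A GC-managed allocation of this kind can be sketched with the JDK alone. `GcManagedBuffers` and its methods are hypothetical stand-ins modeled on the description above (`ByteBuffer.allocateDirect` plus native byte order); they are not LWJGL's `BufferUtils` source. Note that there is no free call: the native memory is reclaimed only after the buffer becomes unreachable and the GC gets around to it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical helpers modeled on the BufferUtils description; not LWJGL source code.
final class GcManagedBuffers {
    static ByteBuffer createByteBuffer(int capacity) {
        // allocateDirect zeroes the memory; set native order as the FAQ requires.
        return ByteBuffer.allocateDirect(capacity).order(ByteOrder.nativeOrder());
    }

    static FloatBuffer createFloatBuffer(int count) {
        return createByteBuffer(count * Float.BYTES).asFloatBuffer();
    }

    public static void main(String[] args) {
        FloatBuffer vertices = createFloatBuffer(6);
        System.out.println(vertices.get(0)); // 0.0: memory starts zeroed
        vertices.put(new float[] {0f, 0.5f, -0.5f, -0.5f, 0.5f, -0.5f});
        vertices.flip();
        System.out.println(vertices.remaining()); // 6
        // No free call: the off-heap memory is released some time after
        // `vertices` becomes unreachable and the GC processes it.
    }
}
```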
- Use org.lwjgl.system.MemoryStack, and if not possible...
- Use org.lwjgl.system.MemoryUtil, and if not possible...
- Use org.lwjgl.BufferUtils.
For more details, read the Memory Management in LWJGL 3 blog post.