The purpose of distx is to make it easy to create binary programs and distribute them. Distx can run the same binary code on different operating systems. You still need to compile your code for different architectures. Distx works on the same concept as Wine. However, instead of being based on the Windows API, it tries to create a minimal API that can be supported by as many operating systems as possible.
The quickstart should work under Linux (x86_64, aarch64) and Windows (x86_64).
git clone "https://github.com/yokto/distx.git"
cd distx/test_programs
../distx.py install
../distx.py run build
../distx.py run test
You might get this bug when trying to install the dependencies:
https://github.com/python/cpython/issues/65115
You should be able to fix it by opening the following URLs in your browser:
https://distx.org
https://cloudfront.net
Now get started
git clone "https://github.com/yokto/distx.git"
cd distx\test_programs
python ..\distx.py install
python ..\distx.py run build
python ..\distx.py run test
Create the C++ file in distx/test_programs.
[...distx/test_programs]$ cat hello_world.cpp
#include <iostream>
int main() {
    std::cout << "hello world!";
}
Build it. It is important that the final binary is inside _distx and that the path within that directory matches the soname.
_distx/distx.org_2024-xload-x86_64/bin/xload_linux _distx/distx.org_2024-llvm-x86_64/bin/clang++ hello_world.cpp -o _distx/hello_world -Wl,-soname=/hello_world
Run it.
_distx/distx.org_2024-xload-x86_64/bin/xload_linux _distx/hello_world
On Linux you can register the loader. Put it in a place that makes sense, e.g. /_distx/xload_linux, and register it with the kernel. Afterwards you can run distx programs without calling the loader explicitly.
mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
echo ":zwolf:M:0:\\x7FELF\\x02\\x01\\x01\\x13::/_distx/xload_linux:" > /proc/sys/fs/binfmt_misc/register
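To make the registration line easier to read, here is a small Python sketch of what it asks the kernel to match. It assumes the usual binfmt_misc field order (name, type, offset, magic, mask, interpreter, flags) and the standard layout of the first ELF identification bytes. The helper names are made up for illustration and are not part of distx.

```python
# Sketch only: mirrors the magic bytes from the registration line above.
DISTX_MAGIC = b"\x7fELF\x02\x01\x01\x13"

def describe_ident(ident: bytes) -> dict:
    """Split the first 8 bytes of an ELF file into its e_ident fields."""
    return {
        "magic": ident[0:4],
        "class": ident[4],    # EI_CLASS: 2 means 64-bit
        "data": ident[5],     # EI_DATA: 1 means little-endian
        "version": ident[6],  # EI_VERSION: always 1
        "osabi": ident[7],    # EI_OSABI: 0x13 in the registration line
    }

def kernel_would_match(header: bytes, magic: bytes = DISTX_MAGIC, offset: int = 0) -> bool:
    """Type M compares the bytes at `offset` with `magic` (no mask is given above)."""
    return header[offset:offset + len(magic)] == magic

distx_header = DISTX_MAGIC + bytes(8)                   # padded to the 16 bytes of e_ident
linux_header = b"\x7fELF\x02\x01\x01\x00" + bytes(8)    # OSABI 0, like a regular Linux binary

print(describe_ident(distx_header))
print(kernel_would_match(distx_header), kernel_would_match(linux_header))
```

Only the last magic byte differs from a regular 64-bit little-endian Linux binary, so normal executables keep going through the kernel's own ELF loader.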
Distx can run the same binary code on different operating systems. You still need to compile your code for different architectures. It consists of the following components.
The loader is quite simple. It takes the binary, loads it into RAM, links it with its dependencies and executes it.
_distx/distx.org_2024-xload-x86_64/bin/xload_linux path-to-exec arg1 arg2
Dependencies can only be other Distx libraries. However, the program itself can dynamically load system libraries and symbols.
Distx is the name of the distribution as well as the name of the overall project.
At the moment this is a simple Python script that downloads build dependencies. It can also run some predefined commands, which saves you from having to specify paths every time you build a project.
distx.py install # installs dependencies specified in distx-build.json
distx.py run build # runs the command "build" which is specified in distx-build.json
distx-build.json is the config file used.
{
  "// comment1": "it will loop through those variables when it encounters a {VAR}",
  "loop": {
    "ARCH": ["x86_64", "aarch64"],
    "BUILD_DEP": ["llvm", "xload", "xbuild"],
    "RUN_DEP": ["xlibc", "llvm-libcxx", "llvm-rt"]
  },
  "// comment2": "just a plain list of build time dependencies",
  "// comment3": "we can also make use of the default variables DISTX_PREFIX, DISTX_SRC, DISTX_INSTALL, DISTX_BUILD, DISTX_ARCH, DISTX_DEP",
  "dep": [
    "{DISTX_PREFIX}llvm-common",
    "{DISTX_PREFIX}{BUILD_DEP}-{DISTX_ARCH}",
    "{DISTX_PREFIX}{RUN_DEP}-{ARCH}",
    "{DISTX_PREFIX}{RUN_DEP}-common"
  ],
  "// comment4": "list of commands that can be run with distx.py run <...>",
  "run": {
    "build": {
      "cmd": ["{DISTX_DEP}/{DISTX_PREFIX}xbuild-{DISTX_ARCH}/bin/xbuild", "build"]
    },
    "test": {
      "cmd": ["{DISTX_DEP}/{DISTX_PREFIX}xbuild-{DISTX_ARCH}/bin/xbuild", "test"]
    }
  }
}
Default variables
- DISTX_PREFIX: prefix of the repository where all the packages are
- DISTX_SRC: path to sources (usually current working directory)
- DISTX_INSTALL: path to install directory
- DISTX_BUILD: path to build directory
- DISTX_ARCH: host architecture
- DISTX_DEP: path to dependencies
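The looping over {VAR} placeholders can be pictured with the following sketch. It is not the code from distx.py. It assumes that a string is expanded once for every combination of the loop variables it mentions, and that default variables are substituted as plain values. The function name `expand` and the value used for DISTX_PREFIX (guessed from the directory names in the quickstart) are assumptions.

```python
import itertools
import re

def expand(template, loop, defaults):
    """Expand one template into every combination of the loop variables it uses."""
    names = re.findall(r"\{(\w+)\}", template)
    # Keep only loop variables, in order of first appearance, without duplicates.
    loop_names = list(dict.fromkeys(n for n in names if n in loop))
    results = []
    for combo in itertools.product(*(loop[n] for n in loop_names)):
        values = dict(defaults)
        values.update(zip(loop_names, combo))
        results.append(template.format(**values))
    return results

loop = {
    "ARCH": ["x86_64", "aarch64"],
    "RUN_DEP": ["xlibc", "llvm-libcxx", "llvm-rt"],
}
# Assumed values, for illustration only.
defaults = {"DISTX_PREFIX": "distx.org_2024-", "DISTX_ARCH": "x86_64"}

common = expand("{DISTX_PREFIX}llvm-common", loop, defaults)
per_arch = expand("{DISTX_PREFIX}{RUN_DEP}-{ARCH}", loop, defaults)
print(common)
print(per_arch)
```

A template without loop variables yields exactly one string, while the second template yields one entry per (RUN_DEP, ARCH) pair.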
This part is relatively straightforward. We just use clang. The main problem here is that clang is somewhat intertwined with libc++/libunwind, which we use as runtime libraries. We added some default compiler options. One important point is that every binary needs an soname that tells us where the file is located, for example:
$ find /some/path
./somelib/lib/x64/lib.so
./someexe/bin/x64/exe
$ readelf -a somelib/lib/x64/lib.so | grep SONAME
(SONAME) Library soname: [somelib/lib/x64/lib.so]
$ readelf -a someexe/bin/x64/exe | grep SONAME
(SONAME) Library soname: [someexe/bin/x64/exe]
So you need to build it with
clang -Wl,-soname,someexe/bin/x64/exe -o someexe/bin/x64/exe exe.c
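Since the soname has to match the path of the output file, it can help to derive both from one value. The following sketch builds the clang command line that way. The helper name `clang_command` and the root directory are hypothetical; it follows the relative-path style of the example above.

```python
from pathlib import PurePosixPath

def clang_command(root, output, sources):
    """Build a clang invocation whose soname is the output path relative to root."""
    # relative_to raises ValueError if the output file is not below root.
    soname = PurePosixPath(output).relative_to(root)
    return ["clang", f"-Wl,-soname,{soname}", "-o", output, *sources]

cmd = clang_command("/some/path", "/some/path/someexe/bin/x64/exe", ["exe.c"])
print(" ".join(cmd))
```

Passing an output path outside the root fails early instead of producing a binary with a mismatched soname.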
This is a complicated part of the project. We would like to find the smallest possible interface that is easy to port to many operating systems. The initial idea was to simply write a libc (xlibc) that just calls the underlying operating system's libc functions. This mostly works. However, it is of course a monumental task. There are also many libc functions that have nothing to do with the operating system, such as all the string, wchar and locale functions. Even printf and scanf are mostly about formatting and contain very little actual OS-specific code. On the other hand, a lot is missing from libc, such as networking.
We use clang's libc++ as the C++ standard library.
... difficult :( might try qt
At the moment the web is the only framework that allows app developers to deploy their apps to most platforms with relative ease. Native app developers have a hard time deploying to all platforms. They need to build their project for multiple CPU architectures (x86, x64, arm32, arm64, riscv...) and multiple operating systems. On Linux they might even have to do it for every distribution. Even worse, cross compilation is usually very cumbersome. They often need a machine for every configuration in their build infrastructure. Qt is probably the most popular non-web GUI framework. Setting up an infrastructure to build a Qt app for every system that might want it is a considerable effort.
The web is an amazing framework. It is so extensive and so easy to deploy everywhere that it is hard for other frameworks to compete. However, there are several problems with it in the long run.
- Since it is standardized it can only ever add features.
- Its interface is so big that it is hard for a newcomer to implement. Therefore there will be fewer and fewer full implementations.
- The web is slow to add new functionality (for good reason).
- Lack of access to very basic native APIs, e.g. file system, TCP ..
- If the process of standardizing ever goes in the wrong direction, the rest of the world will still be stuck with it.
There are several options for reducing build complexity.
- Make the package architecture independent. This can be done by shipping bytecode / source code instead of binaries. If done correctly, the performance of this is probably OK for most purposes. I am not sure why there are no popular cross-platform Java GUI frameworks, for example. I can only assume it is because writing a good GUI framework is such a big task. We will not be going in this direction, however, for two reasons. Firstly, there is some small performance impact. Secondly, the bytecode / source code format would also be a big interface that can not change. If we ship binaries, the binary format is of course also an interface that does not change much. However, this is given by the hardware anyway. And over the long run new architectures might be added and old ones removed.
- Make it easier to find files on all platforms. This is mostly a C/C++ specific problem, but it is actually the easiest to solve. We will try to solve this problem. If files are always found in the same place, a previously complex process might become easy enough to handle with a simple Makefile.
- Make the package operating system independent. This is the core of what this project tries to accomplish. The idea is that we compile our program in some format (here ELF), load it into memory and execute it. When an operating system call comes, we simply call the underlying operating system call. We know this has reasonable performance, as it is exactly the same concept as Wine/Proton. The binary format (ELF) is not a very big interface. At the time of writing, we are able to load and run a reasonable part of libc++ with a loader of just under 1000 lines of code.
So in summary we will try to make a system where you need to build for each architecture (point 1) but not for each operating system (point 3). We will also try to make files available in the same place on all operating systems (point 2).
To understand this section, it helps to have a good understanding of how programs are loaded. There is a short overview in the "Loading on Linux" section.
A program will always need some functionality from the operating system. The question is where we want to set this interface.
We could set it at the system call interface. This is what Docker and WSL (Windows Subsystem for Linux) do.
Advantages:
- What happens in the address space is very well controlled and thus probably very similar wherever it's run.
- Can run a lot of existing applications
Disadvantages:
- Obviously tied to the system call interface of one operating system.
- If run on a different system, a lot of out-of-process virtualization is needed (Linux kernel/X server in the case of WSL)
Docker is the less interesting case, since it only runs natively under Linux. Ignoring security concerns, it is actually not so different from just running a statically linked binary. In Docker it seems uncommon to run GUI applications. It seems most suited for applications that don't rely on too much periphery.
WSL is more interesting. It actually has a Linux kernel running beside the Windows kernel, so it can run normal Linux binaries. This is only a general solution if all operating system providers implement this same functionality. There is probably no way for an independent party to implement such a thing for OSX. For the same reason, it is also very difficult to add to or change this interface. Any new API would first need to be supported by all providers. It is also unclear how userspace drivers should work. To avoid unnecessary overhead, a program should ideally call directly into the userspace drivers provided by the system. Instead, the system probably needs to provide Linux userspace drivers that talk to virtual kernel drivers, which then talk to the real kernel drivers. I assume this is how WSL manages to support OpenGL, but I'm not sure. WSL seems to run its own version of an X server. Ideally this overhead could also be removed.
This is what Wine does. The userspace loads the code of the Windows binary. For all Windows system functions, Wine links in implementations that use Linux system calls.
Advantages:
- This seems to work very well for a lot of games with very little overhead.
Disadvantages:
- Can probably never run anything without proprietary code.
There is a layer of code between the libc API and the API that Wine virtualizes. This part probably consists of proprietary code linked into the executable itself and proprietary code in DLLs provided by Wine.
What should the loader do and what should the executable do?
We need to decide this for the following things.
- threading
  - Implementation on Linux relies on fork(). So we could instead have the interface atfork()
- initializing libraries
- memory allocation
  - This would work very well as either a library or just provided by the system
- linking symbols
  - We can not put this into a dynamically linked library because it needs to run before dynamic linking
  - We don't want to put it in all executables because it would make executables much too big and be very unmodular
How does this compare to wine?
Here we describe the different components involved in starting a new program.
We are mostly interested in dynamically linked programs. So this is always assumed.
- Exit from old program (old executable). Old program calls the systemcall execve which takes the new executable as argument.
- Kernel. The kernel loads the code of the executable and reads path to the interpreter (dynamic loader) from it. It also loads the code of the dynamic loader. It also allocates a stack and puts some arguments on it. Then it passes control to the Dynamic Loader (userspace).
- Dynamic Loader (dl in userspace). The dynamic loader loads the dependencies (libraries) of the executable and links it. Then it gives control to the _start function.
- _start (executable). The start function is actually statically linked inside the executable from crt1.o. Not sure what exactly it does. But it calls __libc_start_main in glibc.
- __libc_start_main (glibc). Not sure what it does. But it does eventually call main of the executable.
- main (executable)
The old program calls the execve system call. This call does not return. Instead, the specified program is executed in this process (programs usually call clone before execve).
The program specified in the execve system call can be either a script with a shebang or an ELF executable. We will always assume it's an ELF executable.
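The clone-then-execve pattern can be tried out from Python on a POSIX system. This is only a sketch of the pattern, using fork instead of a raw clone. The function name `spawn_and_wait` is made up, and the child simply runs the Python interpreter itself with an exit code chosen for the demo.

```python
import os
import sys

def spawn_and_wait(argv):
    """Fork, replace the child with argv[0] via execv, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: execv only returns (by raising) if it failed.
        try:
            os.execv(argv[0], argv)
        finally:
            os._exit(127)
    # Parent: the child is now a different program in the same process id.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)

code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
print(code)
```

Nothing after the execv call runs in the child when the call succeeds, which is the "does not return" behaviour described above.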
The load_elf_binary function in binfmt_elf.c in the kernel is responsible for loading the executable into memory.
Simplified load_elf_binary function:
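/* Now we do a little grungy work by mmapping the ELF image into
   the correct location in memory. */
for (i = 0, elf_ppnt = elf_phdata; i < elf_ex->e_phnum; i++, elf_ppnt++) {
    if (elf_ppnt->p_type != PT_LOAD)
        continue; /* only PT_LOAD segments get mapped */
    error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                    elf_prot, elf_flags, total_size);
}
if (interpreter) {
    /* Also map the dynamic loader; execution starts at its entry point. */
    elf_entry = load_elf_interp(interp_elf_ex,
                                interpreter,
                                load_bias, interp_elf_phdata,
                                &arch_state);
}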
Defined in glibc/csu. This contains some important files: crt1.o, crti.o and crtn.o.
crt1.o gets added to every executable at the beginning.
It contains the _start symbol.
It does not get added to -shared objects.
It fiddles around with the stack a bit, sets up env and auxv, and runs __libc_start_main.
This in turn fiddles some more, calls the initializers of all the shared libraries and executes main.
In our loader these tasks are done by the loader so we (hopefully) don't need this.
crti.o and crtn.o seem to do some magic with the .init and .fini sections.
They are only a couple of lines of assembly.
E.g. in sysdeps/x86_64/crti.S
I think crti can set up __gmon_start__, which is some kind of monitoring/profiling tool, if it's present.
https://stevens.netmeister.org/631/elf.html
This is all for 64 bit.
1. Elf Header 64 bytes
2. Program Headers 56 bytes * number of headers (this describes how things get loaded into memory / mostly for the dynamic linker)
   - irrelevant for dynamic linker
3. .interp (only if set with -Wl,--dynamic-linker=...) string pointing to dynamic linker
   - we don't use this
4. .hash (only if dynamic) hash table lookup for string -> entry in .symtab
   - we find this from .dynamic -> DT_HASH
5. .gnu.hash (only if dynamic) faster version of .hash
6. .dynsym (only if dynamic) list of symbols from other libraries
7. .dynstr (only if dynamic) strings for .dynsym
8. .rela.dyn (only if dynamic) relocations not for the PLT (i.e. variables). Says which part of the code this needs to be copied to
9. .rela.plt (only if dynamic) relocations for the PLT (procedure linkage table, i.e. functions). Says which part of the code this needs to be copied to
10. Padding to Pagesize 0x1000 (4096)
11. .plt (procedure linkage table) when .text wants to call an external function it calls here. From here it does a jmpq through the Global Offset Table. The first time this leads into _dl_runtime_resolve_xsavec, afterwards directly to the function
12. .text (program code)
13. Padding to Pagesize
14. .rodata (read only data) ....
15. Padding to Pagesize 0x1000 - sizeof(.dynamic)
16. .dynamic (entries in .strtab)
17. .got.plt (starts at pagesize)
18. Section Headers 64 bytes * number of headers (this actually describes all the .something sections / mostly for the static linker)
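The .hash and .gnu.hash sections in the layout above both map a symbol name to a bucket. The two hash functions are short enough to write down. This is a Python transcription of the well-known SysV and GNU symbol hash functions, limited to 32 bits. The bucket count used at the end is an arbitrary example value.

```python
def sysv_hash(name: str) -> int:
    """Hash used for lookups in the .hash section (DT_HASH)."""
    h = 0
    for byte in name.encode():
        h = ((h << 4) + byte) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= 0x0FFFFFFF   # clear the top four bits
    return h

def gnu_hash(name: str) -> int:
    """Hash used for lookups in the .gnu.hash section (DT_GNU_HASH)."""
    h = 5381
    for byte in name.encode():
        h = (h * 33 + byte) & 0xFFFFFFFF
    return h

nbucket = 3  # example value; the real count is stored in the hash section
for symbol in ("foo", "bar", "printf"):
    print(symbol, hex(sysv_hash(symbol)), hex(gnu_hash(symbol)), gnu_hash(symbol) % nbucket)
```

A loader hashes the wanted name, picks the bucket with the modulo, and then walks that bucket's chain comparing names from .dynstr.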
In the program's address space the following parts are usually mapped (the numbers count the top-level entries of the layout above). Note that executables compiled with -pie -fPIE are more like dynamic libraries and have ASLR (Address Space Layout Randomization).
1 - 9 -> 0x400000 (readonly)
11 - 12 -> 0x401000 (read execute)
14 - .. -> 0x402000
Mappings of a running program can be checked at /proc/<pid>/maps
Other sections:
- .gnu.version (VERSYM in dynamic)
- .gnu.version_d (VERDEF in dynamic)
http://articles.manugarg.com/aboutelfauxiliaryvectors.html
Assuming your code looks like this (foo and bar are external functions):
foo();
bar();
foo();
The assembly code will look like this
call 0x11130 <foo.plt>
call 0x11140 <bar.plt>
call 0x11130 <foo.plt>
The PLT will look like this
0x11120 push 0x2fe2(%rip) # plt base
0x11126 jmp *0x2fe4(%rip) # points to _dl_runtime_resolve_xsavec
0x1112c nopl 0x0(%rax)
0x11130 jmp *0x22220 # this is foo@plt and jumps through foo@got.plt
0x11136 push 0x0
0x1113b jmp 0x11120
0x11140 jmp *0x22228 # this is bar@plt and jumps through bar@got.plt
0x11146 push 0x1 # plt entry
0x1114b jmp 0x11120
The GOT will look like this
0x22220 0x11136 # this is foo@got.plt (this will be replaced to point to the actual foo)
0x22228 0x11146 # this is bar@got.plt
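The effect of the listing above (the first call goes through the resolver, later calls jump straight to the function) can be modelled in a few lines of Python. This is a toy model of lazy binding and not how machine code works: the class `LazyPlt` and its dictionary-based GOT are invented for illustration.

```python
class LazyPlt:
    """Toy model of lazy PLT/GOT binding."""

    def __init__(self, libraries):
        self.libraries = libraries   # symbol name -> real function
        self.got = {}                # symbol name -> current jump target
        self.resolve_count = 0
        for name in libraries:
            # Initially every GOT slot points back into the PLT (the resolver path).
            self.got[name] = self._make_stub(name)

    def _make_stub(self, name):
        def stub(*args):
            # Stands in for "push index; jmp plt base" -> _dl_runtime_resolve_xsavec.
            self.resolve_count += 1
            target = self.libraries[name]
            self.got[name] = target  # patch the GOT slot with the real address
            return target(*args)
        return stub

    def call(self, name, *args):
        # Stands in for the indirect jump through the GOT at the start of a PLT entry.
        return self.got[name](*args)

plt = LazyPlt({"foo": lambda: "foo ran", "bar": lambda: "bar ran"})
plt.call("foo")
plt.call("bar")
plt.call("foo")
print(plt.resolve_count)
```

Three calls trigger only two resolutions, because the second call to foo finds the patched GOT slot.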
When trying to run ELF executables on Windows, we run into the following problem. Our executables want most of their segments to be loaded contiguously, and their alignment is the page size. However, under Windows, mapping files into memory is only possible with an alignment of dwAllocationGranularity. For now we solve this by just copying the data. This has two drawbacks.
- Multiple copies of the same library in memory.
- The whole memory segment has to be rwx. This could be a security problem.
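The alignment mismatch is easy to see with numbers. The sketch below assumes a page size of 0x1000 and an allocation granularity of 0x10000, which is the value commonly reported on Windows (the real value should be queried from the system). The segment addresses are illustrative and relative to the load base.

```python
PAGE_SIZE = 0x1000                # ELF segment alignment
ALLOCATION_GRANULARITY = 0x10000  # assumed typical dwAllocationGranularity

def can_map_in_place(vaddr: int, offset: int, granularity: int) -> bool:
    """A file view fits only if address and file offset are multiples of the granularity."""
    return vaddr % granularity == 0 and offset % granularity == 0

# (vaddr relative to load base, file offset) of three page-aligned segments
segments = [(0x0000, 0x0000), (0x1000, 0x1000), (0x2000, 0x2000)]
for vaddr, offset in segments:
    print(hex(vaddr),
          can_map_in_place(vaddr, offset, PAGE_SIZE),
          can_map_in_place(vaddr, offset, ALLOCATION_GRANULARITY))
```

With page granularity every segment could be mapped directly. With the coarser granularity only the first one can, which is why the remaining data has to be copied.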
After fighting with C++ on gcc a bit, it seems like it might be easier to port glibc to Windows and just emulate the system calls than to port libstdc++ to a libc that just uses the corresponding libc calls of the OS. This is because glibc has many calls that are not really standard.
It seems gdb gets info about newly loaded libraries when the loader calls _dl_debug_state.
In gdb, breakpoints are inserted through target_insert_breakpoint.
There are different kinds of breakpoints.
The only one I observed was a memory breakpoint in default_memory_insert_break.
GDB then writes an interrupt instruction 0xCC (x86) into the program where it wants the breakpoint.
You can't observe this in gdb itself since it changes it back at any interrupt and inserts it again when you continue running the program.
The following program will print different results depending on whether there is a breakpoint on main or not.
#include <stdio.h>
int main() {
    /* Print the instruction byte at offset 4 into main. */
    printf("instruction %02hhX\n", ((unsigned char *)main)[4]);
    return 0;
}
http://sourceware.org/gdb/onlinedocs/gdb/JIT-Interface.html
Separate the debug data from the executable as follows:
objcopy --strip-debug a.out a.exe
objcopy --only-keep-debug a.out a.dbg
objcopy --add-gnu-debuglink="a.dbg" "a.exe"
FileSiz can be shorter than MemSiz. This happens for instance for the .bss section of uninitialized variables. The loader needs to set them to 0.
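What the loader has to do for such a segment fits in a few lines. The following Python sketch copies FileSiz bytes from the file and leaves the rest of the MemSiz bytes zeroed. The function name `load_segment` and the sample bytes are made up for the example.

```python
def load_segment(image: bytes, offset: int, filesz: int, memsz: int) -> bytearray:
    """Copy filesz bytes from the file image and zero-fill up to memsz."""
    if memsz < filesz:
        raise ValueError("MemSiz must not be smaller than FileSiz")
    memory = bytearray(memsz)                        # starts out as all zeros
    memory[:filesz] = image[offset:offset + filesz]  # initialized data from the file
    return memory

image = b"HEADER" + b"\x01\x02\x03\x04"
segment = load_segment(image, offset=6, filesz=4, memsz=8)
print(segment.hex())
```

The last four bytes of the result play the role of .bss: they occupy memory but take no space in the file.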
There is a script for that in ./bootstrap.sh
Q: Why does ninja on Linux always rebuild everything?
A: Because clang writes the dependency files to /_zwolf, which exists for a clang running on distx but not for ninja running on Linux. You can link /_zwolf to your zwolf root dir.