ARM Tech Preview (1.3.2)
This page collects the latest information and tips on the ARM tech preview included in the OpenHPC 1.3.2 release (September 2017). The provided packages target 64-bit ARM server platforms. They are released initially as a Tech Preview because there are known issues with provisioning and with a subset of development libraries when exercised on the target OS distribution versions and tested hardware platforms.
The information on this page is intended to supplement the aarch64 OpenHPC Installation Guides for SLES-12-SP2 and CentOS-7.3. In particular, the Warewulf provisioning steps (most of the steps in sections 4.3 through 4.9) are not directly usable without additional modification to the PXE boot configuration.
- `-intel-`, `-impi-`, and `-mvapich2-` packages are not supported. The first two are Intel proprietary and the third is only configured for Intel NIC solutions. These account for 121 of the package differences between aarch64 and x86_64.
- GSL: a small subset of the tests shipped with the GSL library failed precision-related checks. This is currently attributed to the GSL tests being tuned for x86, which uses 80-bit extended precision.
- PAPI: hardware counters may not be available, depending on the underlying ARM platform.
- MPI: the hardware available for this Tech Preview release was Ethernet only, and the available MPI stacks reflect this test environment. Mellanox InfiniBand stacks do work, but must be installed by hand at this time.
- mpiP: appears to have trouble collecting certain information in certain scenarios, causing it to fail integration tests.
- Warewulf: the ARM Server Base Boot Requirements and Server Base System Architecture require specific UEFI support during the boot process, which does not seem to be compatible with the way Warewulf currently provisions worker nodes. There is a workaround, but it requires some manual intervention during installation and deployment of the nodes.
- Need to add network drivers during bootstrap for different platforms (usb, thunderx, etc.)
On the three tested system configurations we found disparities in the availability of performance counters. Since all 64-bit ARM hardware has performance counters and PAPI supports ARM64 performance counter hardware, this is likely a problem with either the kernel or the device tree passed to the kernel from the firmware. You can determine whether counters are accessible on your platform by running papi_avail(1):
# module load papi
# papi_avail
...
Number Hardware Counters : 0 (Xgene Mustang)
-- or --
Number Hardware Counters : 6 (Softiron Seattle)
-- or --
Number Hardware Counters : 0 (Cavium ThunderX)
The other thing to note is that while the ARM Architecture specifies a core set of performance counters, many more may be available depending on the microarchitecture. We are in the process of working with the various silicon partners to make sure support for these is available in PAPI. As we discover workarounds enabling additional counter support for various platforms we will include them here.
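When checking several nodes, it can help to pull just the counter count out of the `papi_avail` output. The following is a minimal sketch, not part of OpenHPC or PAPI: the function name `papi_counter_count` is hypothetical, and it assumes the output contains a `Number Hardware Counters : N` line like the ones shown above.

```shell
# Hypothetical helper: print the hardware counter count found in
# papi_avail output supplied on stdin. Prints nothing if the line is absent.
papi_counter_count() {
    awk -F: '/Number Hardware Counters/ {
        split($2, fields, " ")   # first whitespace-separated field after the colon
        print fields[1]
        exit
    }'
}

# Usage (assumes the papi module is already loaded):
#   papi_avail | papi_counter_count
```

A result of 0 would suggest the kernel or device tree issue described above rather than a PAPI build problem.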
Lustre client support has been available for a while on both 32-bit and 64-bit ARM platforms. However, because different ARM platforms require kernels other than the standard ones found in the CentOS-7.3 distribution, we could not easily build a Lustre client that would work for specific platform configurations. For now, on CentOS-7.3 you will have to build your own kernel if you want Lustre support.
MVAPICH packages compiled, but we did not have InfiniBand hardware in our testbeds at the time of this release to validate the packages or any instructions relating to them. We are working with platform vendors to acquire sufficient hardware to test this in the future. If you have working InfiniBand support on your ARM platform, you may be able to get existing libraries to work on your own.
Network booting is a bit different on ARM platforms. All ARM servers must use UEFI firmware, so to network boot them at the moment you must netboot a GRUB2 EFI image, which then fetches the kernel and RAMFS from the server over TFTP. It is also important to remember that current ARM servers may use a different kernel than the one provided by a distribution.
We are resolving these issues within the Warewulf community and expect them to be fixed in the next major release, as it moves from syslinux to the more portable iPXE.
The best chance of success is to take the kernel and modules that come installed on the server and use them for network booting with a Warewulf-created ramdisk. Basic PXE boot instructions can be found on the Linaro website: https://wiki.linaro.org/LEG/Engineering/Kernel/UEFI/UEFI_Network_Booting . However, the Linaro instructions are specific to running on ARM emulators; specific instructions for the OpenHPC test platforms follow:
Obtain a bootnetaa64.efi GRUB2 image from your distribution or build it yourself and put in directory
- Install grub2 packages:
% rpm -ihv http://build.openhpc.community/home:/eric/SLE_12_SP1/aarch64/grub2-2.02~beta3-1.1.aarch64.rpm http://build.openhpc.community/home:/eric/SLE_12_SP1/aarch64/grub2-arm64-efi-2.02~beta3-1.1.aarch64.rpm
- Create a working grub2 EFI binary and copy it into /srv/tftpboot/aarch64/grub.efi:
% grub2-mkimage -O arm64-efi -o grub.efi -p /aarch64/boot/grub2 `ls /usr/lib/grub2/arm64-efi/*.mod | cut -d . -f 1`
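The command above writes `grub.efi` to the current directory, so the image still has to be copied into the TFTP tree. A minimal sketch of that copy step follows. The function name `install_grub_efi` is hypothetical, and the default TFTP root of /srv/tftpboot is taken from the path mentioned above; adjust it if your server uses a different root.

```shell
# Hypothetical helper: copy a freshly built GRUB2 EFI image into the TFTP tree.
# $1 = path to the image produced by grub2-mkimage
# $2 = TFTP root (defaults to /srv/tftpboot, the path used on this page)
install_grub_efi() {
    image=$1
    tftproot=${2:-/srv/tftpboot}
    # Refuse to publish a missing or empty image.
    [ -s "$image" ] || { echo "missing or empty image: $image" >&2; return 1; }
    mkdir -p "$tftproot/aarch64" &&
        cp "$image" "$tftproot/aarch64/grub.efi" &&
        chmod 0644 "$tftproot/aarch64/grub.efi"
}

# Usage:
#   install_grub_efi grub.efi /srv/tftpboot
```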
- Edit /srv/tftpboot/aarch64/boot/grub2/grub.cfg and adjust the bootstrap path to match your Warewulf-generated setup:
echo Now booting ${net_efinet0_hostname} with Warewulf bootstrap
echo Loading kernel...
linux (tftp)/warewulf/bootstrap/6/Image ro wwhostname=$net_efinet0_hostname quiet wwmaster=$net_default_server \
    wwipaddr=$net_efinet0_ip wwnetmask=255.255.0.0 wwnetdev=eth0
echo Done!
echo Loading initrd...
initrd (tftp)/warewulf/bootstrap/6/initfs.gz
echo Done!
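The bootstrap ID in the paths (6 in the listing above) and the netmask differ between Warewulf setups. As a rough sketch, the stanza can be generated from those two values instead of edited by hand. The function name `render_grub_cfg` and the `@ID@`/`@NETMASK@` placeholders are hypothetical; the stanza itself simply mirrors the listing above, and the GRUB variables are left untouched for GRUB to expand at boot.

```shell
# Hypothetical helper: print the grub.cfg stanza for a given Warewulf bootstrap ID.
# $1 = bootstrap ID (the number in /warewulf/bootstrap/<ID>/)
# $2 = netmask (defaults to the 255.255.0.0 used on this page)
render_grub_cfg() {
    id=$1
    netmask=${2:-255.255.0.0}
    # The quoted heredoc keeps GRUB variables such as $net_efinet0_ip literal;
    # only the @ID@ and @NETMASK@ placeholders are substituted by sed.
    sed -e "s|@ID@|$id|g" -e "s|@NETMASK@|$netmask|g" <<'EOF'
echo Now booting ${net_efinet0_hostname} with Warewulf bootstrap
echo Loading kernel...
linux (tftp)/warewulf/bootstrap/@ID@/Image ro wwhostname=$net_efinet0_hostname quiet wwmaster=$net_default_server \
    wwipaddr=$net_efinet0_ip wwnetmask=@NETMASK@ wwnetdev=eth0
echo Done!
echo Loading initrd...
initrd (tftp)/warewulf/bootstrap/@ID@/initfs.gz
echo Done!
EOF
}

# Usage:
#   render_grub_cfg 6 > /srv/tftpboot/aarch64/boot/grub2/grub.cfg
```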
# Override for ARM Servers
if substring (option vendor-class-identifier, 15, 5) = "00011" {
filename "aarch64/grub.efi";
}
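The `substring` test above picks the architecture field out of the PXE vendor class identifier that the client sends, which has the form `PXEClient:Arch:00011:UNDI:...`. Architecture type 11 is the value registered for ARM 64-bit UEFI clients. The sketch below reproduces the same extraction in shell so a captured identifier can be checked by hand. Both function names are hypothetical, and the sample identifier in the usage comment is illustrative rather than captured from one of the test platforms.

```shell
# Hypothetical helper: extract the 5-digit architecture field from a PXE
# vendor class identifier such as "PXEClient:Arch:00011:UNDI:003000".
# dhcpd's substring() offset is 0-based, so offset 15 with length 5
# corresponds to characters 16-20 for cut, which counts from 1.
pxe_arch() {
    printf '%s\n' "$1" | cut -c16-20
}

# Hypothetical helper: succeed when the client reports architecture 00011,
# the value the dhcpd snippet above matches for ARM servers.
is_arm64_uefi_client() {
    [ "$(pxe_arch "$1")" = "00011" ]
}

# Usage:
#   is_arm64_uefi_client 'PXEClient:Arch:00011:UNDI:003000' && echo "serve aarch64/grub.efi"
```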
Right now it doesn't appear that IPMI PXE commands affect the UEFI boot configuration settings, so you will have to interrupt boot on the serial console and configure PXE manually on each worker. This is also a good time to capture the hardware MAC address to give to DHCP and Warewulf if you don't know it already.
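If a worker already has an OS installed and running, the MAC address can also be read from sysfs instead of the serial console. This is a minimal sketch under that assumption. The function name `iface_mac` is hypothetical, and the default interface name eth0 only mirrors the `wwnetdev=eth0` setting used above; interface names vary by platform and driver.

```shell
# Hypothetical helper: print the MAC address of a network interface.
# $1 = interface name (eth0 is an assumption; names vary by platform)
# $2 = sysfs root (defaults to /sys; overridable for testing)
iface_mac() {
    iface=${1:-eth0}
    sysroot=${2:-/sys}
    cat "$sysroot/class/net/$iface/address"
}

# Usage (run on the worker node):
#   iface_mac eth0
```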
- Cavium ThunderX uArchitecture, armv8
- ThunderX (version a1)
- 2 socket, 48-core, 128GB of Memory
- Linux Version 4.4.21-64-default
- Tested against SLES-12-SP1 install
- EFI v2.40 by Cavium Thunder cn88xx EFI ThunderX-Firmware-Release-1.22.9-15-gcc66a09 Aug 4 2016 16:55:45
APM X-C1 Server Development Platform (Mustang)
- APM X-Gene uArchitecture, armv8
- APM X-Gene-1
- 1 socket, 8 core, 16GB of memory
- Linux Version 4.4.11-reference.135.aarch64
- Tested against CentOS-7.2 install
- EFI v2.40 by X-Gene Mustang Board EFI Nov 24 2015 13:22:41
- ARM Cortex-A57 uArchitecture, armv8
- AMD Seattle Processor (Rev.B0)
- 1 socket, 8 core, 16GB of Memory
- Linux version 4.4.21-64-default
- Tested against SLES-12-SP1 install
- EFI v2.40 by American Megatrends
Please feel free to email any questions related to this Tech Preview to the OpenHPC mailing list ([email protected] & https://groups.io/g/openhpc-users). We will do our best to answer them and include the responses for others to benefit from.