Slides/videos from Linux Plumbers Conf 2020 (Virtual)
Slides: TODO: links
The LLVM MC and LLVM BoF at Linux Plumbers Conf 2020 went well: we had 9 talks/sessions, 1 call for a new mailing list, 2 maintainers named (Nathan Chancellor, Nick Desaulniers), 3 patches sent improving documentation and committing to a support model around the latest release of clang (clang-10/10.0.1), and at one point we recorded 172 attendees in our MC.
Will Deacon gave a great background talk with simple examples defining control, data, and address dependencies. He highlighted how modern hardware can reorder read->read control dependencies, and specifically that compiler transforms that convert read->read address dependencies into control dependencies would break hardware ordering.
Being able to identify such cases during a build would be helpful. There was some discussion about bringing the terminology for these three kinds of dependencies to the ISO WG14 standards body. Peter Zijlstra and Paul McKenney discussed a feature request for annotating control dependencies, perhaps by using the keyword volatile on the closing } of a loop. It was noted that loop exit in general was a concern. Marco Elver suggested marking a block statement volatile.
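The three dependency kinds from Will's talk can be sketched in plain C. This is a minimal, single-threaded illustration under stated assumptions: the globals and function names are hypothetical, and volatile stands in for the kernel's READ_ONCE() only so the compiler keeps each access; the sketch shows the shape of each dependency, not real cross-CPU ordering.

```c
/* Hypothetical shared state. The kernel would use READ_ONCE()/WRITE_ONCE();
 * volatile here only stops the compiler from merging or dropping accesses. */
volatile int flag;
volatile int data;
int table[2] = { 10, 20 };
int *volatile ptr = &table[1];

/* Address dependency: the address used by load 2 comes from load 1. */
int read_via_address_dep(void)
{
    int *p = ptr;        /* load 1: fetch a pointer */
    return *p;           /* load 2: its address depends on load 1 */
}

/* Data dependency: the value stored is computed from the value loaded. */
void write_via_data_dep(int *out)
{
    int v = data;        /* load */
    *out = v + 1;        /* store: its value depends on the load */
}

/* Control dependency: a branch on load 1 decides whether load 2 runs.
 * The CPU may speculate load 2 before load 1 resolves, so this read->read
 * case is not ordered by the hardware. */
int read_via_control_dep(void)
{
    if (flag)            /* load 1 feeds a conditional branch */
        return data;     /* load 2: only control-dependent on load 1 */
    return -1;
}
```

For example, if a compiler could prove that ptr only ever points at &table[1], it might replace the dereference in read_via_address_dep() with a compare-and-branch on the loaded pointer. That would silently turn the address dependency into a control dependency, which is the kind of transform the talk warned about.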
Peter suggested starting a toolchain-agnostic kernel mailing list to discuss the details and design goals further. Nick Desaulniers emailed the VGER postmaster about the idea: https://groups.google.com/g/clang-built-linux/c/GLEkFKlDXfo/m/o6UmfyvDAAAJ.
Geoffrey Thomas presented the work on prototyping in-tree support for writing Linux kernel drivers in Rust. Improved memory safety was cited as a motivation, along with statistics showing that around 2/3 of bugs in code written in C/C++ are memory safety issues.
It was noted that the suggestion was not to attempt to rewrite the kernel in Rust, but rather to provide Kbuild integration so that greenfield drivers may be written in the language. There was also discussion about using cargo while requiring that all modules be in tree.
There were a fair number of questions about rustc's dependence on LLVM, which lacks backends for some of the more obscure ISAs that the Linux kernel supports. It was noted that there is interest in, and potentially a bounty for, implementing a Rust frontend for GCC, though it was suggested that having rustc emit a GCC IR might be less work.
The use of bindgen to automatically generate language bindings to the kernel raised a few questions. bindgen requires libclang to parse kernel headers. The generated bindings don't provide lifetime annotations, so it's common to wrap autogenerated bindings in manual wrappers that provide them. Autogenerating bindings can also help detect when interfaces change.
There were also questions about whether rustc targeting a different version of LLVM IR than Clang (used for the kernel's C code) would imply an ABI breakage. It likely would, though in practice this has yet to be an issue for small samples. For strict guarantees, Clang and rustc would have to use precisely the same version of LLVM.
Sami Tolvanen, Bill Wendling, and Nick Desaulniers gave an overview of link-time optimization (LTO) and profile-guided optimization (PGO), and some brief numbers showing their successful use in 3 different kernel distributions from Google. Upstreaming this work was a major question, as build times went up significantly for LTO and even ThinLTO builds. Also, the profiling data had to be post-processed using an open source utility that lives outside both the kernel and LLVM trees. Mark Brown suggested that git might not be the best place to store binary data that undergoes significant churn, and that existing CI systems for the kernel could perhaps provide relevant training data.
The talk covered some of the same topics as Ian Bearman's talk "Exploring Profile Guided Optimization of the Linux Kernel" (https://linuxplumbersconf.org/event/7/contributions/771/). Collaboration between toolchain implementations on kernel patches was encouraged.
Nathan Chancellor and Nathan Huckleberry presented data and techniques for measuring compile times of the Linux kernel with Clang. Mr. Chancellor presented data showing GCC beating Clang across the board. It was noted that using profiling data and LTO builds of Clang could make Clang more competitive, but the same modifications and measurements were not done for GCC, which would likely see significant performance improvements as well.
Mr. Huckleberry presented graphs and profile reports of builds with Clang, noting that there was significant low-hanging fruit around inline asm statements (13% of a build was wasted recomputing values; since fixed in clang-11) and macros with large token counts, such as the kernel's use of GNU C statement expressions (identified but not yet fixed). Profile data showed that significant time was spent in the compiler front end (lexing, parsing, and semantic analysis) rather than in the backend (optimization and codegen). Work was also shown for Perfetto, which allows graphical profiles to be shared and queried. It was noted that these are early days for compiler performance research, and that there is still a lot of work to do here.
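To see why statement-expression macros are costly for the front end, consider a minimal max() written in the kernel's general style. This is an illustrative sketch: simple_max, max3, and max_of_three are hypothetical names, and the kernel's real min()/max() macros are considerably more elaborate. Each argument's text appears twice in the expansion, once inside __typeof__ and once in the initializer, so nesting the macro roughly doubles the tokens at every level even though each argument is evaluated only once at run time.

```c
/* GNU C statement expression: evaluates each argument exactly once. */
#define simple_max(a, b) ({        \
    __typeof__(a) _a = (a);        \
    __typeof__(b) _b = (b);        \
    _a > _b ? _a : _b;             \
})

/* Nesting repeats the inner expansion inside the outer one, so the
 * preprocessed token count grows quickly; the lexer, parser, and
 * semantic analysis must process all of it at every call site. */
#define max3(a, b, c) simple_max(simple_max(a, b), c)

int max_of_three(int a, int b, int c)
{
    return max3(a, b, c);
}
```

The single-evaluation property is the reason for the extra tokens: simple_max(i++, 0) increments i only once, which a plain ternary macro would not guarantee.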
Jason Gunthorpe asked about using precompiled headers for the kernel, which Arnd Bergmann reported had been problematic the last time he tried.
Arnd hinted at a work-in-progress patch series that significantly cuts compile times with both GCC and Clang by minimizing header dependencies. The visualization at https://drive.google.com/file/d/1GFCmN3r93EJImvo-cbYJLd-iY1vJ_G5i/view?usp=sharing illustrates the problem. Nodes in the graph with high fan-in or fan-out may be good candidates to break up.
Marco Elver asked about "include-what-you-use" (IWYU), a tool commonly used to solve this problem. Ilie Halip and Arnd Bergmann both reported issues running that tool on the kernel sources because it does not understand the kernel's sometimes config-dependent includes.
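The fan-in/fan-out heuristic from Arnd's graph can be made concrete with a small sketch over an include graph stored as an edge list. Everything here is hypothetical: the struct, the functions, and the five-edge example_graph are invented for illustration and are not taken from the kernel or from Arnd's tooling. The sketch only counts direct edges; a real analysis would likely also weigh transitive includes.

```c
#include <stddef.h>
#include <string.h>

/* One edge of a hypothetical include graph: header `from` includes `to`. */
struct include_edge {
    const char *from;
    const char *to;
};

/* Fan-in: how many headers directly include `hdr`. */
size_t fan_in(const struct include_edge *edges, size_t n, const char *hdr)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(edges[i].to, hdr) == 0)
            count++;
    return count;
}

/* Fan-out: how many headers `hdr` itself directly includes. */
size_t fan_out(const struct include_edge *edges, size_t n, const char *hdr)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (strcmp(edges[i].from, hdr) == 0)
            count++;
    return count;
}

/* Made-up graph for illustration only. */
const struct include_edge example_graph[] = {
    { "a.h", "common.h" },
    { "b.h", "common.h" },
    { "c.h", "common.h" },
    { "common.h", "types.h" },
    { "common.h", "list.h" },
};
```

In this toy graph, common.h has both the highest fan-in (3) and the highest fan-out (2). That makes it the kind of node worth splitting, because every translation unit that reaches it also pays for everything it pulls in.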
Miguel Ojeda and Nathan Huckleberry presented work they've done to automate fixing kernel style nits with clang-tidy rules, and to help catch bugs through static analysis with clang-tidy and scan-build (Clang's static analyzer).
When polled, maintainers were split between those who would and those who would not consider running clang-tidy on their whole subtree, given the churn it would induce. git clang-tidy was suggested as a way for developers to format just their patches rather than whole files. Will Deacon noted that many maintainers no longer run ./scripts/checkpatch.pl on their trees due to false positives.
clang-tidy was presented as a linter for writing codebase-specific warnings. Masahiro Yamada asked if such warnings could live in the kernel tree. Stephen Hines clarified that such clang-tidy checks were appropriate to upstream into LLVM, as many projects have custom rules in clang-tidy, and that it is easy to specify just the checks you want. Nathan's patch to enable clang-tidy already disables all checks, then re-enables the Linux kernel specific ones.
Bill Wendling gave a presentation on how he designed and implemented an extension to the GNU C asm goto extension to support outputs along the fallthrough path. This feature was requested by Linus and other kernel developers to improve the code generated for the happy path of get_user/put_user. Some ambiguous cases were pointed out.
Collaboration with GCC developers on the implementation was welcomed. A shared kernel toolchain mailing list would be the preferred venue for such design collaboration in the future. Since then, Segher Boessenkool has reached out to Nick Desaulniers, Bill Wendling, and James Knight to discuss the nitty-gritty details.
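Below is a minimal sketch of asm goto with an output operand, in the spirit of the feature Bill described. The helper copy_if_nonnegative is hypothetical and is not kernel code. It copies x into an output register and jumps to a label when the sign bit is set, and it reads the output only on the fallthrough (happy) path. The sketch assumes x86 with clang-11 or later, or with GCC 11 or later (assumed to be where GCC gained the same capability). On any other target or compiler it falls back to plain C.

```c
/* Hypothetical helper: on success stores x in *out and returns 0;
 * returns -1 without touching *out when x is negative. */
int copy_if_nonnegative(int x, int *out)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__clang__) ? __clang_major__ >= 11 : __GNUC__ >= 11)
    int v;
    /* asm goto with an output: v is written by the asm, and control
     * either falls through or jumps to the "negative" label. */
    __asm__ goto("movl %1, %0\n\t"
                 "testl %0, %0\n\t"
                 "js %l[negative]"
                 : "=r"(v)        /* output operand */
                 : "r"(x)         /* input operand */
                 : "cc"           /* flags are clobbered by testl */
                 : negative);     /* possible jump target */
    *out = v;                     /* fallthrough path: v is valid here */
    return 0;
negative:
    return -1;                    /* v is not used on this path */
#else
    /* Portable fallback for targets/compilers without this feature. */
    if (x < 0)
        return -1;
    *out = x;
    return 0;
#endif
}
```

Before this extension, a helper like this needed a separate output path: either a plain asm that returns a status value which C code then tests, or an asm goto that cannot hand values back. Allowing outputs on the fallthrough edge lets the value and the branch come from a single asm statement, which is the happy-path improvement that was requested for get_user/put_user.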
Prof. Mathieu Acher from the University of Rennes 1 and Inria presented research on using Machine Learning (ML) classification, via statistical analysis and decision trees, to help identify broken kernel configurations. Mathieu noted that the kernel has 10^6000 configurations, and that a decision tree made it easy to visualize what various broken builds may have in common. Further research aims to analyze which configs hurt either binary size (such as CONFIG_DEBUG_INFO and friends) or compile times.
Arnd Bergmann noted that in his randconfig testing, he observed about 1.6% of randconfig builds failing with Clang, which was not much more than he observed in builds with GCC.
Mathieu noted that their testing was done against x86_64, and that other, less thoroughly tested architectures would likely face more build failures with GCC (or Clang).
Dan Rue and Antonio Terceiro presented work they've done to build the TuxMake and TuxBuild microservices, to help maintainers or CI system developers solve build scaling related issues, and maintain artifacts for reproducibility.
It was demonstrated that LKFT is already making use of the services to run ~70 builds in ~15 minutes.
Kees Cook asked about boot-testing related microservices.
Khem Raj asked about distributed builds. Dan explained that most kernel developers don't use distributed builds, so they avoid them in case doing so causes differences in the resulting binaries.
Dan recommended checking out https://gitlab.com/Linaro/tuxbuild or emailing [email protected] for access.
Nick Desaulniers gave a quick overview of the architectures supported by both LLVM and the Linux kernel (arm, arm64, x86, powerpc, mips, arc, hexagon, riscv, s390, sparc) and told CI implementors that we'd like to get environments set up to test those ISAs.
Next, moving to LLVM=1 to simplify testing was discussed. Such a move would give test coverage to LLVM's binutils substitutes (ld.lld, llvm-nm, etc.). It would also shorten the command line needed to build the kernel. Kevin Hilman implemented support for LLVM=1 in KernelCI shortly thereafter.
Nick mentioned that it would also be nice to omit CROSS_COMPILE, since it is mostly redundant and could be inferred from ARCH in most cases. Masahiro Yamada asked if CROSS_COMPILE could/should be omitted when LLVM=1 and LLVM_IAS=1 are set. Mark Brown noted this could be tricky for environments with multiple versions of toolchains installed, such as KernelCI.
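The idea of inferring CROSS_COMPILE from ARCH, while still honoring Mark's concern, can be sketched as a lookup with an explicit override. This is a hypothetical helper, not kbuild's actual logic. The triples below are assumed, common Debian-style defaults chosen for illustration, and real toolchains may use different ones.

```c
#include <stddef.h>
#include <string.h>

struct arch_default {
    const char *arch;   /* value passed as ARCH= */
    const char *prefix; /* assumed CROSS_COMPILE prefix for that ARCH */
};

/* Illustrative, assumed defaults only. */
static const struct arch_default arch_defaults[] = {
    { "arm",     "arm-linux-gnueabi-" },
    { "arm64",   "aarch64-linux-gnu-" },
    { "powerpc", "powerpc64le-linux-gnu-" },
    { "riscv",   "riscv64-linux-gnu-" },
    { "s390",    "s390x-linux-gnu-" },
    { "x86",     "x86_64-linux-gnu-" },
};

/* Hypothetical helper: an explicitly set CROSS_COMPILE always wins, so
 * setups with several toolchains installed can still choose one. Otherwise
 * the prefix is inferred from ARCH, or NULL is returned when no default
 * is known (the "most cases" caveat). */
const char *infer_cross_compile(const char *explicit_prefix, const char *arch)
{
    if (explicit_prefix != NULL && explicit_prefix[0] != '\0')
        return explicit_prefix;
    for (size_t i = 0; i < sizeof arch_defaults / sizeof arch_defaults[0]; i++)
        if (strcmp(arch_defaults[i].arch, arch) == 0)
            return arch_defaults[i].prefix;
    return NULL;
}
```

Making the explicit value take precedence keeps the common case short while leaving environments like KernelCI, which may have several toolchain versions installed, in full control.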
Guillaume Tucker mentioned that LLVM=1 should be tested with scripts/merge_config.sh, which had problems with CC=clang. Guillaume later noted that LLVM=1 was good to go with merge_config.sh.
Finally, Geoffrey Thomas recommended checking out a tool called "rust crater."
Thank you to all of our great speakers, to those who submitted proposals for the MC, and to our attendees. Particularly:
- Aditya Kumar
- Alessandro Decina
- Alex Gaynor
- Antonio Terceiro
- Bill Wendling
- Dan Rue
- Geoffrey Thomas
- John Baublitz
- Josh Triplett
- Mathieu Acher
- Miguel Ojeda
- Nathan Chancellor
- Nathan Huckleberry
- Nick Desaulniers
- Paul McKenney
- Peter Parkanyi
- Peter Zijlstra
- Sami Tolvanen
- Will Deacon
Thank you to the MC leads for putting together the proposal and reviewing submissions, as well as moderating:
- Behan Webster
- Nick Desaulniers
Thank you to the Planning Committee for the tireless effort involved in planning and building infrastructure for the virtual event. We saw many of you working hard behind the scenes during the event, resolving minor issues. This was a major contribution of time and effort to the Linux ecosystem, and we're so thankful for all of the work you did that made Linux Plumbers Conf 2020 such a success.
- Carlos O'Donnell
- Christian Brauner
- David Woodhouse
- Elena Zannoni
- Guy Lunardi
- James Bottomley
- Jon Corbet
- Kate Stewart
- Laura Abbott
- Paul McKenney
- Steven Rostedt