Rare ld.lld failures #123567

Open
chestnykh opened this issue Jan 20, 2025 · 6 comments
Labels: crash, lld:ELF

Comments

@chestnykh
Contributor

chestnykh commented Jan 20, 2025

In a few runs, ld.lld (17.0.6) fails with SIGBUS. This is the backtrace from the core dump:

```
#0  __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:345
#1  0x000055836ad02906 in void lld::elf::InputSection::writeTo<llvm::object::ELFType<(llvm::support::endianness)1, false> >(unsigned char*) ()
#2  0x000055836ad26bcd in std::_Function_handler<void (), lld::elf::OutputSection::writeTo<llvm::object::ELFType<(llvm::support::endianness)1, false> >(unsigned char*, llvm::parallel::TaskGroup&)::{lambda()#3}>::_M_invoke(std::_Any_data const&) ()
#3  0x00007f14a5934ca4 in std::_Function_handler<void (), llvm::parallel::TaskGroup::spawn(std::function<void ()>, bool)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#4  0x00007f14a5934ac9 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<llvm::parallel::detail::(anonymous namespace)::ThreadPoolExecutor::ThreadPoolExecutor(llvm::ThreadPoolStrategy)::{lambda()#1}::operator()() const::{lambda()#1}> > >::_M_run() ()
#5  0x00007f14a5015b2f in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f14aa417fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#7  0x00007f14a4cf406f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```

This happens on Debian 10 running in Docker, with glibc 2.28.

Unfortunately, I don't have a reproducer :(

@MaskRay, any ideas?

@llvmbot
Member

llvmbot commented Jan 20, 2025

@llvm/issue-subscribers-lld-elf

Author: Dmitry Chestnykh (chestnykh)


@chestnykh
Contributor Author

chestnykh commented Jan 20, 2025

The error happens in the `memcpy` call in `InputSection::writeTo`:

```cpp
// Copy section contents from source object file to output file
// and then apply relocations.
memcpy(buf, content().data(), content().size());
relocate<ELFT>(ctx, buf, buf + content().size());
```
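When the output file is memory-mapped, the `memcpy` above writes straight into file-backed pages. The sketch below is a simplified illustration of that pattern, not LLD's code: it sizes a file with `ftruncate`, maps it `MAP_SHARED`, and fills it with `memcpy`. The helper name `writeViaMmap` is hypothetical. The point it shows is that a failure to back a page is not returned as an error code; it arrives as a signal inside `memcpy`.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper, NOT LLD code: write `size` bytes to `path` through a
// shared file mapping, mirroring the memcpy-into-output-buffer pattern.
bool writeViaMmap(const std::string &path, const void *data, std::size_t size) {
  int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  // Give the file its final size. On most filesystems this creates a sparse
  // file; storage is allocated when pages are first written via the mapping.
  if (::ftruncate(fd, static_cast<off_t>(size)) != 0) {
    ::close(fd);
    return false;
  }
  if (size == 0) {  // mmap() rejects a zero length
    ::close(fd);
    return true;
  }
  void *map = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  ::close(fd);  // the mapping keeps its own reference to the file
  if (map == MAP_FAILED)
    return false;
  // Each first write to a page faults it in. If the kernel cannot back the
  // page (for example, the filesystem is out of space), the process receives
  // SIGBUS right here instead of getting an error return.
  std::memcpy(map, data, size);
  bool ok = ::msync(map, size, MS_SYNC) == 0;
  ::munmap(map, size);
  return ok;
}
```

Usage is simply `writeViaMmap("/tmp/out.bin", data, size)`; there is no place in this code path where an allocation failure on the output pages could be turned into a return value.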

@chestnykh
Contributor Author

I suspect the error is caused by filesystem and/or kernel behavior around mmap'ed files.
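One well-documented way a file mapping raises SIGBUS is touching a mapped page that lies entirely beyond the end of the file (see `mmap(2)`). The sketch below, built around the hypothetical helper `signalFromWritePastEof`, triggers that in a forked child so the parent can observe the signal. It only illustrates the kind of kernel behavior suspected here; it is not a reproducer for this lld crash.

```cpp
#include <csignal>
#include <string>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical demo: in a forked child, map two pages of a file, shrink the
// file to one page, then write to the second page. Returns the signal that
// killed the child, 0 if the child exited instead, or -1 if fork/wait failed.
int signalFromWritePastEof(const std::string &path) {
  const long page = ::sysconf(_SC_PAGESIZE);
  pid_t pid = ::fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ::ftruncate(fd, 2 * page) != 0)
      ::_exit(2);
    void *raw = ::mmap(nullptr, 2 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (raw == MAP_FAILED)
      ::_exit(3);
    // volatile so the compiler cannot drop the stores below.
    volatile char *p = static_cast<volatile char *>(raw);
    // Shrink the file: the second mapped page now lies wholly beyond EOF.
    if (::ftruncate(fd, page) != 0)
      ::_exit(4);
    p[0] = 'a';     // inside the file: fine
    p[page] = 'b';  // beyond EOF: the kernel delivers SIGBUS here
    ::_exit(0);     // only reached if no signal was delivered
  }
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid)
    return -1;
  ::unlink(path.c_str());
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Running it in a child process keeps the demonstration from killing the caller; the parent just inspects how the child terminated.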

@EugeneZelenko added the crash label Jan 20, 2025
@chestnykh
Contributor Author

chestnykh commented Jan 20, 2025

This may be caused by lld's memory consumption. I downloaded the repro from #100511, ran it, and got a very similar error.
When the error occurred, dmesg reported OOM-killer activity, and htop showed lld using 27G of the host's 32G of memory. I think that at its peak it used all the memory on the host.
Sometimes the ld.lld process is killed by the OOM killer; other times (if I write -1000 to the `oom_score_adj` file of lld's process) ld.lld segfaults when the OOM killer kills another process.
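The `oom_score_adj` knob mentioned above lives at `/proc/<pid>/oom_score_adj` on Linux and can be read or written from code as well as from the shell. A minimal sketch, assuming Linux procfs; the helper names are hypothetical. Valid values range from -1000 (the process is exempt from the OOM killer) to 1000, and setting a negative value normally requires `CAP_SYS_RESOURCE`.

```cpp
#include <fstream>
#include <string>
#include <unistd.h>

// Hypothetical helpers for Linux's /proc/<pid>/oom_score_adj.
std::string oomScoreAdjPath(pid_t pid) {
  return "/proc/" + std::to_string(pid) + "/oom_score_adj";
}

// Reads the current value; returns false if the file cannot be read.
bool readOomScoreAdj(pid_t pid, int &out) {
  std::ifstream in(oomScoreAdjPath(pid));
  return static_cast<bool>(in >> out);
}

// Writes a new value in [-1000, 1000]. Returns false if the value is out of
// range or the kernel rejects it (e.g. EACCES when lowering the value
// without CAP_SYS_RESOURCE).
bool writeOomScoreAdj(pid_t pid, int value) {
  if (value < -1000 || value > 1000)
    return false;
  std::ofstream outFile(oomScoreAdjPath(pid));
  if (!outFile)
    return false;
  outFile << value << '\n';
  outFile.flush();  // the write(2) happens here, so errors show up now
  return static_cast<bool>(outFile);
}
```

Note that exempting the linker only moves the OOM killer's attention elsewhere; the memory shortage itself remains, which matches the segfault described above.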

Trunk lld gives the same result.
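To get a firmer number than watching htop, the peak resident set size of a link can be read from the kernel once the process exits. The sketch below (hypothetical helper `peakRssKiB`) forks, runs the command with `execvp`, and collects `ru_maxrss` with `wait4`; on Linux that value is in kilobytes. The peak is reported even if the child was killed by a signal, such as from the OOM killer.

```cpp
#include <string>
#include <vector>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run `argv` as a child process and return its peak
// resident set size in KiB (Linux reports ru_maxrss in kilobytes), or -1 if
// argv is empty or fork/wait fails. If exec fails, the child exits with 127
// and its (small) peak RSS is still returned.
long peakRssKiB(const std::vector<std::string> &argv) {
  if (argv.empty())
    return -1;
  // Build the argv array before fork(): the child should only exec.
  std::vector<char *> args;
  for (const std::string &a : argv)
    args.push_back(const_cast<char *>(a.c_str()));
  args.push_back(nullptr);

  pid_t pid = ::fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    ::execvp(args[0], args.data());
    ::_exit(127);  // exec failed
  }
  int status = 0;
  struct rusage usage {};
  if (::wait4(pid, &status, 0, &usage) != pid)
    return -1;
  return usage.ru_maxrss;
}
```

For example, `peakRssKiB({"ld.lld", "-o", "out", "a.o"})` with the real link arguments substituted in would give the linker's peak memory for that run.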

@rnk
Copy link
Collaborator

rnk commented Jan 21, 2025

I think you're right: this is an OOM situation during memory-mapped output writing. For performance, LLD assumes inputs + outputs will fit into RAM, and memory usage is already somewhat optimized. At best, LLD can help you diagnose the problem more easily by catching SIGBUS and reporting it, but this isn't going to be reliable. There may be other ways to force the kernel to commit pages ahead of time, but it will be difficult to do that with zero performance overhead.
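A minimal sketch of the "catch SIGBUS and diagnose" idea, assuming a POSIX system. Nothing in this thread says LLD does this today; the message text, exit code 70, and all function names are hypothetical. The handler sticks to async-signal-safe calls (`write`, `_exit`). As noted above, it cannot tell *why* the fault happened, so the message can only be a hint.

```cpp
#include <cstring>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

// Exit status used by the hypothetical handler below (arbitrary choice).
constexpr int kSigbusExitCode = 70;

// Handler: print a hint and exit. Only async-signal-safe calls are used.
static void onSigbus(int) {
  static const char msg[] =
      "error: SIGBUS while writing the output file; the system may be out of "
      "memory or disk space\n";
  ssize_t ignored = ::write(STDERR_FILENO, msg, sizeof msg - 1);
  (void)ignored;
  ::_exit(kSigbusExitCode);
}

// Installs the handler process-wide; returns false if sigaction fails.
bool installSigbusDiagnostic() {
  struct sigaction sa;
  std::memset(&sa, 0, sizeof sa);
  sa.sa_handler = onSigbus;
  sigemptyset(&sa.sa_mask);
  return ::sigaction(SIGBUS, &sa, nullptr) == 0;
}

// Demo: a forked child installs the handler and raises SIGBUS. Returns the
// child's exit status, or -1 if it did not exit normally.
int demoSigbusExitCode() {
  pid_t pid = ::fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    if (!installSigbusDiagnostic())
      ::_exit(1);
    ::raise(SIGBUS);
    ::_exit(0);  // not reached: the handler calls _exit
  }
  int status = 0;
  if (::waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The design trades a core dump for a readable message and a distinctive exit status, which is easier for build systems to recognize but loses the backtrace.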

@chestnykh
Contributor Author


LLD has a `--no-mmap-output-file` option, but the behaviour is the same with and without it.
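Assume the non-mmap path builds the whole output image in an ordinary heap buffer and writes it out at the end (an assumption here, not checked against LLD's source). In that case peak memory still includes the full output size, which would fit the observation that the option makes no difference under memory pressure. A sketch of such a path, with the hypothetical helper `writeViaBuffer`: write failures come back as return values instead of signals, but the memory to hold the image is still needed.

```cpp
#include <cerrno>
#include <cstddef>
#include <string>
#include <vector>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical sketch of a non-mmap output path: the finished image lives in
// an ordinary heap buffer and is written with write(2). I/O errors come back
// as return values, but the buffer still holds the whole output in memory.
bool writeViaBuffer(const std::string &path,
                    const std::vector<unsigned char> &image) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  std::size_t done = 0;
  while (done < image.size()) {
    ssize_t n = ::write(fd, image.data() + done, image.size() - done);
    if (n < 0) {
      if (errno == EINTR)
        continue;  // interrupted by a signal: retry
      ::close(fd);
      return false;  // e.g. ENOSPC, reported as an error instead of SIGBUS
    }
    done += static_cast<std::size_t>(n);
  }
  return ::close(fd) == 0;
}
```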
