Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VM] Process terminates with handled exceptions in FFI library #53267

Closed
blagoev opened this issue Aug 18, 2023 · 33 comments
Closed

[VM] Process terminates with handled exceptions in FFI library #53267

blagoev opened this issue Aug 18, 2023 · 33 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi P1 A high priority bug; for example, a single project is unusable or has many test failures

Comments

@blagoev
Copy link
Contributor

blagoev commented Aug 18, 2023

  • Dart SDK version: 3.1.0 (stable) (Tue Aug 15 21:33:36 2023 +0000) on "linux_x64"
  • Using the Dart supplied executable within the Flutter package
  • Linux, Ubuntu 20

We are seeing consistent process termination with the latest stable Dart executable on Linux with FFI
We have this code in the native library.

try {
    return f();
    }
    catch (...) {
        set_last_exception(std::current_exception());
        return {};
    };

The f() throws a std::exception.

The process ignores the catch(...) block and proceeds to terminate.

The same library binary file works correct when loaded by the previous Dart stable version 3.0 executable.

The stack trace at termination is this

        libc.so.6!__GI_raise(int sig=6, int sig@entry=6) Line 50	C	Symbols loaded.
 	libc.so.6!__GI_abort() Line 79	C	Symbols loaded.
 	libstdc++.so.6!__gnu_cxx::__verbose_terminate_handler()		No symbols loaded.
 	libstdc++.so.6![Unknown/Just-In-Time compiled code]		No symbols loaded.
 	libstdc++.so.6!__gxx_personality_v0		No symbols loaded.
 	[Unknown/Just-In-Time compiled code]		No symbols loaded.
 	_Unwind_RaiseException		No symbols loaded.
 	libgcc_s.so.1!_Unwind_Resume_or_Rethrow		No symbols loaded.
 	libstdc++.so.6!__cxa_rethrow		No symbols loaded.
 	libstdc++.so.6![Unknown/Just-In-Time compiled code]		No symbols loaded.
 	libstdc++.so.6!__gxx_personality_v0		No symbols loaded.
 	[Unknown/Just-In-Time compiled code]		No symbols loaded.
 	_Unwind_RaiseException		No symbols loaded.
 	libstdc++.so.6!__cxa_throw		No symbols loaded.
>	librealm_dart.so!realm::Object::verify_attached(const class realm::Object * const this=0x7fa54000e260) Line 179	C++	Symbols loaded.

In this case the f() is the verify_attached function which throws the exception.

The output contains

terminate called after throwing an instance of 'realm::InvalidatedObjectException'
terminate called recursively
@mit-mit
Copy link
Member

mit-mit commented Aug 18, 2023

cc @dcharkes

@mit-mit mit-mit added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi labels Aug 18, 2023
@blagoev
Copy link
Contributor Author

blagoev commented Aug 18, 2023

Tested with the Dart SDK distributed outside Flutter and it fails as well.

Not sure how could this happen. The only thing I can come up with is, if the process is installing an exception trap which get called before the catch all block.

@blagoev
Copy link
Contributor Author

blagoev commented Aug 18, 2023

I think what's going on is the Dart process is installing the Verbose Terminate Handler but I can not find where it does this.
https://gcc.gnu.org/onlinedocs/libstdc++/manual/termination.html (Verbose Terminate Handler)
I believe that std::abort is called after the __verbose_terminate_handler by default which terminates the process and this is consistent with the behavior we are seeing.

Ignore, it seems this is incorrect.

@dcharkes
Copy link
Contributor

@blagoev could you provide a minimal reproduction? I tried to reproduce it with https://dart-review.googlesource.com/c/sdk/+/321721, but that doesn't crash for me. Maybe I'm not using the right combination of features to trigger the crash. (Running on the latest commit from the main branch.)

@blagoev
Copy link
Contributor Author

blagoev commented Aug 18, 2023

Sure, i will try to repro it outside Realm. It may be Realm related, but then why would be working with the previous Dart VM with no recompile.

Will write back.

@dcharkes
Copy link
Contributor

dcharkes commented Aug 18, 2023

It might well be something in the Dart VM. However, it's very hard to diagnose for me without a repro.

Besides @liamappelbe's refactorings on callbacks, I don't believe we had any significant modifications to the workings of the FFI from 3.0 to 3.1. If your minimal repro includes callbacks that would be a place to start looking. Else it might any change in the VM.

With a minimal repro, I can probably triage the issue here locally.

@blagoev
Copy link
Contributor Author

blagoev commented Aug 18, 2023

Ok managed to repro it on top of the hello_world sample here https://github.com/dart-lang/samples/tree/main/ffi/hello_world. Used the current main branch head ef57c26d7d9f6428b83fe9f1f704ba13f18f0b37
Here is the gist https://gist.github.com/blagoev/cb7c9005adce97a7d45861ad8baefd21
renamed hello.c to hello.cpp and deleted hello.def

@blagoev
Copy link
Contributor Author

blagoev commented Aug 18, 2023

I got a hint from a colleague that it might be different c++ runtimes for the process and the shared library. How is the Dart executable compiled? For the the repro I am using gcc 9.3

@nielsenko
Copy link

@dcharkes

For your convenience I have created a repository with a reproduction. It is based on the ffi plugin you get by default, when running flutter create -t plugin_ffi with minimal changes, like using c++ instead of c.

On Linux

First build the ffi lib:

mkdir build-native
cmake -S src/ -B build-native/
cmake --build build-native/

next compile bin/bomb.dart with both Dart 3.0.6 and 3.10

flutter/3.10.6/bin/dart compile exe bin/bomb.dart -o build-native/before.exe
flutter/3.13.0/bin/dart compile exe bin/bomb.dart -o build-native/after.exe

Run both:

# Dart 3..0.6
./build-native/before.exe
-1
# Dart 3.1.0
./build-native/after.exe
terminate called after throwing an instance of 'bomb'
terminate called recursively
Aborted (core dumped)

I'm using the dart exe included with Flutter version 3.10.6 and 3.13.0 respectively.

Basically the code is just:

FFI_PLUGIN_EXPORT intptr_t sum(intptr_t a, intptr_t b) { 
  try {
    throw bomb{};
    return a + b; 
  }
  catch (std::exception& e) {
    return -1; // catch and return error code
  }
}
import 'package:bomb/bomb.dart';

void main(List<String> args) {
  print(sum(1, 2));
}

plus plumbing to load the dynamic library.

@mit-mit mit-mit added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Aug 21, 2023
@dcharkes
Copy link
Contributor

Thanks for the repro @nielsenko.

I can repro this with:

$ ~/Downloads/3.0.6/dart-sdk/bin/dart --version
Dart SDK version: 3.0.6 (stable) (Tue Jul 11 18:49:07 2023 +0000) on "linux_x64"
$ ~/Downloads/3.0.6/dart-sdk/bin/dart compile exe bin/bomb.dart -o build-native/threeosix.exe
Info: Compiling with sound null safety.
Generated: /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/threeosix.exe
$ ./build-native/threeosix.exe 
-1

$ ~/Downloads/dart-sdk/bin/dart --version
Dart SDK version: 3.1.0 (stable) (Tue Aug 15 21:33:36 2023 +0000) on "linux_x64"
$ ~/Downloads/dart-sdk/bin/dart compile exe bin/bomb.dart -o build-native/threeone.exe
$ ./build-native/threeone.exe 
terminate called after throwing an instance of 'bomb'
terminate called recursively
Aborted

$ dart --version
Dart SDK version: 3.2.0-edge.4d4a922638cda394c59605936a56874f75819b9e (be) (Mon Aug 21 08:21:37 2023 +0000) on "linux_x64"
$ dart compile exe bin/bomb.dart -o build-native/main.exe
Generated: /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/main.exe
$ ./build-native/main.exe 
terminate called after throwing an instance of 'bomb'
terminate called recursively
Aborted

Direct compiler invocation without CMake, in case the compiler matters:

$ /usr/bin/c++ -fPIC -shared -std=gnu++17 -o build-native/libbomb.so  /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/src/bomb.cpp
$ /usr/bin/c++ --version
c++ (Debian 12.2.0-14) 12.2.0

I'll see if I can figure out what's going on here.

@dcharkes
Copy link
Contributor

dcharkes commented Aug 21, 2023

I modified the bisection script to bisect this:

$ tools/bisect.dart \
-Dend=3e0d976d4b9eb3ba1da948b40bb49521279c83b2 \
-Dstart=58338376220ee7641a71c6198e909c7fc4a7c2ae \
-Dsdk_path="/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/" \
-Dtest_command="python3 tools/build.py -mrelease -ax64 create_platform_sdk dart_precompiled_runtime" \
-Dtest_command="/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/out/ReleaseX64/dart-sdk/bin/dart compile exe /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/bin/bomb.dart -o /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/repro.exe" \
-Dtest_command="/usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/repro.exe" \
-Dfailure_string="terminate called"

Writing detailed log to /usr/local/google/home/dacoharkes/dart-sdk/sdk/.dart_tool/bisect_dart/20230821_repro.exe/logs/full.txt.
Ensuring SDK repo in /usr/local/google/home/dacoharkes/master-dart-sdk/sdk/.
Ensuring failure reproduces on 5833837.
Commit 5833837, reproduces failure.
Ensuring failure does not reproduce on a251281.
Commit a251281, does not reproduce failure.
Bisecting 5833837...fde31b1 (1684 commits). Trying b20b8ef.
Commit b20b8ef, reproduces failure.
Bisecting b20b8ef...fde31b1 (842 commits). Trying d3da431.
Commit d3da431, reproduces failure.
Bisecting d3da431...fde31b1 (421 commits). Trying 7575e2c.
Commit 7575e2c, reproduces failure.
Bisecting 7575e2c...fde31b1 (211 commits). Trying b9fa34d.
Commit b9fa34d, reproduces failure.
Bisecting b9fa34d...fde31b1 (106 commits). Trying fa3a72f.
Commit fa3a72f, does not reproduce failure.
Bisecting b9fa34d...4cd9c9c (53 commits). Trying 754aca0.
Commit 754aca0, does not reproduce failure.
Bisecting b9fa34d...3c284a0 (26 commits). Trying 62a67ba.
Commit 62a67ba, does not reproduce failure.
Bisecting b9fa34d...fffd7e8 (13 commits). Trying 4c0de8e.
Commit 4c0de8e, reproduces failure.
Bisecting 4c0de8e...fffd7e8 (7 commits). Trying c948d37.
Commit c948d37, reproduces failure.
Bisecting c948d37...fffd7e8 (4 commits). Trying 7c1e2fe.
Commit 7c1e2fe, does not reproduce failure.
Bisecting c948d37...7b2cfdb (2 commits). Trying 7b2cfdb.
Commit 7b2cfdb, reproduces failure.
Bisected to 7b2cfdb.

Very interesting.

@rmacnak-google @mkustermann @mraleph any ideas why removing tcmalloc could cause this?

@rmacnak-google
Copy link
Contributor

Maybe linking with tcmalloc dragged in some unwind tables or such? It seems relevant that the VM is built with -fno-exceptions, but tcmalloc was built with the default.

@dcharkes
Copy link
Contributor

dcharkes commented Aug 21, 2023

Thanks for the suggestion @rmacnak-google!

I gave a quick spin to removing that flag from our build:

$ git log
commit 57e3f142d773dbb5daa3bb471f5e35ba065e8fef (HEAD -> repro-53267)
Author: Daco Harkes <[email protected]>
Date:   Mon Aug 21 18:47:15 2023 +0000

    [vm] Remove -fno-exceptions

commit 7b2cfdbc8cd74cda8096acdcee4a8e112cf0c3b8
Author: Ryan Macnak <[email protected]>
Date:   Tue Apr 18 20:56:00 2023 +0000

    [standalone] Remove tcmalloc.

$ git diff HEAD^1..HEAD
diff --git a/build/config/gcc/BUILD.gn b/build/config/gcc/BUILD.gn
index 40ec2bcde5d..032ad5dbbca 100644
--- a/build/config/gcc/BUILD.gn
+++ b/build/config/gcc/BUILD.gn
@@ -41,7 +41,7 @@ config("executable_ldconfig") {
 }
 
 config("no_exceptions") {
-  no_exceptions_flags = [ "-fno-exceptions" ]
+  no_exceptions_flags = [  ]
   cflags_cc = no_exceptions_flags
   cflags_objcc = no_exceptions_flags
 }
diff --git a/runtime/BUILD.gn b/runtime/BUILD.gn
index 7268731a0ca..8d08557a42f 100644
--- a/runtime/BUILD.gn
+++ b/runtime/BUILD.gn
@@ -222,7 +222,6 @@ config("dart_config") {
       "-g3",
       "-ggdb3",
       "-fno-rtti",
-      "-fno-exceptions",
     ]
     if (is_clang) {
       cflags += [

But that still fails to catch the exception:

$ ~/master-dart-sdk/sdk/out/ReleaseX64/dart-sdk/bin/dart --version
Dart SDK version: 3.1.0-edge.57e3f142d773dbb5daa3bb471f5e35ba065e8fef (be) (Mon Aug 21 18:47:15 2023 +0000) on "linux_x64"
$ ~/master-dart-sdk/sdk/out/ReleaseX64/dart-sdk/bin/dart compile exe bin/bomb.dart -o build-native/fexceptions.exe
Info: Compiling with sound null safety.
Generated: /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/fexceptions.exe
$ ./build-native/fexceptions.exe 
terminate called after throwing an instance of 'bomb'
terminate called recursively
Aborted

The symptoms look very similar though: https://www.reddit.com/r/cpp_questions/comments/qd9265/exceptions_when_exceptions_are_disabled/

int sum(int a, int b) {
  try {
    throw bomb{};
    return a + b;
  } catch (...) {
    return -1; // catch and return error code
  }
}
sum(int, int):
        push    rbp
        mov     rbp, rsp
        push    rbx
        sub     rsp, 24
        mov     DWORD PTR [rbp-20], edi
        mov     DWORD PTR [rbp-24], esi
        mov     edi, 8
        call    __cxa_allocate_exception
        mov     rbx, rax
        mov     QWORD PTR [rbx], 0
        mov     rdi, rbx
        call    bomb::bomb() [complete object constructor]
        mov     edx, OFFSET FLAT:bomb::~bomb() [complete object destructor]
        mov     esi, OFFSET FLAT:typeinfo for bomb
        mov     rdi, rbx
        call    __cxa_throw
        mov     rdi, rax
        call    __cxa_begin_catch
        mov     ebx, -1
        call    __cxa_end_catch
        mov     eax, ebx
        mov     rbx, QWORD PTR [rbp-8]
        leave
        ret

https://godbolt.org/z/h8v8nKaEj

If things are not caught, it might be that the __cxa_begin_catch and __cxa_end_catch are called from the standard library included in the Dart precompiled runtime executable, and that these are no-ops.

The exception catching symbols are in the dylib itself:

$ nm  build-native/libbomb.so 
0000000000004050 b completed.0
                 U __cxa_allocate_exception@CXXABI_1.3
                 U __cxa_begin_catch@CXXABI_1.3
                 U __cxa_end_catch@CXXABI_1.3
                 w __cxa_finalize
                 U __cxa_throw@CXXABI_1.3
...

I'll dig deeper later.

copybara-service bot pushed a commit that referenced this issue Aug 22, 2023
`-Dtest_command` is now a multi-argument, commands will be run in
sequence when multiple are passed.

The test commands now need to include the executable. Previous
invocations now need to be passed python3 in front of tools/test.py.

Both stdout and stderr from test commands are now taken into account
for matching on the error string.

The above changes enable bisecting failures other than tests failing
through test.py invocations such as:
#53267 (comment)

Change-Id: I4c3cb800b69e34a15c375f45a417bb2c28d9df8e
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/322020
Reviewed-by: Martin Kustermann <[email protected]>
Commit-Queue: Daco Harkes <[email protected]>
@mkustermann
Copy link
Member

The exception catching symbols are in the dylib itself:

$ nm  build-native/libbomb.so 
0000000000004050 b completed.0
                 U __cxa_allocate_exception@CXXABI_1.3
                 U __cxa_begin_catch@CXXABI_1.3
                 U __cxa_end_catch@CXXABI_1.3
                 w __cxa_finalize
                 U __cxa_throw@CXXABI_1.3
...

The U actually means the symbols are not in that library - but are provided at runtime (U == undefined), w means it's a weak symbol and it may take a different one at runtime.

I got a hint from a colleague that it might be different c++ runtimes for the process and the shared library. How is the Dart executable compiled? For the the repro I am using gcc 9.3

Normally we compile using a pinpointed version of clang + sysroot. (search for buildtools in our dart-lang/sdk repositories DEPS file)

@nielsenko
Copy link

@mkustermann / @dcharkes The comment on gcc 9.3 from @blagoev was about Realm, but the reproduction I made just use what is picked up by the default cmake setup you get when running flutter create -t plugin_ffi, except language is changed to C++.

On the ubuntu-22 box I used for the reproduction that is g++ 11.4.0 in case it is relevant.

Also, note that it works when using Flutter. You can try the sample in the example folder, which was surprising to me, but I guess running flutter loads some extra code, that "fixes" the issue.

@dcharkes
Copy link
Contributor

@nielsenko Yes, the compiler doesn't matter, my reproduction is with g++ 12. I don't think it's relevant.

Thanks @mkustermann for your keen eye! 👁️

The process ignores the catch(...) block and proceeds to terminate.

This is not entirely correct, the throw (__cxa_throw libcxxabi runtime call) terminates the process. So we exit before we would ever reach the runtime call for the catch.

libc.so.6!__pthread_kill_implementation(pthread_t threadid, int signo, int no_tid) (pthread_kill.c:44)
libc.so.6!__pthread_kill_internal(int signo, pthread_t threadid) (pthread_kill.c:89)
libc.so.6!__GI___pthread_kill(pthread_t threadid, int signo) (pthread_kill.c:89)
[Unknown/Just-In-Time compiled code] (Unknown Source:0)
libc.so.6!__GI_abort() (abort.c:79)
libstdc++.so.6!__gnu_cxx::__verbose_terminate_handler() (Unknown Source:0)
libstdc++.so.6![Unknown/Just-In-Time compiled code] (Unknown Source:0)
libstdc++.so.6!__gxx_personality_v0 (Unknown Source:0)
unwind_phase2(unw_context_t * uc, unw_cursor_t * cursor, _Unwind_Exception * exception_object) (UnwindLevel1.c:261)
_Unwind_RaiseException(_Unwind_Exception * exception_object) (UnwindLevel1.c:438)
libgcc_s.so.1!_Unwind_Resume_or_Rethrow (Unknown Source:0)
libstdc++.so.6!__cxa_rethrow (Unknown Source:0)
libstdc++.so.6![Unknown/Just-In-Time compiled code] (Unknown Source:0)
libstdc++.so.6!__gxx_personality_v0 (Unknown Source:0)
unwind_phase2(unw_context_t * uc, unw_cursor_t * cursor, _Unwind_Exception * exception_object) (UnwindLevel1.c:261)
_Unwind_RaiseException(_Unwind_Exception * exception_object) (UnwindLevel1.c:438)
libstdc++.so.6!__cxa_throw (Unknown Source:0)
libbomb.so!sum (Unknown Source:0)
bomb-2.aotsnapshot!FfiTrampoline__sum (Unknown Source:0)
bomb-2.aotsnapshot!FfiTrampoline__sum (Unknown Source:0)
bomb-2.aotsnapshot!BombBindings.sum() (bomb/bomb_bindings_generated.dart:38)
bomb-2.aotsnapshot!sum() (bomb/bomb.dart:13)
bomb-2.aotsnapshot!main() (/usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/bin/bomb.dart:4)
bomb-2.aotsnapshot!main (Unknown Source:0)
bomb-2.aotsnapshot!_Closure.call (Unknown Source:0)
bomb-2.aotsnapshot!_delayEntrypointInvocation.<anonymous closure>() (isolate-patch/isolate_patch.dart:294)
bomb-2.aotsnapshot!_Closure.call (Unknown Source:0)
bomb-2.aotsnapshot!_RawReceivePort._handleMessage() (isolate-patch/isolate_patch.dart:189)
bomb-2.aotsnapshot!stub InvokeDartCode (Unknown Source:0)
dart::DartEntry::InvokeFunction(const dart::Function & function, const dart::Array & arguments, const dart::Array & arguments_descriptor) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/dart_entry.cc:160)
dart::DartLibraryCalls::HandleMessage(Dart_Port port_id, const dart::Instance & message) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/dart_entry.cc:760)
dart::IsolateMessageHandler::HandleMessage(dart::IsolateMessageHandler * this, std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> > message) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/isolate.cc:1437)
dart::MessageHandler::HandleMessages(dart::MessageHandler * this, dart::MonitorLocker * ml, bool allow_normal_messages, bool allow_multiple_normal_messages) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/message_handler.cc:239)
dart::MessageHandler::TaskCallback(dart::MessageHandler * this) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/message_handler.cc:450)
dart::ThreadPool::WorkerLoop(dart::ThreadPool * this, dart::ThreadPool::Worker * worker) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/thread_pool.cc:158)
dart::ThreadPool::Worker::Main(dart::uword args) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/thread_pool.cc:330)
dart::ThreadStart(void * data_ptr) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/os_thread_linux.cc:154)
libc.so.6!start_thread(void * arg) (pthread_create.c:444)
libc.so.6!clone3() (clone3.S:81)

If I run the same aotsnapshot in gdb with the dart_precompiled_runtime that catches the exception and returns -1 instead, the call stack is similar up to the first occurrence of __gxx_personality_v0:

libstdc++.so.6!__gxx_personality_v0 (Unknown Source:0)
unwind_phase1(unw_context_t * uc, unw_cursor_t * cursor, _Unwind_Exception * exception_object) (UnwindLevel1.c:139)
_Unwind_RaiseException(_Unwind_Exception * exception_object) (UnwindLevel1.c:433)
libstdc++.so.6!__cxa_throw (Unknown Source:0)
libbomb.so!sum (Unknown Source:0)
bomb-2.aotsnapshot!FfiTrampoline__sum (Unknown Source:0)
bomb-2.aotsnapshot!FfiTrampoline__sum (Unknown Source:0)
bomb-2.aotsnapshot!BombBindings.sum() (bomb/bomb_bindings_generated.dart:38)
bomb-2.aotsnapshot!sum() (bomb/bomb.dart:13)
bomb-2.aotsnapshot!main() (/usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/bin/bomb.dart:4)
bomb-2.aotsnapshot!main (Unknown Source:0)
bomb-2.aotsnapshot!_Closure.call (Unknown Source:0)
bomb-2.aotsnapshot!_delayEntrypointInvocation.<anonymous closure>() (isolate-patch/isolate_patch.dart:294)
bomb-2.aotsnapshot!_Closure.call (Unknown Source:0)
bomb-2.aotsnapshot!_RawReceivePort._handleMessage() (isolate-patch/isolate_patch.dart:189)
bomb-2.aotsnapshot!stub InvokeDartCode (Unknown Source:0)
dart::DartEntry::InvokeFunction(const dart::Function & function, const dart::Array & arguments, const dart::Array & arguments_descriptor) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/dart_entry.cc:160)
dart::DartLibraryCalls::HandleMessage(Dart_Port port_id, const dart::Instance & message) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/dart_entry.cc:760)
dart::IsolateMessageHandler::HandleMessage(dart::IsolateMessageHandler * this, std::__2::unique_ptr<dart::Message, std::__2::default_delete<dart::Message> > message) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/isolate.cc:1437)
dart::MessageHandler::HandleMessages(dart::MessageHandler * this, dart::MonitorLocker * ml, bool allow_normal_messages, bool allow_multiple_normal_messages) (/usr/local/google/home/dacoharkes/master-dart-sdk/sdk/runtime/vm/message_handler.cc:239)

In libcxxabi/src/cxa_personality.cpp __gxx_personality_v0 I believe it then diverges by calling a different handler for the exception. (That is if I correlate the machine code correctly with the source code.)

__gxx_personality_v0
#endif
#endif
                    (int version, _Unwind_Action actions, uint64_t exceptionClass,
                     _Unwind_Exception* unwind_exception, _Unwind_Context* context)
{
// ...
        // Jump to the handler.
        set_registers(unwind_exception, context, results);

The machine code for __gxx_personality_v0 is identical (for commit 7b2cfdb and its parent). So it's not that we're including different libcxxabi source code. But it probably is that a different exception handler is configured.

I'll see if I can restore the exception handler by tinkering with the Dart build.


Side note: we've also had a bunch of other libstdc++ related issues which we all solved by building our own libstdc++ so that we don't have these incompatibilities in our own test suite. ddc11f8

The GNU C++ compiler, g++, has a compiler command line option to switch between various different C++ ABIs. This explicit version switch is the flag -fabi-version. In addition, some g++ command line options may change the ABI as a side-effect of use. Such flags include -fpack-struct and -fno-exceptions, but include others: see the complete list in the GCC manual under the heading Options for Code Generation Conventions.

The configure options used when building a specific libstdc++ version may also impact the resulting library ABI. The available configure options, and their impact on the library ABI, are documented here.

Putting all of these ideas together results in the C++ Standard Library ABI, which is the compilation of a given library API by a given compiler ABI. In a nutshell:

“ library API + compiler ABI = library ABI ”

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html

So, conceptually we have an ABI mismatch between the Dart precompiled runtime (-fno-exceptions) and the dynamic library (-fexceptions). This should lead to issues.

@dcharkes
Copy link
Contributor

dcharkes commented Aug 23, 2023

In slimming the Dart executable down to nothing, I have succeed in reproducing the symptoms in a standalone executable:

#include <dlfcn.h>
#include <iostream>

int main(int argc, char** argv) {
  auto* dl_handle = dlopen(
      "/usr/local/google/home/dacoharkes/src/nielsenko/"
      "reproduction-dart-sdk-53267/build-native/libbomb.so",
      RTLD_LAZY);
  auto* sym_address = dlsym(dl_handle, "sum");
  auto* fun =
      reinterpret_cast<intptr_t (*)(intptr_t a, intptr_t b)>(sym_address);
  auto u = fun(1, 2);
  std::cout << "result: " << u << "\n";
}

A single flag causes the difference in behavior.

executable("main_repro") {
  sources = [ "main_repro.cc" ]
  ldflags = [
    # With this flag: terminate called after throwing an instance of 'bomb'.
    # Without this flag: exception is caught and -1 is returned.
    "-rdynamic"
  ]
}

# Does not abort on the exception.
executable("main_repro2") {
  sources = [ "main_repro.cc" ]
  ldflags = [
    # With this flag: terminate called after throwing an instance of 'bomb'.
    # Without this flag: exception is caught and -1 is returned.
    # "-rdynamic"
  ]
}
$ out/DebugX64/main_repro
terminate called after throwing an instance of 'bomb'
terminate called recursively
Aborted
$ out/DebugX64/main_repro2 
result: -1

I do not yet understand why this causes the difference. My hunch is that somehow some libcxx symbols get resolved to ones in the executable rather than in the dylib. I'll investigate further later.

@dcharkes
Copy link
Contributor

dcharkes commented Aug 24, 2023

The repro executables differ in _Unwind symbols:

$ nm -D out/DebugX64/main_repro &> nm_main_repro.txt
$ nm -D out/DebugX64/main_repro2 &> nm_main_repro2.txt
$ git diff --no-index nm_main_repro.txt nm_main_repro2.txt
diff --git a/nm_main_repro.txt b/nm_main_repro2.txt
index fb4daf146c5..a1aef6b50f8 100644
--- a/nm_main_repro.txt
+++ b/nm_main_repro2.txt
@@ -13,8 +13,6 @@
                  U __ctype_toupper_loc@GLIBC_2.3
                  U __cxa_atexit@GLIBC_2.2.5
                  w __cxa_finalize@GLIBC_2.2.5
-00000000000ac160 D __data_start
-00000000000ac160 W data_start
                  w __deregister_frame_info
                  U dladdr@GLIBC_2.2.5
                  U dl_iterate_phdr@GLIBC_2.2.5
@@ -23,7 +21,6 @@
                  U __errno_location@GLIBC_2.2.5
                  U fclose@GLIBC_2.2.5
                  U fflush@GLIBC_2.2.5
-00000000000a218c T _fini
                  U fopen@GLIBC_2.2.5
                  U fprintf@GLIBC_2.2.5
                  U fputc@GLIBC_2.2.5
@@ -37,8 +34,6 @@
                  U getc@GLIBC_2.2.5
                  U getenv@GLIBC_2.2.5
                  w __gmon_start__
-00000000000a2174 T _init
-0000000000015420 R _IO_stdin_used
                  U iswalpha_l@GLIBC_2.3
                  U iswblank_l@GLIBC_2.3
                  U iswcntrl_l@GLIBC_2.3
@@ -49,11 +44,8 @@
                  U iswspace_l@GLIBC_2.3
                  U iswupper_l@GLIBC_2.3
                  U iswxdigit_l@GLIBC_2.3
-00000000000a2170 T __libc_csu_fini
-00000000000a2100 T __libc_csu_init
                  U __libc_start_main@GLIBC_2.2.5
                  U localeconv@GLIBC_2.2.5
-000000000003aa90 T main
                  U malloc@GLIBC_2.2.5
                  U mbrtowc@GLIBC_2.2.5
                  U mbsnrtowcs@GLIBC_2.2.5
@@ -94,7 +86,6 @@
                  U setlocale@GLIBC_2.2.5
                  U snprintf@GLIBC_2.2.5
                  U sscanf@GLIBC_2.2.5
-000000000003a9a0 T _start
                  U stderr@GLIBC_2.2.5
                  U stdin@GLIBC_2.2.5
                  U stdout@GLIBC_2.2.5
@@ -125,35 +116,6 @@
                  U towlower_l@GLIBC_2.3
                  U towupper_l@GLIBC_2.3
                  U ungetc@GLIBC_2.2.5
-000000000009e490 T __unw_add_dynamic_eh_frame_section
-000000000009e030 T __unw_add_dynamic_fde
-00000000000a20a4 W unw_getcontext
-000000000009d980 W unw_get_fpreg
-000000000009dbd0 W unw_get_proc_info
-000000000009dce0 W unw_get_proc_name
-000000000009d7f0 W unw_get_reg
-000000000009d500 T _Unwind_DeleteException
-000000000009d360 T _Unwind_ForcedUnwind
-000000000009d560 T _Unwind_GetGR
-000000000009d620 T _Unwind_GetIP
-000000000009d3f0 T _Unwind_GetLanguageSpecificData
-000000000009d490 T _Unwind_GetRegionStart
-000000000009c8c0 T _Unwind_RaiseException
-000000000009cf80 T _Unwind_Resume
-000000000009d5c0 T _Unwind_SetGR
-000000000009d680 T _Unwind_SetIP
-000000000009d6e0 W unw_init_local
-000000000009dd80 W unw_is_fpreg
-000000000009de80 W unw_is_signal_frame
-000000000009df00 W unw_iterate_dwarf_unwind_cache
-00000000000ac2b8 D unw_local_addr_space
-000000000009de00 W unw_regname
-000000000009e7b0 T __unw_remove_dynamic_eh_frame_section
-000000000009e3b0 T __unw_remove_dynamic_fde
-000000000009dc60 W unw_resume
-000000000009da20 W unw_set_fpreg
-000000000009d890 W unw_set_reg
-000000000009dad0 W unw_step
                  U uselocale@GLIBC_2.3
                  U vasprintf@GLIBC_2.2.5
                  U vfprintf@GLIBC_2.2.5

The one none of the _Unwind symbols runs catches the exception as expected. The one with some of _Unwind symbols aborts. @rmacnak-google Your initial hunch was in the right direction!

If I do the same exercise on 7b2cfdb and its parent 7c1e2fe, the diff is quite big due to the tcmalloc removal. The diff also contains a difference in _Unwind symbols. However, in this case the dart_precompiled_runtime from both commits contain some _Unwind symbols. 7b2cfdb, which aborts, contains a smaller set than 7c1e2fe.

@@ -2463,249 +2314,26 @@ T _printRawObject
 T _printStackTrace
 T _printStackTraceWithLocals
 T _start
-T _Unwind_Backtrace
 T _Unwind_DeleteException
-T _Unwind_Find_FDE
-T _Unwind_FindEnclosingFunction
 T _Unwind_ForcedUnwind
-T _Unwind_GetCFA
-T _Unwind_GetDataRelBase
 T _Unwind_GetGR
 T _Unwind_GetIP
-T _Unwind_GetIPInfo
 T _Unwind_GetLanguageSpecificData
 T _Unwind_GetRegionStart
-T _Unwind_GetTextRelBase
 T _Unwind_RaiseException
 T _Unwind_Resume
-T _Unwind_Resume_or_Rethrow
 T _Unwind_SetGR
 T _Unwind_SetIP

This leads me to believe the dart_precompiled_runtime has a partial set of _Unwind symbols exposed and that the dylib then links to a part of the C++ runtime and uses part of its own C++ runtime. Then maybe we get rid of the issue if we can hide all _Unwind symbols. In order to validate this hypothesis, we need to play around with linker flags for exposing symbols. Unfortunately, I have not found a linker that says "do -rdynamic but skip symbol x". So, we'll have to get creative to verify.

For reference, we use -rdynamic because of:

sdk/runtime/bin/BUILD.gn

Lines 794 to 801 in dbd7b96

if (is_win) {
ldflags = [ "/EXPORT:Dart_True" ]
} else {
# Adds all symbols to the dynamic symbol table, not just used ones.
# This is needed to make native extensions work. It is also needed to get
# symbols in VM-generated backtraces and profiles.
ldflags = [ "-rdynamic" ]
}

To make the native extensions work we just need to ensure we export dart_api.h symbols I believe instead of blindly exporting everything.
For the symbols in VM-generated backtraces and profiles, I'll need to check the behavior in perf and crashes if we remove -rdynamic.
I'll keep on digging. (Anyone, feel free to chime in.)

edit: removing -rdynamic from the Dart build makes exception catching work again on the main branch.

$ ~/dart-sdk/sdk/out/DebugX64/dart-sdk/bin/dart compile exe bin/bomb.dart -o build-native/nordynamic.exe
Generated: /usr/local/google/home/dacoharkes/src/nielsenko/reproduction-dart-sdk-53267/build-native/nordynamic.exe
$ ./build-native/nordynamic.exe 
-1

The question is what this does to (1) native extensions, (2) VM-generated backtraces, and (3) profiles.
We can probably address (1) by listing the symbols to export, but (2) and (3) sound like they benefit from the full list of symbols.

@nielsenko
Copy link

Given that you use -rdynamic on all platforms except windows, it is (perhaps) interesting that this does not cause an issue on macos. Is that due to clang vs gcc, or is something else at play?

@dcharkes
Copy link
Contributor

dcharkes commented Aug 28, 2023

Update: Exporting only the Dart_* symbols on Linux with -Wl,--export-dynamic-symbol=Dart_* and without -rdynamic solves the issue: https://dart-review.googlesource.com/c/sdk/+/322405/5.

This breaks the symbols in the perf tool on dartaotruntime in the SDK, but the dart_precompiled_runtime with a manually built SDK still has the symbols. (Note that we have no tests on the CI exercising perf to see if we have symbols.)

It might be possible to export all symbols in dartaotruntime except for the ones in the libcxx, but I haven't found a set of flags to achieve that. (There is a way to specify a list symbols to be exported in a file, so we can probably write some script that finds all symbols and then filters the ones that we don't want. However, it is less pretty than just setting a compiler flag.)

@mit-mit Do we care about being able to use the perf command-line tool on Linux with the dartaotruntime on released SDKs?

@rmacnak-google @alexmarkov @mkustermann @mraleph Any concerns about switching our build from -rdynamic to -Wl,--export-dynamic-symbol=Dart_* on Linux?

It is also needed to get symbols in VM-generated backtraces

What use case does this refer to?

cc @sstrickl perf has a --callee-graph dwarf option. Does that ring a bell? Is that something we could use to support having symbols for dartaotruntime in released SDKs?

TODO @dcharkes:

  • Add regression test.
  • Add perf usage tests. (Probably with perf ... &> log.txt.)
  • Add vm backtrace test.
  • Land a fix.

Side note: we only have symbols when doing dartaotruntime/dart_precompiled_runtime [aot snapshot]. dart compile exe has no symbols in perf #47360.

nielsenko added a commit to realm/realm-dart that referenced this issue Aug 28, 2023
@rmacnak-google
Copy link
Contributor

-rdynamic was added for the VM's profiler to be able to symbolize frames in the runtime using dladdr. The signal handler that prints backtraces for VM crashes was later built on the same stack walker and symbolizer.

We might replace this with the mechanism [1,2] that was added for Fuchsia because Fuchsia's linker has a broken -rdynamic, where we externally build our own compact symbolization table that is separate from the dynamic table. This works better for Fuchsia because Fuchsia's package model means we can reliably have a separate file available. For Linux/Mac/Android, we would want to link this table into the VM binary; this would have to use a different trick than our normal way of embedding resources to avoid a cyclic build dependency (maybe using objcopy).

[1] https://github.com/flutter/engine/tree/master/shell/platform/fuchsia/runtime/dart/profiler_symbols
[2] https://github.com/dart-lang/sdk/blob/main/runtime/vm/native_symbol_fuchsia.cc

@dcharkes
Copy link
Contributor

dcharkes commented Aug 28, 2023

Thanks @rmacnak-google! That is helpful! TIL about dladdr.

Note: all of the above functionality is apparently not exercised in tests, no tests failed when removing -rdynamic, we should add test coverage.

I see that the way that Flutter produces the external symbol file is first to produce an unstripped exe, then getting all the symbols with nm. (It produces a stripped exe with strip from the unstripped exe.)
We could use an unstripped exe to produce a list of all symbols with nm, filter that list on C++ runtime functions, and then use strip with --strip-symbol to remove all the C++ runtime functions and then compile again with -Wl,--export-dynamic-symbol-list=[file] to produce an executable without the C++ runtime functions as dynamic symbols. The downside of this approach would be that we would have to come up with a list of C++ runtime symbol names, and that these symbols would no longer show up in profiles or backtraces. The benefit would be that we could keep using the dynamic table.

@rmacnak-google Any benefits in trying to stick with the dynamic table or going for our custom format? Also, why did we make our own format instead of producing dwarf debug file?

@rmacnak-google
Copy link
Contributor

I think the main benefit of using the dynamic table is compatibility with other backtracers that might end up in our process, such as code called from FFI that uses glibc's backtrace_symbols. Until this issue, it was also quite simple to just add the one linker flag.

The custom format was to get something simple to handle at runtime and something compact. The full DWARF output for the VM is mostly things not needed to symbolize frames.

@mkustermann
Copy link
Member

IIRC flutter engine hides all symbols except the ones from their embedding API. How does our profiler work with the flutter engine? If it works there, can we do the same for standalone? If it doesn't work there, maybe we can come up with something that works for both?

In general it seems like a good discipline to only export the symbols we actually want to be available / usable at runtime by other code (amongst many reasons: avoiding customers depending on symbols in the VM we don't consider part of our API)

@dcharkes
Copy link
Contributor

IIRC flutter engine hides all symbols except the ones from their embedding API. How does our profiler work with the flutter engine? If it works there, can we do the same for standalone? If it doesn't work there, maybe we can come up with something that works for both?

I don't see any C++ symbols at all in the profiler in devtools (in Dart standalone on Linux), only Dart symbols. Maybe I'm holding it wrong? @bkonyi Do we surface the time spent in the VM and show the C++ function names at all in devtools? If yes, what kind of Dart program should I invoke to show this? (I tried some stuff in benchmarks/ but that didn't show any time spent in the runtime.

@dcharkes
Copy link
Contributor

@mkustermann suggested hiding all symbols that start with _, however all mangled C++ symbols start with underscore. It reduces the amount of symbols from 23k to 2k in the dart_precompiled_runtime.

But following this idea, we can hide all symbols that have a U as the second character (when mangled), which includes all the _Unwind symbols:

-Wl,--export-dynamic-symbol=[a-zA-Z_][a-zA-TV-Z_]*

diff --git a/log.txt b/log2.txt
index 2d2d0d25c58..e3368a3bd7e 100644
--- a/log.txt
+++ b/log2.txt
@@ -1736,16 +1736,6 @@ W unw_get_fpreg
 W unw_get_proc_info
 W unw_get_proc_name
 W unw_get_reg
-T _Unwind_DeleteException
-T _Unwind_ForcedUnwind
-T _Unwind_GetGR
-T _Unwind_GetIP
-T _Unwind_GetLanguageSpecificData
-T _Unwind_GetRegionStart
-T _Unwind_RaiseException
-T _Unwind_Resume
-T _Unwind_SetGR
-T _Unwind_SetIP
 W unw_init_local
 W unw_is_fpreg
 W unw_is_signal_frame
@@ -2318,14 +2308,9 @@ T utext_previous32From_68
 T utext_replace_68
 T utext_setNativeIndex_68
 T utext_setup_68
-D _UTF16BEData_68
-D _UTF16Data_68
-D _UTF16LEData_68
-D _UTF16v2Data_68
 T utf8_appendCharSafeBody_68
 T utf8_back1SafeBody_68
 R utf8_countTrailBytes_68
-D _UTF8Data_68
 T utf8_nextCharSafeBody_68
 T utf8_prevCharSafeBody_68
 U utimensat@GLIBC_2.6
@@ -2456,7 +2441,6 @@ U wcstoull@GLIBC_2.2.5
 U wmemchr@GLIBC_2.2.5
 U wmemcmp@GLIBC_2.2.5
 U write@GLIBC_2.2.5
-B x86_cpu_enable_avx512
 U __xstat64@GLIBC_2.2.5
 T ucasemap_mapUTF8(int, unsigned int, icu_68::BreakIterator*, char*, int, char const*, int, void (*)(int, unsigned int, icu_68::BreakIterator*, unsigned char const*, int, icu_68::ByteSink&, icu_68::Edits*, UErrorCode&), icu_68::Edits*, UErrorCode&)
 T ucasemap_mapUTF8(int, unsigned int, icu_68::BreakIterator*, char const*, int, void (*)(int, unsigned int, icu_68::BreakIterator*, unsigned char const*, int, icu_68::ByteSink&, icu_68::Edits*, UErrorCode&), icu_68::ByteSink&, icu_68::Edits*, UErrorCode&)

This also makes the reproduction properly return -1 instead of abort on exception throwing.

@bkonyi
Copy link
Contributor

bkonyi commented Aug 31, 2023

I don't see any C++ symbols at all in the profiler in devtools (in Dart standalone on Linux), only Dart symbols. Maybe I'm holding it wrong? @bkonyi Do we surface the time spent in the VM and show the C++ function names at all in devtools? If yes, what kind of Dart program should I invoke to show this? (I tried some stuff in benchmarks/ but that didn't show any time spent in the runtime.

Sorry for the delay, I must have missed this.

Yes, you can see native code in the DevTools CPU profile, but I believe it's hidden by default. If you go to the CPU Profiler page and select the filter button (to the right of the CPU Flame Chart subtab), you be able to remove the 'Hide Native code' filter:

image

Here's a sample profile after removing the filter:

image

@dcharkes
Copy link
Contributor

dcharkes commented Sep 1, 2023

We might replace this with the mechanism [1,2] that was added for Fuchsia because Fuchsia's linker has a broken -rdynamic, where we externally build our own compact symbolization table that is separate from the dynamic table.

CL that does this: https://dart-review.googlesource.com/c/sdk/+/323702
I have verified it also fixes the issue reported.
Thanks @rmacnak-google !

copybara-service bot pushed a commit that referenced this issue Sep 5, 2023
We were exporting all symbols to the dynamic table so that they could be looked up using `dladdr` for the profiler and backtracer. The symbols include our statically-linked libcxx, which can create trouble when another DSO has a different version of libcxx. Now we export only the VM embedding API functions (`Dart_*`) and use a specially produced table to do the symbolization.

TEST=runtime/tests/vm/dart/exported_symbols_test.dart
TEST=runtime/tests/vm/dart/symbolized_crash_test.dart
Bug: #53267
Change-Id: I2ee494fba86f67127ba0f6f402622f01a4662207
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/323702
Reviewed-by: Daco Harkes <[email protected]>
Commit-Queue: Ryan Macnak <[email protected]>
@dcharkes
Copy link
Contributor

@mit-mit Should we cherry pick b8ee3a9?

@mit-mit
Copy link
Member

mit-mit commented Sep 11, 2023

Yeah, I think we should.

@dcharkes
Copy link
Contributor

@rmacnak-google Will you make the CP?

@rmacnak-google
Copy link
Contributor

@a-siva
Copy link
Contributor

a-siva commented Sep 19, 2023

Closing this issue as a cherry pick has been created for both the stable and beta channels.

@a-siva a-siva closed this as completed Sep 19, 2023
copybara-service bot pushed a commit that referenced this issue Sep 25, 2023
We were exporting all symbols to the dynamic table so that they could be looked up using `dladdr` for the profiler and backtracer. The symbols include our statically-linked libcxx, which can create trouble when another DSO has a different version of libcxx. Now we export only the VM embedding API functions (`Dart_*`) and use a specially produced table to do the symbolization.

Bug: #53267
Change-Id: I50f8150d194a564c116d95383c28a1a2aba0714e
Cherry-pick: https://dart-review.googlesource.com/c/sdk/+/323702
Cherry-pick-request: #53503
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/325126
Reviewed-by: Michael Thomsen <[email protected]>
Reviewed-by: Daco Harkes <[email protected]>
Commit-Queue: Siva Annamalai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi P1 A high priority bug; for example, a single project is unusable or has many test failures
Projects
None yet
Development

No branches or pull requests

9 participants