-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[VM] Process terminates with handled exceptions in FFI library #53267
Comments
cc @dcharkes |
Tested with the Dart SDK distributed outside Flutter and it fails as well. Not sure how could this happen. The only thing I can come up with is, if the process is installing an exception trap which get called before the catch all block. |
Ignore, it seems this is incorrect. |
@blagoev could you provide a minimal reproduction? I tried to reproduce it with https://dart-review.googlesource.com/c/sdk/+/321721, but that doesn't crash for me. Maybe I'm not using the right combination of features to trigger the crash. (Running on the latest commit from the main branch.) |
Sure, i will try to repro it outside Realm. It may be Realm related, but then why would be working with the previous Dart VM with no recompile. Will write back. |
It might well be something in the Dart VM. However, it's very hard to diagnose for me without a repro. Besides @liamappelbe's refactorings on callbacks, I don't believe we had any significant modifications to the workings of the FFI from 3.0 to 3.1. If your minimal repro includes callbacks that would be a place to start looking. Else it might any change in the VM. With a minimal repro, I can probably triage the issue here locally. |
Ok managed to repro it on top of the hello_world sample here https://github.com/dart-lang/samples/tree/main/ffi/hello_world. Used the current main branch head |
I got a hint from a colleague that it might be different c++ runtimes for the process and the shared library. How is the Dart executable compiled? For the the repro I am using gcc 9.3 |
For your convenience I have created a repository with a reproduction. It is based on the ffi plugin you get by default, when running On Linux First build the ffi lib: mkdir build-native
cmake -S src/ -B build-native/
cmake --build build-native/ next compile flutter/3.10.6/bin/dart compile exe bin/bomb.dart -o build-native/before.exe
flutter/3.13.0/bin/dart compile exe bin/bomb.dart -o build-native/after.exe Run both:
I'm using the dart exe included with Flutter version 3.10.6 and 3.13.0 respectively. Basically the code is just: FFI_PLUGIN_EXPORT intptr_t sum(intptr_t a, intptr_t b) {
try {
throw bomb{};
return a + b;
}
catch (std::exception& e) {
return -1; // catch and return error code
}
} import 'package:bomb/bomb.dart';
void main(List<String> args) {
print(sum(1, 2));
} plus plumbing to load the dynamic library. |
Thanks for the repro @nielsenko. I can repro this with:
Direct compiler invocation without CMake, in case the compiler matters:
I'll see if I can figure out what's going on here. |
I modified the bisection script to bisect this:
Writing detailed log to /usr/local/google/home/dacoharkes/dart-sdk/sdk/.dart_tool/bisect_dart/20230821_repro.exe/logs/full.txt. Very interesting. @rmacnak-google @mkustermann @mraleph any ideas why removing tcmalloc could cause this? |
Maybe linking with tcmalloc dragged in some unwind tables or such? It seems relevant that the VM is built with |
Thanks for the suggestion @rmacnak-google! I gave a quick spin to removing that flag from our build:
diff --git a/build/config/gcc/BUILD.gn b/build/config/gcc/BUILD.gn
index 40ec2bcde5d..032ad5dbbca 100644
--- a/build/config/gcc/BUILD.gn
+++ b/build/config/gcc/BUILD.gn
@@ -41,7 +41,7 @@ config("executable_ldconfig") {
}
config("no_exceptions") {
- no_exceptions_flags = [ "-fno-exceptions" ]
+ no_exceptions_flags = [ ]
cflags_cc = no_exceptions_flags
cflags_objcc = no_exceptions_flags
}
diff --git a/runtime/BUILD.gn b/runtime/BUILD.gn
index 7268731a0ca..8d08557a42f 100644
--- a/runtime/BUILD.gn
+++ b/runtime/BUILD.gn
@@ -222,7 +222,6 @@ config("dart_config") {
"-g3",
"-ggdb3",
"-fno-rtti",
- "-fno-exceptions",
]
if (is_clang) {
cflags += [ But that still fails to catch the exception:
The symptoms look very similar though: https://www.reddit.com/r/cpp_questions/comments/qd9265/exceptions_when_exceptions_are_disabled/ int sum(int a, int b) {
try {
throw bomb{};
return a + b;
} catch (...) {
return -1; // catch and return error code
}
} sum(int, int):
push rbp
mov rbp, rsp
push rbx
sub rsp, 24
mov DWORD PTR [rbp-20], edi
mov DWORD PTR [rbp-24], esi
mov edi, 8
call __cxa_allocate_exception
mov rbx, rax
mov QWORD PTR [rbx], 0
mov rdi, rbx
call bomb::bomb() [complete object constructor]
mov edx, OFFSET FLAT:bomb::~bomb() [complete object destructor]
mov esi, OFFSET FLAT:typeinfo for bomb
mov rdi, rbx
call __cxa_throw
mov rdi, rax
call __cxa_begin_catch
mov ebx, -1
call __cxa_end_catch
mov eax, ebx
mov rbx, QWORD PTR [rbp-8]
leave
ret https://godbolt.org/z/h8v8nKaEj
The exception catching symbols are in the dylib itself:
I'll dig deeper later. |
`-Dtest_command` is now a multi-argument, commands will be run in sequence when multiple are passed. The test commands now need to include the executable. Previous invocations now need to be passed python3 in front of tools/test.py. Both stdout and stderr from test commands are now taken into account for matching on the error string. The above changes enable bisecting failures other than tests failing through test.py invocations such as: #53267 (comment) Change-Id: I4c3cb800b69e34a15c375f45a417bb2c28d9df8e Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/322020 Reviewed-by: Martin Kustermann <[email protected]> Commit-Queue: Daco Harkes <[email protected]>
The
Normally we compile using a pinpointed version of clang + sysroot. (search for |
@mkustermann / @dcharkes The comment on gcc 9.3 from @blagoev was about Realm, but the reproduction I made just use what is picked up by the default cmake setup you get when running On the ubuntu-22 box I used for the reproduction that is g++ 11.4.0 in case it is relevant. Also, note that it works when using Flutter. You can try the sample in the example folder, which was surprising to me, but I guess running flutter loads some extra code, that "fixes" the issue. |
@nielsenko Yes, the compiler doesn't matter, my reproduction is with g++ 12. I don't think it's relevant. Thanks @mkustermann for your keen eye! 👁️
This is not entirely correct, the throw (
If I run the same aotsnapshot in gdb with the dart_precompiled_runtime that catches the exception and returns -1 instead, the call stack is similar up to the first occurrence of
In __gxx_personality_v0
#endif
#endif
(int version, _Unwind_Action actions, uint64_t exceptionClass,
_Unwind_Exception* unwind_exception, _Unwind_Context* context)
{
// ...
// Jump to the handler.
set_registers(unwind_exception, context, results); The machine code for I'll see if I can restore the exception handler by tinkering with the Dart build. Side note: we've also had a bunch of other libstdc++ related issues which we all solved by building our own libstdc++ so that we don't have these incompatibilities in our own test suite. ddc11f8
https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html So, conceptually we have an ABI mismatch between the Dart precompiled runtime ( |
In slimming the Dart executable down to nothing, I have succeed in reproducing the symptoms in a standalone executable: #include <dlfcn.h>
#include <iostream>
int main(int argc, char** argv) {
auto* dl_handle = dlopen(
"/usr/local/google/home/dacoharkes/src/nielsenko/"
"reproduction-dart-sdk-53267/build-native/libbomb.so",
RTLD_LAZY);
auto* sym_address = dlsym(dl_handle, "sum");
auto* fun =
reinterpret_cast<intptr_t (*)(intptr_t a, intptr_t b)>(sym_address);
auto u = fun(1, 2);
std::cout << "result: " << u << "\n";
} A single flag causes the difference in behavior.
I do not yet understand why this causes the difference. My hunch is that somehow some libcxx symbols get resolved to ones in the executable rather than in the dylib. I'll investigate further later. |
The repro executables differ in
diff --git a/nm_main_repro.txt b/nm_main_repro2.txt
index fb4daf146c5..a1aef6b50f8 100644
--- a/nm_main_repro.txt
+++ b/nm_main_repro2.txt
@@ -13,8 +13,6 @@
U __ctype_toupper_loc@GLIBC_2.3
U __cxa_atexit@GLIBC_2.2.5
w __cxa_finalize@GLIBC_2.2.5
-00000000000ac160 D __data_start
-00000000000ac160 W data_start
w __deregister_frame_info
U dladdr@GLIBC_2.2.5
U dl_iterate_phdr@GLIBC_2.2.5
@@ -23,7 +21,6 @@
U __errno_location@GLIBC_2.2.5
U fclose@GLIBC_2.2.5
U fflush@GLIBC_2.2.5
-00000000000a218c T _fini
U fopen@GLIBC_2.2.5
U fprintf@GLIBC_2.2.5
U fputc@GLIBC_2.2.5
@@ -37,8 +34,6 @@
U getc@GLIBC_2.2.5
U getenv@GLIBC_2.2.5
w __gmon_start__
-00000000000a2174 T _init
-0000000000015420 R _IO_stdin_used
U iswalpha_l@GLIBC_2.3
U iswblank_l@GLIBC_2.3
U iswcntrl_l@GLIBC_2.3
@@ -49,11 +44,8 @@
U iswspace_l@GLIBC_2.3
U iswupper_l@GLIBC_2.3
U iswxdigit_l@GLIBC_2.3
-00000000000a2170 T __libc_csu_fini
-00000000000a2100 T __libc_csu_init
U __libc_start_main@GLIBC_2.2.5
U localeconv@GLIBC_2.2.5
-000000000003aa90 T main
U malloc@GLIBC_2.2.5
U mbrtowc@GLIBC_2.2.5
U mbsnrtowcs@GLIBC_2.2.5
@@ -94,7 +86,6 @@
U setlocale@GLIBC_2.2.5
U snprintf@GLIBC_2.2.5
U sscanf@GLIBC_2.2.5
-000000000003a9a0 T _start
U stderr@GLIBC_2.2.5
U stdin@GLIBC_2.2.5
U stdout@GLIBC_2.2.5
@@ -125,35 +116,6 @@
U towlower_l@GLIBC_2.3
U towupper_l@GLIBC_2.3
U ungetc@GLIBC_2.2.5
-000000000009e490 T __unw_add_dynamic_eh_frame_section
-000000000009e030 T __unw_add_dynamic_fde
-00000000000a20a4 W unw_getcontext
-000000000009d980 W unw_get_fpreg
-000000000009dbd0 W unw_get_proc_info
-000000000009dce0 W unw_get_proc_name
-000000000009d7f0 W unw_get_reg
-000000000009d500 T _Unwind_DeleteException
-000000000009d360 T _Unwind_ForcedUnwind
-000000000009d560 T _Unwind_GetGR
-000000000009d620 T _Unwind_GetIP
-000000000009d3f0 T _Unwind_GetLanguageSpecificData
-000000000009d490 T _Unwind_GetRegionStart
-000000000009c8c0 T _Unwind_RaiseException
-000000000009cf80 T _Unwind_Resume
-000000000009d5c0 T _Unwind_SetGR
-000000000009d680 T _Unwind_SetIP
-000000000009d6e0 W unw_init_local
-000000000009dd80 W unw_is_fpreg
-000000000009de80 W unw_is_signal_frame
-000000000009df00 W unw_iterate_dwarf_unwind_cache
-00000000000ac2b8 D unw_local_addr_space
-000000000009de00 W unw_regname
-000000000009e7b0 T __unw_remove_dynamic_eh_frame_section
-000000000009e3b0 T __unw_remove_dynamic_fde
-000000000009dc60 W unw_resume
-000000000009da20 W unw_set_fpreg
-000000000009d890 W unw_set_reg
-000000000009dad0 W unw_step
U uselocale@GLIBC_2.3
U vasprintf@GLIBC_2.2.5
U vfprintf@GLIBC_2.2.5 The one none of the If I do the same exercise on 7b2cfdb and its parent 7c1e2fe, the diff is quite big due to the tcmalloc removal. The diff also contains a difference in @@ -2463,249 +2314,26 @@ T _printRawObject
T _printStackTrace
T _printStackTraceWithLocals
T _start
-T _Unwind_Backtrace
T _Unwind_DeleteException
-T _Unwind_Find_FDE
-T _Unwind_FindEnclosingFunction
T _Unwind_ForcedUnwind
-T _Unwind_GetCFA
-T _Unwind_GetDataRelBase
T _Unwind_GetGR
T _Unwind_GetIP
-T _Unwind_GetIPInfo
T _Unwind_GetLanguageSpecificData
T _Unwind_GetRegionStart
-T _Unwind_GetTextRelBase
T _Unwind_RaiseException
T _Unwind_Resume
-T _Unwind_Resume_or_Rethrow
T _Unwind_SetGR
T _Unwind_SetIP This leads me to believe the For reference, we use Lines 794 to 801 in dbd7b96
To make the native extensions work we just need to ensure we export edit: removing
The question is what this does to (1) native extensions, (2) VM-generated backtraces, and (3) profiles. |
Given that you use |
Update: Exporting only the This breaks the symbols in the It might be possible to export all symbols in @mit-mit Do we care about being able to use the @rmacnak-google @alexmarkov @mkustermann @mraleph Any concerns about switching our build from
What use case does this refer to? cc @sstrickl TODO @dcharkes:
Side note: we only have symbols when doing |
We might replace this with the mechanism [1,2] that was added for Fuchsia because Fuchsia's linker has a broken [1] https://github.com/flutter/engine/tree/master/shell/platform/fuchsia/runtime/dart/profiler_symbols |
Thanks @rmacnak-google! That is helpful! TIL about Note: all of the above functionality is apparently not exercised in tests, no tests failed when removing I see that the way that Flutter produces the external symbol file is first to produce an unstripped exe, then getting all the symbols with @rmacnak-google Any benefits in trying to stick with the dynamic table or going for our custom format? Also, why did we make our own format instead of producing dwarf debug file? |
I think the main benefit of using the dynamic table is compatibility with other backtracers that might end up in our process, such as code called from FFI that uses glibc's The custom format was to get something simple to handle at runtime and something compact. The full DWARF output for the VM is mostly things not needed to symbolize frames. |
IIRC flutter engine hides all symbols except the ones from their embedding API. How does our profiler work with the flutter engine? If it works there, can we do the same for standalone? If it doesn't work there, maybe we can come up with something that works for both? In general it seems like a good discipline to only export the symbols we actually want to be available / usable at runtime by other code (amongst many reasons: avoiding customers depending on symbols in the VM we don't consider part of our API) |
I don't see any C++ symbols at all in the profiler in devtools (in Dart standalone on Linux), only Dart symbols. Maybe I'm holding it wrong? @bkonyi Do we surface the time spent in the VM and show the C++ function names at all in devtools? If yes, what kind of Dart program should I invoke to show this? (I tried some stuff in benchmarks/ but that didn't show any time spent in the runtime. |
@mkustermann suggested hiding all symbols that start with But following this idea, we can hide all symbols that have a
diff --git a/log.txt b/log2.txt
index 2d2d0d25c58..e3368a3bd7e 100644
--- a/log.txt
+++ b/log2.txt
@@ -1736,16 +1736,6 @@ W unw_get_fpreg
W unw_get_proc_info
W unw_get_proc_name
W unw_get_reg
-T _Unwind_DeleteException
-T _Unwind_ForcedUnwind
-T _Unwind_GetGR
-T _Unwind_GetIP
-T _Unwind_GetLanguageSpecificData
-T _Unwind_GetRegionStart
-T _Unwind_RaiseException
-T _Unwind_Resume
-T _Unwind_SetGR
-T _Unwind_SetIP
W unw_init_local
W unw_is_fpreg
W unw_is_signal_frame
@@ -2318,14 +2308,9 @@ T utext_previous32From_68
T utext_replace_68
T utext_setNativeIndex_68
T utext_setup_68
-D _UTF16BEData_68
-D _UTF16Data_68
-D _UTF16LEData_68
-D _UTF16v2Data_68
T utf8_appendCharSafeBody_68
T utf8_back1SafeBody_68
R utf8_countTrailBytes_68
-D _UTF8Data_68
T utf8_nextCharSafeBody_68
T utf8_prevCharSafeBody_68
U utimensat@GLIBC_2.6
@@ -2456,7 +2441,6 @@ U wcstoull@GLIBC_2.2.5
U wmemchr@GLIBC_2.2.5
U wmemcmp@GLIBC_2.2.5
U write@GLIBC_2.2.5
-B x86_cpu_enable_avx512
U __xstat64@GLIBC_2.2.5
T ucasemap_mapUTF8(int, unsigned int, icu_68::BreakIterator*, char*, int, char const*, int, void (*)(int, unsigned int, icu_68::BreakIterator*, unsigned char const*, int, icu_68::ByteSink&, icu_68::Edits*, UErrorCode&), icu_68::Edits*, UErrorCode&)
T ucasemap_mapUTF8(int, unsigned int, icu_68::BreakIterator*, char const*, int, void (*)(int, unsigned int, icu_68::BreakIterator*, unsigned char const*, int, icu_68::ByteSink&, icu_68::Edits*, UErrorCode&), icu_68::ByteSink&, icu_68::Edits*, UErrorCode&) This also makes the reproduction properly return |
Sorry for the delay, I must have missed this. Yes, you can see native code in the DevTools CPU profile, but I believe it's hidden by default. If you go to the CPU Profiler page and select the filter button (to the right of the CPU Flame Chart subtab), you be able to remove the 'Hide Native code' filter: ![]() Here's a sample profile after removing the filter: ![]() |
CL that does this: https://dart-review.googlesource.com/c/sdk/+/323702 |
We were exporting all symbols to the dynamic table so that they could be looked up using `dladdr` for the profiler and backtracer. The symbols include our statically-linked libcxx, which can create trouble when another DSO has a different version of libcxx. Now we export only the VM embedding API functions (`Dart_*`) and use a specially produced table to do the symbolization. TEST=runtime/tests/vm/dart/exported_symbols_test.dart TEST=runtime/tests/vm/dart/symbolized_crash_test.dart Bug: #53267 Change-Id: I2ee494fba86f67127ba0f6f402622f01a4662207 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/323702 Reviewed-by: Daco Harkes <[email protected]> Commit-Queue: Ryan Macnak <[email protected]>
Yeah, I think we should. |
@rmacnak-google Will you make the CP? |
Closing this issue as a cherry pick has been created for both the stable and beta channels. |
We were exporting all symbols to the dynamic table so that they could be looked up using `dladdr` for the profiler and backtracer. The symbols include our statically-linked libcxx, which can create trouble when another DSO has a different version of libcxx. Now we export only the VM embedding API functions (`Dart_*`) and use a specially produced table to do the symbolization. Bug: #53267 Change-Id: I50f8150d194a564c116d95383c28a1a2aba0714e Cherry-pick: https://dart-review.googlesource.com/c/sdk/+/323702 Cherry-pick-request: #53503 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/325126 Reviewed-by: Michael Thomsen <[email protected]> Reviewed-by: Daco Harkes <[email protected]> Commit-Queue: Siva Annamalai <[email protected]>
We are seeing consistent process termination with the latest stable Dart executable on Linux with FFI
We have this code in the native library.
The
f()
throws astd::exception
.The process ignores the
catch(...)
block and proceeds to terminate.The same library binary file works correct when loaded by the previous Dart stable version 3.0 executable.
The stack trace at termination is this
In this case the
f()
is theverify_attached
function which throws the exception.The output contains
The text was updated successfully, but these errors were encountered: