Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NDK crashes symbolification #8832

Open
akravch opened this issue Mar 25, 2024 · 9 comments
Open

NDK crashes symbolification #8832

akravch opened this issue Mar 25, 2024 · 9 comments
Assignees
Labels
Area: App+Library Build Issues when building Library projects or Application projects. enhancement Proposed change to current functionality.

Comments

@akravch
Copy link

akravch commented Mar 25, 2024

Android application type

.NET Android (net7.0-android, net8.0-android, etc.)

Affected platform version

.NET 8.0.200, workload 34.0.52/8.0.100

Description

Since updating from Xamarin to .NET 8 we are getting a lot of native crashes from production users. In order to investigate the reason, I'm trying to find a way to symbolificate them.

Our crash monitoring system is based on AppCenter, which uses Google Breakpad internally, but at this point I would be happy with any manual way of symbolification, either with Breakpad's minidump_stackwalk, ndk-stack, or anything else.

First, I noticed that by default there are no debug symbols at all in the .dll.so files produced by AOT under . Running readelf -Ws usually gives something like that:

Symbol table '.dynsym' contains 2 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000003b30     0 OBJECT  GLOBAL DEFAULT    9 mono_aot_file_info

I have found that setting _AndroidAotStripLibraries to false disables library stripping, but judging by the underscore in its name it's private API:

https://github.com/xamarin/xamarin-android/blob/51bcefdbf08a2c17ab0633d5aee37f22acf5164d/src/Xamarin.Android.Build.Tasks/Microsoft.Android.Sdk/targets/Microsoft.Android.Sdk.Aot.targets#L93

Now readelf shows that there are debug symbols inside:

Symbol table '.dynsym' contains 2 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000003b30     0 OBJECT  GLOBAL DEFAULT    9 mono_aot_file_info

Symbol table '.symtab' contains 40 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000000014e0     0 NOTYPE  LOCAL  DEFAULT    6 jit_code_start
     2: 00000000000014f0    83 FUNC    LOCAL  DEFAULT    6 AndroidSymbols_MainActivity_OnCreate_Android_OS_Bundle
     3: 0000000000003d78     0 OBJECT  LOCAL  DEFAULT   10 mono_aot_AndroidSymbols_got
     4: 0000000000001550    36 FUNC    LOCAL  DEFAULT    6 AndroidSymbols_MainActivity_NativeCrash
     5: 0000000000001580    42 FUNC    LOCAL  DEFAULT    6 AndroidSymbols_MainActivity__ctor
     ...

However, if I crash the app and try to symbolificate the dump using the .dll.so with symbols e.g. with ndk-stack, it doesn't map the addresses to the symbols:

********** Crash dump: **********
Build fingerprint: 'Android/sdk_gphone_x86_64/generic_x86_64:8.0.0/OSR1.180418.026/6741039:userdebug/dev-keys'
#00 0x00000000000212c1 /system/lib64/libc.so (strlen+17)
#01 0x0000000000019582 <anonymous:0000000046c5e000>    <-- I would expect to see AndroidSymbols_MainActivity_NativeCrash here
#02 0x00000000000000d2 <anonymous:00007646df8dd000>
#03 0x0000000000000007 <unknown>
Crash dump is completed

I have also tried disabling assembly compression with <AndroidEnableAssemblyCompression>false</AndroidEnableAssemblyCompression>, but it didn't change anything in terms of symbolification.

I am assuming I am still not building the app correctly for symbolification, so I would appreciate any help with that.

Steps to Reproduce

I have a repro project here, that crashes with an NDK crash on launch (run in Release): https://github.com/akravch/AndroidSymbols. _AndroidAotStripLibraries is set to false.

Also attaching readelf output for the unstripped .dll.so, ndk-stack output and the .dll.so itself.

Did you find any workaround?

No.

Relevant log output

No response

@akravch akravch added Area: App+Library Build Issues when building Library projects or Application projects. needs-triage Issues that need to be assigned. labels Mar 25, 2024
@jonathanpeppers
Copy link
Member

@grendello might be able to comment on how to get better crashes for native code.

For managed code, there used to be the mono-symbolicate tool -- that does what you're looking for here.

These are the open issues:

@steveisok @tommcdon are there any plans to implement something like mono-symbolicate? Is there a modern equivalent that .NET Core or NativeAOT apps use?

@jonathanpeppers jonathanpeppers added enhancement Proposed change to current functionality. and removed needs-triage Issues that need to be assigned. labels Mar 26, 2024
@jonathanpeppers jonathanpeppers added this to the Under Consideration milestone Mar 26, 2024
@grendello
Copy link
Contributor

NDK contains a utility called ndk-stack (it can be found in the NDK's root directory). The utility will take an Android crash and symbolize it, given that it has access to .so files with symbols in them. For dotnet apps, the .so libraries will have their symbols stripped in most cases but you can fetch them from symbol servers with dotnet-symbol library-name.so.

@akravch
Copy link
Author

akravch commented Apr 11, 2024

Hi @grendello, thank you for the response, but as I initially mentioned, I have already tried using ndk-stack and it didn't help in my case. It does help for the libs that are .so-s from the beginning, but not for the .so-s I'm getting from .dll-s after AOT.

Also, dotnet-symbol won't help in this case as well, because there are unlikely symbols for our application on Microsoft servers.

So basically my question is how do I get the symbols for our own .so-s that we get from the .dll-s. What I was able to get after settings _AndroidAotStripLibraries to false looked promising, but didn't help with symbolication after all.

@grendello
Copy link
Contributor

@akravch I'm sorry, I should have read the context... :) The AOT .so files are weird in that they contain code as data, so most symbols will be useless for you. The code there is loaded into the application as if it were JIT-ed but without actually running the JIT and it's not executed directly.

The generated code looks like this (you can see it if you set the AndroidAotAdditionalArguments MSBuild property to save-temps and then look for temp.s files in the obj/ directory of your project):

jit_code_start:

        .byte 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
.text 0
        .balign 16
.Lm_a:
        .local Java_Interop_JavaArray_1_T_REF__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions
        .type Java_Interop_JavaArray_1_T_REF__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions,@function
Java_Interop_JavaArray_1_T_REF__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions:
.inst 0xa9bd7bfd
.inst 0x910003fd
.inst 0xf9000ba0
.inst 0xf9000fa1
.inst 0xf90013a2

adrp x16, mono_aot_Java_Interop_got+0
add x16, x16, :lo12:mono_aot_Java_Interop_got
ldr x0, [x16, 56]
.inst 0xf9400011
.inst 0xb4000051
bl .Lp_1
.inst 0xf9400ba0
.inst 0xf9400fa1
.inst 0xb98023a2
bl .Lp_2
.inst 0x910003bf
.inst 0xa8c37bfd
.inst 0xd65f03c0

// ...skipped

.text 0
        .balign 16
plt:
mono_aot_Java_Interop_plt:
        .local plt__jit_icall_mono_threads_state_poll
        .type plt__jit_icall_mono_threads_state_poll,@function
plt__jit_icall_mono_threads_state_poll:
.Lp_1:
adrp x16, mono_aot_Java_Interop_got+0
add x16, x16, :lo12:mono_aot_Java_Interop_got
ldr x16, [x16, 1944]
br x16
.inst 4004
        .size plt__jit_icall_mono_threads_state_poll,.-plt__jit_icall_mono_threads_state_poll
        .local plt_Java_Interop_JavaObject__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions
        .type plt_Java_Interop_JavaObject__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions,@function
plt_Java_Interop_JavaObject__ctor_Java_Interop_JniObjectReference__Java_Interop_JniObjectReferenceOptions:
.Lp_2:
adrp x16, mono_aot_Java_Interop_got+0
add x16, x16, :lo12:mono_aot_Java_Interop_got
ldr x16, [x16, 1952]
br x16

The .inst lines are the pre-JIT-ed code (roughly) which then needs to be loaded by the runtime and prepared to execute - they will be loaded at a different address than the one you can see in the binary. You can see that those sections are skipped-over (the bl instruction jumps over them to a label somewhere after the "data" section).

@akravch
Copy link
Author

akravch commented Apr 11, 2024

Thanks for the details, that was very enlightening!

So does it mean there is no way at all to trace addresses back to source code in this case?

they will be loaded at a different address than the one you can see in the binary

I guess the addresses are still following some rules, right? In our crash reporting system I see different crashes with completely same addresses and offsets, so the code must be loaded in the same way.

@grendello
Copy link
Contributor

The source code is, eventually, some managed one and you should be able to see it in the trace as a regular managed frame (or a trampoline to one). AOT is merely a shortcut to avoid JIT-ing at run time, the managed symbols should be preserved. You shouldn't think of it as about "native code running in a shared library", but rather as "serialized native code carried by a shared library" - the code will be running in the same place it would run if it was normally JIT-ed.

Addresses have a very good chance of being similar/the same, because your application doesn't change on the device between runs, only between installations/updates. Base pointers may be the same (but don't have to be), offsets off of them will be the same in all cases, until the app's updated.

If you want to try to map addresses to managed symbols, I think you'd need to do it manually. Having the temp.s source generated for your managed library's AOT image and the unstripped .so you can learn what managed symbol it refers to (by name) but I'm not certain this is even necessary. Using AOT doesn't mean your managed code is turned into a shared native library, with a native symbol and all the code in it. It means that JIT is going through the same steps it normally does, but when it hits an AOT-d method, it loads the serialized instruction stream from the .so image instead of generating the code dynamically. In both cases the code ends up in the "same" place, but what's more important - the mapping between the managed code and its native instruction stream is the same. This is to say that the address you see in the native stack trace doesn't really point to a different location in either case. Native stack traces in both AOT and JIT case should largely be the same.

@ivanpovazan
Copy link
Member

ivanpovazan commented May 27, 2024

Hi @akravch, I looked a bit deeper into this issue and the problem seems to related to how native crashes during PInvokes are reported.

From Android's native perspective if we take a look the stack trace you are seeing:

********** Crash dump: **********
Build fingerprint: 'Android/sdk_gphone_x86_64/generic_x86_64:8.0.0/OSR1.180418.026/6741039:userdebug/dev-keys'
#00 0x00000000000212c1 /system/lib64/libc.so (strlen+17)
#01 0x0000000000019582 <anonymous:0000000046c5e000>    <-- I would expect to see AndroidSymbols_MainActivity_NativeCrash here
#02 0x00000000000000d2 <anonymous:00007646df8dd000>
#03 0x0000000000000007 <unknown>
Crash dump is completed

The unknown stack frame is shown because the call to libc comes from jitted methods. When we JIT compile a managed method (from your example: managed strlen) we also compile various trampolines / managed-to-native wrappers which are actually calling the native strlen function. That being said post-mortem resymbolication of such stack trace is not really possible.

However, what is possible is to get such information from the mono runtime when the crash happens.
Mono runtime handles native crashes / signals and these lines https://github.com/dotnet/runtime/blob/9a43de3206c9146690fdb57fce9b37347d8d5720/src/mono/mono/mini/mini-exceptions.c#L2972 are unwinding the managed stack frames before the process has been aborted and printing the results to stdout.
I think this is the information you are looking for.

Sadly, to make all this a bit more difficult, an additional problem here is that there does not seem to be a simple way to redirect stdout/stderr to android logcats (ref: android/ndk#671). But by using the solution provided here and your example code I managed to redirect these messages to logcat and after inspection it indeed shows managed stack trace:

05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: ...
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: =================================================================
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	Managed Stacktrace:
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: =================================================================
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at <unknown> <0xffffffff>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at AndroidSymbols.MainActivity:strlen <0x00007>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at AndroidSymbols.MainActivity:NativeCrash <0x0002b>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at AndroidSymbols.MainActivity:OnCreate <0x0005b>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at Android.App.
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: Activity:n_OnCreate_Landroid_os_Bundle_ <0x0007b>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at Android.Runtime.JNINativeWrapper:Wrap_JniMarshal_PPL_V <0x00053>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: 	  at 
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: Android.Runtime.JNINativeWrapper:Wrap_JniMarshal_PPL_V <0x00067>
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: ==============================================================
05-27 14:43:28.267  6544  6599 D com.companyname.AndroidSymbols: ===

Potential workarounds/solutions

  • As a workaround, you could try what I posted above and see if that helps you diagnose better the native crashes.
  • As a potential solution I think we should revisit the decision on printing the stack traces to stdout/stderr rather than to use NDK logging API (like mono_trace is doing). /cc: @steveisok @lambdageek @thaystg

Let me know if this helps and we will find a more robust solution on our end.

@akravch
Copy link
Author

akravch commented Jun 13, 2024

Hi @ivanpovazan, thanks a lot for the idea, we tried it and it actually worked even in production, and the stack traces were readable!

I think it would be really great to have it better integrated into the normal logging process somehow. And if there was an out of the box option to save this output to a file before the app crash, it would almost be an NDK crash handler already.

@espenrl
Copy link

espenrl commented Jun 28, 2024

For anyone interested here is code that redirects stdout (1) to Logcat.

using Java.IO;
using System.Text;
using static Android.Systems.Os;

namespace MauiApp;

// https://codelab.wordpress.com/2014/11/03/how-to-use-standard-output-streams-for-logging-in-android-apps/
// https://stackoverflow.com/questions/23352592/redirecting-stdin-and-stdout-of-c-program-in-android
// https://android.googlesource.com/platform/libcore/+/cff1616/luni/src/main/java/java/io/FileDescriptor.java

/// <summary>
/// On Android stdout are redirected to /dev/null by default.
/// This class redirects stdout to LogCat.
/// </summary>
public static class AndroidRedirectStdToLogCat
{
    private static Thread? Thread;

    private static bool IsStarted => Thread is not null;

    public static void Start(string applicationName)
    {
        if (IsStarted)
        {
            return;
        }

        /* create the pipe and redirect stdout */
        var pipe = Pipe();

        if (pipe is null)
        {
            throw new Exception("Could not create pipe");
        }

        var pipeReader = pipe[0];
        var pipeWriter = pipe[1];

        // NOTE: file descriptors for 0, 1 and 2 are stdin, stdout and stderr (unix)
        Dup2(pipeWriter, 1); // redirect stdout to pipe

        /* spawn the logging thread */
        var thread = new Thread(ThreadLoop);

        Thread = thread;

        thread.Start(new ThreadParameters(applicationName, pipeReader));
    }

    /// <summary>
    /// For testing.
    /// </summary>
    internal static void WriteToStdout(string text)
    {
        // NOTE: System.Console.OpenStandardOutput() points to LogcatStream
        // https://github.com/dotnet/runtime/blob/main/src/libraries/System.Console/src/System/ConsolePal.Android.cs

        // NOTE: Java.IO.FileDescriptor.Out does NOT point to fd 1 (original stdout)

        // use reflection to create a new FileDescriptor pointing to stdout (1)
        using var stdout = new FileDescriptor();
        var field = stdout.Class.GetDeclaredField("descriptor");
        field.Accessible = true;
        field.SetInt(stdout, 1);

        // write to stdout
        using var stream = new FileOutputStream(stdout);

        var bytes = Encoding.UTF8.GetBytes(text);

        stream.Write(bytes);
        stream.Flush();
    }

    private sealed record ThreadParameters(
        string ApplicationName,
        FileDescriptor PipeReaderFileDescriptor);

    private static void ThreadLoop(object? parameter)
    {
        try
        {
            if (parameter is not ThreadParameters parameters)
            {
                throw new Exception("Parameter object expected.");
            }

            var applicationName = parameters.ApplicationName;

            const int bufferSize = 512;
            var buffer = new byte[bufferSize];
            var bufferSpan = buffer.AsSpan();

            var lineTerminator = (byte)'\n';
            int usedBufferLength = 0;
            int readCount;

            using var stream = new FileInputStream(parameters.PipeReaderFileDescriptor);

            // read blocks until it gets input, -1 signifies EOF
            while ((readCount = stream.Read(buffer, usedBufferLength, buffer.Length - usedBufferLength)) > 0)
            {
                var length = usedBufferLength + readCount;
                ReadOnlySpan<byte> rest = bufferSpan[..length];

                // iterate lines and write to LogCat
                while (rest.IndexOf(lineTerminator) is var index && index != -1)
                {
                    var lineBytes = rest[..index]; // skip null terminator
                    WriteToLogCat(lineBytes, applicationName);

                    rest = rest[(index + 1)..]; // skip null terminator
                }

                // line terminator not found, but there may still be some bytes left
                switch (rest.Length)
                {
                    // buffer full, write to LogCat
                    case bufferSize:
                        WriteToLogCat(rest, applicationName);
                        usedBufferLength = 0;
                        break;
                    // copy to start of buffer
                    case > 0 when (rest.Length != usedBufferLength):
                        rest.CopyTo(bufferSpan); // CopyTo deals with overlapping regions
                        usedBufferLength = rest.Length;
                        break;
                    case 0:
                        usedBufferLength = 0;
                        break;
                }
            }

            // write any last content in buffer to LogCat
            if (usedBufferLength > 0)
            {
                var lineBytes = bufferSpan.Slice(0, usedBufferLength);
                WriteToLogCat(lineBytes, applicationName);
            }

            // cleanup
            Thread = null;
        }
        catch (Exception e)
        {
            Android.Util.Log.Debug(nameof(AndroidRedirectStdToLogCat), e.ToString());
        }
    }

    private static void WriteToLogCat(ReadOnlySpan<byte> bytes, string applicationName)
    {
        var lineStr = Encoding.UTF8.GetString(bytes);
        Android.Util.Log.Debug(applicationName, lineStr);
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: App+Library Build Issues when building Library projects or Application projects. enhancement Proposed change to current functionality.
Projects
None yet
Development

No branches or pull requests

6 participants