Closed Bug 803158 Opened 7 years ago Closed 6 years ago

if no crash report is generated by a tegra (or whatever we're running tests on) use ndk-stack to get a stack from the tombstone

Categories

(Firefox Build System :: General, defect)

x86
macOS
defect
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: blassey, Assigned: jmaher)

References

Details

(Keywords: sheriffing-P1)

Attachments

(6 files)

No description provided.
This description of ndk-stack might be useful: https://yssays.wordpress.com/2011/12/27/android-ndk-stack-tool/
Here is a log file with a crash that doesn't get picked up in crashreporter, but spits to logcat:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16336570&tree=Mozilla-Inbound&full=1

I/DEBUG   (  937): *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
I/DEBUG   (  937): Build fingerprint: 'nvidia/harmony/harmony/harmony:2.2/FRF91/20110202.102810:eng/test-keys'
I/DEBUG   (  937): pid: 1700, tid: 1705  >>> org.mozilla.fennec <<<
I/DEBUG   (  937): signal 11 (SIGSEGV), fault addr 00335d58
I/DEBUG   (  937):  r0 00000000  r1 00313310  r2 00335d58  r3 808a23f4
I/DEBUG   (  937):  r4 002faaf0  r5 4b405c98  r6 808a23f4  r7 00000006
I/DEBUG   (  937):  r8 00100000  r9 8084f865  10 4b306000  fp 00131b80
I/DEBUG   (  937):  ip 00000003  sp 4b405c30  lr 8086ed91  pc 8086ee00  cpsr 20000030
I/DEBUG   (  937):  d0  400000003eaaaaab  d1  3ff0000041f00000
I/DEBUG   (  937):  d2  0000000050baf6de  d3  0000000000000000
I/DEBUG   (  937):  d4  0000001c000000b4  d5  3fe999999999999a
I/DEBUG   (  937):  d6  3fe0000000000000  d7  3eaaaaab3f800000
I/DEBUG   (  937):  d8  0000000000000000  d9  0000000000000000
I/DEBUG   (  937):  d10 0000000000000000  d11 0000000000000000
I/DEBUG   (  937):  d12 0000000000000000  d13 0000000000000000
I/DEBUG   (  937):  d14 0000000000000000  d15 0000000000000000
I/DEBUG   (  937):  scr 80000012
I/DEBUG   (  937): 
I/DEBUG   (  937):          #00  pc 0006ee00  /system/lib/libdvm.so
I/DEBUG   (  937):          #01  pc 0006ed8c  /system/lib/libdvm.so
I/DEBUG   (  937):          #02  pc 00074466  /system/lib/libdvm.so
I/DEBUG   (  937):          #03  pc 0006dee8  /system/lib/libdvm.so
I/DEBUG   (  937):          #04  pc 0004f8b6  /system/lib/libdvm.so
I/DEBUG   (  937):          #05  pc 000110a4  /system/lib/libc.so
I/DEBUG   (  937):          #06  pc 00010c38  /system/lib/libc.so
I/DEBUG   (  937): 
I/DEBUG   (  937): code around pc:
I/DEBUG   (  937): 8086ede0 ffff41cf ffff41e6 ffff421b ffff422d 
I/DEBUG   (  937): 8086edf0 4a07a000 181b4b07 58992000 e001460a 
I/DEBUG   (  937): 8086ee00 68526010 d1fb2a00 189b4a01 47706059 
I/DEBUG   (  937): 8086ee10 000046fc 00033600 6843684a 688a18d3 
I/DEBUG   (  937): 8086ee20 6883604b 68ca18d3 68c3608b 18d32000 
I/DEBUG   (  937): 
I/DEBUG   (  937): code around lr:
I/DEBUG   (  937): 8086ed70 4a1e6903 f8dd9300 1871e068 e004f8cd 
I/DEBUG   (  937): 8086ed80 200318b2 3018f8dc ed2af7a7 f830f000 
I/DEBUG   (  937): 8086ed90 30aef89d f8ddb943 f8dee040 f1bcc000 
I/DEBUG   (  937): 8086eda0 bf180000 e0082001 9b159a1a 70d2eb02 
I/DEBUG   (  937): 8086edb0 10419a10 f7ff4620 b049fcd7 bf00bdf0 
I/DEBUG   (  937): 
I/DEBUG   (  937): stack:
I/DEBUG   (  937):     4b405bf0  4b405c98  
I/DEBUG   (  937):     4b405bf4  00000084  
I/DEBUG   (  937):     4b405bf8  00000000  
I/DEBUG   (  937):     4b405bfc  00000000  
I/DEBUG   (  937):     4b405c00  808a23f4  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c04  4b405ea4  
I/DEBUG   (  937):     4b405c08  00000008  
I/DEBUG   (  937):     4b405c0c  0000006c  
I/DEBUG   (  937):     4b405c10  00000002  
I/DEBUG   (  937):     4b405c14  00310000  
I/DEBUG   (  937):     4b405c18  00000000  
I/DEBUG   (  937):     4b405c1c  002faaf0  
I/DEBUG   (  937):     4b405c20  4b405c98  
I/DEBUG   (  937):     4b405c24  808a23f4  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c28  df0027ad  
I/DEBUG   (  937):     4b405c2c  00000000  
I/DEBUG   (  937): #01 4b405c30  4b405d4f  
I/DEBUG   (  937):     4b405c34  4b405d44  
I/DEBUG   (  937):     4b405c38  00000065  
I/DEBUG   (  937):     4b405c3c  00000000  
I/DEBUG   (  937):     4b405c40  00000000  
I/DEBUG   (  937):     4b405c44  1feb5e40  
I/DEBUG   (  937):     4b405c48  0000fa00  
I/DEBUG   (  937):     4b405c4c  afd3832c  /system/lib/libc.so
I/DEBUG   (  937):     4b405c50  d39fad79  
I/DEBUG   (  937):     4b405c54  00313768  
I/DEBUG   (  937):     4b405c58  000001c0  
I/DEBUG   (  937):     4b405c5c  000001c2  
I/DEBUG   (  937):     4b405c60  00313318  
I/DEBUG   (  937):     4b405c64  000001c0  
I/DEBUG   (  937):     4b405c68  00000003  
I/DEBUG   (  937):     4b405c6c  002faaf0  
I/DEBUG   (  937):     4b405c70  4b405ea4  
I/DEBUG   (  937):     4b405c74  808a23f4  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c78  44198e70  /data/dalvik-cache/system@framework@framework.jar@classes.dex
I/DEBUG   (  937):     4b405c7c  808a23f4  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c80  0000001a  
I/DEBUG   (  937):     4b405c84  4b405d98  
I/DEBUG   (  937):     4b405c88  808964f4  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c8c  00000394  
I/DEBUG   (  937):     4b405c90  80884688  /system/lib/libdvm.so
I/DEBUG   (  937):     4b405c94  00313348  
I/DEBUG   (  937):     4b405c98  0000000c  
I/DEBUG   (  937):     4b405c9c  00000006  
I/DEBUG   (  937):     4b405ca0  00313818  
I/DEBUG   (  937):     4b405ca4  43189b88  /dev/ashmem/dalvik-LinearAlloc (deleted)
I/DEBUG   (  937):     4b405ca8  002faaf0  
I/DEBUG   (  937):     4b405cac  00314568  
I/DEBUG   (  937):     4b405cb0  002f5884  
I/DEBUG   (  937):     4b405cb4  002f544c  
I/DEBUG   (  937):     4b405cb8  00314568  
I/DEBUG   (  937):     4b405cbc  00000008  
I/DEBUG   (  937):     4b405cc0  00000002  
I/DEBUG   (  937):     4b405cc4  003137f8  
I/DEBUG   (  937):     4b405cc8  00000002  
I/DEBUG   (  937):     4b405ccc  0000007c  
I/DEBUG   (  937):     4b405cd0  00000084  
I/DEBUG   (  937):     4b405cd4  002f58cc  
I/DEBUG   (  937):     4b405cd8  4edc2618  /dev/ashmem/dalvik-jit-code-cache (deleted)
I/DEBUG   (  937):     4b405cdc  00000000  
I/DEBUG   (  937):     4b405ce0  00000000  
I/DEBUG   (  937):     4b405ce4  00000002  
I/DEBUG   (  937):     4b405ce8  00000000  
I/DEBUG   (  937):     4b405cec  00000000  
I/DEBUG   (  937):     4b405cf0  00000000  
I/DEBUG   (  937):     4b405cf4  00000000  
I/DEBUG   (  937):     4b405cf8  00314420  
I/DEBUG   (  937):     4b405cfc  00000000  
I/DEBUG   (  937):     4b405d00  00000000  
I/DEBUG   (  937):     4b405d04  00000000  
I/DEBUG   (  937):     4b405d08  00000000  
I/DEBUG   (  937):     4b405d0c  002f5884  
I/DEBUG   (  937):     4b405d10  00313e8c  
I/DEBUG   (  937):     4b405d14  00000002  
I/DEBUG   (  937):     4b405d18  4b405d98  
I/DEBUG   (  937):     4b405d1c  00000003  
I/DEBUG   (  937):     4b405d20  0000003a  
I/DEBUG   (  937):     4b405d24  00313830  
I/DEBUG   (  937):     4b405d28  003138f8  
I/DEBUG   (  937):     4b405d2c  00000000  
I/DEBUG   (  937):     4b405d30  00000000  
I/DEBUG   (  937):     4b405d34  00000000  
I/DEBUG   (  937):     4b405d38  003141c0  
I/DEBUG   (  937):     4b405d3c  00000000  
I/DEBUG   (  937):     4b405d40  00000000  
I/DEBUG   (  937):     4b405d44  00000000  
I/DEBUG   (  937):     4b405d48  000001cb  
I/DEBUG   (  937):     4b405d4c  008db80c  
I/DEBUG   (  937):     4b405d50  00000000  
I/DEBUG   (  937):     4b405d54  808a7460  
I/DEBUG   (  937):     4b405d58  00000000  
I/DEBUG   (  937):     4b405d5c  00000000  
I/DEBUG   (  937):     4b405d60  be8db80c  [stack]
I/DEBUG   (  937):     4b405d64  8087446b  /system/lib/libdvm.so
I/DEBUG   (  937): debuggerd committing suicide to free the zombie!

To use the ndk-stack tool, I need system-level symbols. Is there a way to get these from the tegras or from somewhere on the internet?
Depends on: 825643
Blocks: 823452
Keywords: sheriffing-P1
Blocks: 825643, 810471
No longer depends on: 825643
Here is a try run with an intentional Gecko crash on startup:

https://tbpl.mozilla.org/?tree=Try&rev=250358af465d

In (nearly?) all of these cases, mozcrash reports the crash stack perfectly. (The Talos tests format the error message differently, so tbpl offers "stack found after process termination" rather than a helpful PROCESS CRASH line...but all the same crash stack data is in the log.) Also, in (almost?) none of these cases is a tombstone dumped to the logcat. For this type of crash, it seems unlikely that additional processing by ndk-stack would provide any additional information.


Here is a try run with an intentional Java crash on startup:

https://tbpl.mozilla.org/?tree=Try&rev=ef8deb159d5d

As far as I can see, all of these logs contain accurate Java crash stacks under "REPORTING UNCAUGHT EXCEPTION FOR THREAD". tbpl doesn't recognize the error well -- that is bug 823452 -- but an accurate and complete stack is reported in the existing logs.  In these Java crash cases, a tombstone is also usually reported in logcat, and those tombstones look very much like the tombstones for the long-unresolved problem of startup crashes: bug 810471. In these cases -- when there is a tombstone but no Java stack reported by the unhandled exception handler -- additional processing by ndk-stack might provide additional information. Since the tombstones for bug 810471 generally only reference system libraries, we will likely need to provide symbols for system libraries to get any value out of this -- likely only available for pandaboards.
If we have a useful Java stack, the tombstone stack is unlikely to be very useful, right? On crash-stats when we catch a Java exception we send and display that stack in the crash report instead of the native stack.
See Also: → 813132
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #4)
> If we have a useful Java stack, the tombstone stack is unlikely to be very
> useful, right? On crash-stats when we catch a Java exception we send and
> display that stack in the crash report instead of the native stack.

Right. We are really only looking at ndk-stack because of the bug 810471 scenario: we often have a tombstone (with libc and system references only) but no stack. We don't know that ndk-stack will help, but it might.
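
For illustration only, here is a minimal Python sketch of the triage described above: prefer the Java stack when the log has one, and only consider ndk-stack when a tombstone is all we have. The function name and the return labels are hypothetical; the Java marker string is the one quoted in this bug, and the tombstone marker is assumed to be the row of "***" groups that debuggerd prints at the top of each tombstone.

```python
import re

# Marker that precedes Java crash stacks in the test logs (quoted in this bug).
JAVA_MARKER = "REPORTING UNCAUGHT EXCEPTION FOR THREAD"

# Assumption: a tombstone in logcat starts with a row of "*** " groups.
TOMBSTONE_MARKER = re.compile(r"(?:\*\*\* ){3,}\*\*\*")


def classify_crash_log(log_text):
    """Return a hypothetical label describing what crash data a log holds."""
    if JAVA_MARKER in log_text:
        # A Java stack is already the most useful report; nothing more to do.
        return "java-stack"
    if TOMBSTONE_MARKER.search(log_text):
        # Tombstone without a Java stack: the bug 810471 scenario.
        return "tombstone-only"
    return "no-crash-data"
```

Only the "tombstone-only" case would be a candidate for ndk-stack post-processing.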
There is a little bit of ndk-stack info at <path-to-your-ndk>/docs/NDK-STACK.html -- it's not very useful.
On my local pandaboard, I ran my Java-crash-on-startup Fennec. This wrote a tombstone to /data/tombstones and also to logcat. I collected those traces and ran some experiments with ndk-stack, from the r8c Android NDK.

There was substantially more information in the /data/tombstones file than was recorded to logcat. Running ndk-stack on the logcat produced a very abbreviated report:

********** Crash dump: **********
Build fingerprint: 'pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys'
pid: 2089, tid: 2103  >>> org.mozilla.fennec_mozdev <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
Stack frame #00  pc 000009aa  /dev/ashmem/libmozalloc.so (deleted)

Running ndk-stack on the /data/tombstones file produced a full report. I tried trimming the logcat, manipulating end-of-line markers, etc., but could not produce a good report from the logcat. I concentrated on the /data/tombstones file instead.
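
For reference, the kind of logcat trimming mentioned above could look like the Python sketch below. It keeps only the debuggerd lines and strips the logcat prefix. It assumes brief-format lines such as "I/DEBUG   (  937): ..." as in the log pasted earlier in this bug; the function name is hypothetical, and nothing here is claimed to make ndk-stack accept the logcat output.

```python
import re

# Assumption: brief-format logcat lines from debuggerd, e.g.
#   "I/DEBUG   (  937): pid: 1700, tid: 1705  >>> org.mozilla.fennec <<<"
DEBUG_LINE = re.compile(r"^I/DEBUG\s*\(\s*\d+\): ?(.*)$")


def extract_tombstone_lines(logcat_text):
    """Return the payload of every I/DEBUG line, without the logcat prefix."""
    lines = []
    for raw in logcat_text.splitlines():  # splitlines() also handles \r\n
        match = DEBUG_LINE.match(raw)
        if match:
            lines.append(match.group(1).rstrip())
    return lines
```

Lines from other tags (Gecko output, for example) are dropped, so the result contains only the tombstone text as debuggerd wrote it.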

Running ndk-stack on the /data/tombstones file without providing any symbols produced a full report that was concise and much more readable than the /data/tombstones file itself. I don't think the ndk-stack report contains any additional information, but there may be value in doing this simply for the formatting.

Of course we would prefer to use symbols. One stack frame referenced libmozalloc.so; without any libmozalloc symbols provided to ndk-stack, this is reported as:

Stack frame #00  pc 000009aa  /dev/ashmem/libmozalloc.so (deleted)

Providing the unstripped libmozalloc.so improves this and gives us the kind of information we want:

Stack frame #00  pc 000009aa  /dev/ashmem/libmozalloc.so (deleted): Routine mozalloc_abort in /home/mozdev/src/memory/mozalloc/mozalloc_abort.cpp:30

Providing a stripped libmozalloc.so does not help, and produces an annoying error message:

Stack frame #00  pc 000009aa  /dev/ashmem/libmozalloc.so (deleted): Unable to open symbol file /home/mozdev/pandasym/libmozalloc.so. Error (9): Bad file descriptor

I pulled libc.so and other system libraries from the pandaboard, from /system/lib and offered those to ndk-stack. Those libs do not appear to have symbols and produce the same error message:

Stack frame #00  pc 0000cff0  /system/lib/libc.so (epoll_wait): Unable to open symbol file /home/mozdev/pandasym/libc.so. Error (9): Bad file descriptor
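
A rough Python sketch of how these runs could be scripted, in case it is useful. It calls ndk-stack with its -sym (symbols directory) and -dump (input file) options and then lists the frames that still carry the "Unable to open symbol file" message shown above. The function names, the default "ndk-stack" executable name, and the injectable run parameter are assumptions made for illustration.

```python
import subprocess


def symbolicate(tombstone_path, symbols_dir, ndk_stack="ndk-stack",
                run=subprocess.run):
    """Run ndk-stack on a tombstone file and return its report as text."""
    cmd = [ndk_stack, "-sym", symbols_dir, "-dump", tombstone_path]
    # check=False: the report is still worth reading if ndk-stack exits non-zero.
    result = run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def frames_missing_symbols(report):
    """Return the stack frame lines where ndk-stack could not load symbols."""
    return [
        line for line in report.splitlines()
        if line.startswith("Stack frame") and "Unable to open symbol file" in line
    ]
```

An empty list from frames_missing_symbols() does not prove every frame was symbolicated; frames with no matching file in the symbols directory are simply printed without a routine name.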
Comment on attachment 699448 [details]
best ndk-stack report from experiment in comment 7: from tombstone + unstripped libmozalloc.so

Even this best-case result doesn't seem incredibly useful, especially given the amount of effort it takes to get it.
:jmaher provided a libc.so with symbols, for the pandas. I verified that the libc.so contains symbols, and ndk-stack recognizes them: I applied ndk-stack to the tombstone generated earlier and obtained a report with all the libc references translated.

I patched my panda build by remounting /system rw and pushing libc.so to /system/lib. I launched a normal Fennec build and verified that it started (the new libc.so appears to be valid). I then launched my crashing Fennec build and verified that it crashed; it did crash but did not generate a tombstone. I re-tried several times but could not generate a new tombstone with the new libc.so...but maybe I was just unlucky.

I restored the old libc.so and launched the crashing Fennec twice -- no tombstone the first time, but one was generated the second time.

I tried to put back the new libc.so for further tests, but cp crashed (it uses libc of course!) and left my panda unbootable...I'll re-flash, but since nearly everything uses libc, I cannot think of a safe way to patch my build. Perhaps it would be best to create and test a full image with unstripped libs.
I re-flashed, patched libc.so again, and re-installed the crashing Fennec. This time I was able to get several tombstones. However, ndk-stack fails when reading these tombstones; it reports only:

********** Crash dump: **********
Build fingerprint: 'pandaboard/pandaboard/pandaboard:4.0.4/IMM76I/5:eng/test-keys'
pid: 2448, tid: 2461  >>> org.mozilla.fennec_mozdev <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 00000000
ndk-stack: elff/elf_file.cc:102: static ElfFile* ElfFile::Create(char const*): Assertion `read_bytes != -1 && read_bytes == sizeof(header)' failed.
Stack frame #00  pc 00000000  Aborted
Summary: if no crash report is generated by a tegra (or whatever we're running tests on) use ndk-stack to get a stack from the toomstone → if no crash report is generated by a tegra (or whatever we're running tests on) use ndk-stack to get a stack from the tombstone
:jchen found and fixed a bug in ndk-stack and I tested his fixed ndk-stack against the tombstone in Comment 14 and the new libc.so with symbols: It produced a good report, with functions and line numbers for the libc references.

So :jmaher's libc.so appears to be functional and can be used with ndk-stack to get additional information from these troublesome tombstones. :jchen's patched ndk-stack is highly recommended.
This patch fixes a bug in ndk-stack where it tries to open a symbols file even if there is no symbols file for that stack frame. Apply it against the NDK sources at https://android.googlesource.com/platform/ndk, and run 'make -C sources/host-tools/ndk-stack -f GNUMakefile'

A pre-compiled ndk-stack for 32-bit Linux with the fix is at http://people.mozilla.org/~nchen/ndk-stack.tar.bz2
Depends on: 836332, 836333, 836334
No longer blocks: 823452
Are we still interested in doing this?
(In reply to Joel Maher (:jmaher) from comment #19)
> are we still interested in doing this?

I still think that ndk-stack might help us understand some crashes better, but certainly it seems less urgent and in some ways less important now. We don't seem to see nearly as many unexplained crash stacks these days, and it feels like we have significantly fewer test crashes than we did when we opened this bug.

It occurs to me that it would be easier to just copy /data/tombstones files to MOZ_UPLOAD_DIR and not worry about running ndk-stack at test time -- leave it to the person investigating the crash to run ndk-stack on the tombstone if desired. What do you think?
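
A minimal sketch of that idea in Python, assuming the device is reachable with adb (the test harness may well use a different device manager) and that the job has MOZ_UPLOAD_DIR set. The function name, the "tombstones" subdirectory, and the injectable run parameter are hypothetical.

```python
import os
import subprocess


def pull_tombstones(upload_dir=None, remote_dir="/data/tombstones",
                    adb="adb", run=subprocess.run):
    """Copy tombstones from the device into the upload dir; return the path."""
    upload_dir = upload_dir or os.environ.get("MOZ_UPLOAD_DIR")
    if not upload_dir:
        return None  # nowhere to upload to, so do nothing
    dest = os.path.join(upload_dir, "tombstones")
    os.makedirs(dest, exist_ok=True)
    # "adb pull <remote> <local>" copies the remote directory to the host.
    result = run([adb, "pull", remote_dir, dest], check=False)
    # A non-zero exit may simply mean a permissions problem on the device.
    return dest if result.returncode == 0 else None
```

The exact layout under the destination directory can differ between adb versions, so whoever consumes the upload should search it recursively.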
Flags: needinfo?(jmaher)
That is a great idea. Unless there are permissions issues getting the files from the device, it should be simple. We would have to adjust the whitelist of file extensions in the blobber upload to make that work.

Great way to think outside the box :)
Flags: needinfo?(jmaher)
Let's do that then: Filed bug 1042097.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Product: Core → Firefox Build System