Closed Bug 1045804 Opened 10 years ago Closed 9 years ago

Android 4.0 crashes rarely have usable stacks

Categories

(Firefox for Android Graveyard :: General, defect)

Platform: ARM, Android
Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: RyanVM, Assigned: gbrown)

Crashes like those in the logs below are common, and they are increasingly being ignored because the reports are unactionable.

https://tbpl.mozilla.org/php/getParsedLog.php?id=44811866&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=44825017&tree=Fx-Team

In other cases the stack has a seemingly random address, such as 0x6161616a, as its top frame.

Can we please fix this?
I think we have looked into this a few times, but found no resolution.

For:

09:55:35  WARNING -  PROCESS-CRASH | /tests/dom/indexedDB/test/test_open_objectStore.html | application crashed [Unknown top frame]
09:55:35     INFO -  Crash dump filename: /tmp/tmpJ2rphp/3aad24d1-345c-2f25-1d0a2a65-6d4d4883.dmp
09:55:35     INFO -  stderr from minidump_stackwalk:
09:55:35     INFO -  2014-07-29 09:55:35: minidump_processor.cc:264: INFO: Processing minidump in file /tmp/tmpJ2rphp/3aad24d1-345c-2f25-1d0a2a65-6d4d4883.dmp
09:55:35     INFO -  2014-07-29 09:55:35: minidump.cc:3815: INFO: Minidump opened minidump /tmp/tmpJ2rphp/3aad24d1-345c-2f25-1d0a2a65-6d4d4883.dmp
09:55:35     INFO -  2014-07-29 09:55:35: minidump.cc:3847: ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d
09:55:35     INFO -  2014-07-29 09:55:35: minidump_processor.cc:268: ERROR: Minidump /tmp/tmpJ2rphp/3aad24d1-345c-2f25-1d0a2a65-6d4d4883.dmp could not be read
09:55:35     INFO -  2014-07-29 09:55:35: minidump.cc:3787: INFO: Minidump closing minidump
09:55:35     INFO -  2014-07-29 09:55:35: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed
09:55:35     INFO -  minidump_stackwalk exited with return code 1

The .dmp file is at mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/253ae290016d19f1a076ab7266a4c1b90d0fb09e5798d39b20d560763315068939741be45ff32adf6cc4d514b6cb3281927b1eaf3b5f16235b583c15ed22f68d and starts with:

0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
0000020 0003 0000 0994 0000 00c0 0000 0004 0000
0000030 32a4 0000 84a8 0003 0005 0000 0374 0000
0000040 7198 0005 0006 0000 00a8 0000 7510 0005
0000050 0000 0000 0000 0000 0000 0000 0000 0000
*
00000c0 0033 0000 08c3 0000 0000 0000 0000 0000
00000d0 0000 0000 0000 0000 0000 0000 7000 bef0
00000e0 0000 0000 1000 0000 0a58 0000 0170 0000

As I recall, :ted has looked at these too.

:blassey -- any ideas?
Flags: needinfo?(blassey.bugs)
See Also: → 991455
Ted and Jim are more likely to know what's going on here than me.
Flags: needinfo?(ted)
Flags: needinfo?(nchen)
Flags: needinfo?(blassey.bugs)
For some reason, the header of the minidump (the first 32 bytes) was replaced with all zeros. If I replace the header with a sensible one, minidump_stackwalk has no trouble processing it and gives this stack for the minidump in comment 1:

> Thread 13 (crashed)
>  0  libxul.so + 0x1a28f16
>     Found by: given as instruction pointer in context
>  1  libxul.so + 0x1a28f9f
>     Found by: call frame info
>  2  libxul.so!js::gc::Cell::isAligned() const [Heap.h:ac8248c5b891 : 592 + 0x7]
>     Found by: stack scanning
>  3  libxul.so!CheckMarkedThing<JSObject> + 0xbb
>     Found by: call frame info
>  4  libxul.so!js::gc::MarkObjectRange(JSTracer*, unsigned int, js::HeapPtr<JSObject*>*, char const*) [Marking.cpp:ac8248c5b891 : 232 + 0x3]
>     Found by: call frame info
>  5  libxul.so!JSScript::markChildren(JSTracer*) [jsscript.cpp:ac8248c5b891 : 3335 + 0xd]
>     Found by: call frame info
>  6  libxul.so!MarkInternal<JSScript> [Marking.cpp:ac8248c5b891 : 270 + 0xb]
>     Found by: call frame info
>  7  libxul.so!JSFunction::trace(JSTracer*) [jsfun.cpp:ac8248c5b891 : 589 + 0xd]
>     Found by: call frame info
>  8  libxul.so!js::GCMarker::processMarkStackTop(js::SliceBudget&) [Marking.cpp:ac8248c5b891 : 1698 + 0x5]
>     Found by: call frame info
>  9  libxul.so!js::GCMarker::drainMarkStack(js::SliceBudget&) [Marking.cpp:ac8248c5b891 : 1749 + 0x7]
>     Found by: call frame info
> 10  libxul.so!js::gc::GCRuntime::drainMarkStack(js::SliceBudget&, js::gcstats::Phase) [jsgc.cpp:ac8248c5b891 : 4331 + 0x3]
>     Found by: call frame info
> 11  libxul.so!js::gc::GCRuntime::incrementalCollectSlice(long long, JS::gcreason::Reason, js::JSGCInvocationKind) [jsgc.cpp:ac8248c5b891 : 4839 + 0xb]
>     Found by: call frame info
> 12  libxul.so!js::gc::GCRuntime::gcCycle(bool, long long, js::JSGCInvocationKind, JS::gcreason::Reason) [jsgc.cpp:ac8248c5b891 : 5047 + 0x9]
>     Found by: call frame info
> 13  libxul.so!js::gc::GCRuntime::collect(bool, long long, js::JSGCInvocationKind, JS::gcreason::Reason) [jsgc.cpp:ac8248c5b891 : 5174 + 0x11]
>     Found by: call frame info
> 14  libxul.so!JS::IncrementalGC(JSRuntime*, JS::gcreason::Reason, long long) [jsgc.cpp:ac8248c5b891 : 5222 + 0xd]
>     Found by: call frame info
> 15  libxul.so!nsJSContext::GarbageCollectNow(JS::gcreason::Reason, nsJSContext::IsIncremental, nsJSContext::IsShrinking, long long) [nsJSEnvironment.cpp:ac8248c5b891 : 1747 + 0xf]
>     Found by: call frame info
> 16  libxul.so!InterSliceGCTimerFired(nsITimer*, void*) [nsJSEnvironment.cpp:ac8248c5b891 : 2234 + 0x11]
>     Found by: call frame info
> 17  libxul.so!nsTimerImpl::Fire() [nsTimerImpl.cpp:ac8248c5b891 : 618 + 0x5]
>     Found by: call frame info
> 18  libxul.so!nsTimerEvent::Run() [nsTimerImpl.cpp:ac8248c5b891 : 711 + 0x9]
>     Found by: call frame info
> 19  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:ac8248c5b891 : 770 + 0xb]
>     Found by: call frame info
Flags: needinfo?(nchen)
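The zeroed-header condition described above can be detected before minidump_stackwalk is run. The following is a minimal Python sketch, not part of breakpad or mozcrash: the function name and its return strings are hypothetical, while the signature constant 0x504d444d is taken from the error message in the logs.

```python
import struct

MDMP_SIGNATURE = 0x504D444D  # the bytes "MDMP" read as a little-endian uint32
HEADER_SIZE = 32             # the header is the first 32 bytes of the file


def classify_minidump_header(data):
    """Classify the start of a dump as ok, zeroed, bad-signature or truncated."""
    if len(data) < HEADER_SIZE:
        return "truncated"
    header = data[:HEADER_SIZE]
    if header == bytes(HEADER_SIZE):
        # Every header byte is zero: the writer never flushed the header.
        return "zeroed"
    (signature,) = struct.unpack_from("<I", header, 0)
    return "ok" if signature == MDMP_SIGNATURE else "bad-signature"
```

A harness could log the "zeroed" case explicitly, which would distinguish an interrupted write from a dump that is corrupt in some other way.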
Looking deeper at the minidump, it appears breakpad stops writing the minidump somewhere after this point [1]. Because it flushes the header to disk at the very end, we're left with a minidump that's missing its header and a lot of data. No idea why this happens though.

[1] http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc?rev=aa176fcc56b8#481
I can think of a couple of possibilities:
1) Test harness kills the browser before it's done writing the minidump.
2) Test harness pulls the minidump off of the device before it's fully written.
3) Some part of the Java app kills the process before it's done writing.
4) Something in the minidump writer code crashes after partially writing data.
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
> I can think of a couple of possibilities:
> 1) Test harness kills the browser before it's done writing the minidump.
> 2) Test harness pulls the minidump off of the device before it's fully
> written.

Geoff, can we try to see if either of these is the case?  Perhaps we can add a 15s (?) sleep before killing or pulling the minidump, to see if that makes any difference, and if it does, we could go about adding a more deterministic method for checking this.

Probably the "right" way to do this is to check the file size and verify it's stable over X seconds before killing the browser or pulling the minidump.  I'm not sure what X should be, though.
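
A minimal Python sketch of the size-stability check suggested above. It is not existing harness code: wait_for_stable_size, its parameters, and the default intervals are assumptions, and get_size stands in for whatever the harness would use to query the dump's size on the device.

```python
import time


def wait_for_stable_size(get_size, stable_for=5.0, timeout=60.0, poll=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Return True once get_size() has not changed for stable_for seconds.

    Returns False if the size is still changing when timeout expires.
    """
    deadline = clock() + timeout
    last_size = get_size()
    stable_since = clock()
    while clock() < deadline:
        if clock() - stable_since >= stable_for:
            return True
        sleep(poll)
        size = get_size()
        if size != last_size:
            # The file grew (or shrank): restart the stability window.
            last_size = size
            stable_since = clock()
    return False
```

The harness could call this before killing the browser or pulling the minidump, and fall back to its current behavior when it returns False.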
Flags: needinfo?(gbrown)
Assignee: nobody → gbrown
Flags: needinfo?(gbrown)
To determine whether changes make a difference, I first tried to establish a baseline by intentionally crashing with no changes to minidump handling: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c010d423ae2a. The first 20+ crashes on both Android 2.3 and Android 4.0 have produced perfect stacks. I'll try to find a different way to crash that reproduces the problem.
I had better luck reproducing bad dumps with a Robocop hang - sleep in Java, in the Robotium test thread: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9d39b9c023e0. Most 2.3 dumps are good; most 4.0 dumps are not.

Same thing, with longer waits during the "staged shutdown" that follows a test timeout: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2ea066f71df2
(In reply to Geoff Brown [:gbrown] from comment #8)
> I had better luck reproducing bad dumps with a Robocop hang - sleep in Java,
> in the Robotium test thread:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=9d39b9c023e0.
> Most 2.3 dumps are good; most 4.0 dumps are not.

Of 10 crashes, 7 failed, typically with "ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d" (as in Comment 3).

> Same thing, with longer waits during the "staged shutdown" that follows a
> test timeout:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2ea066f71df2

Of 10 crashes, 7 failed, typically with "ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d".


In the normal "staged shutdown" following a test timeout, the harness does this:
  kill -3 (to generate ANR)
  wait 3 seconds
  kill -6 (to generate minidump and abort)
  wait up to 15 seconds (until process dies)
  kill -9
  retrieve minidump

In my try run with longer waits, we:
  kill -3
  wait 30 seconds
  kill -6
  wait up to 45 seconds
  kill -9
  wait 30 seconds
  retrieve minidump

For this type of "crash", waiting longer in the test harness does not help, suggesting that Ted's 1) and 2) possibilities (Comment 5) are not to blame.
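
The two sequences above differ only in their wait times, so they can be sketched as one parameterized routine. This is an illustrative Python sketch, not the actual harness code: staged_shutdown and its callback parameters (send_signal, process_alive, retrieve_minidump, sleep) are hypothetical names.

```python
import time

SIGQUIT, SIGABRT, SIGKILL = 3, 6, 9  # the numbers passed to kill above


def staged_shutdown(send_signal, process_alive, retrieve_minidump,
                    anr_wait=3, abort_wait=15, post_kill_wait=0,
                    sleep=time.sleep):
    """Run the staged shutdown; the defaults mirror the normal sequence."""
    send_signal(SIGQUIT)          # kill -3: request the ANR trace
    sleep(anr_wait)
    send_signal(SIGABRT)          # kill -6: trigger the minidump and abort
    waited = 0
    while process_alive() and waited < abort_wait:
        sleep(1)                  # wait up to abort_wait seconds for exit
        waited += 1
    send_signal(SIGKILL)          # kill -9: make sure the process is gone
    if post_kill_wait:
        sleep(post_kill_wait)     # extra settling time, as in the try run
    return retrieve_minidump()
```

Under these assumptions, the longer-wait try run corresponds to anr_wait=30, abort_wait=45, post_kill_wait=30.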
A different example I happened upon: 2.3 robocop crash on shutdown: http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/gbrown@mozilla.com-3a577e72d84d/try-android/try_ubuntu64_vm_mobile_test-robocop-1-bm118-tests1-linux64-build489.txt.gz

14:53:10     INFO -  mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/gbrown@mozilla.com-3a577e72d84d/try-android/fennec-36.0a1.en-US.android-arm.crashreporter-symbols.zip
14:53:10     INFO -  mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/405f6533-aa16-6b57-7b30b7af-12dfed05.dmp
14:53:10  WARNING -  PROCESS-CRASH | testGeckoProfile | application crashed [None]
14:53:10     INFO -  Crash dump filename: /tmp/tmpBpWV3L/405f6533-aa16-6b57-7b30b7af-12dfed05.dmp
14:53:10     INFO -  stderr from minidump_stackwalk:
14:53:10     INFO -  2014-11-04 14:12:39: minidump_processor.cc:264: INFO: Processing minidump in file /tmp/tmpBpWV3L/405f6533-aa16-6b57-7b30b7af-12dfed05.dmp
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:3815: INFO: Minidump opened minidump /tmp/tmpBpWV3L/405f6533-aa16-6b57-7b30b7af-12dfed05.dmp
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:3860: INFO: Minidump not byte-swapping minidump
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 7 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 7 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 1197932545 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 6 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 1197932546 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 4 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:4226: INFO: GetStream: type 3 not present
14:53:10     INFO -  2014-11-04 14:12:39: minidump_processor.cc:112: ERROR: Minidump /tmp/tmpBpWV3L/405f6533-aa16-6b57-7b30b7af-12dfed05.dmp has no thread list
14:53:10     INFO -  2014-11-04 14:12:39: minidump.cc:3787: INFO: Minidump closing minidump
14:53:10     INFO -  2014-11-04 14:12:39: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed
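
The "GetStream: type N not present" lines refer to entries in the minidump stream directory. As a rough aid for inspecting such dumps, here is a Python sketch that lists the stream types a file declares. It assumes the standard minidump layout (a 32-byte header followed by 12-byte directory entries); list_stream_types is a hypothetical helper, not part of mozcrash or breakpad, and only the stream types that appear in this bug are named.

```python
import struct

# signature, version, stream_count, directory_rva, checksum, timestamp, flags
HEADER = struct.Struct("<IIIIIIQ")
# stream_type, data_size, rva
DIR_ENTRY = struct.Struct("<III")
MDMP_SIGNATURE = 0x504D444D

STREAM_NAMES = {
    3: "ThreadList", 4: "ModuleList", 6: "Exception", 7: "SystemInfo",
    0x47670001: "BreakpadInfo", 0x47670002: "AssertionInfo",
}


def list_stream_types(data):
    """Return [(stream_type, name)] for each directory entry in the dump."""
    if len(data) < HEADER.size:
        raise ValueError("file too short for a minidump header")
    signature, _ver, count, dir_rva, _sum, _time, _flags = HEADER.unpack_from(data, 0)
    if signature != MDMP_SIGNATURE:
        raise ValueError("header signature mismatch: 0x%x" % signature)
    streams = []
    for i in range(count):
        offset = dir_rva + i * DIR_ENTRY.size
        if offset + DIR_ENTRY.size > len(data):
            break  # the directory runs past the end of a truncated file
        stream_type, _size, _rva = DIR_ENTRY.unpack_from(data, offset)
        streams.append((stream_type, STREAM_NAMES.get(stream_type, "unknown")))
    return streams
```

For a dump like the one above, type 3 (the thread list) would be absent from the returned list, which matches the "has no thread list" error.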
I tried eliminating the kill -3, in case ANR generation was interfering with the minidump somehow -- but it did not help.

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1437d94540e8
(In reply to Jim Chen [:jchen] from comment #4)
> Looking deeper at the minidump, it appears breakpad stops writing the
> minidump somewhere after this point [1]. Because it flushes the header to
> disk at the very end, we're left with a minidump that's missing its header
> and a lot of data. No idea why this happens though.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-
> breakpad/src/client/linux/minidump_writer/minidump_writer.
> cc?rev=aa176fcc56b8#481

Thanks Jim. Using my robocop hang (Comment 8), I've been able to reproduce and narrow it down to the /proc/cpuinfo parsing:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc?rev=aa176fcc56b8#1242

Still working on it...
We are crashing at http://hg.mozilla.org/mozilla-central/annotate/c0d559389a5c/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc#l1258:

1258          if (!my_strncmp(line, entry->info_name, strlen(entry->info_name))) {

on the first (only) entry. If I try to call strlen(entry->info_name) at this point, that crashes too. I thought it might be better to call my_strlen instead -- but that crashes just the same. Any attempt to access the content of entry->info_name meets the same fate.

Since entry should be a pointer to cpu_info_table, I tried to access cpu_info_table[0].info_name[0] immediately after it is initialized at http://hg.mozilla.org/mozilla-central/annotate/c0d559389a5c/toolkit/crashreporter/google-breakpad/src/client/linux/minidump_writer/minidump_writer.cc#l1222  -- that crashes too.

1217 struct CpuInfoEntry {
1218   const char* info_name;
1219   int value;
1220   bool found;
1221 } cpu_info_table[] = {
1222   { "processor", -1, false },
1223 #if defined(__i386) || defined(__x86_64)
1224   { "model", 0, false },
1225   { "stepping", 0, false },
1226   { "cpu family", 0, false },
1227 #endif
1228 };

So...stack corruption?
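
For context, line 1258 is a prefix match of each /proc/cpuinfo line against the names in cpu_info_table. A rough Python equivalent of that kind of table-driven lookup is sketched below. It is an illustration only, not a port of the breakpad code; parse_cpuinfo and its return shape are assumptions.

```python
def parse_cpuinfo(text, wanted=("processor",)):
    """Return the last integer value seen for each wanted field name."""
    values = {name: None for name in wanted}
    for line in text.splitlines():
        for name in wanted:
            # Same idea as my_strncmp(line, name, strlen(name)): prefix match.
            if not line.startswith(name):
                continue
            _, sep, raw = line.partition(":")
            raw = raw.strip()
            if sep and raw.lstrip("-").isdigit():
                values[name] = int(raw)
    return values
```

Nothing in such a lookup should fault on its own, which is consistent with the observation that merely reading entry->info_name is what crashes.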

I notice that much of this code was re-written in http://code.google.com/p/google-breakpad/source/detail?r=1160 -- I wonder if an update would help.
It's likely to just be crappy parsing putting bad data in there. If you want to just try pulling that patch in (or pulling in the code wholesale, whatever is easiest), feel free.
I noticed that this crash only happens when breakpad does NOT install an alternate stack for signal handlers at:

http://hg.mozilla.org/mozilla-central/annotate/134d1cfc5c9c/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc#l151

Breakpad does not install an alternate stack when elfloader has already installed one:

http://hg.mozilla.org/mozilla-central/annotate/134d1cfc5c9c/mozglue/linker/ElfLoader.cpp#l1141

On Pandas, the elfloader alternate stack is frequently not installed because the signalHandlingSlow flag is set.

If !signalHandlingSlow, elfloader installs an alternate stack (with size 12K), breakpad then does not install its own alternate stack, and, for reasons unknown, the first access to cpu_info_table[0].info_name crashes.

I tried increasing the size of the elfloader alternate stack to 20K -- crashes continued.
I tried increasing the required size of the breakpad alternate stack to 16K (so that breakpad installed a new alternate stack, even when elfloader already installed one) -- crashes continued.
I tried forcing signalHandlingSlow = 1 -- no crashes.
:ryanvm pointed out this recent no-stack failure, where a stack would be useful:

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-11/1420644839/mozilla-inbound_panda_android_test-mochitest-5-bm102-tests1-panda-build5221.txt.gz

This is an Android 4.0 hang during a mochitest and looks just like the robocop hang that I was using as a test case.
I suspect Robocop shutdown has been to blame for some of the bad crash reports.

Example:

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android-api-9/1421518439/mozilla-aurora_ubuntu64_vm_mobile_test-robocop-2-bm117-tests1-linux64-build20.txt.gz

11:53:48     INFO -  TEST-OK | testPrefsObserver | took 55099ms
11:53:48     INFO -  TEST-START | Shutdown
11:53:48     INFO -  Passed: 20
11:53:48     INFO -  Failed: 0
11:53:48     INFO -  Todo: 0
11:53:48     INFO -  SimpleTest FINISHED
11:53:48     INFO -  INFO | automation.py | Application ran for: 0:01:29.325080
11:53:48     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpkT7k1Opidlog
11:53:48     INFO -  Contents of /data/anr/traces.txt:
11:53:48     INFO -  
11:53:48     INFO -  
11:53:48     INFO -  /data/tombstones does not exist; tombstone check skipped
11:53:48     INFO -  mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android-api-9/1421518439/fennec-37.0a2.en-US.android-arm.crashreporter-symbols.zip
11:53:48     INFO -  mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/754bb5b0-3d4f-509c-41c75410-38eba852.dmp
11:53:48  WARNING -  PROCESS-CRASH | testPrefsObserver | application crashed [None]
11:53:48     INFO -  Crash dump filename: /tmp/tmp_qrkPz/754bb5b0-3d4f-509c-41c75410-38eba852.dmp
11:53:48     INFO -  stderr from minidump_stackwalk:
11:53:48     INFO -  2015-01-17 11:40:22: minidump_processor.cc:264: INFO: Processing minidump in file /tmp/tmp_qrkPz/754bb5b0-3d4f-509c-41c75410-38eba852.dmp
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:3815: INFO: Minidump opened minidump /tmp/tmp_qrkPz/754bb5b0-3d4f-509c-41c75410-38eba852.dmp
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:3860: INFO: Minidump not byte-swapping minidump
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 7 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 7 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 1197932545 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 6 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 1197932546 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 4 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:4226: INFO: GetStream: type 3 not present
11:53:48     INFO -  2015-01-17 11:40:22: minidump_processor.cc:112: ERROR: Minidump /tmp/tmp_qrkPz/754bb5b0-3d4f-509c-41c75410-38eba852.dmp has no thread list
11:53:48     INFO -  2015-01-17 11:40:22: minidump.cc:3787: INFO: Minidump closing minidump
11:53:48     INFO -  2015-01-17 11:40:22: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed

I am not sure, but I think robocop forced the process to exit while the minidump was being written.


Also consider this common Robocop shutdown crash:

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android-api-9/1421446979/mozilla-aurora_ubuntu64_vm_mobile_test-robocop-2-bm68-tests1-linux64-build2.txt.gz

16:24:45     INFO -  TEST-OK | testInputUrlBar | took 90295ms
16:24:45     INFO -  TEST-START | Shutdown
16:24:45     INFO -  Passed: 28
16:24:45     INFO -  Failed: 0
16:24:45     INFO -  Todo: 0
16:24:45     INFO -  SimpleTest FINISHED
16:24:45     INFO -  INFO | automation.py | Application ran for: 0:02:08.156232
16:24:45     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpsffuhfpidlog
16:24:45     INFO -  Contents of /data/anr/traces.txt:
16:24:45     INFO -  
16:24:45     INFO -  
16:24:45     INFO -  /data/tombstones does not exist; tombstone check skipped
16:24:45     INFO -  mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-aurora-android-api-9/1421446979/fennec-37.0a2.en-US.android-arm.crashreporter-symbols.zip
16:24:45     INFO -  mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/7e9f24ca-e761-e646-43d8c972-7c28af0c.dmp
16:24:45     INFO -  mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/7e9f24ca-e761-e646-43d8c972-7c28af0c.extra
16:24:45  WARNING -  PROCESS-CRASH | testInputUrlBar | application crashed [@ libui.so + 0x1befe]
16:24:45     INFO -  Crash dump filename: /tmp/tmpQCz9BW/7e9f24ca-e761-e646-43d8c972-7c28af0c.dmp
16:24:45     INFO -  Operating system: Android
16:24:45     INFO -                    0.0.0 Linux 2.6.29-ge3d684d #1 Mon Dec 16 22:26:51 UTC 2013 armv7l generic/sdk/generic:2.3.7/GINGERBREAD/eng.ubuntu.20140123.014351:eng/test-keys
16:24:45     INFO -  CPU: arm
16:24:45     INFO -       0 CPUs
16:24:45     INFO -  
16:24:45     INFO -  Crash reason:  SIGSEGV
16:24:45     INFO -  Crash address: 0x2
16:24:45     INFO -  
16:24:45     INFO -  Thread 0 (crashed)
16:24:45     INFO -   0  libui.so + 0x1befe
16:24:45     INFO -       r4 = 0x0024ed30    r5 = 0x00000002    r6 = 0x00000001    r7 = 0x00000002
16:24:45     INFO -       r8 = 0xbeac5460    r9 = 0x4428ca78   r10 = 0x0000abe0    fp = 0xaca9f368
16:24:45     INFO -       sp = 0xbeac53e8    lr = 0xac712bfd    pc = 0xab91befe
16:24:45     INFO -      Found by: given as instruction pointer in context
16:24:45     INFO -   1  libsurfaceflinger_client.so + 0x1977a
16:24:45     INFO -       sp = 0xbeac53fc    pc = 0xac71977c
16:24:45     INFO -      Found by: stack scanning
16:24:45     INFO -   2  libsurfaceflinger_client.so + 0x12bfb
16:24:45     INFO -       sp = 0xbeac5400    pc = 0xac712bfd
16:24:45     INFO -      Found by: stack scanning
16:24:45     INFO -   3  dalvik-heap (deleted) + 0x5e7eee
16:24:45     INFO -       sp = 0xbeac540c    pc = 0x405f0ef0
16:24:45     INFO -      Found by: stack scanning

In this case, breakpad did its job and we produced a perfectly accurate crash dump, but the stack might be perceived as "unusable" since the crash happens in Android system libs.


Recent efforts in bug 1105388 seem to effectively eliminate Robocop shutdown crashes, addressing both of these cases indirectly.
(In reply to Treeherder Robot from comment #18)
> log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=8327010

Robocop shutdown crash:

21:14:07     INFO -  TEST-OK | testAddonManager | took 76896ms
21:14:07     INFO -  TEST-START | Shutdown
21:14:07     INFO -  Passed: 20
21:14:07     INFO -  Failed: 0
21:14:07     INFO -  Todo: 0
21:14:07     INFO -  SimpleTest FINISHED
21:14:27     INFO -  INFO | automation.py | Application ran for: 0:01:47.781898
21:14:27     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmp1kQORWpidlog
21:14:27     INFO -  Contents of /data/anr/traces.txt:
21:14:28     INFO -  mozcrash Downloading symbols from: https://ftp-ssl.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-api-11/1427858680/fennec-40.0a1.en-US.android-arm.crashreporter-symbols.zip
21:14:30     INFO -  mozcrash Saved minidump as /builds/panda-0094/test/build/blobber_upload_dir/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.dmp
21:14:30     INFO -  mozcrash Saved app info as /builds/panda-0094/test/build/blobber_upload_dir/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.extra
21:14:30  WARNING -  PROCESS-CRASH | testAddonManager | application crashed [None]
21:14:30     INFO -  Crash dump filename: /tmp/tmpUhePxF/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.dmp
21:14:30     INFO -  stderr from minidump_stackwalk:
21:14:30     INFO -  2015-03-31 21:14:30: minidump_processor.cc:264: INFO: Processing minidump in file /tmp/tmpUhePxF/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.dmp
21:14:30     INFO -  2015-03-31 21:14:30: minidump.cc:3815: INFO: Minidump opened minidump /tmp/tmpUhePxF/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.dmp
21:14:30     INFO -  2015-03-31 21:14:30: minidump.cc:3847: ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d
21:14:30     INFO -  2015-03-31 21:14:30: minidump_processor.cc:268: ERROR: Minidump /tmp/tmpUhePxF/29e4d5b3-1d60-8aa3-4725dfd5-33e5b5fc.dmp could not be read
21:14:30     INFO -  2015-03-31 21:14:30: minidump.cc:3787: INFO: Minidump closing minidump
21:14:30     INFO -  2015-03-31 21:14:30: minidump_stackwalk.cc:529: ERROR: MinidumpProcessor::Process failed
(In reply to Treeherder Robot from comment #19)
> log:
> https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=8327061

Interesting!

21:43:08     INFO -  03-31 21:42:36.453 D/GeckoAppShell( 2088): Killing via System.exit()
21:43:08     INFO -  03-31 21:42:36.453 V/TabletStatusBar( 1500): setLightsOn(true)
21:43:08     INFO -  03-31 21:42:36.460 E/JavaBinder( 2088): Unknown binder error code. 0xfffffff7
21:43:08     INFO -  03-31 21:42:36.460 E/JavaBinder( 2088): Unknown binder error code. 0xfffffff7
21:43:08     INFO -  03-31 21:42:36.468 F/MOZ_CRASH( 2088): Hit MOZ_CRASH(Unexpected shutdown) at /builds/slave/m-in-and-api-11-d-000000000000/build/mozglue/linker/ElfLoader.cpp:514
21:43:08     INFO -  03-31 21:42:36.468 F/libc    ( 2088): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)

That's http://hg.mozilla.org/mozilla-central/annotate/cf8864126c58/mozglue/linker/ElfLoader.cpp#l514, added for bug 1127464. :snorp -- thoughts?
Flags: needinfo?(snorp)
Geoff, do you know how GeckoAppShell.systemExit() is being called? Is it from an addon?

It looks like we're calling exit() without exiting the main loop. That's when that assert gets hit.
Flags: needinfo?(snorp)
I never did get around to following up on interesting findings like comments 13 and 15. 

I don't recall seeing any corrupt dumps for 4.0 (or elsewhere) for months now. And now that Android 4.0 tests are practically eliminated, wontfix seems the sensible resolution here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Product: Firefox for Android → Firefox for Android Graveyard