Closed
Bug 697301
Opened 13 years ago
Closed 11 years ago
all Android crashes with mozalloc_abort at the top of stack have garbled stacks
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
WORKSFORME
mozilla12
People
(Reporter: dbaron, Assigned: glandium)
Details
Attachments
(1 file)
1023 bytes, patch (cjones: review+)
After looking at bug 696906, I looked through the Fennec top crashes for Aurora/9.0a2 Fennec here:
https://crash-stats.mozilla.com/topcrasher/byversion/Fennec/9.0a2/7

It looks to me like all of the crashes with signatures of the form "mozalloc_abort | ..." for various values of "..." have garbled stacks, such that it's impossible to tell what's actually going on. One example for each of the five crashes I looked at:
bp-d12bf4bd-6527-4083-990a-76a1f2111019
bp-4d601c3a-d6e3-4a2d-a217-c953f2111023
bp-2eb14835-7cf7-407d-b365-5060b2111025
bp-9fd15a0a-5540-420d-9f9c-ac5232111023
bp-d86e8664-9241-4c9b-a8e1-1e55e2111020

These stacks all look useless: the caller of mozalloc_abort isn't something that would call it, and in many cases (e.g., the first) there are other chains of functions that clearly can't call each other.

I looked around at some other crashes, and there clearly are some crashes where we are getting useful crash stacks, such as these:
bp-d45a835a-f524-442c-9512-75ae12111019
bp-342b0ff7-38bc-42d7-b041-5581e2111024
bp-f4c3794f-59a7-4f56-8240-75ea42111023

I'm not sure why the mozalloc_abort ones are different, but it seems like there's something wrong with the stack walking for Android/ARM.
Note: the crashes with Java_org_mozilla_gecko_GeckoAppShell_reportJavaCrash are Java crashes and should have a "Java Signature", as in bug 679176. There's a bug on file to have Socorro report those (bug 686973). I am unsure whether there is an issue with breakpad and virtual methods (https://crash-stats.mozilla.com/report/index/d12bf4bd-6527-4083-990a-76a1f2111019). I hope to find some sort of STR for this particular case.
Comment 2 • 13 years ago (Reporter)
I don't think getting steps to reproduce is critical here. We have raw crash dumps to debug on the crash-stats server; the problem lies in converting those raw crash dumps to stack traces, and that code should be debuggable entirely with data we already have.
Comment 3 • 13 years ago
The problem here is that we don't have symbols for libc in the crashes you point out. Frame 1 (in libc) is just __libc_android_abort, but the stack walker can't reliably get past that without symbols. This is why I put together my Android Symbol Sender extension: https://addons.mozilla.org/en-US/mobile/addon/android-symbol-sender/ The only other idea I had to make stacks more reliable was to do the stack walking client-side, since on ARM and other architectures like x86-64 all the stack unwind info for all libraries is present on the client. That's filed as bug 650239.
Comment 4 • 12 years ago (Assignee)
The problem is actually much worse. As the compiler knows mozalloc_abort doesn't return, it doesn't care about keeping the return address, and as such, lr is just garbage and there is no way to guess it from the stack.
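A minimal sketch of the situation described above, not the actual mozalloc code: `demo_abort` and `demo_xmalloc` are hypothetical stand-ins for mozalloc_abort and moz_xmalloc. Because the function is declared `[[noreturn]]`, the compiler is free to skip saving the return address in its prologue, and on ARM the first call made inside it overwrites lr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for mozalloc_abort. The [[noreturn]] attribute tells
// the compiler that control never comes back to the caller, so it has no
// reason to keep the caller's return address around.
[[noreturn]] void demo_abort(const char* msg) {
    assert(msg != nullptr);
    // On ARM, each call below is a branch-with-link that overwrites lr.
    // If the prologue did not save the old lr, the caller's address is lost.
    std::fputs(msg, stderr);
    std::abort();
}

// Hypothetical stand-in for an infallible allocator such as moz_xmalloc.
void* demo_xmalloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) {
        demo_abort("out of memory\n");  // never returns
    }
    return p;
}
```

Whether a given compiler actually drops the saved lr depends on the target and optimization level, so inspecting the generated assembly is the only way to confirm it for a particular build.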
Comment 5 • 12 years ago (Assignee)
For what it's worth, x86 and x64 are apparently safe on most platforms. I validated on OSX 32-bit, Win32, Linux and Linux64 (we apparently don't run xpcshell tests on win64 try) by trying to allocate ~4GB of memory on 32-bit builds and 42GB on 64-bit builds with moz_xmalloc. The stack trace was useful on all the mentioned platforms, which makes ARM the only one affected.

Interestingly, moz_xmalloc(42GB) didn't trigger mozalloc_abort on OSX 64-bit. I think we should file a bug for that.

I think a fix/workaround would be to force TouchBadMemory not to be inlined. My original idea was to remove MOZ_NORETURN from the mozalloc_abort definition, but that would probably affect optimizations in its callers, whereas once we're in mozalloc_abort, we don't care whether mozalloc_abort itself is optimally compiled.
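As an illustration only, here is how a crash helper can be kept out of line with compiler attributes. This is not the attached patch: `SKETCH_NEVER_INLINE`, `sketch_touch_memory` and `sketch_abort` are hypothetical names, and the crash address is passed in as a parameter purely so the helper can be exercised safely.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical macro: ask the compiler never to inline the annotated function.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_NEVER_INLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#  define SKETCH_NEVER_INLINE __declspec(noinline)
#else
#  define SKETCH_NEVER_INLINE
#endif

// Hypothetical stand-in for TouchBadMemory. The volatile store cannot be
// optimized away, and the attribute keeps the function as a real call.
SKETCH_NEVER_INLINE void sketch_touch_memory(volatile int* target) {
    *target = 123;
}

// Hypothetical stand-in for mozalloc_abort. A real caller would pass an
// address that is known to fault, so the crash happens inside the helper.
[[noreturn]] void sketch_abort(const char* msg, volatile int* crash_target) {
    assert(msg != nullptr);
    std::fputs(msg, stderr);
    std::fputs("\n", stderr);
    sketch_touch_memory(crash_target);
    std::abort();  // fallback in case the store did not fault
}
```

The sketch leaves the noreturn annotation on the outer function untouched, so callers keep whatever optimizations it enables, and only the helper's code generation changes.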
Comment 6 • 12 years ago (Assignee)
Attachment #587376 -
Flags: review?(jones.chris.g)
Updated • 12 years ago
Attachment #587376 -
Flags: review?(jones.chris.g) → review+
Comment 7 • 12 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f00bf6379a6
Whiteboard: [inbound]
Updated • 12 years ago (Assignee)
Assignee: nobody → mh+mozilla
Comment 8 • 12 years ago
https://hg.mozilla.org/mozilla-central/rev/9f00bf6379a6
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [inbound]
Target Milestone: --- → mozilla12
Updated • 12 years ago (Assignee)
Comment 9 • 12 years ago (Assignee)
I'm afraid this might not be enough :-/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10 • 11 years ago (Assignee)
I think some other bugs made this better. It's probably not worth keeping this one open anymore. If we spot new problems, we'll file new bugs.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → WORKSFORME