Closed
Bug 942290
Opened 12 years ago
Closed 11 years ago
Breakpad should handle libraries with DWARF on some functions and ARM EXIDX entries on others
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
People
(Reporter: jld, Assigned: jld)
Details
Attachments
(1 file, 1 obsolete file)
|
10.26 KB,
patch
|
jld
:
review+
|
Details | Diff | Splinter Review |
The Android/ARM libc has system call stubs with annotations for exception handling info, so they appear in the exception index but don't have DWARF frame info. Because the unstripped libc binary has DWARF info for its C code, we use that and don't consider the ARM exception info at all (or, apparently, the normal symbol table — there's no FUNC entry in the Breakpad file for the asm stubs, but they have ELF symbols with appropriate sizes and types).
This is relevant to threads blocked in a syscall at the time of crash (the stack scan might work, or might give bad information), and to bug 941375 (handling sandbox violations like any other crash, at least in testing).
So, we should merge the available information on a per-symbol basis. gdb presumably already does something similar, because it's not confused in this case.
| Assignee | ||
Comment 1•12 years ago
|
||
This seems to work — it uses the exidx entry for a function if the existing StackFrameEntry objects (from DWARF) don't fully cover it.
Example: https://pastebin.mozilla.org/4113532
We're still missing the name (and source location) of the libc function involved, because breakpad doesn't fall back to the information in the ELF symbol table in the absence of a DW_TAG_subprogram, but that's typically going to be obvious from its call site. The important thing is to avoid spurious frames as seen in https://pastebin.mozilla.org/4113618 (the same crash without this patch).
| Assignee | ||
Comment 2•12 years ago
|
||
This bug caused some confusion in analyzing bug 986250, and it could have been worse (the stack scan seems to have actually done the right thing there, which it doesn't always), so I think we might want it for 1.4.
blocking-b2g: --- → 1.4?
Updated•12 years ago
|
blocking-b2g: 1.4? → backlog
| Assignee | ||
Comment 4•12 years ago
|
||
Clearing needinfo for now, as there's not much point to requesting approval on an unreviewed patch.
Flags: needinfo?(jld)
Comment 5•11 years ago
|
||
Comment on attachment 8366117 [details] [diff] [review]
bug942290-breakpad-ehabi-merge-hg0.diff
Review of attachment 8366117 [details] [diff] [review]:
-----------------------------------------------------------------
Overall this looks good. Note that none of this code is currently upstream, which is a bit of a mess. If you could add your patch to toolkit/crashreporter/breakpad-patches/ that would help me not forget to get it upstreamed. I think I'm going to drop a bunch of our other local patches since the profiler isn't using them anymore, but if this is important to you I'll get the exidx parser and this patch upstream.
::: toolkit/crashreporter/google-breakpad/src/common/arm_ex_to_module.cc
@@ +161,5 @@
> }
> return ret;
> }
>
> +bool ARMExToModule::AddStackFrame(uintptr_t addr, size_t size) {
It feels like you might want to put this in a different function instead, like "bool HaveStackInfoCovering(...)". The way this function is working now is a little weird.
Attachment #8366117 -
Flags: review?(ted) → review+
| Assignee | ||
Comment 6•11 years ago
|
||
Thanks for the review. Updated, with stack-frame-entry coverage test factored out as suggested; carrying over r+.
Attachment #8366117 -
Attachment is obsolete: true
Attachment #8425107 -
Flags: review+
| Assignee | ||
Comment 7•11 years ago
|
||
(The patch is also updated to contain a copy of itself.)
| Assignee | ||
Comment 8•11 years ago
|
||
Keywords: checkin-needed
Comment 9•11 years ago
|
||
Keywords: checkin-needed
Whiteboard: [fixed-in-fx-team]
Comment 10•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla32
| Assignee | ||
Comment 11•11 years ago
|
||
(In reply to Jed Davis [:jld] from comment #8)
> https://tbpl.mozilla.org/?tree=Try&rev=686540ed8cdf
For the record (and because I'm not sure how well the automated tests cover this), I also tested locally by backing out the workaround for bug 906996 and provoking it, with and without the final patch.
| Assignee | ||
Comment 12•11 years ago
|
||
Self-needinfo to ask for 1.4 uplift once it seems stable. I can also ask for aurora/beta, but it's more valuable on b2g because of sandboxing. But I'll wait until Fennec/B2G nightlies happen and crash-stats can give me a before/after to point at.
Flags: needinfo?(jld)
| Assignee | ||
Comment 13•11 years ago
|
||
(In reply to Jed Davis [:jld] from comment #12)
> Self-needinfo to ask for 1.4 uplift once it seems stable. I can also ask
> for aurora/beta, but it's more valuable on b2g because of sandboxing. But
> I'll wait until Fennec/B2G nightlies happen and crash-stats can give me a
> before/after to point at.
This might be difficult, because any B2G build that doesn't have this patch and does have the same libc.so (up to the first 4096 bytes of .text, because we don't have real build IDs, apparently?) as one that does will overwrite the symbols if it was more recently uploaded when the crash is submitted. Empirically, I haven't found any crashes on crash-stats where it actually worked.
See also: https://crash-stats.mozilla.com/search/?product=B2G&build_id=%3E20140521000000&platform=Android&version=32.0a1
Also, Fennec need not apply, because we don't (I think?) have any breakpad symbols for libc there.
| Assignee | ||
Comment 14•11 years ago
|
||
I'd like to get this uplifted to 1.4 if possible. The patch applies as-is, and I've tested it locally to verify that it works. Also, I've started a try run: https://tbpl.mozilla.org/?tree=Try&rev=c306c9d90da5
This patch doesn't affect the code that's actually run on devices, only the tool that's used during the build to create the files used to analyze crashes. Thus the risk of uplift is relatively small.
For what we gain from this: see comment #0; we'll get more reliable information about sandbox violations, and possibly about other crashes if there's something important about a thread that was blocked during the crash.
It's been on m-c for 10 days at the time of this writing, and no problems have been ascribed to it.
My understanding is that this only really relevant on b2g, because that's the only platform where we use our dump_syms to gather information from system libraries (and it's the assembly routines for syscalls in libc.so that are at issue here). Also, this patch only affects ARM platforms in any case. Therefore, I haven't also requested uplift to aurora or beta.
blocking-b2g: backlog → 1.4?
Flags: needinfo?(jld)
Comment 15•11 years ago
|
||
Triage will resolve via email and post results here
Comment 16•11 years ago
|
||
This should be totally safe to branch-land. As Jed said, this doesn't affect shipping code, just the tools we use to dump symbols.
Comment 17•11 years ago
|
||
Can we wait and take this in 2.0 instead? I am concerned about having too much already in 1.4.
Flags: needinfo?(ted)
Updated•11 years ago
|
blocking-b2g: 1.4? → backlog
| Assignee | ||
Comment 19•11 years ago
|
||
The response to comment #17 already exists as comment #16. But if Release Management doesn't want the patch, I'm not going to fight over it.
Flags: needinfo?(jld)
Updated•11 years ago
|
blocking-b2g: backlog → ---
tracking-b2g:
--- → backlog
You need to log in
before you can comment on or make changes to this bug.
Description
•