Closed Bug 942290 Opened 8 years ago Closed 7 years ago

Breakpad should handle libraries with DWARF on some functions and ARM EXIDX entries on others

Categories

(Toolkit :: Crash Reporting, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32
tracking-b2g backlog

People

(Reporter: jld, Assigned: jld)

Details

Attachments

(1 file, 1 obsolete file)

The Android/ARM libc has system call stubs with annotations for exception handling info, so they appear in the exception index but don't have DWARF frame info.  Because the unstripped libc binary has DWARF info for its C code, we use that and don't consider the ARM exception info at all (or, apparently, the normal symbol table — there's no FUNC entry in the Breakpad file for the asm stubs, but they have ELF symbols with appropriate sizes and types).

This is relevant to threads blocked in a syscall at the time of crash (the stack scan might work, or might give bad information), and to bug 941375 (handling sandbox violations like any other crash, at least in testing).

So, we should merge the available information on a per-symbol basis.  gdb presumably already does something similar, because it's not confused in this case.
This seems to work — it uses the exidx entry for a function if the existing StackFrameEntry objects (from DWARF) don't fully cover it.

Example: https://pastebin.mozilla.org/4113532

We're still missing the name (and source location) of the libc function involved, because breakpad doesn't fall back to the information in the ELF symbol table in the absence of a DW_TAG_subprogram, but that's typically going to be obvious from its call site.  The important thing is to avoid spurious frames as seen in https://pastebin.mozilla.org/4113618 (the same crash without this patch).
Assignee: nobody → jld
Status: NEW → ASSIGNED
Attachment #8366117 - Flags: review?(ted)
This bug caused some confusion in analyzing bug 986250, and it could have been worse (the stack scan seems to have actually done the right thing there, which it doesn't always), so I think we might want it for 1.4.
blocking-b2g: --- → 1.4?
Jed,

Please request approval on the same.
Flags: needinfo?(jld)
blocking-b2g: 1.4? → backlog
Clearing needinfo for now, as there's not much point to requesting approval on an unreviewed patch.
Flags: needinfo?(jld)
Comment on attachment 8366117 [details] [diff] [review]
bug942290-breakpad-ehabi-merge-hg0.diff

Review of attachment 8366117 [details] [diff] [review]:
-----------------------------------------------------------------

Overall this looks good. Note that none of this code is currently upstream, which is a bit of a mess. If you could add your patch to toolkit/crashreporter/breakpad-patches/ that would help me not forget to get it upstreamed. I think I'm going to drop a bunch of our other local patches since the profiler isn't using them anymore, but if this is important to you I'll get the exidx parser and this patch upstream.

::: toolkit/crashreporter/google-breakpad/src/common/arm_ex_to_module.cc
@@ +161,5 @@
>    }
>    return ret;
>  }
>  
> +bool ARMExToModule::AddStackFrame(uintptr_t addr, size_t size) {

It feels like you might want to put this in a different function instead, like "bool HaveStackInfoCovering(...)". The way this function is working now is a little weird.
Attachment #8366117 - Flags: review?(ted) → review+
Thanks for the review.  Updated, with stack-frame-entry coverage test factored out as suggested; carrying over r+.
Attachment #8366117 - Attachment is obsolete: true
Attachment #8425107 - Flags: review+
(The patch is also updated to contain a copy of itself.)
https://hg.mozilla.org/mozilla-central/rev/1e2c1524892f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla32
(In reply to Jed Davis [:jld] from comment #8)
> https://tbpl.mozilla.org/?tree=Try&rev=686540ed8cdf

For the record (and because I'm not sure how well the automated tests cover this), I also tested locally by backing out the workaround for bug 906996 and provoking it, with and without the final patch.
Self-needinfo to ask for 1.4 uplift once it seems stable.  I can also ask for aurora/beta, but it's more valuable on b2g because of sandboxing.  But I'll wait until Fennec/B2G nightlies happen and crash-stats can give me a before/after to point at.
Flags: needinfo?(jld)
(In reply to Jed Davis [:jld] from comment #12)
> Self-needinfo to ask for 1.4 uplift once it seems stable.  I can also ask
> for aurora/beta, but it's more valuable on b2g because of sandboxing.  But
> I'll wait until Fennec/B2G nightlies happen and crash-stats can give me a
> before/after to point at.

This might be difficult, because any B2G build that doesn't have this patch and does have the same libc.so (up to the first 4096 bytes of .text, because we don't have real build IDs, apparently?) as one that does will overwrite the symbols if it was more recently uploaded when the crash is submitted.  Empirically, I haven't found any crashes on crash-stats where it actually worked.

See also: https://crash-stats.mozilla.com/search/?product=B2G&build_id=%3E20140521000000&platform=Android&version=32.0a1

Also, Fennec need not apply, because we don't (I think?) have any breakpad symbols for libc there.
I'd like to get this uplifted to 1.4 if possible.  The patch applies as-is, and I've tested it locally to verify that it works.  Also, I've started a try run: https://tbpl.mozilla.org/?tree=Try&rev=c306c9d90da5

This patch doesn't affect the code that's actually run on devices, only the tool that's used during the build to create the files used to analyze crashes.  Thus the risk of uplift is relatively small.

For what we gain from this: see comment #0; we'll get more reliable information about sandbox violations, and possibly about other crashes if there's something important about a thread that was blocked during the crash.

It's been on m-c for 10 days at the time of this writing, and no problems have been ascribed to it.

My understanding is that this only really relevant on b2g, because that's the only platform where we use our dump_syms to gather information from system libraries (and it's the assembly routines for syscalls in libc.so that are at issue here).  Also, this patch only affects ARM platforms in any case.  Therefore, I haven't also requested uplift to aurora or beta.
blocking-b2g: backlog → 1.4?
Flags: needinfo?(jld)
Triage will resolve via email and post results here
This should be totally safe to branch-land. As Jed said, this doesn't affect shipping code, just the tools we use to dump symbols.
Can we wait and take this in 2.0 instead? I am concerned about having too much already in 1.4.
Flags: needinfo?(ted)
That question is for Jed.
Flags: needinfo?(ted) → needinfo?(jld)
blocking-b2g: 1.4? → backlog
The response to comment #17 already exists as comment #16.  But if Release Management doesn't want the patch, I'm not going to fight over it.
Flags: needinfo?(jld)
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.