Closed Bug 1307153 Opened 8 years ago Closed 6 years ago

Add support for gathering client-side stack traces to Fennec

Categories

(Toolkit :: Crash Reporting, defect)

ARM
Android
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: gsvelto, Assigned: gsvelto, NeedInfo)

References

Details

(Whiteboard: [fce-active-legacy])

Attachments

(1 file, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1280477 +++

In bug 1280477 we'll be adding client-gathered stack traces for every main browser crash on all platforms save for Fennec. This bug is specifically to enable that functionality on Fennec too. The biggest difference is going to be how the minidump analyzer gets invoked since Fennec's crash reporter is an Android activity and not a regular program like on the other platforms.
The most straightforward thing to do here might be to link the analyzer code into a shared library on Android, and invoke it via JNI.
NI to track this
Flags: needinfo?(ioana.chiorean)
I started working on this now that the patch for bug 1428775 is ready.
Assignee: nobody → gsvelto
Status: NEW → ASSIGNED
This is already working fine but I'm waiting until bug 1407557 lands before asking for review.
The patch here applies on top of the one in bug 1407557 and adds client-side stack-walking to Fennec's crash pings. r?jchen for the Fennec parts and r?ted for the crashreporter ones. On the crashreporter side I took the liberty of cleaning up the minidump analyzer a bit by removing some redundant error checking as well as cleaning up the moz.build file so that platform-specific options are only used when truly needed.
Comment on attachment 8945760 [details]
Bug 1307153 - Add stack traces to the crash pings in Fennec; .mielczarek

https://reviewboard.mozilla.org/r/215872/#review221816

r+ for Android bits. How much does minidump-analyzer add to the libmozglue.so size?

::: mobile/android/base/java/org/mozilla/gecko/CrashReporter.java:159
(Diff revision 1)
>  
> +        // Compute the minidump hash and generate the stack traces
>          computeMinidumpHash(mPendingExtrasFile, mPendingMinidumpFile);
> +
> +        try {
> +            System.loadLibrary("mozglue");

You should use `GeckoLoader.loadMozGlue`
Attachment #8945760 - Flags: review?(nchen) → review+
Attachment #8945202 - Attachment is obsolete: true
(In reply to Jim Chen [:jchen] [:darchons] from comment #7)
> r+ for Android bits. How much does minidump-analyzer add to the
> libmozglue.so size?

Stripped size goes up from 831 KiB to 1444 KiB. If this increase is a problem we can hack breakpad to make it slimmer (for example by manually dropping support for non-ARM architectures from the breakpad processor) however I'd rather not do it because it would involve touching parts of breakpad we haven't forked.
Yeah that's a pretty significant increase. Is minidump-analyzer already in libxul? If so can we use the version from libxul?
(In reply to Jim Chen [:jchen] [:darchons] from comment #9)
> Yeah that's a pretty significant increase. Is minidump-analyzer already in
> libxul? If so can we use the version from libxul?

No, even on desktop Firefox it's built separately from libxul since we didn't want this extra code to live there as it's only needed after a crash has occurred. What would be an acceptable size increase? I'll see if it can be trimmed down in a not-too-invasive way. Alternatively this can live in its own library that's loaded only from the crash reporter activity. I've put it in mozglue because I noticed that all our JNI interfaces are implemented in that lib.
Jim, what would be an acceptable increase in size? Just so that I can check if it is feasible or not.
Flags: needinfo?(jcheng)
Oops, wrong NI
Flags: needinfo?(jcheng) → needinfo?(nchen)
There's not a hard limit, but more of a product decision. Can you measure the APK size increase as well? It should be smaller than the raw libmozglue size increase.

:snorp what do you think? (see comment 8)
Flags: needinfo?(nchen) → needinfo?(snorp)
I've made both 32- and 64-bit release builds for Android and measured the resulting APK file size, here are the results:

32-bit Android

* w/o patch applied  33147 KiB
* w/  patch applied  33320 KiB

64-bit Android

* w/o patch applied  34610 KiB
* w/  patch applied  34800 KiB

So the increase is 173 KiB for 32-bit and 190 KiB for 64-bit targets. The upside of this is that with this patch not only we'll get crash pings on Fennec as implemented in bug 1280477, but they'll contain a full stack trace for the crashing thread, the crash reason and the list of loaded modules. This should make identifying problems in Fennec much faster than just relying on crash-stats. Additionally we intend to automatically process crash pings to identify crash spikes, security-sensitive crashes and send automated alerts. This isn't ready yet but having this data means it will be possible to implement this kind of functionality.
Depends on: 1407557
No longer depends on: 1280477
Whiteboard: [fce-active] → [fce-active-legacy]
The APK size increase is not bad IMO, but I think it's still something product has to decide. In the meantime is it possible to enable this only for Nightly/Beta? We're mostly concerned about APK size for Release.
Flags: needinfo?(gsvelto)
(In reply to Jim Chen [:jchen] [:darchons] from comment #15)
> The APK size increase is not bad IMO, but I think it's still something
> product has to decide. In the meantime is it possible to enable this only
> for Nightly/Beta? We're mostly concerned about APK size for Release.

Yes, it's possible. In fact on desktop we first enabled this on nightly/beta for testing, then on release. Just let me know what would be the best course of action here.
Flags: needinfo?(gsvelto)
David - are you good to land this stuff? See comment 14 for more details.
Flags: needinfo?(dbolter)
I think landing this is the right call. Thanks!

I agree with Jim's comment 15. I'll needinfo Andreas (GV product manager) in case he'd take it on release too.
Flags: needinfo?(dbolter) → needinfo?(abovens)
Thanks everybody, Ted if you could review this I'd land it right away. We're finally getting crash telemetry from Fennec (see bug 1407557 comment 27) so I'm eager to reach feature parity with desktop.
Flags: needinfo?(ted)
Comment on attachment 8945760 [details]
Bug 1307153 - Add stack traces to the crash pings in Fennec; .mielczarek

https://reviewboard.mozilla.org/r/215872/#review229340

If jchen has signed off on the Android bits then the rest looks fine to me.

::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:238
(Diff revision 1)
>  // Convert the process state to JSON, this includes information about the
>  // crash, the module list and stack traces for every thread
>  
>  static void
> -ConvertProcessStateToJSON(const ProcessState& aProcessState, Json::Value& aRoot)
> +ConvertProcessStateToJSON(const ProcessState& aProcessState, Json::Value& aRoot,
> +                          const bool aFullStacks)

I'm going to guess that you don't need the `const` here :)

::: toolkit/crashreporter/minidump-analyzer/moz.build:10
(Diff revision 1)
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
>  if CONFIG['OS_TARGET'] != 'Android':
>      Program('minidump-analyzer')
>  
> +    if CONFIG['OS_TARGET'] == 'Darwin':

You shouldn't need to stick all of these conditionals in here, should you? They're all testing the target platform anyway. (It's not a really big deal either way, though.)
Attachment #8945760 - Flags: review?(ted) → review+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #20)
> Comment on attachment 8945760 [details]
> Bug 1307153 - Add stack traces to the crash pings in Fennec; .mielczarek
> 
> https://reviewboard.mozilla.org/r/215872/#review229340
> 
> If jchen has signed off on the Android bits then the rest looks fine to me.

Great, thanks!

> ::: toolkit/crashreporter/minidump-analyzer/minidump-analyzer.cpp:238
> (Diff revision 1)
> >  // Convert the process state to JSON, this includes information about the
> >  // crash, the module list and stack traces for every thread
> >  
> >  static void
> > -ConvertProcessStateToJSON(const ProcessState& aProcessState, Json::Value& aRoot)
> > +ConvertProcessStateToJSON(const ProcessState& aProcessState, Json::Value& aRoot,
> > +                          const bool aFullStacks)
> 
> I'm going to guess that you don't need the `const` here :)

No, it's a habit I've picked up by using Rust, now I want everything to be immutable unless it needs to be mutable :-P

> ::: toolkit/crashreporter/minidump-analyzer/moz.build:10
> (Diff revision 1)
> >  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
> >  
> >  if CONFIG['OS_TARGET'] != 'Android':
> >      Program('minidump-analyzer')
> >  
> > +    if CONFIG['OS_TARGET'] == 'Darwin':
> 
> You shouldn't need to stick all of these conditionals in here, should you?
> They're all testing the target platform anyway. (It's not a really big deal
> either way, though.)

Some are not strictly required but I found confusing that we'd define UNICODE and _UNICODE for non-Windows platforms for example. This way it's clearer what's the purpose of certain things we declare in there.
Flags: needinfo?(ted)
Pushed by gsvelto@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/81c0cc6fe4c4
Add stack traces to the crash pings in Fennec; r=jchen,ted.mielczarek
https://hg.mozilla.org/mozilla-central/rev/81c0cc6fe4c4
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
The APK file size increase is not a problem, from my point of view.
Flags: needinfo?(abovens)
Thanks Andreas. Since I landed this we have feature-parity with Firefox desktop as far as crash pings go.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: