Closed Bug 826053 Opened 7 years ago Closed 7 years ago

Detect and report ANRs through our own channel

Categories

(Firefox for Android :: General, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 21

People

(Reporter: jchen, Assigned: jchen)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 8 obsolete files)

3.56 KB, patch
blassey
: review+
Details | Diff | Splinter Review
8.10 KB, patch
blassey
: review+
Details | Diff | Splinter Review
8.97 KB, patch
blassey
: review+
Details | Diff | Splinter Review
2.36 KB, patch
blassey
: review+
Details | Diff | Splinter Review
6.83 KB, patch
blassey
: review+
Details | Diff | Splinter Review
1.32 KB, patch
glandium
: review+
Details | Diff | Splinter Review
Currently, we only get reports about ANRs through Google Play which means sources of ANRs could go unnoticed until Beta. If we had our own mechanism for detecting and reporting ANRs, we could catch them a lot sooner in Nightlies.
Assignee: nobody → nchen
Status: NEW → ASSIGNED
(WIP) add an option for configure, MOZ_ANDROID_ANR_REPORTER, that controls ANR reporter support
Attached patch Add skeletal ANRReporter (v1) (obsolete) — Splinter Review
(WIP) ANRReporter waits for the ANR broadcast intent on a separate thread, whose lifetime is controlled by GeckoActivity, so that we only watch for ANRs when Fennec is active
For ping back to the telemetry server, we need some basic metadata, most of which we can obtain through the preprocessor
(WIP) Methods used by main ANRReporter logic
Attached patch Add main ANRReporter logic (v1) (obsolete) — Splinter Review
(WIP) ANRReporter waits for the ANR broadcast intent, detects and reads the traces.txt file, and encapsulates it inside a standard telemetry ping back file. The telemetry ping file is picked up by existing telemetry code and sends it to our metrics server if the user enabled telemetry.
MOZ_ANDROID_ANR_REPORTER enables the ANR reporter and can be set in configure.sh files under branding directories.
Attachment #705356 - Attachment is obsolete: true
Attachment #707072 - Flags: review?(blassey.bugs)
Attached patch Add skeletal ANRReporter (v2) (obsolete) — Splinter Review
Attachment #705361 - Attachment is obsolete: true
Attachment #707074 - Flags: review?(blassey.bugs)
Attachment #705364 - Attachment is obsolete: true
Attachment #707075 - Flags: review?(blassey.bugs)
Attachment #705365 - Attachment is obsolete: true
Attachment #707076 - Flags: review?(blassey.bugs)
Attachment #705367 - Attachment is obsolete: true
Attachment #707078 - Flags: review?(blassey.bugs)
We detect if the main thread has become unstuck by posting a message. Duplicate ANRs that happen before the main thread becomes unstuck are discarded.
Attachment #707081 - Flags: review?(blassey.bugs)
Attachment #707072 - Flags: review?(blassey.bugs) → review+
Attachment #707075 - Flags: review?(blassey.bugs) → review+
Attachment #707076 - Flags: review?(blassey.bugs) → review+
Comment on attachment 707078 [details] [diff] [review]
Add main ANRReporter logic (v2)

Review of attachment 707078 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/ANRReporter.java
@@ +287,5 @@
> +        checksum.update(data);
> +
> +        data = JSONObject.quote(payload).getBytes(PING_CHARSET);
> +        // first and last bytes are quotes inserted by JSONObject.quote; discard them
> +        ping.write(data, 1, data.length - 2);

I'm grumbling in my head about using quote to escape the string. Seems better to just do a string.replace() for \ and "
Attachment #707078 - Flags: review?(blassey.bugs) → review+
Comment on attachment 707081 [details] [diff] [review]
Avoid reporting duplicate ANRs (v1)

Review of attachment 707081 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/ANRReporter.java
@@ +441,5 @@
>  
>      @Override
>      public void onReceive(Context context, Intent intent) {
> +        if (mPendingANR) {
> +            // we already processed an ANR without getting unstuck; skip this one

how do we get in this situation?
Comment on attachment 707074 [details] [diff] [review]
Add skeletal ANRReporter (v2)

Review of attachment 707074 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/base/GeckoActivity.java.in
@@ +34,5 @@
> +
> +    @Override
> +    public void onStop() {
> +        super.onStop();
> +        ANRReporter.unregister();

you're going to be unregistering the receiver when the Awesome bar and settings page are shown, that's not what we want.
Attachment #707074 - Flags: review?(blassey.bugs) → review-
(In reply to Brad Lassey [:blassey] from comment #12)
> Comment on attachment 707078 [details] [diff] [review]
> > +        data = JSONObject.quote(payload).getBytes(PING_CHARSET);
> > +        // first and last bytes are quotes inserted by JSONObject.quote; discard them
> > +        ping.write(data, 1, data.length - 2);
> 
> I'm grumbling in my head about using quote to escape the string. Seems
> better to just do a string.replace() for \ and "

But also for characters like '\n' I think. I went with quote just to be sure it'd be valid JSON.

(In reply to Brad Lassey [:blassey] from comment #13)
> Comment on attachment 707081 [details] [diff] [review]
> >      @Override
> >      public void onReceive(Context context, Intent intent) {
> > +        if (mPendingANR) {
> > +            // we already processed an ANR without getting unstuck; skip this one
> 
> how do we get in this situation?

We get the broadcast every time the ANR dialog displays. So if the user chooses "Wait", and another ANR dialog comes up, we would get the same broadcast again.

(In reply to Brad Lassey [:blassey] from comment #14)
> Comment on attachment 707074 [details] [diff] [review]
> > +
> > +    @Override
> > +    public void onStop() {
> > +        super.onStop();
> > +        ANRReporter.unregister();
> 
> you're going to be unregistering the receiver when the Awesome bar and
> settings page are shown, that's not what we want.

From my testing, onStop doesn't get called until the whole app goes into background. The Awesomebar inherits from GeckoActivity also, so it will keep us registered in any case.
Attachment #707081 - Flags: review?(blassey.bugs) → review+
You were right :)  onStop is called when AwesomeBar or Settings appear, but it appears AwesomeBar's onStart is called before onStop, so it looked like onStop wasn't called. Settings's onStart doesn't do register receiver, so the old patch failed to work with Settings.

The new patch uses onCreate and onDestroy. This covers all of our cases, but also makes it more likely for us to get other app's ANRs. Nevertheless, versus using AndroidManifest, we don't get the ANR broadcast if Fennec is not running, so I think it's a good compromise between wanting to cover all of our ANRs while avoiding other app's ANRs.

I tested the case of us getting killed while the receiver is still registered. Android cleans up after us, and we don't get ANR broadcasts in that case. We still properly register the next time we launch, and I don't see anything in the logs.
Attachment #707074 - Attachment is obsolete: true
Attachment #711499 - Flags: review?(blassey.bugs)
I also removed synchronization from the start-up sequence, to cut down on time spent locking and waiting on ANR thread. Since we are always starting up from the main thread, and we don't interact with the ANR thread during start-up, I think synchronization was unnecessary.
Attachment #711499 - Flags: review?(blassey.bugs) → review+
Configure (and non-trivial makefile) changes must have build peer review. This really should be backed out until that happens.
Backed out rev involving configure change. The back-out should not affect other patches.

https://hg.mozilla.org/integration/mozilla-inbound/rev/0f18d1919efe
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 707072 [details] [diff] [review]
Add MOZ_ANDROID_ANR_REPORTER configure option (v2)

The patch adds the MOZ_ANDROID_ANR_REPORTER configure flag which will be controlled by configure.sh scripts under /mobile/android/branding (similar to MOZ_UPDATER)
Attachment #707072 - Flags: review?(mh+mozilla)
Comment on attachment 707072 [details] [diff] [review]
Add MOZ_ANDROID_ANR_REPORTER configure option (v2)

Review of attachment 707072 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +4284,5 @@
>  MOZ_SAFE_BROWSING=
>  MOZ_HELP_VIEWER=
>  MOZ_SPELLCHECK=1
>  MOZ_ANDROID_OMTC=
> +MOZ_ANDROID_ANR_REPORTER=

If you want this variable to be settable in a mozconfig, you don't want to set it here. That being said, where is this variable set at all? I haven't seen any patch setting it in files like confvars.sh.

@@ +5143,5 @@
>  dnl ========================================================
> +dnl = Android App Not Responding (ANR) reporter
> +dnl ========================================================
> +if test -n "$MOZ_ANDROID_ANR_REPORTER"; then
> +    AC_DEFINE(MOZ_ANDROID_ANR_REPORTER)

I think it would be better to add DEFINES += -DMOZ_ANDROID_ANR_REPORTER in mobile/android/base/Makefile.in.
Attachment #707072 - Flags: review?(mh+mozilla) → review-
(In reply to Mike Hommey [:glandium] from comment #23)
> Comment on attachment 707072 [details] [diff] [review]
> Add MOZ_ANDROID_ANR_REPORTER configure option (v2)
> 
> Review of attachment 707072 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: configure.in
> @@ +4284,5 @@
> >  MOZ_SAFE_BROWSING=
> >  MOZ_HELP_VIEWER=
> >  MOZ_SPELLCHECK=1
> >  MOZ_ANDROID_OMTC=
> > +MOZ_ANDROID_ANR_REPORTER=
> 
> If you want this variable to be settable in a mozconfig, you don't want to
> set it here. That being said, where is this variable set at all? I haven't
> seen any patch setting it in files like confvars.sh.

Ok, I removed that line. MOZ_ANDROID_ANR_REPORTER will be set in configure.sh files under /mobile/android/branding (similar to how MOZ_UPDATER is set). It will be enabled in Bug 839683 (once other dependant bugs are resolved) and I put up a sample patch there.

> @@ +5143,5 @@
> >  dnl ========================================================
> > +dnl = Android App Not Responding (ANR) reporter
> > +dnl ========================================================
> > +if test -n "$MOZ_ANDROID_ANR_REPORTER"; then
> > +    AC_DEFINE(MOZ_ANDROID_ANR_REPORTER)
> 
> I think it would be better to add DEFINES += -DMOZ_ANDROID_ANR_REPORTER in
> mobile/android/base/Makefile.in.

Added the line

Thanks!
Attachment #707072 - Attachment is obsolete: true
Attachment #713135 - Flags: review?(mh+mozilla)
Comment on attachment 713135 [details] [diff] [review]
Add MOZ_ANDROID_ANR_REPORTER configure option (v3)

Review of attachment 713135 [details] [diff] [review]:
-----------------------------------------------------------------

::: configure.in
@@ +5089,5 @@
>  dnl ========================================================
> +dnl = Android App Not Responding (ANR) reporter
> +dnl ========================================================
> +if test -n "$MOZ_ANDROID_ANR_REPORTER"; then
> +    AC_DEFINE(MOZ_ANDROID_ANR_REPORTER)

You don't need that, with the Makefile.in change :)

::: mobile/android/base/Makefile.in
@@ +323,5 @@
>  DEFINES += -DMOZ_LINKER_EXTRACT=1
>  endif
>  
> +ifdef MOZ_ANDROID_ANR_REPORTER
> +DEFINES += -DMOZ_ANDROID_ANR_REPORTER=1

Might be better to place it in the existing ifdef MOZ_ANDROID_ANR_REPORTER block.
Attachment #713135 - Flags: review?(mh+mozilla) → feedback+
Thanks! Addressed review comments. Now it's simpler than ever :)
Attachment #713135 - Attachment is obsolete: true
Attachment #713547 - Flags: review?(mh+mozilla)
Attachment #713547 - Flags: review?(mh+mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/384274679e10
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.