Closed Bug 1130066 Opened 5 years ago Closed 4 years ago

crash in java.lang.IllegalStateException: Not initialized. at org.mozilla.gecko.health.BrowserHealthRecorder.ensureEnvironment(BrowserHealthRecorder.java)

Categories

(Android Background Services Graveyard :: Firefox Health Report Service, defect, critical)

All
Android
defect
Not set
critical

Tracking

(firefox35 wontfix, firefox36 wontfix, firefox37 wontfix, firefox38 wontfix, firefox39- wontfix, firefox40+ fixed, firefox41+ fixed, firefox42+ fixed, firefox-esr31-, fennec40+)

RESOLVED FIXED
Firefox 42
Tracking Status
firefox35 --- wontfix
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- wontfix
firefox39 - wontfix
firefox40 + fixed
firefox41 + fixed
firefox42 + fixed
firefox-esr31 - ---
fennec 40+ ---

People

(Reporter: kbrosnan, Assigned: rnewman, Mentored)

Details

(Keywords: crash, topcrash-android-armv7, Whiteboard: [lang=java])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-f0e5fb7f-dc93-43f6-aa72-c34662150129.
=============================================================

Around 700 crashes a week on release about 200 crashes a week on beta.

java.lang.IllegalStateException: Not initialized.
	at org.mozilla.gecko.health.BrowserHealthRecorder.ensureEnvironment(BrowserHealthRecorder.java:298)
	at org.mozilla.gecko.health.BrowserHealthRecorder.onEnvironmentChanged$2598ce09(BrowserHealthRecorder.java:276)
	at org.mozilla.gecko.GeckoApp.onLocaleChanged(GeckoApp.java:2563)
	at org.mozilla.gecko.GeckoApp.onConfigurationChanged(GeckoApp.java:2122)
	at android.app.ActivityThread.performConfigurationChanged(ActivityThread.java:3694)
	at android.app.ActivityThread.handleConfigurationChanged(ActivityThread.java:3833)
	at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1373)
	at android.os.Handler.dispatchMessage(Handler.java:99)
	at android.os.Looper.loop(Looper.java:137)
	at android.app.ActivityThread.main(ActivityThread.java:4949)
	at java.lang.reflect.Method.invokeNative(Native Method)
	at java.lang.reflect.Method.invoke(Method.java:511)
	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:1043)
	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:810)
	at dalvik.system.NativeStart.main(Native Method)
rnewman: so, this is weird.  A concurrency issue, I assume.  Perhaps include the state in the exception at to aid debugging?

[1] https://dxr.mozilla.org/mozilla-central/source/mobile/android/base/health/BrowserHealthRecorder.java#298
I keep getting this error if I put Firefox in background, and return to it after a couple of hours.
BTW, I have "Firefox Health Report" unchecked...
Yeah, logging the state would help.

This is failing for one of these reasons:

* The health recorder has been closed, and the configuration changed callback fired afterwards.
* There was an error initializing the profile cache. Presumably that's not the case here.
* There was no storage, or storage initialization failed, or there was an exception in storage init. That should never happen. If you can repro, Nicolae, you should have logging to this effect, which would help a lot.



We also should probably quietly abort here rather than crashing… but at least the crashes are useful.
Mentor: rnewman
Component: General → Firefox Health Report Service
Product: Firefox for Android → Android Background Services
Whiteboard: [good next bug][lang=java]
Version: Trunk → unspecified
(In reply to Richard Newman [:rnewman] from comment #3)
> If you can repro,
> Nicolae, you should have logging to this effect, which would help a lot.

I can reproduce it, but not every time (maybe 50%) and only after a couple of hours.
How do I turn on logging?
Either connecting via USB and using the Android developer tools (adb logcat), or install this add-on:

https://addons.mozilla.org/android/addon/logview/
Currently a topcrash on 39
Tracked for 40, 41, 42, because it's a top crash. Uncertain how to manage earlier version or esr, looping in Liz Henry to follow up.
Flags: needinfo?(lhenry)
Assignee: nobody → rnewman
tracking-fennec: ? → 40+
ESR does not exist on Android.
Richard, are you going to address this issue in the 40 time frame? Thanks
Probably not the 40 timeframe, unless someone steps up to do the grunt work, or we get a log from Nicolae or someone else that has a clear smoking gun.

iOS is taking a decent amount of my time, so it would be irresponsible of me to promise anything too strongly.

Comment 3 outlines the (complete set of?) possible causes.
Flags: needinfo?(rnewman)
I haven't experienced the bug since I've changed my phone to a more high end one.
My hypothesis is that we're getting onConfigurationChanged on the main thread immediately after startup (hence the uptime of 2).

Everything that touches `state` is synchronized, so I suspect that initializeStorage has run, failed, and set `state` to something other than INITIALIZING or INITIALIZED.

We then get to onConfigurationChanged -> onLocaleChanged -> BHR.onEnvironmentChanged.

The PIC init must be succeeding, else we wouldn't reach 276. Then we try ensureEnvironment, find that we're in a bad state, and throw.
Status: NEW → ASSIGNED
This patch does three things:

* Unrelated to this bug: synchronize another access to state, and fix a small logic inaccuracy, in delayed distro env.
* Log the current state if our check fails. This will aid in debugging.
* The proposed fix: don't attempt to process an environment change unless we're already initialized or initializing.
Attachment #8634589 - Flags: review?(michael.l.comella)
Comment on attachment 8634589 [details] [diff] [review]
Be careful about environment changes during early or failed BrowserHealthRecorder init. v1

Review of attachment 8634589 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good with the synchronized change (or a comment telling me why I'm wrong). Rationale:
  * Cascading errors: onEnvironmentChanged is always called last in a series of FHR calls (including in onLocaleChanged) so this error is unlikely to be thrown later
  * Data integrity: Assuming BHR.recordSessionEnd is the only writing entry-path, it writes the data to the provided SharedPreferences.Editor (without committing) and HealthReportDatabaseStorage only if state == State.INITIALIZED. This makes me think we may store data in intermediate Objects but we ultimately would not record that data.

Semi-related, it's confusing when things need to be synchronized or not - perhaps it'd be helpful to add some comments saying when certain methods are synchronized and others are not?

::: mobile/android/base/health/BrowserHealthRecorder.java
@@ +571,5 @@
>                  // Now rebuild.
>                  try {
>                      profileCache.completeInitialization();
>  
> +                    synchronized (this) {

We're synchronizing over the anonymous Distribution.ReadyCallback Object - I don't see this anywhere else so this doesn't seem useful. Did you mean BrowserHealthRecorder.this?

For future travelers: this `synchronized` doesn't pose any concurrency threats because both initializeStorage and onEnvironmentChange acquire the lock on the BrowserHealthRecorder Object.

@@ +577,5 @@
> +                            case State.INITIALIZING:
> +                                initializeStorage();
> +                                break;
> +                            case State.INITIALIZED:
> +                                onEnvironmentChanged();

This change makes sense on a, "if things do what they're supposed to" level, but thoroughly following the call hierarchy would be time-consuming and probably not worth it so I didn't.
Attachment #8634589 - Flags: review?(michael.l.comella) → review+
Whiteboard: [good next bug][lang=java] → [lang=java]
(In reply to Michael Comella (:mcomella) from comment #13)

> > +                    synchronized (this) {
> 
> We're synchronizing over the anonymous Distribution.ReadyCallback Object - I
> don't see this anywhere else so this doesn't seem useful. Did you mean
> BrowserHealthRecorder.this?

Good catch, thanks.
url:        https://hg.mozilla.org/integration/fx-team/rev/32d62d7b2060d69b5378a97180a23a3b70502f32
changeset:  32d62d7b2060d69b5378a97180a23a3b70502f32
user:       Richard Newman <rnewman@mozilla.com>
date:       Tue Jul 21 13:26:34 2015 -0700
description:
Bug 1130066 - Be careful about environment changes during early or failed BrowserHealthRecorder init. r=mcomella
Sorry about that. IntelliJ didn't complain, which is a little bit of a surprise.

This time it builds :)
Flags: needinfo?(rnewman)
url:        https://hg.mozilla.org/integration/fx-team/rev/1a63d9d7a7fbd62b804bb686c039fc9a32b354b8
changeset:  1a63d9d7a7fbd62b804bb686c039fc9a32b354b8
user:       Richard Newman <rnewman@mozilla.com>
date:       Tue Jul 21 13:26:34 2015 -0700
description:
Bug 1130066 - Be careful about environment changes during early or failed BrowserHealthRecorder init. r=mcomella
https://hg.mozilla.org/mozilla-central/rev/1a63d9d7a7fb
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 42
Comment on attachment 8634589 [details] [diff] [review]
Be careful about environment changes during early or failed BrowserHealthRecorder init. v1

Approval Request Comment
[Feature/regressing bug #]:
  Unknown race.

[User impact if declined]:
  Crashes.

[Describe test coverage new/current, TreeHerder]:
  Feature is covered by tests, but not for this.

[Risks and why]: 
  Relatively low-risk fix. Suggest waiting a few days for Nightly and Aurora crash reports to see if this does the trick before approving for Beta.

[String/UUID change made/needed]:
  None.
Attachment #8634589 - Flags: approval-mozilla-beta?
Attachment #8634589 - Flags: approval-mozilla-aurora?
Comment on attachment 8634589 [details] [diff] [review]
Be careful about environment changes during early or failed BrowserHealthRecorder init. v1

This has been in m-c for a few days and has automated test coverage. Let's land it on Aurora channel.
Attachment #8634589 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8634589 [details] [diff] [review]
Be careful about environment changes during early or failed BrowserHealthRecorder init. v1

Hard to tell whether this fix is good based on the crash data. In the 7 day report I only see 4 crashes on 42.0a1. I see 1 crash in the 3 day report. This is really our last chance to take and test the fix in 40. Let's take it in beta8 and measure the results. If the fix is not good, we can back it out before beta10. Beta+
Attachment #8634589 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Richard Newman [:rnewman] from comment #17)
> Sorry about that. IntelliJ didn't complain, which is a little bit of a
> surprise.

This is a surprise.  Some language level thing, perhaps?  I know that IntelliJ tends to forget such settings since they're re-determined every import.
You need to log in before you can comment on or make changes to this bug.