Closed Bug 1351169 Opened 3 years ago Closed 3 years ago

Crash in java.lang.NullPointerException: Null native pointer at org.mozilla.gecko.mozglue.GeckoLoader.nativeRun(Native Method)

Categories

(Firefox for Android :: General, defect, P1, critical)

Unspecified
Android
defect

Tracking

()

VERIFIED FIXED
Firefox 55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed
firefox58 --- verified

People

(Reporter: ting, Assigned: esawin)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-053d2a01-23d1-4bb2-8f9e-9cab62170327.
=============================================================
Top #2 of Nightly 20170326110230 on Android, 2 crashes from 2 installations.
Possible STR:
1. Enable Custom Tabs
2. Launch Firefox
3. Open a custom tab in Firefox
4. Press the home button to go to the home screen.
5. adb shell am kill org.mozilla.fennec
6. Launch the custom tab activity from the home screen - it will come into the foreground but then exit again immediately (presumably because of https://dxr.mozilla.org/mozilla-central/rev/7a3f514cf8490d271ee373a1d2999e4ea4dee2d7/mobile/android/base/java/org/mozilla/gecko/customtabs/CustomTabsActivity.java#198)
7. Back on the homescreen, select the custom tab activity from the home screen once more.
8. Crash.
ni on Ioana to see if she can reproduce this issue based on Comment 2.
Flags: needinfo?(ioana.chiorean)
(FWIW: I don't hit this crash most of the time, and I was able to successfully open a custom tab *between* my two crash reports in Comment 2. So I didn't mean to imply that this is 100% reproducible or anything like that.  I don't know what was going on in the background that may've contributed to my two instances in quick succession -- maybe my device [Nexus 7] was in a low-memory condition, which triggered background-app-death like JanH's aggressive "adb shell am kill" suggested STR.)
(In reply to Jan Henning [:JanH] from comment #1)
> 6. Launch the custom tab activity from the home screen - it will come into
> [...]
> 7. Back on the homescreen, select the custom tab activity from the home
> screen once more.

Just to eliminate any confusion, what I meant is that when on the home screen, use the task switcher to open the custom tab again.
Priority: -- → P1
I've hit this bug 5 times this morning alone - not sure what the STR are and it definitely doesn't seem reliable.
mystor says on IRC that he was using custom tabs when he hit this crash, BTW -- so seems definitely like the same issue. I hit this on a second device yesterday, too (using custom tabs).

Per comment 1, this was the #2 topcrash as of 10 days ago, so lots of people are hitting this. And it seems like a recent regression - filed 10 days ago, & I don't recall hitting this before a week or so ago.  Could we get someone with Custom Tabs knowledge to take a look & see if they can fix this? (or identify possible regression-causes based on the timeline & back them out?)  Right now, this makes using Custom Tabs a pretty crashy existence.
Flags: needinfo?(janus926)
Sorry, needinfo'd the wrong person. (timdream, I suspect this is in your team's area, but if I'm mistaken, please forward on as-appropriate - thanks!)
Flags: needinfo?(janus926) → needinfo?(timdream)
Crash stats shows this taking off with the first April 1 Nightly (20170401100203), so we end up with this range:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8df9fabf2587b7020889755acb9e75b664fe13cf&tochange=00a166a8640dffa2e0f48650f966d75ca3c1836e
(In reply to Jan Henning [:JanH] from comment #9)
> Crash stats shows this taking off with the first April 1 Nightly
> (20170401100203), so we end up with this range:
> 
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=8df9fabf2587b7020889755acb9e75b664fe13cf&tochange=00a1
> 66a8640dffa2e0f48650f966d75ca3c1836e

That does jibe somewhat with https://crash-analysis.mozilla.com/release-mgmt/2017-04-01/2017-04-01.fennecandroid.nightly.explosiveness.html, but I see crashes all the way back to March 10th.

ni on Sebastian with the hope he might be able to figure out what the root cause might be. Current 285 crashes with 93 installs.
Flags: needinfo?(s.kaspari)
Julian can you take a look? Thanks.
Flags: needinfo?(timdream) → needinfo?(walkingice0204)
I got same problem yesterday since I rebased my code base to latest nightly. I enabled "Don't keep activities" in system settings that trigger crash easily.

Custom-Tab activity itself doesn't do any special things for gecko-initialization. Perhaps WebAppActivity will get same problem here, but I haven't tried it yet.
https://crash-stats.mozilla.com/report/index/c17a965a-9477-40b5-9774-dee1a2170407

I'm able to reproduce this crash on Nightly 2017-04-07 just by backgrounding/foregrounding the app

Device: Galaxy Note 5 (SM-N920C) running Android 6.0.1

Steps:
1. Have Nightly freshly installed
2. Open Nightly
3. Press Home button to background the app
4. Go to Apps and tap on the Nightly icon to foreground the app

Result: Nightly crashes. 

https://crash-stats.mozilla.com/report/index/c17a965a-9477-40b5-9774-dee1a2170407
Flags: needinfo?(ioana.chiorean)
Regression range using my STR from comment 1 is 2017-02-22 to 2017-02-23:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9a9db410f20781076424307265decbc5de1e94
cc&tochange=32dcdde1fc64fc39a9065dc4218265dbc727673f

Out of this range, I've confirmed bug 1322576 to be the first push that a causes a crash.

This in turn means that out of the regression range for the jump-up in crash rates from comment 9, bug 1346542 is the most likely suspect.

Also a note for the future: Bug 1352997 will change the behaviour around https://dxr.mozilla.org/mozilla-central/rev/7a3f514cf8490d271ee373a1d2999e4ea4dee2d7/mobile/android/base/java/org/mozilla/gecko/customtabs/CustomTabsActivity.java#198 (originally implemented in bug 1329664), so my STR will no longer be valid once that bug lands.
Blocks: 1322576, 1346542
Has Regression Range: --- → yes
Has STR: --- → yes
Flags: needinfo?(walkingice0204)
Flags: needinfo?(s.kaspari)
Flags: needinfo?(esawin)
There are multiple independent issues covered by this crash signature so a regression range based on crash stats won't be accurate.

The two issues that I can (unreliably) reproduce are:
1. Compositor creation race: https://crash-stats.mozilla.com/report/index/053d2a01-23d1-4bb2-8f9e-9cab62170327
2. Event dispatcher race: https://crash-stats.mozilla.com/report/index/c17a965a-9477-40b5-9774-dee1a2170407

Issue 1 isn't a recent regression and should be handled in a new bug (which would likely be blocked on bug 1350924).
Issue 2 is most likely a regression from bug 1343613, which we will handle in this bug.
Flags: needinfo?(esawin)
I think we need to block the native queue from flushing calls when we're not attached to Gecko or GeckoView is detached from its window.
Assignee: nobody → esawin
Attachment #8857054 - Flags: review?(nchen)
I'm not sure I understand. Why did we not need this before?
Flags: needinfo?(esawin)
(In reply to Jim Chen [:jchen] [:darchons] from comment #18)
> I'm not sure I understand. Why did we not need this before?

We've been blocking until Gecko was ready and attached before, my refactoring dropped the "Gecko attached" constraint.
Flags: needinfo?(esawin)
Comment on attachment 8857054 [details] [diff] [review]
0001-Bug-1351169-1.0-Disable-native-queue-flushing-when-d.patch

Review of attachment 8857054 [details] [diff] [review]:
-----------------------------------------------------------------

I think the real bug is we're setting the state too early at [1]. We should only set the ready state after `reattach` has been called on the Gecko thread. So we should be setting the state to ready and switching queues inside `GeckoViewSupport::Reattach`.

[1] http://searchfox.org/mozilla-central/rev/2fc8c8d483d9ec9fd0ec319c6c53807f7fa8e8a2/mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoView.java#290
Attachment #8857054 - Flags: review?(nchen)
With this patch, we move the native queue swapping into GeckoView::Window and call it in GeckoViewSupport::Reattach.

This way we don't have to expose the native queue in any way via JNI.
Attachment #8857054 - Attachment is obsolete: true
Attachment #8857651 - Flags: review?(nchen)
Comment on attachment 8857651 [details] [diff] [review]
0001-Bug-1351169-1.1-Update-GeckoView-s-state-after-finis.patch

Review of attachment 8857651 [details] [diff] [review]:
-----------------------------------------------------------------

::: mobile/android/geckoview/src/main/java/org/mozilla/gecko/GeckoView.java
@@ +111,5 @@
>              mNativeQueue.setState(newState);
>          }
> +
> +        @WrapForJNI(calledFrom = "gecko")
> +        private synchronized void attach(final EventDispatcher dispatcher) {

Should be called `onReattach`, and I would use GeckoView here instead of using EventDispatcher (and adding `EventDispatcher.getNativeQueue`).
Attachment #8857651 - Flags: review?(nchen) → review+
Pushed by esawin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab34b3ca9901
[1.2] Update GeckoView's state after finishing window reattachment. r=jchen
Addressed comments.
Attachment #8857651 - Attachment is obsolete: true
Attachment #8858002 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ab34b3ca9901
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 55
This signature is the #1 topcrash for the Android build of 20170413100208.
Are we sure it's really fixed?
Flags: needinfo?(esawin)
For me, the STR from comment #1 (amending step 2. to read "Open Firefox, open a few additional tabs and close them again to make sure the custom tab is allocated a tab ID that will not be present after Firefox is killed") no longer crash, although with some add-ons enabled we still seem to be able to end up in a broken state if finish() gets called right during startup() - that would be a different bug, though and that particular custom tab behaviour will probably be going away anyway...
(In reply to Julian Seward [:jseward] from comment #26)
> This signature is the #1 topcrash for the Android build of 20170413100208.
> Are we sure it's really fixed?

The crash signature covers a range of crashes, see comment 16.

In this bug, I've only fixed the event dispatcher race, which was a recent regression.
We need to handle each issue separately in a new bug since they are likely unrelated.
Flags: needinfo?(esawin)
(In reply to Eugen Sawin [:esawin] from comment #28)
> (In reply to Julian Seward [:jseward] from comment #26)
> > This signature is the #1 topcrash for the Android build of 20170413100208.
> > Are we sure it's really fixed?
> 
> The crash signature covers a range of crashes, see comment 16.
> 
> In this bug, I've only fixed the event dispatcher race, which was a recent
> regression.
> We need to handle each issue separately in a new bug since they are likely
> unrelated.

Eugen: How do I tell the difference between the event dispatch crashes and the other ones? We are starting to see this signature in 56, and I would like to get a new bug on file for the crashes that remain to be addressed. Thanks!
Flags: needinfo?(esawin)
(In reply to Marcia Knous [:marcia - use ni] from comment #29)
> (In reply to Eugen Sawin [:esawin] from comment #28)
> > (In reply to Julian Seward [:jseward] from comment #26)
> > > This signature is the #1 topcrash for the Android build of 20170413100208.
> > > Are we sure it's really fixed?
> > 
> > The crash signature covers a range of crashes, see comment 16.
> > 
> > In this bug, I've only fixed the event dispatcher race, which was a recent
> > regression.
> > We need to handle each issue separately in a new bug since they are likely
> > unrelated.
> 
> Eugen: How do I tell the difference between the event dispatch crashes and
> the other ones? We are starting to see this signature in 56, and I would
> like to get a new bug on file for the crashes that remain to be addressed.
> Thanks!

See example in comment 16.
Frame 0 of the crash stack will contain something related to  mozilla::java::EventDispatcher for crashes caused by the event dispatcher race (assumed fix in this bug).

I expect most if not all remaining crashes to be related to the compositor creation race, in this case frame 0 will contain something related to mozilla::java::LayerView::Compositor.

Thanks for filing! We should triage and fix the remaining issue, too.
Flags: needinfo?(esawin)
For reference, Bug 1372895 was filed for the mozilla::java::EventDispatcher crashes.
The crash is old and not reproducing anymore on custom tabs and PWAs, we can close the bug as fixed.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.