Closed Bug 1553135 Opened 5 years ago Closed 5 years ago

Crash in [@ nsWindow::RecvScreenPixels]

Categories

(Core Graveyard :: Widget: Android, defect, P1)

Unspecified
Android
defect

Tracking

(firefox-esr60 unaffected, firefox-esr68 unaffected, firefox67 wontfix, firefox67.0.1 wontfix, firefox68 wontfix, firefox69 fixed, firefox70 fixed)

RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- wontfix
firefox67.0.1 --- wontfix
firefox68 --- wontfix
firefox69 --- fixed
firefox70 --- fixed

People

(Reporter: davidb, Assigned: fluffyemily)

References

Details

(Keywords: crash, regression, Whiteboard: [geckoview:fenix:m7])

Crash Data

Attachments

(1 file)

This bug is for crash report bp-10b7d77e-9d56-431f-bb8e-e376d0190521.

Top 10 frames of crashing thread:

0 libxul.so nsWindow::RecvScreenPixels widget/android/nsWindow.cpp:2248
1 libxul.so mozilla::layers::UiCompositorControllerChild::RecvScreenPixels gfx/layers/ipc/UiCompositorControllerChild.cpp:262
2 libxul.so mozilla::layers::PUiCompositorControllerChild::OnMessageReceived ipc/ipdl/PUiCompositorControllerChild.cpp:506
3 libxul.so mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2151
4 libxul.so mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1968
5 libxul.so long long mozilla::jni::NativeStub<mozilla::java::GeckoThread::RunUiThreadCallback_t, GeckoThreadSupport, mozilla::jni::Args<> >::Wrap<&GeckoThreadSupport::RunUiThreadCallback> widget/android/jni/Natives.h:689
6 base.odex base.odex@0x1209a7 
7 dalvik-LinearAlloc (deleted) dalvik-LinearAlloc @0xe36a 
8 dalvik-main space (region space) (deleted) dalvik-main space @0x1141c9e 
9 libart.so libart.so@0x3c06f5 

More here: https://crash-stats.mozilla.com/signature/?product=Fenix&signature=nsWindow%3A%3ARecvScreenPixels

Looks like a null-deref of mLayerViewSupport in nsWindow. This is probably a widget issue, moving components.

Component: Graphics: Layers → Widget: Android

21 crashes/6 installs in Fenix 68.0b0. Fairly small set of affected devices running APIs 26-28. This is one of the top crashes if you factor out all the system@framework@boot-framework.art and liblog.so signatures.

Randall I think this is your code, any thoughts?

Flags: needinfo?(rbarker)

I'm not sure what is happening. It shouldn't be possible for the lvs to be null since it is locked before invocation. If it is null then there is a bug in our WindowPtr<> . Otherwise the aMem could be null? But I'm not really sure how either could be null.

Flags: needinfo?(rbarker)

Emily said she'd take a look.

Assignee: nobody → etoop

I suspect this is more likely that aMem is null. I wonder if this could occur if the compositor is brought down during the screen pixel capture?

The priority flag is not set for this bug.
:snorp, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(snorp)

NI+ Randall for comment 6 thoughts, and Chris to get this into our prioritization queue.

Flags: needinfo?(snorp)
Flags: needinfo?(rbarker)
Flags: needinfo?(cpeterson)

I've tagged this bug as post- Fenix MVP. As Marcia says, the crash volume is pretty low.

Flags: needinfo?(cpeterson)
Priority: -- → P2
Whiteboard: [geckoview:fenix:m7]

If it is the aMem probably would be good to figure what part is null and check it first.

Flags: needinfo?(rbarker)

(In reply to Chris Peterson [:cpeterson] from comment #9)

I've tagged this bug as post- Fenix MVP. As Marcia says, the crash volume is pretty low.

Now that Fenix MVP has been released, it appears this is in the top 5 in Socorro (although not a high volume crash).

So, I'm pretty sure I know what is going on now, and it's because bug 1560641 was raised. This seems to be a race condition caused if detach is called while a screen shot is being taken. The solution to this is twofold:

  1. Lock the list of waiting screenshot GeckoResults such that we can't try and use the same one to both notify for detach and notify for a completed screenshot.
  2. Check to ensure that mLayerViewSupport is not null before trying to return a screenshot.
Depends on: 1560641

This is caused by a race condition when the compositor is detached. Because the actual detachment happens in a new thread, detach can complete and release the lock on mLayerViewSupport, and RecvScreenPixels can obtain the lock, before mLayerViewSupport is properly cleaned up. We therefore check to ensure that lvs is not null before calling a method on it.

Attachment #9075988 - Attachment description: Bug 1553135 - Ensure that `mLayerViewSupport` is present before calling `RecvScreenPixels`. r=rbarker → Bug 1553135 - Lock `LayerViewSupport` during detach so that other methods cannot be called during this time. r=rbarker
Pushed by etoop@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/920cd58d56ff
Lock `LayerViewSupport` during detach so that other methods cannot be called during this time. r=geckoview-reviewers,snorp
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

Is this something we should consider backporting to Beta for GV69 or can this ride with 70 to release?

This is something we should consider for GV69.

Flags: needinfo?(etoop)

Comment on attachment 9075988 [details]
Bug 1553135 - Lock LayerViewSupport during detach so that other methods cannot be called during this time. r=rbarker

Beta/Release Uplift Approval Request

  • User impact if declined: In rare cases Fenix will crash rather than closing down cleanly.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1560641
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This crash was not visible to the user has it only happens when the app is being backgrounded.
  • String changes made/needed:
Attachment #9075988 - Flags: approval-mozilla-beta?

Comment on attachment 9075988 [details]
Bug 1553135 - Lock LayerViewSupport during detach so that other methods cannot be called during this time. r=rbarker

Fixes a Fenix crash. Approved for GV69.

Attachment #9075988 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

[geckoview:fenix:m7] bugs should be priority P1.

I'm editing a bunch of GeckoView bugs. If you'd like to filter all this bugmail, search and destroy emails containing this UUID:

e88a5094-0fc0-4b7c-b7c5-aef00a11dbc9

Priority: P2 → P1

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
Regressions: 1715036
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: