Closed Bug 1238541 Opened 4 years ago Closed 4 years ago

crash in mozilla::gl::SharedSurface_EGLImage::ProducerReleaseImpl

Categories

(Core :: Graphics, defect, critical)

46 Branch
ARM
Android
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla47
Tracking Status
firefox46 + fixed
firefox47 --- fixed

People

(Reporter: kats, Assigned: jgilbert)

References

Details

(4 keywords, Whiteboard: [webvr])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-fd8b28be-dd2b-475f-aae2-58ebc2160111.
=============================================================

No particular STR, seems to happen randomly.

Here is another report: https://crash-stats.mozilla.com/report/index/98fc06c9-f11a-447f-ba7a-cabc82160111
Easily reproducible with today's Fennec Nightly at
http://2016.render-conf.com/
Sometimes the page doesn't crash right away, but after some scrolling around it's a safe bet it eventually will.
(on Lollipop Z3Compact & Marshmallow/Nexus6 either)
I see this every time when loading: https://www.donmccurdy.com/aframe-gamepad-controls/

Same as Flaki, it sometimes takes a few seconds, but eventually (within 30 sec) will crash.
Looks like 100% reproducibility now. Any ETA on someone looking into fixing this?
Flags: needinfo?(bugmail.mozilla)
Jim, is this also fallout from the recent fennec widget changes? If not we should move it to graphics.
Flags: needinfo?(bugmail.mozilla) → needinfo?(nchen)
I don't know for sure, but it doesn't look like it.
Component: Widget: Android → Graphics, Panning and Zooming
Flags: needinfo?(nchen)
Product: Core → Firefox for Android
It's #3 on the Fennec 46 top crasher list over the last 7 days. snorp, do you have cycles to look into this?
Keywords: topcrash
^
Flags: needinfo?(snorp)
Also cc'ing jgilbert since the MOZ_RELEASE_ASSERT it's dying on was added by him in bug 1144906 back in last June.
Crash when loading or scrolling horizontally https://torflow.uncharted.software/
Jeff, it looks like readback is kinda busted. We acquire the producer lock to get at the FB, but when we release we may have already had a fence created, hitting this assertion. The comment here[0] says the consumer lock is the best one to use, probably for this reason, but the code does not match.

[0] https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLContext.cpp?from=GLContext.cpp#2780
Flags: needinfo?(snorp) → needinfo?(jgilbert)
Milan, this is a gfx regression
Flags: needinfo?(milan)
Component: Graphics, Panning and Zooming → Graphics
Product: Firefox for Android → Core
Bas, can :jgilbert and you sort this out?  It may be related to pushlayer/poplayer, it's in the range mentioned in comment 8.
Flags: needinfo?(milan) → needinfo?(bas)
[Tracking Requested - why for this release]: We don't want new regressions, and this is a frequent crash.
(In reply to Milan Sreckovic [:milan] from comment #13)
> Bas, can :jgilbert and you sort this out?  It may be related to
> pushlayer/poplayer, it's in the range mentioned in comment 8.

This work is preffed off on Android.
Flags: needinfo?(bas)
Since there's a bunch of large and possibly related changes that landed on the day this started, it would help if somebody who can reproduce this can narrow down the range using mozregression.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #11)
> Jeff, it looks like readback is kinda busted. We acquire the producer lock
> to get at the FB, but when we release we may have already had a fence
> created, hitting this assertion. The comment here[0] says the consumer lock
> is the best one to use, probably for this reason, but the code does not
> match.
> 
> [0]
> https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLContext.
> cpp?from=GLContext.cpp#2780

ProducerReadAcquire is the right choice here. (the comment is out-of-date)(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #8)

> Seems to have started with the Jan 7 nightly, which would put the regression
> range at
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=9d6ffc7a08b6b47056eefe1e652710a3849adbf7&tochange=e0bc
> d16e1d4b99ba3e542149d0d41e0f60c54b5c

It might have to do with bug 1236762.
Flags: needinfo?(jgilbert)
See Also: → 1236762
Assignee: nobody → jgilbert
I can reproduce this crash using Nexus 6 (Android 6.0) on Firefox for Android 46.0a2 (2016-01-27) after loading get.webgl.org
We're getting reports of all demos at mozvr.com (mixture of WebVR demos from around the web) crashing on Firefox Android nightly. Looking into it now. Reporter indicated they believed this bug was the cause.
Whiteboard: [webvr]
FWIW, This crash happens fairly often for me (1+One device).   Though, I am unable to reproduce these crashes consistently -- even with the examples posted in the comments above.    These crashes seem to happen seemingly randomly sometimes, and the only common thread from my experience is that it's due to webGL content on the page.
It seems to me the assertion is just wrong in the ProducerReadReleaseImpl() case, which right now is
treated the same as ProducerReleaseImpl(). This patch provides a separate path, and waits on
any existing fence in the ProducerReadAcquireImpl().
Attachment #8713686 - Flags: review?(jgilbert)
Attachment #8713686 - Flags: review?(jgilbert) → review+
I see this crash showing up in Fx47 and Fx46
https://hg.mozilla.org/mozilla-central/rev/a85f477cd68f
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Jeff, do you want to uplift this to 46?
Flags: needinfo?(jgilbert)
Tracking because topcrash/regression
(In reply to Benjamin Kerensa [:bkerensa] from comment #25)
> Jeff, do you want to uplift this to 46?

It seems like we should.
Flags: needinfo?(jgilbert)
Comment on attachment 8713686 [details] [diff] [review]
Don't die in SharedSurface_EGLImage::ProducerReadReleaseImpl() if there is an existing fence

Approval Request Comment
[Feature/regressing bug #]: unknown
[User impact if declined]: top-crash
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: low
[String/UUID change made/needed]: none
Attachment #8713686 - Flags: approval-mozilla-aurora?
Comment on attachment 8713686 [details] [diff] [review]
Don't die in SharedSurface_EGLImage::ProducerReadReleaseImpl() if there is an existing fence

Crash fix, please uplift to aurora.
Attachment #8713686 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.