Crash in mozilla::gl::SurfaceFactory::NewTexClient

RESOLVED FIXED in Firefox -esr52

Status

()

Core
GFX: Color Management
--
critical
RESOLVED FIXED
8 months ago
7 months ago

People

(Reporter: marcia, Assigned: vliu)

Tracking

({crash, regression})

Trunk
mozilla55
Unspecified
Windows 8
crash, regression
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox51 unaffected, firefox52 wontfix, firefox-esr52 fixed, firefox53? fixed, firefox54+ fixed, firefox55 fixed)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

This bug was filed from the Socorro interface and is 
report bp-0b309ba5-ef97-49ae-aad7-8c9c72170302.
=============================================================

Seen while looking at nightly crash stats - crashes started using 20170206030211: http://bit.ly/2lD5sIO

Possible regression range based on build IDs: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d&tochange=20a8536b0bfac74389d3a57bd8dd957d98779ce1

https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1 is one URL that shows up - there are a few more from that site as well.

140 crashes in the last 7 days, with 34 installations.
A similar bug 1302410 was reported before but its stack trace was a bit different.
The crash address 0xC0 in both are the same and probably telling us this is a null pointer deference.
Flags: needinfo?(cleu)
See Also: → bug 1302410
Whiteboard: [gfx-noted]
(Assignee)

Comment 2

8 months ago
Created attachment 8842879 [details]
error-log-on-mac.log

Since I only have mac os on my hand at this moment, I'd tried loading [1] on my Nightly mac build. It seems that I saw different error message. From log information, it pointed to permission issue on mac. The attached file was detailed message.


Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died. (t=2.81714) [GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died.
2017-03-02 23:01:05.867 plugin-container[33551:361850] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x923b, name = 'com.apple.tsm.portname'

I will also try it on my win-10 machine to see if it also reproducible.


[1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1
From crash stat, all crashes happened on windows which highly suggest it's a Windows only bug.
As to crashes on 52.0b99 which came from single installation, wondering if it's really an valid bug on 52.0b. :/
(Assignee)

Comment 4

8 months ago
(In reply to Vincent Liu[:vliu] from comment #2)

> I will also try it on my win-10 machine to see if it also reproducible.
> 
> 
> [1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1

I saw the same assertion message when I run [1] on Nightly build on windows 10. apparently not similar to crash stat reported in this bug. From looked into description of bug 1302410, the difference may caused by not having any VR headset attached. But even so, I think it is worth filing a new bug to discuss.
I have tried to reproduce it on Windows 10 by both integrated(HD4000) and discrete(GTX1070) but got no luck, the 3D model can display properly, although the console keep barfing some warnings which seems not related to this failure.

And for the crash place I don't think it is a driver problem, maybe we need to take a deeper look into the code to find why there is a nullptr-deref.
Flags: needinfo?(cleu)

Updated

8 months ago
status-firefox51: --- → unaffected
You can ignore the "ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died." message.
#8 Windows topcrash in Nightly 20170303030202. Correlations suggest it is D3D9 only:

> (100.0% in signature vs 11.52% overall) Module "libEGL.dll" = true
> (100.0% in signature vs 11.56% overall) Module "libGLESv2.dll" = true
> (100.0% in signature vs 13.46% overall) Module "d3d9.dll" = true
> (100.0% in signature vs 15.80% overall) Module "d3dcompiler_47.dll" = true

Comment 8

8 months ago
I saw some reports had IPC/GPU related error. Is it possible that SurfaceFactory[2] got destroyed because of IPC/GPU error?

Vincent, please help to check the dump or check SurfaceFactory behavior when you killed GPU process.

[1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006-d18c72170304
[2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82
Assignee: nobody → vliu
(Assignee)

Comment 9

8 months ago
(In reply to Peter Chang[:pchang] from comment #8)
> I saw some reports had IPC/GPU related error. Is it possible that
> SurfaceFactory[2] got destroyed because of IPC/GPU error?
> 
> Vincent, please help to check the dump or check SurfaceFactory behavior when
> you killed GPU process.
> 
> [1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006-
> d18c72170304
> [2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82

SurfaceFactory was created and moved into GLScreenBuffer constr. in [3]. The only chance it became to nullptr was in [4]. In other words, 
this factory was entirely lived in GLScreenBuffer lifecycle. 

From looked into GraphicsCriticalError in crash-meta, IPC was closed with reason=AbnormalShutdown[5], probably something wrong in the parent side first and 
child side got crash by unknow reason. Actually there is no any clue to say IPC/GPU error caused SurfaceFactory destroyed.

I'd tried to kill GPU process in local to observe what will happen about SurfaceFactory. The thing is my local windows Nightly build didn't see any GPU process
was created. Maybe some preference value would disable it. I will try to figure it out.

[3]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367
[4]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#155
[5]: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorBridgeChild.cpp#605
(Assignee)

Comment 10

8 months ago
I'd looked into minidump and found SurfaceFactory in GLScreenBuffer became nullptr. Based on this, any operations accessing to this object would cause read access violation.

Two way to move SurfaceFactory to GLScreenBuffer. [1] when GLScreenBuffer was constructed. [2] by calling CreateFactory() and than calling GLScreenBuffer::Morph() to move into.


[1]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#139
[2]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#66


One possible problem to cause the crash would be in [3]. If CreateFactory() returns nullptr in [4] and move to GLScreenBuffer in [3], I think it would cause crash.


[3]:https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367
[4]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2360

Comment 11

8 months ago
Vincent, could we confirm the flow in comment 10 is running in main thread first? If not, we need to protect the SurfaceFactory.
(Assignee)

Comment 12

8 months ago
Created attachment 8844776 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

Hi Morris,

Can you please have a review? Thanks.
Attachment #8844776 - Flags: review?(mtseng)
Comment on attachment 8844776 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

Review of attachment 8844776 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/WebGLContext.cpp
@@ +2364,5 @@
>              vrmc->GetBackendType(),
>              TextureFlags::ORIGIN_BOTTOM_LEFT);
>  
> +    if (factory) {
> +      screen->Morph(Move(factory));

nit: 4-spaces indent.
Attachment #8844776 - Flags: review?(mtseng) → review+
(Assignee)

Comment 14

8 months ago
Created attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

(In reply to Morris Tseng [:mtseng] [:Morris] from comment #13)
> Comment on attachment 8844776 [details] [diff] [review]
> 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
> 
> Review of attachment 8844776 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/canvas/WebGLContext.cpp
> @@ +2364,5 @@
> >              vrmc->GetBackendType(),
> >              TextureFlags::ORIGIN_BOTTOM_LEFT);
> >  
> > +    if (factory) {
> > +      screen->Morph(Move(factory));
> 
> nit: 4-spaces indent.

Thanks and r+ was carried.
Attachment #8844776 - Attachment is obsolete: true
(Assignee)

Comment 15

8 months ago
try result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39cb7bdfdc5b8846c69623e630225eff7a810db6&selectedJob=82379048

Comment 16

8 months ago
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc38781741fc
Add nullptr check for SurfaceFactory in StartVRPresentation. r=mtseng

Comment 17

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/cc38781741fc
Status: NEW → RESOLVED
Last Resolved: 8 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Comment 18

7 months ago
[Tracking Requested - why for this release]:
The crash seems dropped from observing the past couple days. It is worth uplifting to beta/aurora.
tracking-firefox53: --- → ?
tracking-firefox54: --- → ?
Track 54+ as crash.
Hi Vincent,
Please create uplift request for beta/aurora.
tracking-firefox54: ? → +
status-firefox52: affected → wontfix
status-firefox-esr52: --- → affected
Looks like a good one for ESR52 too given that it's just some added null checks.
(Assignee)

Comment 21

7 months ago
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

Approval Request Comment
[Feature/Bug causing the regression]: None
[User impact if declined]: might have chance to hit crash for windows user
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Crash rate was dropped after landed in Nightly.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: it's a pointer checking
[Is the change risky?]: No
[Why is the change risky/not risky?]:
[String changes made/needed]:
Attachment #8844798 - Flags: approval-mozilla-beta?
Attachment #8844798 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 22

7 months ago
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)
> Looks like a good one for ESR52 too given that it's just some added null
> checks.

As uplift to ESR52.

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: might have chance to hit crash for windows user
Fix Landed on Version: FF55
Risk to taking this patch (and alternatives if risky): None 
String or UUID changes made by this patch: None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8844798 - Flags: approval-mozilla-esr52?
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

Fix a crash and the crash seems dropped in nightly in the past few days. Aurora54+ & Beta53+.
Attachment #8844798 - Flags: approval-mozilla-beta?
Attachment #8844798 - Flags: approval-mozilla-beta+
Attachment #8844798 - Flags: approval-mozilla-aurora?
Attachment #8844798 - Flags: approval-mozilla-aurora+

Comment 24

7 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/e71ac822186e
status-firefox54: affected → fixed

Comment 25

7 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/fd0708b8461b
status-firefox53: affected → fixed
Setting qe-verify- based on Vincent's assessment on manual testing needs (see Comment 21).
Flags: qe-verify-
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch

add null check, esr52+
Attachment #8844798 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Comment 28

7 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/e104d53316d7
status-firefox-esr52: affected → fixed

Updated

7 months ago
Duplicate of this bug: 1302410
You need to log in before you can comment on or make changes to this bug.