Closed
Bug 1343851
Opened 8 years ago
Closed 8 years ago
Crash in mozilla::gl::SurfaceFactory::NewTexClient
Categories
(Core :: Graphics: Color Management, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: marcia, Assigned: vliu)
References
Details
(Keywords: crash, regression, Whiteboard: [gfx-noted])
Crash Data
Attachments
(2 files, 1 obsolete file)
134.17 KB,
text/x-log
|
Details | |
1.42 KB,
patch
|
gchang
:
approval-mozilla-aurora+
gchang
:
approval-mozilla-beta+
jcristau
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-0b309ba5-ef97-49ae-aad7-8c9c72170302.
=============================================================
Seen while looking at nightly crash stats - crashes started using 20170206030211: http://bit.ly/2lD5sIO
Possible regression range based on build IDs: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d&tochange=20a8536b0bfac74389d3a57bd8dd957d98779ce1
https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1 is one URL that shows up - there are a few more from that site as well.
140 crashes in the last 7 days, with 34 installations.
Comment 1•8 years ago
|
||
A similar bug 1302410 was reported before but its stack trace was a bit different.
The crash address 0xC0 in both are the same and probably telling us this is a null pointer deference.
Assignee | ||
Comment 2•8 years ago
|
||
Since I only have mac os on my hand at this moment, I'd tried loading [1] on my Nightly mac build. It seems that I saw different error message. From log information, it pointed to permission issue on mac. The attached file was detailed message.
Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died. (t=2.81714) [GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died.
2017-03-02 23:01:05.867 plugin-container[33551:361850] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x923b, name = 'com.apple.tsm.portname'
I will also try it on my win-10 machine to see if it also reproducible.
[1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1
Comment 3•8 years ago
|
||
From crash stat, all crashes happened on windows which highly suggest it's a Windows only bug.
As to crashes on 52.0b99 which came from single installation, wondering if it's really an valid bug on 52.0b. :/
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Vincent Liu[:vliu] from comment #2)
> I will also try it on my win-10 machine to see if it also reproducible.
>
>
> [1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1
I saw the same assertion message when I run [1] on Nightly build on windows 10. apparently not similar to crash stat reported in this bug. From looked into description of bug 1302410, the difference may caused by not having any VR headset attached. But even so, I think it is worth filing a new bug to discuss.
Comment 5•8 years ago
|
||
I have tried to reproduce it on Windows 10 by both integrated(HD4000) and discrete(GTX1070) but got no luck, the 3D model can display properly, although the console keep barfing some warnings which seems not related to this failure.
And for the crash place I don't think it is a driver problem, maybe we need to take a deeper look into the code to find why there is a nullptr-deref.
Flags: needinfo?(cleu)
Updated•8 years ago
|
status-firefox51:
--- → unaffected
You can ignore the "ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died." message.
![]() |
||
Comment 7•8 years ago
|
||
#8 Windows topcrash in Nightly 20170303030202. Correlations suggest it is D3D9 only:
> (100.0% in signature vs 11.52% overall) Module "libEGL.dll" = true
> (100.0% in signature vs 11.56% overall) Module "libGLESv2.dll" = true
> (100.0% in signature vs 13.46% overall) Module "d3d9.dll" = true
> (100.0% in signature vs 15.80% overall) Module "d3dcompiler_47.dll" = true
Comment 8•8 years ago
|
||
I saw some reports had IPC/GPU related error. Is it possible that SurfaceFactory[2] got destroyed because of IPC/GPU error?
Vincent, please help to check the dump or check SurfaceFactory behavior when you killed GPU process.
[1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006-d18c72170304
[2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82
Assignee: nobody → vliu
Assignee | ||
Comment 9•8 years ago
|
||
(In reply to Peter Chang[:pchang] from comment #8)
> I saw some reports had IPC/GPU related error. Is it possible that
> SurfaceFactory[2] got destroyed because of IPC/GPU error?
>
> Vincent, please help to check the dump or check SurfaceFactory behavior when
> you killed GPU process.
>
> [1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006-
> d18c72170304
> [2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82
SurfaceFactory was created and moved into GLScreenBuffer constr. in [3]. The only chance it became to nullptr was in [4]. In other words,
this factory was entirely lived in GLScreenBuffer lifecycle.
From looked into GraphicsCriticalError in crash-meta, IPC was closed with reason=AbnormalShutdown[5], probably something wrong in the parent side first and
child side got crash by unknow reason. Actually there is no any clue to say IPC/GPU error caused SurfaceFactory destroyed.
I'd tried to kill GPU process in local to observe what will happen about SurfaceFactory. The thing is my local windows Nightly build didn't see any GPU process
was created. Maybe some preference value would disable it. I will try to figure it out.
[3]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367
[4]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#155
[5]: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorBridgeChild.cpp#605
Assignee | ||
Comment 10•8 years ago
|
||
I'd looked into minidump and found SurfaceFactory in GLScreenBuffer became nullptr. Based on this, any operations accessing to this object would cause read access violation.
Two way to move SurfaceFactory to GLScreenBuffer. [1] when GLScreenBuffer was constructed. [2] by calling CreateFactory() and than calling GLScreenBuffer::Morph() to move into.
[1]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#139
[2]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#66
One possible problem to cause the crash would be in [3]. If CreateFactory() returns nullptr in [4] and move to GLScreenBuffer in [3], I think it would cause crash.
[3]:https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367
[4]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2360
Comment 11•8 years ago
|
||
Vincent, could we confirm the flow in comment 10 is running in main thread first? If not, we need to protect the SurfaceFactory.
Assignee | ||
Comment 12•8 years ago
|
||
Hi Morris,
Can you please have a review? Thanks.
Attachment #8844776 -
Flags: review?(mtseng)
Comment 13•8 years ago
|
||
Comment on attachment 8844776 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
Review of attachment 8844776 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/canvas/WebGLContext.cpp
@@ +2364,5 @@
> vrmc->GetBackendType(),
> TextureFlags::ORIGIN_BOTTOM_LEFT);
>
> + if (factory) {
> + screen->Morph(Move(factory));
nit: 4-spaces indent.
Attachment #8844776 -
Flags: review?(mtseng) → review+
Assignee | ||
Comment 14•8 years ago
|
||
(In reply to Morris Tseng [:mtseng] [:Morris] from comment #13)
> Comment on attachment 8844776 [details] [diff] [review]
> 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
>
> Review of attachment 8844776 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/canvas/WebGLContext.cpp
> @@ +2364,5 @@
> > vrmc->GetBackendType(),
> > TextureFlags::ORIGIN_BOTTOM_LEFT);
> >
> > + if (factory) {
> > + screen->Morph(Move(factory));
>
> nit: 4-spaces indent.
Thanks and r+ was carried.
Attachment #8844776 -
Attachment is obsolete: true
Assignee | ||
Comment 15•8 years ago
|
||
Comment 16•8 years ago
|
||
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc38781741fc
Add nullptr check for SurfaceFactory in StartVRPresentation. r=mtseng
Comment 17•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Comment 18•8 years ago
|
||
[Tracking Requested - why for this release]:
The crash seems dropped from observing the past couple days. It is worth uplifting to beta/aurora.
tracking-firefox53:
--- → ?
tracking-firefox54:
--- → ?
Comment 19•8 years ago
|
||
Track 54+ as crash.
Hi Vincent,
Please create uplift request for beta/aurora.
Updated•8 years ago
|
Comment 20•8 years ago
|
||
Looks like a good one for ESR52 too given that it's just some added null checks.
Assignee | ||
Comment 21•8 years ago
|
||
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
Approval Request Comment
[Feature/Bug causing the regression]: None
[User impact if declined]: might have chance to hit crash for windows user
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Crash rate was dropped after landed in Nightly.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: it's a pointer checking
[Is the change risky?]: No
[Why is the change risky/not risky?]:
[String changes made/needed]:
Attachment #8844798 -
Flags: approval-mozilla-beta?
Attachment #8844798 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 22•8 years ago
|
||
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
(In reply to Ryan VanderMeulen [:RyanVM] from comment #20)
> Looks like a good one for ESR52 too given that it's just some added null
> checks.
As uplift to ESR52.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined: might have chance to hit crash for windows user
Fix Landed on Version: FF55
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: None
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8844798 -
Flags: approval-mozilla-esr52?
Comment 23•8 years ago
|
||
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
Fix a crash and the crash seems dropped in nightly in the past few days. Aurora54+ & Beta53+.
Attachment #8844798 -
Flags: approval-mozilla-beta?
Attachment #8844798 -
Flags: approval-mozilla-beta+
Attachment #8844798 -
Flags: approval-mozilla-aurora?
Attachment #8844798 -
Flags: approval-mozilla-aurora+
Comment 24•8 years ago
|
||
bugherder uplift |
Comment 25•8 years ago
|
||
bugherder uplift |
Comment 26•8 years ago
|
||
Setting qe-verify- based on Vincent's assessment on manual testing needs (see Comment 21).
Flags: qe-verify-
Comment 27•8 years ago
|
||
Comment on attachment 8844798 [details] [diff] [review]
0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch
add null check, esr52+
Attachment #8844798 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 28•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•