Closed Bug 1343851 Opened 8 years ago Closed 8 years ago

Crash in mozilla::gl::SurfaceFactory::NewTexClient

Categories

(Core :: Graphics: Color Management, defect)

Unspecified
Windows 8
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox51 --- unaffected
firefox52 --- wontfix
firefox-esr52 --- fixed
firefox53 ? fixed
firefox54 + fixed
firefox55 --- fixed

People

(Reporter: marcia, Assigned: vliu)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is report bp-0b309ba5-ef97-49ae-aad7-8c9c72170302. ============================================================= Seen while looking at nightly crash stats - crashes started using 20170206030211: http://bit.ly/2lD5sIO Possible regression range based on build IDs: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3e555770a90a41e04bbb4ac41b65fa2f1db6977d&tochange=20a8536b0bfac74389d3a57bd8dd957d98779ce1 https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1 is one URL that shows up - there are a few more from that site as well. 140 crashes in the last 7 days, with 34 installations.
A similar bug 1302410 was reported before but its stack trace was a bit different. The crash address 0xC0 in both are the same and probably telling us this is a null pointer deference.
Flags: needinfo?(cleu)
See Also: → 1302410
Whiteboard: [gfx-noted]
Attached file error-log-on-mac.log
Since I only have mac os on my hand at this moment, I'd tried loading [1] on my Nightly mac build. It seems that I saw different error message. From log information, it pointed to permission issue on mac. The attached file was detailed message. Crash Annotation GraphicsCriticalError: |[C0][GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died. (t=2.81714) [GFX1-]: ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died. 2017-03-02 23:01:05.867 plugin-container[33551:361850] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x923b, name = 'com.apple.tsm.portname' I will also try it on my win-10 machine to see if it also reproducible. [1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1
From crash stat, all crashes happened on windows which highly suggest it's a Windows only bug. As to crashes on 52.0b99 which came from single installation, wondering if it's really an valid bug on 52.0b. :/
(In reply to Vincent Liu[:vliu] from comment #2) > I will also try it on my win-10 machine to see if it also reproducible. > > > [1]: https://sketchfab.com/models/a5b8f6a35f9f4071ae2f38650508fee1 I saw the same assertion message when I run [1] on Nightly build on windows 10. apparently not similar to crash stat reported in this bug. From looked into description of bug 1302410, the difference may caused by not having any VR headset attached. But even so, I think it is worth filing a new bug to discuss.
I have tried to reproduce it on Windows 10 by both integrated(HD4000) and discrete(GTX1070) but got no luck, the 3D model can display properly, although the console keep barfing some warnings which seems not related to this failure. And for the crash place I don't think it is a driver problem, maybe we need to take a deeper look into the code to find why there is a nullptr-deref.
Flags: needinfo?(cleu)
You can ignore the "ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died." message.
#8 Windows topcrash in Nightly 20170303030202. Correlations suggest it is D3D9 only: > (100.0% in signature vs 11.52% overall) Module "libEGL.dll" = true > (100.0% in signature vs 11.56% overall) Module "libGLESv2.dll" = true > (100.0% in signature vs 13.46% overall) Module "d3d9.dll" = true > (100.0% in signature vs 15.80% overall) Module "d3dcompiler_47.dll" = true
I saw some reports had IPC/GPU related error. Is it possible that SurfaceFactory[2] got destroyed because of IPC/GPU error? Vincent, please help to check the dump or check SurfaceFactory behavior when you killed GPU process. [1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006-d18c72170304 [2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82
Assignee: nobody → vliu
(In reply to Peter Chang[:pchang] from comment #8) > I saw some reports had IPC/GPU related error. Is it possible that > SurfaceFactory[2] got destroyed because of IPC/GPU error? > > Vincent, please help to check the dump or check SurfaceFactory behavior when > you killed GPU process. > > [1]https://crash-stats.mozilla.com/report/index/3817839f-208b-4583-8006- > d18c72170304 > [2]http://searchfox.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#82 SurfaceFactory was created and moved into GLScreenBuffer constr. in [3]. The only chance it became to nullptr was in [4]. In other words, this factory was entirely lived in GLScreenBuffer lifecycle. From looked into GraphicsCriticalError in crash-meta, IPC was closed with reason=AbnormalShutdown[5], probably something wrong in the parent side first and child side got crash by unknow reason. Actually there is no any clue to say IPC/GPU error caused SurfaceFactory destroyed. I'd tried to kill GPU process in local to observe what will happen about SurfaceFactory. The thing is my local windows Nightly build didn't see any GPU process was created. Maybe some preference value would disable it. I will try to figure it out. [3]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367 [4]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#155 [5]: https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorBridgeChild.cpp#605
I'd looked into minidump and found SurfaceFactory in GLScreenBuffer became nullptr. Based on this, any operations accessing to this object would cause read access violation. Two way to move SurfaceFactory to GLScreenBuffer. [1] when GLScreenBuffer was constructed. [2] by calling CreateFactory() and than calling GLScreenBuffer::Morph() to move into. [1]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#139 [2]: https://dxr.mozilla.org/mozilla-central/source/gfx/gl/GLScreenBuffer.cpp#66 One possible problem to cause the crash would be in [3]. If CreateFactory() returns nullptr in [4] and move to GLScreenBuffer in [3], I think it would cause crash. [3]:https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2367 [4]: https://dxr.mozilla.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#2360
Vincent, could we confirm the flow in comment 10 is running in main thread first? If not, we need to protect the SurfaceFactory.
Hi Morris, Can you please have a review? Thanks.
Attachment #8844776 - Flags: review?(mtseng)
Comment on attachment 8844776 [details] [diff] [review] 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch Review of attachment 8844776 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/canvas/WebGLContext.cpp @@ +2364,5 @@ > vrmc->GetBackendType(), > TextureFlags::ORIGIN_BOTTOM_LEFT); > > + if (factory) { > + screen->Morph(Move(factory)); nit: 4-spaces indent.
Attachment #8844776 - Flags: review?(mtseng) → review+
(In reply to Morris Tseng [:mtseng] [:Morris] from comment #13) > Comment on attachment 8844776 [details] [diff] [review] > 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch > > Review of attachment 8844776 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: dom/canvas/WebGLContext.cpp > @@ +2364,5 @@ > > vrmc->GetBackendType(), > > TextureFlags::ORIGIN_BOTTOM_LEFT); > > > > + if (factory) { > > + screen->Morph(Move(factory)); > > nit: 4-spaces indent. Thanks and r+ was carried.
Attachment #8844776 - Attachment is obsolete: true
Pushed by vliu@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/cc38781741fc Add nullptr check for SurfaceFactory in StartVRPresentation. r=mtseng
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
[Tracking Requested - why for this release]: The crash seems dropped from observing the past couple days. It is worth uplifting to beta/aurora.
Track 54+ as crash. Hi Vincent, Please create uplift request for beta/aurora.
Looks like a good one for ESR52 too given that it's just some added null checks.
Comment on attachment 8844798 [details] [diff] [review] 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch Approval Request Comment [Feature/Bug causing the regression]: None [User impact if declined]: might have chance to hit crash for windows user [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Crash rate was dropped after landed in Nightly. [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: it's a pointer checking [Is the change risky?]: No [Why is the change risky/not risky?]: [String changes made/needed]:
Attachment #8844798 - Flags: approval-mozilla-beta?
Attachment #8844798 - Flags: approval-mozilla-aurora?
Comment on attachment 8844798 [details] [diff] [review] 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch (In reply to Ryan VanderMeulen [:RyanVM] from comment #20) > Looks like a good one for ESR52 too given that it's just some added null > checks. As uplift to ESR52. [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: User impact if declined: might have chance to hit crash for windows user Fix Landed on Version: FF55 Risk to taking this patch (and alternatives if risky): None String or UUID changes made by this patch: None See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8844798 - Flags: approval-mozilla-esr52?
Comment on attachment 8844798 [details] [diff] [review] 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch Fix a crash and the crash seems dropped in nightly in the past few days. Aurora54+ & Beta53+.
Attachment #8844798 - Flags: approval-mozilla-beta?
Attachment #8844798 - Flags: approval-mozilla-beta+
Attachment #8844798 - Flags: approval-mozilla-aurora?
Attachment #8844798 - Flags: approval-mozilla-aurora+
Setting qe-verify- based on Vincent's assessment on manual testing needs (see Comment 21).
Flags: qe-verify-
Comment on attachment 8844798 [details] [diff] [review] 0001-Bug-1343851-Add-nullptr-check-for-SurfaceFactory-in-.patch add null check, esr52+
Attachment #8844798 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: