Crash in CContext::ID3D11DeviceContext1_Map_<T>

VERIFIED FIXED in Firefox 57

Status

()

defect
--
critical
VERIFIED FIXED
2 years ago
2 months ago

People

(Reporter: ting, Assigned: sotaro)

Tracking

({crash})

Trunk
mozilla67
Unspecified
Windows 7
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox-esr60 wontfix, firefox55 unaffected, firefox56 disabled, firefox57 fixed, firefox65 wontfix, firefox66 verified, firefox67 verified)

Details

(crash signature)

Attachments

(3 attachments, 1 obsolete attachment)

Reporter

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-ddbe7b2f-49f0-4bae-a291-a4c020170801.
=============================================================
Top #22 of Nightly 20170730100307 on Windows, 22 reports from 9 installations.
This looks like a null deref in advanced layers code. The crashes are on either 0x31 or 0x61. Any ideas, David?
Flags: needinfo?(dvander)
It seems that I accidentally hit the similar back trace when I using DxCap to try device reset on recent win10 build. The crash I his showed below.

d3d11.dll!CContext::TID3D11DeviceContext_Map_<4>(struct ID3D11DeviceContext3 *,struct ID3D11Resource *,unsigned int,enum D3D11_MAP,unsigned int,struct D3D11_MAPPED_SUBRESOURCE *)	Unknown
>	xul.dll!mozilla::layers::MLGDeviceD3D11::Map(mozilla::layers::MLGResource * aResource, mozilla::layers::MLGMapType aType, mozilla::layers::MLGMappedResource * aMap) Line 1338	C++
 	xul.dll!mozilla::layers::SharedConstantBuffer::AllocateNewBuffer(unsigned __int64 aBytes, __int64 * aOutOffset, RefPtr<mozilla::layers::MLGBuffer> * aOutBuffer) Line 297	C++
 	xul.dll!mozilla::layers::SharedConstantBuffer::GetBufferPointer(mozilla::layers::AutoBufferUploadBase * aPtr, unsigned __int64 aBytes, __int64 * aOutOffset, RefPtr<mozilla::layers::MLGBuffer> * aOutBuffer) Line 271	C++
 	xul.dll!mozilla::layers::SharedConstantBuffer::Allocate(mozilla::layers::ConstantBufferSection * aHolder, mozilla::layers::AutoBufferUploadBase * aPtr, unsigned __int64 aNumItems, unsigned __int64 aSizeOfItem) Line 275	C++
 	xul.dll!mozilla::layers::SharedConstantBuffer::Allocate(mozilla::layers::ConstantBufferSection * aHolder, unsigned __int64 aNumItems, unsigned __int64 aSizeOfItem, const void * aData) Line 253	C++
 	xul.dll!mozilla::layers::SharedConstantBuffer::Allocate<mozilla::layers::mlg::MaskInformation>(mozilla::layers::ConstantBufferSection * aHolder, const mozilla::layers::mlg::MaskInformation & aItem) Line 227	C++
 	xul.dll!mozilla::layers::FrameBuilder::Build() Line 78	C++
 	xul.dll!mozilla::layers::LayerManagerMLGPU::RenderLayers() Line 351	C++

From looked into it, I found [1] became null before Map was executed. Since some of signatures in this bug also had device reset message, I guess they are relatives. 

[1]: https://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/gfx/layers/d3d11/MLGDeviceD3D11.cpp#1335

Do you think adding NullPtr check for [1] to make sure we won't crash in Map()?
Assignee: nobody → vliu
Huh. Good catch. I have *no idea* how I forgot to do this, but we don't check the HRESULT of the CreateBuffer call here [1]. We should check there and gfxCriticalError if FAILED or !buffer. I don't think we need to change Map(), but we could assert that the extracted resource is non-null - MLGBufferD3D11 should never have a non-null ID3D11Buffer underneath.

[1] http://searchfox.org/mozilla-central/source/gfx/layers/d3d11/MLGDeviceD3D11.cpp#628
Flags: needinfo?(dvander)
(In reply to David Anderson [:dvander] from comment #3)
> Huh. Good catch. I have *no idea* how I forgot to do this, but we don't
> check the HRESULT of the CreateBuffer call here [1]. We should check there
> and gfxCriticalError if FAILED or !buffer. 

Yes, I also think we should add a checking mechanism here.

I don't think we need to change
> Map(), but we could assert that the extracted resource is non-null -
> MLGBufferD3D11 should never have a non-null ID3D11Buffer underneath.
> 

From my local observation, I got valid RawPtr for buffer but RawPtr for mBuffer became null. For this, I think we still need to have nullPtr checking to early return and not using assert. How do you think?

buffer	{mRawPtr=0x0000018734752e20 {mBuffer={mRawPtr=0x0000000000000000 <NULL> } mType=Constant (0x00000001) ...} }	RefPtr<mozilla::layers::MLGBuffer>

> [1]
> http://searchfox.org/mozilla-central/source/gfx/layers/d3d11/MLGDeviceD3D11.
> cpp#628
Flags: needinfo?(dvander)
The only reason mBuffer is null is because of the bug in MLGBufferD3D11::Create. After that it should never be null, so we shouldn't need a null check for it in Map().
Flags: needinfo?(dvander)
Hi Jeff,
Since David is on PTO, could you please have a review for the patch? From David's previous comment, I think the patch is positive to him. Thanks.
Attachment #8894427 - Flags: review?(jmuizelaar)
Hi Ryan,

Could you please have a review for the patch? I think you are also suitable for reviewing this part. Thanks.
Attachment #8894427 - Attachment is obsolete: true
Attachment #8894427 - Flags: review?(jmuizelaar)
Attachment #8894754 - Flags: review?(rhunt)
Reporter

Updated

2 years ago
Crash Signature: [@ CContext::ID3D11DeviceContext1_Map_<T>] → [@ CContext::ID3D11DeviceContext1_Map_<T>] [@ CContext::TID3D11DeviceContext_Map_<T>]

Updated

2 years ago
Attachment #8894754 - Flags: review?(rhunt) → review+

Comment 9

2 years ago
Pushed by vliu@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f216d5050f00
Add checking mechanism if ID3D11Buffer created fail. r=rhunt, dvander

Comment 10

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f216d5050f00
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
This signature continues to crash, at a fairly low level,
and has been doing so at least for the past three months.

Issue still happening on Windows 10 Pro x86 with Fx 66.0b12
Crash report: https://crash-stats.mozilla.org/report/index/02cfe889-7660-49cb-900e-817de0190301

  • Steps to reproduce:
  1. Launch Firefox
  2. Open Facebook and search for Poker Heat game
  3. Click to play game
  • Expected Results:
    There are no issues encountered while using these apps.

  • Actual Results:
    The game remains in a loading state and after a while the tab crashes.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: vincent.liu1013 → nobody

Jessie, since it sounds like we have a way to reproduce this we should try to fix it. Maybe Sotaro would be good person to take a look?

Flags: needinfo?(jbonisteel)

Hey Sotaro - would you be able to take a peak at this?

Flags: needinfo?(jbonisteel) → needinfo?(sotaro.ikeda.g)
Assignee

Comment 15

4 months ago

I take a look :)

Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 16

4 months ago

All the remaining crashes seemed to happen under D3D11YCbCrImage::GetAsSourceSurface() and the GetAsSourceSurface() was called from BasicImageLayer::Paint() or LayoutUtils::SurfaceFromElement().

Assignee

Updated

4 months ago
Assignee: nobody → sotaro.ikeda.g
Assignee

Comment 17

4 months ago

The almost all crash logs had the following GraphicsCriticalError logs. TDR seems to be related.

Could not get a DXGI adapter
A content-only TDR is detected
Detected device reset

Assignee

Comment 18

4 months ago

If D3D11Device of D3D11YCbCrImage is already obsoleted by TDR, we should not use it.

Comment 20

3 months ago
Pushed by sikeda@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ff6d406ee71f
Check if D3D11Device is obsoleted in D3D11YCbCrImage::GetAsSourceSurface() r=nical

Comment 21

3 months ago
bugherder
Status: REOPENED → RESOLVED
Closed: 2 years ago3 months ago
Resolution: --- → FIXED
Target Milestone: mozilla57 → mozilla67

Comment 22

3 months ago

the fix here is looking very contained - do you think we should still get it into firefox 66? if so, could you please request approval for uplift... thanks

Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 23

3 months ago

Comment on attachment 9047925 [details]
Bug 1386487 - Check if D3D11Device is obsoleted in D3D11YCbCrImage::GetAsSourceSurface()

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1223270
  • User impact if declined: Tab could crash during playing a video when TDR happens.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch just added ID3D11Device check.
  • String changes made/needed: None
Flags: needinfo?(sotaro.ikeda.g)
Attachment #9047925 - Flags: approval-mozilla-beta?

Comment on attachment 9047925 [details]
Bug 1386487 - Check if D3D11Device is obsoleted in D3D11YCbCrImage::GetAsSourceSurface()

Further fix for this top crash.
OK for uplift for beta 14.

Attachment #9047925 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

The issue is not reproducing anymore with 66.0b14 and latest 67.0a1 (2019-03-10) on Windows 10x32, however the tab crashes for OOM with this signature: https://crash-stats.mozilla.org/report/index/352f435c-33a5-4daa-b8f4-973940190311 using the steps from comment 12.
Sotaro Ikeda, any thoughts?

Flags: needinfo?(sotaro.ikeda.g)
Assignee

Comment 27

3 months ago

It seems like general oom at syle.

:emilio, can you comment to comment 26?

Flags: needinfo?(sotaro.ikeda.g) → needinfo?(emilio)

Yeah, it'd be interesting to get a memory report of that page on Windows and see if there's something going very wrong with style system memory usage (or somewhere else).

Gabi, is there any chance you could do that? Or somebody with a Win32 machine generally. Though maybe a memory report in win64 is enough to pinpoint the issue.

Flags: needinfo?(emilio) → needinfo?(gasofie)

Hi Emilio,

Sorry to get back to you so late, I had some issues with the Windows machine on which the issue reproduces constantly.
I tried to get the memory dump file for the crash tab but I did not manage to create it in time to contain the relevant information. Can you give me some pointers or a tool that I can use?

Thank you

Flags: needinfo?(gasofie)
Flags: needinfo?(emilio)

You can use about:memory and save it, if you have time. Does the OOM always reproduce in the style system? or does it have different signatures?

Nick, do you know of a good way of finding out what's going on in a near-oom situation in Win32? In Linux I'd just use DMD or instrument the allocations in some other way but...

Flags: needinfo?(n.nethercote)
Flags: needinfo?(gasofie)
Flags: needinfo?(emilio)

On Windows, Firefox periodically checks for low-memory conditions and will get a memory report if possible. If an OOM subsequently happens, the memory report will be included in the crash report. gsvelto may be able to add more details about this.

Also, DMD works on Windows, so you might be able to use it.

Flags: needinfo?(n.nethercote)
Posted file Memory-report
Flags: needinfo?(gasofie)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #30)

You can use about:memory and save it, if you have time. Does the OOM always reproduce in the style system? or does it have different signatures?

Nick, do you know of a good way of finding out what's going on in a near-oom situation in Win32? In Linux I'd just use DMD or instrument the allocations in some other way but...

While trying to reproduce the oom crash and save the memory report, tab crashed again with CContext::ID3D11DeviceContext1_Map_<T> signature on latest beta 67.0b6 on Windows 10x32. Added the memory report json.

The memory report looks entirely normal to me. I don't see anything in there that would indicate an OOM is likely to occur.

Here two new reports for the [@ OOM | small ] crash:

Crashes occurred with same steps under Windows 10x32 on beta 67.0b7.

Ah, then I wouldn't worry about it, those stacks are different and the page is probably just requesting too much data for Win32 to process... I'm not particularly concerned as long as the OOM doesn't happen consistently in one place.

After looking at the Mozilla Crash Reports, there isn't a significant number of crashes for "CContext::ID3D11DeviceContext1_Map_<T>" signature on the latest Firefox builds and based on the above comment concerning the OOM crash I'm setting this issue on verified as fixed.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.