Closed Bug 1820587 Opened 2 years ago Closed 2 years ago

Crash in [@ gfxContext::GetAzureDeviceSpaceClipBounds]

Categories

(Core :: Graphics, defect)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
117 Branch
Tracking Status
firefox-esr102 116+ fixed
firefox-esr115 116+ fixed
firefox115 --- wontfix
firefox116 --- fixed
firefox117 --- fixed

People

(Reporter: diannaS, Assigned: tnikkel)

References

(Blocks 1 open bug, Regression)

Details

(6 keywords, Whiteboard: [fixed by 1842325][tbird crash][firefox crash][adv-main116+r][adv-ESR115.1+r][adv-ESR102.14+r])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/ea9ecc55-426c-4dea-a04b-5449d0230306

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  gfxContext::GetAzureDeviceSpaceClipBounds const  gfx/thebes/gfxContext.cpp:821
0  xul.dll  gfxContext::GetClipExtents const  gfx/thebes/gfxContext.cpp:510
1  xul.dll  mozilla::layers::WebRenderLayerManager::MakeSnapshotIfRequired  gfx/layers/wr/WebRenderLayerManager.cpp:530
2  xul.dll  mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer  gfx/layers/wr/WebRenderLayerManager.cpp:485
3  xul.dll  mozilla::nsDisplayList::PaintRoot  layout/painting/nsDisplayList.cpp:2300
4  xul.dll  nsLayoutUtils::PaintFrame  layout/base/nsLayoutUtils.cpp:3413
5  xul.dll  mozilla::PresShell::PaintInternal  layout/base/PresShell.cpp:6430
6  xul.dll  nsViewManager::ProcessPendingUpdatesPaint  view/nsViewManager.cpp:433
7  xul.dll  nsViewManager::ProcessPendingUpdatesForView  view/nsViewManager.cpp:368
8  xul.dll  nsViewManager::ProcessPendingUpdates  view/nsViewManager.cpp:941

Timothy, any ideas what could cause this?

Severity: -- → S3
Flags: needinfo?(tnikkel)

Hmm, I don't really see anything.

Flags: needinfo?(tnikkel)

For this Thunderbird user bp-5a679e96-4786-4a5c-b8d2-799670230319, it was simply a startup crash.

Whiteboard: [tbird crash]
See Also: → 1824568

About 10 out of 44 crashes I see look like they are on poison-ish values, like this one: bp-6dc56e90-80d2-49bb-955e-dab940230412

Another 10 of the crashes are specifically on the value 0x5441554156415791, and they don't seem to be all from the same install time either, so that's odd.

bp-9f4e58a8-e13d-411e-ab48-5f34a0230412
bp-546afb95-aea1-422a-8513-2594d0230412

Given the wildptrs and clearish UAFs, making a sec bug

Group: gfx-core-security
Comment 4 is private: false
Keywords: sec-high

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:bhood, could you consider increasing the severity of this security bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(bhood)
Severity: S3 → S2
Flags: needinfo?(bhood)

From the correlations it looks like it mostly happens after we run into driver issues.
Also, it is oddly dominated by AMD GPUs and CPUs (although there are a few Intel CPUs and GPUs in the lot).

(98.48% in signature vs 02.99% overall) GFX_ERROR "Killing GPU process due to IPC reply timeout" = true [100.0% vs 13.54% if adapter_device_id = 0x15d8]
(98.48% in signature vs 03.04% overall) GFX_ERROR "timeout" = true [100.0% vs 13.54% if adapter_device_id = 0x15d8]
(98.48% in signature vs 14.27% overall) adapter_vendor_id = 0x1002
Assignee: nobody → nical.bugzilla
See Also: → 1837198

Bug 1837198 is definitely this bug. Summing up what I found there: the gfxContext object that this points to has been overwritten with data that most likely belongs to another object. I haven't checked, but chances are that the object was freed and a similarly sized object was written on top of it.

Blocks: 1837198
See Also: 1837198

[@ std::_Func_class<T>::_Tidy | std::_Func_class<T>::~_Func_class | mozilla::ManagedPostRefreshObserver::~ManagedPostRefreshObserver] has this in the stack for Firefox 114.0.2, e.g. bp-4c9c8f20-662a-42df-8994-aaa420230622.

Top 10 frames of crashing thread:

0  xul.dll  std::_Func_class<mozilla::ManagedPostRefreshObserver::Unregister, bool>::_Tidy  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/functional:1391
0  xul.dll  std::_Func_class<mozilla::ManagedPostRefreshObserver::Unregister, bool>::~_Func_class  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/functional:1271
0  xul.dll  mozilla::ManagedPostRefreshObserver::~ManagedPostRefreshObserver  layout/base/nsRefreshObservers.cpp:19
0  xul.dll  mozilla::ManagedPostRefreshObserver::~ManagedPostRefreshObserver  layout/base/nsRefreshObservers.cpp:19
1  xul.dll  gfxContext::GetAzureDeviceSpaceClipBounds const  gfx/thebes/gfxContext.cpp:579
1  xul.dll  gfxContext::GetClipExtents const  gfx/thebes/gfxContext.cpp:348
2  xul.dll  mozilla::layers::WebRenderLayerManager::MakeSnapshotIfRequired  gfx/layers/wr/WebRenderLayerManager.cpp:532
3  xul.dll  mozilla::layers::WebRenderLayerManager::EndTransactionWithoutLayer  gfx/layers/wr/WebRenderLayerManager.cpp:487
4  xul.dll  mozilla::nsDisplayList::PaintRoot  layout/painting/nsDisplayList.cpp:2342
5  xul.dll  nsLayoutUtils::PaintFrame  layout/base/nsLayoutUtils.cpp:3428
Crash Signature: [@ gfxContext::GetAzureDeviceSpaceClipBounds] → [@ gfxContext::GetAzureDeviceSpaceClipBounds] [@ std::_Func_class<T>::_Tidy | std::_Func_class<T>::~_Func_class | mozilla::ManagedPostRefreshObserver::~ManagedPostRefreshObserver]

Tim, could you take a look?

Assignee: nical.bugzilla → tnikkel
Flags: needinfo?(tnikkel)
Depends on: 1842325

The crashes happen when accessing WebRenderLayerManager::mTarget. mTarget is usually null for the normal painting-to-screen code path. It's only non-null if we are being asked to render to some other surface (like drawWindow, for example). mTarget gets set at the start of a transaction and cleared at the end of the transaction. In all the crashes I looked at we are on the normal painting path, so mTarget should be null for the entire transaction. So either (1) the mTarget pointer is getting overwritten with another pointer during the transaction or (2) mTarget is not getting properly cleared at the end of a previous transaction. (1) is basically impossible to find, as we have no info from the crashes about when that overwriting might be happening. However, I can definitely see how (2) could be happening. I filed bug 1842325 with a patch to make (2) impossible. We can land that and hopefully it makes these crashes stop happening.
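To illustrate how case (2) can be ruled out in general, here is a minimal sketch of a scope guard that clears the pointer on every exit path. This is not the patch from bug 1842325 (its contents are not shown in this bug); LayerManagerSketch, AutoTarget, DrawTarget, PaintTo and HasTarget are hypothetical names made up for the example.

```cpp
// Hypothetical stand-in for the real render target type.
struct DrawTarget {};

class LayerManagerSketch {
 public:
  // RAII guard: sets mTarget on construction and always clears it on
  // destruction, so no return path can leave a stale pointer behind.
  class AutoTarget {
   public:
    AutoTarget(LayerManagerSketch& aManager, DrawTarget* aTarget)
        : mManager(aManager) {
      mManager.mTarget = aTarget;
    }
    ~AutoTarget() { mManager.mTarget = nullptr; }
    AutoTarget(const AutoTarget&) = delete;
    AutoTarget& operator=(const AutoTarget&) = delete;

   private:
    LayerManagerSketch& mManager;
  };

  // Models a transaction. aBailEarly simulates an early-return path that
  // would skip a manual "mTarget = nullptr" placed at the end.
  bool PaintTo(DrawTarget* aTarget, bool aBailEarly) {
    AutoTarget guard(*this, aTarget);
    if (aBailEarly) {
      return false;  // guard still clears mTarget here
    }
    // ... painting would happen here ...
    return true;
  }

  bool HasTarget() const { return mTarget != nullptr; }

 private:
  DrawTarget* mTarget = nullptr;  // raw, non-owning pointer
};
```

With a guard like this, HasTarget() is false after PaintTo returns, whether or not the transaction bailed out early.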

Flags: needinfo?(tnikkel)

If we ignore the Thunderbird 102 crashes, this is essentially a regression in Firefox 112; there were single digit crashes before that, none of which looked memory-poisoned.

(In reply to Andrew McCreight [:mccr8] from comment #4)

Another 10 of the crashes are specifically on the value 0x5441554156415791, and they don't seem to be all from the same install time either, so that's odd.

There are small clumps of "same specific address" that do look odd. Maybe data values that were interpreted as pointers? But frame-poisoning was supposed to prevent that from happening. I guess this isn't a simple use-after-free of the frame object but rather something stomping the gfxContext that's referencing them?

The value above could be interpreted as strange text: TAUAVAWA. The last "A" only if I take the liberty of assuming there's a +0x50 offset to a base address. Arbitrary to fit the pattern, but also there were a number of other crashing addresses that ended in "50" so maybe? Might also be part of an array of int16 counting up, interpreted as an address? 0x4154, 0x4155, 0x4156, 0x4157 (again, taking the liberty of assuming a +0x50 offset).

There are also 5 or 6 that crash with what looks like a frame-poisoning address: 0x7ffffffff0de7fff

Whiteboard: [tbird crash] → [tbird crash][firefox crash]

Confirmed the "+50" guess: the crashing instruction for the ones I checked was mov rax, qword [rax + 0x50]
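The arithmetic behind that reading can be sketched as follows. BaseFromFaultAddress and BytesAsText are hypothetical helper names for this example; the only inputs taken from the crashes are the address 0x5441554156415791 and the 0x50 displacement in the faulting instruction.

```cpp
#include <cstdint>
#include <string>

// Subtract the member offset used by the faulting load
// (mov rax, qword [rax + 0x50]) to recover the bogus base pointer.
constexpr uint64_t BaseFromFaultAddress(uint64_t aFaultAddress,
                                        uint64_t aOffset) {
  return aFaultAddress - aOffset;
}

// Render the eight bytes of a value as text, most significant byte first,
// i.e. in the order the digits appear when the address is written in hex.
// Non-printable bytes are shown as '.'.
std::string BytesAsText(uint64_t aValue) {
  std::string text;
  for (int shift = 56; shift >= 0; shift -= 8) {
    unsigned char byte = static_cast<unsigned char>((aValue >> shift) & 0xFF);
    text.push_back((byte >= 0x20 && byte < 0x7F) ? static_cast<char>(byte)
                                                 : '.');
  }
  return text;
}
```

Subtracting 0x50 from the crash address gives 0x5441554156415741, and passing that to BytesAsText yields the "TAUAVAWA" string guessed at above.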

(In reply to Daniel Veditz [:dveditz] from comment #13)

If we ignore the Thunderbird 102 crashes, this is essentially a regression in Firefox 112; there were single digit crashes before that, none of which looked memory-poisoned.

I took a quick look at the change log of files involved here around that time and https://hg.mozilla.org/integration/autoland/rev/6525cdd895fc sticks out. That changeset made the mTarget pointer a raw ptr, whereas it had been a refptr before.
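As a rough illustration of why that kind of change matters, here is a sketch using std::shared_ptr as a stand-in for a refptr. This is not Gecko code; Context, OwningHolder, RawHolder and both functions are hypothetical names for the example.

```cpp
#include <memory>

// Hypothetical stand-in for the object a layer manager would point at.
struct Context {
  int clipWidth = 100;
};

// Holder with an owning (reference-counted) pointer.
struct OwningHolder {
  std::shared_ptr<Context> mTarget;
};

// Holder with a raw, non-owning pointer.
struct RawHolder {
  Context* mTarget = nullptr;
};

// The owning holder keeps the object alive after the caller drops its
// own reference, so a forgotten "clear" is not a memory-safety problem.
bool OwningHolderKeepsTargetAlive() {
  auto ctx = std::make_shared<Context>();
  std::weak_ptr<Context> probe = ctx;
  OwningHolder holder;
  holder.mTarget = ctx;
  ctx.reset();
  return !probe.expired();
}

// With a raw pointer, the object is destroyed as soon as the caller drops
// its reference. holder.mTarget still stores the old address, and any later
// dereference would be a use-after-free (deliberately not done here).
bool RawHolderLeavesTargetDangling() {
  auto ctx = std::make_shared<Context>();
  std::weak_ptr<Context> probe = ctx;
  RawHolder holder;
  holder.mTarget = ctx.get();
  ctx.reset();
  return probe.expired();
}
```

Both functions return true: the first because the holder's reference keeps the object alive, the second because nothing does.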

Regressed by: 1815404

Set release status flags based on info from the regressing bug 1815404

We'll check back next week to see if bug 1842325 landing stopped the crashes.

Looks like no crashes since b5 (when bug 1842325 landed).

Fixed by bug 1842325!

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Group: gfx-core-security → core-security-release
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Target Milestone: --- → 117 Branch
Whiteboard: [tbird crash][firefox crash] → [tbird crash][firefox crash][adv-main116+r][adv-ESR115.1+r][adv-ESR102.14+r]
Whiteboard: [tbird crash][firefox crash][adv-main116+r][adv-ESR115.1+r][adv-ESR102.14+r] → [fixed by 1842325][tbird crash][firefox crash][adv-main116+r][adv-ESR115.1+r][adv-ESR102.14+r]

(In reply to Daniel Veditz [:dveditz] from comment #13)

(In reply to Andrew McCreight [:mccr8] from comment #4)

Another 10 of the crashes are specifically on the value 0x5441554156415791, and they don't seem to be all from the same install time either, so that's odd.

The value above could be interpreted as strange text: TAUAVAWA.

Probably doesn't matter at this point, but that looks like an amd64 function prologue: A (0x41) is a REX prefix, and P through W (0x50 to 0x57) are register push instructions. AWAVAUAT pushes r15, r14, r13, r12 (and it shows up a lot in strings/hexdump output).
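A small sketch of that decoding, assuming the value was loaded little-endian so its lowest byte comes first in memory. DecodePushes is a hypothetical helper, and it only understands the two-byte "0x41, 0x50 to 0x57" form (REX.B followed by push), nothing else from the instruction set.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Walk the eight bytes of aValue in memory order (lowest byte first) and
// decode each pair as "REX.B prefix + push r8..r15". Pairs that do not
// match that pattern are reported as "?".
std::vector<std::string> DecodePushes(uint64_t aValue) {
  std::vector<std::string> decoded;
  for (int i = 0; i < 8; i += 2) {
    uint8_t prefix = static_cast<uint8_t>((aValue >> (8 * i)) & 0xFF);
    uint8_t opcode = static_cast<uint8_t>((aValue >> (8 * (i + 1))) & 0xFF);
    if (prefix != 0x41 || opcode < 0x50 || opcode > 0x57) {
      decoded.push_back("?");
      continue;
    }
    // With REX.B set, opcode 0x50 + n pushes register r(8 + n).
    decoded.push_back("push r" + std::to_string(8 + (opcode - 0x50)));
  }
  return decoded;
}
```

For the base value 0x5441554156415741 (the crash address minus 0x50) this returns push r15, push r14, push r13, push r12, matching the AWAVAUAT reading.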

Group: core-security-release