Closed Bug 1897322 Opened 1 year ago Closed 1 year ago

Crash in [@ nsPresContext::GetDocShell]

Categories

(Core :: Layout, defect)

defect

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox126 --- unaffected
firefox127 --- unaffected
firefox128 + fixed

People

(Reporter: aryx, Assigned: emilio)

References

(Regression)

Details

(4 keywords)

Crash Data

Attachments

(1 file)

18 crashes from 10 installs of Firefox 128.0a1 on multiple operating systems, and that's with updates turned off for Windows for two days. First reported build ID is 20240514215634, no reports for 127.0a1. From bug 1896586 or bug 1895870?

Crash report: https://crash-stats.mozilla.org/report/index/7b003d3d-af1c-457c-8d7a-2e7330240516

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames:

0  xul.dll  RefPtr<mozilla::dom::Document>::get() const  mfbt/RefPtr.h:314
0  xul.dll  RefPtr<mozilla::dom::Document>::operator->() const  mfbt/RefPtr.h:344
0  xul.dll  nsPresContext::GetDocShell() const  layout/base/nsPresContext.cpp:1652
0  xul.dll  mozilla::PresShell::DoFlushPendingNotifications(mozilla::ChangesToFlush)  layout/base/PresShell.cpp:4283
1  xul.dll  mozilla::PresShell::FlushPendingNotifications(mozilla::ChangesToFlush)  layout/base/PresShell.h:1464
1  xul.dll  nsRefreshDriver::FlushLayoutOnPendingDocsAndFixUpFocus()  layout/base/nsRefreshDriver.cpp:2240
1  xul.dll  nsRefreshDriver::Tick(mozilla::layers::BaseTransactionId<mozilla::VsyncIdType...  layout/base/nsRefreshDriver.cpp:2732
2  xul.dll  mozilla::RefreshDriverTimer::TickDriver(nsRefreshDriver*, mozilla::layers::Ba...  layout/base/nsRefreshDriver.cpp:367
2  xul.dll  mozilla::RefreshDriverTimer::TickRefreshDrivers(mozilla::layers::BaseTransact...  layout/base/nsRefreshDriver.cpp:345
3  xul.dll  mozilla::RefreshDriverTimer::Tick(mozilla::layers::BaseTransactionId<mozilla:...  layout/base/nsRefreshDriver.cpp:361
Flags: needinfo?(emilio)

This majority of these crashes are on the jemalloc poison value, so I'm going to hide this.
For example: bp-6fd31c88-be1c-41e7-91db-16e0f0240516

Group: layout-core-security

Bug 1896024 is another UAF with refresh driver stuff on the stack, though that's supposed to be Android-only. It has a patch that just got merged to m-c today.

See Also: → 1896024

This is likely a regression from bug 1896593. FlushLayoutOnPendingDocsAndFixUpFocus is in the backtrace, and that function was added in bug 1896593 part 1.

Regressed by: 1896593
Severity: -- → S2

Looking in Visual Studio: when we reach PresShell::DoFlushPendingNotifications, we have "this" pointing to a PresShell object that's filled with the poison value:

this = 0x000001b4d8a3c000 mozilla::PresShell *
[...]
+		mUpdateApproximateFrameVisibilityEvent	{mEvent={mRawPtr=0xe5e5e5e5e5e5e5e5 {...} } }	nsRevocableEventPtr<nsRunnableMethod<mozilla::PresShell,void,1,0>>
[...]
+		mSelection	{mRawPtr=0xe5e5e5e5e5e5e5e5 {mRefCnt={mRefCntAndFlags=??? } mDomSelections=0xe5e5e5e5e5e5e5ed {{...}, ...} ...} }	RefPtr<nsFrameSelection>

If I move up the stack, the next two frames are marked as [Inline Frame], notably PresShell::FlushPendingNotifications and nsRefreshDriver::FlushLayoutOnPendingDocsAndFixUpFocus -- so really, moving up puts me in the next non-inline stack frame which is nsRefreshDriver::Tick.

There, I have a valid-looking this pointer and a valid-looking presShell (at a different address from the poisoned one discussed above):

this = 0x000001b4d5acaf00  nsRefreshDriver *
pressShell.mRawPtr = 0x000001b4d5c12000  mozilla::PresShell *

So the presShell local-var in nsRefreshDriver::Tick is still "good", but one of the ones we iterate over in nsRefreshDriver::FlushLayoutOnPendingDocsAndFixUpFocus ends up "bad", even though we add a strong stack reference to it.

FlushLayoutOnPendingDocsAndFixUpFocus does traverse an array of PresShells in a loop, and it only adds a strong reference for the one we're currently iterating over. I wonder if one iteration of the loop somehow blows away the remaining PresShell objects, and then we're hosed when we get to the next loop iteration?

If that's what's happening, then an extremely-simple paper-over might be to change the array to hold RefPtr instances here:
https://searchfox.org/mozilla-central/rev/a18a7c526cf3c531f2fc24db4f0dffbc16290a7e/layout/base/nsRefreshDriver.cpp#2223-2224

void nsRefreshDriver::FlushLayoutOnPendingDocsAndFixUpFocus() {
  AutoTArray<PresShell*, 16> observers;

so i.e. switch to...

AutoTArray<RefPtr<PresShell>, 16>

...or something along those lines. I'm not sure if that might introduce other problems, though. But I think we do need to somehow account for the possibility that one loop iteration might blow away the remaining still-to-be-iterated PresShells...

I think what's supposed to make this safe is the RemoveElement check below that tries to guarantee that we're not removed from the observer list already.

But yeah that's a bit sketchy anyways and is only marginally more efficient.

Blocks: 1897329
Assignee: nobody → emilio
Status: NEW → ASSIGNED
Flags: needinfo?(emilio)
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0328321272d7 More consistently deal with pres shell style observers. r=smaug
Attachment #9402408 - Attachment description: Bug 1897322 - More consistently deal with pres shell style observers. r=smaug,#layout → Bug 1897322 - More consistently deal with pres shell style observers. r=smaug
Backout by sstanca@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9d46b5eb4920 Backed out changeset 0328321272d7 for causing multiple failures. CLOSED TREE
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/152e3df2bd20 More consistently deal with pres shell style observers. r=smaug
Group: layout-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch

I found a similar-looking crash on Fenix Nightly, again all on poison values. The signature is [@ mozilla::dom::Document::GetWindow ]. Here's an example crash: bp-777b647b-f6a4-413e-a70e-135430240517. Hopefully this will get fixed by the same patch. We'll have to keep an eye out for that.

Crash Signature: [@ nsPresContext::GetDocShell] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ]

Another possible variant: [@ mozilla::WeakPtr<T>::get ] bp-cb149673-675a-42eb-8646-612b00240517 There's only one instance of that crash so I won't add the signature.

See Also: → 1897571
Duplicate of this bug: 1897571
See Also: 1897571

There is a test case in bug 1897571, if you'd like a crash test.

Duplicate of this bug: 1897329

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts]
Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts]
Duplicate of this bug: 1897626

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get]

As of right now, there are zero crashes in the myriad signatures here after the patch landed, according to the little chart in Bugzilla.

Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get]
Duplicate of this bug: 1897743

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] [@ nsPIDOMWindowInner::GetOuterWindow]
Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] [@ nsPIDOMWindowInner::GetOuterWindow] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] [@ nsPIDOMWindowInner::GetOuterWindow]
Duplicate of this bug: 1898015

More signatures.

Crash Signature: [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] [@ nsPIDOMWindowInner::GetOuterWindow] → [@ nsPresContext::GetDocShell] [@ mozilla::dom::Document::GetWindow ] [@ mozilla::dom::Document::GetFonts] [@ mozilla::detail::WeakReference::get] [@ nsPIDOMWindowInner::GetOuterWindow] [@ mozilla::WeakPtr<T>::get] [@ mozilla::dom::Document::GetUnret…
Duplicate of this bug: 1898122

Copying crash signatures from duplicate bugs.

Crash Signature: mozilla::dom::Document::GetUnretargetedFocusedContent] → mozilla::dom::Document::GetUnretargetedFocusedContent] [@ nsRefreshDriver::AddStyleFlushObserver]
Crash Signature: mozilla::dom::Document::GetUnretargetedFocusedContent] [@ nsRefreshDriver::AddStyleFlushObserver] → mozilla::dom::Document::GetUnretargetedFocusedContent] [@ nsRefreshDriver::AddStyleFlushObserver]

The patch landed in nightly and beta is affected.
:emilio, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox127 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(emilio)

I don't the existence of this signature on other branches is relevant. This specific spike had a clear regressor that was Nightly only.

Duplicate of this bug: 1898312

Should this have had sec-review and approval flags set & approval+ before landing?

Emilio, can you confirm that 127 and earlier are unaffected, and have we identified the regressor?

Flags: needinfo?(emilio)

Right, the regressor was bug 1896593.

Flags: needinfo?(emilio)
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: