Closed Bug 1881650 Opened 1 year ago Closed 10 months ago

Poison crash in [@ mozilla::widget::DirectManipulationOwner::Destroy] from AutoWidgetPickerState::PickerState

Categories

(Core :: Widget: Win32, defect, P2)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
124 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox123 --- wontfix
firefox124 --- fixed
firefox125 --- fixed

People

(Reporter: mccr8, Assigned: rkraesig)

References

Details

(4 keywords, Whiteboard: [fixed by bug 1878568][adv-main124+r])

Crash Data

Attachments

(1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/54f6f573-f66a-4261-8a9a-c0ed30240222

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0  xul.dll  RefPtr<mozilla::widget::DManipEventHandler>::operator bool const  mfbt/RefPtr.h:338
0  xul.dll  mozilla::widget::DirectManipulationOwner::Destroy  widget/windows/DirectManipulationOwner.cpp:637
1  xul.dll  nsWindow::DestroyDirectManipulation  widget/windows/nsWindow.cpp:825
1  xul.dll  nsWindow::Destroy  widget/windows/nsWindow.cpp:1145
2  xul.dll  AutoWidgetPickerState::PickerState  widget/windows/nsFilePicker.cpp:74
2  xul.dll  AutoWidgetPickerState::~AutoWidgetPickerState  widget/windows/nsFilePicker.cpp:66
2  xul.dll  nsFilePicker::ShowFilePicker::<lambda_7>::~`lambda at /builds/worker/checkouts/gecko/widget/windows/nsFilePicker.cpp:574:7'  widget/windows/nsFilePicker.cpp:574
2  xul.dll  mozilla::Maybe<`lambda at /builds/worker/checkouts/gecko/widget/windows/nsFilePicker.cpp:574:7'>::reset  mfbt/Maybe.h:645
3  xul.dll  mozilla::MozPromise<mozilla::Maybe<mozilla::widget::filedialog::Results>, long, 1>::ThenValue<`lambda at /builds/worker/checkouts/gecko/widget/windows/nsFilePicker.cpp:574:7', `lambda at /builds/worker/checkouts/gecko/widget/windows/nsFilePicker.cpp:606:7'>::DoResolveOrRejectInternal  xpcom/threads/MozPromise.h:883
4  xul.dll  mozilla::MozPromise<nsTAutoStringN<char16_t, 64>, nsresult, 0>::ThenValueBase::ResolveOrRejectRunnable::Run  xpcom/threads/MozPromise.h:490

I see 7 crashes like this in the last 3 months, going back to 120.0.1. It might be worth keeping an eye on this in light of the async file picker work. I thought it might be related but I guess this goes back further? Or maybe it could even be fixed by it?

Maybe it is more like a sec-moderate if it seems likely that this requires that the user do something (like close the picker at a bad time, say).

Or maybe it could even be fixed by it?

Unfortunately, this seems formally unrelated: the async file picker went out in v122, and there are UAF-looking crashes both before and after that. (As well as nullptr-deref-crashes as far back as v117.)

Assignee: nobody → rkraesig
Severity: -- → S2
Priority: -- → P2

(In reply to Ray Kraesig [:rkraesig] from comment #2)

(As well as nullptr-deref-crashes as far back as v117.)

Strike that; those all appear to have been reported simultaneously by a single client, and are more likely a corrupt installation than this issue. The first version that we have any evidence of exhibiting this does seem to be 120.0.1, as :mccr8 stated.

Unfortunately it doesn't look like this is going to be quite as simple as giving something in the call-stack a RefPtr to work with. The 0xe5e5e5e5e5e5e5e5 value being referenced-through is nsWindow::mDmOwner... but the nsWindow itself is being accessed through AutoWidgetPickerState::mWindow, which is a sanely-valued RefPtr.

Hopefully, what's happening is that the nsFilePicker is being handed a pointer to an already-released window object. If so, either D198624 or D200546 might resolve it. (The latter causes a relevant object to be explicitly passed as an argument at PFilePicker construction time, rather than implicitly acquired by pointer-chasing a round of IPC later in PFilePicker::RecvOpen. The former transforms one of the pointers currently being chased-through into an explicit RefPtr.)

I'm not aware of anything that changed in v120 that could cause this — although if the root cause is indeed IPC-pointer-graph related, :nika might be. ni?ing appropriately.

Flags: needinfo?(nika)

(In reply to Ray Kraesig [:rkraesig] from comment #3)

Unfortunately it doesn't look like this is going to be quite as simple as giving something in the call-stack a RefPtr to work with. The 0xe5e5e5e5e5e5e5e5 value being referenced-through is nsWindow::mDmOwner... but the nsWindow itself is being accessed through AutoWidgetPickerState::mWindow, which is a sanely-valued RefPtr.

Hopefully, what's happening is that the nsFilePicker is being handed a pointer to an already-released window object.

One thing which might be interesting here is that prior to bug 1878568, the code in AutoWidgetPickerState would assume that the passed-in nsIWidget* object must be a nsWindow*, and would directly static_cast<nsWindow*>(aWidget) it https://searchfox.org/mozilla-central/rev/34579ee8b6445d22695484ab1033ee97cc9e13c7/widget/windows/nsFilePicker.cpp#64.

Scanning back up the stack, this widget is discovered by calling DOMWindowToWidget (https://searchfox.org/mozilla-central/rev/34579ee8b6445d22695484ab1033ee97cc9e13c7/widget/nsBaseFilePicker.cpp#169-170), which in turn fetches it from the docshell tree. I believe that in general, we should get a nsWindow from this call, but I'm not sure that it's guaranteed. For example, in headless mode, I believe we could end up with a mozilla::widget::HeadlessWidget() instead (though the crash report claims headless mode is disabled, so I don't think that's the cause), and in content processes I believe we would have a PuppetWidget (though again, the crash is in the parent process, so this is not relevant). I can't find any way in a quick scan that we could have the wrong type here, but I think that if we did encounter type confusion here, it could lead to a crash like this one.

If so, either D198624 or D200546 might resolve it. (The latter causes a relevant object to be explicitly passed as an argument at PFilePicker construction time, rather than implicitly acquired by pointer-chasing a round of IPC later in PFilePicker::RecvOpen. The former transforms one of the pointers currently being chased-through into an explicit RefPtr.)

Given that the nsWindow pointer seems to have a normal value, it seems somewhat unlikely that we got to it from an already-destroyed nsGlobalWindowOuter* pointer, so I'm not sure these changes will have done much here.

Flags: needinfo?(nika)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #4)

Scanning back up the stack, this widget is discovered by calling DOMWindowToWidget (https://searchfox.org/mozilla-central/rev/34579ee8b6445d22695484ab1033ee97cc9e13c7/widget/nsBaseFilePicker.cpp#169-170), which in turn fetches it from the docshell tree. I believe that in general, we should get a nsWindow from this call, but I'm not sure that it's guaranteed.

It's not, and that's been a problem in the past. That was a slapdash patch that's been bugging me since I wrote it; I only recently realized how to get rid of the static_cast.

Headless mode specifically should have been ruled out by that, but CreateWindowlessBrowser() can also create a PuppetWidget, and I'm not sure that some of the contexts it's called in are content-only...

Well, I already wanted to uplift bug 1878568. CC'ing :pascalc for reference.

I think the type mismatch is almost certainly the cause; but there's no smoking gun, and I'm not sure how it would arise or could be exploited. (Possibly via a malicious extension? I don't know whether extensions can supply code that will be executed within the HiddenXULWindow.) If it's exploitable, it's probably very easily badly exploitable, since in some of these crash reports there are what appear to be fragments of URLs as the "pointer" being dereferenced.

Assuming that's what it is (and I'm going to keep digging to see if I can remove the assumption), the first patch of bug 1878568 would fix it. ESR will need a slightly amended patch, as that one won't quite apply cleanly, and the surrounding patches aren't planned for uplift at all. (The offending static_cast has been in there in one form or another since the Firefox 3.x days.)

:dveditz: how does core-security want to move forward with this? In particular, is it relevant whether or not I can find an exploit mechanism?

Flags: needinfo?(dveditz)

I don't know whether extensions can supply code that will be executed within the HiddenXULWindow.

All extensions that run "background scripts" run them in an invisible page. According to the code you linked it is –a– HiddenXULWindow, but each extension gets it's own page inside the Extension Process so it's not –the– HiddenXULWindow we use for the browser itself. But I guess that doesn't really matter. Background scripts don't get a normal browser DOM, but they could do things that cause a picker to show up such as using the Download API

We don't need to find a specific exploit or mechanism, but we don't worry as much about bugs where we can't think of a plausible way for web content or extensions to trigger them. If we think this might be triggerable from web extensions, but not web pages, we can tag it with a sec-moderate rating and then just fix it.

I'm reluctant to fix this on ESR-115 with the current evidence. The older crashes in 117 were all null derefs and we're not seeing any ESR-115 crashes even though they are over-represented in crash data (not throttled, unlike Release Windows crashes). There could have been some other change that made this dangerous cast start causing problems in more recent builds that doesn't happen or is less likely on older releases. Your description of the changes your patch would need, and the lack of other patches it depends on makes me nervous because we get nearly zero testing on ESR-115 before shipping. We only do some security fix verification and basic usage testing. But fixing this on Beta would be good if you're not worried about regressions.

Flags: needinfo?(dveditz)
Attachment #9388531 - Attachment is obsolete: true

(In reply to Daniel Veditz [:dveditz] from comment #8)

All extensions that run "background scripts" run them in an invisible page. According to the code you linked it is –a– HiddenXULWindow, but each extension gets it's own page inside the Extension Process so it's not –the– HiddenXULWindow we use for the browser itself. But I guess that doesn't really matter.

Yup, that'd do it. Wait, no. That certainly sounds like it could be a feasible attack vector, but these crashes are all tagged as being from the parent process. :/

I'm reluctant to fix this on ESR-115 with the current evidence. The older crashes in 117 were all null derefs and we're not seeing any ESR-115 crashes even though they are over-represented in crash data (not throttled, unlike Release Windows crashes).

That's an excellent point. Given that, I'm much more sanguine about letting ESR be. (Abandoning attached patch.)

Your description of the changes your patch would need, and the lack of other patches it depends on makes me nervous because we get nearly zero testing on ESR-115 before shipping. We only do some security fix verification and basic usage testing.

It's not important now; but as it happens, the "changes [my] patch would need" are just a couple of compile-time sanity-checks which don't depend on the larger refactor of which they were originally a part. (I originally misremembered that there was also some internal refactoring, but there wasn't.) The abandoned patch was literally just a copy of the mozilla-central implementation of that class dropped into ESR.

But fixing this on Beta would be good if you're not worried about regressions.

Assuming we've correctly identified the cause — and I don't have any good reason to think we haven't — the fix has already ridden the trains to Beta as part of bug 1878568. If that's enough for you, we can call this RESOLVED.

NI in reference to #c9

Flags: needinfo?(dveditz)

We'll keep an eye out for crashes in Fx124, but that sounds good. I set the "depends on" link to show where it got fixed.

Although you think the code flaw existed on ESR, we didn't see crashes with this signature until Firefox 120. There may have been some other code or timing constraints that prevented these crashes so I'm tagging this as a regression (as is bug 1878568, btw).

We'll need to add this bug to the "roll-up" advisory of internally-found crashes fixed in Fx124

Group: layout-core-security → core-security-release
Depends on: 1878568
Flags: needinfo?(dveditz)
Keywords: regression
Whiteboard: [fixed by bug 1878568]
Whiteboard: [fixed by bug 1878568] → [fixed by bug 1878568][adv-124+r]
Whiteboard: [fixed by bug 1878568][adv-124+r] → [fixed by bug 1878568][adv-main124+r]

:rkraesig, since this bug is a regression, could you fill (if possible) the regressed_by field?
For more information, please visit BugBot documentation.

Flags: needinfo?(rkraesig)

Alas, I'd just be guessing.

Flags: needinfo?(rkraesig)

Ray, can we close out this bug? If not, any idea for the next steps here?

Flags: needinfo?(rkraesig)

I was about to say "yes" — but it looks like shortly after my last comment, we received one crash report for this from ESR 115:

https://crash-stats.mozilla.org/report/index/679d82ff-03e5-41db-acad-01b4e0240322

@dveditz, is that (perhaps along with the fact that the code in D203196 has been running on Nightly for the past few months to no ill effect) sufficient to change the calculus here?

Flags: needinfo?(rkraesig) → needinfo?(dveditz)

No, that's a near-null crash like the ones we saw in 117, unlike the UAF-looking ones that concerned us. I'm still comfortable asserting we don't need this fix on ESR-115.

In any case, we fixed the bug in central so we should mark this bug fixed. Even if there turned out to be 115 concerns we would still call this bug fixed and would address fixing it on 115 through tracking flags or filing a "we should backport bug 1881650" bug. I don't see any need to do either at this time.

Flags: needinfo?(dveditz)

o7

Closing, then, as RESOLVED FIXED.

Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: