Crash in [@ mozilla::layers::APZTaskRunnable::QueueRequest]
Categories
(Core :: Panning and Zooming, defect)
Tracking
()
People
(Reporter: gsvelto, Assigned: tnikkel)
References
(Regression)
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [post-critsmash-triage][adv-main101+r])
Crash Data
Attachments
(1 file, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
tjr
:
approval-mozilla-beta+
tjr
:
approval-mozilla-esr91+
tjr
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/52e28f90-7912-4c4f-ac4f-5dc130220409
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll mozilla::layers::APZTaskRunnable::QueueRequest gfx/layers/apz/util/APZTaskRunnable.cpp:80
1 xul.dll mozilla::layers::APZChild::RecvRequestContentRepaint gfx/layers/ipc/APZChild.cpp:46
2 xul.dll mozilla::layers::PAPZChild::OnMessageReceived ipc/ipdl/PAPZChild.cpp:232
3 xul.dll mozilla::layers::PCompositorManagerChild::OnMessageReceived ipc/ipdl/PCompositorManagerChild.cpp:414
4 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1528
5 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:780
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1187
7 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:85
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:373
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:355
This appears to be a use-after-free crash, the crashing address is 0xffffffffffffffff
because we're hitting bug 1493342 but looking into the registers we clearly see the poison pattern.
Updated•3 years ago
|
Comment 1•3 years ago
|
||
The file in which the crash occurs, APZTaskRunnable.cpp, was added in bug 1730998, which landed in Firefox 95, and we have crash reports dating back to Firefox 95, so this seems like a regression introduced by that change.
Comment 2•3 years ago
|
||
Set release status flags based on info from the regressing bug 1730998
Comment 3•3 years ago
|
||
:hiro, since you are the author of the regressor, bug 1730998, could you take a look?
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Comment 4•3 years ago
|
||
Symptoms are sec-high and it's in the parent process which is worrying. The frequency is low happening by itself, but that doesn't mean it isn't easy to trigger if you know it's there.
It looks like the crash is on this line: mAPZTaskRunnable->QueueRequest(aRequest);
My guess, based on similar crashes in the past and not any knowledge of the code itself, is that something is spinning the event loop or otherwise causing mAPZTaskRunnable to get cleared and thus causing |this| to get destroyed.
Comment 6•3 years ago
|
||
Andrew, can you please point me out one of the bug numbers for the similar crashes? Now I have totally no idea how the crash happens. Thanks!
I'm not sure I have one off the top of my head, but the general issue is if you have a call like
mFoo->SomeMethod();
and SomeMethod spins the event loop, and it can run all sorts of weird stuff, including code that does mFoo = nullptr;
, which will cause mFoo to get destroyed, so this
inside SomeMethod now points to a dead object.
Assignee | ||
Comment 8•3 years ago
|
||
I don't know anything about the ipc implementation, but is it possible that we do something like send a delete msg (at normal priority) and then send a control priority msg (so it'd be handled before the delete msg I presume), and not handle that properly at some level?
Yeah, higher priority messages get handled first, I'm pretty sure. (Also there's no ordering guarantees across multiple top level protocols, though I'd guess that doesn't come into play here.)
Assignee | ||
Comment 10•3 years ago
•
|
||
I think I know why this is happening now. When we call mController->GetTopLevelPresShell() in APZTaskRunnable.cpp that seems like it'd just be returning a pointer, however it is not. ContentProcessController::GetTopLevelPresShell calls BrowserChild::GetTopLevelPresShell which calls BrowserChild::GetTopLevelDocument which what seems like a pure getter on nsIWebNavigation, however it will actually create a document if one does not exist
And I've run into that before where the creation of a doc from what seemed like a getter was causing one of my patches to fail. Anyways, that can get to nsDocShell::CreateAboutBlankContentViewer which can apparently kill the current docshell
I audited all the C++ callers of nsIWebNavigation::GetDocument (there aren't that many), nsWebBrowser::SaveDocument we can just hold a refptr to make it safe (I don't know much about that function so I don't want to change it functionally). The other caller is BrowserChild::GetTopLevelDocument. I checked all users of BrowserChild::GetTopLevelDocument and they all seem better to not create a document. So not too hard to fix up the nsIWebNavigation::GetDocument callers.
However the root of the problem is the PermitUnload call in nsDocShell::CreateAboutBlankContentViewer which could destroy the current docshell in that funciton except for the strong pointer we hold in that function, but then the docshell will probably just die when we return from the function. Looking at all callers of PermitUnload it seems there are several places that could be unsafe (there's quite a lot of auditing to do it properly).
Assignee | ||
Comment 11•3 years ago
|
||
This is the patch to fix the nsIWebNavigation::GetDocument callers.
Since there are potentially unfixed issues from the same root cause as I mentioned in comment 10, we may want to take steps to avoid making the issue too obvious.
Except for the toolkit/components/browser/nsWebBrowser.cpp change, the code could be landed from an open bug under the guise of simplifying/optimizing by not creating a document where one does not exist.
The toolkit/components/browser/nsWebBrowser.cpp is pretty clearly fixing some sort of security issue, but it's not super specific (someone reading the code was likely skip over the GetDocument call thinking it's just a simple getter, and perhaps focus on |mPersist->SaveDocument| call below which seems much more suspicious).
I'm not sure if we need to take that level of caution here or not.
Assignee | ||
Comment 12•3 years ago
|
||
ni for question in comment 11. Do we want to split this patch out as described in comment 11 or do we not need that level of caution?
Personally I don't think it is a huge deal, given that we don't know that these are even really exploitable. It doesn't seem to give much more information than somebody doing a regexp search on the codebase for things matching mFoo->, but maybe Tom has a different opinion.
Comment 14•3 years ago
|
||
Should we just use GetExtantDoc
instead of GetDocument
instead?
Comment 15•3 years ago
|
||
No, I think the normal sec-approval landing process is fine. We can land it the week of the 9th for example.
Comment 16•3 years ago
|
||
This seems fine to land as-is. The strongThis
looks like a typical UAF-prevention fix, and the first bits look vaguely like null de-ref prevention. Certainly don't hint at the doc-creation side-effects you're avoiding.
Assignee | ||
Comment 17•3 years ago
|
||
Sounds good. I'll get it into review and into the normal process.
Assignee | ||
Comment 18•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #14)
Should we just use
GetExtantDoc
instead ofGetDocument
instead?
Yes, I think we should use GetExtantDoc instead of GetDocument (that's what the patch does for the callers I'm comfortable changing behaviour of, which is most of them, the C++ ones at least, js callers should be safe at least but still side-effect causing). It would be great if we could get rid of the side-effect causing GetDocument altogether, but there might be places that depend on it. I'm all for followups to remove this foot-gun or make it harder for this to happen again.
Assignee | ||
Comment 19•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 20•3 years ago
|
||
Comment on attachment 9273737 [details]
Bug 1764878. r?emilio
Security Approval Request
- How easily could an exploit be constructed based on the patch?: the root change of switching a GetDocument function from that one creates a document if one does not exist to one that just gets any already existing doc is not immediately obvious, so this is less obvious then most sec patches
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: trivial
- How likely is this patch to cause regressions; how much testing does it need?: the core of the patch is changing a GetDocument function from that one creates a document (and thus can destroy things) if one does not exist to one that just gets any already existing doc. I audited all of the callers and they all seem like that is the behaviour they want, and I didn't see any way that removing that side effect would cause problems. There's still the possibility that the side effect of that document had some ripple effect through a long chain of events that was beneficial to us somehow that is pretty much impossible to reject with 100% certainty.
- Is Android affected?: Yes
Comment 21•3 years ago
|
||
Comment on attachment 9273737 [details]
Bug 1764878. r?emilio
Approved to land and uplift
![]() |
||
Comment 22•3 years ago
|
||
Landed: https://hg.mozilla.org/integration/autoland/rev/d1f6439e50294e448388dab783df771dcd24d49a
Backed out for causing bustage on nsWebBrowser.cpp:
https://hg.mozilla.org/integration/autoland/rev/efc3eea003f84930302234f2d83974ccacbb3b80
Push with bustage: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Ctestfailed%2Cbusted%2Cexception%2Cretry&revision=d1f6439e50294e448388dab783df771dcd24d49a
Build log: https://treeherder.mozilla.org/logviewer?job_id=376764933&repo=autoland
toolkit/components/browser/nsWebBrowser.cpp:813:3: error: Unused "kungFuDeathGrip" 'RefPtr<nsIWebBrowserPersist>' objects constructed from temporary values are prohibited
![]() |
||
Comment 24•3 years ago
|
||
r=emilio
https://hg.mozilla.org/integration/autoland/rev/48b6b2f03b266067a69c96882c05b4b95e60a347
https://hg.mozilla.org/mozilla-central/rev/48b6b2f03b26
Comment 25•3 years ago
|
||
uplift |
Landed for 101.0b3.
https://hg.mozilla.org/releases/mozilla-beta/rev/937b44ee4d8f
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Description
•