Closed Bug 1444851 Opened 8 years ago Closed 8 years ago

Crash in mozilla::a11y::HandlerProvider::BuildDynamicIA2Data

Categories

(Core :: IPC: MSCOM, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Future
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 + verified
firefox61 + verified

People

(Reporter: calixte, Assigned: Jamie)

Details

(4 keywords)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-27db9eb9-8e69-4e18-b7da-d00510180308. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::a11y::HandlerProvider::BuildDynamicIA2Data accessible/ipc/win/HandlerProvider.cpp:268 1 xul.dll mozilla::detail::RunnableMethodImpl<mozilla::storage::AsyncExecuteStatements*, nsresult xpcom/threads/nsThreadUtils.h:1200 2 xul.dll mozilla::mscom::MainThreadInvoker::MainThreadAPC ipc/mscom/MainThreadInvoker.cpp:187 3 ntdll.dll RtlDispatchAPC 4 ntdll.dll KiUserApcDispatch 5 ntdll.dll NtTestAlert 6 xul.dll mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:413 7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1040 8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:97 9 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run ipc/glue/MessagePump.cpp:301 ============================================================= There are 22 crashes (from 15 installations) in nightly 60 starting with buildid 20180307220100. The pushlog for this build is [1]. It could be related to patch [2]. Two crash addresses are 0xffffffffe5e5e5e5. :jamie, could you investigate please ? [1] https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=bccdc6842104&tochange=a859a4b2257e [2] https://hg.mozilla.org/mozilla-central/rev/4811c426205d
Flags: needinfo?
Flags: needinfo? → needinfo?(jteh)
I can't reproduce this, but I have a working theory. The crash occurs when trying to refresh the handler payload; i.e. an MTA thread is handling a call to HandlerProvider::Refresh and dispatches a call to HandlerProvider::BuildDynamicIA2Data to the main thread. We crash when trying to QI the interceptor target to IA2_3: 268 HRESULT hr = mTargetUnk.get()->QueryInterface(NEWEST_IA2_IID, 269 getter_AddRefs(target)); That suggests that |mTargetUnk| is invalid. (I guess it could suggest that |this| is invalid, but I think that's less likely.) Normally, it shouldn't be possible for an Interceptor (and thus its target) to be destroyed before a call to its HandlerProvider, since the client never holds a reference to a HandlerProvider without also holding a reference to the associated Interceptor. However, our use of CoDisconnectObject changes this a little. Here's what I think is happening: 1. A handler requests to refresh its payload. 2. The refresh gets queued to the main thread. 3. Before 2) can execute on the main thread, the underlying accessible gets shut down. 4. That causes CoDisconnectObject to be called on the Interceptor. 5. CoDisconnectObject in turn gets called on the HandlerProvider. 6. However, COM sees that a call to the HandlerProvider is still running, so it marks the HandlerProvider for release but doesn't yet release it. (See the remarks in the MSDN article for CoDisconnectObject.) 7. After that, the CoDisconnectObject call from 4) immediately releases all remote references to the Interceptor, thus causing it to be destroyed. 8. The refresh call queued in 2) then runs on the main thread. 9. It tries to QI on the target, which was destroyed in 7). 10. Crashety crashy crash. Aaron, does this sound plausible to you? As I understand it, HandlerProvider::mTargetUnk is a weak reference (since it's an InterceptorTargetPtr and we "We intentionally do not touch the refcounts of interceptor targets"). I guess we'd either have to: 1. Somehow make it a strong reference, but that means releasing on the main thread, which hurts; or 2. Somehow prevent the Interceptor from being destroyed while its HandlerProvider is still alive, but that creates a reference cycle. Any other thoughts?
Assignee: nobody → jteh
Flags: needinfo?(jteh) → needinfo?(aklotz)
Oh. Or I could just set mTargetUnk to nullptr in HandlerProvider::DisconnectHandlerRemotes and make sure to check for nullptr in relevant places.
Group: core-security → dom-core-security
(In reply to James Teh [:Jamie] from comment #1) > I can't reproduce this, but I have a working theory. <snip> > Aaron, does this sound plausible to you? Yeah, that makes perfect sense to me. (In reply to James Teh [:Jamie] from comment #2) > Oh. Or I could just set mTargetUnk to nullptr in > HandlerProvider::DisconnectHandlerRemotes and make sure to check for nullptr > in relevant places. That sounds to me like the most practical option.
Flags: needinfo?(aklotz)
If a handlerProvider call is pending on another thread, CoDisconnectObject won't release this HandlerProvider immediately. However, the interceptor and its target might be destroyed. MozReview-Commit-ID: 75SyPMIpit0
Attachment #8958663 - Flags: review?(aklotz)
Attachment #8958663 - Flags: review?(aklotz) → review+
Comment on attachment 8958663 [details] [diff] [review] a11y::HandlerProvider: Clear the interceptor target reference when disconnecting remotes. r=aklotz [Security approval request comment] How easily could an exploit be constructed based on the patch? I think it would be extremely difficult. An exploit would need to run in the (sandboxed) content process, coordinate very specific timing conditions between the parent process and content, and place executable code at an exact address. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? The comments discuss an object being destroyed before we expect it to be. If even that is too much, I'd obviously be happy to be advised as to how to proceed. Which older supported branches are affected by this flaw? Firefox 60 (beta). If not all supported branches, which bug introduced the flaw? Firefox 60 (beta). Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? No. However, the patch should apply cleanly as is. How likely is this patch to cause regressions; how much testing does it need? Very unlikely.
Attachment #8958663 - Flags: sec-approval?
Keywords: sec-high
Urg. I didn't find the security bug approval process document until just now. Based on that, I'm guessing I'm going to need to remove most comments from this patch/commit and perhaps associate it with bug 1434822 instead of this one. I'd very much appreciate guidance on this.
Daniel, please see comment #6.
Flags: needinfo?(dveditz)
(In reply to James Teh [:Jamie] from comment #5) > If not all supported branches, which bug introduced the flaw? > Firefox 60 (beta). Is this a regression from bug 1434822? (In reply to James Teh [:Jamie] from comment #6) > now. Based on that, I'm guessing I'm going to need to remove most comments > from this patch/commit and perhaps associate it with bug 1434822 instead of > this one. I'd very much appreciate guidance on this. The comments in the patch look OK to me, especially since the crash seems racy. Just make sure the prime checkin comment isn't that revealing -- and the one in your attachment looks fine. It's ok--best in fact--to mention this bug number since we want to be able to track back reasons for code change in the future.
Flags: needinfo?(dveditz)
Comment on attachment 8958663 [details] [diff] [review] a11y::HandlerProvider: Clear the interceptor target reference when disconnecting remotes. r=aklotz sec-approval=dveditz
Attachment #8958663 - Flags: sec-approval? → sec-approval+
(In reply to Daniel Veditz [:dveditz] from comment #8) > Is this a regression from bug 1434822? Yes. > It's ok--best in fact--to mention > this bug number since we want to be able to track back reasons for code > change in the future. Thanks for the clarification. This page: https://wiki.mozilla.org/Security/Bug_Approval_Process notes the following (emphasis mine): > • When sec-approval+ is given, commits should occur without specific mention of security, *security bugs*, or sec-approvers if possible. As above, if you can check-in with a cover bug in the same area to obfuscate that there is a security fix, that is ideal. Does the term "security bugs" here not relate to bug numbers and/or does this doc need an update? Sorry for so many n00b questions, but given this is my first security land, I'd prefer not to screw it up. :)
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Future
Comment on attachment 8958663 [details] [diff] [review] a11y::HandlerProvider: Clear the interceptor target reference when disconnecting remotes. r=aklotz Approval Request Comment [Feature/Bug causing the regression]: Bug 1434822. [User impact if declined]: Race condition which can cause a crash; possible security implication. [Is this code covered by automated tests?]: No; no platform a11y testing mechanism. [Has the fix been verified in Nightly?]: No. We should be able to verify it by looking at crash stats for builds with the fix after a few days. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None. [Is the change risky?]: No. [Why is the change risky/not risky?]: Straightforward correctness patch. [String changes made/needed]: None.
Attachment #8958663 - Flags: approval-mozilla-beta?
Comment on attachment 8958663 [details] [diff] [review] a11y::HandlerProvider: Clear the interceptor target reference when disconnecting remotes. r=aklotz fix a sec bug / crash in beta60
Attachment #8958663 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Group: dom-core-security → core-security-release
No more crashes in nightly and beta since the patch landed.
Status: RESOLVED → VERIFIED
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: