Closed
Bug 1405818
Opened 7 years ago
Closed 6 years ago
Crash in mozilla::net::HttpChannelChild::Release
Categories
(Core :: Networking, defect, P1)
Tracking
()
RESOLVED
DUPLICATE
of bug 1401459
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox56 | --- | wontfix |
firefox57 | --- | wontfix |
firefox58 | --- | affected |
People
(Reporter: philipp, Assigned: schien)
Details
(4 keywords, Whiteboard: [necko-triaged])
Crash Data
Attachments
(3 obsolete files)
This bug was filed from the Socorro interface and is report bp-745012d3-4275-4f2c-8a81-282540170929. ============================================================= Crashing Thread (9), Name: Socket Thread Frame Module Signature Source 0 xul.dll mozilla::net::HttpChannelChild::Release() netwerk/protocol/http/HttpChannelChild.cpp:239 1 xul.dll RefPtr<mozilla::net::HttpChannelChild>::assign_assuming_AddRef(mozilla::net::HttpChannelChild*) obj-firefox/dist/include/mozilla/RefPtr.h:65 2 xul.dll mozilla::net::HttpBackgroundChannelChild::OnChannelClosed() netwerk/protocol/http/HttpBackgroundChannelChild.cpp:121 3 xul.dll mozilla::detail::RunnableMethodImpl<nsGlobalWindow* const, void ( nsGlobalWindow::*)(void), 1, 0>::Run() obj-firefox/dist/include/nsThreadUtils.h:1172 4 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446 5 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:480 6 xul.dll mozilla::net::nsSocketTransportService::Run() netwerk/base/nsSocketTransportService2.cpp:976 7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1446 8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:480 9 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:339 10 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 11 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 12 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:506 13 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 14 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 15 ucrtbase.dll _o__CIpow 16 kernel32.dll BaseThreadInitThunk 17 mozglue.dll patched_BaseThreadInitThunk mozglue/build/WindowsDllBlocklist.cpp:824 18 ntdll.dll __RtlUserThreadStart 19 ntdll.dll _RtlUserThreadStart crash reports in the content process with this signature are somewhat rising since firefox 56 on windows. a small portion of them seems to indicate a UAF siatuation, so marking the bug as security sensitive as a precaution.
Updated•7 years ago
|
Group: core-security → network-core-security
Comment 1•7 years ago
|
||
The UAF crashes with this signature seem to be _WRITE access violations. The bulk of the crashes with this signature are _READ violations. Might be different bugs.
Keywords: csectype-uaf,
sec-high
Comment 2•7 years ago
|
||
Jason, can you take a look or help find someone to investigate? Thanks.
Flags: needinfo?(jduell.mcbugs)
Comment 3•7 years ago
|
||
SC: this is a refcount bug for HttpChannelChild on the socket transport thread. I'm wondering if this is somehow related to PBackgroundHttp? It's hard to tell since we've only got a crashing Release() event.
Flags: needinfo?(schien)
Assignee | ||
Comment 4•7 years ago
|
||
Not sure about the exact scenario that triggers this refcount bug. By looking at the backtrace and source code, I do find some member variables not well protected against multi-thread access. https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/netwerk/ipc/ChannelEventQueue.h#157 https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/netwerk/protocol/http/HttpChannelChild.h#306
Assignee: nobody → schien
Flags: needinfo?(schien)
Assignee | ||
Comment 5•7 years ago
|
||
Tracing down the path of HttpChannelChild::Release, I also found HttpChannelChild::TrySendDeletingChannel has fundamental logic error and it is definitely introduced by my PBg-Http patch. https://searchfox.org/mozilla-central/rev/1a4a26905f923458679a59a4be1e455ebc53c333/netwerk/protocol/http/HttpChannelChild.cpp#3399
Updated•7 years ago
|
Flags: needinfo?(jduell.mcbugs)
Priority: -- → P1
Whiteboard: [necko-triaged]
Assignee | ||
Comment 6•7 years ago
|
||
Attachment #8917790 -
Flags: feedback?(honzab.moz)
Assignee | ||
Comment 7•7 years ago
|
||
Attachment #8917791 -
Flags: feedback?(honzab.moz)
Assignee | ||
Comment 8•7 years ago
|
||
Attachment #8917792 -
Flags: feedback?(honzab.moz)
Assignee | ||
Comment 9•7 years ago
|
||
These three patches addressed my find in comment #4 and comment #5. Still trying to figure out if any of these fixes have relation to the crash report but worth to get feedback first.
Comment 10•7 years ago
|
||
Comment on attachment 8917791 [details] [diff] [review] make HttpChannelChild::mKeptAlive atomic Review of attachment 8917791 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/http/HttpChannelChild.cpp @@ +253,5 @@ > // Normally we Send_delete in OnStopRequest, but when we need to retain the > // remote channel for security info IPDL itself holds 1 reference, so we > // Send_delete when refCnt==1. But if !mIPCOpen, then there's nobody to send > // to, so we fall through. > + if (count == 1 && mIPCOpen && mKeptAlive.compareExchange(true, false)) { so, this fix suggests that: object state: mRefCnt == 2, mIPCOpen == true, mKeptAlive == true T1: Release() { --mRefCnt -> 1 mKeptAlive ?== true count ?== 1 context switch T2: AddRef() { ++mRefCnt -> 2 } T2: Release() { --mRefCnt -> 1 mKeptAlive ?== true count ?== 1 mIPCOpen ?== true yes, all true! context switch: T1: continues Release() -> mIPCOpen ?== true yes, all true! call TrySendDeletingChannel(); } context switch: T2: continues Release() -> call TrySendDeletingChannel(); } that's possible, yes, but... if you fix this with making keptalive atomic and use cmpxchg, then there is something else wrong. since if would mean that access to the object and call to release() can be done on multiple threads w/o locking from above. That's bad on its own. Release() and AddRef() should always be considered as implicitly fully atomic because they must always be called under some lock provided by the caller, see [1]. hence, if this particular patch fixes the crashes, then we need to find the place where the locking is not used properly over access to this object. [1] https://www.janbambas.cz/illusion-atomic-reference-counting/
Attachment #8917791 -
Flags: feedback?(honzab.moz) → feedback-
Comment 11•7 years ago
|
||
Comment on attachment 8917790 [details] [diff] [review] make ChannelEventQueue::mOwner atomic Review of attachment 8917790 [details] [diff] [review]: ----------------------------------------------------------------- This doesn't sound like a proper solution, please see the feedback for the " make HttpChannelChild::mKeptAlive atomic " patch, I think the same applies for this patch...
Attachment #8917790 -
Flags: feedback?(honzab.moz) → feedback-
Comment 12•7 years ago
|
||
Comment on attachment 8917792 [details] [diff] [review] fix logic of reentrant HttpChannelChild::TrySendDeletingChannel Review of attachment 8917792 [details] [diff] [review]: ----------------------------------------------------------------- so, this is just a clean up after the " make HttpChannelChild::mKeptAlive atomic " patch?
Comment 13•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #10) > [1] https://www.janbambas.cz/illusion-atomic-reference-counting/ To elaborate a bit more on this, if there is a member or global that we can call both Release() and AddRef() on on two threads, we have the problem. I kinda think that having non-threadsafe refcounter and rather use a lock in case of such complicated Release() (and of course AddRef()) implementation may be a better solution here, both improving performance and safety (multiple atomics are expensive, and can be more expensive than one lock!) Also, from experience, when there are multiple state variables we change and make decisions with, it's better to make the changes and decisions totally atomic (with a lock). If we avoid the concurrent unlocked member access - if there is such, we are OK then.
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #13) > (In reply to Honza Bambas (:mayhemer) from comment #10) > > [1] https://www.janbambas.cz/illusion-atomic-reference-counting/ > > To elaborate a bit more on this, if there is a member or global that we can > call both Release() and AddRef() on on two threads, we have the problem. > > I kinda think that having non-threadsafe refcounter and rather use a lock in > case of such complicated Release() (and of course AddRef()) implementation > may be a better solution here, both improving performance and safety > (multiple atomics are expensive, and can be more expensive than one lock!) > Also, from experience, when there are multiple state variables we change and > make decisions with, it's better to make the changes and decisions totally > atomic (with a lock). > > If we avoid the concurrent unlocked member access - if there is such, we are > OK then. Thanks for sharing this. HttpChannelChild can be hold by many objects and each of them works on different threads: 1) caller/creator of HttpChannelChild, typically main thread 2) listner of HttpChannelChild, main thread, and target thread if ODA is enabled 3) IPDL, extra refcount hold/release on main thread 4) ChannelEventQueue, main thread, and target thread if ODA is enabled 5) HttpBackgroundChannelChild, main thread and STS thread Not all the references to the HttpChannelChild is controlled by Necko, therefore it would be nearly impossible to use lock on every AddRef/Release to a HttpChannelChild object. This leads to two choices: 1) make HttpChannelChild::AddRef/Release atomic, or 2) make them reliable for multi-thread access at the same time.
Comment 15•7 years ago
|
||
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #14) > (In reply to Honza Bambas (:mayhemer) from comment #13) > > (In reply to Honza Bambas (:mayhemer) from comment #10) > > > [1] https://www.janbambas.cz/illusion-atomic-reference-counting/ > > > > To elaborate a bit more on this, if there is a member or global that we can > > call both Release() and AddRef() on on two threads, we have the problem. > > > > I kinda think that having non-threadsafe refcounter and rather use a lock in > > case of such complicated Release() (and of course AddRef()) implementation > > may be a better solution here, both improving performance and safety > > (multiple atomics are expensive, and can be more expensive than one lock!) > > Also, from experience, when there are multiple state variables we change and > > make decisions with, it's better to make the changes and decisions totally > > atomic (with a lock). > > > > If we avoid the concurrent unlocked member access - if there is such, we are > > OK then. > Thanks for sharing this. HttpChannelChild can be hold by many objects and > each of them works on different threads: > 1) caller/creator of HttpChannelChild, typically main thread > 2) listner of HttpChannelChild, main thread, and target thread if ODA is > enabled > 3) IPDL, extra refcount hold/release on main thread > 4) ChannelEventQueue, main thread, and target thread if ODA is enabled > 5) HttpBackgroundChannelChild, main thread and STS thread > > Not all the references to the HttpChannelChild is controlled by Necko, > therefore it would be nearly impossible to use lock on every AddRef/Release > to a HttpChannelChild object. That is not what I mean. Let's explain on an example. A class nsClass1 has a RefPtr<HttpChannelChild> mPtr member. It's assigned an instance of a valid object. A class nsClass2 has it's own RefPtr member too, pointing to the same http child. So, there are two individual objects keeping ref to the single child channel. The channel's refcnt is then at least 2. If we access and addref() (or release()) nsClass1->mRawPtr on T1 w/o a global lock and also access and addref() (or release()) nsClass2->mRawPtr on T2, again w/o locking, there is no problem. If both release()s, only one of the releases will delete the object, correctly. If one release()s and the other addref()s, the object will still be live - the refcnt will never drop to 0. *The problem* exists when nsClass1->mRawPtr - a single instance of the *RefPtr* aka a single memory cell keeping the address to the http child object - is accessed on T1 and T2 w/o locking! Then there is the race as I describe it in the article. Hence, you don't need one giant lock to access HttpChannelChild references all over the code. You need to care if access to HttpChannelChild *within* a single class (object) that refers it is atomic (done under a lock specific to the class with the mPtr member). > This leads to two choices: 1) make > HttpChannelChild::AddRef/Release atomic, Won't fix *the problem.* > or 2) make them reliable for > multi-thread access at the same time. They mostly are, I believe. But simplification (use of a lock rather than few atomics) would be good, just to eliminate the problem happen to be in release() imperfect atomicity.
Updated•7 years ago
|
Comment 16•7 years ago
|
||
Comment on attachment 8917792 [details] [diff] [review] fix logic of reentrant HttpChannelChild::TrySendDeletingChannel comment 15 and above, dropping f to move forward here. also, feel free to argue with me that I'm wrong :) these things are complicated.
Attachment #8917792 -
Flags: feedback?(honzab.moz) → feedback-
Updated•7 years ago
|
Comment 17•6 years ago
|
||
Hi Shih-Chiang, are you still planning to continue investigating this sec bug?
Flags: needinfo?(schien)
Assignee | ||
Comment 18•6 years ago
|
||
This bug might be a dup to bug 1401459 with different symptom.
Flags: needinfo?(schien)
Assignee | ||
Comment 19•6 years ago
|
||
Double check the code and crash stack, it's highly likely that the patch for bug 1401459 will fix this issue as well. The root cause is in HttpChannelChild::Release that we implicitly use |this| outside of critical section. Two Release runs in different threads might interleaving like this: > T1 T2 >nsrefcnt count = --mRefCnt; //count = 1 > nsrefcnt count = --mRefCnt; //count = 0 > if (mKeptAlive && count == 1 && mIPCOpen) { ... } > if (count == 0) { > mRefCnt = 1; /* stabilize */ > delete this; > return 0; > } >if (mKeptAlive && count == 1 && mIPCOpen) {...} >// access `this` after `delete this` on T2 Thus we'll see UAF in HttpChannelChild::Release. After bug 1401459 landed, HttpChannelChild::Release will always runs on main thread. Therefore, Release will not interleaving with each other anymore.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Assignee | ||
Updated•6 years ago
|
Attachment #8917790 -
Attachment is obsolete: true
Assignee | ||
Updated•6 years ago
|
Attachment #8917791 -
Attachment is obsolete: true
Assignee | ||
Updated•6 years ago
|
Attachment #8917792 -
Attachment is obsolete: true
Updated•3 years ago
|
Group: network-core-security
You need to log in
before you can comment on or make changes to this bug.
Description
•