Closed
Bug 1368273
Opened 7 years ago
Closed 6 years ago
Crash in mozilla::dom::cache::StreamControl::CloseAllReadStreams
Categories
(Core :: DOM: Core & HTML, defect, P2)
Tracking
()
People
(Reporter: philipp, Assigned: mdaly)
References
Details
(4 keywords)
Crash Data
This bug was filed from the Socorro interface and is report bp-40a72e0d-a564-4079-bbc0-f5a310170525. ============================================================= Crashing Thread (30) Frame Module Signature Source 0 xul.dll mozilla::dom::cache::StreamControl::CloseAllReadStreams() dom/cache/StreamControl.cpp:72 1 xul.dll mozilla::dom::cache::StreamList::Cancel() dom/cache/StreamList.cpp:149 2 xul.dll mozilla::dom::cache::Context::CancelAll() dom/cache/Context.cpp:899 3 xul.dll mozilla::dom::cache::Manager::Shutdown() dom/cache/Manager.cpp:1815 4 xul.dll `anonymous namespace'::CacheQuotaClient::ShutdownWorkThreads dom/cache/QuotaClient.cpp:211 5 xul.dll mozilla::dom::quota::QuotaManager::Shutdown() dom/quota/ActorsParent.cpp:3217 6 xul.dll mozilla::dom::quota::QuotaManager::ShutdownRunnable::Run() dom/quota/ActorsParent.cpp:2439 7 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1240 8 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/glue/nsThreadUtils.cpp:390 9 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:368 10 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231 11 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211 12 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:490 13 nss3.dll _PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 14 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 15 ucrtbase.dll _o__CIpow 16 kernel32.dll BaseThreadInitThunk 17 ntdll.dll __RtlUserThreadStart 18 ntdll.dll _RtlUserThreadStart this is a low level crash across platforms that got a bit more common after 2017-05-22 & is mostly mostly UAF. a number of comments mention they where on weather.com when the crash happened, so maybe there was a recent change on that page that is tickling this crash now.
Updated•7 years ago
|
Group: core-security → dom-core-security
Keywords: csectype-uaf,
sec-high
Comment 2•7 years ago
|
||
Andrew, do you have any time to look at this?
Flags: needinfo?(bkelly) → needinfo?(bugmail)
Comment 3•7 years ago
|
||
Yes, looking. Leaving needinfo up for nags.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(bugmail)
Comment 5•7 years ago
|
||
No solution yet, investigation results thus far: - weather.com isn't doing anything too crazy with ServiceWorkers or cache. It's using sw-toolbox for its cache purposes and it has a few of those. That said, it is a very busy website, caching a fair bit. (716K caches.sqlite, 11M morgue). Having said that, I did see some additional cache stuff going on in Firefox devtools and under Chrome I saw the SW update itself, so there may also be some kind of A/B thing going on or a war of the service workers or something. - This is definitely a shutdown crash. Reading between the lines of the few weather.com comments in crash-stats and the stacks, I think users may have experienced bad browser janking and then quit, rather than weather.com having any direct causality. Since most of the crashes seem to be non-e10s (at least that's how I'm interpreting the lack of a "DOMIPCEnabled" flag in meta), we certainly would expect any jank to be unpleasant. - Code inspection seemed good. The obvious explanation for the crash is that the non-refcounted CacheStreamControlParent got deleted by IPC but the (refcounted) StreamList still had a reference. However, the CacheStreamControlParent's ActorDestroy explicitly tells the StreamList to forget about it. The control flow around all of that is simple and benefits from good hygiene so higher level logic errors still shouldn't be able to screw these invariants up. Given that PCacheStreamControl is directly managed by PBackground, there aren't even interesting tree edge-cases. - I taught nsIQuotaManagerService's test functionality how to explicitly shutdown a QuotaClient in the hopes of being able to reproduce this in a mochitest. (nsIQMS's reset() doesn't shut down the QuotaClients.) I've got the helper working, but it hasn't borne fruit yet. Presumably because I still need to cause the PBackground channel to self-destruct before triggering the shutdown. I'll attempt to better reproduce the shutdown circumstances as they relate to IPC shutdown tomorrow.
Comment 6•7 years ago
|
||
There is another weak-ref in the mix. The Context maintains a mActivityList or something which contains StreamList objects. I didn't see anything obviously broken there either, but its worth double checking. The other thought I had was perhaps the shutdown code spinning the event loop is breaking some invariant. We expect ActorDestroy() to get called in a separate runnable, but perhaps its running within the spun loop and surprising us somehow.
Updated•7 years ago
|
status-firefox-esr52:
--- → affected
Comment 7•7 years ago
|
||
Note, Cache API is enabled in FF52 ESR, but service workers are not enabled. So the possibility of triggering this there should be greatly reduced.
Updated•7 years ago
|
Priority: -- → P2
Updated•7 years ago
|
Keywords: testcase-wanted
Comment 9•7 years ago
|
||
This has been discussed off-bug a bit through the months. The main summary was that there was no smoking gun, and my efforts at reproduction through various manual and speculative automated stress tests related to creating orphaned caches and inducing quotaclient shutdown via specialized QM logic did not manage to reproduce. Recently bug 1411908 appeared and seems related. In discussion, :bkelly indicated that in his investigatory efforts, he'd found that the site uses the Cache API for its weather map UI that is layered over a google maps style interface, with the mapbox tiles being cached in the Cache API. This potentially introduces a new usage pattern beyond the use of sw-toolbox and its cache that we should still investigate. It also increased our hopes of being able to reproduce, as the sw-toolbox scenario would only result in crashes when the SW updated and a cache would be destroyed. There are also some over-arching concerns about QuotaManager shutdown related to locks and the potential for resuscitation of cache API manager instances near shutdown. (Which may or may not be related to some SQLite contention that's been observed. I've also seen that somewhat recently with IDB which should share similar IO thread invariants to DOM cache, so really there's a lot of investigation that needs to happen all around.) I'm clearing the assignee right now per discussion with :overholt to better indicate this bug is not actively being worked at this time (at least not by me). I will proactively follow bug 1411908's investigation in the near term and attempt to assist. We may want to update the dependencies or involve a meta bug related to shutdown issues for QM and the cache API, as concerns do seem to be spread across several bugs. I'll try to arrange some discussion of this or maybe a brief mini-sprint on that at the imminent all-hands. Aside: bug 1413112 which has not landed yet may help improve things as it cleans up the use of WorkerPrivate.
Assignee: bugmail → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bugmail)
Comment 10•7 years ago
|
||
Andrew: please find appropriate folks to work on this security bug.
Assignee: nobody → overholt
Comment 12•7 years ago
|
||
Andrew (Sutherland) indicated bug 1411908 could be related in comment 9, btw
Flags: needinfo?(bugmail)
Comment 13•7 years ago
|
||
This is Marion territory now but I'm thinking we should probably resolve INCOMPLETE given comment 9.
Assignee: overholt → mdaly
Assignee | ||
Comment 14•6 years ago
|
||
Confirmed. This is an INCOMPLETE given the cold trail. We'll reopen if this pops up again in the future.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Updated•6 years ago
|
Group: dom-core-security
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
Updated•2 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•