Closed Bug 1029213 Opened 10 years ago Closed 8 years ago

Shutdown hang in CacheFileIOManager::Shutdown

Categories

(Core :: Networking: Cache, defect)

x86
macOS
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: Felipe, Assigned: michal)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active])

Attachments

(1 file)

Attached file sample.txt
Might be a dupe of bug 1020345, but I'm filing this separately because that one was seen on Android and this one happened on OS X for me.

>Call graph:
>    2498 Thread_16045340   DispatchQueue_1: com.apple.main-thread  (serial)
>    + 2498 start  (in firefox) + 52  [0x1000013d4]
>    +   2498 main  (in firefox) + 1519  [0x100001d1f]
>    +     2498 XRE_main  (in XUL) + 231  [0x1031ff467]
>    +       2498 XREMain::XRE_main(int, char**, nsXREAppData const*)  (in XUL) + 504  [0x1031ff178]
>    +         2498 ScopedXPCOMStartup::~ScopedXPCOMStartup()  (in XUL) + 148  [0x1031f9d74]
>    +           2498 mozilla::ShutdownXPCOM(nsIServiceManager*)  (in XUL) + 221  [0x1013bd11d]
>    +             2498 nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*)  (in XUL) + 164  [0x1013eb4d4]
>    +               2498 mozilla::net::CacheObserver::Observe(nsISupports*, char const*, char16_t const*)  (in XUL) + 433  [0x1015407e1]
>    +                 2498 mozilla::net::CacheFileIOManager::Shutdown()  (in XUL) + 302  [0x101520cde]
>    +                   2498 PR_WaitCondVar  (in libnss3.dylib) + 105  [0x101116989]
>    +                     2498 _pthread_cond_wait  (in libsystem_c.dylib) + 869  [0x7fff9708dfb9]
>    +                       2498 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff8d4f90fa]

Full sample attached
Depends on: 913822
Felipe, how long did this loop? Indefinitely, or did it recover after some time?

Looking closer at the stack on the IO thread, it seems to be stuck in MemoryPool::PurgeOverMemoryLimit(), which should be a purely in-memory operation (no IO), hence fast, and which runs over a finite number of elements (tens of thousands). This looks more like a dupe of bug 1025913.
Depends on: 1025913
No longer depends on: 913822
Honza, it was running indefinitely, I had to kill the process. But IIRC it wasn't a busy wait (the process was at 0%), so more of a deadlock. I'm not 100% sure on this but 95%.
FWIW my storage _should_ be fast, as it is an SSD. But I am running with only 5 to 10 GB of free space.
perhaps this is FIXED?
Flags: needinfo?(honzab.moz)
Whiteboard: [necko-active]
maybe it is, but I'd rather let Michal double-check.
Flags: needinfo?(honzab.moz) → needinfo?(michal.novotny)
This doesn't seem to be a deadlock, since there is activity on the IO thread. The main thread just waits until ShutdownEvent::Run() is executed on the IO thread. I guess this is a dupe of some already-fixed bug, because there are no recent reports of this problem.
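
The wait described above can be sketched roughly as follows. This is a minimal illustration and not the Necko implementation: `IoThread` and `PostShutdownAndWait` are hypothetical names, and standard C++ primitives stand in for the NSPR condition variable seen in the sample.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for an IO thread that runs posted tasks in FIFO order.
class IoThread {
 public:
  IoThread() : worker_([this] { Loop(); }) {}
  ~IoThread() {
    Post(nullptr);  // an empty task is used as the stop sentinel
    worker_.join();
  }

  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        task = std::move(queue_.front());
        queue_.pop();
      }
      if (!task) return;  // stop sentinel
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  std::thread worker_;  // declared last so the queue exists before Loop() starts
};

// Posts a shutdown task and blocks the caller until the IO thread has run it.
// Every task queued earlier must finish first, so a slow earlier task makes
// the caller sit idle in the condition-variable wait.
bool PostShutdownAndWait(IoThread& io) {
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  io.Post([&] {
    std::lock_guard<std::mutex> lock(m);
    done = true;
    cv.notify_one();  // notify while holding the lock; cv lives on the waiter's stack
  });

  std::unique_lock<std::mutex> lock(m);
  cv.wait(lock, [&] { return done; });
  assert(done);
  return done;
}
```

In this sketch the caller consumes no CPU while it waits, which would match a process sitting at 0% while the IO thread is still working through its queue.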
Flags: needinfo?(michal.novotny)
if you think we have enough data to close this then please do so
Assignee: nobody → michal.novotny
This seems to be already fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME

This is forked from bug 1633342 comment 16.
I found some crash reports that hang in mozilla::net::CacheFileIOManager::Shutdown()

What do you think, michal?

Blocks: 1633342
Crash Signature: [@ shutdownhang | mozilla::net::ShutdownEvent::PostAndWait ]
Flags: needinfo?(michal.novotny)

(In reply to Junior [:junior] from comment #9)

> This is forked from bug 1633342 comment 16.
> I found some crash reports that hang in mozilla::net::CacheFileIOManager::Shutdown()
>
> What do you think, michal?

It's hard to say if it really hangs in mozilla::net::CacheFileIOManager::Shutdown() or if it was just killed at the right moment.

This crash is a nice example https://crash-stats.mozilla.org/report/index/c0aafc63-73f8-4f6b-9c11-2f4d80200704#allthreads. We're building the index on the CacheIO thread, which has INDEX priority, and on the main thread we've posted ShutdownEvent with WRITE priority, which is higher. Building of the index should be interrupted at https://hg.mozilla.org/releases/mozilla-beta/file/51bedc350693e96f61507ed3c79ec4f230a1adc1/netwerk/cache2/CacheIndex.cpp#l2743 which obviously didn't happen. So either the CacheIO thread is stuck in FileExists() at https://hg.mozilla.org/releases/mozilla-beta/file/51bedc350693e96f61507ed3c79ec4f230a1adc1/netwerk/cache2/CacheIndex.cpp#l2754 or Firefox was killed due to a long shutdown right at the moment when we started shutting down the cache. In both cases, we cannot do much about it.
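
To illustrate the interruption point mentioned above, here is a small single-threaded sketch of a priority event loop in which a low-priority job checks for pending higher-priority events between entries and re-posts itself. All names (`PriorityLoop`, `BuildIndex`, `RunDemo`, the two-level `Priority` enum) are assumptions made for the example and do not reflect the actual CacheIOThread or CacheIndex code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical priority levels; a lower value means a higher priority.
enum Priority { kWrite = 0, kIndex = 1 };

class PriorityLoop {
 public:
  void Post(Priority p, std::function<void()> task) {
    queues_[p].push(std::move(task));
  }

  // True if any event with a higher priority than `current` is waiting.
  bool HigherPending(Priority current) const {
    for (const auto& entry : queues_) {
      if (entry.first < current && !entry.second.empty()) return true;
    }
    return false;
  }

  // Runs events until all queues are empty, highest priority first.
  void Run() {
    for (;;) {
      auto it = queues_.begin();
      while (it != queues_.end() && it->second.empty()) ++it;
      if (it == queues_.end()) return;
      std::function<void()> task = std::move(it->second.front());
      it->second.pop();
      task();
    }
  }

 private:
  std::map<Priority, std::queue<std::function<void()>>> queues_;
};

// Processes entries one by one. The yield check runs only between entries,
// so a single entry that blocks (for example on a slow file-system call)
// delays every higher-priority event until it returns.
void BuildIndex(PriorityLoop& loop, int next, int total,
                std::function<void(int)> processEntry) {
  for (int i = next; i < total; ++i) {
    if (loop.HigherPending(kIndex)) {
      loop.Post(kIndex, [&loop, i, total, processEntry] {
        BuildIndex(loop, i, total, processEntry);
      });
      return;  // yield to the higher-priority event
    }
    processEntry(i);
  }
}

// Simulates a shutdown event arriving while entry 1 is being processed.
std::vector<std::string> RunDemo() {
  PriorityLoop loop;
  std::vector<std::string> log;
  auto processEntry = [&loop, &log](int i) {
    log.push_back("entry " + std::to_string(i));
    if (i == 1) {
      loop.Post(kWrite, [&log] { log.push_back("shutdown"); });
    }
  };
  loop.Post(kIndex, [&loop, processEntry] {
    BuildIndex(loop, 0, 4, processEntry);
  });
  loop.Run();
  return log;
}
```

In `RunDemo()` the shutdown event runs right after the entry during which it was posted, ahead of the remaining entries. The sketch resumes the index build afterwards only to make the ordering visible. If `processEntry` never returned, the yield check would never be reached, which corresponds to the first of the two possibilities described above.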

Flags: needinfo?(michal.novotny)