Closed Bug 1020345 Opened 10 years ago Closed 8 years ago

Intermittent CacheFileIOManager::Shutdown stall on Android

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 913822

People

(Reporter: gbrown, Unassigned)

References

Details

The test job jsreftest-3 on Android 2.2 frequently fails with an apparent hang on shutdown: Bug 1008422 tracks tests that failed because shutdown took longer than 330 seconds without producing any new output.

Investigation shows that the hang occurs during xpcom-shutdown notification. Even when the test succeeds, xpcom-shutdown notification often (but not always) stalls for a long time.

I recently narrowed down the stall to CacheFileIOManager::Shutdown: On Android 2.2, following the jsreftest-3 tests, CacheFileIOManager::Shutdown frequently takes several *minutes* to complete.

There are much less frequent shutdown hangs on other Android test jobs -- these have not been investigated.  I have not noticed shutdown hangs on other platforms.


Here's a try run demonstrating the stall:

https://tbpl.mozilla.org/?tree=Try&rev=5ac5cdbc7cb4

Consider https://tbpl.mozilla.org/php/getParsedLog.php?id=41028743&tree=Try&full=1

06-04 07:02:04.698 I/Gecko   ( 1749): >>> cache2 CacheObserver: xpcom-shutdown start
06-04 07:02:04.698 I/Gecko   ( 1749): >>> cache2 CacheObserver: xpcom-shutdown -- calling CacheFileIOManager::Shutdown
06-04 07:02:04.698 I/Gecko   ( 1749):  >> CacheFileIOManager::Shutdown start
06-04 07:02:04.708 I/Gecko   ( 1749):   > CacheFileIOManager::Shutdown condVar.Wait start
        ...
06-04 07:06:20.141 I/Gecko   ( 1749):   > CacheFileIOManager::Shutdown condVar.Wait end
06-04 07:06:20.288 I/Gecko   ( 1749):  >> CacheFileIOManager::Shutdown end
06-04 07:06:20.288 I/Gecko   ( 1749): >>> cache2 CacheObserver: xpcom-shutdown end
Can you please run it [1] with NSPR_LOG_MODULES set to "timestamp,cache2:5"?  

[1] http://www.janbambas.cz/get-nspr-log-from-tryserver-run/
Flags: needinfo?(gbrown)
NSPR logging is a little tricky on Android, but here's an attempt:

https://tbpl.mozilla.org/?tree=Try&rev=55c2522ae1f6
Flags: needinfo?(gbrown)
Here is another experiment with non-NSPR logging:

https://tbpl.mozilla.org/?tree=Try&rev=c3a3c3af525b

It looks to me like the fast shutdown cases find a small number of handles at http://hg.mozilla.org/mozilla-central/annotate/668f29cd71b3/netwerk/cache2/CacheFileIOManager.cpp#l1273:

  1273  for (uint32_t i=0 ; i<handles.Length() ; i++) {

while the longer-running cases find substantially more handles. Contrast https://tbpl.mozilla.org/php/getParsedLog.php?id=41060397&tree=Try&full=1 (441 handles) with https://tbpl.mozilla.org/php/getParsedLog.php?id=41043339&tree=Try&full=1 (4 handles).
According the #of handles, this seems like bug 913822.  Also there could be a good mitigation with bug 1013395 that we would like to have for 32.
Depends on: 913822, 1013395
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.