Closed Bug 1055369 Opened 10 years ago Closed 9 years ago

Assertion failure: !handle || !handle->IsDoomed(), at netwerk/cache2/CacheFileIOManager.cpp:2708

Categories

(Core :: Networking: Cache, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox39 --- fixed

People

(Reporter: dbaron, Assigned: michal)

References

Details

Attachments

(3 files)

I just crashed due to:

Assertion failure: !handle || !handle->IsDoomed(), at /home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheFileIOManager.cpp:2708
nsRunnableMethodImpl<tag_nsresult (mozilla::net::CacheFileIOManager::*)(), void, true>::Run() (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/netwerk/cache2/../../dist/include/nsThreadUtils.h:394 (discriminator 1))
mozilla::net::CacheIOThread::LoopOneLevel(unsigned int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:292)
mozilla::net::CacheIOThread::ThreadFunc() (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:223)
mozilla::net::CacheIOThread::ThreadFunc(void*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:169)
_pt_root (/home/dbaron/builds/ssd/mozilla-central/mozilla/nsprpub/pr/src/pthreads/ptthread.c:215)
start_thread (/build/buildd/eglibc-2.19/nptl/pthread_create.c:312 (discriminator 2))
__clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113)
UNKNOWN (nil)

Program /home/dbaron/bin/running-firefox/firefox (pid = 4834) received signal 11.

I don't know what I was doing at the time, although loading twitter may have been related.


I'm running https://hg.mozilla.org/mozilla-central/rev/7fc96293ada8 plus my local patches as of Tue Aug 12 17:53:40 2014 -0700.

This is probably not related to the constant potential-deadlock assertions I'm seeing from the cache, but who knows.
The assertion was added in bug 1028415 where we fixed the problem in release builds by an ugly hotfix but we added an assertion to catch this in debug builds. Unfortunately we need the NSPR log to find the reason why we have doomed entries in the index. I never hit this assertion locally or on try server.
David,

I know you're one of the only people at Mozilla who runs a debug build all the time.  IIRC you do a few things to your build to make it run a littler faster?  We'd like to try to reproduce this, and I suspect it may take some prolonged browsing with a debug build, so if you can share your build recipe that would be great.

And/or you could run with NSPR logging and share the log when you hit this.  But the log will get *big* quickly, and it may contain login/cookie info.
Flags: needinfo?(dbaron)
I just hit this again, and again didn't realize until after the 300 second SEGV timeout that the reason I wasn't getting anything from the network was that the cache thread had crashed.  (I realize I could probably fix that so it suspends the whole process rather than just doing a sleep on the thread that crashed.)

(I was using facebook, and had recently tried to load an article from a link I saw.  And I was experiencing high network load on the local network which I thought might be interfering.)

Assertion failure: !handle || !handle->IsDoomed(), at /home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheFileIOManager.cpp:2629
#01: nsRunnableMethodImpl<tag_nsresult (mozilla::net::CacheFileIOManager::*)(), void, true>::Run() (/home/dbaron/builds/ssd/mozilla-central/obj/firefox-debugopt/netwerk/cache2/../../dist/include/nsThreadUtils.h:391 (discriminator 1))
#02: mozilla::net::CacheIOThread::LoopOneLevel(unsigned int) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:292)
#03: mozilla::net::CacheIOThread::ThreadFunc() (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:223)
#04: mozilla::net::CacheIOThread::ThreadFunc(void*) (/home/dbaron/builds/ssd/mozilla-central/mozilla/netwerk/cache2/CacheIOThread.cpp:169)
#05: _pt_root (/home/dbaron/builds/ssd/mozilla-central/mozilla/nsprpub/pr/src/pthreads/ptthread.c:215)
#06: start_thread (/build/buildd/eglibc-2.19/nptl/pthread_create.c:312 (discriminator 2))
#07: __clone (/build/buildd/eglibc-2.19/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:113)
#08: ??? (???:???)


SourceStamp=df9cd77641c6, which means https://hg.mozilla.org/mozilla-central/rev/097821fd89ed plus my patch queue as of Fri Oct 10 12:01:43 2014 -0700.
I just hit this a third time, and I have it in gdb right now.  Anything you want?
I finally hit it with NSPR logging. The code in CacheFileIOManager::DoomFileByKeyInternal() is wrong and the bug was introduced with the patch from bug 914644. Instead of removing the dead code (handle->IsDoomed() condition) we changed the GetHandle() method to return also doomed handles. We should never make any presumption about the state of the entry based on existence of the doomed handle. We should always go to the disk when we don't have an active handle. What happens here is:

1) Doom an active entry (no entry in index, doomed handle exists until it is released)
2) Create a new entry (entry is in index, active + doomed handle exist)
3) Release handle of the new entry (entry is in index, only doomed handle exists)


The change in CacheFileHandles::GetHandle() was never needed, the problem was just with ordering of the events.
Assignee: nobody → michal.novotny
Comment on attachment 8567859 [details] [diff] [review]
fix

Review of attachment 8567859 [details] [diff] [review]:
-----------------------------------------------------------------

So as I understand, doing dooms on the OPEN level is quite enough to let dooming work correctly.  Try is green and before bug 914644 we had a perma failure, so it looks OK.

Thanks

r=honzab

::: netwerk/cache2/CacheFileIOManager.cpp
@@ +2609,5 @@
>      if (NS_SUCCEEDED(rv)) {
>        consecutiveFailures = 0;
>      } else if (rv == NS_ERROR_NOT_AVAILABLE) {
>        LOG(("CacheFileIOManager::OverLimitEvictionInternal() - "
>             "DoomFileByKeyInternal() failed. [rv=0x%08x]", rv));

NS_ERROR_NOT_AVAILABLE may be a result only when the file doesn't exist.  is it really a failure? (maybe it's just about updating the comment)
Attachment #8567859 - Flags: review?(honzab.moz) → review+
(In reply to Honza Bambas (:mayhemer) from comment #10)
> Comment on attachment 8567859 [details] [diff] [review]
> 
> So as I understand, doing dooms on the OPEN level is quite enough to let
> dooming work correctly.

Yes

> NS_ERROR_NOT_AVAILABLE may be a result only when the file doesn't exist.  is
> it really a failure? (maybe it's just about updating the comment)

It seems so, but I cannot guarantee that this error is not returned by some of the file operations that we do in DoomFileByKeyInternal() and DoomFileInternal(). It's still an error that should be logged and we could eventually start updating the index. The important thing is to remove the entry from the index so we won't try to evict the entry over and over again.
Sorry, in a moment of absentmindedness on my part, I backed this out for bustage without looking close enough at the failure. Re-landed, sorry for the churn.

https://hg.mozilla.org/integration/mozilla-inbound/rev/d94287364c3d
https://hg.mozilla.org/mozilla-central/rev/d94287364c3d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.