Don't literally doom HTTP cache entries that were found invalid (never written metadata) after shutdown

RESOLVED FIXED in Firefox 51

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mayhemer, Assigned: mayhemer)

Tracking

Trunk
mozilla51
Points:
---

Firefox Tracking Flags

(firefox51 fixed)

Details

(Whiteboard: [necko-active])

Attachments

(1 attachment)

I may found out one of the reasons we still have shutdown crashes with cache2 after all the fixes.  

I'm testing with an extremely slow HDD now (part of the work on bug 1288204).  

A crash is easy to reproduce when you visit a page with number of resources the first time and during it's load (that can be very slow because of slow HDD and cache interaction, another bug to fix, yet to be filed) you simply close the browser.

Cause is simple.  The whole load group is canceled and each http channel then dooms its cache entry, a correct thing to do.  Despite we bypass number of operations right after shutdown in the HTTP cache, dooming must happen.  Doomed stuff must go away - or, it simply must not be found anytime later.

Dooming an actively open entry means: close the current file (expensive), rename it (may be expensive, as we also do IO to find the name).

But we can use potential advantage.  The entries being doomed here are in state of write-progress, hence still "invalid", i.e. NOT having metadata + the checksums written.  We simply may leave these files on disk when after shutdown as are, they will be ignored next time (effectively behave as doomed).
Whiteboard: [necko-active]
The way to reproduce seems to be eventually different:
- clean cache
- start browser
- go to a page with a lot of resources
- quit during its load
=> result: we have a number of entries in the cache that didn't finish writing (are invalid)
- start browser
- go to the same page
=> state: number of channels loading, while cache entries are found invalid (channels get new cache entries and load from network)
- quit during load, all load group's channel in progress canceled with NS_BINDING_ABORTED 
=> sh timeout crash, because we doom all the entries


Reason why we are OK with a first ever visit is that when we are dooming the cache entry files after cancellation, there are actually no physical files on disk yet to delete.  On second load we are trying to get rid of the invalid entries files.

In reality I can imagine that users are frustrated with how slowly pages load (because the HTTP cache may block on slow I/O - network reload would be faster).  So, they close the browser (ending up with number of invalid entries in the cache).  Later they start again, it's still slow, they quit the browser again and they crash this time.

This is not very likely scenario, but still valid.
Status: NEW → ASSIGNED
Summary: Don't literally doom HTTP cache entries that are invalid (never written metadata) after shutdown → Don't literally doom HTTP cache entries that were found invalid (never written metadata) after shutdown
Posted patch v1Splinter Review
This simple change well eliminates the problem.  Note that the aHandle->mInvalid = true assignment in CacheFileIOManager::TruncateSeekSetEOF simply happens too late (on the write level) and can be preceded by dooming.
Attachment #8778399 - Flags: review?(michal.novotny)
Attachment #8778399 - Flags: review?(michal.novotny) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45be90d58d47
Mark CacheFileHandles invalid ASAP we can't read metadata, r=michal
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/45be90d58d47
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.