Closed Bug 1276920 Opened 8 years ago Closed 8 years ago

Crash in shutdownhang | mozilla::net::CacheFileIOManager::GetDoomedFile | mozilla::net::ShutdownEvent::PostAndWait

Categories

(Core :: Networking: Cache, defect)

x86
Windows 8
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- wontfix
firefox48 + fixed
firefox49 --- fixed
relnote-firefox --- 48+
firefox50 --- fixed
firefox51 --- affected

People

(Reporter: mayhemer, Assigned: michal)

Details

(Keywords: crash, topcrash, Whiteboard: [necko-active])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-dbaf5fe3-22a3-4522-b1a3-4e94f2160523.
=============================================================

It might be a loop more than an IO hang, similar to the one we had in finding a trash dir (fixed in bug 1268922).  while (true) used this way is simply not a good thing to do ;)

Michal, can you work on a fix here please?
Summary: Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait → Crash in shutdownhang | mozilla::net::CacheFileIOManager::GetDoomedFile | mozilla::net::ShutdownEvent::PostAndWait
Attached patch fixSplinter Review
Attachment #8775085 - Flags: review?(honzab.moz)
Attachment #8775085 - Flags: review?(honzab.moz) → review+
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b2d6afdd59b
Limit number of tries to find unused file name for doomed entry file, r=honzab
https://hg.mozilla.org/mozilla-central/rev/6b2d6afdd59b
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait':
 - nightly(version 50):6 crashes from 2016-06-06.
 - aurora (version 49):91 crashes from 2016-06-07.
 - beta   (version 48):5938 crashes from 2016-06-06.
 - release(version 47):58606 crashes from 2016-05-31.
 - esr    (version 45):0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       0       2       1       2       0       0
 - aurora       18      13      10      14       6      13      14
 - beta        487     490     515     433     664    1464    1430
 - release    8254    7981    8126    7873    8226    7804    7788
 - esr           0       0       0       0       0       0       0

Affected platform: Windows
Honza,  Michal, can we uplift this patch to aurora, beta and release (if safe)? This is a top crash impacting the release
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Keywords: topcrash
I'm all for it, Michal, can you do the 'a?' dance please?
Flags: needinfo?(honzab.moz)
Comment on attachment 8775085 [details] [diff] [review]
fix

Approval Request Comment
[Feature/regressing bug #]: cache2
[User impact if declined]: dooming a file can take a lot of time in case there are many doomed entries, which can cause a shutdown hang
[Describe test coverage new/current, TreeHerder]: I don't think that any existing test triggers the limit when we fail to doom the file. So tested is only dooming an entry when we successfully find an unused doomed file.
[Risks and why]: probably low
[String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8775085 - Flags: approval-mozilla-release?
Attachment #8775085 - Flags: approval-mozilla-beta?
Attachment #8775085 - Flags: approval-mozilla-aurora?
Michal, could you detail a bit more?
> [Risks and why]: probably low
I am considering taking this patch to release.
(In reply to Sylvestre Ledru [:sylvestre] from comment #5)
> Honza,  Michal, can we uplift this patch to aurora, beta and release (if
> safe)? This is a top crash impacting the release

It should be safe even for release. OTOH I don't think it will have any impact. I guess that almost all crashes with signature containing mozilla::net::ShutdownEvent::PostAndWait are not caused by this bug (possible GetDoomedFile loop). But due to crash-stats limitation it's hard to get exact numbers.
(In reply to Sylvestre Ledru [:sylvestre] from comment #8)
> Michal, could you detail a bit more?
> > [Risks and why]: probably low
> I am considering taking this patch to release.

It's low because we won't trigger the limit very often (see my comment #9). When we hit it, we fail to doom an entry and this is IMO not very well tested. The error should be propagated to upper layer correctly, but I'm not sure if all callers of CacheFile handle it. Maybe Honza has a better knowledge about this.
Flags: needinfo?(honzab.moz)
Comment on attachment 8775085 [details] [diff] [review]
fix

OK, let's take it to beta (should be in 49 beta 4) and release (48.0.1)

Honza, please tell me if you think it is too risky.
Attachment #8775085 - Flags: approval-mozilla-release?
Attachment #8775085 - Flags: approval-mozilla-release+
Attachment #8775085 - Flags: approval-mozilla-beta?
Attachment #8775085 - Flags: approval-mozilla-beta+
Attachment #8775085 - Flags: approval-mozilla-aurora?
(In reply to Michal Novotny (:michal) from comment #10)
> (In reply to Sylvestre Ledru [:sylvestre] from comment #8)
> > Michal, could you detail a bit more?
> > > [Risks and why]: probably low
> > I am considering taking this patch to release.
> 
> It's low because we won't trigger the limit very often (see my comment #9).
> When we hit it, we fail to doom an entry and this is IMO not very well
> tested. The error should be propagated to upper layer correctly, but I'm not
> sure if all callers of CacheFile handle it. Maybe Honza has a better
> knowledge about this.

Regarding shutdown, I wouldn't worry at all, only result is that the file remain on disk, nothing else.  But that can happen even without this patch, and can be something we may take care in a different bug, when thinking about and seeing the doom failure-rate.

However, no problem to uplift this to release IMO.
Flags: needinfo?(honzab.moz)
Added in the 48.0.1 release notes: "Fix a shutdown issue".
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait':
 - nightly (version 51): 1 crash from 2016-08-01.
 - aurora  (version 50): 8 crashes from 2016-08-01.
 - beta    (version 49): 811 crashes from 2016-08-02.
 - release (version 48): 1075 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       1
 - aurora        2       4       2
 - beta        155     535     121
 - release     220     564     291
 - esr           0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
             Browser   Content   Plugin
 - nightly
 - aurora
 - beta
 - release
 - esr
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: