Crash in shutdownhang | mozilla::net::CacheFileIOManager::GetDoomedFile | mozilla::net::ShutdownEvent::PostAndWait

RESOLVED FIXED in Firefox 48

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: mayhemer, Assigned: michal)

Tracking

({crash, topcrash})

Trunk
mozilla50
x86
Windows 8
Points:
---

Firefox Tracking Flags

(firefox47 wontfix, firefox48+ fixed, firefox49 fixed, relnote-firefox 48+, firefox50 fixed, firefox51 affected)

Details

(Whiteboard: [necko-active], crash signature)

Attachments

(1 attachment)

Reporter

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-dbaf5fe3-22a3-4522-b1a3-4e94f2160523.
=============================================================

It might be a loop more than an IO hang, similar to the one we had in finding a trash dir (fixed in bug 1268922).  while (true) used this way is simply not a good thing to do ;)

Michal, can you work on a fix here please?
Reporter

Updated

3 years ago
Summary: Crash in shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait → Crash in shutdownhang | mozilla::net::CacheFileIOManager::GetDoomedFile | mozilla::net::ShutdownEvent::PostAndWait
Assignee

Comment 1

3 years ago
Posted patch fixSplinter Review
Attachment #8775085 - Flags: review?(honzab.moz)
Reporter

Updated

3 years ago
Attachment #8775085 - Flags: review?(honzab.moz) → review+

Comment 2

3 years ago
Pushed by mnovotny@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6b2d6afdd59b
Limit number of tries to find unused file name for doomed entry file, r=honzab

Comment 3

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6b2d6afdd59b
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait':
 - nightly(version 50):6 crashes from 2016-06-06.
 - aurora (version 49):91 crashes from 2016-06-07.
 - beta   (version 48):5938 crashes from 2016-06-06.
 - release(version 47):58606 crashes from 2016-05-31.
 - esr    (version 45):0 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       0       2       1       2       0       0
 - aurora       18      13      10      14       6      13      14
 - beta        487     490     515     433     664    1464    1430
 - release    8254    7981    8126    7873    8226    7804    7788
 - esr           0       0       0       0       0       0       0

Affected platform: Windows
Honza,  Michal, can we uplift this patch to aurora, beta and release (if safe)? This is a top crash impacting the release
Flags: needinfo?(michal.novotny)
Flags: needinfo?(honzab.moz)
Keywords: topcrash
Reporter

Comment 6

3 years ago
I'm all for it, Michal, can you do the 'a?' dance please?
Flags: needinfo?(honzab.moz)
Assignee

Comment 7

3 years ago
Comment on attachment 8775085 [details] [diff] [review]
fix

Approval Request Comment
[Feature/regressing bug #]: cache2
[User impact if declined]: dooming a file can take a lot of time in case there are many doomed entries, which can cause a shutdown hang
[Describe test coverage new/current, TreeHerder]: I don't think that any existing test triggers the limit when we fail to doom the file. So tested is only dooming an entry when we successfully find an unused doomed file.
[Risks and why]: probably low
[String/UUID change made/needed]: none
Flags: needinfo?(michal.novotny)
Attachment #8775085 - Flags: approval-mozilla-release?
Attachment #8775085 - Flags: approval-mozilla-beta?
Attachment #8775085 - Flags: approval-mozilla-aurora?
Michal, could you detail a bit more?
> [Risks and why]: probably low
I am considering taking this patch to release.
Assignee

Comment 9

3 years ago
(In reply to Sylvestre Ledru [:sylvestre] from comment #5)
> Honza,  Michal, can we uplift this patch to aurora, beta and release (if
> safe)? This is a top crash impacting the release

It should be safe even for release. OTOH I don't think it will have any impact. I guess that almost all crashes with signature containing mozilla::net::ShutdownEvent::PostAndWait are not caused by this bug (possible GetDoomedFile loop). But due to crash-stats limitation it's hard to get exact numbers.
Assignee

Comment 10

3 years ago
(In reply to Sylvestre Ledru [:sylvestre] from comment #8)
> Michal, could you detail a bit more?
> > [Risks and why]: probably low
> I am considering taking this patch to release.

It's low because we won't trigger the limit very often (see my comment #9). When we hit it, we fail to doom an entry and this is IMO not very well tested. The error should be propagated to upper layer correctly, but I'm not sure if all callers of CacheFile handle it. Maybe Honza has a better knowledge about this.
Flags: needinfo?(honzab.moz)
Comment on attachment 8775085 [details] [diff] [review]
fix

OK, let's take it to beta (should be in 49 beta 4) and release (48.0.1)

Honza, please tell me if you think it is too risky.
Attachment #8775085 - Flags: approval-mozilla-release?
Attachment #8775085 - Flags: approval-mozilla-release+
Attachment #8775085 - Flags: approval-mozilla-beta?
Attachment #8775085 - Flags: approval-mozilla-beta+
Attachment #8775085 - Flags: approval-mozilla-aurora?
Reporter

Comment 14

3 years ago
(In reply to Michal Novotny (:michal) from comment #10)
> (In reply to Sylvestre Ledru [:sylvestre] from comment #8)
> > Michal, could you detail a bit more?
> > > [Risks and why]: probably low
> > I am considering taking this patch to release.
> 
> It's low because we won't trigger the limit very often (see my comment #9).
> When we hit it, we fail to doom an entry and this is IMO not very well
> tested. The error should be propagated to upper layer correctly, but I'm not
> sure if all callers of CacheFile handle it. Maybe Honza has a better
> knowledge about this.

Regarding shutdown, I wouldn't worry at all, only result is that the file remain on disk, nothing else.  But that can happen even without this patch, and can be something we may take care in a different bug, when thinking about and seeing the doom failure-rate.

However, no problem to uplift this to release IMO.
Flags: needinfo?(honzab.moz)
Added in the 48.0.1 release notes: "Fix a shutdown issue".
Crash volume for signature 'shutdownhang | WaitForSingleObjectEx | WaitForSingleObject | PR_WaitCondVar | mozilla::CondVar::Wait | mozilla::net::ShutdownEvent::PostAndWait':
 - nightly (version 51): 1 crash from 2016-08-01.
 - aurora  (version 50): 8 crashes from 2016-08-01.
 - beta    (version 49): 811 crashes from 2016-08-02.
 - release (version 48): 1075 crashes from 2016-07-25.
 - esr     (version 45): 0 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       1
 - aurora        2       4       2
 - beta        155     535     121
 - release     220     564     291
 - esr           0       0       0

Affected platform: Windows

Crash rank on the last 7 days:
             Browser   Content   Plugin
 - nightly
 - aurora
 - beta
 - release
 - esr
You need to log in before you can comment on or make changes to this bug.