Closed Bug 1028415 Opened 10 years ago Closed 10 years ago

Cache thread gets stuck in CacheFileIOManager::OverLimitEvictionInternal loop

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla33

Tracking Flags:

Tracking

Status

firefox30

---

unaffected

firefox31

---

unaffected

firefox32

fixed

firefox33

---

fixed

b2g-v2.0

---

fixed

b2g-v2.1

---

fixed

People

(Reporter: jchen, Assigned: michal)

References

Details

Attachments

(2 files)

nightly-cache-loop.txt 10 years ago (inactive) Jim Chen [:jchen] [:darchons] 19.78 KB, text/plain		Details
patch v1 10 years ago Michal Novotny [:michal] 4.88 KB, patch	mayhemer : review+ Sylvestre : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

(inactive) Jim Chen [:jchen] [:darchons]

Reporter

Description

•

10 years ago

Attached file nightly-cache-loop.txt — Details

For some reason, Nightly on my MacBook Air was stuck in a loop in CacheFileIOManager::OverLimitEvictionInternal (CPU was at 100% for many minutes and it was enough to turn on the CPU fan at full speed). This happened on the Cache2 I/O thread, and the main thread was fully responsive. I finally used "Clear Recent History" to clear the last hour of history, and afterwards CPU usage immediately dropped down to 0%. Attached is the output from OS X's internal profiler (I didn't have the Gecko profiler installed).

Michal Novotny [:michal]

Assignee

Comment 1

•

10 years ago

(In reply to Jim Chen [:jchen :nchen] from comment #0) > Created attachment 8443751 [details] > nightly-cache-loop.txt > > For some reason, Nightly on my MacBook Air was stuck in a loop in > CacheFileIOManager::OverLimitEvictionInternal (CPU was at 100% for many > minutes and it was enough to turn on the CPU fan at full speed). This > happened on the Cache2 I/O thread, and the main thread was fully responsive. How big was the cache when this happened? > I finally used "Clear Recent History" to clear the last hour of history, and > afterwards CPU usage immediately dropped down to 0%. "Clear Recent History" always deletes the whole cache and it is always faster than selective evicting of the least useful entries, so this is expected behavior. It seems to me that for some reason the maximum cache size has been dramatically lowered, so we started to evict a lot of entries. But it's not clear to me how this could happen. What's your maximum cache size? Do you have smartsize enabled?

(inactive) Jim Chen [:jchen] [:darchons]

Reporter

Comment 2

•

10 years ago

This happened to me twice yesterday while I was streaming a live BBC broadcast. about:cache from the first time, > Information about the Network Cache Storage Service > Private Anonymous AppID In Browser Element > memory > Number of entries: 1098 > Maximum storage size: 32768 KiB > Storage in use: 6784 KiB > Storage disk location: none, only stored in memory > List Cache Entries > disk > Number of entries: 7896 > Maximum storage size: 327680 KiB > Storage in use: 331063 KiB > Storage disk location: /Users/nchen/Library/Caches/Firefox/Profiles/u6sv5ul4.default/cache2 > List Cache Entries > appcache > Number of entries: 0 > Maximum storage size: 512000 KiB > Storage in use: 0 KiB > Storage disk location: /Users/nchen/Library/Caches/Firefox/Profiles/u6sv5ul4.default/OfflineCache After that, I cleared recent history (about:cache showed cache was cleared), and things returned to normal. But after some time, while I was still streaming the broadcast, this happened again. about:cache from the second time, > Information about the Network Cache Storage Service > Private Anonymous AppID In Browser Element > memory > Number of entries: 0 > Maximum storage size: 32768 KiB > Storage in use: 0 KiB > Storage disk location: none, only stored in memory > disk > Number of entries: 1788 > Maximum storage size: 358400 KiB > Storage in use: 2785412 KiB > Storage disk location: /Users/nchen/Library/Caches/Firefox/Profiles/u6sv5ul4.default/cache2 > List Cache Entries > appcache > Number of entries: 0 > Maximum storage size: 512000 KiB > Storage in use: 0 KiB > Storage disk location: /Users/nchen/Library/Caches/Firefox/Profiles/u6sv5ul4.default/OfflineCache Both times CPU was at 100%, but there was *no* disk activity according to Activity Monitor. The broadcast I was watching was using Flash for streaming; not sure if it's related. Smartsize is enabled according to about:config.

Honza Bambas (:mayhemer)

Comment 3

•

10 years ago

For reference: [18:19] NeilAway: mayhemer: I've had a bug where the cache doom thread was in an infinite loop because the lru entry had already been doomed [18:21] NeilAway: mayhemer: I used the debugger to nop-out both the IsDoomed() checks and the thread eventually finished shrinking the cache, but unfortunately the browser still wouldn't shut down normally and I had to kill it anyway :s

neil@parkwaycc.co.uk

Comment 4

•

10 years ago

I can verify that my 2-week-old 64-bit Windows self-build hung on a background thread in a loop in the same function. I didn't notice the problem until I tried to shut the build down and noticed that the process was lingering with 25% (1 of 4 CPUs) utilisation. My build was an unoptimised build with symbols so I attached a debugger and noticed that the cache entry that it was trying to doom was already doomed. At first I just used the debugger to skip over the IsDoomed() checks in the appropriate functions and some of the cache got evicted (it turns out that forcibly dooming a doomed entry still removes it from the mru list) but the loop got stuck again and I got bored and used the debugger to remove the checks. This allowed the function to complete its loop and return to the thread but unfortunately the browser didn't complete shutdown and I had to kill it anyway.

OS: Mac OS X → All

Michal Novotny [:michal]

Assignee

Comment 5

•

10 years ago

We should never have doomed entries in frecency/expiration array in the index. The CacheFileHandle::mIsDoomed is set to true only in CacheFileIOManager::DoomFileInternal() and the entry is removed from the index just one line above that. Neil, could you please try to reproduce it with NSPR logging turned on (nsHttp:5,cache2:5)?

Michal Novotny [:michal]

Assignee

Comment 6

•

10 years ago

Attached patch patch v1 — Details — Splinter Review

This patch doesn't fix the root of the cause of the problem. It just breaks the loop so we won't try to doom already doomed handle over and over again. I still have no idea how we can have such entry in the index. Neil, did you try to reproduce the problem with NSPR logging enabled?

Assignee: nobody → michal.novotny

Attachment #8456219 - Flags: review?(honzab.moz)

Flags: needinfo?(neil)

Michal Novotny [:michal]

Assignee

Comment 7

•

10 years ago

Comment on attachment 8456219 [details] [diff] [review] patch v1 https://tbpl.mozilla.org/?tree=Try&rev=a64fe943e3d2

neil@parkwaycc.co.uk

Comment 8

•

10 years ago

(In reply to Michal Novotny from comment #6) > This patch doesn't fix the root of the cause of the problem. It just breaks > the loop so we won't try to doom already doomed handle over and over again. > I still have no idea how we can have such entry in the index. Neil, did you > try to reproduce the problem with NSPR logging enabled? Sorry, I haven't been watching the bug, but as it happens, I've not seen the problem again anyway.

Flags: needinfo?(neil)

Honza Bambas (:mayhemer)

Comment 9

•

10 years ago

Comment on attachment 8456219 [details] [diff] [review] patch v1 Review of attachment 8456219 [details] [diff] [review]: ----------------------------------------------------------------- Hope this helps. Tho seems like the problem may actually be somewhere else according michal's claims that this simple cannot happen. r=honzab

Attachment #8456219 - Flags: review?(honzab.moz) → review+

Jason Duell

Comment 10

•

10 years ago

Comment on attachment 8456219 [details] [diff] [review] patch v1 Approval Request Comment [Feature/regressing bug #]: cache2 [User impact if declined]: browser can go into 100% CPU loop on cache thread [Risks and why]: low: fairly small change [String/UUID change made/needed]: none This cache2 fix also seems worth taking on aurora IMO.

Attachment #8456219 - Flags: approval-mozilla-aurora?

Michal Novotny [:michal]

Assignee

Comment 11

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/a078bf8b709d

Lawrence Mandel [:lmandel] (use needinfo)

Comment 12

•

10 years ago

Going to wait for this to land on m-c before granting Aurora approval.

status-firefox30: --- → unaffected

status-firefox31: --- → unaffected

status-firefox32: --- → affected

status-firefox33: --- → fixed

tracking-firefox32: --- → +

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 13

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/a078bf8b709d

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla33

Sylvestre Ledru [:Sylvestre]

Updated

•

10 years ago

Attachment #8456219 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 14

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/434889d480f8

status-b2g-v2.0: --- → fixed

status-b2g-v2.1: --- → fixed

status-firefox32: affected → fixed

Karl Tomlinson (:karlt)

Updated

•

10 years ago

Depends on: 1085172

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Cache thread gets stuck in CacheFileIOManager::OverLimitEvictionInternal loop

Categories

(Core :: Networking: Cache, defect)

Tracking

()

People

(Reporter: jchen, Assigned: michal)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(2 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Comment 12

Comment 13

Updated

Comment 14

Updated

Attachment

General

Description

File Name

Content Type