Cache2 I/O thread loops with 100% CPU

RESOLVED DUPLICATE of bug 1248389

Status

()

RESOLVED DUPLICATE of bug 1248389
4 years ago
3 years ago

People

(Reporter: karlt, Unassigned)

Tracking

32 Branch
x86_64
Linux
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

4 years ago
+++ This bug was initially created as a clone of Bug #1028415 +++

It's hard to tell whether or not this is the same as reported in bug 1028415, which was fixed on 32 while in Aurora.

A bug means my --disable-strip --disable-install-strip build, which had been running for 17 days, had no symbols for (only) libxul.so.  That bug at least has been fixed in 33.

Loading about:cache or trying to "Update" does not complete.  Sometimes it displays:

Number of entries: 	30
Maximum storage size: 	512000 KiB
Storage in use: 	1439 KiB
Storage disk location: 	/home/karl/.cache/mozilla/firefox/2qrqcu0r.default/OfflineCache
List Cache Entries
(Reporter)

Comment 1

4 years ago
strace does not detect frequent system calls from the thread.  There are some that occur while browsing.  e.g. with -tT

12:57:08 futex(0x7f16237fd3d0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000036>
12:57:08 futex(0x7f16237fd3d0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000030>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000085>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000031>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000019>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000033>
12:57:08 access("/home/karl/.cache/mozilla/firefox/2qrqcu0r.default/cache2/entries/72F3B7157C0D1897D2A7C47F4E95778BDB66D077", F_OK) = -1 ENOENT (No such file or directory) <0.000056>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000023>
12:57:08 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000029>
12:57:08 futex(0x7f164a3645cc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f164a3645c8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000034>
12:58:46 futex(0x7f16237fd3d0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000038>
12:58:46 futex(0x7f16237fd3d0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000032>
12:58:46 futex(0x7f16237fd3d0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000031>
12:58:46 futex(0x7f16237fd3d0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000030>
12:58:46 futex(0x7f165c1e3048, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000033>
12:58:46 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000029>
12:58:46 access("/home/karl/.cache/mozilla/firefox/2qrqcu0r.default/cache2/entries/E66E87C05E4381980877C9FCA978B6B6F2709DC2", F_OK) = -1 ENOENT (No such file or directory) <0.000063>
12:58:46 write(26, "\372", 1)           = 1 <0.000031>
12:58:47 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000024>
12:58:47 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000032>
12:58:51 futex(0x7f16237fd3d0, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000032>
12:58:51 futex(0x7f16237fd3d0, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000016>
12:58:51 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000019>
12:58:51 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 1 <0.000035>
12:58:51 futex(0x7f165c1e3048, FUTEX_WAIT_PRIVATE, 2, NULL) = -1 EAGAIN (Resource temporarily unavailable) <0.000034>
12:58:51 futex(0x7f165c1e3048, FUTEX_WAKE_PRIVATE, 1) = 0 <0.000033>
12:58:51 access("/home/karl/.cache/mozilla/firefox/2qrqcu0r.default/cache2/entries/2B07D301731CD49B08F55FB5F51D4A8289552D87", F_OK) = -1 ENOENT (No such file or directory) <0.000055>
12:58:52 futex(0x7f164a3645cc, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x7f164a3645c8, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1 <0.000038>
12:59:11 access("/home/karl/.cache/mozilla/firefox/2qrqcu0r.default/cache2/entries/530EA4CD53A13D7EA516F635987D6DC9760CB015", F_OK) = -1 ENOENT (No such file or directory) <0.000044>
tracking-fennec: --- → ?
tracking-fennec: ? → -
tracking-fennec: - → ---
I'd be more interested in call stack samples of that thread.
(Reporter)

Comment 3

4 years ago
The strip bug seems to have been fixed, so I'll have symbols next time this happens.
Karl, can you still reproduce?
Flags: needinfo?(karlt)
(Reporter)

Comment 5

4 years ago
I don't have steps to reproduce.
Flags: needinfo?(karlt)
Thanks.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WORKSFORME
(Reporter)

Comment 7

4 years ago
Should I interpret from this that you do not want any reports for intermittent bugs?

We don't usually say "works for me" just because we don't have steps to reproduce.

Bug 1028415 comment 6 and 9 acknowledge that the problem is likely not fixed, and this report and the vote here would seem to indicate that it is not.
Flags: needinfo?(honzab.moz)
(In reply to Karl Tomlinson (back Dec 9 :karlt) from comment #7)
> Should I interpret from this that you do not want any reports for
> intermittent bugs?
> 
> We don't usually say "works for me" just because we don't have steps to
> reproduce.
> 
> Bug 1028415 comment 6 

The comment says it doesn't fix a problem (an unexpected state) in the cache - that problem is not critical, tho.  But it clearly says it fixes the endless looping.

> and 9 

This comment just concurs that.  We haven't find/fix the root cause bug we've fixed the looping.

> acknowledge that the problem is likely not
> fixed, and this report and the vote here would seem to indicate that it is
> not.

In comment 4 (this bug) I was *not* asking for 'steps to reproduce' but if that still happens in general.  And I took comment 5 as you 'cannot reproduce.'  

If I understood incorrectly and you, or anyone, can still see a problem that exactly or closely conforms this bug description then feel free to reopen it.

Until that, this is WFM.

Also, this could be a dup of bug 1025913, fixed on 33.
Flags: needinfo?(honzab.moz)

Comment 9

3 years ago
FYI: According to https://bugzilla.opensuse.org/show_bug.cgi?id=962276 I'm missing an "about:cpu" in Firefox ;-)
(In reply to Ulrich Windl from comment #9)
> FYI: According to https://bugzilla.opensuse.org/show_bug.cgi?id=962276 I'm
> missing an "about:cpu" in Firefox ;-)

You can turn logging on at runtime.  Create logging.cache2 string pref and set it to "Debug" string.  It will start producing log to stderr (or debug output on win).

Comment 11

3 years ago
(In reply to Honza Bambas (:mayhemer) from comment #10)
> You can turn logging on at runtime.  Create logging.cache2 string pref and
> set it to "Debug" string.  It will start producing log to stderr (or debug
> output on win).

I did that, but couldn't find debug output. Looking for any new files in my home directory, I noticed that ~/.cache/mozilla/firefox/oxxwnopu.default/cache2/entries has 139699 entries; is that normal?
(In reply to Ulrich Windl from comment #11)
> (In reply to Honza Bambas (:mayhemer) from comment #10)
> > You can turn logging on at runtime.  Create logging.cache2 string pref and
> > set it to "Debug" string.  It will start producing log to stderr (or debug
> > output on win).
> 
> I did that, but couldn't find debug output. 

It ouputs to strerr.  On windows it should go to the debug output, but I cannot see anything in Dbgview.

If you can reproduce regularly, see https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging.  Just change export NSPR_LOG_MODULES=timestamp,cache2:5.

> Looking for any new files in my
> home directory, I noticed that
> ~/.cache/mozilla/firefox/oxxwnopu.default/cache2/entries has 139699 entries;
> is that normal?

Yes, that is normal.

Comment 13

3 years ago
Created attachment 8719461 [details]
Typescript (compressed) for "NSPR_LOG_MODULES=timestamp,cache2:5"

Restarting Firefox was enough to trigger the problem. I suspect that the problem is triggered when the cache reaches its limit ("cache full"), and maybe if the limit is too low.
I exited from Firefox after a few seconds of filling the logs; I hope this is enough to understand the problem.
(In reply to Ulrich Windl from comment #13)
> Created attachment 8719461 [details]
> Typescript (compressed) for "NSPR_LOG_MODULES=timestamp,cache2:5"
> 
> Restarting Firefox was enough to trigger the problem. I suspect that the
> problem is triggered when the cache reaches its limit ("cache full"), and
> maybe if the limit is too low.
> I exited from Firefox after a few seconds of filling the logs; I hope this
> is enough to understand the problem.

Thanks.  Definitely a bug, but very different from this one.  Filing a new one.
Did you change the cache size? For some reason you have around 487MB of data in the cache at startup while the limit is set to 256MB. So once we read the cache index (which takes 15 seconds) we start evicting entries to shrink the cache to the limit. Evicting one record takes around 0.1s in your case so only 221 records were evicted in 23 seconds before you closed the browser. You have many small files (487MB in about 165000 files) and the problem seems to be in CacheIndex::GetEntryForEviction which always sorts the array (introduced in bug 1082735 which definitely didn't help here). As a quick performance improvement, we should skip sorting the array when we're sure the array hasn't been modified. Next we should compare performance before and after change in 1082735, maybe keeping the array sorted all the time will perform better in case of large arrays.

Comment 16

3 years ago
(In reply to Michal Novotny (:michal) from comment #15)
> Did you change the cache size? For some reason you have around 487MB of data
> in the cache at startup while the limit is set to 256MB. So once we read the

That's in accordance to the settings I see. I cannot remember changing the setting in the last weeks (I'm not saying it was never changed).

> cache index (which takes 15 seconds) we start evicting entries to shrink the
> cache to the limit. Evicting one record takes around 0.1s in your case so
> only 221 records were evicted in 23 seconds before you closed the browser.

I see a lot of cache entries of size 0.

> You have many small files (487MB in about 165000 files) and the problem
> seems to be in CacheIndex::GetEntryForEviction which always sorts the array
> (introduced in bug 1082735 which definitely didn't help here). As a quick
> performance improvement, we should skip sorting the array when we're sure
> the array hasn't been modified. Next we should compare performance before
> and after change in 1082735, maybe keeping the array sorted all the time
> will perform better in case of large arrays.
(In reply to Ulrich Windl from comment #16)
> (In reply to Michal Novotny (:michal) from comment #15)
> > Did you change the cache size? For some reason you have around 487MB of data
> > in the cache at startup while the limit is set to 256MB. So once we read the
> 
> That's in accordance to the settings I see. I cannot remember changing the
> setting in the last weeks (I'm not saying it was never changed).

That would mean that the cache went above the limit. 

> > cache index (which takes 15 seconds) we start evicting entries to shrink the
> > cache to the limit. Evicting one record takes around 0.1s in your case so
> > only 221 records were evicted in 23 seconds before you closed the browser.
> 
> I see a lot of cache entries of size 0.

Do you mean size of the file on the disk in cache2 directory or data size in about:cache listing?

Comment 18

3 years ago
(In reply to Michal Novotny (:michal) from comment #17)
> Do you mean size of the file on the disk in cache2 directory or data size in
> about:cache listing?

In the about:cache listing. In the filesystem most files have the magical size of 2738 bytes.

Comment 19

3 years ago
I have a problem much like this. Firefox churns for almost two minutes and consumes 500 - 600 MB of memory at startup. Some web pages I try to load get stuck during that time. Process Monitor says that Firefox is constantly reading and rereading files from cache2. If I delete my entire cache2 folder Firefox starts quickly the next time but the problem comes back later in the day. What is the new bug that was filed?

Comment 20

3 years ago
Found it -- Bug 1248389.
Resolution: WORKSFORME → DUPLICATE
Duplicate of bug: 1248389
You need to log in before you can comment on or make changes to this bug.