Closed Bug 1191907 Opened 9 years ago Closed 9 years ago

100% cpu usage with 07befc6f54e7 nightly (20150806030205)

Categories

(Core :: Networking: Cache, defect)

42 Branch
defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
firefox41 --- unaffected
firefox42 + fixed

People

(Reporter: froydnj, Unassigned)

References

Details

(Keywords: power, regression)

I upgraded my nightly today and ever since, htop reports that the "Cache2 I/O" thread in my firefox process is consuming 100% CPU. My nightly apparently doesn't have symbols, so perf is useless at the moment, but it is at least telling me that all the CPU is being spent in libxul, and not with something like lock contention. Honza, do you know if any cache-related changes went in recently that might have caused this?
Flags: needinfo?(honzab.moz)
If you can reproduce, please provide an NSPR log with timestamp,cache2:5 modules.
AFAIK, there were a huge change in how tables are iterated in bug 1182961. Tentatively marking as blocking.
Blocks: 1182961
Flags: needinfo?(honzab.moz) → needinfo?(nfroyd)
Blocks: 1082735
I was seeing 100% parent process CPU usage, though I'm not sure what thread it was. I'm having trouble reproducing now but I'll see if I can figure out what was triggering it.
I think I can reproduce this by going to this link and playing the video: https://twitter.com/nodevember/status/626539473082388480 Here's the stack (sorry for the poor formatting): 2479 Thread_1231998: Cache2 I/O + 2479 thread_start (in libsystem_pthread.dylib) + 13 [0x7fff8ded341d] + 2479 _pthread_start (in libsystem_pthread.dylib) + 176 [0x7fff8ded51e5] + 2479 _pthread_body (in libsystem_pthread.dylib) + 131 [0x7fff8ded5268] + 2479 _pt_root (in libnss3.dylib) + 218 [0x101a052ea] + 2479 mozilla::net::CacheIOThread::ThreadFunc(void*) (in XUL) + 36 [0x101da4464] + 2478 mozilla::net::CacheIOThread::ThreadFunc() (in XUL) + 136 [0x101da45f8] + ! 2463 mozilla::net::CacheIOThread::LoopOneLevel(unsigned int) (in XUL) + 153 [0x101da47b9] + ! : 2447 mozilla::net::WriteEvent::Run() (in XUL) + 110 [0x101db051e] + ! : | 2447 mozilla::net::CacheIndex::OnDataWritten(mozilla::net::CacheFileHandle*, char const*, nsresult) (in XUL) + 486 [0x101dab646] + ! : | 1386 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 207,192,... [0x101da7d3f,0x101da7d30,...] + ! : | 793 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 344 [0x101da7dc8] + ! : | + 793 PLDHashTable::Iterator::Next() (in XUL) + 58,32,... [0x101c8b7ca,0x101c8b7b0,...] + ! : | 148 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 352 [0x101da7dd0] + ! : | + 148 PLDHashTable::Iterator::Done() const (in XUL) + 0,4,... [0x101c8b650,0x101c8b654,...] + ! : | 102 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 200 [0x101da7d38] + ! : | + 102 PLDHashTable::Iterator::Get() const (in XUL) + 0,8,... [0x101c8b780,0x101c8b788,...] + ! : | 17 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 546 [0x101da7e92] + ! : | + 8 mozilla::net::CacheFileIOManager::Write(mozilla::net::CacheFileHandle*, long long, char const*, int, bool, bool,
Hmm, I can't really tell if it is watching that video and skipping around that causes it to go berserk, or maybe going to Facebook and scrolling up and down a lot so that it is loading new content. I can't reproduce it 100% of the time. The fun thing is that once it starts, the 100% CPU usage persists after I close the FB and YouTube tabs.
[Tracking Requested - why for this release]: Regression
Severity: normal → major
Keywords: power
I've found that "browsing around" for several minutes eventually triggers 100% cpu. I'm too stupid to use nspr logging, apparently, so I haven't gotten a good log yet...
Flags: needinfo?(nfroyd)
The best I've been able to do is "the click around to reproduce" method. Setting export NSPR_LOG_MODULES=timestamp,cache2:5 in a terminal and starting Firefox from the terminal works for me, but the logs are really huge. Hundreds of megabytes. If that is ok, I could bzip2 them and upload to people if that would help.
I'm seeing this too. http://dpaste.com/145K1NC has output from OS X's Sample Process (:BenWa determined that there wasn't anything on the main thread doing the hogging, so the Gecko Profiler data wasn't that useful).
FWIW, I don't have clear steps to reproduce. Restarting Firefox, it has normal CPU usage, but after a few minutes of browsing around, it will get stuck at 100% CPU again.
For what it's worth, I was seeing really high CPU usage in Nightly on Windows. I disabled e10s and restarted and CPU levels have been low/normal for the past 10 minutes. I'll keep an eye on CPU through the rest of the day.
(In reply to Wes Kocher (:KWierso) from comment #11) > For what it's worth, I was seeing really high CPU usage in Nightly on > Windows. I disabled e10s and restarted and CPU levels have been low/normal > for the past 10 minutes. I'll keep an eye on CPU through the rest of the day. It's not e10s related. I don't enable e10s and I've experienced this several times today.
> + ! : | 148 mozilla::net::CacheIndex::WriteRecords() (in XUL) + 352 [0x101da7dd0] > + ! : | + 148 PLDHashTable::Iterator::Done() const (in XUL) + 0,4,... This definitely looks like it's my fault. One of bug 1182961's patches modified CacheIndex::WriteRecords, so I've backed all that bug's patches out. Apologies for the inconvenience.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Interesting that this seemingly wasn't caught by any of our tests.
Merged the backouts to m-c so they'll hit tonight's nightly.
(In reply to Nicholas Nethercote [:njn] from comment #14) > Interesting that this seemingly wasn't caught by any of our tests. I don't think we have tests that would catch extra CPU time in background threads that don't really impede the progress on the main thread.
This is marked as fixed and a duplicate of bug 1192114. I'm going to go ahead and track it for 42, to ensure the fix is in, since the cpu usage indicator is alarming and confusing.
Bug 1182961 suggests that it is fixed.
You need to log in before you can comment on or make changes to this bug.