Closed Bug 1022775 Opened 6 years ago Closed 4 years ago

Huge CPU use when waiting for network reply. Pathologically slow.

Categories

(Core :: Networking, defect, major)

31 Branch
x86
Windows 7
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: hartnegg, Unassigned)

References

Details

(Keywords: perf)

Attachments

(4 files, 1 obsolete file)

When Firefox 31 waits for a server reply, for example when trying to access an https resource which exists only as http, it uses a lot of CPU, sometimes 100% of both cores. Firefox 24 uses almost no CPU as long as the server does not reply.
I cannot reproduce the issue on Win 7 x64 or Win XP x86, using the latest Firefox 31 Beta 4 (BuildID: 20140623175014). 

Does this reproduce with a clean profile: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles?

Can you provide some examples of what you tried to access to reproduce this problem?
Component: General → Networking
Product: Firefox → Core
Yeah, too vague information, please provide more details, as much as you can.

If there are any sensitive information you can always make the comment private, also feel free to send the information to (e.g. mine) bugzilla mail directly.
This seems to be reproducable only in VirtualBox. 
Also with 31 beta 4 wich clean profile.
You are running Win7 in the VBox?  Any special software/setting?  Please, provide more details.

What about current Nightly?
Created the log file, will upload in a minute.
Also a hardcopy of the ussage of one of the two CPU kernels.

Called three times a web page that contains only <?php sleep(10) ?>
First call from commandline when starting Firefox, then clicked on reload twice.

The problem appears to be smaller in Win7 than in XP, but still noticable.
Especially when comparing to Firefox 24, where CPU use during wait is zero.

Tested with version 31 beta 4 with clean profile, Win7 in VirtualBox.
Attached file NSPR log (obsolete) —
Attached image firefox-cpu.gif
CPU use (one of two kernels) when loading web page with <?php sleep(10) ?>, and then clicking reload twice.
(In reply to hartnegg from comment #6)
> The problem appears to be smaller in Win7 than in XP, but still noticable.
> Especially when comparing to Firefox 24, where CPU use during wait is zero.

That would seem to indicate there's a regression. Could you test other Firefox versions to see where this first started for you?
The problem seems to grow, or there are two problems.
One started in version 26 or 27, this takes 20% CPU.
The other came in 31, which takes 50%.
I don't see anything bad in the log (no network thread looping).  Recently I have experienced bug 1033493, but not sure it's the same or not.  Maybe try adding another module to NSPR_LOG_MODULES env var: "UrlClassifierDbService:5"
Added log with UrlClassifierDbService:5

First waited 40 secs, then loaded the page with sleep(10) twice.

Btw. there was also several times activity of the process firefox.exe during the 40 secs wait, although I disabled all automatic functions that I could think of, and that is also not visible in the log.
Attachment #8447041 - Attachment is obsolete: true
All tests were made with clean profile.
Will try Gecko Profiler later today.
Gecko Profiler does not see this activity either.
Neither does Process Monitor from SysInternals.
Process Explorer says that there is lots of activity, but not what.
Are there any more NSPR options that I could try?
First, try to add Firefox symbols to Process Explorer: Options/Configure Symbols and set Symbols path to the following: C:\Symbols*http://symbols.mozilla.org/firefox


Other option is to use AMD CodeAnalyst and attach it to firefox.exe process, you also need to setup the symbols a similar way there.  It's not easy to use at the start, but may quickly point to the code that loops w/o any other work needed.


Personally, my next step would be to start Visual Studio Express [1] and attach.  Before that you need to add symbol servers as well [2].  You may also need to download sources to a proper location (you can however with some limitation work w/o it), I can give more information on it when you decide to go this way.

When attached, check in Process Explorer which thread is causing the CPU load, note its ID.  When you pause firefox.exe in Visual Studio, just switch to that thread in the Threads window and see where it is on the stack.  You may also try to run/pause/run/pause few times and check the stack is on same/similar place of the code.

Then you can paste here some sample stacks or at least method names, so that we know more.

Thanks!


[1] http://www.visualstudio.com/downloads/download-visual-studio-vs#d-express-windows-desktop
[2] http://symbols.mozilla.org/firefox
Using Visual Studio sounds as too much effort.

Process Explorer says it's in the main thread,
which is at address firefox.exe+0x249c in Firefox 
version 31.0.0.5297 (german variant).

sample Stacks:

mozjs.dll+0x10bce1
xul.dll+0x134d17
--
ntdll.dll+0xe514
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
gkmedias.dll+0x150bdc
gkmedias.dll+0x138c8f
gkmedias.dll+0x1560fd
xul.dll+0x1b1cdd
xul.dll+0x1b19da
--
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
gkmedias.dll+0x150bdc
gkmedias.dll+0x138c8f
gkmedias.dll+0x1560fd
xul.dll+0x1b1cdd
xul.dll+0x1b19da
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
gkmedias.dll+0x150bdc
gkmedias.dll+0x138c8f
gkmedias.dll+0x1560fd
xul.dll+0x1b1cdd
xul.dll+0x1b19da
--
ntoskrnl.exe+0x57e8
ntdll.dll+0xe514
gkmedias.dll+0x12b459
gkmedias.dll+0x12bbfe
gkmedias.dll+0x1509bb
gkmedias.dll+0x12cf94
gkmedias.dll+0x12d289
--
ntoskrnl.exe+0x5909
ntdll.dll+0xe514
Says nothing.  

To get a real data you will have to use Visual Studio, at least to download symbols from the Mozilla symbol servers somewhere (Process Explorer doesn't do that - I have tried).  

You need to:
- install VS
- setup the symbol server and the cache destination directory, and set it to automatically download the symbols for all modules
- attach VS to firefox.exe process, it should start downloading the symbols (it's slow, have patience)
- run Process Explorer
- setup the symbols path to the cache directory you have set in VS
- check the stacks again
I managed to make ProcessExplorer show names.

Also I noticed that not only firefox.exe consumes cpu while it waits for more data from the web server (or for https connection to a server that does not speak https). This happens simultaneously in the process "System".

Inside Firefox it is the thread firefox.exe!wmainCRTStartup
Inside System it is the thread ntoskrnl.exe!SeTokenObjectType+0x158df

Btw. Firefox 31 already consumes 10% cpu when just the mouse cursor hovers over the reload icon, before I click. If I move the mouse away, cpu use immediately drops below 1%.


The stack of seTokenObjectType usually looks like this:

ntoskrnl.exe!ExReleaseResourceLite+0x2b4
ntoskrnl.exe!RtlInsertElementGenericTable+0x3a5
hal.dll!HalClearSoftwareInterrupt+0x342
ntoskrnl.exe!KiIpiServiceRoutine+0x191
ntoskrnl.exe!SeTokenObjectType+0x143de
ntoskrnl.exe!PsCreateSystemThread+0x70
ntoskrnl.exe!KeInitializeTimerEx+0x1e5

For the other stacks I took several snapshots, because they differ a lot from trial to trial, and will upload them as separate files.
It's very hard to figure something out from these stacks..
are you seeing high kernel mode CPU?  (in windows' taskmgr, performance tab, enable the display under "View" - see the red line)
Keywords: perf
Yes, the red line is very high in that situation, which is unusual compared with other web surfing.
horrible slow restart on laptop at work - only 140 tabs.
switched to wired connection - no better.
will try to get a log next time.

(before the FF resetart I had updated to 2014-08-08 nightly, and killed superfetch service on vista)
Severity: normal → major
Summary: Huge CPU use when waiting for network reply → Huge CPU use when waiting for network reply. Pathologically slow.
See Also: → 960238
This bug is not reproducable any more with Firefox 38esr.

However now I'm suffering from bug 960238: high CPU load when starting with many saved tabs, reports twice that chrome://global/content/bindings/browser.xml hangs, finally freezes completely. Works fine with clean profile, but I loose all my open tabs.
(In reply to hartnegg from comment #27)
> Works fine with clean profile, but I loose all my open tabs.

In that case, there's probably something about your profile that is triggering or at least contributing to bug 960238. Wayne, are you able to reproduce this issue anymore?
Flags: needinfo?(vseerror)
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #28)
> (In reply to hartnegg from comment #27)
> > Works fine with clean profile, but I loose all my open tabs.
> 
> In that case, there's probably something about your profile that is
> triggering or at least contributing to bug 960238. Wayne, are you able to
> reproduce this issue anymore?

I haven't seen it. But I've not tried to reproduce
(In reply to hartnegg from comment #27)
> This bug is not reproducable any more with Firefox 38esr.
> 
> However now I'm suffering from bug 960238: high CPU load when starting with
> many saved tabs, reports twice that
> chrome://global/content/bindings/browser.xml hangs, finally freezes
> completely. Works fine with clean profile, but I loose all my open tabs.

hartnegg, you might try using https://addons.mozilla.org/en-us/firefox/addon/about-sessionstore/ and click the buttons at the bottom to reduce the size of your sessionstore file
hartnegg no longer reproduces. And I haven't seen it this year. I've been in airports and tethered many times and NTF. so WFM
Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(vseerror)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.