Closed Bug 1120166 Opened 6 years ago Closed 6 years ago

Huge regression in page load and rendering performance, including higher CPU % and occasional stalls

Categories
Component: Core :: Networking
Type: defect
Platform: x86_64, Windows 8.1
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED WORKSFORME

People
Reporter: fehe
Assignee: Unassigned

Details
Keywords: perf, regression

The December 10 Nightly build introduced a huge performance regression loading and rendering pages.  After much grief, I managed to track the regression pushlog (using the URL: http://mrgrady7787.blogspot.ca/2014/07/americas-news_29.html) to the following: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=8a01e921708c&tochange=5f8dbb495675

Although Bug 1081871 deals with FirefoxOS, it seems to be the most likely suspect.  Feel free to nominate a different bug and component if you're able to confirm otherwise.

For me, without the regression, the page completes loading in 25 seconds or less.  With the regression, the page completes loading in 90 seconds or more.


STR:
1. Disable e10s and restart your browser
2. If you already have the aforementioned URL loaded, close the tab
3. Clear your browser cache and restart Firefox
4. Load http://mrgrady7787.blogspot.ca/2014/07/americas-news_29.html
5. Repeat Steps 2 to 4, comparing pre- and post-regression builds
On second thought, this is probably caused by the e10s logging -- even though I don't have e10s enabled -- as I'm having a hard time seeing how Bug 1081871 could possibly be doing this.
Blocks: 1112112
No longer blocks: 1081871
Component: Hardware Abstraction Layer (HAL) → Networking
(In reply to IU from comment #1)
> On second thought, this is probably caused by the e10s logging -- even
> though I don't have e10s enabled, as I'm having a hard time seeing how Bug
> 1081871 could possible be doing this.

Yes, the code in bug 1081871 is not even compiled on desktop Firefox.
Bug 1112112 changes only e10s code (HttpChannelParent, HttpChannelChild and FTPChannelParent); without e10s that code should not be called at all, and it only adds some logging, nothing else. I am sure it is not the cause, but I will check...
No longer blocks: 1112112
I can't see it being bug 1039884; it only applies to media streams.
Is there any way someone could generate four builds, each with one of the suspect bugs backed out, so I can verify?
Dragana: How did you confirm that bug 1112112 is not responsible?  Did you do a back-out and were my steps reproducible for you?
Flags: needinfo?(pehrsons)
Flags: needinfo?(pehrsons) → needinfo?(dd.mozilla)
I could not reproduce it on Linux, so I did not back it out. I checked my changes once again and checked the log: the code that I changed is not called at all, so it cannot be the problem.
Flags: needinfo?(dd.mozilla)
I'm not sure I'm able to reproduce this; there aren't perceptible differences with / without these changes, at least for me, on Linux.

That said, if Bug 1024809 does have perf issues, this isn't where I'd expect to see them. There's some impact on the first cert verification after startup (including filesystem I/O as the certificate blocklist initializes), a tiny impact on subsequent verifications, and some cost when Firefox retrieves a new blocklist.xml. But if that's the cause of the issue here, I'd expect to see bigger issues elsewhere.
I wonder if this bug is only exposed if some URL is blocked.  At home, I use a DNS server to blacklist some domains.  At work, where there's no blacklisting, I could not reproduce.  I will investigate some more when I get home.
mmc, FYI: maybe blocklist-related?
Flags: needinfo?(mmc)
(In reply to IU from comment #9)
> I wonder if this bug is only exposed if some URL is blocked.  At home, I use
> a DNS sever to blacklist some domains.  At work, where there's no
> blacklisting, I could not reproduce.  I will investigate some more when I
> get home.

IU, if you can't reproduce at work, I suspect it is related to your setup. If the page relies on an included script that is being blocked at home but not at work, it can hang. I can't reproduce on your test page (with no DNS blocking).
Flags: needinfo?(mmc)
I retested this when I got home and it's clear something on either that page or supporting systems on the Internet has changed, because today I can no longer reproduce -- even with the build that introduced the regression.  This obviously explains why none of you are able to reproduce.

I assume that whatever caused the regression is still there; therefore, I will leave this bug open till the weekend.  If I do not encounter it again by then, I will close it.

Thanks
The regression still remains.  This morning I encountered it again, but only temporarily.  So the conditions for triggering the regression still exist and don't seem to be related to any DNS blacklisting I am doing.

If someone is at all willing to produce the four builds I requested, I could have them on hand to help pinpoint which of the four bugs is responsible the next time I encounter it.  Barring that, this is just going to be yet another source of poor Firefox performance that users will encounter from time to time.  At least I was able to identify the regression range when it was persistent.

And for the record, I was not the only one who experienced this issue, as evidenced starting here: http://forums.mozillazine.org/viewtopic.php?p=13968113#p13968113
IU thanks for the report and your efforts here.

That site is WFM, fwiw.

dragana's patch doesn't seem likely, but maybe there is something we don't understand going on. Or maybe the regression range finding was broken - that's a risk with problems that only appear temporarily.

IU, I have kicked off two builds for you. They will take a couple hours to complete.

One is a control build; it's just the tip of m-c right now.
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-9b7929158049

The other is the same tip with 1112112 backed out
https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/mcmanus@ducksong.com-54269c2b9957

Can you compare them with your test case?
(In reply to Patrick McManus [:mcmanus] from comment #14)
> understand going on. Or maybe the regression range finding was broken -
> that's a risk with problems that only appear temporarily.

The regression range is firm.  I tested and retested when the problem was persistent.  That is the valid range.
 
> The other is the same tip with 1112112 backed out
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/
> mcmanus@ducksong.com-54269c2b9957

Thanks, but it would be best if I could have all four, as requested, on hand for when I do encounter the problem again, just in case one of the other three bugs is at fault.
 
> Can you compare them with your test case?

I will do that.  However, as stated, the problem is currently intermittent.  Therefore, it could be days or weeks before I am able to obtain definitive results.
I think whatever caused this issue has since been resolved by some other bug/patch.  When the issue was present, the browser created a high CPU load and became intolerably slow after watching a couple of videos on YouTube.  This no longer happens.

Thanks to all who looked into this.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME