Loading page is 2x slower than Chrome
Categories
(Core :: JavaScript: GC, defect, P2)
Tracking
(Performance Impact: medium)
People
(Reporter: alice0775, Unassigned)
References
(Depends on 3 open bugs)
Details
(Keywords: parity-chrome, perf, perf:pageload)
Attachments
(1 file: 15.62 KB, text/plain)
STR:
Open http://www.elegantreader.com/item/14804993

Actual Results:
It takes about 30 seconds to complete loading and rendering on Nightly 62.0a1. In contrast, it takes less than 10 seconds on Chrome 68.
Gecko Profiler log: https://perfht.ml/2JkpiII

Expected Results:
Faster ;)
Comment 1•5 years ago (Reporter)
Comment 2•5 years ago (Reporter)
Gecko Profiler log : https://perfht.ml/2JkB2uD
Comment 3•5 years ago
https://perfht.ml/2xWVv3C appears to be a JS bug.
Comment 4•5 years ago
Is it better if you set javascript.options.asyncstack to false? It takes ~5 seconds for me to load that page, definitely not 30.
Comment 5•5 years ago (Reporter)
(In reply to Jan de Mooij [:jandem] from comment #4)
> Is it better if you set javascript.options.asyncstack to false?

It seems nothing changes.

> It takes ~5 seconds for me to load that page, definitely not 30.

It seems a low-speed network is needed to reproduce Firefox's slowness. My environment is ADSL, Down 5.0 Mb/s, Up 0.6 Mb/s (measurement result of http://www.bspeedtest.jp).
Comment 6•5 years ago
Hm but in that case it might be more of a networking issue...
Comment 7•5 years ago
None of these profiles have any networking stuff showing up in them, from my look at things (the profiles don't even show the socket thread, for example). Add to that, I can't reproduce (even on my connection, which is almost as slow as the reporter's), so I can't collect data locally. I will note that in the network markers tab, none of the loads takes particularly long. I don't really know where to go from here, but there is zero evidence that this is a necko issue.
Comment 8•5 years ago (Reporter)
Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5c2bc94d359c&tochange=e46cca506613
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fd1fd139aa16&tochange=62b602a49394
https://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=1655a3f30b7e&tochange=d43360499b86

Looking across the above:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fd1fd139aa16&tochange=b881c5526ed2
Comment 9•5 years ago (Reporter)
A narrower regression range than the above:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2e70b718903a&tochange=bfd85c9652fa
Comment 10•5 years ago
Let's see if Quantum Flow triage turns anything up here. (Note: I'd like to remove the regression keyword since a 6-year-old regression isn't something we can easily back out :)
Comment 11•5 years ago
There is some really weird GC stuff going on here. The execution of JS is being repeatedly interrupted by GCSlices with FULL_SLOT_BUFFER as the reason. Also, the main thread is spending a lot of its time just waiting for another thread to wake it up, which suggests synchronous waits on _something_ (not sure what exactly). Paul, can you make sense of what's going on with the GCSlices here? Are we running into some pathological case where we keep running GCSlices that don't do much work?
Comment 12•5 years ago
I can reproduce that this is slower in Firefox than Chrome, at least twice as slow. There does seem to be a lot of GC activity; that's not always a bad thing, but it's probably a big clue in this case.

TLDR
====
We need to test this without nursery poisoning to see if it's still a real problem and how big the problem is.

First problem
=============
Because this is a Nightly version of Firefox, we poison the nursery each time we collect it. This shows up in profiles and accounts for (eyeballing the profile) about half of the nursery collection time. It'll also distort other parts of the profile. I verified this by testing in the release version of Firefox and found that the page was faster there (but probably still room for improvement). Alice0775, you may wish to re-test with a release or beta version of Firefox.

If you control the site elegantreader.com, could you also use Performance.mark / Performance.measure to annotate this part of the loading time (see the sketch after this comment)? Then we can see these in profiles and get an accurate measurement of the loading time. I'm not sure whether you can measure from page load.

I will file a separate bug to allow people testing Nightly to control the poisoning behaviour so that they can gather profiles where this doesn't affect performance.

Next problem
============
All the provided profiles, plus my own profile, show lots of minor GC activity, particularly due to full remembered sets. (This is partly why the poisoning shows up; it wouldn't if the nursery wasn't so busy.) The tenure rate is also high-ish. My guess is that lots of tenured->nursery pointers are being created, potentially due to pre-tenuring of objects that shouldn't be pre-tenured, and that this is making generational GC less effective, making us collect the nursery more frequently and keeping things alive in the nursery longer than they should be (which means slower nursery collections).

What we'd like to see (from the GC) for a program that creates this much GC activity is the same amount (or slightly less) of nursery collection which tenures very few of the objects, meaning the nursery collection is much faster. But first we'd have to take a deeper look at some profiles without the poisoning, so that's a later step.

Misc
====
The provided profiles show a lot of page faults but mine doesn't. But I can still reproduce the slowness in Firefox, so this is either unrelated, or Windows (where the other profiles were generated) is more honest about reporting page faults (or we don't collect that on Linux).

Kannan, ITYM GCMinor not GCSlice, but let me know if I'm mistaken and missed something. Also Kannan, does a minor collection cause (always cause?) a bailout from the JIT? Do you lose the JITed code and have to re-compile it? Is it a problem or can it be avoided?

I agree with Overholt, the regression ranges aren't all that useful. There are two GC-related changes there, both of which fix crashes/bugs. And so much will have changed algorithmically since then that I doubt there's anything useful there.
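A hedged sketch of the Performance.mark / Performance.measure annotation suggested above; the mark names and the fetchComments() helper are hypothetical stand-ins, not anything the site actually defines:

// Hypothetical annotation of the comment-loading phase; mark/measure entries
// show up in Gecko profiles and in performance.getEntriesByName().
performance.mark("comments-fetch-start");
fetchComments().then(() => {
  performance.mark("comments-fetch-end");
  performance.measure("comments-load", "comments-fetch-start", "comments-fetch-end");
  const [m] = performance.getEntriesByName("comments-load");
  console.log(`comments loaded in ${m.duration.toFixed(1)} ms`);
});
// To measure from page load instead, performance.measure("since-nav-start")
// with no marks measures from the navigation time origin up to now.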
Comment 13•5 years ago (Reporter)
(In reply to Alice0775 White from comment #9)
> A narrower regression range than the above:
> https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2e70b718903a&tochange=bfd85c9652fa

FWIW, via a local build, I finally got:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=fd8fd1a7aecd&tochange=9de611848111
Comment 14•5 years ago
Interesting. NI Olli: I know this change is from 6 years ago, but can you think of why it could make the GC behave in this way? Could it create a large number of tenured->nursery pointers, or even re-write a single tenured->nursery pointer many times? I'll have a scan through the patch myself, but this isn't code I'm familiar with. Thanks.
Comment 15•5 years ago
Here are some times I've taken, with the Gecko profiler set to sample every 0.4 ms to increase the accuracy of catching the poisoning. Note that measurement is difficult since I don't have a great way to measure things like this (maybe that's something I don't know how to do without Performance.mark/measure).

Nightly: 11 sec
Nightly no poison: 12 sec

Alice0775, could you also test to see if this is still a problem when nursery poisoning is disabled? Please add JSGC_DISABLE_POISONING=1 to Firefox's environment. For example, on Linux I run:

$ JSGC_DISABLE_POISONING=1 ./firefox

If you can also provide the profile for this execution, I can check to see if it truly disabled poisoning.

Please also re-test in a release version of Firefox and see if it's the same.

If it's the same in both cases then this is something we'll want to investigate further.
Comment 16•5 years ago (Reporter)
(In reply to Paul Bone [:pbone] from comment #15)
> Alice0775, could you also test to see if this is still a problem when nursery
> poisoning is disabled? Please add JSGC_DISABLE_POISONING=1 to Firefox's
> environment. For example, on Linux I run:
>
> $ JSGC_DISABLE_POISONING=1 ./firefox

Steps:
Open a command prompt
cd (path to install directory)
set JSGC_DISABLE_POISONING=1
.\firefox.exe

The time it takes to display the page seems to be almost unchanged.

> If you can also provide the profile for this execution, I can check to see if
> it truly disabled poisoning.
>
> Please also re-test in a release version of Firefox and see if it's the same.
>
> If it's the same in both cases then this is something we'll want to
> investigate further.

Nightly 62.0a1 (20180627222831): approx. 35 sec https://perfht.ml/2MvpVMM
Firefox 61.0.0 (20180621125625): approx. 20 sec https://perfht.ml/2Kq2SlJ
Comment 17•5 years ago
I don't know why making MutationObserver use WebIDL bindings would cause this. Is the site perhaps using MutationObservers as microtask runners in the old builds and Promises in the new builds? (We now have correct Promise scheduling in FF; they also use microtasks.) Based on the profile, the page is using crazy amounts of promises and message events. And something weird with the JS: if you look at those spikes in the profile, they are very, very deep JS call stacks, af, bf, fc, ...
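For reference, a minimal sketch (assumed, not the site's actual code) of the microtask-runner pattern described above: libraries of that era scheduled their flush via a MutationObserver on a dummy text node when available, and fell back to Promise.resolve().then() otherwise.

function makeMicrotaskScheduler() {
  if (typeof MutationObserver === "function") {
    let pending = [];
    const node = document.createTextNode("0");
    let toggle = 0;
    // Mutating the observed text node fires the observer callback as a microtask.
    new MutationObserver(() => {
      const callbacks = pending;
      pending = [];
      callbacks.forEach((cb) => cb());
    }).observe(node, { characterData: true });
    return (cb) => {
      pending.push(cb);
      node.data = String((toggle ^= 1));
    };
  }
  // Promise-based fallback: same microtask timing, but it allocates a promise
  // (plus reaction records) for every scheduled callback.
  return (cb) => Promise.resolve().then(cb);
}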
Comment 18•5 years ago
(In reply to Alice0775 White from comment #16)
> Steps:
> Open a command prompt
> cd (path to install directory)
> set JSGC_DISABLE_POISONING=1
> .\firefox.exe
>
> The time it takes to display the page seems to be almost unchanged.

Thanks.

> Nightly 62.0a1 (20180627222831): approx. 35 sec https://perfht.ml/2MvpVMM
> Firefox 61.0.0 (20180621125625): approx. 20 sec https://perfht.ml/2Kq2SlJ

The difference here may be other things we do in Nightly builds that help to catch problems; they're definitely having a disproportionate effect on this particular page, though. There's still a large difference with Chrome (your reported timings), so we should continue to investigate.
Comment 19•5 years ago
(In reply to Olli Pettay [:smaug] from comment #17)
> I don't know why making MutationObserver use WebIDL bindings would cause this.
> Is the site perhaps using MutationObservers as microtask runners in the old
> builds and Promises in the new builds? Based on the profile, the page is using
> crazy amounts of promises and message events.

That's worth a look, I'll investigate there.

> And something weird with the JS: if you look at those spikes in the profile,
> they are very, very deep JS call stacks, af, bf, fc, ...

I saw that but wasn't sure if it was typical for JS, either in the past or in "modern JS". Since you say it's abnormal too, it could be worth looking there as well. Thanks.
Comment 20•5 years ago
My best guess is that something on the tenured heap has a lot of nursery pointers, and for some reason this causes us to perform worse than Chrome. Perhaps it only affects us because of a pretenuring decision we make that Chrome doesn't. Another theory is that in the deep recursions we're keeping more stack pointers alive, which hold more things live and cause the high numbers of nursery promotions.

I will (mostly notes for myself):
* Check the behaviour with and without nursery collection and pretenuring.
* Try to debug what the tenured->nursery pointers are.
* Check how many collections there are for each type of collection reason / how many items are in each remembered set.
* Teach the profiler to calculate evicts-per-second and evicted-bytes-per-second (might as well do mark/sweep for major collections also). Verify that this test is on par with other tests according to these stats.
Comment 21•5 years ago
Maybe the problem here is that we're creating too many Promise objects and we spend too much time GCing them... Should we consider allocating those objects in the nursery heap? (See bug 1377500, which tracks some other similar issues.)
Comment 22•5 years ago
That's something I looked for, and as far as I could tell Promises are nursery allocated, but I was only about 85% sure. I don't know exactly where in the code to check. If I'm wrong then yes, that is likely to be at least part of the problem.
Comment 23•5 years ago
One thing I investigated is pretenuring. I suspected that something was being pre-tenured that shouldn't be, and that this was filling the remembered sets. This doesn't seem to be the case; disabling pretenuring didn't make a difference. I'm currently writing some code to determine what kinds of objects are involved in tenured->nursery pointers. Then I can decide whether some of those could be handled differently.
Comment 24•5 years ago
Hi Alice, do you have any contact with the webmaster for this site? I'm trying to find out whether this behaviour is typical for vue.js sites or whether it is something specific that the JS on this site does.
Comment 25•5 years ago (Reporter)
(In reply to Paul Bone [:pbone] from comment #24)
> Hi Alice, do you have any contact with the webmaster for this site?

No, I do not.
Comment 26•5 years ago
It looks as if, while the Promises are allocated in the nursery, the PromiseReactionRecord objects perhaps are not. These have references to the Promise and also to functions, which are often in the nursery. The next questions are: Are PromiseReactionRecords allocated in the nursery and later evicted, or are they always allocated in the tenured heap, and why? And then, will allocating them in the nursery speed up this page?
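A hedged, plain-JS illustration of the shape being described (this is only the allocation pattern as seen from script, not SpiderMonkey internals, and whether the parent promise is actually tenured is an assumption):

const parent = new Promise(() => {});   // long-lived, never settles, so plausibly tenured
const derived = [];
for (let i = 0; i < 10000; i++) {
  // Each .then() allocates a new callback closure and a new derived promise,
  // and the engine keeps a reaction record for them that stays reachable from
  // `parent` until it settles. If `parent`'s reaction storage is in the tenured
  // heap while these fresh allocations are in the nursery, every iteration
  // writes new tenured->nursery pointers into the store buffer.
  derived.push(parent.then((v) => v + i));
}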
Comment 27•5 years ago
So at least some of the PromiseReactionRecord objects are being nursery allocated, which means they're being tenured because of something else, though I'm not sure what yet. I'm also not yet sure that they're the problem.
Comment 28•5 years ago
So Promise and PromiseReactionRecord are the most common tenured objects to appear in a slots buffer (they're in the tenured heap and have slots that were assigned nursery pointers).

class name              count
Array                     702
Object                     54
Promise                  1159
PromiseReactionRecord    1159

The most common object to appear in the slots buffer is an Object created at offset 29 (rows 3-5).

val / class name / alloc site / script:
obj: Object, ?:0, function Ta(t){var e,n={};for(e in t)n[e
Object 1: ?:117, function(t,e,n,r){var i=t.ja(),o=t.path.
Object 16: ?:29, function(t){if(this.e())return null;var
Array: 177
Promise: 190
PromiseReactionRecord: 183

The counts for Array, Promise and PromiseReactionRecord are the counts of tenured objects that point to this nursery object when the slots buffer is scanned. I excluded already-forwarded nursery objects from this table.

It looks as though this object is created in the code (prettified):

g.H = function(t) {
    return t && !this.C().e() ? {
        ".value": this.Ca(),
        ".priority": this.C().H()
    } : this.Ca()
}

of http://www.elegantreader.com/dist/vendor.e0b86d19455fa88b4013.js

We could try getting this object to be pre-tenured, but I'm unsure whether that will just shift the problem onto something it points to. I already tried disabling the nursery (same effect as pretenuring everything!) but there was no improvement.

NI: jonco, sfink, in case this creates any insight for you.
Comment 29•5 years ago
I tried reducing the tenure rate needed for pretenuring. Previously the nursery would be collected often due to a full slot buffer, but now it has various other reasons (and the tenure rate seems to be a bit lower). It didn't improve performance noticeably. I can only explain such a difference with Chrome as either:
1) some constant-factor algorithm is just faster in Chrome, or
2) there's something we can't allocate in the nursery (now) that's pointing indirectly at all these other things and causing them to be tenured; maybe Chrome is able to nursery-allocate more things.
I wish there was something like perf.html for Chrome, then I could get an idea of what their GC does with this example ;-)
Comment 30•5 years ago
In Chrome: devtools -> [three dots] -> More tools -> JavaScript profiler? And there is the Performance tool too.
Comment 31•5 years ago
Thanks smaug, I took a look, but nothing obvious there will give the kind of clues that are really needed. The profiles in both browsers look similar, but this looked a little odd:
+ Chrome spent between 5-6% of its time doing minor GC.
+ Firefox spent between 3-4%.

The Firefox heap is MUCH bigger than Chrome's. I even tried a new browser instance for both.
+ Chrome: 20-30MB, sometimes just over 100MB at the end of the page load, dropping quickly to <10MB.
+ Firefox: ~260MB

It's likely that they're counting different things in this reporting and that it's a difference in the profilers. However, a larger heap means more work per GC but fewer GCs (not clear).

I'm not sure what can be done next; maybe general work (e.g. improving tenuring performance) would help move the needle on this page. I'm unassigning myself because I'm out of ideas/things to test and I think I can make better progress elsewhere for now.
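For anyone repeating this comparison, a small sketch of how the Chrome-side heap numbers can be sampled from script; performance.memory is a non-standard, Chromium-only API (Firefox doesn't expose it, so about:memory is the closest equivalent there):

// Logs used/total JS heap roughly once per second (Chromium only).
setInterval(() => {
  if (performance.memory) {
    const { usedJSHeapSize, totalJSHeapSize } = performance.memory;
    console.log(`JS heap: ${(usedJSHeapSize / 1e6).toFixed(1)} / ${(totalJSHeapSize / 1e6).toFixed(1)} MB`);
  }
}, 1000);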
Comment 32•5 years ago
Lowering priority after chatting with sdetar last week about this.
Comment 33•5 years ago
After discussions with Paul, at this point this bug is somewhat stalled and not possible for FF64. We are moving the QF target to [qf:p1:67]; it's something we still need to work on, just not for the QF 64 release.
Comment 34•5 years ago
As requested, an ni? for Paul; I just want him to be aware of Bug 1500920, which -may- have an impact on this, as the JIT code was checking the pretenured flag incorrectly and, as a result, could have created tenured objects more often than expected based on setShouldPretenure.
Comment 35•5 years ago
This is now a lot faster. I want to keep the bug open a little longer because there are still a lot of full-slot-buffer GCs here, and that's code I'm modifying at the moment.
Comment 36•5 years ago (Reporter)
(In reply to Paul Bone [:pbone] from comment #35)
> This is now a lot faster. I want to keep the bug open a little longer
> because there are still a lot of full-slot-buffer GCs here, and that's code
> I'm modifying at the moment.

I retested this on the latest Nightly, but nothing improved.
Build ID: 20181115100051
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0
It takes around 30 s on Nightly 65.0a1, but 10 s on Firefox19 and 8 s on Chrome Dev 72...
Comment 37•5 years ago
Weird.
Firefox Nightly 20181115223444: 14 sec
Chrome 70.0.3538.77: 9 sec
Still not as fast as it should be, but it used to be 20+ seconds for me in Firefox. I see that yours is much slower; I don't know why, maybe Windows vs Linux is a factor. Thanks.
Comment 38•5 years ago
The page is also rather slow on my Windows 10 x64 PC on the latest Nightly, so I also created a profile; maybe it's helpful for someone. The site feels slow to me because the waiting spinner behind the "513 comments" text spins for ~15 seconds. https://perfht.ml/2PtHP8B
Comment 39•5 years ago
This page uses tons of promises, so anything under bug 1342037 would probably help.
Comment 40•2 years ago
This is still a really bad problem. It's not 30 s, but it still takes around 10+ s, while Chrome takes around 4 s.
Comment 41•2 years ago
That profile is awful from a responsiveness perspective. It's allocating fast enough that the GC keeps going non-incremental to catch up. It also has frequent long delays while it flushes the microtask queue.

For the GC stuff, it would probably be better to slow the mutator down and shift time towards the GC until it can keep up. But that's not going to help the 10s load, and might even make it worse.

It seems to do lots of FETCH_ITEMS calls (within a promise callback) without ever yielding. There's a fair amount of, but not all that much, network stuff going on throughout the whole run.
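A minimal sketch of the "never yielding" pattern versus a chunked version that yields between batches; fetchItem() and the item list are hypothetical stand-ins for whatever the page's FETCH_ITEMS work actually does:

// All at once: every .then/await is a microtask, so with hundreds of items the
// engine drains them back-to-back without returning to the event loop,
// starving paints and incremental GC.
async function loadAllAtOnce(itemIds, fetchItem) {
  return Promise.all(itemIds.map((id) => fetchItem(id)));
}

// Chunked: yield to the event loop between batches (setTimeout schedules a
// macrotask, unlike promise callbacks), giving rendering and the GC a chance to run.
async function loadInChunks(itemIds, fetchItem, chunkSize = 20) {
  const results = [];
  for (let i = 0; i < itemIds.length; i += chunkSize) {
    const chunk = itemIds.slice(i, i + chunkSize);
    results.push(...(await Promise.all(chunk.map((id) => fetchItem(id)))));
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return results;
}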
Comment 42•2 years ago
Adding some metrics here for reference:
ContentfulSpeedIndex: Chrome 2580.00 (±109), Firefox 4388.33 (±190); -70.09%
PageLoadTime: Chrome 1942.33 (±259), Firefox 3476.44 (±889); -78.98%
Comment 43•2 years ago
Gonna move this to P2 since it's not a critical website and fixing this would probably require a large architectural change.