Closed Bug 1362068 Opened 3 years ago Closed 2 years ago
Amazon homepage start: loading home page is slow
Name:Firefox Version:55.0a1 Windows 10 64 bit Build ID:20170503030212 Using STR Amazon scenario https://docs.google.com/spreadsheets/d/1Kxn850VasyaG_WfRg3pMInW0hbIT8LP7pRPBYTIpdbM/edit#gid=622178261 here is the Gecko profile.If more information is needed please contact me Gecko profile: https://perfht.ml/2p9juTf
The profile in comment 0 isn't symbolicated so it's not easy to analyze. I captured a few profiles myself on the Acer hardware. Here is two from loading Amazon.com: https://perfht.ml/2peqjmU https://perfht.ml/2pey9wE Here is the analysis from the second profile: * This hits these sync IPCs: bug 1349255 and bug 1347035. In this profile they both take a minute amount of time but I had another profile captured where the latter was taking ~20ms which I couldn't upload due to an error. At any rate the runtime of these will vary due to various factors such as how busy the parent is, the system load, etc. * About 90ms overall is spent under PresShell::Paint() during the large timeline of event loop unresponsiveness during page load. Out of this time, 19ms is displaylist construction and 24ms is layer building. Not sure if that's good or bad... https://perfht.ml/2peEKHx is an expensive refresh driver tick that I noticed. Not sure if that's significant... * Lots of synchronous frame and style flushes coming from Amazon scripts: https://perfht.ml/2peGKPZ... I have contacted Amazon about one such case and have yet to hear back (bug 1357621) but we should just make Gecko as fast as possible on these parts, at least the reflows. Filed bug 1362586. * There is a super expensive JS function that runs at the very early stage of the page load, filed bug 1362591 for that. * In general, a lot of the profile is spent running various JS code, I looked for more things that would catch my eyes to no avail. Could use more JS focused investigation. * Only about 11ms GC (plus 10ms minor GC).
Summary: STR Amazon profiling → Quantum Release Criteria: Loading www.amazon.com
I'm getting close to having instrumented intrinsics to capture time. I'll try it with this page to see if anything sticks out as an obvious issue.
(In reply to Kannan Vijayan [:djvj] from comment #2) > I'm getting close to having instrumented intrinsics to capture time. I'll > try it with this page to see if anything sticks out as an obvious issue. See bug 1362591 in particular. :-)
Alias: LoadingOn_Amazon → qrc-amazon-home
Summary: Quantum Release Criteria: Loading www.amazon.com → Loading Amazon.com home page is slow
Whiteboard: [qf:p1] → [qf:p1][QRC]
Summary: Loading Amazon.com home page is slow → Amazon.com home page is slow
Summary: Amazon.com home page is slow → Amazon homepage start: loading home page is slow
Kannan, based on your comment 2 can you retry to analyze the profile with your instrumented intrinsics to capture time.
Assignee: nobody → kvijayan
Setting needinfo as marker.
Measured just the load time. Builtin usage summary is here (culled everything with ticks < 100k): ToObject ticks=100673 count=2160 std_Math_min ticks=118080 count=1284 String_substr ticks=119104 count=40 str_charCodeAt ticks=142144 count=2446 std_Math_max ticks=157536 count=1367 array_unshift ticks=170848 count=114 array_isArray ticks=172006 count=3962 IsArray ticks=178625 count=3015 String_slice ticks=191584 count=72 IsRegExpObject ticks=213060 count=5485 str_toUpperCase ticks=216736 count=443 RegExpPrototypeOptimizable ticks=220832 count=1809 str_charAt ticks=228834 count=3665 IsCallable ticks=261248 count=4705 SubstringKernel ticks=278176 count=1153 RegExpInstanceOptimizable ticks=279584 count=2850 IsObject ticks=289094 count=7342 ToString ticks=294626 count=5889 array_lastIndexOf ticks=315776 count=92 array_sort ticks=381024 count=200 RegExpGetSubstitution ticks=401888 count=270 UnsafeGetInt32FromReservedSlot ticks=406498 count=6741 IsPackedArray ticks=494019 count=12276 array_pop ticks=683072 count=802 RegExpTester ticks=829984 count=638 _IsConstructing ticks=889248 count=940 array_splice ticks=939168 count=574 str_indexOf ticks=1265792 count=4779 array_shift ticks=1562336 count=1654 array_join ticks=1693760 count=521 str_toLowerCase ticks=1807424 count=7421 RegExpSearcher ticks=2603584 count=2319 StringSplitString ticks=3511872 count=1315 std_Array ticks=3980288 count=2557 RegExpMatcher ticks=6434272 count=2122 array_slice ticks=7952192 count=2795 String_split ticks=9494880 count=1685 array_push ticks=11173504 count=25428 Still spending a lot of time in array_push. I suspect this is mostly interpreter calls, since this is optimized away in baseline now. Makes sense as this is page load. Most of this code isn't baseline optimized since it doesn't get hot. So we see the same top-functions show up from before we optimized them because the interpreter is weighted high.
platform-rel: --- → ?
Whiteboard: [qf:p2][QRC] → [qf:p2][QRC][platform-rel-Amazon]
Whiteboard: [qf:p2][QRC][platform-rel-Amazon] → [qf:p1][QRC][platform-rel-Amazon]
Plz remeasure, kannan.
Whiteboard: [qf:p1][QRC][platform-rel-Amazon] → [qf:i60][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:i60][qf:p1][QRC][platform-rel-Amazon] → [qf:f60][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f60][qf:p1][QRC][platform-rel-Amazon] → [qf:f61][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f61][qf:p1][QRC][platform-rel-Amazon] → [qf:f64][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f64][qf:p1][QRC][platform-rel-Amazon] → [qf:p1:f64][QRC][platform-rel-Amazon]
I looked at this again yesterday and the profiles show that we are at parity with Chrome. It was ad-hoc testing with multiple runs using my smartphone stopwatch - on a win10 desktop (nightly vs canary) and on my linux laptop (nightly vs stable chrome). Just for kicks, I noted the biggest time-sinks in the profile: 5.6% in ReprotectRegion, invoked by: - 2.3% from Linker::newCode - 1.3% from IonCacheIRCompiler::compile - 1.0% from BaselineCompiler::compile 1.8% in VirtualAlloc called by LifoAlloc 1.2% in NativeGetProprety 0.8% AtomizeChars 0.7% MBasicBlock::new
Closing as worksforme. This was recorded a year ago and we have optimized a number of the high-use builtins that show up in this one. The latest measurements show performance parity on Linux and Win10 at the office.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.