Closed Bug 1362068 Opened 3 years ago Closed 2 years ago

Amazon homepage start: loading home page is slow

Categories

(Core :: JavaScript Engine, defect)

55 Branch
x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Tracking Status
platform-rel --- ?

People

(Reporter: afilip, Assigned: djvj)

References

(Blocks 2 open bugs, )

Details

(Keywords: meta, perf, Whiteboard: [qf:p1:f64][QRC][platform-rel-Amazon])

Name:Firefox
Version:55.0a1
Windows 10 64 bit
Build ID:20170503030212


Using STR Amazon scenario https://docs.google.com/spreadsheets/d/1Kxn850VasyaG_WfRg3pMInW0hbIT8LP7pRPBYTIpdbM/edit#gid=622178261
 here is the Gecko profile.If more information is needed please contact me

Gecko profile: https://perfht.ml/2p9juTf
The profile in comment 0 isn't symbolicated so it's not easy to analyze.  I captured a few profiles myself on the Acer hardware.  Here is two from loading Amazon.com:

https://perfht.ml/2peqjmU
https://perfht.ml/2pey9wE

Here is the analysis from the second profile:

  * This hits these sync IPCs: bug 1349255 and bug 1347035.  In this profile they both take a minute amount of time but I had another profile captured where the latter was taking ~20ms which I couldn't upload due to an error.  At any rate the runtime of these will vary due to various factors such as how busy the parent is, the system load, etc.
  * About 90ms overall is spent under PresShell::Paint() during the large timeline of event loop unresponsiveness during page load.  Out of this time, 19ms is displaylist construction and 24ms is layer building.  Not sure if that's good or bad...  https://perfht.ml/2peEKHx is an expensive refresh driver tick that I noticed.  Not sure if that's significant...
  * Lots of synchronous frame and style flushes coming from Amazon scripts: https://perfht.ml/2peGKPZ...  I have contacted Amazon about one such case and have yet to hear back (bug 1357621) but we should just make Gecko as fast as possible on these parts, at least the reflows.  Filed bug 1362586.
  * There is a super expensive JS function that runs at the very early stage of the page load, filed bug 1362591 for that.
  * In general, a lot of the profile is spent running various JS code, I looked for more things that would catch my eyes to no avail.  Could use more JS focused investigation.
  * Only about 11ms GC (plus 10ms minor GC).
Flags: needinfo?(kvijayan)
Summary: STR Amazon profiling → Quantum Release Criteria: Loading www.amazon.com
See Also: → 1362625
I'm getting close to having instrumented intrinsics to capture time.  I'll try it with this page to see if anything sticks out as an obvious issue.
Flags: needinfo?(kvijayan)
Whiteboard: [qf]
(In reply to Kannan Vijayan [:djvj] from comment #2)
> I'm getting close to having instrumented intrinsics to capture time.  I'll
> try it with this page to see if anything sticks out as an obvious issue.

See bug 1362591 in particular.  :-)
Whiteboard: [qf] → [qf:p1]
Duplicate of this bug: 1362592
Blocks: Quantum
Keywords: meta
Alias: Loading_Amazon
Alias: Loading_Amazon → LoadingOn_Amazon
Blocks: QRC_FX57
No longer blocks: Quantum
Alias: LoadingOn_Amazon → qrc-amazon-home
Summary: Quantum Release Criteria: Loading www.amazon.com → Loading Amazon.com home page is slow
Whiteboard: [qf:p1] → [qf:p1][QRC]
Depends on: 1363352
Summary: Loading Amazon.com home page is slow → Amazon.com home page is slow
No longer depends on: 1363352
Alias: qrc-amazon-home
Summary: Amazon.com home page is slow → Amazon homepage start: loading home page is slow
Assignee: nihsanullah → nobody
Kannan, based on your comment 2 can you retry to analyze the profile with your instrumented intrinsics to capture time.
Assignee: nobody → kvijayan
Setting needinfo as marker.
Flags: needinfo?(kvijayan)
Measured just the load time.  Builtin usage summary is here (culled everything with ticks < 100k):

ToObject                                 ticks=100673        count=2160         
std_Math_min                             ticks=118080        count=1284         
String_substr                            ticks=119104        count=40           
str_charCodeAt                           ticks=142144        count=2446         
std_Math_max                             ticks=157536        count=1367         
array_unshift                            ticks=170848        count=114          
array_isArray                            ticks=172006        count=3962         
IsArray                                  ticks=178625        count=3015         
String_slice                             ticks=191584        count=72           
IsRegExpObject                           ticks=213060        count=5485         
str_toUpperCase                          ticks=216736        count=443          
RegExpPrototypeOptimizable               ticks=220832        count=1809         
str_charAt                               ticks=228834        count=3665         
IsCallable                               ticks=261248        count=4705         
SubstringKernel                          ticks=278176        count=1153         
RegExpInstanceOptimizable                ticks=279584        count=2850         
IsObject                                 ticks=289094        count=7342         
ToString                                 ticks=294626        count=5889         
array_lastIndexOf                        ticks=315776        count=92           
array_sort                               ticks=381024        count=200          
RegExpGetSubstitution                    ticks=401888        count=270          
UnsafeGetInt32FromReservedSlot           ticks=406498        count=6741         
IsPackedArray                            ticks=494019        count=12276        
array_pop                                ticks=683072        count=802          
RegExpTester                             ticks=829984        count=638          
_IsConstructing                          ticks=889248        count=940          
array_splice                             ticks=939168        count=574          
str_indexOf                              ticks=1265792       count=4779         
array_shift                              ticks=1562336       count=1654         
array_join                               ticks=1693760       count=521          
str_toLowerCase                          ticks=1807424       count=7421         
RegExpSearcher                           ticks=2603584       count=2319         
StringSplitString                        ticks=3511872       count=1315         
std_Array                                ticks=3980288       count=2557         
RegExpMatcher                            ticks=6434272       count=2122         
array_slice                              ticks=7952192       count=2795         
String_split                             ticks=9494880       count=1685         
array_push                               ticks=11173504      count=25428        

Still spending a lot of time in array_push.  I suspect this is mostly interpreter calls, since this is optimized away in baseline now.  Makes sense as this is page load.  Most of this code isn't baseline optimized since it doesn't get hot.  So we see the same top-functions show up from before we optimized them because the interpreter is weighted high.
Flags: needinfo?(kvijayan)
Whiteboard: [qf:p1][QRC] → [qf:p2][QRC]
platform-rel: --- → ?
Whiteboard: [qf:p2][QRC] → [qf:p2][QRC][platform-rel-Amazon]
Keywords: perf
Whiteboard: [qf:p2][QRC][platform-rel-Amazon] → [qf:p1][QRC][platform-rel-Amazon]
Plz remeasure, kannan.
Whiteboard: [qf:p1][QRC][platform-rel-Amazon] → [qf:i60][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:i60][qf:p1][QRC][platform-rel-Amazon] → [qf:f60][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f60][qf:p1][QRC][platform-rel-Amazon] → [qf:f61][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f61][qf:p1][QRC][platform-rel-Amazon] → [qf:f64][qf:p1][QRC][platform-rel-Amazon]
Whiteboard: [qf:f64][qf:p1][QRC][platform-rel-Amazon] → [qf:p1:f64][QRC][platform-rel-Amazon]
Component: General → JavaScript Engine
I looked at this again yesterday and the profiles show that we are at parity with Chrome.  It was ad-hoc testing with multiple runs using my smartphone stopwatch - on a win10 desktop (nightly vs canary) and on my linux laptop (nightly vs stable chrome).

Just for kicks, I noted the biggest time-sinks in the profile:

5.6% in ReprotectRegion, invoked by:
  - 2.3% from Linker::newCode
  - 1.3% from IonCacheIRCompiler::compile
  - 1.0% from BaselineCompiler::compile

1.8% in VirtualAlloc called by LifoAlloc
1.2% in NativeGetProprety
0.8% AtomizeChars
0.7% MBasicBlock::new
Closing as worksforme.  This was recorded a year ago and we have optimized a number of the high-use builtins that show up in this one.  The latest measurements show performance parity on Linux and Win10 at the office.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.