Open Bug 1269690 (gdoc_read_basic_table_1_ubuntu(45.25%)) Opened 5 years ago Updated 1 year ago

[perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content

Categories

(Core :: JavaScript Engine, defect, P2)

45 Branch
x86
Linux
defect

Tracking

()

ASSIGNED
Tracking Status
platform-rel --- -
firefox-esr45 --- affected
firefox52 --- wontfix

People

(Reporter: sho, Assigned: djvj)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: perf, Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate])

User Story

You can find all test scripts on github link:
https://github.com/mozilla-twqa/hasal

And you can also find the running script name in comments
for example: 
test scripts: test_chrome_gdoc_create_txt_1
then you can specify the test script name on suite.txt and run it

Attachments

(19 files)

7.84 MB, text/plain
Details
329.61 KB, image/png
Details
8.21 MB, text/plain
Details
8.20 MB, text/plain
Details
958 bytes, text/plain
Details
958 bytes, text/plain
Details
475.16 KB, image/png
Details
437.37 KB, image/png
Details
380.74 KB, image/png
Details
934.19 KB, image/png
Details
3.48 MB, application/json
Details
3.56 MB, application/json
Details
3.15 MB, application/json
Details
3.11 MB, application/json
Details
2.73 MB, text/plain
Details
3.04 MB, text/plain
Details
466.61 KB, text/plain
Details
267.70 KB, text/plain
Details
19.20 KB, patch
Details | Diff | Splinter Review
# Test Case
STR
1. Launch the browser with blank page
2. input the google doc url with 1 table content
3. close the browser

# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram
Hard Drive: 1TB SATA HDD
Graphics: GK107 [GeForce GT 640]/ GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2
Chrome version: 50.0.2661.75

# Result
Browser | Run time (median value) 
Firefox | 14533.3333 ms
Chrome  | 10005.5556 ms
Product: Firefox → Core
Version: unspecified → 45 Branch
# Profiling Data:
https://cleopatra.io/#report=2544480759b8351b33cfb70d34165c6130cf973f

# Performance Timing:
http://goo.gl/fhD77x
{
  "navigationStart\": 1462949113471,
  "unloadEventStart\": 0,  
  "unloadEventEnd\": 0,  
  "redirectStart\": 0,  
  "redirectEnd\": 0,  
  "fetchStart\": 1462949113476,  
  "domainLookupStart\": 1462949113476,  
  "domainLookupEnd\": 1462949113476,  
  "connectStart\": 1462949113476,  
  "connectEnd\": 1462949113476,  
  "requestStart\": 1462949113479,  
  "responseStart\": 1462949113891,  
  "responseEnd\": 1462949113951,  
  "domLoading\": 1462949113895,  
  "domInteractive\": 1462949116185,  
  "domContentLoadedEventStart\": 1462949116276,  
  "domContentLoadedEventEnd\": 1462949116277,  
  "domComplete\": 1462949121346,  
  "loadEventStart\": 1462949121346,  
  "loadEventEnd\": 1462949121353
}

# Test Script:
https://github.com/Mozilla-TWQA/Hasal/blob/master/tests/test_firefox_gdoc_read_basic_table_1.sikuli/test_firefox_gdoc_read_basic_table_1.py

# Test Data:
https://docs.google.com/document/d/1d49NaEpY9G9laZY_OPYfYdjngeAvcGNEPMTEVsW76js/edit
# Try to identify the time between "domLoading" to "loadEventEnd" 

Perf Timing domLoading to loadEventEnd: 7882 - 424 = 7458 ms
AddOn Start to End mark: 41375 - 33189 = 8186 ms

From Gecko Profiling data, the Range [33105, 41343]:
 8111  100.0%  Startup::XRE_Main
 2337   32.4%  ├─ nsHtml5TreeOpExecutor::RunFlushLoop
 2168   26.7%  │  ├─ nsJSUtils::EvaluateString 
  137    1.7%  │  ├─ EventDispatcher::Dispatch
               │  └─ ...so on
               │
 1800   22.2%  ├─ nsInputStreamPump::OnInputStreamReady
 1650   20.3%  │  ├─ nsInputStreamPump::OnStateStop  
  106    1.3%  │  ├─ nsInputStreamPump::OnStateStart
   44    0.5%  │  └─ nsInputStreamPump::OnStateTransfer
               │
 1127   13.9%  ├─ Timer::Fire
 1102   13.6%  │  ├─ js::RunScript
   14    0.2%  │  ├─ nsCycleCollector::forgetSkippable
               │  └─ ...so on
               │
  624    7.7%  ├─ nsRefreshDriver::Tick
  323    4.0%  │  ├─ PressShell::Paint
  113    1.4%  │  ├─ PressShell::Flush (Flush_Style)
               │  └─ ...so on
               └─ ...so on

Note
 1. Filed Bug 1271912 for tracking "nsJSUtils::EvaluateString"
 2. Filed Bug 1271913 for tracking "nsInputStreamPump::OnStateStop"
 3. Filed Bug 1271914 for tracking "js::RunScript"
 4. Filed Bug 1271915 for tracking "nsRefreshDriver::Tick"
QA Contact: ctang
User Story: (updated)
Bug 1271912 has been marked as a duplicate of bug 1270351
Bug 1271913 has been marked as a duplicate of bug 1267971
Bug 1271915 has been marked as a duplicate of bug 1270427
Depends on: 1270351, 1267971, 1270427
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs]
platform-rel: --- → ?
Flags: needinfo?(tlee)
I found a lot bailout in profiles.  So, I disable IonMonkey, |javascript.options.ion=false|, and do test more times.  The table has been showed ~3s earlier while it was 20+s on my laptop.

Shako,
could you run your test with the change in |about:config| to confirm if I am right?
Flags: needinfo?(tlee) → needinfo?(sho)
Sure, test is already running,  waiting for the result
Flags: needinfo?(sho)
Hi Thinker,
The test is finish. Comparing the result between enable and disable IonMonkey, disable IonMonkey is slower than enable IonMonkey around 2016 ms which is around 8% slower.

Machine spec:
# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram (4GB 1600MHZ x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2

# Result
Browser | Run time (median value) 
Firefox with enable IonMonkey | 24411 ms
Firefox with disable IonMonkey| 26427 ms
Flags: needinfo?(tlee)
After having told to Shako, we found the way doing test is not what is in my mind.  Shako and his team would modify the test and the framework they used, and do another round.
Flags: needinfo?(tlee)
Severity: normal → critical
Priority: -- → P1
Flags: needinfo?(overholt)
Flags: needinfo?(kchen)
Flags: needinfo?(bugs)
Are we spending lots of time in the JS engine and in the parser? Am I reading the profile correctly?
Flags: needinfo?(overholt)
Flags: needinfo?(nihsanullah)
Flags: needinfo?(hsivonen)
(In reply to Andrew Overholt [:overholt] from comment #9)
> Are we spending lots of time in the JS engine and in the parser? Am I
> reading the profile correctly?

nsHtml5TreeOpExecutor::RunFlushLoop() is responsible for executing scripts provided via <script>, so it seems that this isn't the parser itself being slow but the first parse + execution of *inline* scripts being slow.
Flags: needinfo?(hsivonen)
Inline as in not <script src> but <script> // something </script>
Thanks, Henri. I'll wait for Naveed to comment here.
Attached image profiler screenshot
Are the /usr/lib/firefox/libxul.so references just errors in the profiler or are we actually profiling the system-installed Firefox and not nightly?
Flags: needinfo?(sho)
Make a brief.

Just like what Henri said, there is a 11s long inline script (on my laptop).  The table is showed after the inline script.  There are some sync resytle and reflow during the inline script.  And, the number of bailout is high.  I had tried to disable IonMonkey (for bailout) to see what will happen.  And, it seems a bit faster for the inline script, and table was showed earlier (~3s).  But, it made other JS tasks later slower.

I am waiting for Shako and his members to run more tests to confirm what I have found.  They told me today they will do it in the following days since they just complete the change against their test case to meet what I had requested for.
Reply for comment 13
Hi Andrew,
In this test result and profiler data, we test on the system-installed Firefox. 
If you are looking for the test result on Nightly, you could reference the bug 1285178
Flags: needinfo?(sho)
Hi Thinker,
The test is finish. Comparing the result between enable and disable IonMonkey, disable IonMonkey is slower than enable IonMonkey around 333 ms which is around 4% slower.

The running time measurement start from browser with blank page to the gdoc table snapshot appear.

Machine spec:
# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram (4GB 1600MHZ x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2

# Result
Browser | Run time (median value) 
Firefox with enable IonMonkey | 7944 ms
Firefox with disable IonMonkey| 8277 ms
Flags: needinfo?(tlee)
(In reply to Shako Ho from comment #15)
> Reply for comment 13
> Hi Andrew,
> In this test result and profiler data, we test on the system-installed
> Firefox. 

I see. I don't know if this is worth the effort.

> If you are looking for the test result on Nightly, you could reference the
> bug 1285178

Ok, I think we should focus on that bug, then. Although I'm surprised the distro-shipped build (or maybe the same would occur with a release build from mozilla.org?) is that much slower!
platform-rel: ? → +
(In reply to Shako Ho from comment #16)
> Hi Thinker,
> The test is finish. Comparing the result between enable and disable
> IonMonkey, disable IonMonkey is slower than enable IonMonkey around 333 ms
> which is around 4% slower.
> 
> The running time measurement start from browser with blank page to the gdoc
> table snapshot appear.
> 
> Machine spec:
> # Hardware
> OS: Ubuntu 14.04 LTS 64-bit
> CPU: i7-3770 3.4GMhz
> Memory: 16GB Ram (4GB 1600MHZ x 4)
> Hard Drive: 1TB SATA HDD
> Graphics: GF108 [GeForce GT 440/630]
> 
> # Browsers
> Firefox version: 45.0.2
> 
> # Result
> Browser | Run time (median value) 
> Firefox with enable IonMonkey | 7944 ms
> Firefox with disable IonMonkey| 8277 ms
Flags: needinfo?(tlee)
Hi Thinker,

The new profiler is already uploaded please check the link below:

ionMonkey disabled:
https://cleopatra.io/#report=2677ae2d3884d928408b35ad4c1ce0408d3e0220

ionMonkey enabled:
https://cleopatra.io/#report=3a4ed1c606478b4d990174633432340c93860677

And also upload the har file in this bug, please find the attachment:
- Har file with ionMonkey enabled 201607151058 
- Har file with ionMonkey disabled 201607151058
Flags: needinfo?(tlee)
Alias: gdoc_read_basic_table_1
Alias: gdoc_read_basic_table_1 → gdoc_read_basic_table_1(45.25%)
Alias: gdoc_read_basic_table_1(45.25%) → gdoc_read_basic_table_1_ubuntu(45.25%)
Hi Thinker,

Just redo the profiler data based on your request, Nightly build 50.0a1 with gecko profiler and perfmarker, before click the hyperlink there will be a "Before click hyperlink" event sent.

ionMonkey disabled:
https://cleopatra.io/#report=dcfea2b043dd1c28447d593fe78448b837d65843

ionMonkey enabled:
https://cleopatra.io/#report=a311efa7d5902550d0571352070175b80101623f
Shako,

I read the profile, one thing is weir.
From last |before click hyperlink| mark to the time table showed, it takes about only 4.3s here.  You last test in comment 16 is about 9s.  They are quite different.  Could check if they are really different in the nightly by running both test cases (typing in location bar, and clicking a link).  It would be an interest problem if old test case are such slow.
Flags: needinfo?(tlee)
Hi Thinker,

Before doing the test, I have one quick question. How do you get the time stamp of table showed? According to the profile, from the "Before click hyperlink" marker to "PageMod End" marker seems takes about 9-10 seconds. But the timing is to the end of page fully loaded, so the table could show up earlier.
Hi Thinker,
I did some comparison between comment 16 and the latest profile, the spending time of table showing seems not take longer in the latest profile.

And you could also reference another bug 1285178  which is test for nightly build.
I'm generating data for IonMonkey bailouts. Maybe it can provide us some ideas:

https://docs.google.com/spreadsheets/d/1K866H7LGNIgiH4V0t7qJsDSDrCtgcjnPhu2JbQaSO8M/edit?usp=sharing

I will keep working on that to get more data for reporting.
I make a simple case with only blank page, and the bailout times is about 200 (halved). After inserting same table into the blank page, it grew up to 400, but not as much as QA's case (~470).

So it looks like the table growing contributes about 200 ~ 300 times bailouts. Maybe it's not too much compare to the bare-bone case.
Flags: needinfo?(kchen)
We've had some discoveries about the symptom. One comparison I noticed, is that there is very huge inline script in HTML that deserialize the content of the document and so on (edit:12). And that script, in Google Chrome, only takes about 1400ms to execute. However, on FirefoxDeveloperEdition 47.a1, it takes ~3500ms to execute. After I upgraded the Developer edition, it grew up to ~7000ms to execute, which is close to the result from Nightly.

The results from both two browsers' performance tools has been saved on my computer. And I now upload screenshots to make my description clear.

I cannot say I got exactly the same result per test. However, Google Chrome is always faster on that inline script. Maybe our JS engine needs some tuning for such inline script, or for the task kind it executes. Therefore, I also uploaded one Gecko profiler for that scope:

https://cleopatra.io/#report=aa4cec913875ef1f9011b0f6910d378073c8d096&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A283439,%22end%22%3A292070%7D,%7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,%22name%22%3A%22%3CAnonymous%3E()%20%40%20edit%3Fpref%3D2%26pli%3D1%3A12%22,%22focusedCallstack%22%3A%5B0,7608%5D,%22appliesToJS%22%3Atrue%7D%5D

But I must say to recognize the point represents the inline script is a bit tricky. Firstly, you need to switch the frame to "Content" with the "JavaScript only" checked. So that you can see the "<Anonymous>() @edit@pref=2&pli=1:12" call. Then, clicking on its arrow then uncheck the "JavaScript only" to show the details inside the engine. Currently I saw that the profiler reports it takes most of time on "js::Shape::search" and "js::NativeGetPropertyNoGC" inside the anonymous function call. I guess it's because the script is expanding the whole document structure so the engine needs to handle a very deep object via lots of getting properties, but I'm not sure this is the cause or the symptom.

NI Naveed because of the JS finding (sorry I cannot NI who may be dedicated to issues like this); NI Bobby because of new updates.
Flags: needinfo?(bchien)
Another thing is I've noticed that our MinorGC is quite unstable in and after the inline script evaluation: sometimes it take 4 minor GCs at least occur 40ms ~ 120ms during the script evaluation time. And very often there will be a ~500ms(!) minor GC after the script evaluation (on Nightly). So far I haven't seen Chrome will take so much time on minor GC, although it also has some pieces larger than 10ms.
Including Terrence and Steve because of GC and GDocs respectively
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
I found a serious problem.  Vsync notifications from the compositor thread had been put in the task queue of the main thread for 16s with my firefox build (debug, m-c), 18s in the whole.  The notification delivered to the refresh driver at 6s later after the content was generated by a big chunk of JS code.  That means time to drawing the content would be 6s shorter if the refresh driver was ticking at right time.

The variant, in time, of firefox is unreasonable huge.  I guess it is one of the major reasons, timing of vsync notifications.  By my estimation, it could affect 10+% of time in average.  We need someone to fix it.
(In reply to Thinker Li [:sinker] from comment #35)
> I found a serious problem.  Vsync notifications from the compositor thread
> had been put in the task queue of the main thread for 16s with my firefox
> build (debug, m-c), 18s in the whole.  The notification delivered to the
> refresh driver at 6s later after the content was generated by a big chunk of
> JS code.  That means time to drawing the content would be 6s shorter if the
> refresh driver was ticking at right time.
> 
Bug 1294625 will address this problem to kick off refresh driver when JS paring is too long.
Therefore, it is possible to render part of content first before paring is finished.
> The variant, in time, of firefox is unreasonable huge.  I guess it is one of
> the major reasons, timing of vsync notifications.  By my estimation, it
> could affect 10+% of time in average.  We need someone to fix it.
Depends on: 1294625
Flags: needinfo?(bchien)
With my colleague's top most MBP (15", 2.7GHz Core i7), the fattest script runs only in 2.4seconds, while Chrome still takes 1.4 seconds to run it. On my MBP (13", 3.1GHz Core i7), Firefox takes 6 ~ 8 seconds to evaluate the script while Chrome takes almost the same 1.4 seconds to evaluate it. I don't know if this reveals any important information, but it is quite interesting.

And now I try to use Tracelogger to probe the case. Unfortunately, the tracelogger cannot generate result in website. I think it's because the case is too large, because for a very simple Google front page it works well.
By the way, all of this benchmarking should be done with JSGC_DISABLE_POISONING set to 1. I created a page https://wiki.mozilla.org/Benchmarking_advice to collect that sort of information, though I'll probably move it to an MDN page once my permissions over there are straightened out.

I attempted to follow the instructions in comment 30, but I could not see "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might just be me, though; I have a lot of trouble using cleo. I will attach a screenshot.
Because Tracelogger result cannot not be shared easily as Gecko profiler, I built a Heroku app to demo the data collected online:

https://boiling-coast-67040.herokuapp.com/website/tracelogger.html

This is for this Google Doc case and on my self-built Nightly (NOT debug build). Because my DeveloperEdition reported empty *.tl files while I tried to tracelogger it.

If anyone has new issue want to profile and upload, please NI me. Thanks.
(In reply to Steve Fink [:sfink] [:s:] from comment #38)
> By the way, all of this benchmarking should be done with
> JSGC_DISABLE_POISONING set to 1. I created a page
> https://wiki.mozilla.org/Benchmarking_advice to collect that sort of
> information, though I'll probably move it to an MDN page once my permissions
> over there are straightened out.
> 
> I attempted to follow the instructions in comment 30, but I could not see
> "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might
> just be me, though; I have a lot of trouble using cleo. I will attach a
> screenshot.

Okay. Here is a faster way to expand it:

1. move the mouse pointer to the Content timeline until the position of "285143ms"
2. Click on that
3. Wait it expands the stack
4. It should be in the expanded stack
According to the Tracelogger result, I've found the longest inline script seems not in the logs. I checked the log and it contained several inline & external script compilation information, but the edit:12 line wasn't in it.

I don't know if this means something. Since I have a local version to pull it as an external script, I will do another tracelogging result to confirm that.
Thanks, with those instructions I was able to see the Anonymous function you were pointing to.

As I understand it, there are 3 different issues involved here:

1. The vsync timing problem. A mitigation is being worked on in bug 1294625.

2. Very slow minor GCs. I'd like to get the output of a run with JS_GC_PROFILE_NURSERY=50000

3. Very very slow parsing + execution of an inline <script>...</script>, which doesn't seem to be showing up in tracelogger. I'm going to run this locally to investigate this further. (I tried this before and had trouble, but it was OS misconfiguration on my machine, and I recently did a fresh install.) I'd like to figure out if there are any stalls waiting for the script to arrive to be parsed or something.
As I mentioned, I have a local version pull all huge inline scripts out of HTML, and put them in 4 files (ex1.js ~ ex4.js, while ex3.js is corresponding to the largest one costs longest time in the timeline). And I now have the Tracelogger result for it:

https://murmuring-castle-30204.herokuapp.com/website/tracelogger.html

There is only part of ex4.js has been Ion compiled in the graph. If this really shows something, may I assert that our Jits aren't so helpful for these kinds of scripts, even they're pulled out from HTML?
...and I've found the script, will take longer time to evaluate with e10s. My result with externalized scripts is:

Nightly w/  e10s: 4.2 seconds
Nightly w/o e10s: 2.2 seconds

DeveloperEdition w/  e10s: 5.2 seconds
DeveloperEdition w/o e10s: 2.1 seconds

Although it's fluctuated, I think the gap still not negligible. Thinker suggests that there may be too much tasks since extra IPC channels in e10s case. Maybe we should do some serious analysis on this result as well.
Depends on: 1296160
Depends on: 1296161
I've done some profiling on Win10 with official Nightly build. And it shows the performance on the "script" edit:12 makes no difference: they both complete the task in ~2.5 seconds.
Gecko profiler result here:

e10s
https://cleopatra.io/#report=25000cf96d090d5c5e61bbec563902d622b8c710

non-10s
https://cleopatra.io/#report=ee53f03572d113c505c8aabccbb46f4f8a4f5242
Too much |LogMessageRunnable| on the content's main thread, they should be aggregated.
No longer depends on: 1296160, 1296161
A lot of IPC messages in |PCompositorBridge::Msg_DidComposite| type are dispatched to the main thread of the content.  Most of them are with |aTransaction == 0|.  I think we should deliver this message only when necessary.
Flags: needinfo?(hshih)
Thanks, let me check the didComposite() call.
Flags: needinfo?(hshih)
Flags: needinfo?(bugs)
I tried to accumulate all time spent on IonBuilder, and got the result: 660ms. This is a rough measuring so I need to refine the method and compare it to the overview profiles we have already did. However, this can be the first step to answer "in the test case, how much time we will spend on compilation".
Hi Thinker,

Per our discussion, the testing result of partial matching for windows as below:
# Result without local proxy
Browser | Run time (median value) 
Chrome  | 2444.44444444 ms
Firefox | 2622.22222222 ms

# Result with local proxy
Browser | Run time (median value) 
Chrome  | 2344.44444444 ms
Firefox | 2488.88888889 ms
The flow files tested for windows are also attached.
Flags: needinfo?(tlee)
Steve is on it; I don't need to be ni'd.
Flags: needinfo?(terrence)
I have done a POC for prioritized runnable with priority from IPC message.  It looks very promise for time variance of the partial content tests.

This patch make runnables having priority, and process them in the order of their priority and the order of their arrival time.  IPC messages have priority too, the priority can be normal, high, or urgent.  I make the same priority value on runnables.  And, the priority of IPC message are passed to the tasks that is responsible to dequeue and execute them.

The result is the variance of partial content tests is quite low.  I have try it several time.  All them fall in 10s~12s, 11s for most cases.

To Shako,
could you try this patch?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc5dddb372e6
Flags: needinfo?(tlee) → needinfo?(sho)
sure, will update the test result once test is finished
Flags: needinfo?(sho)
Hi Thinker,

The test result as below:

Median time: 4433.33333333      
Avg time   : 4450.74074074      
Std dev    : 87.7089900809
(In reply to Shako Ho from comment #65)
> Hi Thinker,
> 
> The test result as below:
> 
> Median time: 4433.33333333      
> Avg time   : 4450.74074074      
> Std dev    : 87.7089900809

Hi Shako,
What is the environment for the test?
Comment 58, the average time is about 2xxx.  Why is it 4xxx now?

And, I like to know what is different with the change.  So, could you also put down numbers of a reference test?
By the way, assume the run time is in normal distribution.  It means 95.x% of probability would fall in 4.45s+/-0.194s.
Hi Thinker,
The test result is based on Ubuntu 14.04.
The comment 58 is the test result based on Windows.

And you can also take the result compare with bug 1294625

1. check painting suppression status asap(bug 1271691) + tick refresh driver during html parsing(bug 1294625)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1f7ca95b09f

Median time: 4722.22222222      
Avg time   : 4733.7037037       
Std dev    : 165.812213019

2. tick refresh driver during html parsing(bug 1294625)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0063015b7218

Median time: 4877.77777778      
Avg time   : 5278.51851852      
Std dev    : 730.825866529

3. check painting suppression status asap(bug 1271691)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e14bdefbe9a1

Median time: 4500.0                           
Avg time   : 4512.59259259      
Std dev    : 89.8894306321
Depends on: 1306591
I have found that our helper threads in Ion seemed to only help MIR optimisation, while the "not-helper-threads" done almost 85% work. I don't know if it's in purpose but from the thread overview it makes the tasks very unbalanced. This makes me wonder if we can make it more balanced via delivering more tasks besides MIR optimisation to help threads, maybe we can raise our throughput in this case. However, I don't know if this is do-able so I firstly NI Steve.
Flags: needinfo?(sphink)
Are you seeing this in the TraceLogger output? If I understand correctly, you are seeing one thread doing a large chunk of JIT compilation, while most of the threads are doing nothing. The difficulty is finding things to do in parallel.

I will forward the question to Jan.

jandem - this may be a very naïve view of it, but my understanding is that you run baseline code until it hits some fixed iteration count threshold, then start ion compiling it on a helper thread, then switch over to the compiled version when the compilation is complete. Is that correct? If another script were to get hot enough while the first is compiling, would it get compiled in parallel, or is there only one compilation thread available? Is it possible to ion compile multiple scripts concurrently? Would it be possible, and would it make sense, to adjust the compilation threshold somewhat based on availability of compilation resources?

For example, you could do it in a work-stealing way: reduce the thresholds a bit. When a script hits the threshold, it is queued up in a priority queue sorted by (a function of) execution count. Worker threads pull items off the queue and compile them.

It introduces even more nondeterminism, which is unfortunate, and perhaps compilation simply cannot be done concurrently; I don't know.
Flags: needinfo?(sphink) → needinfo?(jdemooij)
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
(In reply to Steve Fink [:sfink] [:s:] from comment #70)
> If
> another script were to get hot enough while the first is compiling, would it
> get compiled in parallel, or is there only one compilation thread available?
> Is it possible to ion compile multiple scripts concurrently? Would it be
> possible, and would it make sense, to adjust the compilation threshold
> somewhat based on availability of compilation resources?

We do have some mechanism to pause compilation threads when a new compilation job has higher priority. Hannes can probably say more about this.
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
(In reply to Jan de Mooij [:jandem] from comment #71)
> We do have some mechanism to pause compilation threads when a new
> compilation job has higher priority. Hannes can probably say more about this.

bug 1013172 comment 0 describes this.

For various reasons we decided to only do one compilation at a time. (Not take all cpu power, battery consumption, peak memory ...). Which is what you are seeing here. 

Though, like said, that means priority is important. We currently decide this on the warmUp count and when another script because more hot, we will switch to complete that script first.

Now I don't think seeing a long time of compilations is inherently bad. We are quite aggressive in compiling things. As a result it could be that we keep compiling but that it doesn't make a big difference for the execution speed, since the functions aren't used that much...

If you have a tracelogger log you might want to iterate over:

[start AnnotateScript foo.js]
[start IonCompilation]
[stop IonCompilation]
[stop AnnotateScript]

...
[start script foo.js]
[start Interpreter/Baseline]
[stop Interpreter/Baseline]
[stop script foo.js]
...

[start AnnotateScript foo.js]
[start IonLinking]
[stop IonLinking]
[stop AnnotateScript]

if you would look in between start of the compilation and end of the compilation and count how many times a script was called and how much time we spend in that function (not children). That would give an estimate how much time was lost due to waiting for the compilation to be done. (This is inclusive the time it is waiting on getting build offthread).

If you want to know how to extract this information. Look at slide 35 or last slide. 
https://docs.google.com/presentation/d/1j_Hi9uP2Woz2OoAuxX9We11GbLhh0e8N-tbKYz1MeZU/edit?usp=sharing
It gives a easy way to iterate the tree. Changing that script a little bit should be able to give the above information.

If you have that information you would have an idea how much time we spend in Baseline waiting for IonMonkey to finish. 
If that is small, I'm pretty sure we won't see an improvement by compiling more at the same time.
Flags: needinfo?(hv1989)
Component: General → JavaScript Engine
Summary: [Perf][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content → [perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content
plat-rel tracked at the meta level
platform-rel: + → -
Shako, Thinker, now that bug 1306591 has landed, could we perhaps get some new numbers here?
Flags: needinfo?(tlee)
Flags: needinfo?(sho)
For gdoc_read_basic_table_1 case:
Firefox: http://tinyurl.com/hbr9nzz
Chrome: http://tinyurl.com/hh5mp23

The result is positive. It's hard to show number which based on same baseline/content with original context on this bug. However, by compare with previous revision, firefox has 13% improvement in this case. In the same time, Chrome has 8% improvement in same case. It's hard to say theses improvement was caused by web content or not, but we test Firefox and Chrome at similar time. 

For all regression cases:
Compared with previous revision: http://tinyurl.com/j8nvcp5
Compared with next revision: http://tinyurl.com/go9ugee

You can see similar trend in all categories from comparison (see results with Firefox tag) with previous revision. This patch has positive results. Especially, very good result in SpeedIndex (SI). 

Conclusion:
This patch does improve perceived performance, roughly less than 10% improvement.
Flags: needinfo?(tlee)
Flags: needinfo?(sho)
Bobby, could you do test to compare with previous revision at the version of gdoc?  In another word, run new and old revision at the same time to eliminate the possible of change of gdoc.  It is more reasonable to compare with firefox itself.
Flags: needinfo?(bchien)
Flags: needinfo?(bchien)
QA Contact: ctang
Flags: needinfo?(bchien)
Hi Thinker,

The test result as below:

OS: Windows 7
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram (4GB 1600MHZ x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

                previous build  |   build with patch
Median time |  14541.6666667 ms |     13050.0 ms
Avg time    |  14541.1111111 ms |     13775.1600753 ms
Std dev     |  118.144084454 ms |     1748.99567772 ms
SI          |   4941.5          |     5108.0
PSI         |   36256.5         |     38125.0
Comparison video attached: https://youtu.be/dOvyBlVHHHE
Flags: needinfo?(bchien)
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Mass wontfix for bugs affecting firefox 52.
Keywords: perf
Whiteboard: [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
This seems actionable to me, leaving P1.
Nobody looked into these bugs for a while, moving to P2 and waiting for [qf] re-evaluation before moving back to P1.
Priority: P1 → P2
Whiteboard: [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Vicky, since this qualify as a qf:p1 for Firefox 64, would it be possible to have updated information about this bug?

Adding a profile to justify that this is still a JS issue and a bit of documentation to explain how to reproduce this issue would be great.
Flags: needinfo?(vchin)
We discussed this in the QF triage meeting and agreed this needed further investigation as the steps to reproduce no longer exists. However having read through the above comments it seems based on comment 78/79 that the problem has largely been fixed?
Flags: needinfo?(vchin)
Whiteboard: [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Assignee: nobody → kvijayan
Taking notes in the bug as I analyze this further.

Narrow profile of user-jank of 2.2 seconds: https://perfht.ml/2xl21hI

Now, looking at the call stack - I notice a huge chunk of the time spent punching down a call stack using unoptimized fallback paths.  We also invoke a lot of IonCompiles - around 16% of the time spent in this early-phase is ion-compiles.

Nbp's work on reducing ion-compile times by skipping cold basic blocks would improve this component significantly.

The other major thing is just "slow interpreter".  Primarily - we need faster call paths and generally better optimization of fastpaths.  Interpreter ICs should help significantly on this front.

This is confirmed by looking at the inverted C++-only call stack to discover that we're spending 25% of our time in js::Interpret / js::RunScript.


The next hang of 1 second: https://perfht.ml/2PP1Geb

Here also, we spend about 90% of our time in native code, not jits.  About 10% in baseline jit, and 1% in ion.  This is also cold-code issues.

A total of 12% of the time is spent in js::Interpret (and inlined functions) alone.  We're almost never entering the jits here at all.



This is a the same story across the board.  Fundamentally the problem here is that we're running a ton of cold code, and not entering the jits.  The best near-term solution for this is Interpreter ICs.  I don't anticipate that being doable by 64.
(In reply to Kannan Vijayan [:djvj] from comment #86)
> This is confirmed by looking at the inverted C++-only call stack to discover
> that we're spending 25% of our time in js::Interpret / js::RunScript.

js::RunScript will also include the time we spend in Baseline.

I've narrowed down the two slices that you've used to some top hot functions, by using the transforms "Collapse function's subtree across the entire tree" and "Merge function into caller across the entire tree" in the inverted callstack mode. Here's the result:
 hang of 2.2 seconds: https://perfht.ml/2QJLeNg
 next hang of 1 second: https://perfht.ml/2QHfIiY
Only look at the roots of the inverted tree, and only at the top ~15 of that list.

Not sure if it's helpful. The only things that stood out to me, other than what you already said, is the fact that we spend 4.2% of the first slice in CreateThis, and 1.7% of the second slice in CreateThisForFunction.
Based on Kannan's comments in Comment 86, we decided to move the QF target of this bug to [qf:p1:f67].  Additionally, we marked this bug [qf:js:investigate] to indicate we need to do further investigation.  Last, since Kannan believes that improving the interpreter performance would help addresses the issues here, we made this a dependency of the Interpreter IC bug (1361474).
Blocks: 1361474
Whiteboard: [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]
Status: NEW → ASSIGNED
Whiteboard: [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

You need to log in before you can comment on or make changes to this bug.