Closed Bug 1269690 (gdoc_read_basic_table_1_ubuntu(45.25%)) Opened 9 years ago Closed 2 years ago

[perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content

Tracking

()

Status:

RESOLVED INCOMPLETE

Tracking Flags:

Tracking

Status

platform-rel

---

firefox-esr45

---

affected

firefox52

---

wontfix

People

(Reporter: sho, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs])

User Story

You can find all test scripts on github link:
https://github.com/mozilla-twqa/hasal

And you can also find the running script name in comments
for example: 
test scripts: test_chrome_gdoc_create_txt_1
then you can specify the test script name on suite.txt and run it

Attachments

(19 files)

test_firefox_gdoc_read_basic_table_1_1462949079.har 9 years ago Cynthia Tang [:cynthiatang] 7.84 MB, text/plain		Details
profiler screenshot 9 years ago Andrew Overholt [:overholt] 329.61 KB, image/png		Details
Har file with ionMonkey enabled 201607151058 9 years ago Shako Ho 8.21 MB, text/plain		Details
Har file with ionMonkey disabled 201607151058 9 years ago Shako Ho 8.20 MB, text/plain		Details
Performance Timing obj with ionMonkey enabled 9 years ago Shako Ho 958 bytes, text/plain		Details
Performance Timing obj with ionMonkey disabled 9 years ago Shako Ho 958 bytes, text/plain		Details
Google Chrome ver.52 timeline 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 475.16 KB, image/png		Details
FirefoxDeveloperEdition 50.a2 timeline 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 437.37 KB, image/png		Details
could not find Anonymous func call 9 years ago Steve Fink [:sfink] [:s:] 380.74 KB, image/png		Details
The Anonymous function 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 934.19 KB, image/png		Details
de-noe10s-50.a1.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 3.48 MB, application/json		Details
de-with10s-50.a1.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 3.56 MB, application/json		Details
nightly-noe10s-exscripts-profile-51.a1.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 3.15 MB, application/json		Details
nightly-withe10s-exscripts-profile-51.a1.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 3.11 MB, application/json		Details
devtool-perf-win10-e10s-nightly-nocache.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 2.73 MB, text/plain		Details
devtool-perf-win10-non-e10s-nightly-nocache.json 9 years ago Greg Weng [:snowmantw][:gweng][:λ] 3.04 MB, text/plain		Details
test_chrome_gdoc_read_basic_table_1_sampleobj.flow 9 years ago Shako Ho 466.61 KB, text/plain		Details
test_firefox_gdoc_read_basic_table_1_sampleobj.flow 9 years ago Shako Ho 267.70 KB, text/plain		Details
priority-ipc-runnable.diff 9 years ago Thinker Li [:sinker] 19.20 KB, patch		Details \| Diff \| Splinter Review

Shako Ho

Reporter

Description

•

9 years ago

# Test Case STR 1. Launch the browser with blank page 2. input the google doc url with 1 table content 3. close the browser # Hardware OS: Ubuntu 14.04 LTS 64-bit CPU: i7-3770 3.4GMhz Memory: 16GB Ram Hard Drive: 1TB SATA HDD Graphics: GK107 [GeForce GT 640]/ GF108 [GeForce GT 440/630] # Browsers Firefox version: 45.0.2 Chrome version: 50.0.2661.75 # Result Browser | Run time (median value) Firefox | 14533.3333 ms Chrome | 10005.5556 ms

Shako Ho

Reporter

Updated

•

9 years ago

Blocks: Meta-Hasal-GDoc

Al Tsai [:atsai]

Updated

•

9 years ago

Product: Firefox → Core

Bobby Chien

Updated

•

9 years ago

Version: unspecified → 45 Branch

Cynthia Tang [:cynthiatang]

Comment 1

•

9 years ago

# Profiling Data: https://cleopatra.io/#report=2544480759b8351b33cfb70d34165c6130cf973f # Performance Timing: http://goo.gl/fhD77x { "navigationStart\": 1462949113471, "unloadEventStart\": 0, "unloadEventEnd\": 0, "redirectStart\": 0, "redirectEnd\": 0, "fetchStart\": 1462949113476, "domainLookupStart\": 1462949113476, "domainLookupEnd\": 1462949113476, "connectStart\": 1462949113476, "connectEnd\": 1462949113476, "requestStart\": 1462949113479, "responseStart\": 1462949113891, "responseEnd\": 1462949113951, "domLoading\": 1462949113895, "domInteractive\": 1462949116185, "domContentLoadedEventStart\": 1462949116276, "domContentLoadedEventEnd\": 1462949116277, "domComplete\": 1462949121346, "loadEventStart\": 1462949121346, "loadEventEnd\": 1462949121353 } # Test Script: https://github.com/Mozilla-TWQA/Hasal/blob/master/tests/test_firefox_gdoc_read_basic_table_1.sikuli/test_firefox_gdoc_read_basic_table_1.py # Test Data: https://docs.google.com/document/d/1d49NaEpY9G9laZY_OPYfYdjngeAvcGNEPMTEVsW76js/edit

Cynthia Tang [:cynthiatang]

Comment 2

•

9 years ago

Attached file test_firefox_gdoc_read_basic_table_1_1462949079.har — Details

Cynthia Tang [:cynthiatang]

Comment 3

•

9 years ago

# Try to identify the time between "domLoading" to "loadEventEnd" Perf Timing domLoading to loadEventEnd: 7882 - 424 = 7458 ms AddOn Start to End mark: 41375 - 33189 = 8186 ms From Gecko Profiling data, the Range [33105, 41343]: 8111 100.0% Startup::XRE_Main 2337 32.4% ├─ nsHtml5TreeOpExecutor::RunFlushLoop 2168 26.7% │ ├─ nsJSUtils::EvaluateString 137 1.7% │ ├─ EventDispatcher::Dispatch │ └─ ...so on │ 1800 22.2% ├─ nsInputStreamPump::OnInputStreamReady 1650 20.3% │ ├─ nsInputStreamPump::OnStateStop 106 1.3% │ ├─ nsInputStreamPump::OnStateStart 44 0.5% │ └─ nsInputStreamPump::OnStateTransfer │ 1127 13.9% ├─ Timer::Fire 1102 13.6% │ ├─ js::RunScript 14 0.2% │ ├─ nsCycleCollector::forgetSkippable │ └─ ...so on │ 624 7.7% ├─ nsRefreshDriver::Tick 323 4.0% │ ├─ PressShell::Paint 113 1.4% │ ├─ PressShell::Flush (Flush_Style) │ └─ ...so on └─ ...so on Note 1. Filed Bug 1271912 for tracking "nsJSUtils::EvaluateString" 2. Filed Bug 1271913 for tracking "nsInputStreamPump::OnStateStop" 3. Filed Bug 1271914 for tracking "js::RunScript" 4. Filed Bug 1271915 for tracking "nsRefreshDriver::Tick"

Depends on: 1271912, 1271913, 1271914, 1271915

QA Contact: ctang

Shako Ho

Reporter

Updated

•

9 years ago

User Story: (updated)

Cynthia Tang [:cynthiatang]

Comment 4

•

9 years ago

Bug 1271912 has been marked as a duplicate of bug 1270351 Bug 1271913 has been marked as a duplicate of bug 1267971 Bug 1271915 has been marked as a duplicate of bug 1270427

Depends on: 1270351, 1267971, 1270427

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Updated

•

9 years ago

Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs]

Jenn Chaulk (:jchaulk)

Updated

•

9 years ago

platform-rel: --- → ?

Shako Ho

Reporter

Updated

•

9 years ago

Flags: needinfo?(tlee)

Thinker Li [:sinker]

Comment 5

•

9 years ago

I found a lot bailout in profiles. So, I disable IonMonkey, |javascript.options.ion=false|, and do test more times. The table has been showed ~3s earlier while it was 20+s on my laptop. Shako, could you run your test with the change in |about:config| to confirm if I am right?

Flags: needinfo?(tlee) → needinfo?(sho)

Shako Ho

Reporter

Comment 6

•

9 years ago

Sure, test is already running, waiting for the result

Flags: needinfo?(sho)

Shako Ho

Reporter

Comment 7

•

9 years ago

Hi Thinker, The test is finish. Comparing the result between enable and disable IonMonkey, disable IonMonkey is slower than enable IonMonkey around 2016 ms which is around 8% slower. Machine spec: # Hardware OS: Ubuntu 14.04 LTS 64-bit CPU: i7-3770 3.4GMhz Memory: 16GB Ram (4GB 1600MHZ x 4) Hard Drive: 1TB SATA HDD Graphics: GF108 [GeForce GT 440/630] # Browsers Firefox version: 45.0.2 # Result Browser | Run time (median value) Firefox with enable IonMonkey | 24411 ms Firefox with disable IonMonkey| 26427 ms

Flags: needinfo?(tlee)

Thinker Li [:sinker]

Comment 8

•

9 years ago

After having told to Shako, we found the way doing test is not what is in my mind. Shako and his team would modify the test and the framework they used, and do another round.

Flags: needinfo?(tlee)

Bobby Chien

Updated

•

9 years ago

Severity: normal → critical

Priority: -- → P1

Bobby Chien

Updated

•

9 years ago

See Also: → gdoc_read_basic_table_1(17.2%)

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(overholt)

Flags: needinfo?(kchen)

Flags: needinfo?(bugs)

Andrew Overholt [:overholt]

Comment 9

•

9 years ago

Are we spending lots of time in the JS engine and in the parser? Am I reading the profile correctly?

Flags: needinfo?(overholt)

Flags: needinfo?(nihsanullah)

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 10

•

9 years ago

(In reply to Andrew Overholt [:overholt] from comment #9) > Are we spending lots of time in the JS engine and in the parser? Am I > reading the profile correctly? nsHtml5TreeOpExecutor::RunFlushLoop() is responsible for executing scripts provided via <script>, so it seems that this isn't the parser itself being slow but the first parse + execution of *inline* scripts being slow.

Flags: needinfo?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 11

•

9 years ago

Inline as in not <script src> but <script> // something </script>

Andrew Overholt [:overholt]

Comment 12

•

9 years ago

Thanks, Henri. I'll wait for Naveed to comment here.

Andrew Overholt [:overholt]

Comment 13

•

9 years ago

Attached image profiler screenshot — Details

Are the /usr/lib/firefox/libxul.so references just errors in the profiler or are we actually profiling the system-installed Firefox and not nightly?

Flags: needinfo?(sho)

Thinker Li [:sinker]

Comment 14

•

9 years ago

Make a brief. Just like what Henri said, there is a 11s long inline script (on my laptop). The table is showed after the inline script. There are some sync resytle and reflow during the inline script. And, the number of bailout is high. I had tried to disable IonMonkey (for bailout) to see what will happen. And, it seems a bit faster for the inline script, and table was showed earlier (~3s). But, it made other JS tasks later slower. I am waiting for Shako and his members to run more tests to confirm what I have found. They told me today they will do it in the following days since they just complete the change against their test case to meet what I had requested for.

Shako Ho

Reporter

Comment 15

•

9 years ago

Reply for comment 13 Hi Andrew, In this test result and profiler data, we test on the system-installed Firefox. If you are looking for the test result on Nightly, you could reference the bug 1285178

Flags: needinfo?(sho)

Shako Ho

Reporter

Comment 16

•

9 years ago

Hi Thinker, The test is finish. Comparing the result between enable and disable IonMonkey, disable IonMonkey is slower than enable IonMonkey around 333 ms which is around 4% slower. The running time measurement start from browser with blank page to the gdoc table snapshot appear. Machine spec: # Hardware OS: Ubuntu 14.04 LTS 64-bit CPU: i7-3770 3.4GMhz Memory: 16GB Ram (4GB 1600MHZ x 4) Hard Drive: 1TB SATA HDD Graphics: GF108 [GeForce GT 440/630] # Browsers Firefox version: 45.0.2 # Result Browser | Run time (median value) Firefox with enable IonMonkey | 7944 ms Firefox with disable IonMonkey| 8277 ms

Flags: needinfo?(tlee)

Andrew Overholt [:overholt]

Comment 17

•

9 years ago

(In reply to Shako Ho from comment #15) > Reply for comment 13 > Hi Andrew, > In this test result and profiler data, we test on the system-installed > Firefox. I see. I don't know if this is worth the effort. > If you are looking for the test result on Nightly, you could reference the > bug 1285178 Ok, I think we should focus on that bug, then. Although I'm surprised the distro-shipped build (or maybe the same would occur with a release build from mozilla.org?) is that much slower!

Jenn Chaulk (:jchaulk)

Updated

•

9 years ago

platform-rel: ? → +

Thinker Li [:sinker]

Comment 18

•

9 years ago

(In reply to Shako Ho from comment #16) > Hi Thinker, > The test is finish. Comparing the result between enable and disable > IonMonkey, disable IonMonkey is slower than enable IonMonkey around 333 ms > which is around 4% slower. > > The running time measurement start from browser with blank page to the gdoc > table snapshot appear. > > Machine spec: > # Hardware > OS: Ubuntu 14.04 LTS 64-bit > CPU: i7-3770 3.4GMhz > Memory: 16GB Ram (4GB 1600MHZ x 4) > Hard Drive: 1TB SATA HDD > Graphics: GF108 [GeForce GT 440/630] > > # Browsers > Firefox version: 45.0.2 > > # Result > Browser | Run time (median value) > Firefox with enable IonMonkey | 7944 ms > Firefox with disable IonMonkey| 8277 ms

Flags: needinfo?(tlee)

Shako Ho

Reporter

Comment 19

•

9 years ago

Attached file Har file with ionMonkey enabled 201607151058 — Details

Shako Ho

Reporter

Comment 20

•

9 years ago

Attached file Har file with ionMonkey disabled 201607151058 — Details

Shako Ho

Reporter

Comment 21

•

9 years ago

Hi Thinker, The new profiler is already uploaded please check the link below: ionMonkey disabled: https://cleopatra.io/#report=2677ae2d3884d928408b35ad4c1ce0408d3e0220 ionMonkey enabled: https://cleopatra.io/#report=3a4ed1c606478b4d990174633432340c93860677 And also upload the har file in this bug, please find the attachment: - Har file with ionMonkey enabled 201607151058 - Har file with ionMonkey disabled 201607151058

Flags: needinfo?(tlee)

Walter Chen[:ypwalter][:wachen]

Updated

•

9 years ago

Alias: gdoc_read_basic_table_1

Walter Chen[:ypwalter][:wachen]

Updated

•

9 years ago

Alias: gdoc_read_basic_table_1 → gdoc_read_basic_table_1(45.25%)

Walter Chen[:ypwalter][:wachen]

Updated

•

9 years ago

Alias: gdoc_read_basic_table_1(45.25%) → gdoc_read_basic_table_1_ubuntu(45.25%)

Walter Chen[:ypwalter][:wachen]

Updated

•

9 years ago

Depends on: gdoc_read_basic_table_1_win7(40%)

Shako Ho

Reporter

Comment 22

•

9 years ago

Attached file Performance Timing obj with ionMonkey enabled — Details

Shako Ho

Reporter

Comment 23

•

9 years ago

Attached file Performance Timing obj with ionMonkey disabled — Details

Shako Ho

Reporter

Comment 24

•

9 years ago

Hi Thinker, Just redo the profiler data based on your request, Nightly build 50.0a1 with gecko profiler and perfmarker, before click the hyperlink there will be a "Before click hyperlink" event sent. ionMonkey disabled: https://cleopatra.io/#report=dcfea2b043dd1c28447d593fe78448b837d65843 ionMonkey enabled: https://cleopatra.io/#report=a311efa7d5902550d0571352070175b80101623f

Thinker Li [:sinker]

Comment 25

•

9 years ago

Shako, I read the profile, one thing is weir. From last |before click hyperlink| mark to the time table showed, it takes about only 4.3s here. You last test in comment 16 is about 9s. They are quite different. Could check if they are really different in the nightly by running both test cases (typing in location bar, and clicking a link). It would be an interest problem if old test case are such slow.

Flags: needinfo?(tlee)

Shako Ho

Reporter

Comment 26

•

9 years ago

Hi Thinker, Before doing the test, I have one quick question. How do you get the time stamp of table showed? According to the profile, from the "Before click hyperlink" marker to "PageMod End" marker seems takes about 9-10 seconds. But the timing is to the end of page fully loaded, so the table could show up earlier.

Shako Ho

Reporter

Comment 27

•

9 years ago

Hi Thinker, I did some comparison between comment 16 and the latest profile, the spending time of table showing seems not take longer in the latest profile. And you could also reference another bug 1285178 which is test for nightly build.

Greg Weng [:snowmantw][:gweng][:λ]

Comment 28

•

9 years ago

I'm generating data for IonMonkey bailouts. Maybe it can provide us some ideas: https://docs.google.com/spreadsheets/d/1K866H7LGNIgiH4V0t7qJsDSDrCtgcjnPhu2JbQaSO8M/edit?usp=sharing I will keep working on that to get more data for reporting.

Greg Weng [:snowmantw][:gweng][:λ]

Comment 29

•

9 years ago

I make a simple case with only blank page, and the bailout times is about 200 (halved). After inserting same table into the blank page, it grew up to 400, but not as much as QA's case (~470). So it looks like the table growing contributes about 200 ~ 300 times bailouts. Maybe it's not too much compare to the bare-bone case.

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(kchen)

Greg Weng [:snowmantw][:gweng][:λ]

Comment 30

•

9 years ago

We've had some discoveries about the symptom. One comparison I noticed, is that there is very huge inline script in HTML that deserialize the content of the document and so on (edit:12). And that script, in Google Chrome, only takes about 1400ms to execute. However, on FirefoxDeveloperEdition 47.a1, it takes ~3500ms to execute. After I upgraded the Developer edition, it grew up to ~7000ms to execute, which is close to the result from Nightly. The results from both two browsers' performance tools has been saved on my computer. And I now upload screenshots to make my description clear. I cannot say I got exactly the same result per test. However, Google Chrome is always faster on that inline script. Maybe our JS engine needs some tuning for such inline script, or for the task kind it executes. Therefore, I also uploaded one Gecko profiler for that scope: https://cleopatra.io/#report=aa4cec913875ef1f9011b0f6910d378073c8d096&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A283439,%22end%22%3A292070%7D,%7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,%22name%22%3A%22%3CAnonymous%3E()%20%40%20edit%3Fpref%3D2%26pli%3D1%3A12%22,%22focusedCallstack%22%3A%5B0,7608%5D,%22appliesToJS%22%3Atrue%7D%5D But I must say to recognize the point represents the inline script is a bit tricky. Firstly, you need to switch the frame to "Content" with the "JavaScript only" checked. So that you can see the "<Anonymous>() @edit@pref=2&pli=1:12" call. Then, clicking on its arrow then uncheck the "JavaScript only" to show the details inside the engine. Currently I saw that the profiler reports it takes most of time on "js::Shape::search" and "js::NativeGetPropertyNoGC" inside the anonymous function call. I guess it's because the script is expanding the whole document structure so the engine needs to handle a very deep object via lots of getting properties, but I'm not sure this is the cause or the symptom. NI Naveed because of the JS finding (sorry I cannot NI who may be dedicated to issues like this); NI Bobby because of new updates.

Greg Weng [:snowmantw][:gweng][:λ]

Updated

•

9 years ago

Flags: needinfo?(bchien)

Greg Weng [:snowmantw][:gweng][:λ]

Comment 31

•

9 years ago

Attached image Google Chrome ver.52 timeline — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 32

•

9 years ago

Attached image FirefoxDeveloperEdition 50.a2 timeline — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 33

•

9 years ago

Another thing is I've noticed that our MinorGC is quite unstable in and after the inline script evaluation: sometimes it take 4 minor GCs at least occur 40ms ~ 120ms during the script evaluation time. And very often there will be a ~500ms(!) minor GC after the script evaluation (on Nightly). So far I haven't seen Chrome will take so much time on minor GC, although it also has some pieces larger than 10ms.

Naveed Ihsanullah [:naveed]

Comment 34

•

9 years ago

Including Terrence and Steve because of GC and GDocs respectively

Flags: needinfo?(nihsanullah) → needinfo?(terrence)

Thinker Li [:sinker]

Comment 35

•

9 years ago

I found a serious problem. Vsync notifications from the compositor thread had been put in the task queue of the main thread for 16s with my firefox build (debug, m-c), 18s in the whole. The notification delivered to the refresh driver at 6s later after the content was generated by a big chunk of JS code. That means time to drawing the content would be 6s shorter if the refresh driver was ticking at right time. The variant, in time, of firefox is unreasonable huge. I guess it is one of the major reasons, timing of vsync notifications. By my estimation, it could affect 10+% of time in average. We need someone to fix it.

Peter Chang[:pchang]

Comment 36

•

9 years ago

(In reply to Thinker Li [:sinker] from comment #35) > I found a serious problem. Vsync notifications from the compositor thread > had been put in the task queue of the main thread for 16s with my firefox > build (debug, m-c), 18s in the whole. The notification delivered to the > refresh driver at 6s later after the content was generated by a big chunk of > JS code. That means time to drawing the content would be 6s shorter if the > refresh driver was ticking at right time. > Bug 1294625 will address this problem to kick off refresh driver when JS paring is too long. Therefore, it is possible to render part of content first before paring is finished. > The variant, in time, of firefox is unreasonable huge. I guess it is one of > the major reasons, timing of vsync notifications. By my estimation, it > could affect 10+% of time in average. We need someone to fix it.

Depends on: 1294625

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(bchien)

Greg Weng [:snowmantw][:gweng][:λ]

Comment 37

•

9 years ago

With my colleague's top most MBP (15", 2.7GHz Core i7), the fattest script runs only in 2.4seconds, while Chrome still takes 1.4 seconds to run it. On my MBP (13", 3.1GHz Core i7), Firefox takes 6 ~ 8 seconds to evaluate the script while Chrome takes almost the same 1.4 seconds to evaluate it. I don't know if this reveals any important information, but it is quite interesting. And now I try to use Tracelogger to probe the case. Unfortunately, the tracelogger cannot generate result in website. I think it's because the case is too large, because for a very simple Google front page it works well.

Steve Fink [:sfink] [:s:]

Comment 38

•

9 years ago

By the way, all of this benchmarking should be done with JSGC_DISABLE_POISONING set to 1. I created a page https://wiki.mozilla.org/Benchmarking_advice to collect that sort of information, though I'll probably move it to an MDN page once my permissions over there are straightened out. I attempted to follow the instructions in comment 30, but I could not see "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might just be me, though; I have a lot of trouble using cleo. I will attach a screenshot.

Steve Fink [:sfink] [:s:]

Comment 39

•

9 years ago

Attached image could not find Anonymous func call — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 40

•

9 years ago

Because Tracelogger result cannot not be shared easily as Gecko profiler, I built a Heroku app to demo the data collected online: https://boiling-coast-67040.herokuapp.com/website/tracelogger.html This is for this Google Doc case and on my self-built Nightly (NOT debug build). Because my DeveloperEdition reported empty *.tl files while I tried to tracelogger it. If anyone has new issue want to profile and upload, please NI me. Thanks.

Greg Weng [:snowmantw][:gweng][:λ]

Comment 41

•

9 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #38) > By the way, all of this benchmarking should be done with > JSGC_DISABLE_POISONING set to 1. I created a page > https://wiki.mozilla.org/Benchmarking_advice to collect that sort of > information, though I'll probably move it to an MDN page once my permissions > over there are straightened out. > > I attempted to follow the instructions in comment 30, but I could not see > "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might > just be me, though; I have a lot of trouble using cleo. I will attach a > screenshot. Okay. Here is a faster way to expand it: 1. move the mouse pointer to the Content timeline until the position of "285143ms" 2. Click on that 3. Wait it expands the stack 4. It should be in the expanded stack

Greg Weng [:snowmantw][:gweng][:λ]

Comment 42

•

9 years ago

Attached image The Anonymous function — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 43

•

9 years ago

According to the Tracelogger result, I've found the longest inline script seems not in the logs. I checked the log and it contained several inline & external script compilation information, but the edit:12 line wasn't in it. I don't know if this means something. Since I have a local version to pull it as an external script, I will do another tracelogging result to confirm that.

Steve Fink [:sfink] [:s:]

Comment 44

•

9 years ago

Thanks, with those instructions I was able to see the Anonymous function you were pointing to. As I understand it, there are 3 different issues involved here: 1. The vsync timing problem. A mitigation is being worked on in bug 1294625. 2. Very slow minor GCs. I'd like to get the output of a run with JS_GC_PROFILE_NURSERY=50000 3. Very very slow parsing + execution of an inline <script>...</script>, which doesn't seem to be showing up in tracelogger. I'm going to run this locally to investigate this further. (I tried this before and had trouble, but it was OS misconfiguration on my machine, and I recently did a fresh install.) I'd like to figure out if there are any stalls waiting for the script to arrive to be parsed or something.

Greg Weng [:snowmantw][:gweng][:λ]

Comment 45

•

9 years ago

As I mentioned, I have a local version pull all huge inline scripts out of HTML, and put them in 4 files (ex1.js ~ ex4.js, while ex3.js is corresponding to the largest one costs longest time in the timeline). And I now have the Tracelogger result for it: https://murmuring-castle-30204.herokuapp.com/website/tracelogger.html There is only part of ex4.js has been Ion compiled in the graph. If this really shows something, may I assert that our Jits aren't so helpful for these kinds of scripts, even they're pulled out from HTML?

Greg Weng [:snowmantw][:gweng][:λ]

Comment 46

•

9 years ago

...and I've found the script, will take longer time to evaluate with e10s. My result with externalized scripts is: Nightly w/ e10s: 4.2 seconds Nightly w/o e10s: 2.2 seconds DeveloperEdition w/ e10s: 5.2 seconds DeveloperEdition w/o e10s: 2.1 seconds Although it's fluctuated, I think the gap still not negligible. Thinker suggests that there may be too much tasks since extra IPC channels in e10s case. Maybe we should do some serious analysis on this result as well.

Greg Weng [:snowmantw][:gweng][:λ]

Comment 47

•

9 years ago

Attached file de-noe10s-50.a1.json — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 48

•

9 years ago

Attached file de-with10s-50.a1.json — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 49

•

9 years ago

Attached file nightly-noe10s-exscripts-profile-51.a1.json — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 50

•

9 years ago

Attached file nightly-withe10s-exscripts-profile-51.a1.json — Details

Thinker Li [:sinker]

Updated

•

9 years ago

Depends on: 1296160

Thinker Li [:sinker]

Updated

•

9 years ago

Depends on: 1296161

Greg Weng [:snowmantw][:gweng][:λ]

Comment 51

•

9 years ago

I've done some profiling on Win10 with official Nightly build. And it shows the performance on the "script" edit:12 makes no difference: they both complete the task in ~2.5 seconds. Gecko profiler result here: e10s https://cleopatra.io/#report=25000cf96d090d5c5e61bbec563902d622b8c710 non-10s https://cleopatra.io/#report=ee53f03572d113c505c8aabccbb46f4f8a4f5242

Greg Weng [:snowmantw][:gweng][:λ]

Comment 52

•

9 years ago

Attached file devtool-perf-win10-e10s-nightly-nocache.json — Details

Greg Weng [:snowmantw][:gweng][:λ]

Comment 53

•

9 years ago

Attached file devtool-perf-win10-non-e10s-nightly-nocache.json — Details

Thinker Li [:sinker]

Comment 54

•

9 years ago

Too much |LogMessageRunnable| on the content's main thread, they should be aggregated.

No longer depends on: 1296160, 1296161

Thinker Li [:sinker]

Comment 55

•

9 years ago

A lot of IPC messages in |PCompositorBridge::Msg_DidComposite| type are dispatched to the main thread of the content. Most of them are with |aTransaction == 0|. I think we should deliver this message only when necessary.

Flags: needinfo?(hshih)

Jerry Shih[:jerry] (UTC+8) (inactive)

Comment 56

•

9 years ago

Thanks, let me check the didComposite() call.

Flags: needinfo?(hshih)

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(bugs)

Greg Weng [:snowmantw][:gweng][:λ]

Comment 57

•

9 years ago

I tried to accumulate all time spent on IonBuilder, and got the result: 660ms. This is a rough measuring so I need to refine the method and compare it to the overview profiles we have already did. However, this can be the first step to answer "in the test case, how much time we will spend on compilation".

Shako Ho

Reporter

Comment 58

•

9 years ago

Hi Thinker, Per our discussion, the testing result of partial matching for windows as below: # Result without local proxy Browser | Run time (median value) Chrome | 2444.44444444 ms Firefox | 2622.22222222 ms # Result with local proxy Browser | Run time (median value) Chrome | 2344.44444444 ms Firefox | 2488.88888889 ms

Shako Ho

Reporter

Comment 59

•

9 years ago

Attached file test_chrome_gdoc_read_basic_table_1_sampleobj.flow — Details

Shako Ho

Reporter

Comment 60

•

9 years ago

Attached file test_firefox_gdoc_read_basic_table_1_sampleobj.flow — Details

Shako Ho

Reporter

Comment 61

•

9 years ago

The flow files tested for windows are also attached.

Flags: needinfo?(tlee)

Terrence Cole [:terrence]

Comment 62

•

9 years ago

Steve is on it; I don't need to be ni'd.

Flags: needinfo?(terrence)

Thinker Li [:sinker]

Comment 63

•

9 years ago

Attached patch priority-ipc-runnable.diff — Details — Splinter Review

I have done a POC for prioritized runnable with priority from IPC message. It looks very promise for time variance of the partial content tests. This patch make runnables having priority, and process them in the order of their priority and the order of their arrival time. IPC messages have priority too, the priority can be normal, high, or urgent. I make the same priority value on runnables. And, the priority of IPC message are passed to the tasks that is responsible to dequeue and execute them. The result is the variance of partial content tests is quite low. I have try it several time. All them fall in 10s~12s, 11s for most cases. To Shako, could you try this patch? https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc5dddb372e6

Flags: needinfo?(tlee) → needinfo?(sho)

Shako Ho

Reporter

Comment 64

•

9 years ago

sure, will update the test result once test is finished

Flags: needinfo?(sho)

Shako Ho

Reporter

Comment 65

•

9 years ago

Hi Thinker, The test result as below: Median time: 4433.33333333 Avg time : 4450.74074074 Std dev : 87.7089900809

Thinker Li [:sinker]

Comment 66

•

9 years ago

(In reply to Shako Ho from comment #65) > Hi Thinker, > > The test result as below: > > Median time: 4433.33333333 > Avg time : 4450.74074074 > Std dev : 87.7089900809 Hi Shako, What is the environment for the test? Comment 58, the average time is about 2xxx. Why is it 4xxx now? And, I like to know what is different with the change. So, could you also put down numbers of a reference test?

Thinker Li [:sinker]

Comment 67

•

9 years ago

By the way, assume the run time is in normal distribution. It means 95.x% of probability would fall in 4.45s+/-0.194s.

Shako Ho

Reporter

Comment 68

•

9 years ago

Hi Thinker, The test result is based on Ubuntu 14.04. The comment 58 is the test result based on Windows. And you can also take the result compare with bug 1294625 1. check painting suppression status asap(bug 1271691) + tick refresh driver during html parsing(bug 1294625) https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1f7ca95b09f Median time: 4722.22222222 Avg time : 4733.7037037 Std dev : 165.812213019 2. tick refresh driver during html parsing(bug 1294625) https://treeherder.mozilla.org/#/jobs?repo=try&revision=0063015b7218 Median time: 4877.77777778 Avg time : 5278.51851852 Std dev : 730.825866529 3. check painting suppression status asap(bug 1271691) https://treeherder.mozilla.org/#/jobs?repo=try&revision=e14bdefbe9a1 Median time: 4500.0 Avg time : 4512.59259259 Std dev : 89.8894306321

Olli Pettay [:smaug][bugs@pettay.fi]

Updated

•

9 years ago

Depends on: 1306591

Greg Weng [:snowmantw][:gweng][:λ]

Comment 69

•

9 years ago

I have found that our helper threads in Ion seemed to only help MIR optimisation, while the "not-helper-threads" done almost 85% work. I don't know if it's in purpose but from the thread overview it makes the tasks very unbalanced. This makes me wonder if we can make it more balanced via delivering more tasks besides MIR optimisation to help threads, maybe we can raise our throughput in this case. However, I don't know if this is do-able so I firstly NI Steve.

Flags: needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Comment 70

•

9 years ago

Are you seeing this in the TraceLogger output? If I understand correctly, you are seeing one thread doing a large chunk of JIT compilation, while most of the threads are doing nothing. The difficulty is finding things to do in parallel. I will forward the question to Jan. jandem - this may be a very naïve view of it, but my understanding is that you run baseline code until it hits some fixed iteration count threshold, then start ion compiling it on a helper thread, then switch over to the compiled version when the compilation is complete. Is that correct? If another script were to get hot enough while the first is compiling, would it get compiled in parallel, or is there only one compilation thread available? Is it possible to ion compile multiple scripts concurrently? Would it be possible, and would it make sense, to adjust the compilation threshold somewhat based on availability of compilation resources? For example, you could do it in a work-stealing way: reduce the thresholds a bit. When a script hits the threshold, it is queued up in a priority queue sorted by (a function of) execution count. Worker threads pull items off the queue and compile them. It introduces even more nondeterminism, which is unfortunate, and perhaps compilation simply cannot be done concurrently; I don't know.

Flags: needinfo?(sphink) → needinfo?(jdemooij)

Desigan Chinniah [:cyberdees] [:dees] [London - GMT]

Updated

•

9 years ago

Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Jan de Mooij [:jandem]

Comment 71

•

9 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #70) > If > another script were to get hot enough while the first is compiling, would it > get compiled in parallel, or is there only one compilation thread available? > Is it possible to ion compile multiple scripts concurrently? Would it be > possible, and would it make sense, to adjust the compilation threshold > somewhat based on availability of compilation resources? We do have some mechanism to pause compilation threads when a new compilation job has higher priority. Hannes can probably say more about this.

Flags: needinfo?(jdemooij) → needinfo?(hv1989)

Hannes Verschore [:h4writer]

Comment 72

•

9 years ago

(In reply to Jan de Mooij [:jandem] from comment #71) > We do have some mechanism to pause compilation threads when a new > compilation job has higher priority. Hannes can probably say more about this. bug 1013172 comment 0 describes this. For various reasons we decided to only do one compilation at a time. (Not take all cpu power, battery consumption, peak memory ...). Which is what you are seeing here. Though, like said, that means priority is important. We currently decide this on the warmUp count and when another script because more hot, we will switch to complete that script first. Now I don't think seeing a long time of compilations is inherently bad. We are quite aggressive in compiling things. As a result it could be that we keep compiling but that it doesn't make a big difference for the execution speed, since the functions aren't used that much... If you have a tracelogger log you might want to iterate over: [start AnnotateScript foo.js] [start IonCompilation] [stop IonCompilation] [stop AnnotateScript] ... [start script foo.js] [start Interpreter/Baseline] [stop Interpreter/Baseline] [stop script foo.js] ... [start AnnotateScript foo.js] [start IonLinking] [stop IonLinking] [stop AnnotateScript] if you would look in between start of the compilation and end of the compilation and count how many times a script was called and how much time we spend in that function (not children). That would give an estimate how much time was lost due to waiting for the compilation to be done. (This is inclusive the time it is waiting on getting build offthread). If you want to know how to extract this information. Look at slide 35 or last slide. https://docs.google.com/presentation/d/1j_Hi9uP2Woz2OoAuxX9We11GbLhh0e8N-tbKYz1MeZU/edit?usp=sharing It gives a easy way to iterate the tree. Changing that script a little bit should be able to give the above information. If you have that information you would have an idea how much time we spend in Baseline waiting for IonMonkey to finish. If that is small, I'm pretty sure we won't see an improvement by compiling more at the same time.

Flags: needinfo?(hv1989)

Bobby Chien

Updated

•

9 years ago

Component: General → JavaScript Engine

Bobby Chien

Updated

•

9 years ago

Summary: [Perf][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content → [perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content

Jenn Chaulk (:jchaulk)

Comment 73

•

9 years ago

plat-rel tracked at the meta level

platform-rel: + → -

Chris Peterson [:cpeterson]

Updated

•

9 years ago

status-firefox52: --- → affected

status-firefox-esr45: --- → affected

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 75

•

9 years ago

Shako, Thinker, now that bug 1306591 has landed, could we perhaps get some new numbers here?

Flags: needinfo?(tlee)

Flags: needinfo?(sho)

Bobby Chien

Comment 76

•

9 years ago

For gdoc_read_basic_table_1 case: Firefox: http://tinyurl.com/hbr9nzz Chrome: http://tinyurl.com/hh5mp23 The result is positive. It's hard to show number which based on same baseline/content with original context on this bug. However, by compare with previous revision, firefox has 13% improvement in this case. In the same time, Chrome has 8% improvement in same case. It's hard to say theses improvement was caused by web content or not, but we test Firefox and Chrome at similar time. For all regression cases: Compared with previous revision: http://tinyurl.com/j8nvcp5 Compared with next revision: http://tinyurl.com/go9ugee You can see similar trend in all categories from comparison (see results with Firefox tag) with previous revision. This patch has positive results. Especially, very good result in SpeedIndex (SI). Conclusion: This patch does improve perceived performance, roughly less than 10% improvement.

Flags: needinfo?(tlee)

Flags: needinfo?(sho)

Thinker Li [:sinker]

Comment 77

•

9 years ago

Bobby, could you do test to compare with previous revision at the version of gdoc? In another word, run new and old revision at the same time to eliminate the possible of change of gdoc. It is more reasonable to compare with firefox itself.

Flags: needinfo?(bchien)

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(bchien)

QA Contact: ctang

Bobby Chien

Updated

•

9 years ago

Flags: needinfo?(bchien)

Bobby Chien

Comment 78

•

9 years ago

Linux testing machine is busy. Use Windows machine instead. Previous build: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32/1478579975/ Build with patch: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32/1478642363/

Shako Ho

Reporter

Comment 79

•

9 years ago

Hi Thinker, The test result as below: OS: Windows 7 CPU: i7-3770 3.4GMhz Memory: 16GB Ram (4GB 1600MHZ x 4) Hard Drive: 1TB SATA HDD Graphics: GF108 [GeForce GT 440/630] previous build | build with patch Median time | 14541.6666667 ms | 13050.0 ms Avg time | 14541.1111111 ms | 13775.1600753 ms Std dev | 118.144084454 ms | 1748.99567772 ms SI | 4941.5 | 5108.0 PSI | 36256.5 | 38125.0

Bobby Chien

Comment 80

•

9 years ago

Comparison video attached: https://youtu.be/dOvyBlVHHHE

Flags: needinfo?(bchien)

Bobby Chien

Updated

•

9 years ago

See Also: → gdoc_read_basic_table_1_win7(40%)

:Harald Kirschner :digitarald

Updated

•

8 years ago

Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Julien Cristau [:jcristau] (back August 18)

Comment 81

•

8 years ago

Mass wontfix for bugs affecting firefox 52.

status-firefox52: affected → wontfix

Firefox Bug Husbandry Bot

Updated

•

8 years ago

Keywords: perf

Nicolas B. Pierron [:nbp]

Updated

•

7 years ago

Whiteboard: [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Jason Orendorff [:jorendorff]

Comment 82

•

7 years ago

This seems actionable to me, leaving P1.

Nicolas B. Pierron [:nbp]

Comment 83

•

7 years ago

Nobody looked into these bugs for a while, moving to P2 and waiting for [qf] re-evaluation before moving back to P1.

Priority: P1 → P2

Vicky Chin [:vchin]

Updated

•

7 years ago

Whiteboard: [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Nicolas B. Pierron [:nbp]

Comment 84

•

7 years ago

Vicky, since this qualify as a qf:p1 for Firefox 64, would it be possible to have updated information about this bug? Adding a profile to justify that this is still a JS issue and a bit of documentation to explain how to reproduce this issue would be great.

Flags: needinfo?(vchin)

Vicky Chin [:vchin]

Comment 85

•

7 years ago

We discussed this in the QF triage meeting and agreed this needed further investigation as the steps to reproduce no longer exists. However having read through the above comments it seems based on comment 78/79 that the problem has largely been fixed?

Flags: needinfo?(vchin)

Steven DeTar [:sdetar]

Updated

•

7 years ago

Whiteboard: [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Kannan Vijayan [:djvj]

Updated

•

7 years ago

Assignee: nobody → kvijayan

Kannan Vijayan [:djvj]

Comment 86

•

7 years ago

Taking notes in the bug as I analyze this further. Narrow profile of user-jank of 2.2 seconds: https://perfht.ml/2xl21hI Now, looking at the call stack - I notice a huge chunk of the time spent punching down a call stack using unoptimized fallback paths. We also invoke a lot of IonCompiles - around 16% of the time spent in this early-phase is ion-compiles. Nbp's work on reducing ion-compile times by skipping cold basic blocks would improve this component significantly. The other major thing is just "slow interpreter". Primarily - we need faster call paths and generally better optimization of fastpaths. Interpreter ICs should help significantly on this front. This is confirmed by looking at the inverted C++-only call stack to discover that we're spending 25% of our time in js::Interpret / js::RunScript. The next hang of 1 second: https://perfht.ml/2PP1Geb Here also, we spend about 90% of our time in native code, not jits. About 10% in baseline jit, and 1% in ion. This is also cold-code issues. A total of 12% of the time is spent in js::Interpret (and inlined functions) alone. We're almost never entering the jits here at all. This is a the same story across the board. Fundamentally the problem here is that we're running a ton of cold code, and not entering the jits. The best near-term solution for this is Interpreter ICs. I don't anticipate that being doable by 64.

Markus Stange [:mstange]

Comment 87

•

7 years ago

(In reply to Kannan Vijayan [:djvj] from comment #86) > This is confirmed by looking at the inverted C++-only call stack to discover > that we're spending 25% of our time in js::Interpret / js::RunScript. js::RunScript will also include the time we spend in Baseline. I've narrowed down the two slices that you've used to some top hot functions, by using the transforms "Collapse function's subtree across the entire tree" and "Merge function into caller across the entire tree" in the inverted callstack mode. Here's the result: hang of 2.2 seconds: https://perfht.ml/2QJLeNg next hang of 1 second: https://perfht.ml/2QHfIiY Only look at the roots of the inverted tree, and only at the top ~15 of that list. Not sure if it's helpful. The only things that stood out to me, other than what you already said, is the fact that we spend 4.2% of the first slice in CreateThis, and 1.7% of the second slice in CreateThisForFunction.

Steven DeTar [:sdetar]

Comment 88

•

7 years ago

Based on Kannan's comments in Comment 86, we decided to move the QF target of this bug to [qf:p1:f67]. Additionally, we marked this bug [qf:js:investigate] to indicate we need to do further investigation. Last, since Kannan believes that improving the interpreter performance would help addresses the issues here, we made this a dependency of the Interpreter IC bug (1361474).

Blocks: 1361474

Whiteboard: [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]

Steven DeTar [:sdetar]

Updated

•

7 years ago

Status: NEW → ASSIGNED

Whiteboard: [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]

Comment hidden (offtopic)

Caroline Cullen [:caroline]

Updated

•

4 years ago

Assignee: kvijayan → nobody

Status: ASSIGNED → NEW

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → ?

Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: ? → ---

BugBot [:suhaib / :marco/ :calixte]

Comment 90

•

3 years ago

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

2 years ago

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → INCOMPLETE

You need to log in before you can comment on or make changes to this bug.