Closed
Bug 1269690
(gdoc_read_basic_table_1_ubuntu(45.25%))
Opened 8 years ago
Closed 11 months ago
[perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: sho, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: perf, Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs])
User Story
You can find all the test scripts in the GitHub repository: https://github.com/mozilla-twqa/hasal. The name of the test script being run is given in the comments (for example: test scripts: test_chrome_gdoc_create_txt_1); specify that name in suite.txt to run it.
Attachments
(19 files)
- 7.84 MB, text/plain
- 329.61 KB, image/png
- 8.21 MB, text/plain
- 8.20 MB, text/plain
- 958 bytes, text/plain
- 958 bytes, text/plain
- 475.16 KB, image/png
- 437.37 KB, image/png
- 380.74 KB, image/png
- 934.19 KB, image/png
- 3.48 MB, application/json
- 3.56 MB, application/json
- 3.15 MB, application/json
- 3.11 MB, application/json
- 2.73 MB, text/plain
- 3.04 MB, text/plain
- 466.61 KB, text/plain
- 267.70 KB, text/plain
- 19.20 KB, patch
# Test Case STR
1. Launch the browser with a blank page
2. Enter the Google Doc URL with the 1-page table content
3. Close the browser

# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GHz
Memory: 16GB RAM
Hard Drive: 1TB SATA HDD
Graphics: GK107 [GeForce GT 640] / GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2
Chrome version: 50.0.2661.75

# Result
Browser | Run time (median value)
Firefox | 14533.3333 ms
Chrome  | 10005.5556 ms
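As a cross-check (not part of the original report), the headline 45.25% / 4528 ms figures follow directly from the two medians above:

```python
# Median run times from the result table above (milliseconds).
firefox_ms = 14533.3333
chrome_ms = 10005.5556

diff_ms = firefox_ms - chrome_ms        # absolute gap between the browsers
pct_slower = diff_ms / chrome_ms * 100  # slowdown relative to Chrome

print(round(diff_ms))        # 4528
print(round(pct_slower, 2))  # 45.25
```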
Blocks: Meta-Hasal-GDoc
Updated•8 years ago
Product: Firefox → Core
Updated•8 years ago
Version: unspecified → 45 Branch
Comment 1•8 years ago

# Profiling Data: https://cleopatra.io/#report=2544480759b8351b33cfb70d34165c6130cf973f
# Performance Timing: http://goo.gl/fhD77x
{
  "navigationStart": 1462949113471,
  "unloadEventStart": 0,
  "unloadEventEnd": 0,
  "redirectStart": 0,
  "redirectEnd": 0,
  "fetchStart": 1462949113476,
  "domainLookupStart": 1462949113476,
  "domainLookupEnd": 1462949113476,
  "connectStart": 1462949113476,
  "connectEnd": 1462949113476,
  "requestStart": 1462949113479,
  "responseStart": 1462949113891,
  "responseEnd": 1462949113951,
  "domLoading": 1462949113895,
  "domInteractive": 1462949116185,
  "domContentLoadedEventStart": 1462949116276,
  "domContentLoadedEventEnd": 1462949116277,
  "domComplete": 1462949121346,
  "loadEventStart": 1462949121346,
  "loadEventEnd": 1462949121353
}
# Test Script: https://github.com/Mozilla-TWQA/Hasal/blob/master/tests/test_firefox_gdoc_read_basic_table_1.sikuli/test_firefox_gdoc_read_basic_table_1.py
# Test Data: https://docs.google.com/document/d/1d49NaEpY9G9laZY_OPYfYdjngeAvcGNEPMTEVsW76js/edit
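For reference, the key delta in that timing dump (the domLoading → loadEventEnd span analyzed in comment 3) can be computed directly from the quoted values; this is just a sketch over the numbers above:

```python
import json

# The relevant performance.timing values from the dump above (epoch ms).
timing = json.loads("""
{
  "domLoading": 1462949113895,
  "loadEventEnd": 1462949121353
}
""")

# Time from the parser starting on the document until onload handlers finished.
delta = timing["loadEventEnd"] - timing["domLoading"]
print(delta)  # 7458 (ms), matching the figure derived in comment 3
```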
Comment 2•8 years ago
Comment 3•8 years ago

# Try to identify the time between "domLoading" and "loadEventEnd"

Perf Timing domLoading to loadEventEnd: 7882 - 424 = 7458 ms
AddOn Start to End mark: 41375 - 33189 = 8186 ms

From the Gecko profiling data, the range [33105, 41343]:

8111 100.0% Startup::XRE_Main
2337  32.4% ├─ nsHtml5TreeOpExecutor::RunFlushLoop
2168  26.7% │  ├─ nsJSUtils::EvaluateString
 137   1.7% │  ├─ EventDispatcher::Dispatch
            │  └─ ...
1800  22.2% ├─ nsInputStreamPump::OnInputStreamReady
1650  20.3% │  ├─ nsInputStreamPump::OnStateStop
 106   1.3% │  ├─ nsInputStreamPump::OnStateStart
  44   0.5% │  └─ nsInputStreamPump::OnStateTransfer
1127  13.9% ├─ Timer::Fire
1102  13.6% │  ├─ js::RunScript
  14   0.2% │  ├─ nsCycleCollector::forgetSkippable
            │  └─ ...
 624   7.7% ├─ nsRefreshDriver::Tick
 323   4.0% │  ├─ PresShell::Paint
 113   1.4% │  ├─ PresShell::Flush (Flush_Style)
            │  └─ ...
            └─ ...

Note:
1. Filed Bug 1271912 for tracking "nsJSUtils::EvaluateString"
2. Filed Bug 1271913 for tracking "nsInputStreamPump::OnStateStop"
3. Filed Bug 1271914 for tracking "js::RunScript"
4. Filed Bug 1271915 for tracking "nsRefreshDriver::Tick"
Comment 4•8 years ago

Bug 1271912 has been marked as a duplicate of bug 1270351
Bug 1271913 has been marked as a duplicate of bug 1267971
Bug 1271915 has been marked as a duplicate of bug 1270427
Updated•8 years ago
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs]
Updated•8 years ago
platform-rel: --- → ?
Comment 5•8 years ago

I found a lot of bailouts in the profiles, so I disabled IonMonkey (|javascript.options.ion=false|) and ran the test several more times. The table was shown ~3 s earlier, whereas it previously took 20+ s on my laptop. Shako, could you run your test with this change in |about:config| to confirm whether I am right?
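For anyone reproducing this, the same pref can also be set persistently via a user.js file in the profile directory (equivalent to flipping it in about:config):

```js
// user.js in the Firefox profile directory: disables IonMonkey
user_pref("javascript.options.ion", false);
```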
Flags: needinfo?(tlee) → needinfo?(sho)
Sure, test is already running, waiting for the result
Flags: needinfo?(sho)
Hi Thinker,
The test is finished. Comparing the results with IonMonkey enabled and disabled, disabling IonMonkey is about 2016 ms slower, which is around 8% slower.

Machine spec:
# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GHz
Memory: 16GB RAM (4GB 1600MHz x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2

# Result
Browser | Run time (median value)
Firefox with IonMonkey enabled  | 24411 ms
Firefox with IonMonkey disabled | 26427 ms
Flags: needinfo?(tlee)
Comment 8•8 years ago

After talking with Shako, we found that the way the test is run is not what I had in mind. Shako and his team will modify the test and the framework they use, and do another round.
Flags: needinfo?(tlee)
Updated•8 years ago
Severity: normal → critical
Priority: -- → P1
Updated•8 years ago
See Also: → gdoc_read_basic_table_1(17.2%)
Updated•8 years ago
Flags: needinfo?(overholt)
Flags: needinfo?(kchen)
Flags: needinfo?(bugs)
Comment 9•8 years ago
Are we spending lots of time in the JS engine and in the parser? Am I reading the profile correctly?
Flags: needinfo?(overholt)
Flags: needinfo?(nihsanullah)
Flags: needinfo?(hsivonen)
Comment 10•8 years ago
(In reply to Andrew Overholt [:overholt] from comment #9) > Are we spending lots of time in the JS engine and in the parser? Am I > reading the profile correctly? nsHtml5TreeOpExecutor::RunFlushLoop() is responsible for executing scripts provided via <script>, so it seems that this isn't the parser itself being slow but the first parse + execution of *inline* scripts being slow.
Flags: needinfo?(hsivonen)
Comment 11•8 years ago
Inline as in not <script src> but <script> // something </script>
Comment 12•8 years ago
Thanks, Henri. I'll wait for Naveed to comment here.
Comment 13•8 years ago
Are the /usr/lib/firefox/libxul.so references just errors in the profiler or are we actually profiling the system-installed Firefox and not nightly?
Flags: needinfo?(sho)
Comment 14•8 years ago

To summarize: just as Henri said, there is an 11 s long inline script (on my laptop), and the table is shown only after the inline script finishes. There are some sync restyles and reflows during the inline script, and the number of bailouts is high. I tried disabling IonMonkey (because of the bailouts) to see what would happen: it seems a bit faster for the inline script, and the table was shown earlier (~3 s), but it made other JS tasks slower later. I am waiting for Shako and his team to run more tests to confirm what I have found. They told me today they will do it in the following days, since they have just completed the changes to their test case to meet what I had requested.
Reporter
Comment 15•8 years ago

Reply to comment 13:
Hi Andrew,
In this test result and profiler data, we tested the system-installed Firefox. If you are looking for test results on Nightly, see bug 1285178.
Flags: needinfo?(sho)
Reporter
Comment 16•8 years ago

Hi Thinker,
The test is finished. Comparing the results with IonMonkey enabled and disabled, disabling IonMonkey is about 333 ms slower, which is around 4% slower.

The run time is measured from the browser showing a blank page until the gdoc table snapshot appears.

Machine spec:
# Hardware
OS: Ubuntu 14.04 LTS 64-bit
CPU: i7-3770 3.4GHz
Memory: 16GB RAM (4GB 1600MHz x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.0.2

# Result
Browser | Run time (median value)
Firefox with IonMonkey enabled  | 7944 ms
Firefox with IonMonkey disabled | 8277 ms
Flags: needinfo?(tlee)
Comment 17•8 years ago

(In reply to Shako Ho from comment #15)
> Reply for comment 13
> Hi Andrew,
> In this test result and profiler data, we test on the system-installed
> Firefox.
I see. I don't know if this is worth the effort.
> If you are looking for the test result on Nightly, you could reference the
> bug 1285178
OK, I think we should focus on that bug, then. Although I'm surprised the distro-shipped build (or maybe the same would occur with a release build from mozilla.org?) is that much slower!
Updated•8 years ago
platform-rel: ? → +
Comment 18•8 years ago
(In reply to Shako Ho from comment #16) > Hi Thinker, > The test is finish. Comparing the result between enable and disable > IonMonkey, disable IonMonkey is slower than enable IonMonkey around 333 ms > which is around 4% slower. > > The running time measurement start from browser with blank page to the gdoc > table snapshot appear. > > Machine spec: > # Hardware > OS: Ubuntu 14.04 LTS 64-bit > CPU: i7-3770 3.4GMhz > Memory: 16GB Ram (4GB 1600MHZ x 4) > Hard Drive: 1TB SATA HDD > Graphics: GF108 [GeForce GT 440/630] > > # Browsers > Firefox version: 45.0.2 > > # Result > Browser | Run time (median value) > Firefox with enable IonMonkey | 7944 ms > Firefox with disable IonMonkey| 8277 ms
Flags: needinfo?(tlee)
Reporter
Comment 19•8 years ago
Reporter
Comment 20•8 years ago
Reporter
Comment 21•8 years ago

Hi Thinker,
The new profiles are uploaded; please check the links below:
ionMonkey disabled: https://cleopatra.io/#report=2677ae2d3884d928408b35ad4c1ce0408d3e0220
ionMonkey enabled: https://cleopatra.io/#report=3a4ed1c606478b4d990174633432340c93860677
The HAR files are also attached to this bug:
- Har file with ionMonkey enabled 201607151058
- Har file with ionMonkey disabled 201607151058
Flags: needinfo?(tlee)
Updated•8 years ago
Alias: gdoc_read_basic_table_1
Updated•8 years ago
Alias: gdoc_read_basic_table_1 → gdoc_read_basic_table_1(45.25%)
Updated•8 years ago
Alias: gdoc_read_basic_table_1(45.25%) → gdoc_read_basic_table_1_ubuntu(45.25%)
Updated•8 years ago
Depends on: gdoc_read_basic_table_1_win7(40%)
Reporter
Comment 22•8 years ago
Reporter
Comment 23•8 years ago
Reporter
Comment 24•8 years ago

Hi Thinker,
I redid the profiling per your request: Nightly build 50.0a1 with the Gecko profiler and perf markers; a "Before click hyperlink" event is sent before the hyperlink is clicked.
ionMonkey disabled: https://cleopatra.io/#report=dcfea2b043dd1c28447d593fe78448b837d65843
ionMonkey enabled: https://cleopatra.io/#report=a311efa7d5902550d0571352070175b80101623f
Comment 25•8 years ago

Shako, I read the profile, and one thing is weird. From the last |before click hyperlink| mark to the time the table is shown, it takes only about 4.3 s here. Your last test in comment 16 was about 9 s. They are quite different. Could you check whether they are really different in Nightly by running both test cases (typing in the location bar, and clicking a link)? It would be an interesting problem if the old test case is really that slow.
Flags: needinfo?(tlee)
Reporter
Comment 26•8 years ago

Hi Thinker,
Before doing the test, I have one quick question: how do you get the timestamp at which the table is shown? According to the profile, from the "Before click hyperlink" marker to the "PageMod End" marker takes about 9-10 seconds. But that timing runs to the end of the page being fully loaded, so the table could show up earlier.
Reporter
Comment 27•8 years ago

Hi Thinker,
I did some comparison between comment 16 and the latest profile; the time spent showing the table does not seem to take longer in the latest profile. You can also see bug 1285178, which tests the Nightly build.
Comment 28•8 years ago

I'm generating data on IonMonkey bailouts; maybe it can give us some ideas: https://docs.google.com/spreadsheets/d/1K866H7LGNIgiH4V0t7qJsDSDrCtgcjnPhu2JbQaSO8M/edit?usp=sharing I will keep working on this to gather more data for reporting.
Comment 29•8 years ago

I made a simple test case with only a blank page, and the bailout count is about 200 (half). After inserting the same table into the blank page, it grew to about 400, but not as much as in QA's case (~470). So the table contributes roughly 200-300 bailouts; maybe that's not too much compared to the bare-bones case.
Updated•8 years ago
Flags: needinfo?(kchen)
Comment 30•8 years ago

We've made some discoveries about the symptom. One thing I noticed is that there is a very large inline script in the HTML that deserializes the content of the document, among other things (edit:12). In Google Chrome, that script takes only about 1400 ms to execute; on FirefoxDeveloperEdition 47.a1 it takes ~3500 ms, and after I upgraded the Developer Edition it grew to ~7000 ms, which is close to the result from Nightly. The results from both browsers' performance tools have been saved on my computer, and I have uploaded screenshots to make my description clearer. I don't get exactly the same result on every run, but Google Chrome is always faster on that inline script. Maybe our JS engine needs some tuning for such inline scripts, or for the kind of task they execute. I have therefore also uploaded a Gecko profile for that scope: https://cleopatra.io/#report=aa4cec913875ef1f9011b0f6910d378073c8d096&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A283439,%22end%22%3A292070%7D,%7B%22type%22%3A%22FocusedCallstackPrefixSampleFilter%22,%22name%22%3A%22%3CAnonymous%3E()%20%40%20edit%3Fpref%3D2%26pli%3D1%3A12%22,%22focusedCallstack%22%3A%5B0,7608%5D,%22appliesToJS%22%3Atrue%7D%5D

I must say that recognizing the point representing the inline script is a bit tricky. First, switch the frame to "Content" with "JavaScript only" checked, so that you can see the "<Anonymous>() @edit@pref=2&pli=1:12" call. Then click on its arrow and uncheck "JavaScript only" to show the details inside the engine. Currently the profiler reports that most of the time inside the anonymous function call is spent in "js::Shape::search" and "js::NativeGetPropertyNoGC". I guess this is because the script is expanding the whole document structure, so the engine needs to handle a very deep object via lots of property gets, but I'm not sure whether this is the cause or a symptom.
NI Naveed because of the JS finding (sorry, I cannot NI whoever is dedicated to issues like this); NI Bobby because of the new updates.
Updated•8 years ago
Flags: needinfo?(bchien)
Comment 31•8 years ago
Comment 32•8 years ago
Comment 33•8 years ago

Another thing I've noticed is that our minor GC is quite unstable during and after the inline script evaluation: sometimes at least 4 minor GCs of 40-120 ms each occur during the script evaluation, and very often there is a ~500 ms(!) minor GC after the script evaluation (on Nightly). So far I haven't seen Chrome take that much time on minor GC, although it also has some pauses larger than 10 ms.
Comment 34•8 years ago
Including Terrence and Steve because of GC and GDocs respectively
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
Comment 35•8 years ago

I found a serious problem. Vsync notifications from the compositor thread had been sitting in the main thread's task queue for 16 s with my Firefox build (debug, m-c), 18 s in total. The notification was delivered to the refresh driver 6 s after the content was generated by a big chunk of JS code. That means the time to draw the content would be 6 s shorter if the refresh driver were ticking at the right time. The variance in Firefox's timing is unreasonably large; I guess the timing of vsync notifications is one of the major reasons. By my estimation, it could account for 10+% of the time on average. We need someone to fix it.
Comment 36•8 years ago

(In reply to Thinker Li [:sinker] from comment #35)
> I found a serious problem. Vsync notifications from the compositor thread
> had been put in the task queue of the main thread for 16s with my firefox
> build (debug, m-c), 18s in the whole. The notification delivered to the
> refresh driver at 6s later after the content was generated by a big chunk of
> JS code. That means time to drawing the content would be 6s shorter if the
> refresh driver was ticking at right time.
Bug 1294625 will address this problem by kicking off the refresh driver when JS parsing takes too long. That makes it possible to render part of the content before parsing is finished.
> The variant, in time, of firefox is unreasonable huge. I guess it is one of
> the major reasons, timing of vsync notifications. By my estimation, it
> could affect 10+% of time in average. We need someone to fix it.
Depends on: 1294625
Updated•8 years ago
Flags: needinfo?(bchien)
Comment 37•8 years ago

On my colleague's top-of-the-line MBP (15", 2.7GHz Core i7), the heaviest script runs in only 2.4 seconds, while Chrome still takes 1.4 seconds to run it. On my MBP (13", 3.1GHz Core i7), Firefox takes 6-8 seconds to evaluate the script, while Chrome takes almost the same 1.4 seconds. I don't know if this reveals anything important, but it is quite interesting. I am now trying to use TraceLogger to probe the case. Unfortunately, TraceLogger cannot generate a result for this website; I think it's because the case is too large, since it works well for the very simple Google front page.
Comment 38•8 years ago
By the way, all of this benchmarking should be done with JSGC_DISABLE_POISONING set to 1. I created a page https://wiki.mozilla.org/Benchmarking_advice to collect that sort of information, though I'll probably move it to an MDN page once my permissions over there are straightened out. I attempted to follow the instructions in comment 30, but I could not see "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might just be me, though; I have a lot of trouble using cleo. I will attach a screenshot.
Comment 39•8 years ago
Comment 40•8 years ago

Because TraceLogger results cannot be shared as easily as Gecko Profiler data, I built a Heroku app to demo the collected data online: https://boiling-coast-67040.herokuapp.com/website/tracelogger.html This is for this Google Doc case, on my self-built Nightly (NOT a debug build), because my DeveloperEdition produced empty *.tl files when I tried to tracelog it. If anyone has a new issue they want to profile and upload, please NI me. Thanks.
Comment 41•8 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #38)
> By the way, all of this benchmarking should be done with
> JSGC_DISABLE_POISONING set to 1. I created a page
> https://wiki.mozilla.org/Benchmarking_advice to collect that sort of
> information, though I'll probably move it to an MDN page once my permissions
> over there are straightened out.
>
> I attempted to follow the instructions in comment 30, but I could not see
> "<Anonymous>() @edit@pref=2&pli=1:12" in the cleopatra UI anywhere. It might
> just be me, though; I have a lot of trouble using cleo. I will attach a
> screenshot.
Okay, here is a faster way to expand it:
1. Move the mouse pointer to the Content timeline at the "285143ms" position
2. Click there
3. Wait for it to expand the stack
4. It should be in the expanded stack
Comment 42•8 years ago
Comment 43•8 years ago

According to the TraceLogger result, the longest inline script does not seem to be in the logs. I checked the log and it contained compilation information for several inline and external scripts, but the edit:12 line wasn't among them. I don't know if this means something. Since I have a local version that pulls it out as an external script, I will do another tracelogging run to confirm.
Comment 44•8 years ago
Thanks, with those instructions I was able to see the Anonymous function you were pointing to. As I understand it, there are 3 different issues involved here: 1. The vsync timing problem. A mitigation is being worked on in bug 1294625. 2. Very slow minor GCs. I'd like to get the output of a run with JS_GC_PROFILE_NURSERY=50000 3. Very very slow parsing + execution of an inline <script>...</script>, which doesn't seem to be showing up in tracelogger. I'm going to run this locally to investigate this further. (I tried this before and had trouble, but it was OS misconfiguration on my machine, and I recently did a fresh install.) I'd like to figure out if there are any stalls waiting for the script to arrive to be parsed or something.
Comment 45•8 years ago

As I mentioned, I have a local version that pulls all the huge inline scripts out of the HTML and puts them in 4 files (ex1.js-ex4.js, where ex3.js corresponds to the largest one, which costs the most time in the timeline). I now have the TraceLogger result for it: https://murmuring-castle-30204.herokuapp.com/website/tracelogger.html Only part of ex4.js has been Ion-compiled in the graph. If this really shows something, may I conclude that our JITs aren't very helpful for these kinds of scripts, even when they're pulled out of the HTML?
Comment 46•8 years ago

...and I've found the script takes longer to evaluate with e10s. My results with externalized scripts:
Nightly w/ e10s: 4.2 seconds
Nightly w/o e10s: 2.2 seconds
DeveloperEdition w/ e10s: 5.2 seconds
DeveloperEdition w/o e10s: 2.1 seconds
Although the numbers fluctuate, I think the gap is still not negligible. Thinker suggests there may be too many tasks because of the extra IPC channels in the e10s case. Maybe we should do some serious analysis of this result as well.
Comment 47•8 years ago
Comment 48•8 years ago
Comment 49•8 years ago
Comment 50•8 years ago
Comment 51•8 years ago

I've done some profiling on Win10 with the official Nightly build, and it shows that performance on the "script" edit:12 makes no difference: both configurations complete the task in ~2.5 seconds. Gecko profiler results here:
e10s: https://cleopatra.io/#report=25000cf96d090d5c5e61bbec563902d622b8c710
non-e10s: https://cleopatra.io/#report=ee53f03572d113c505c8aabccbb46f4f8a4f5242
Comment 52•8 years ago
Comment 53•8 years ago
Comment 54•8 years ago

There are too many |LogMessageRunnable|s on the content process's main thread; they should be aggregated.
Comment 55•8 years ago

A lot of IPC messages of type |PCompositorBridge::Msg_DidComposite| are dispatched to the main thread of the content process. Most of them have |aTransaction == 0|. I think we should deliver this message only when necessary.
Flags: needinfo?(hshih)
Updated•8 years ago
Flags: needinfo?(bugs)
Comment 57•8 years ago

I tried to accumulate all the time spent in IonBuilder and got 660 ms. This is a rough measurement, so I need to refine the method and compare it with the overview profiles we have already collected. However, it can be a first step toward answering "how much time do we spend on compilation in this test case".
Reporter
Comment 58•8 years ago

Hi Thinker,
Per our discussion, the test results for partial matching on Windows are as follows:

# Result without local proxy
Browser | Run time (median value)
Chrome  | 2444.44444444 ms
Firefox | 2622.22222222 ms

# Result with local proxy
Browser | Run time (median value)
Chrome  | 2344.44444444 ms
Firefox | 2488.88888889 ms
Reporter
Comment 59•8 years ago
Reporter
Comment 60•8 years ago
Reporter
Comment 61•8 years ago

The flow files tested on Windows are also attached.
Flags: needinfo?(tlee)
Comment 63•8 years ago

I have done a POC for prioritized runnables, with the priority taken from the IPC message. It looks very promising for the time variance of the partial-content tests. The patch gives runnables a priority and processes them in order of priority and then order of arrival. IPC messages have a priority too (normal, high, or urgent); I use the same priority values for runnables, and the priority of an IPC message is passed to the task responsible for dequeuing and executing it. The result is that the variance of the partial-content tests is quite low: I have tried it several times, and all runs fall in 10-12 s, 11 s in most cases.
Shako, could you try this patch? https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc5dddb372e6
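The dispatch order described here (priority first, then arrival time) can be modeled with a standard priority queue. This is an illustrative sketch only, not the actual Gecko patch; the class and the task names are made up:

```python
import heapq
import itertools

# IPC-style priority levels: lower value = dispatched first.
URGENT, HIGH, NORMAL = 0, 1, 2

class RunnableQueue:
    """Run runnables in priority order; FIFO within the same priority."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # monotonic tie-breaker

    def dispatch(self, priority, runnable):
        heapq.heappush(self._heap, (priority, next(self._arrival), runnable))

    def run_next(self):
        _, _, runnable = heapq.heappop(self._heap)
        return runnable()

q = RunnableQueue()
q.dispatch(NORMAL, lambda: "paint")
q.dispatch(URGENT, lambda: "vsync")
q.dispatch(NORMAL, lambda: "timer")

print(q.run_next())  # vsync (urgent jumps the queue)
print(q.run_next())  # paint (same priority as timer, but arrived first)
```

The arrival counter keeps the queue FIFO within a priority level, which matches the "order of arrival" part of the description above.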
Flags: needinfo?(tlee) → needinfo?(sho)
Reporter
Comment 64•8 years ago

Sure, I will update the test result once the test is finished.
Flags: needinfo?(sho)
Reporter
Comment 65•8 years ago

Hi Thinker,
The test result is as follows:
Median time: 4433.33333333 ms
Avg time: 4450.74074074 ms
Std dev: 87.7089900809 ms
Comment 66•8 years ago

(In reply to Shako Ho from comment #65)
> Hi Thinker,
>
> The test result as below:
>
> Median time: 4433.33333333
> Avg time : 4450.74074074
> Std dev : 87.7089900809
Hi Shako,
What is the environment for this test? In comment 58 the average time is about 2xxx ms; why is it 4xxx ms now? I'd also like to know what difference the change makes, so could you also post numbers for a reference test?
Comment 67•8 years ago

By the way, assuming the run times are normally distributed, 95.x% of runs would fall within 4.45 s +/- 0.194 s.
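As a sanity check on that normality argument (not part of the original comment), the conventional two-sigma (~95.4%) interval can be computed from the comment 65 numbers; note it comes out slightly tighter than the ±0.194 s quoted above:

```python
# Mean and standard deviation from comment 65 (milliseconds).
avg = 4450.74074074
stddev = 87.7089900809

# ~95.4% of a normal distribution lies within mean +/- 2 standard deviations.
lo = avg - 2 * stddev
hi = avg + 2 * stddev
print(round(lo), round(hi))  # 4275 4626
```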
Reporter
Comment 68•8 years ago

Hi Thinker,
This test result is based on Ubuntu 14.04; comment 58 is the result on Windows. You can also compare against the results for bug 1294625:

1. Check painting suppression status ASAP (bug 1271691) + tick refresh driver during HTML parsing (bug 1294625)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e1f7ca95b09f
Median time: 4722.22222222
Avg time: 4733.7037037
Std dev: 165.812213019

2. Tick refresh driver during HTML parsing (bug 1294625)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0063015b7218
Median time: 4877.77777778
Avg time: 5278.51851852
Std dev: 730.825866529

3. Check painting suppression status ASAP (bug 1271691)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e14bdefbe9a1
Median time: 4500.0
Avg time: 4512.59259259
Std dev: 89.8894306321
Comment 69•8 years ago

I have found that our Ion helper threads seem to help only with MIR optimization, while the non-helper threads do almost 85% of the work. I don't know if this is on purpose, but from the thread overview it makes the tasks very unbalanced. This makes me wonder whether we could balance it by delivering more tasks besides MIR optimization to the helper threads; maybe we could raise our throughput in this case. However, I don't know whether this is doable, so I'm first NI'ing Steve.
Flags: needinfo?(sphink)
Comment 70•8 years ago

Are you seeing this in the TraceLogger output? If I understand correctly, you are seeing one thread doing a large chunk of JIT compilation while most of the threads are doing nothing. The difficulty is finding things to do in parallel. I will forward the question to Jan.

jandem: this may be a very naïve view of it, but my understanding is that you run Baseline code until it hits some fixed iteration-count threshold, then start Ion-compiling it on a helper thread, then switch over to the compiled version when the compilation is complete. Is that correct? If another script got hot enough while the first is compiling, would it get compiled in parallel, or is there only one compilation thread available? Is it possible to Ion-compile multiple scripts concurrently? Would it be possible, and would it make sense, to adjust the compilation threshold based on the availability of compilation resources? For example, you could do it in a work-stealing way: reduce the thresholds a bit; when a script hits the threshold, it is queued in a priority queue sorted by (a function of) execution count, and worker threads pull items off the queue and compile them. It introduces even more nondeterminism, which is unfortunate, and perhaps compilation simply cannot be done concurrently; I don't know.
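The work-stealing idea sketched in the last paragraph could look roughly like this; everything here (the threshold value, the function names, the flat queue) is hypothetical for illustration, not SpiderMonkey code:

```python
import heapq
import threading

# Hypothetical sketch: scripts that cross a (lowered) warm-up threshold enter
# a queue ordered by hotness, and an idle helper thread pulls the hottest one
# to compile.
COMPILE_THRESHOLD = 1000

_queue = []                 # max-heap via negated execution counts
_lock = threading.Lock()

def note_execution(script, count):
    """Called when a script's warm-up counter crosses the threshold."""
    if count >= COMPILE_THRESHOLD:
        with _lock:
            heapq.heappush(_queue, (-count, script))  # hottest first

def helper_thread_step():
    """One worker-loop iteration: steal the hottest pending script."""
    with _lock:
        if not _queue:
            return None
        _, script = heapq.heappop(_queue)
    return script  # a real helper would Ion-compile the script here

note_execution("deserializeDocument", 5000)
note_execution("layoutHelper", 1200)
note_execution("coldFunction", 3)       # below threshold: never queued
print(helper_thread_step())  # deserializeDocument (the hottest script wins)
```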
Flags: needinfo?(sphink) → needinfo?(jdemooij)
Updated•8 years ago
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Comment 71•8 years ago
(In reply to Steve Fink [:sfink] [:s:] from comment #70) > If > another script were to get hot enough while the first is compiling, would it > get compiled in parallel, or is there only one compilation thread available? > Is it possible to ion compile multiple scripts concurrently? Would it be > possible, and would it make sense, to adjust the compilation threshold > somewhat based on availability of compilation resources? We do have some mechanism to pause compilation threads when a new compilation job has higher priority. Hannes can probably say more about this.
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
Comment 72•8 years ago

(In reply to Jan de Mooij [:jandem] from comment #71)
> We do have some mechanism to pause compilation threads when a new
> compilation job has higher priority. Hannes can probably say more about this.
Bug 1013172 comment 0 describes this. For various reasons (not taking all CPU power, battery consumption, peak memory, ...) we decided to do only one compilation at a time, which is what you are seeing here. Like you said, that means priority is important. We currently decide it based on the warm-up count, and when another script becomes hotter, we switch to completing that script first.

Now, I don't think a long stretch of compilations is inherently bad. We are quite aggressive in compiling things, so it could be that we keep compiling without it making a big difference to execution speed, since the functions aren't called that much.

If you have a TraceLogger log, you might want to iterate over:
[start AnnotateScript foo.js] [start IonCompilation] [stop IonCompilation] [stop AnnotateScript]
...
[start script foo.js] [start Interpreter/Baseline] [stop Interpreter/Baseline] [stop script foo.js]
...
[start AnnotateScript foo.js] [start IonLinking] [stop IonLinking] [stop AnnotateScript]

If you look between the start and end of the compilation and count how many times the script was called and how much time we spent in that function (not children), that would give an estimate of how much time was lost waiting for the compilation to finish (including the time spent waiting to get built off-thread).

If you want to know how to extract this information, look at slide 35 or the last slide: https://docs.google.com/presentation/d/1j_Hi9uP2Woz2OoAuxX9We11GbLhh0e8N-tbKYz1MeZU/edit?usp=sharing It gives an easy way to iterate the tree; changing that script a little should yield the above information.

If you have that information, you will have an idea of how much time we spend in Baseline waiting for IonMonkey to finish. If that is small, I'm pretty sure we won't see an improvement from compiling more at the same time.
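The accounting described here (self-time of a script between the start and stop of its Ion compilation) could be sketched like this, assuming the TraceLogger tree has already been flattened into (kind, script, start_ms, stop_ms) records; the real log format differs, see the linked slides:

```python
def time_waiting_for_compile(events, script):
    """Time spent running `script` in Baseline/Interpreter while its Ion
    compilation was still in flight."""
    compile_windows = [(s, e) for kind, name, s, e in events
                       if kind == "IonCompilation" and name == script]
    waited = 0
    for kind, name, start, stop in events:
        if kind in ("Baseline", "Interpreter") and name == script:
            for cs, ce in compile_windows:
                # Overlap of this execution span with the compilation window.
                waited += max(0, min(stop, ce) - max(start, cs))
    return waited

# Illustrative timeline (ms): compilation runs 100-700; the script executes
# in Baseline twice while the compile is in flight.
log = [
    ("IonCompilation", "foo.js", 100, 700),
    ("Baseline", "foo.js", 150, 250),   # fully inside the window: 100 ms
    ("Baseline", "foo.js", 600, 900),   # only 600-700 overlaps: 100 ms
]
print(time_waiting_for_compile(log, "foo.js"))  # 200
```

If the total comes out small relative to overall run time, that supports the conclusion above that parallel compilation would not help much here.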
Flags: needinfo?(hv1989)
Updated•8 years ago
Component: General → JavaScript Engine
Updated•8 years ago
Summary: [Perf][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content → [perf][google suite][google docs] 45.25%(4528ms) slower than Chrome when opening 1 page table content
Updated•8 years ago
status-firefox52: --- → affected
status-firefox-esr45: --- → affected
Comment 75•8 years ago
Shako, Thinker, now that bug 1306591 has landed, could we perhaps get some new numbers here?
Flags: needinfo?(tlee)
Flags: needinfo?(sho)
Comment 76•8 years ago
|
||
For the gdoc_read_basic_table_1 case:

Firefox: http://tinyurl.com/hbr9nzz
Chrome: http://tinyurl.com/hh5mp23

The result is positive. It's hard to show numbers based on the same baseline/content as the original context of this bug. However, compared with the previous revision, Firefox has a 13% improvement in this case; over the same period, Chrome improved 8% on the same case. It's hard to say whether these improvements were caused by web content changes or not, but we tested Firefox and Chrome at similar times.

For all regression cases:

Compared with previous revision: http://tinyurl.com/j8nvcp5
Compared with next revision: http://tinyurl.com/go9ugee

You can see a similar trend in all categories in the comparison with the previous revision (see results with the Firefox tag). This patch has positive results, with especially good results in SpeedIndex (SI).

Conclusion: this patch does improve perceived performance, by roughly up to 10%.
Flags: needinfo?(tlee)
Flags: needinfo?(sho)
Comment 77•8 years ago
|
||
Bobby, could you run a test comparing against the previous revision on the same version of gdoc? In other words, run the new and old revisions at the same time, to eliminate the possibility that gdoc itself changed. It is more reasonable to compare Firefox with itself.
Flags: needinfo?(bchien)
Updated•8 years ago
|
Flags: needinfo?(bchien)
QA Contact: ctang
Updated•8 years ago
|
Flags: needinfo?(bchien)
Comment 78•8 years ago
|
||
The Linux testing machine is busy; using a Windows machine instead.

Previous build: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32/1478579975/
Build with patch: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32/1478642363/
Reporter | ||
Comment 79•8 years ago
|
||
Hi Thinker, the test results are as below:

OS: Windows 7
CPU: i7-3770 3.4GHz
Memory: 16GB RAM (4GB 1600MHz x 4)
Hard Drive: 1TB SATA HDD
Graphics: GF108 [GeForce GT 440/630]

            | previous build   | build with patch
Median time | 14541.6666667 ms | 13050.0 ms
Avg time    | 14541.1111111 ms | 13775.1600753 ms
Std dev     | 118.144084454 ms | 1748.99567772 ms
SI          | 4941.5           | 5108.0
PSI         | 36256.5          | 38125.0
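For reference, the relative change implied by the median times above is about 10%; this quick check uses only the numbers from the results, nothing else is assumed:

```python
# Median open time, previous build vs. build with patch (ms).
prev_ms, patched_ms = 14541.6666667, 13050.0
improvement = (prev_ms - patched_ms) / prev_ms * 100
print(f"median improved by {improvement:.1f}%")  # → median improved by 10.3%
```

Note that the patched build's standard deviation (1748.99 ms vs. 118.14 ms) is much larger, so the median should be read with some caution.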
Comment 80•8 years ago
|
||
Comparison video attached: https://youtu.be/dOvyBlVHHHE
Flags: needinfo?(bchien)
Updated•8 years ago
|
See Also: → gdoc_read_basic_table_1_win7(40%)
Updated•7 years ago
|
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Comment 81•7 years ago
|
||
Mass wontfix for bugs affecting firefox 52.
Updated•6 years ago
|
Whiteboard: [qf:investigate][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Comment 82•6 years ago
|
||
This seems actionable to me, leaving P1.
Comment 83•6 years ago
|
||
Nobody looked into these bugs for a while, moving to P2 and waiting for [qf] re-evaluation before moving back to P1.
Priority: P1 → P2
Updated•6 years ago
|
Whiteboard: [qf][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Comment 84•6 years ago
|
||
Vicky, since this qualifies as a qf:p1 for Firefox 64, would it be possible to get updated information about this bug? Adding a profile to justify that this is still a JS issue, plus a bit of documentation explaining how to reproduce the issue, would be great.
Flags: needinfo?(vchin)
Comment 85•6 years ago
|
||
We discussed this in the QF triage meeting and agreed it needed further investigation, as the steps to reproduce no longer exist. However, having read through the above comments, it seems based on comments 78/79 that the problem has largely been fixed?
Flags: needinfo?(vchin)
Updated•6 years ago
|
Whiteboard: [qf:p1:64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Updated•6 years ago
|
Assignee: nobody → kvijayan
Comment 86•6 years ago
|
||
Taking notes in the bug as I analyze this further.

Narrow profile of user-jank of 2.2 seconds: https://perfht.ml/2xl21hI

Looking at the call stack, I notice a huge chunk of the time is spent punching down a call stack using unoptimized fallback paths. We also invoke a lot of IonCompiles: around 16% of the time in this early phase is Ion compiles. Nbp's work on reducing Ion compile times by skipping cold basic blocks would improve this component significantly.

The other major thing is just "slow interpreter". Primarily, we need faster call paths and generally better optimization of fast paths. Interpreter ICs should help significantly on this front. This is confirmed by the inverted C++-only call stack, which shows we're spending 25% of our time in js::Interpret / js::RunScript.

The next hang of 1 second: https://perfht.ml/2PP1Geb

Here too, we spend about 90% of our time in native code, not JITs: about 10% in the Baseline JIT and 1% in Ion. This is also a cold-code issue. A total of 12% of the time is spent in js::Interpret (and inlined functions) alone. We're almost never entering the JITs here at all.

This is the same story across the board. Fundamentally the problem is that we're running a ton of cold code and not entering the JITs. The best near-term solution for this is Interpreter ICs. I don't anticipate that being doable by 64.
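To make the "Interpreter ICs" idea concrete, here is a toy Python sketch of what an inline cache at a single property-access site does. This illustrates the general technique only and is not SpiderMonkey code: in a real engine the cached shape maps to a slot offset so the hit path skips the generic lookup machinery entirely, whereas this toy still does a dict lookup on both paths.

```python
class ICSite:
    """One inline cache attached to a single property-access site."""
    def __init__(self):
        self.cached_shape = None   # "shape" of the last receiver seen
        self.slow_lookups = 0      # how often we fell off the fast path

    def get(self, obj, name):
        shape = frozenset(obj)     # toy stand-in for an object's shape
        if shape == self.cached_shape:
            return obj[name]       # fast path: shape matched the cache
        self.slow_lookups += 1     # slow path: generic lookup, re-cache
        self.cached_shape = shape
        return obj[name]

site = ICSite()
points = [{"x": 1}, {"x": 2}, {"x": 3}]   # same shape, different values
values = [site.get(p, "x") for p in points]
print(values, site.slow_lookups)  # → [1, 2, 3] 1
```

The cold-code problem described above is exactly the case where code never runs hot enough for Ion (or even Baseline) to kick in, so cheap per-site caches inside the interpreter itself are the main available lever.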
Comment 87•6 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #86)
> This is confirmed by looking at the inverted C++-only call stack to discover
> that we're spending 25% of our time in js::Interpret / js::RunScript.

js::RunScript will also include the time we spend in Baseline.

I've narrowed the two slices you used down to some top hot functions, using the transforms "Collapse function's subtree across the entire tree" and "Merge function into caller across the entire tree" in inverted call stack mode. Here's the result:

hang of 2.2 seconds: https://perfht.ml/2QJLeNg
next hang of 1 second: https://perfht.ml/2QHfIiY

Only look at the roots of the inverted tree, and only at the top ~15 of that list. Not sure if it's helpful. The only things that stood out to me, other than what you already said, are that we spend 4.2% of the first slice in CreateThis and 1.7% of the second slice in CreateThisForFunction.
Comment 88•6 years ago
|
||
Based on Kannan's comments in comment 86, we decided to move the QF target of this bug to [qf:p1:f67]. Additionally, we marked this bug [qf:js:investigate] to indicate we need to do further investigation. Last, since Kannan believes that improving interpreter performance would help address the issues here, we made this a dependency of the Interpreter IC bug (1361474).
Blocks: 1361474
Whiteboard: [qf:p1:f64][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]
Updated•6 years ago
|
Status: NEW → ASSIGNED
Whiteboard: [qf:p1:f67][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate]
Comment hidden (offtopic) |
Updated•3 years ago
|
Assignee: kvijayan → nobody
Status: ASSIGNED → NEW
Updated•2 years ago
|
Performance Impact: --- → ?
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs][qf:js:investigate] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Updated•2 years ago
|
Performance Impact: ? → ---
Comment 90•2 years ago
|
||
In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.
Severity: critical → --
Updated•11 months ago
|
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → INCOMPLETE