Closed Bug 1276160 Opened 8 years ago Closed 2 months ago

Performance regression from Release to Nightly

Categories

(Core :: Web Painting, defect, P3)

49 Branch
All
Windows
defect

Tracking


VERIFIED FIXED
Tracking Status
e10s - ---
firefox47 - wontfix
firefox48 + wontfix
firefox49 + wontfix
firefox131 --- verified
firefox132 --- verified

People

(Reporter: roxana.leitan, Unassigned)

References

Details

Attachments

(1 file)

Attached file DOM_slow.htm
20160502172042 Mozilla/5.0 (Windows NT 6.1; rv:46.0) Gecko/20100101 Firefox/46.0
20160526030223 Mozilla/5.0 (Windows NT 6.1; rv:49.0) Gecko/20100101 Firefox/49.0

[Affected versions]:
Nightly 49.0a1
Aurora 48.0a2
Beta 47.0b8

[Affected platforms]:
Windows - all

[Steps to reproduce]:
1. Open Firefox with a clean profile
2. Open the attached testcase
3. Click the "Full Render" button
4. Observe the test results

[Expected result]:
The test results on Nightly should be the same as or better than on the Firefox 46.0.1 release (15.899 seconds).

[Actual result]:
Nightly 49.0a1 - 23.271 seconds (with e10s) / 19.725 seconds (non-e10s)
Nightly 48.0a1 - 26.493 seconds (with e10s) / 22.354 seconds (non-e10s)

[Regression range]:
The issue is present on all latest Firefox versions (Nightly 49.0a1-46.0a1, Aurora 48.0a2 and Beta 47.0b8) except for the 46.0.1 release.

[Additional notes]:
Beta 47.0b8 (build id 20160523113146) - 21.707 seconds
Aurora 48.0a2 (build id 20160526004016) - 20.447 seconds
Nightly 47.0a1 - 28.014 seconds (e10s) / 23.841 seconds (non-e10s)
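The attached DOM_slow.htm is not quoted in this bug, so purely as an illustration of the kind of measurement the steps above produce, here is a minimal sketch of a "Full Render" timing harness. Everything in it (element ids, node counts, styling) is a hypothetical assumption, not the actual testcase:

// Hypothetical sketch of a "full render" benchmark; NOT the attached DOM_slow.htm.
function fullRender(): void {
  const container = document.getElementById("output") as HTMLElement;
  const start = performance.now();
  // Heavy synchronous DOM churn: create, style, and append many nodes.
  for (let i = 0; i < 50000; i++) {
    const cell = document.createElement("div");
    cell.style.backgroundColor = i % 2 ? "#c00" : "#0c0";
    cell.textContent = String(i);
    container.appendChild(cell);
  }
  // Reading offsetHeight forces a synchronous layout flush, so the reported
  // time includes layout work and not just DOM mutation.
  void container.offsetHeight;
  const elapsed = (performance.now() - start) / 1000;
  document.title = "Full render took " + elapsed.toFixed(3) + " s";
}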
Seems to get progressively worse when clicking the "Full Render" button a second time.
Can you figure out a regression range?
Flags: needinfo?(roxana.leitan)
tracking-e10s: --- → ?
Flags: needinfo?(roxana.leitan)
[Tracking Requested - why for this release]: perf regression in rendering demo
Priority: -- → P1
Tracking this rendering regression for 48/49.
I have investigated this and found the following regression range: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=140ac04d7454&tochange=441f5fd256e2, but I am not sure it is very conclusive. Regarding the test results, I think it can be related to e10s.

Here are the results where I found the regression range, with and without e10s:
Nightly 32.0a1 - 19.330 seconds (e10s) / 14.619 seconds (non-e10s)

The following are the results gathered while testing (maybe they will be useful for the investigation):
Nightly 44.0a1 (build id 20151028030432) - 32.352 seconds (e10s) / 19.852 seconds (non-e10s)
44.0a1 (build id 20151028030432) - 31.339 seconds (e10s) / 19.852 seconds (non-e10s)
45.0a1 (build id 20151213030241) - 27.479 seconds (e10s) / 22.191 seconds (non-e10s)
46.0a1 (build id 20160124030209) - 28.701 seconds (e10s) / 17.263 seconds (non-e10s)
47.0a1 - 28.014 seconds (e10s) / 23.841 seconds (non-e10s)
48.0a1 - 26.493 seconds (e10s) / 22.354 seconds (non-e10s)
49.0a1 - 23.271 seconds (e10s) / 19.725 seconds (non-e10s)
It seems to have gradually worsened, at least since Jan 2016. I found 2 regressions, as follows.

#1 regression window (w/o e10s):
cset : sec
978349072992b565123bcd97e0778abaa7a67256 : 49
66252157547f3f4e0b9ad9fb3b0b96d6df5938fe : 56

#1 Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=978349072992b565123bcd97e0778abaa7a67256&tochange=66252157547f3f4e0b9ad9fb3b0b96d6df5938fe

#1 Regressed by: Bug 1235478

#2 latest regression window (w/o e10s):
cset : sec
336a70a8dfe4a980627b43da56c8b93658d3fba9 : 55
f5de44ecf07f3cbd369015663b72d452fda6d177 : 66

#2 Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=336a70a8dfe4a980627b43da56c8b93658d3fba9&tochange=f5de44ecf07f3cbd369015663b72d452fda6d177

#2 Regressed by: Bug 1236043
Blocks: 1236043, 1235478
Flags: needinfo?(mstange)
Flags: needinfo?(hiikezoe)
(In reply to Alice0775 White from comment #6)
> #2 a latest regression window (w/o e10s): ...
> #2 Regressed by: Bug 1236043

At 1x resolution and a not-gigantic window size, the status text in the top right of the page and pixels in the middle of the rendered picture end up in the same 256x256 tile, so we end up invalidating more than we used to. I'll try to see if reducing the tile size to 128x128 improves things. But I won't make that change in this bug, because this bug is about two different regressions.
Flags: needinfo?(mstange)
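To make the tiling argument in comment 7 concrete, here is a small back-of-the-envelope sketch. It is not Gecko's tiling code; the rectangle and tile sizes are illustrative assumptions only:

// Invalidation is tracked per tile, so a dirty rect is snapped out to tile boundaries.
function invalidatedArea(dirty: { x: number; y: number; w: number; h: number }, tile: number): number {
  const x0 = Math.floor(dirty.x / tile) * tile;
  const y0 = Math.floor(dirty.y / tile) * tile;
  const x1 = Math.ceil((dirty.x + dirty.w) / tile) * tile;
  const y1 = Math.ceil((dirty.y + dirty.h) / tile) * tile;
  return (x1 - x0) * (y1 - y0);
}

// A small status-text update (assumed 80x20 px) still invalidates whole tiles,
// so any picture pixels sharing those tiles get repainted too. Smaller tiles
// shrink that collateral repaint.
const statusText = { x: 900, y: 10, w: 80, h: 20 };
console.log(invalidatedArea(statusText, 256)); // 65536 px repainted
console.log(invalidatedArea(statusText, 128)); // 16384 px repainted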
The effect of bug 1235478 is to wake the inactive parent refresh driver up in the *same* tick; as a result, the rendering process is done in the same tick. In the case of attachment 8757188 [details], it only happens on the first rendering. I am surprised it took 7 seconds in comment 7. I will try the test case locally.
Based on comment 6, if this gradual perf degradation has been happening since Jan 2016, it must be a problem with Fx45/46 too. I am not sure whether we have used this as a benchmark in past releases, or whether we can use this bug and the data here to potentially block a release. Florin, what do you think? I am going to wontfix this for Fx47 for the reasons stated above. Please let me know if there are any concerns.
Flags: needinfo?(florin.mezei)
It turns out that waking up the inactive parent driver (which means switching the timer from inactive to active) happens more often than I thought: roughly 20 times in the test case. Before bug 1235478, some rendering frames must actually have been skipped (I can't see it with my eyes). After bug 1235478, each frame is rendered correctly, so I think the performance loss is inevitable. One thing that is not clear to me is why the timer switching happens so often; I guess it's caused by busyness of the main thread.
Flags: needinfo?(hiikezoe)
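A toy model of the trade-off described in comments 8 and 10 may help: waking the inactive parent driver in the same tick paints every requested frame, whereas the old next-tick wake-up let the first frame of each burst coalesce away. This is not Gecko's refresh driver; the burst counts and the "one frame skipped per wake-up" assumption are illustrative only:

// ~20 wake-ups of the inactive driver during the test (comment 10's estimate),
// each modelled as a burst of 5 requested frames (assumed number).
const burstLengths: number[] = Array(20).fill(5);

// After bug 1235478: the driver is woken in the same tick, so every requested
// frame in a burst is painted.
const paintsAfter = burstLengths.reduce((sum, k) => sum + k, 0);

// Before bug 1235478: the wake-up only takes effect on the next tick, so the
// first frame of each burst coalesces with the second one and is skipped.
const paintsBefore = burstLengths.reduce((sum, k) => sum + Math.max(k - 1, 0), 0);

console.log(paintsAfter, paintsBefore); // 100 vs 80: more frames painted, more total paint work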
I ran some tests using Windows 10 x64 and got the following results:
46.0.1 - 13.747 s
47.0 - 14.946 s
47.0b8 - 14.0 s
48.0a2 - 17.98 s (e10s)
48.0a2 - 15.00 s (non-e10s)
49.0a1 - 18.97 s (e10s)
49.0a1 - 16.20 s (non-e10s)

I can also confirm that on subsequent tries the time gets progressively worse, but this happens with other browsers as well. The time also varies by approximately one second between runs.
(In reply to Ritu Kothari (:ritu) from comment #9)
> ... Florin, what do you think?

We haven't used this as a benchmark for past releases. Based on Cristian's measurements on Windows 10 x64, and my measurements (below) on Windows 7 x64, I am inclined to agree that this is likely not a blocker for 47. It's the developers, however, who should tell us whether this regression would have a visible impact on actual users.

Windows 7 x64:
45.0.2 - 19.4s, 17.9s, 19.0s - avg. ~18.8s
46.0 - 18.7s, 20.4s, 19.4s - avg. ~19.5s
46.0.1 - 19.4s, 19.7s, 20.3s - avg. ~19.8s
47.0RC - 21.1s, 21.0s, 21.8s - avg. ~21.3s
Flags: needinfo?(florin.mezei)
Hi Joel, I'm not sure if you are also aware of this perf regression. Might need your input here.
Flags: needinfo?(jmaher)
Thanks for the heads up, I can follow along in this bug. I looked briefly at dromaeo_dom (we only run it on Linux, as the test is too noisy and unstable on other platforms). The results don't look too alarming:
https://treeherder.mozilla.org/perf.html#/graphs?timerange=31536000&series=%5Bmozilla-inbound,c787a154e72d73df0c21563ce1ccda67fd8133ee,1,1%5D&series=%5Bmozilla-inbound,fe81acee1b78d6db3838923b0cb6650e5b0168da,1,1%5D&zoom=1442871270978.022,1468826849000,1265.520457441479,1678.15986264594

^ note: this is a test where higher is better and lower is worse.

There are many cases where we can show a much worse regression over time. Sadly, we don't run Talos performance tests on Windows 7 x64, nor on Windows 10 at all.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher - PTO back Monday August 15th) from comment #14)
> Sadly we don't run Talos performance tests on windows 7 x64, nor windows 10 at all.

Hi :rwood, since :jmaher is on PTO, is there any way to identify whether a regression happened on Win 7 x64 and Win 10? I just want to know whether this bug has a large impact. Does "nothing alarming on Linux" also mean nothing alarming on Windows?
Flags: needinfo?(rwood)
Hi :gchang, I don't know enough about the regression or the test to say whether acceptable results on Linux would also mean acceptable results on Win 7 x64 and Win 10. As noted above, the Talos tests aren't run on Win 7 x64 or Win 10, and the dromaeo_dom test that :jmaher mentioned is only run on Linux anyway. I don't have any suggestions, sorry, other than manually running the test in comment 1 and comparing.
Flags: needinfo?(rwood)
Too late to do anything in 48, happy to take a patch in 49
Component: Layout: View Rendering → Layout: Web Painting
Priority: P1 → P3
Severity: normal → S3

Nightly: https://share.firefox.dev/3WYzF81 (4.3s)
Chrome: 6.5s

Calling this fixed.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED

Marked as verified on Win11 x64 using Firefox builds 132.0a1 (5.251 s) and 131.0 (5.307 s).

Status: RESOLVED → VERIFIED