All of the Android panda tests regressed when APZ landed: tcheck2: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,c233ba1133abbd544002dfbc29d9e63ced42a20e,1]&selected=[mozilla-inbound,c233ba1133abbd544002dfbc29d9e63ced42a20e,23540,17964790] tsvgx: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,53a0f4e33f31d2db3c8676c702c800347e0c1c2c,1]&selected=[mozilla-inbound,53a0f4e33f31d2db3c8676c702c800347e0c1c2c,23540,17964791] tp4m: https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,93f8a6f2ac1cbc553f92cfe7eaac28ded90f1fb4,1]&selected=[mozilla-inbound,93f8a6f2ac1cbc553f92cfe7eaac28ded90f1fb4,23540,17964789] -- Note that it *appears* that the cause is bug 1226627, but this is almost certainly not the case as :kats forgot to clobber the build. Kats volunteered to investigate this.
The tcheck2 difference is expected, since the entire scrolling implementation is different, and we should just accept the new baseline for now. Eventually we'll be rewriting that test anyway. The tsvgx regression is not entirely unexpected since we saw that test regress on desktop platforms when APZ was enabled. However the magnitude of the regression is greater on Android which is interesting. tp4m - no idea what this tests. I'll find the code and see if a regression here makes sense or if it's completely unrelated.
tp4m is tp4 mobile- 20 pages that we load (all cached locally on the webserver): http://people.mozilla.org/~jmaher/taloszips/zips/mobile_tp4.zip these are run through pageloader a few times each and we record the pageload time.
tp4m might also be running on autophone, IIRC.
tp4m and tsvgx are running on autophone, but the device went offline Nov 28 and we don't have data since then.
So then tp4m regression is likely just the extra displayport area that we're painting during page load. This more or less corresponds to the TP5 regression we had on desktop, but again the magnitude of the regression is greater. I'll do some real-world page load profiling to see if anything jumps out.
I was have trouble getting useful profile data, but I dumped the size of the displayport to see if that was significantly different between the Java PZC and the C++ APZ code. Turns out that on initial page load (on nytimes.com) the C++ APZ displayport is actually a little smaller than the Java PZC displayport. However I did see that the low-res displayport for the root layer was unnecessarily large, so I filed bug 1229853 for that. I'm not sure yet if that will actually help. Other than the displayport size the only things I can think of would be relevant here are the extra overhead from event regions (which we can't do much about, and just have to accept) and checking if we're doing extra paints that can be avoided. I'll look at that now.
I tried profiling a bit more and failed miserably. I ended up grabbing the TALOSDATA output in the log from the tpn runs before and after APZ landed. Turns out we're loading 7 pages twice each. I did some retriggers to get 5+ samples and looked at the data broken down by each page load ("X st Y" means average of X with stddev Y): Site/Load Before After Wikipedia1 766.40 st 143.71 849.83 st 48.87 Wikipedia2 752.80 st 53.76 748.67 st 95.73 BBC1 223.20 st 115.48 196.50 st 10.63 BBC2 126.60 st 6.89 195.50 st 22.26 Yahoo1 * 1898.40 st 346.50 2603.00 st 429.97 Yahoo2 * 1326.80 st 96.90 1900.83 st 203.75 ESPN1 864.20 st 52.23 870.83 st 37.92 ESPN2 958.00 st 80.26 1182.00 st 300.69 Amazon1 * 1522.00 st 141.83 1874.33 st 220.47 Amazon2 * 1556.60 st 272.95 1747.50 st 185.15 Yandex1 1106.00 st 234.47 1308.00 st 132.58 Yandex2 1181.60 st 295.16 1088.17 st 133.89 AccuWeather1 * 507.40 st 44.47 735.00 st 240.23 AccuWeather2 * 582.80 st 67.67 709.67 st 156.51 The ones with the asterisks are where there is what I would consider a significant regression, so that only happens on 3 of the 7 pages - m.yahoo.co.jp, amazon.com, and m.accuweather.com. I loaded these pages from the tp4m set that jmaher posted in comment 2 and noticed that of the 7 yahoo and amazon are the only two which are "desktop style" pages (i.e. they get laid out to a wide viewport and load zoomed out). This means that we're inherently painting more CSS pixels for these pages, so the overhead from event regions will be higher, which sort of explains the higher regression there. I can't really explain why the accuweather page regressed so much though.
Ah! So I did a try push with pre-apz and some logging , and it turns out that when the tp4m pages were painted, they didn't have a displayport. I suspect this was a result of the way the harness was loading the pages and bypassing some Java/browser.js code, although I didn't confirm that. With the new APZ code the displayport is getting set in C++ rather than browser.js and so we are painting more in all cases. Given that, I'm happy with accepting the "regression" as the new baseline and moving on.  https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ef1c831decb&selectedJob=14406338
I was looking for a fix to this- on autophone we are getting about 20% of the tp4m tests timing out after switching to APZ. We will have to hack on this a bit, but this will be unplanned work. also should we disable rck2 since it is not valid in the new APZ land until the test is rewritten?
Yeah I can disable rck2 and land that as part of this bug. We can discuss the autophone thing over in bug 1229690.
thanks kats, please let me know what I can do to help out with the rck tests and the autophone tests! I am sure we can all get this figured out in the coming weeks.
(In reply to Kartikaya Gupta (email:firstname.lastname@example.org) from comment #10) > Yeah I can disable rck2 and land that as part of this bug. This is being done in bug 1230572, so I'm closing this bug up again.