Closed Bug 1472110 Opened 6 years ago Closed 6 years ago

3.77 - 5.8% remote-blank / remote-nytimes (android-4-2-armv7-api16, android-6-0-armv8-api16) regression on push d6120c2bb51e2057df51f4d52510bb5f4e8b4ca5 (Fri Jun 29 2018)

Categories

(Firefox Build System :: General, defect)

ARM
Android
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: perf, regression)

We have detected an autophone (Android) regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=d6120c2bb51e2057df51f4d52510bb5f4e8b4ca5

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  6%  remote-blank android-6-0-armv8-api16 opt      385.48 -> 407.86
  5%  remote-nytimes android-4-2-armv7-api16 opt    2,964.92 -> 3,119.07
  4%  remote-nytimes android-6-0-armv8-api16 opt    1,002.64 -> 1,040.42
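
For reference, the percentages above can be reproduced from the before/after values. This is a minimal sketch, not part of the alerting tooling: the `ALERTS` list and the `percent_change` helper are hypothetical names, and it assumes that higher values are worse, since the alert reports the increases as regressions.

```python
# Before/after values copied from the alert summary above.
# ALERTS and percent_change are illustrative names, not Perfherder APIs.
ALERTS = [
    ("remote-blank", "android-6-0-armv8-api16", 385.48, 407.86),
    ("remote-nytimes", "android-4-2-armv7-api16", 2964.92, 3119.07),
    ("remote-nytimes", "android-6-0-armv8-api16", 1002.64, 1040.42),
]


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


for test, platform, before, after in ALERTS:
    change = percent_change(before, after)
    print(f"{change:5.2f}%  {test} {platform}  {before:,.2f} -> {after:,.2f}")
```

The unrounded results should span the 3.77 - 5.8% range quoted in the bug title.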


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=14082

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/EngineeringProductivity/Autophone
Product: Testing → Firefox Build System
Version: Version 3 → unspecified
Flags: needinfo?(kmaglione+bmo)
I believe this is actually from bug 1453691.
Blocks: 1453691
No longer blocks: 1459004
Flags: needinfo?(kmaglione+bmo) → needinfo?(wisniewskit)
Without 1459004, we can't ship system addons in Fennec. We need it for 2 webcompat-related addons: the fb/google experiment one (which is nightly only, and which we'll back out in ~4 weeks), and a mobile equivalent of the gofaster webcompat addon that ships in Desktop.
Bug 1459004 seems unlikely; how does the change to how the file gets generated impact performance at runtime? Different ordering of the file, maaaybe? Is the generated file somehow larger than the previous file?
Note that bug 1459004 caused this add-on to start working, so it could very well be the reason. Indeed, the add-on *is* intermittently kicking in when I visit nytimes.com on today's Fennec nightly, as its informational console message appears during some page-loads: "The user agent string has been overridden to get the Chrome experience on this site."

I've investigated during one of those times, and it seems that there are indeed resources being loaded from www.google.com and www.facebook.com, like a bunch of single-pixel tracking gifs and 0-byte html files.

>https://www.google.com/ads/user-lists/1008590664/?random=1530283404354&cv=9&fst=1530280800000&num=1&guid=ON&u_h=640&u_w=360&u_ah=640&u_aw=360&u_cd=24&u_his=2&u_tz=-240&u_java=false&u_nplug=0&u_nmime=0&gtm=G6c&sendb=1&frm=0&url=https://mobile.nytimes.com/&tiba=The New York Times - Breaking News, World News & Multimedia&async=1&fmt=3&cdct=2&is_vtc=1&random=3782160405&resp=GooglemKTybQhCsO&rmt_tld=0&ipr=y

>https://www.facebook.com/tr/?id=100468016962764&ev=PageView&dl=https%3A%2F%2Fmobile.nytimes.com%2F&rl=&if=false&ts=1530283386272&sw=360&sh=640&v=2.8.18&r=stable&ec=0&o=28&it=1530283386198

However, those same resources are sometimes loaded without my add-on enabled, where the normal Firefox UA is being sent. So I can't be sure if the add-on is actually the culprit. It's possible that the add-on is triggering perf regression, perhaps due to a cause like one of these:

- the ad-loading scripts behave differently when given a Chrome UA, which somehow makes them a bit slower on Firefox.
- the ads being loaded are simply not the same ones in all Talos runs, and some of them happen to load more slowly than expected.

It's possible that tweaking the add-on to not kick in on www.google.com/ads and www.facebook.com/tr might be enough to make this regression go away. But then again this all could just be a red herring.
Flags: needinfo?(wisniewskit)
Unfortunately, remote-blank and remote-nytimes regressions usually just mean startup time regressions. That would be my best guess here.
These were all first-visit measurements, not second-visit ones. FWIW, Autophone S1S2 does start the browser to load a blank page and then shuts it down before beginning the real tests.
(In reply to twisniewski from comment #5)
> It's possible that the add-on is
> triggering perf regression, perhaps due to a cause like one of these:
> 
> - the ad-loading scripts behave differently when given a Chrome UA, which
> somehow makes them a bit slower on Firefox.
> - the ads being loaded are simply not the same ones in all Talos runs, and
> some of them happen to load more slowly than expected.

Can we check these scenarios?
Flags: needinfo?(wisniewskit)
I'm unaware of how the Talos test operates, so I can't be sure. If it could be loading different content, then perhaps we could log the network requests made during a series of runs of that test, to confirm which ads (or other resources) are being loaded, and which UA string is being sent with each request.
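
A rough sketch of what that logging comparison could look like, assuming each run's network traffic can be exported as a HAR file (for example from the developer tools). Nothing in this bug says the test harness produces HAR files, and the `summarize_har` and `load_har` helpers and the file names in the usage note are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har):
    """Count requests per (host, User-Agent) pair in a parsed HAR document."""
    counts = Counter()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        host = urlsplit(request["url"]).hostname or ""
        # HAR stores headers as a list of {"name": ..., "value": ...} objects.
        user_agent = next(
            (
                header["value"]
                for header in request.get("headers", [])
                if header["name"].lower() == "user-agent"
            ),
            "",
        )
        counts[(host, user_agent)] += 1
    return counts


def load_har(path):
    """Read a HAR file (JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Comparing `summarize_har(load_har("run1.har"))` against the same summary for another run would show whether different hosts were contacted, or a different UA string was sent, between runs.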

In addition, the fix in bug 1473181 could impact Talos runs as well, so it might be worth waiting for that to land.
Flags: needinfo?(wisniewskit)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)
> (In reply to twisniewski from comment #5)
> > It's possible that the add-on is
> > triggering perf regression, perhaps due to a cause like one of these:
> > 
> > - the ad-loading scripts behave differently when given a Chrome UA, which
> > somehow makes them a bit slower on Firefox.
> > - the ads being loaded are simply not the same ones in all Talos runs, and
> > some of them happen to load more slowly than expected.
> 
> Can we check these scenarios?

Joel, we don't run ads in Talos runs, right? My understanding is we have pages stripped of such things so as to be predictable.
Flags: needinfo?(jmaher)
remote-blank is just a blank page load; there are no ads or JavaScript there.

remote-nytimes could have ads in it. There are debates over whether we should include ads to be more realistic or leave them out to be more stable. We have found that ads do introduce noise: sometimes a test is stable, and then a small change tickles the ordering or timing and results in a regression or a bimodal distribution, because our mozAfterPaint could fire before or after the ad is displayed.

Given that we see a regression on remote-blank, I would say that this regression isn't so dependent on the content being loaded.
Flags: needinfo?(jmaher)
No remote content is (or at least should be) loaded from the test pages. Wireshark is a little problematic for me considering the other traffic on my network, but using the developer tools to load the URLs doesn't show any outside network requests that I can see.

If you have access to the VPN, you can load the URLs from:

http://10.252.73.230:8100/files/ep1/nytimes/nytimes.com/index.html
http://10.252.73.230:8100/files/s1s2/blank.html
How should we proceed on this matter?
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX