Closed Bug 1625470 Opened 5 years ago Closed 5 years ago

Add Chrome & Fenix live site tests in CI

Tracking

(firefox77 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla77

Tracking Flags:

firefox77: tracking ---, status fixed

People

(Reporter: sparky, Assigned: sparky, NeedInfo)

Attachments

(5 files)

Bug 1625470 - Part 1: Add --live-sites command line option to Raptor. r?#perftest (5 years ago, Greg Mierzwinski [:sparky], 47 bytes, text/x-phabricator-request)
Bug 1625470 - Part 2: Add new live-site-only tests to Raptor-Browsertime. r?#perftest (5 years ago, Greg Mierzwinski [:sparky], 47 bytes, text/x-phabricator-request)
Bug 1625470 - Part 3: Add CI tasks for Chrome and Fenix live site tests. r?#perftest (5 years ago, Greg Mierzwinski [:sparky], 47 bytes, text/x-phabricator-request)
Possibly problematic pages (5 years ago, Greg Mierzwinski [:sparky], 2.03 MB, application/x-7z-compressed)
possibly-problematic.7z (5 years ago, Greg Mierzwinski [:sparky], 829.95 KB, application/x-7z-compressed)

Greg Mierzwinski [:sparky]

Assignee

Description

•

5 years ago

This bug is for enabling Chrome and Fenix tests in CI. This depends on finding the tier 1 perf tests.

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Priority: -- → P2

Greg Mierzwinski [:sparky]

Assignee

Comment 1

•

5 years ago

We'll use the following list of page load tests:

espn
amazon-search
google-slides
twitch
fandom
twitter
facebook
tumblr
yandex
wikia
wikipedia
bing

Greg Mierzwinski [:sparky]

Assignee

Comment 2

•

5 years ago

•

Edited

:acreskey, :tarek, what do you think about this list of sites? Are there any sites we should test that aren't listed, and are there any listed sites that shouldn't be tested?

EDIT: Note that some of these tests don't currently exist for mobile, so they would be added (they exist for desktop though).

Flags: needinfo?(tarek)

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Comment 3

•

5 years ago

Greg, the sites from Bug 1591466 that you listed in Comment 1 are those that caught a lot of regressions in raptor.

For comparing Fenix to Chrome, I suspect that we will want sites prioritized by popularity. (Adding Eric)

We will have to see the videos for Chrome, but we will also have to prune any that trigger progressive web app pop-ups (I think only wikipedia from that list).

To improve stability, we also favored sites that linked to a subresource, for instance:
https://www.bbc.com/news/business-47245877
instead of
https://www.bbc.com
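The preference for article-level URLs over home pages can be expressed as a simple check. This is only a minimal sketch: `is_deep_link` is a hypothetical helper written for illustration and is not part of Raptor or Browsertime.

```python
from urllib.parse import urlparse


def is_deep_link(url: str) -> bool:
    """Return True if the URL points below the site root (non-empty path)."""
    path = urlparse(url).path
    return path.strip("/") != ""


# The two example URLs from this comment.
candidates = [
    "https://www.bbc.com/news/business-47245877",
    "https://www.bbc.com",
]

# Keep only the URLs that point at a specific page rather than the home page.
deep_links = [url for url in candidates if is_deep_link(url)]
```

A check like this only looks at the URL shape; it says nothing about whether the page content itself is stable between loads.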

This is what we've been using for the Fennec/Chrome/Fenix comparisons:
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=589899771

Flags: needinfo?(acreskey) → needinfo?(esmyth)

Greg Mierzwinski [:sparky]

Assignee

Comment 4

•

5 years ago

Yes, those are the tests that are most likely to catch performance issues (regressions). The list of tests you've been using seems fine to me; I'd use those instead of the list I provided. For the first patch of this live site testing, I don't think we should have more than 20 tests running; we can add more later.

Greg Mierzwinski [:sparky]

Assignee

Comment 5

•

5 years ago

Attached file Bug 1625470 - Part 1: Add --live-sites command line option to Raptor. r?#perftest — Details

This is the first patch of a multi-part patch to enable live site testing on Google Chrome for Android and Fenix. It adds the --live-sites command line option that disables the mozproxy playback (or enables live sites) when it is supplied.
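As a rough illustration of the behaviour described above, a boolean flag can switch between recorded playback and live page loads. This is a sketch under assumptions: only the `--live-sites` flag name comes from this bug, while `build_parser` and `use_playback` are hypothetical names and do not reflect Raptor's actual command line code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative stand-in parser, not Raptor's real one."""
    parser = argparse.ArgumentParser(description="Live-site flag sketch")
    parser.add_argument(
        "--live-sites",
        dest="live_sites",
        action="store_true",
        default=False,
        help="Load pages from the live web instead of replaying recordings",
    )
    return parser


def use_playback(args: argparse.Namespace) -> bool:
    """Hypothetical helper: replay recordings only when live sites are off."""
    return not args.live_sites


# Without the flag playback stays on; with it, playback is disabled.
default_args = build_parser().parse_args([])
live_args = build_parser().parse_args(["--live-sites"])
```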

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → gmierz2

Status: NEW → ASSIGNED

Greg Mierzwinski [:sparky]

Assignee

Comment 6

•

5 years ago

Attached file Bug 1625470 - Part 2: Add new live-site-only tests to Raptor-Browsertime. r?#perftest — Details

This patch adds some new live-site-only tests to Raptor-Browsertime. These will run on Fenix and Chrome in CI, but are enabled for all mobile browsers. The live sites reside in a new live folder since they can be used on desktop and mobile if needed.

Depends on D69051

Greg Mierzwinski [:sparky]

Assignee

Comment 7

•

5 years ago

Attached file Bug 1625470 - Part 3: Add CI tasks for Chrome and Fenix live site tests. r?#perftest — Details

This patch adds the new live site tests as Raptor-Browsertime tasks in CI. These will be scheduled to run through the general-perf-testing cron task on Monday/Wednesday/Friday.

Depends on D69052
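The Monday/Wednesday/Friday schedule amounts to a weekday check. The following is a minimal sketch of that logic only; it is not the cron task definition used in CI, and `RUN_WEEKDAYS` and `runs_on` are hypothetical names.

```python
from datetime import date

# date.weekday() numbers days from Monday=0, so Mon/Wed/Fri are 0, 2, 4.
RUN_WEEKDAYS = {0, 2, 4}


def runs_on(day: date) -> bool:
    """Return True if the live site tests would be scheduled on this day."""
    return day.weekday() in RUN_WEEKDAYS
```

For example, `runs_on(date(2020, 4, 13))` is true because that date was a Monday.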

Greg Mierzwinski [:sparky]

Assignee

Comment 8

•

5 years ago

Here's a try run that runs the test list you gave as live sites: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=f64048cd4f010693cc4547afe9c9009657956fd7

Tarek Ziadé (:tarek)

Comment 9

•

5 years ago

+1 to Andrew's suggestions

Flags: needinfo?(tarek)

esmyth

Comment 10

•

5 years ago

+1 to Andrew's suggestions

Flags: needinfo?(esmyth)

Greg Mierzwinski [:sparky]

Assignee

Comment 11

•

5 years ago

Attached file Possibly problematic pages — Details

So I've updated the patches and they are ready for review. I ran the live site tests today and looked at the videos of the pages they produced.

These are the pages I've identified as potentially problematic:

360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).
booking-sf: Has a web-app pop-up above the page.
discord: Doesn't seem to be testing it much (full page webapp pop-up?).
expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.
google-search: Animation is going to cause variability.
jianshu: Web-app full-page popup? Same sort of thing as discord.
ranker: Carousel that will likely fail in the same way as 360-cn.
reddit-thread: Web-app popup that significantly changes (grays) the pixels.
wikia-marvel: Web-app popup at the top.

I've attached an archive containing videos for these pages. :esmyth/:acreskey, what do you think about these issues?

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Flags: needinfo?(esmyth)

Flags: needinfo?(acreskey)

Andrew Creskey [:acreskey]

Comment 12

•

5 years ago

Thank you for collecting all the potential problem sites, Greg.
Thoughts on each below:

(In reply to Greg Mierzwinski [:sparky] from comment #11)

360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).

That's not great, it's ~1/3 of the screen and moving.
I recommend removal of this site.

booking-sf: Has a web-app pop-up above the page.

This is an ad for the native app, but it's part of the html content.
I'm OK with this, since it is content that was loaded and rendered by Gecko.

discord: Doesn't seem to be testing it much (full page webapp pop-up?).

Also Ok with this since it is html content.

expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.

I think it's OK - the test does capture the slow loadtime and it's all html.

google-search: Animation is going to cause variability.

That's true, although presumably it should be the same for all browsers.
Also, in general google doesn't run animations on the home page, so this should go away over time.

jianshu: Web-app full-page popup? Same sort of thing as discord.

I think this is OK.

ranker: Carousel that will likely fail in the same way as 360-cn.

That looks problematic. I recommend removal of this site.

reddit-thread: Web-app popup that significantly changes (grays) the pixels.

This is web content as well and should render the same on all browsers.

wikia-marvel: Web-app popup at the top.

I actually like this one - the ad for the native app causes a layout shift which should be reflected in the vis mets.

Flags: needinfo?(acreskey)

Greg Mierzwinski [:sparky]

Assignee

Comment 13

•

5 years ago

Thanks :acreskey, so we should remove 360-cn and ranker. An alternative is that we disable visual-metric tests for those two sites so we can still get technical metrics if that's desirable.

Would anyone have two other sites that we might want to test?
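The alternative mentioned in this comment, keeping the two sites but turning off visual metrics for them, could be modelled as a per-site setting. This is a sketch under assumptions: `SiteConfig`, `UNSTABLE_VISUALS`, and `configure` are hypothetical names and do not correspond to Raptor's test manifests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteConfig:
    """Hypothetical per-site settings for a live site test."""
    name: str
    visual_metrics: bool = True


# Sites flagged in this thread as having moving carousels.
UNSTABLE_VISUALS = {"360-cn", "ranker"}


def configure(names):
    """Disable visual metrics only for sites with unstable visuals."""
    return [
        SiteConfig(name, visual_metrics=name not in UNSTABLE_VISUALS)
        for name in names
    ]


configs = configure(["360-cn", "espn", "ranker"])
```

With this shape, the flagged sites would still report technical metrics while skipping the visual ones.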

Andrew Creskey [:acreskey]

Comment 14

•

5 years ago

Greg, my preference would be to remove them and find sites that we can collect full metrics from.

These are some of the next entries from the tp6m list (sorry, I haven't screened them for potential vismet issues)

https://medium.com/s/coincidences-are-a-lie/could-america-have-also-been-the-birthplace-of-impressionism-cb3d31a2e22d
https://people.com/amp-stories/royal-a-to-z
https://rumble.com/v3c44t-foxes-jumping-on-my-trampoline.html
https://sbnation.com/2018/2/12/16719476/college-football-playoff-rankings-explained
https://tripadvisor.com/Hotels-g187147-Paris_Ile_de_France-Hotels.html
https://urbandictionary.com/define.php?term=awesome%20sauce

Greg Mierzwinski [:sparky]

Assignee

Comment 15

•

5 years ago

Sounds good to me. No problem, it's simple for me to test these out in CI with the patches. I'll post what I find from them.

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Depends on: 1630048

Greg Mierzwinski [:sparky]

Assignee

Comment 16

•

5 years ago

:acreskey, I tested out all the pages you suggested here (what a green try push for live sites by the way!): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47

The only one that might be problematic is the tripadvisor-paris test, which intermittently opens a calendar; how long the calendar stays open and where it appears on the page also vary between runs. I've attached a few videos of this issue.

The other pages look great though, so this is the test list that we would go with. It's 23 test pages, which is a little over what I was hoping for but given that we will only run these tests 3 times a week, I think it will be fine:

        - amazon-search
        - bbc
        - booking-sf
        - cnn-ampstories
        - discord
        - espn
        - expedia
        - facebook-cristiano
        - fashionbeans
        - google
        - google-accounts
        - imdb-firefox
        - jianshu
        - medium-article
        - microsoft-support
        - nytimes
        - people-article
        - reddit-thread
        - rumble-fox
        - stackoverflow-question
        - urbandictionary-define
        - wikia-marvel
        - youtube-watch

This page doesn't exist anymore and I couldn't find it so I didn't add it to the tests: https://sbnation.com/2018/2/12/16719476/college-football-playoff-rankings-explained

One final question is, should these run in both warm and cold, or would having only one of those variants be sufficient?

Flags: needinfo?(acreskey)

Greg Mierzwinski [:sparky]

Assignee

Comment 17

•

5 years ago

Attached file possibly-problematic.7z — Details

Contains the problematic recordings.

Andrew Creskey [:acreskey]

Comment 18

•

5 years ago

Greg - nice work.

I agree about removing the tripadvisor site that seems to alternate content.

Since it's still not known if we can do reasonably stable live sites in CI, I suggest starting with just cold page loads.

I noticed that some of the vismet jobs give visual metrics,
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297965782

But others only show a fetch_content opt: 2.89 result?
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297966085

Flags: needinfo?(acreskey)

Greg Mierzwinski [:sparky]

Assignee

Comment 19

•

5 years ago

Sounds good, I'll restrict it to cold only.

Yes, I'm not entirely sure why that's happening. The schema validation passed on the machines, but Treeherder seems to be failing to validate the data for some reason. I've filed a bug about this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1630714
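The cold-only restriction agreed above can be pictured as expanding each page into one task per enabled variant. This is a minimal sketch: the page names are an excerpt of the list in Comment 16, while the `page-variant` naming and the `expand` helper are hypothetical and not the task names used in CI.

```python
from itertools import product

# Excerpt of the live site list from Comment 16.
LIVE_PAGES = ["amazon-search", "bbc", "booking-sf"]

ALL_VARIANTS = ("cold", "warm")
ENABLED_VARIANTS = ("cold",)  # warm page loads are left out for now


def expand(pages, variants):
    """Return one hypothetical task name per (page, variant) pair."""
    return [f"{page}-{variant}" for page, variant in product(pages, variants)]


cold_only_tasks = expand(LIVE_PAGES, ENABLED_VARIANTS)
```

Running only the cold variant halves the number of tasks compared with running both cold and warm.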

Greg Mierzwinski [:sparky]

Assignee

Comment 20

•

5 years ago

Try run for the latest patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d0a6e26cbc4e67bb50304cd5c7ab9cf89e78172

Pulsebot

Comment 21

•

5 years ago

Pushed by gmierz2@outlook.com:
https://hg.mozilla.org/integration/autoland/rev/baadcf329af6
Part 1: Add --live-sites command line option to Raptor. r=perftest-reviewers,Bebe,AlexandruIonescu
https://hg.mozilla.org/integration/autoland/rev/56559ad3456d
Part 2: Add new live-site-only tests to Raptor-Browsertime. r=perftest-reviewers,AlexandruIonescu
https://hg.mozilla.org/integration/autoland/rev/cd37e7afa17a
Part 3: Add CI tasks for Chrome and Fenix live site tests. r=perftest-reviewers,Bebe

Cosmin Sabou [:CosminS]

Comment 22

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/baadcf329af6
https://hg.mozilla.org/mozilla-central/rev/56559ad3456d
https://hg.mozilla.org/mozilla-central/rev/cd37e7afa17a

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox77: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla77

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Regressions: 1632865

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Regressions: 1632993

Greg Mierzwinski [:sparky]

Assignee

Updated

•

5 years ago

Blocks: 1634184
