Add Chrome & Fenix live site tests in CI
Categories
(Testing :: Raptor, task, P2)
Tracking
(firefox77 fixed)
Version | Tracking | Status
---|---|---
firefox77 | --- | fixed
People
(Reporter: sparky, Assigned: sparky, NeedInfo)
Attachments
(5 files)
This bug is for enabling Chrome and Fenix tests in CI. This depends on finding the tier 1 perf tests.
Comment 1•5 years ago (Assignee)
We'll use this list of page load tests, found here:
- espn
- amazon-search
- google-slides
- twitch
- fandom
- tumblr
- yandex
- wikia
- wikipedia
- bing
Comment 2•5 years ago (Assignee)
:acreskey, :tarek, what do you think about this list of sites? Are there any sites we should test that aren't listed, and are there any sites that are listed but shouldn't be tested?
EDIT: Note that some of these tests don't currently exist for mobile, so they would be added (they exist for desktop though).
Comment 3•5 years ago
Greg, the sites from Bug 1591466 that you listed in Comment 1 are those that caught a lot of regressions in raptor.
For comparing Fenix to Chrome, I suspect that we will want sites prioritized by popularity. (Adding Eric)
We will have to see the videos for Chrome, but we will also have to prune any that trigger progressive web app pop-ups (I think only wikipedia from that list).
To improve stability, we also favored sites that linked to a subresource, for instance:
https://www.bbc.com/news/business-47245877
instead of
https://www.bbc.com
This is what we've been using for the Fennec/Chrome/Fenix comparisons:
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=589899771
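The preference for sub-page URLs over site roots can be sketched as a simple filter. This is only an illustration: `is_subpage` and the `candidates` list are hypothetical names, not part of Raptor, and the check assumes that "sub-page" just means a non-empty URL path.

```python
from urllib.parse import urlparse


def is_subpage(url):
    """Return True if the URL points below the site root (non-empty path)."""
    path = urlparse(url).path
    return path.strip("/") != ""


# Example URLs taken from the comment above.
candidates = [
    "https://www.bbc.com/news/business-47245877",
    "https://www.bbc.com",
]

# Keep only the URLs that target a specific page rather than the home page.
preferred = [url for url in candidates if is_subpage(url)]
```

With the two BBC URLs above, only the article URL is kept in `preferred`.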
Comment 4•5 years ago (Assignee)
Yes, those are the tests that are most likely to catch performance issues (regressions). The list of tests you've been testing with seems fine to me; I'd use those instead of the list I provided. For the first patch of this live site testing, I don't think we should have any more than 20 tests running. We can add more later.
Comment 5•5 years ago (Assignee)
This is the first patch of a multi-part patch to enable live site testing on Google Chrome for Android and Fenix. It adds the --live-sites command line option, which disables the mozproxy playback (that is, enables live sites) when it is supplied.
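As a rough illustration of the behaviour described, here is a minimal sketch using Python's standard `argparse`. Only the `--live-sites` flag name comes from the patch description; `build_parser`, `use_playback`, and the `live_sites` attribute are hypothetical names, and the actual Raptor argument handling may differ.

```python
import argparse


def build_parser():
    # Hypothetical parser: only the --live-sites flag name is from the patch.
    parser = argparse.ArgumentParser(description="Sketch of a live-sites toggle")
    parser.add_argument(
        "--live-sites",
        dest="live_sites",
        action="store_true",
        default=False,
        help="Run tests against live sites instead of recorded playback.",
    )
    return parser


def use_playback(args):
    # Playback is used only when live sites were not requested.
    return not args.live_sites
```

For example, `use_playback(build_parser().parse_args(["--live-sites"]))` evaluates to `False`, while parsing an empty argument list leaves playback enabled.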
Comment 6•5 years ago (Assignee)
This patch adds some new live-site-only tests to Raptor-Browsertime. These will run on Fenix and Chrome in CI, but are enabled for all mobile browsers. The live sites reside in a new live folder since they can be used on desktop and mobile if needed.
Depends on D69051
Comment 7•5 years ago (Assignee)
This patch adds the new live site tests as Raptor-Browsertime tasks in CI. These will be scheduled to run through the general-perf-testing cron task on Monday/Wednesday/Friday.
Depends on D69052
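The Monday/Wednesday/Friday cadence can be illustrated with a small check. This is not the cron task's actual configuration, only a sketch of the schedule; `RUN_DAYS` and `runs_on` are hypothetical names.

```python
from datetime import date

# date.weekday() numbering: Monday=0, Wednesday=2, Friday=4.
RUN_DAYS = {0, 2, 4}


def runs_on(day):
    """Return True if the live site tests would be scheduled on this date."""
    return day.weekday() in RUN_DAYS
```

For instance, `runs_on(date(2020, 4, 13))` (a Monday) is true, while `runs_on(date(2020, 4, 14))` (a Tuesday) is false.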
Comment 8•5 years ago (Assignee)
Here's a try run that runs the test list you gave as live sites: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=f64048cd4f010693cc4547afe9c9009657956fd7
Comment 11•5 years ago (Assignee)
So I've updated the patches and they are ready for review. I ran the live site tests today and looked at the videos of the pages they produced.
These are the pages I've identified as potentially problematic:
- 360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).
- booking-sf: Has a web-app pop-up above the page.
- discord: Doesn't seem to be testing it much (full page webapp pop-up?).
- expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.
- google-search: Animation is going to cause variability.
- jianshu: Web-app full-page popup? Same sort of thing as discord.
- ranker: Carousel that will likely fail like 360-cn has.
- reddit-thread: Web-app popup that significantly changes (grays) the pixels.
- wikia-marvel: Web-app popup at the top.
I've attached an archive containing videos for these pages. :esymth/:acreskey what do you think about these issues?
Comment 12•5 years ago
Thank you for collecting all the potential problem sites, Greg.
Thoughts on each below:
(In reply to Greg Mierzwinski [:sparky] from comment #11)
> - 360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).
That's not great, it's ~1/3 of the screen and moving.
I recommend removal of this site.
> - booking-sf: Has a web-app pop-up above the page.
This is an ad for the native app, but it's part of the HTML content.
I'm OK with this, since it is content that was loaded and rendered by Gecko.
> - discord: Doesn't seem to be testing it much (full page webapp pop-up?).
Also OK with this since it is HTML content.
> - expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.
I think it's OK - the test does capture the slow load time and it's all HTML.
> - google-search: Animation is going to cause variability.
That's true, although presumably it should be the same for all browsers.
Also, in general Google doesn't run animations on the home page, so this should go away over time.
> - jianshu: Web-app full-page popup? Same sort of thing as discord.
I think this is OK.
> - ranker: Carousel that will likely fail like 360-cn has.
That looks problematic. I recommend removal of this site.
> - reddit-thread: Web-app popup that significantly changes (grays) the pixels.
This is web content as well and should render the same on all browsers.
> - wikia-marvel: Web-app popup at the top.
I actually like this one - the ad for the native app causes a layout shift which should be reflected in the visual metrics.
Comment 13•5 years ago (Assignee)
Thanks :acreskey, so we should remove 360-cn and ranker. An alternative is to disable the visual-metric tests for those two sites, so we can still get technical metrics if that's desirable.
Would anyone have two other sites that we might want to test?
Comment 14•5 years ago
Greg, my preference would be to remove them and find sites that we can collect full metrics from.
These are some of the next entries from the tp6m list (sorry, I haven't screened them for potential vismet issues)
https://medium.com/s/coincidences-are-a-lie/could-america-have-also-been-the-birthplace-of-impressionism-cb3d31a2e22d
https://people.com/amp-stories/royal-a-to-z
https://rumble.com/v3c44t-foxes-jumping-on-my-trampoline.html
https://sbnation.com/2018/2/12/16719476/college-football-playoff-rankings-explained
https://tripadvisor.com/Hotels-g187147-Paris_Ile_de_France-Hotels.html
https://urbandictionary.com/define.php?term=awesome%20sauce
Comment 15•5 years ago (Assignee)
Sounds good to me. No problem, it's simple for me to test these out in CI with the patches. I'll post what I find from them.
Comment 16•5 years ago (Assignee)
:acreskey, I tested out all the pages you suggested here (what a green try push for live sites by the way!): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47
The only one I found that might be problematic is the tripadvisor-paris test, which seems to intermittently open a calendar; how long the calendar is shown and where it appears also vary. I've attached a few videos of this issue.
The other pages look great though, so this is the test list that we would go with. It's 23 test pages, which is a little over what I was hoping for, but given that we will only run these tests 3 times a week, I think it will be fine:
- amazon-search
- bbc
- booking-sf
- cnn-ampstories
- discord
- espn
- expedia
- facebook-cristiano
- fashionbeans
- google
- google-accounts
- imdb-firefox
- jianshu
- medium-article
- microsoft-support
- nytimes
- people-article
- reddit-thread
- rumble-fox
- stackoverflow-question
- urbandictionary-define
- wikia-marvel
- youtube-watch
This page doesn't exist anymore and I couldn't find it so I didn't add it to the tests: https://sbnation.com/2018/2/12/16719476/college-football-playoff-rankings-explained
One final question is, should these run in both warm and cold, or would having only one of those variants be sufficient?
Comment 17•5 years ago (Assignee)
Contains the problematic recordings.
Comment 18•5 years ago
Greg - nice work.
I agree about removing the tripadvisor site that seems to alternate content.
Since it's still not known if we can do reasonably stable live sites in CI, I suggest starting with just cold page loads.
I noticed that some of the vismet jobs give visual metrics:
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297965782
But others only show a fetch_content opt: 2.89 result?
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297966085
Comment 19•5 years ago (Assignee)
Sounds good, I'll restrict it to cold only.
Yes, I'm not entirely sure why that's happening. The schema validation passed on the machines, but treeherder seems to be failing to validate it for some reason. I've filed a bug about this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1630714
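To give a sense of scale for restricting the runs to cold page loads, here is a rough sketch of the resulting test matrix. It assumes one task per page per browser and uses the 23-page list from comment 16; the task-name format and all variable names are hypothetical, and the list that finally landed may differ.

```python
from itertools import product

# Page list as proposed in comment 16 (assumed here, not verified against the landed patches).
PAGES = [
    "amazon-search", "bbc", "booking-sf", "cnn-ampstories", "discord",
    "espn", "expedia", "facebook-cristiano", "fashionbeans", "google",
    "google-accounts", "imdb-firefox", "jianshu", "medium-article",
    "microsoft-support", "nytimes", "people-article", "reddit-thread",
    "rumble-fox", "stackoverflow-question", "urbandictionary-define",
    "wikia-marvel", "youtube-watch",
]
BROWSERS = ["fenix", "chrome"]
PAGE_LOAD_TYPES = ["cold"]  # "warm" left out, as agreed in the thread
RUNS_PER_WEEK = 3  # Monday/Wednesday/Friday

# Hypothetical task names, one per (page, browser, page-load type) combination.
tasks = [
    f"{page}-{browser}-{load}"
    for page, browser, load in product(PAGES, BROWSERS, PAGE_LOAD_TYPES)
]

weekly_runs = len(tasks) * RUNS_PER_WEEK
print(len(tasks), weekly_runs)
```

Under these assumptions that is 46 tasks per scheduled run and 138 task runs per week; adding warm page loads would double both numbers.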
Comment 20•5 years ago (Assignee)
Try run for the latest patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d0a6e26cbc4e67bb50304cd5c7ab9cf89e78172
Comment 21•5 years ago
Comment 22•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/baadcf329af6
https://hg.mozilla.org/mozilla-central/rev/56559ad3456d
https://hg.mozilla.org/mozilla-central/rev/cd37e7afa17a