Closed Bug 1625470 Opened 5 years ago Closed 5 years ago

Add Chrome & Fenix live site tests in CI

Categories

(Testing :: Raptor, task, P2)

Version 3
task

Tracking

(firefox77 fixed)

RESOLVED FIXED
mozilla77

People

(Reporter: sparky, Assigned: sparky, NeedInfo)

Attachments

(5 files)

This bug is for enabling Chrome and Fenix tests in CI. This depends on finding the tier 1 perf tests.

Priority: -- → P2

We'll use the following list of page load tests:

  1. espn
  2. amazon-search
  3. google-slides
  4. twitch
  5. fandom
  6. twitter
  7. facebook
  8. tumblr
  9. yandex
  10. wikia
  11. wikipedia
  12. bing

:acreskey, :tarek, what do you think about this list of sites? Are there any sites we should test that aren't listed, and are there any listed sites that shouldn't be tested?

EDIT: Note that some of these tests don't currently exist for mobile, so they would be added (they exist for desktop though).

Flags: needinfo?(tarek)
Flags: needinfo?(acreskey)

Greg, the sites from Bug 1591466 that you listed in Comment 1 are those that caught a lot of regressions in raptor.

For comparing Fenix to Chrome, I suspect that we will want sites prioritized by popularity. (Adding Eric)

We will have to see the videos for Chrome, but we will also have to prune any that trigger progressive web app pop-ups (I think only wikipedia from that list).

To improve stability, we also favored sites that linked to a subresource, for instance:
https://www.bbc.com/news/business-47245877
instead of
https://www.bbc.com
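
To illustrate the distinction between a deep link and a bare home page, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `is_deep_link` is hypothetical and is not part of Raptor; it only shows one way to flag URLs that stop at the site root.

```python
from urllib.parse import urlparse


def is_deep_link(url):
    # Hypothetical helper: True when the URL points past the site root,
    # i.e. its path is something other than "" or "/".
    path = urlparse(url).path
    return path not in ("", "/")


print(is_deep_link("https://www.bbc.com/news/business-47245877"))
print(is_deep_link("https://www.bbc.com"))
```

A check like this could be run over a candidate list to spot entries that still target a home page.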

This is what we've been using for the Fennec/Chrome/Fenix comparisons:
https://docs.google.com/spreadsheets/d/18qCiz3SReDgDPwhbYfuDrbnBK1030FuVWGBHWwdgCFY/edit#gid=589899771

Flags: needinfo?(acreskey) → needinfo?(esmyth)

Yes, those are the tests that are most likely to catch performance issues (regressions). The list of tests you've been testing with seems fine to me, I'd use those instead of the list I provided. For the first patch of this live site testing, I don't think we should have any more than 20 tests running - we can add more later.

This is the first patch of a multi-part patch to enable live site testing on Google Chrome for Android and Fenix. It adds the --live-sites command line option that disables the mozproxy playback (or enables live sites) when it is supplied.
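
As a rough illustration of how a flag like this behaves, here is a minimal sketch using Python's `argparse`. It is not the actual Raptor code: `build_parser` and `use_playback` are hypothetical names, and the real argument handling is more involved.

```python
import argparse


def build_parser():
    # Hypothetical parser that only knows about the live-sites flag.
    parser = argparse.ArgumentParser(description="live-site flag sketch")
    parser.add_argument(
        "--live-sites",
        dest="live_sites",
        action="store_true",
        default=False,
        help="Run tests against live sites instead of recorded playback.",
    )
    return parser


def use_playback(args):
    # Playback is used only when --live-sites was not supplied.
    return not args.live_sites


args = build_parser().parse_args(["--live-sites"])
print("playback enabled:", use_playback(args))
```

With the flag supplied, playback is reported as disabled; with an empty argument list it stays enabled, which keeps recorded playback as the default.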

Assignee: nobody → gmierz2
Status: NEW → ASSIGNED

This patch adds some new live-site-only tests to Raptor-Browsertime. These will run on Fenix and Chrome in CI but are enabled for all mobile browsers. The live sites reside in a new live folder since they can be used on desktop and mobile if needed.

Depends on D69051

This patch adds the new live site tests as Raptor-Browsertime tasks in CI. These will be scheduled to run through the general-perf-testing cron task on Monday/Wednesday/Friday.
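
The Monday/Wednesday/Friday cadence can be sketched as a simple weekday check. This is only an illustration of which dates qualify, assuming nothing about how the cron task is actually configured; `SCHEDULED_WEEKDAYS` and `runs_on` are hypothetical names.

```python
import datetime

# datetime.date.weekday() returns Monday=0 ... Sunday=6.
SCHEDULED_WEEKDAYS = {0, 2, 4}  # Monday, Wednesday, Friday


def runs_on(day):
    # True when the given date falls on one of the scheduled weekdays.
    return day.weekday() in SCHEDULED_WEEKDAYS


print(runs_on(datetime.date(2020, 4, 13)))  # a Monday
print(runs_on(datetime.date(2020, 4, 14)))  # a Tuesday
```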

Depends on D69052

+1 to Andrew's suggestions

Flags: needinfo?(tarek)

+1 to Andrew's suggestions

Flags: needinfo?(esmyth)

So I've updated the patches and they are ready for review. I ran the live site tests today and looked at the videos of the pages they produced.

These are the pages I've identified as potentially problematic:

  1. 360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).
  2. booking-sf: Has a web-app pop-up above the page.
  3. discord: Doesn't seem to be testing it much (full page webapp pop-up?).
  4. expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.
  5. google-search: Animation is going to cause variability.
  6. jianshu: Web-app full-page popup? Same sort of thing as discord.
  7. ranker: Carousel that will likely fail like 360-cn.
  8. reddit-thread: Web-app popup that significantly changes (grays) the pixels.
  9. wikia-marvel: Web-app popup at the top.

I've attached an archive containing videos for these pages. :esmyth/:acreskey what do you think about these issues?

Flags: needinfo?(esmyth)
Flags: needinfo?(acreskey)

Thank you for collecting all the potential problem sites, Greg.
Thoughts on each below:

(In reply to Greg Mierzwinski [:sparky] from comment #11)

  1. 360-cn: Has a carousel that moves at the end of the test (lots of variability in this for visual metrics, I attached two videos to show this).

That's not great, it's ~1/3 of the screen and moving.
I recommend removal of this site.

  2. booking-sf: Has a web-app pop-up above the page.

This is an ad for the native app, but it's part of the html content.
I'm OK with this, since it is content that was loaded and rendered by Gecko.

  3. discord: Doesn't seem to be testing it much (full page webapp pop-up?).

Also OK with this since it is html content.

  4. expedia: Has a web-app pop-up above the page and large travel advisory due to covid pandemic.

I think it's OK - the test does capture the slow loadtime and it's all html.

  5. google-search: Animation is going to cause variability.

That's true, although presumably it should be the same for all browsers.
Also, in general Google doesn't run animations on the home page, so this should go away over time.

  6. jianshu: Web-app full-page popup? Same sort of thing as discord.

I think this is OK.

  7. ranker: Carousel that will likely fail like 360-cn.

That looks problematic. I recommend removal of this site.

  8. reddit-thread: Web-app popup that significantly changes (grays) the pixels.

This is web content as well and should render the same on all browsers.

  9. wikia-marvel: Web-app popup at the top.

I actually like this one - the ad for the native app causes a layout shift which should be reflected in the vis mets.

Flags: needinfo?(acreskey)

Thanks :acreskey, so we should remove 360-cn and ranker. An alternative is that we disable visual-metric tests for those two sites so we can still get technical metrics if that's desirable.

Would anyone have two other sites that we might want to test?

Sounds good to me. No problem, it's simple for me to test these out in CI with the patches. I'll post what I find from them.

Depends on: 1630048

:acreskey, I tested out all the pages you suggested here (what a green try push for live sites by the way!): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47

The only one that I found that might be problematic is the tripadvisor-paris test, which seems to intermittently open a calendar. How long the calendar is shown and where it appears also vary. I've attached a few videos of this issue.

The other pages look great though, so this is the test list that we would go with. It's 23 test pages, which is a little over what I was hoping for but given that we will only run these tests 3 times a week, I think it will be fine:

        - amazon-search
        - bbc
        - booking-sf
        - cnn-ampstories
        - discord
        - espn
        - expedia
        - facebook-cristiano
        - fashionbeans
        - google
        - google-accounts
        - imdb-firefox
        - jianshu
        - medium-article
        - microsoft-support
        - nytimes
        - people-article
        - reddit-thread
        - rumble-fox
        - stackoverflow-question
        - urbandictionary-define
        - wikia-marvel
        - youtube-watch
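
As a quick sanity check on the size of this list, here is a minimal sketch that counts the entries and estimates the weekly job load. The estimate is an assumption, not a measured figure: it supposes two browsers (Fenix and Chrome), three scheduled days per week, and a single page-load job per site per browser per day. The names `LIVE_SITES`, `EXCLUDED`, and `weekly_jobs` are hypothetical.

```python
# The test list given above.
LIVE_SITES = [
    "amazon-search", "bbc", "booking-sf", "cnn-ampstories", "discord",
    "espn", "expedia", "facebook-cristiano", "fashionbeans", "google",
    "google-accounts", "imdb-firefox", "jianshu", "medium-article",
    "microsoft-support", "nytimes", "people-article", "reddit-thread",
    "rumble-fox", "stackoverflow-question", "urbandictionary-define",
    "wikia-marvel", "youtube-watch",
]

# Sites flagged in this bug as having unstable content.
EXCLUDED = {"360-cn", "ranker", "tripadvisor-paris"}

# Assumed inputs for the estimate (see the note above).
BROWSERS = ("fenix", "chrome")
RUN_DAYS_PER_WEEK = 3


def weekly_jobs(sites, browsers=BROWSERS, days=RUN_DAYS_PER_WEEK):
    # One job per site, per browser, per scheduled day.
    return len(sites) * len(browsers) * days


# None of the flagged sites should appear in the list.
assert not EXCLUDED.intersection(LIVE_SITES)
print(len(LIVE_SITES), weekly_jobs(LIVE_SITES))
```

Under these assumptions, running a second page-load variant would double the weekly total.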

This page doesn't exist anymore and I couldn't find it so I didn't add it to the tests: https://sbnation.com/2018/2/12/16719476/college-football-playoff-rankings-explained

One final question is, should these run in both warm and cold, or would having only one of those variants be sufficient?

Flags: needinfo?(acreskey)

Contains the problematic recordings.

Greg - nice work.

I agree about removing the tripadvisor site that seems to alternate content.

Since it's still not known if we can do reasonably stable live sites in CI, I suggest starting with just cold page loads.

I noticed that some of the vismet jobs give visual metrics,
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297965782

But others only show a fetch_content opt: 2.89 result?
https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=3042d316cf05cfbf888530c80ae96892c4860c47&selectedJob=297966085

Flags: needinfo?(acreskey)

Sounds good, I'll restrict it to cold only.

Yes, I'm not entirely sure why that's happening. The schema validation passed on the machines, but Treeherder seems to be failing to validate it for some reason. I've filed a bug about this issue: https://bugzilla.mozilla.org/show_bug.cgi?id=1630714

Pushed by gmierz2@outlook.com:
https://hg.mozilla.org/integration/autoland/rev/baadcf329af6
Part 1: Add --live-sites command line option to Raptor. r=perftest-reviewers,Bebe,AlexandruIonescu
https://hg.mozilla.org/integration/autoland/rev/56559ad3456d
Part 2: Add new live-site-only tests to Raptor-Browsertime. r=perftest-reviewers,AlexandruIonescu
https://hg.mozilla.org/integration/autoland/rev/cd37e7afa17a
Part 3: Add CI tasks for Chrome and Fenix live site tests. r=perftest-reviewers,Bebe
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77
Regressions: 1632865
Regressions: 1632993
Blocks: 1634184