Closed Bug 1548845 Opened 5 years ago Closed 5 years ago

Integrate Youtube video playback performance suite as Raptor benchmark test

Tracking

(firefox68 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla68

Tracking Flags:

            Tracking    Status
firefox68   ---         fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(7 files)

Bug 1548845 - [raptor] Integrate Youtube video playback performance suite as benchmark test. (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. #perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Use a multiplier for page timeout when using live sites. r=#perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Allow tests to specify the alertChangeType. r=#perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Fix local import of filter module. r=#perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Log discarded tests in filter_live_sites. r=#perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)
Bug 1548845 - [raptor] Don't filter-out tests which are white-listed for "use_live_sites". r=#perftest (5 years ago, Henrik Skupin [:whimboo][⌚️UTC+2], 47 bytes, text/x-phabricator-request)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Description

•

5 years ago

I originally intended these tests to run as a video streaming playback test and worked on bug 1539111 for that, but as it turned out they fit better into the benchmark suite.

There is a copy of the tests in our own infra temporarily at:
http://yttest.dev.mozaws.net/2019/main.html?test_type=playbackperf-test

How do we want to handle those tests:

Don't run them from a local checkout as we do for other benchmarks, but as live tests from the above mirror site
Run all the tests of the suite once (each will be played for 15s)
Track each of the ~100 tests separately in Perfherder as subtests
Subtests have measurements for dropped frames and total frames
Top level score is % of test cases passing, which can be alerted on and reported into the health dashboard; a failure is > 0 dropped frames

The target is 100% passing.
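
The scoring rule described above can be sketched roughly as follows. This is only an illustration, not the harness code: the `SubtestResult` type, the `passing_percentage` helper, and the sample video names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class SubtestResult:
    """One video of the suite (hypothetical container, not a Raptor type)."""
    name: str
    dropped_frames: int
    total_frames: int


def passing_percentage(results):
    """Return the percentage of subtests that dropped no frames at all."""
    if not results:
        raise ValueError("no subtest results to score")
    # A subtest fails as soon as it has more than 0 dropped frames.
    passed = sum(1 for r in results if r.dropped_frames == 0)
    return 100.0 * passed / len(results)


# Made-up sample data: three clean videos and one with dropped frames.
sample = [
    SubtestResult("video-a", 0, 450),
    SubtestResult("video-b", 12, 900),
    SubtestResult("video-c", 0, 450),
    SubtestResult("video-d", 0, 900),
]
score = passing_percentage(sample)
```

With a target of 100%, any single video that drops a frame lowers the top level score, while the per-video dropped and total frame counts stay available as subtest measurements.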

Once the benchmark has landed on m-c we will have to figure out on which specific platforms we want to run this kind of test. Chris Pearce might help us with that later.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Depends on: 1547717

Robert Wood [:rwood]

Updated

•

5 years ago

Priority: -- → P1

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Depends on: 1549557

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 1

•

5 years ago

To integrate those tests in CI I wonder on which platforms we actually want to run them. When I started working on this about a month ago we primarily focused on the FireTV stick 4k (not doable in CI yet due to missing hardware) and Windows ARM64.

Since then some priorities have changed, but that shouldn't affect our testing matrix. As such it would be good to know which other platforms would benefit from such results. My idea would be to start running those tests for shippable (Nightly) builds on win64, win32, linux64, and MacOS.

For mobile we might consider geckoview based applications, but those would only work once bug 1547717 is fixed.

Chris, do you have any feedback regarding the platforms to run this test suite on?

Flags: needinfo?(cpearce)

Chris Pearce [:cpearce (Not reading bugmail)]

Comment 2

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)

Chris, do you have any feedback regarding the platforms to run this test suite on?

As you suggested, win64, win32, linux64, and MacOS, plus some variant of GeckoView when it's available sound reasonable to me.

Flags: needinfo?(cpearce)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 3

•

5 years ago

Perfect. So let's do it. I just pushed a very first WIP to try. Let's see if and how it works:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c50a8779f3f2520a0c14a318f557b29b7445d91

Assignee: nobody → hskupin

Status: NEW → ASSIGNED

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 4

•

5 years ago

Here are the first results of the benchmark in perfherder:

Cross-platform graph (without Linux, which has busted builds):

https://treeherder.mozilla.org/perf.html#/graphs?series=try,2009309,1,10&series=try,2009305,1,10&series=try,2009297,1,10&selected=try,2009309,470806,804236841,10

We have very poor playback quality on MacOS, but that's something we already know, most likely due to bug 1400787. The following results for all the contained videos show that:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&newProject=try&originalRevision=9c50a8779f3f2520a0c14a318f557b29b7445d91&newRevision=8b80929cca79b03383726fe97a22624a1061ed60&originalSignature=2009309&newSignature=2009309&framework=10

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 5

•

5 years ago

Attached file Bug 1548845 - [raptor] Integrate Youtube video playback performance suite as benchmark test. — Details

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 6

•

5 years ago

Attached file Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. #perftest — Details

Depends on D30483

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 7

•

5 years ago

With the new try build the profiling jobs fail, as does the normal Raptor job on aarch64:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c5ad1d21436f36d6428348365c4f97beb9a71d7&selectedJob=245418969

The problem with profiling is probably that the generated profiles are simply too large and take a long time to get sent to the Python harness.

What I will do next is to figure out why we are failing on aarch64.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 8

•

5 years ago

So I accidentally set max-runtime values that are too short for the jobs in CI when I copied and pasted entries from other Raptor jobs. As such the profiling jobs had only 15min, and the rest 30min. This is clearly too short.

Here is a new try build with much higher values so that we can see how long it actually takes:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4701975dfc42466a2fd1c7c8c7e09a2f00962821

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 9

•

5 years ago

Attached file Bug 1548845 - [raptor] Use a multiplier for page timeout when using live sites. r=#perftest — Details
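
A minimal sketch of what a page timeout multiplier for live sites could look like. The multiplier value, the constant name, and the function are assumptions made for illustration; the actual patch may differ in all of them.

```python
# Hypothetical value; the real multiplier is not stated in this bug.
LIVE_SITE_TIMEOUT_MULTIPLIER = 2


def effective_page_timeout(page_timeout_ms, use_live_sites,
                           multiplier=LIVE_SITE_TIMEOUT_MULTIPLIER):
    """Scale the configured page timeout when the test loads a live site."""
    if page_timeout_ms <= 0:
        raise ValueError("page timeout must be positive")
    if use_live_sites:
        # Live sites depend on the network, so give them more time.
        return page_timeout_ms * multiplier
    return page_timeout_ms
```

For example, a test with a 30000 ms page timeout would get 60000 ms with live sites enabled and keep 30000 ms otherwise.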

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 10

•

5 years ago

Attached file Bug 1548845 - [raptor] Allow tests to specify the alertChangeType. r=#perftest — Details

Depends on D30530

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 11

•

5 years ago

Attached file Bug 1548845 - [raptor] Fix local import of filter module. r=#perftest — Details

To not collide with the built-in "filter" method, the local
filter module should be named as filters.

Depends on D30531
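
To illustrate the kind of collision the rename avoids, here is a small self-contained sketch. It uses an empty stand-in module object instead of Raptor's real module, so it only demonstrates the general Python behavior, not the exact failure seen in the harness.

```python
import types

# Stand-in for a local module called "filter" (not Raptor's real module).
local_filter_module = types.ModuleType("filter")


def demo_shadowing():
    # Binding the name "filter" has the same effect as "import filter"
    # would have for a local module: the built-in is hidden in this scope.
    filter = local_filter_module
    try:
        list(filter(None, [0, 1, 2]))
    except TypeError as exc:
        return "TypeError: {}".format(exc)
    return "no error"


def demo_renamed():
    # With the module bound as "filters", the built-in stays reachable.
    filters = local_filter_module
    return list(filter(None, [0, 1, 2]))
```

Calling `demo_shadowing()` reports a `TypeError` because a module object is not callable, while `demo_renamed()` uses the built-in `filter` as expected.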

Phabricator Automation

Updated

•

5 years ago

Attachment #9063765 - Attachment description: Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. → Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. #perftest

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Depends on: 1547932

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 12

•

5 years ago

Retrieving the profile actually takes ages, which causes all the tests to time out:

14:04:41 INFO - raptor-control-server received webext_status: retrieving gecko profile
14:21:11 INFO - raptor-main application timed out after 2583 seconds

With that behavior we cannot have the profiling job enabled by default.

I will see if I can improve the performance when transferring the data.
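
For a sense of scale, the two log lines above can be turned into numbers with a few lines of Python. The helper name is made up, and the calculation assumes both timestamps are from the same day.

```python
from datetime import datetime


def seconds_between(start, end):
    """Seconds between two HH:MM:SS log timestamps from the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps and total runtime taken from the log excerpt above.
waited = seconds_between("14:04:41", "14:21:11")
share_of_run = waited / 2583
```

The harness was still waiting for the profile 990 seconds (16.5 minutes) after the retrieval started, which is well over a third of the 2583 seconds the application ran before timing out.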

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 13

•

5 years ago

By default the Raptor jobs take about 27 minutes, except on aarch64, which needs 36 minutes. As such a job timeout of 45 minutes seems to be fine.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Depends on: 1550702

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 14

•

5 years ago

I would suggest that we don't block landing this benchmark suite on the changes needed for gecko profile data handling (bug 1550702). We can add the profiler jobs once that's done.

No longer depends on: 1550702

Comment 15

•

5 years ago

As just agreed in the streaming meeting we don't want to run Raptor profiling jobs by default due to the size of data to be collected. Instead everyone could run this locally, or push a reduced set of tests to try. This will all be documented.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 16

•

5 years ago

I pushed a couple of try builds over the weekend:

https://treeherder.mozilla.org/perf.html#/graphs?series=try,2009309,1,10&series=try,2010930,1,10&series=try,2009297,1,10&series=try,2009939,1,10&series=try,2009305,1,10&selected=try,2010930,473206,807130796,10

As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms?

I'm not that eager to see all the alerts coming up for sheriffing. :/

Flags: needinfo?(igoldan)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 17

•

5 years ago

Also, as discussed with Paul on IRC and in the streaming meeting on Friday, the aarch64 devices have a driver bug (see bug 1548410) and as such we currently fall back to software decoding, similar to what we do on MacOS.

I would suggest disabling alerts on those platforms and keeping them enabled only for the platforms that use hardware decoding, which are Linux and Windows.

Ionut, what's the best place to disable alerts? Does it have to be part of the perfherder data as sent by Raptor, or can this be enabled/disabled via perfherder?

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 18

•

5 years ago

I just tested this patch with an Android device and there is a problem loading the remote page due to "ERROR_PROXY_CONNECTION_REFUSED". Maybe I should really fix the proxy issue before landing this patch at all.

Ionuț Goldan [:igoldan]

Comment 19

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #16)

I pushed a couple of try builds over the weekend:

https://treeherder.mozilla.org/perf.html#/graphs?series=try,2009309,1,10&series=try,2010930,1,10&series=try,2009297,1,10&series=try,2009939,1,10&series=try,2009305,1,10&selected=try,2010930,473206,807130796,10

As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms?

I'm not that eager to see all the alerts coming up for sheriffing. :/

Yes, you are right. The noise levels are quite high. Not suited for being sheriffed.

Ionuț Goldan [:igoldan]

Comment 20

•

5 years ago

•

Edited

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #17)

Ionut, what's the best place to disable alerts? Does it have to be part of the perfherder data as sent by Raptor, or can this be enabled/disabled via perfherder?

Perfherder only shows what should be alerted. The tests are the ones that decide that, inside the PERFHERDER_DATA they dump, via the should_alert, subtest_should_alert fields.
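
A rough sketch of how a test harness could decide about alerting per platform while building the data it dumps. The platform keys, the suite name, and the helper are hypothetical, and the camelCase key `shouldAlert` is an assumption about how the fields named above are spelled in the payload; please check the Perfherder schema before relying on it.

```python
import json

# Platforms that fall back to software decoding according to this bug.
# The key strings are hypothetical and not the real oskey values.
NO_ALERT_PLATFORMS = {"mac", "win-aarch64"}


def build_perfherder_data(platform, subtests):
    """Build a payload in which the test itself decides about alerting."""
    should_alert = platform not in NO_ALERT_PLATFORMS
    return {
        "framework": {"name": "raptor"},
        "suites": [{
            "name": "youtube-playback-example",  # hypothetical suite name
            "lowerIsBetter": True,
            "shouldAlert": should_alert,
            "subtests": [
                {
                    "name": name,
                    "value": dropped_frames,
                    "lowerIsBetter": True,
                    "shouldAlert": should_alert,
                }
                for name, dropped_frames in subtests
            ],
        }],
    }


data = build_perfherder_data("mac", [("video-a", 3), ("video-b", 0)])
log_line = "PERFHERDER_DATA: " + json.dumps(data)
```

Keeping the decision in one place means the suite and all of its subtests agree on whether a platform should produce alerts.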

Flags: needinfo?(igoldan)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 21

•

5 years ago

To integrate disabling of alerts based on the platform we need the oskey, as available in raptor.py, also in this method:

https://searchfox.org/mozilla-central/rev/94c6b5f06d2464f6780a52f32e917d25ddc30d6b/testing/raptor/raptor/output.py#35

Rob, I could pass this down the full stack from Raptor through the result handler and the output class. Or I could just request it here again via mozinfo. What would be your preferred solution? Thanks!

Flags: needinfo?(rwood)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 22

•

5 years ago

We no longer need bug 1549557 given that we always operate on lower is better now.

No longer depends on: 1549557

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 23

•

5 years ago

Also this benchmark test will only be run on mozilla-central for the time being. It means that no alerting will be active, and as such the work to disable alerts on MacOS and aarch64 can be moved to a follow-up bug.

We will enable the tests on integration and beta branches only if the noise is low.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Blocks: 1552439

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 24

•

5 years ago

I moved the needinfo from Rob forward to bug 1552439.

Here is, hopefully, the final try build of my patch series. If all goes well, we can land it today!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a307b8b3fef4e67f0fca94119421a05d662ac9d6

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Flags: needinfo?(rwood)

Pulsebot

Comment 25

•

5 years ago

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b3a8394727f
[raptor] Use a multiplier for page timeout when using live sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/152615db9db6
[raptor] Allow tests to specify the alertChangeType. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/a2544ca8c593
[raptor] Fix local import of filter module. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c
[raptor] Integrate Youtube video playback performance suite as benchmark test. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/934d2f88195d
[raptor] Enable Youtube Playback benchmark tests in CI. #perftest r=perftest-reviewers,stephendonner

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Blocks: 1552484

Alexandru Michis [:malexandru]

Comment 26

•

5 years ago

Backed out 5 changesets (Bug 1548845) for failing new youtube playback raptor tests.

Backout: https://hg.mozilla.org/integration/autoland/rev/f72947acdfcd662c26a8e84efac58e703b2ce2ec

Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=934d2f88195de26cc114451e6511613d27f997aa&selectedJob=247036985

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247036985&repo=autoland&lineNumber=528

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 27

•

5 years ago

First I have to blame myself for not disabling those jobs on integration and beta branches. Not sure how this could slip through. I will fix it and explicitly ask for review from Rob.

But what I totally don't understand are the following lines:

13:53:45 INFO - raptor-manifest abort: specified test name doesn't exist
13:53:45 INFO - raptor-main abort: no tests found

The test job was running properly on try this morning, so I'm not sure why it cannot find the file anymore.

Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything that is needed.

Flags: needinfo?(hskupin) → needinfo?(rwood)

Robert Wood [:rwood]

Comment 28

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #27)

Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything that is needed.

Hey Henrik,

Yes, what has happened there is that the Raptor 'fail-safe' kicked in, which ensures that tests with 'use_live_sites' enabled don't run on any repos except 'try' (or locally of course). This was to ensure that we don't accidentally add any data from tp6 page load with live sites to perfherder, because we don't want automated regression detection for live page load sites given the data noise.

The 'fail safe' will filter out any tests with live sites enabled if not running locally or on 'try', see [0]. Since we don't do automated regression detection on mozilla-central, if you like you could update this filter to allow running live sites on mozilla-central (that was an oversight on my part). Thanks, sorry for the hassle here.

[0] https://searchfox.org/mozilla-central/rev/0078b9e7d42c366b102d7aec918caf64fed1d574/testing/raptor/raptor/manifest.py#42

Flags: needinfo?(rwood)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 29

•

5 years ago

Oh! That would totally explain it. Sadly the filter doesn't log anything when it discards a test. As discussed with Rob on IRC I will just add an INFO line.

Also we want to have a whitelist of tests which are allowed to run as live tests on integration and beta branches. But those should always be tier-2 at most, never tier-1. So until we find a solution for those video tests they will have to keep running as tier-2.
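
The fail-safe with the two additions discussed here (an INFO line for every discarded test and a whitelist) might look roughly like the sketch below. The signature, the allowlist contents, and the test names are assumptions for illustration and are not the code in manifest.py.

```python
import logging

LOG = logging.getLogger("raptor-manifest")

# Hypothetical allowlist of tests that may use live sites everywhere.
LIVE_SITE_ALLOWLIST = {"youtube-playback-example"}


def filter_live_sites(tests, run_local, project):
    """Drop live-site tests unless run locally, on try, or allowlisted."""
    if run_local or project == "try":
        return list(tests)

    kept = []
    for test in tests:
        # Manifest values are usually strings, so normalize before comparing.
        uses_live = str(test.get("use_live_sites", "false")).lower() == "true"
        if uses_live and test["name"] not in LIVE_SITE_ALLOWLIST:
            # Log the discarded test so an empty test list is explainable.
            LOG.info("skipping test %s: live sites are only allowed "
                     "locally or on try", test["name"])
            continue
        kept.append(test)
    return kept


example_tests = [
    {"name": "youtube-playback-example", "use_live_sites": "true"},
    {"name": "live-pageload-example", "use_live_sites": "true"},
    {"name": "recorded-pageload-example"},
]
```

On a project such as mozilla-central this keeps the allowlisted and the recorded test and logs why the other live-site test was dropped, which would have made the "no tests found" abort above much easier to diagnose.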

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 30

•

5 years ago

Attached file Bug 1548845 - [raptor] Log discarded tests in filter_live_sites. r=#perftest — Details

Depends on D30532

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 31

•

5 years ago

Attached file Bug 1548845 - [raptor] Don't filter-out tests which are white-listed for "use_live_sites". r=#perftest — Details

Depends on D31681

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 32

•

5 years ago

New try build with the latest additions included:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4904bb60de0464748516d6705524b6263f909a0

Pulsebot

Comment 33

•

5 years ago

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7dc6dbc72a41
[raptor] Use a multiplier for page timeout when using live sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/3f21ab675585
[raptor] Allow tests to specify the alertChangeType. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/8ac4327262ce
[raptor] Fix local import of filter module. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/c66e71b65b55
[raptor] Log discarded tests in filter_live_sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/6977ccf2ff65
[raptor] Don't filter-out tests which are white-listed for "use_live_sites". r=perftest-reviewers,stephendonner,rwood

Dorel Luca [:dluca]

Comment 34

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/7dc6dbc72a41
https://hg.mozilla.org/mozilla-central/rev/3f21ab675585
https://hg.mozilla.org/mozilla-central/rev/8ac4327262ce
https://hg.mozilla.org/mozilla-central/rev/c66e71b65b55
https://hg.mozilla.org/mozilla-central/rev/6977ccf2ff65

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Blocks: 1552738

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Comment 35

•

5 years ago

Rob, when landing those patches you actually didn't land all of them because you weren't at the top-most revision when clicking the Lando button. It means the two most important patches are missing, and I'm going to land those now.

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Flags: needinfo?(rwood)

Henrik Skupin [:whimboo][⌚️UTC+2]

Assignee

Updated

•

5 years ago

Status: RESOLVED → REOPENED

status-firefox68: fixed → ---

Resolution: FIXED → ---

Target Milestone: mozilla68 → ---

Pulsebot

Comment 36

•

5 years ago

Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a757325b7690
[raptor] Integrate Youtube video playback performance suite as benchmark test. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/0a796fa7c16f
[raptor] Enable Youtube Playback benchmark tests in CI. #perftest r=perftest-reviewers,stephendonner,rwood

Alexandru Michis [:malexandru]

Comment 37

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a757325b7690
https://hg.mozilla.org/mozilla-central/rev/0a796fa7c16f

Status: REOPENED → RESOLVED

Closed: 5 years ago → 5 years ago

status-firefox68: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla68

Robert Wood [:rwood]

Comment 38

•

5 years ago

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #35)

Rob, when landing those patches you actually didn't land all of them because you weren't at the top-most revision when clicking the Lando button. It means the two most important patches are missing, and I'm going to land those now.

Ugh, apologies, I didn't realize that was even possible. I'll watch for that in the future, thanks for pointing it out!

Flags: needinfo?(rwood)

Asif Youssuff

Updated

•

5 years ago

Regressions: 1567010

Asif Youssuff

Updated

•

5 years ago

No longer regressions: 1567010
