Closed Bug 1548845 Opened 5 months ago Closed 5 months ago

Integrate Youtube video playback performance suite as Raptor benchmark test

Categories

(Testing :: Raptor, enhancement, P1)

Version 3
enhancement

Tracking

(firefox68 fixed)

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: whimboo, Assigned: whimboo)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

Attachments

(7 files)

Originally the intention was to run these tests as a video streaming playback test, and as such I worked on bug 1539111. As it turned out, those tests fit better into the benchmark suite.

There is a copy of the tests in our own infra temporarily at:
http://yttest.dev.mozaws.net/2019/main.html?test_type=playbackperf-test

How do we want to handle those tests:

  • Don't run them from a local checkout as we do for other benchmarks, but from the above mirror site as live tests
  • Run all the tests of the suite once (each will be played for 15s)
  • Track each of the ~100 tests separately in Perfherder as subtests
  • Subtests have measurements for dropped frames and total frames
  • Top level score is % of test cases passing, which can be alerted on and reported to the health dashboard; a failure is > 0 dropped frames

The target is 100% passing.
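
The scoring scheme from the list above can be sketched as follows. This is only a minimal illustration, assuming each subtest reports a dropped and a total frame count; the function name, the input layout, and the subtest names are hypothetical and not taken from the Raptor harness.

```python
from typing import Dict, Tuple


def playback_score(subtests: Dict[str, Tuple[int, int]]) -> float:
    """Return the percentage of subtests that pass.

    `subtests` maps a (hypothetical) subtest name to a tuple of
    (dropped_frames, total_frames). A subtest fails when it drops
    more than 0 frames.
    """
    if not subtests:
        raise ValueError("no subtests reported")
    passed = sum(1 for dropped, _total in subtests.values() if dropped == 0)
    return 100.0 * passed / len(subtests)


# Made-up measurements, for illustration only.
results = {
    "subtest-1": (0, 450),
    "subtest-2": (3, 900),
    "subtest-3": (0, 450),
    "subtest-4": (0, 450),
}
score = playback_score(results)
print(f"{score:.1f}% of subtests passed")
```

With one of the four made-up subtests dropping frames, the top level score falls below the 100% target.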

Once the benchmark has landed on m-c we will have to figure out on which specific platforms we want to run this kind of test. Chris Pearce might help us with that later.

Depends on: 1547717
Priority: -- → P1
Depends on: 1549557

To integrate those tests in CI I wonder on which platforms we actually want to run them. When I started to work on that about a month ago we primarily focused on FireTV stick 4k (not doable in CI yet due to missing hardware), and Windows ARM64.

Since then some priorities have changed, but that shouldn't affect our testing matrix. As such it would be good to know which other platforms would benefit from such results. My idea would be to start running those tests for shippable (Nightly) builds on win64, win32, linux64, and MacOS.

For mobile we might consider geckoview based applications, but those will only work once bug 1547717 is fixed.

Chris, do you have any feedback regarding the platforms to run this test suite on?

Flags: needinfo?(cpearce)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)

Chris, do you have any feedback regarding the platforms to run this test suite on?

As you suggested, win64, win32, linux64, and MacOS, plus some variant of GeckoView when it's available sound reasonable to me.

Flags: needinfo?(cpearce)

Perfect. So let's do it. I just pushed a very first WIP to try. Let's see if and how it works:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c50a8779f3f2520a0c14a318f557b29b7445d91

Assignee: nobody → hskupin
Status: NEW → ASSIGNED

With the new try build the profiling jobs fail, as does the normal Raptor job on aarch64:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c5ad1d21436f36d6428348365c4f97beb9a71d7&selectedJob=245418969

The problem with profiling is probably that the generated profiles are simply too large and take a long time to get sent to the Python harness.

What I will do next is to figure out why we are failing on aarch64.

So I accidentally set too short max-runtime settings for the jobs in CI when I just copied/pasted entries from other raptor jobs. As such the profiling jobs had only 15 minutes, and the rest 30 minutes. This is clearly too short.

Here is a new try build with much higher values so that we can see how long it actually takes:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4701975dfc42466a2fd1c7c8c7e09a2f00962821

To not collide with the built-in "filter" method, the local
filter module should be named as filters.

Depends on D30531
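
The collision described in the commit message can be reproduced outside Raptor. This is a minimal sketch that assumes nothing about Raptor's real module layout: the in-memory module objects are hypothetical stand-ins for the local file. Binding the name `filter` hides the built-in in that scope, which is what `import filter` does for the importing module.

```python
import types


def even_values_shadowed(values):
    # Binding the name "filter" hides the built-in in this scope.
    # The module object is a hypothetical stand-in for a local filter.py.
    filter = types.ModuleType("filter")
    return list(filter(lambda v: v % 2 == 0, values))


def even_values_renamed(values):
    # With the module bound to "filters", the built-in stays reachable.
    filters = types.ModuleType("filters")
    assert filters.__name__ == "filters"
    return list(filter(lambda v: v % 2 == 0, values))


try:
    even_values_shadowed([1, 2, 3, 4])
    shadowing_breaks_call = False
except TypeError:
    # A module object is not callable, so the call fails.
    shadowing_breaks_call = True

print(shadowing_breaks_call)
print(even_values_renamed([1, 2, 3, 4]))
```

Renaming the module avoids the problem without touching any caller of the built-in.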

Attachment #9063765 - Attachment description: Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. → Bug 1548845 - [raptor] Enable Youtube Playback benchmark tests in CI. #perftest
Depends on: 1547932

Retrieving the profile actually takes ages, which causes all the tests to time out:

14:04:41 INFO - raptor-control-server received webext_status: retrieving gecko profile
14:21:11 INFO - raptor-main application timed out after 2583 seconds

With that behavior we cannot have the profiling job enabled by default.

I will see if I can improve the performance when transferring the data.

By default the raptor job takes about 27 minutes, except for aarch64 which needs 36 minutes. As such a job timeout of 45 minutes seems to be fine.
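
As a rough check of the numbers above, the headroom left by a 45 minute timeout can be worked out like this. The durations come from the comment; expressing the timeout in seconds is an assumption about how the job configuration stores it.

```python
# Durations reported in the comment above, in minutes.
observed_minutes = {"default": 27, "aarch64": 36}
proposed_timeout_minutes = 45

# Assumption: the CI configuration expects the timeout in seconds.
proposed_timeout_seconds = proposed_timeout_minutes * 60

# Minutes left before the job would be killed, per platform.
headroom = {
    platform: proposed_timeout_minutes - minutes
    for platform, minutes in observed_minutes.items()
}

print(proposed_timeout_seconds)
print(headroom)
```

The aarch64 jobs are left with the smaller margin, so they are the ones to watch if run times grow.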

Depends on: 1550702

I would suggest that we don't block the landing of this benchmark suite on the changes needed for gecko profile data handling (bug 1550702). We can add the profiler jobs once that's done.

No longer depends on: 1550702
See Also: → 1550702

As just agreed in the streaming meeting we don't want to run Raptor profiling jobs by default due to the size of data to be collected. Instead everyone could run this locally, or push a reduced set of tests to try. This will all be documented.

I pushed a couple of try builds over the weekend:

https://treeherder.mozilla.org/perf.html#/graphs?series=try,2009309,1,10&series=try,2010930,1,10&series=try,2009297,1,10&series=try,2009939,1,10&series=try,2009305,1,10&selected=try,2010930,473206,807130796,10

As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms.

I'm not that eager to see all the alerts coming up for sheriffing. :/

Flags: needinfo?(igoldan)

Also, as discussed with Paul on IRC and in the streaming meeting on Friday, the aarch64 devices have a driver bug (see bug 1548410), and as such we currently fall back to software decoding, similar to what we do on MacOS.

I would suggest disabling alerts on those platforms, and only keeping them enabled for those which make use of hardware decoding. Those are Linux and Windows.

Ionut, what is the best place to disable alerts? Does it have to be part of the perfherder data as sent by Raptor, or can this be enabled/disabled via perfherder?

I just tested this patch with an Android device and there is a problem loading the remote page due to "ERROR_PROXY_CONNECTION_REFUSED". Maybe I should really fix the proxy issue before landing this patch at all.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #16)

I pushed a couple of try builds over the weekend:

https://treeherder.mozilla.org/perf.html#/graphs?series=try,2009309,1,10&series=try,2010930,1,10&series=try,2009297,1,10&series=try,2009939,1,10&series=try,2009305,1,10&selected=try,2010930,473206,807130796,10

As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms.

I'm not that eager to see all the alerts coming up for sheriffing. :/

Yes, you are right. The noise levels are quite high. Not suited for being sheriffed.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #17)

Ionut, what is the best place to disable alerts? Does it have to be part of the perfherder data as sent by Raptor, or can this be enabled/disabled via perfherder?

Perfherder only shows what should be alerted. The tests are the ones that decide that, inside the PERFHERDER_DATA they dump, via the should_alert, subtest_should_alert fields.
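
A sketch of how a test harness could set those flags per platform is shown here. It is an illustration under assumptions, not the Raptor implementation: the key names `should_alert` and `subtest_should_alert` follow the wording of the comment above, while the surrounding dictionary layout, the function names, and the platform labels are hypothetical. The platform rule reflects the earlier suggestion to alert only where hardware decoding is used.

```python
from typing import Dict, List

# Operating systems that use hardware decoding according to the
# discussion in this bug (labels are hypothetical).
HARDWARE_DECODING_OS = {"linux", "win"}


def alerts_enabled(os_name: str, processor: str) -> bool:
    """Decide whether alerting makes sense for a platform.

    aarch64 currently falls back to software decoding, as does MacOS,
    so both are excluded.
    """
    if processor == "aarch64":
        return False
    return os_name in HARDWARE_DECODING_OS


def build_suite(name: str, subtest_names: List[str],
                os_name: str, processor: str) -> Dict:
    """Build a simplified suite entry carrying the alert flags.

    The layout is an assumption and not the real PERFHERDER_DATA schema.
    """
    alert = alerts_enabled(os_name, processor)
    return {
        "name": name,
        "should_alert": alert,
        "subtests": [
            {"name": sub, "subtest_should_alert": alert}
            for sub in subtest_names
        ],
    }


suite = build_suite("video-playback", ["subtest-1", "subtest-2"], "win", "aarch64")
print(suite["should_alert"])
```

Keeping the decision in one small function means the platform list can change without touching the code that assembles the data.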

Flags: needinfo?(igoldan)

To integrate disabling of alerts based on the platform we need the oskey, as available in raptor.py, in this method as well:

https://searchfox.org/mozilla-central/rev/94c6b5f06d2464f6780a52f32e917d25ddc30d6b/testing/raptor/raptor/output.py#35

Rob, I could get this down the full stack from Raptor through the result handler, and the output class. Or I just request it here again via mozinfo. What would be your preferred solution? Thanks!

Flags: needinfo?(rwood)

We no longer need bug 1549557 given that we always operate on lower is better now.

No longer depends on: 1549557

Also this benchmark test will only be run on mozilla-central for the time being. It means that no alerting will be active, and as such the work to disable alerts on MacOS and aarch64 can be moved to a follow-up bug.

We will enable the tests on integration and beta branches only if the noise is low.

Blocks: 1552439

I moved the needinfo from Rob forward to bug 1552439.

Here is, hopefully, the final try build of my patch series. If all goes well, we can land it today!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a307b8b3fef4e67f0fca94119421a05d662ac9d6

Flags: needinfo?(rwood)
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6b3a8394727f
[raptor] Use a multiplier for page timeout when using live sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/152615db9db6
[raptor] Allow tests to specify the alertChangeType. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/a2544ca8c593
[raptor] Fix local import of filter module. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c
[raptor] Integrate Youtube video playback performance suite as benchmark test. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/934d2f88195d
[raptor] Enable Youtube Playback benchmark tests in CI. #perftest r=perftest-reviewers,stephendonner
Blocks: 1552484

First I have to blame myself for not disabling those jobs on integration and beta branches. Not sure how this could slip through. I will fix it and clearly ask for review from Rob.

But what I don't understand at all are the following lines:

13:53:45 INFO - raptor-manifest abort: specified test name doesn't exist
13:53:45 INFO - raptor-main abort: no tests found

The test job was running properly on try this morning. So not sure why it cannot find the file anymore.

Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything that is needed.

Flags: needinfo?(hskupin) → needinfo?(rwood)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #27)

Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything that is needed.

Hey Henrik,

Yes, what has happened there is the Raptor 'fail-safe', which ensures that tests with 'use_live_sites' enabled don't run on any repos except 'try' (or locally of course). This was to ensure that we don't accidentally add any data from tp6 page load with live sites to perfherder, because we don't want automated regression detection for live page load sites due to the data noise.

The 'fail safe' will filter out any tests with live sites enabled if not running locally or on 'try', see [0]. Since we don't do automated regression detection on mozilla-central, if you like you could update this filter to allow running live sites on mozilla-central (that was an oversight on my part). Thanks, sorry for the hassle here.

[0] https://searchfox.org/mozilla-central/rev/0078b9e7d42c366b102d7aec918caf64fed1d574/testing/raptor/raptor/manifest.py#42

Flags: needinfo?(rwood)

Oh! That would totally explain it. Sadly the filter doesn't log anything when it discards a test. As discussed with Rob on IRC I will just add an INFO line.

Also we want to have a whitelist of tests which are allowed to run as live tests on integration and beta branches. But those should always be tier-2 at most, never tier-1. So until we have found a solution for those video tests they will have to keep running as tier-2.
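
The fail-safe, the added INFO line, and the whitelist could fit together roughly as sketched here. This is not the code in manifest.py: only the name `filter_live_sites` and the `use_live_sites` setting come from this bug, while the signature, the test dictionaries, the string value "true", and the test names are assumptions made for illustration.

```python
import logging
from typing import Dict, Iterable, List, Set

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("raptor-manifest")


def filter_live_sites(tests: Iterable[Dict[str, str]], repo: str,
                      run_local: bool, allowed: Set[str]) -> List[Dict[str, str]]:
    """Drop live-site tests unless running locally, on try, or whitelisted."""
    tests = list(tests)
    if run_local or repo == "try":
        return tests

    kept = []
    for test in tests:
        uses_live = test.get("use_live_sites") == "true"
        if uses_live and test["name"] not in allowed:
            # Log the discard so a missing test is easy to diagnose.
            LOG.info("discarding live-site test %s on %s", test["name"], repo)
            continue
        kept.append(test)
    return kept


# Hypothetical test entries.
tests = [
    {"name": "video-playback", "use_live_sites": "true"},
    {"name": "page-load-live", "use_live_sites": "true"},
    {"name": "page-load-recorded"},
]
kept = filter_live_sites(tests, "mozilla-central", False, {"video-playback"})
print([t["name"] for t in kept])
```

With the log line in place, a job that ends up with no tests at least records which entries were dropped and on which repository.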

Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7dc6dbc72a41
[raptor] Use a multiplier for page timeout when using live sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/3f21ab675585
[raptor] Allow tests to specify the alertChangeType. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/8ac4327262ce
[raptor] Fix local import of filter module. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/c66e71b65b55
[raptor] Log discarded tests in filter_live_sites. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/6977ccf2ff65
[raptor] Don't filter-out tests which are white-listed for "use_live_sites". r=perftest-reviewers,stephendonner,rwood
Blocks: 1552738

Rob, when landing those patches you actually didn't land all of them because you weren't at the top-most revision when clicking the Lando button. It means the two most important patches are missing, and I'm going to land those now.

Flags: needinfo?(rwood)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla68 → ---
Pushed by hskupin@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a757325b7690
[raptor] Integrate Youtube video playback performance suite as benchmark test. r=perftest-reviewers,rwood
https://hg.mozilla.org/integration/autoland/rev/0a796fa7c16f
[raptor] Enable Youtube Playback benchmark tests in CI. #perftest r=perftest-reviewers,stephendonner,rwood
Status: REOPENED → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #35)

Rob, when landing those patches you actually didn't land all of them because you weren't at the top-most revision when clicking the Lando button. It means the two most important patches are missing, and I'm going to land those now.

Ugh, apologies, didn't realize that was even possible - I'll watch that in the future thanks for pointing that out!

Flags: needinfo?(rwood)
Regressions: 1567010
No longer regressions: 1567010