Integrate YouTube video playback performance suite as Raptor benchmark test
Categories
(Testing :: Raptor, enhancement, P1)
Tracking
(firefox68 fixed)
People
(Reporter: whimboo, Assigned: whimboo)
References
(Depends on 1 open bug, Blocks 2 open bugs)
Details
Attachments
(7 files)
47 bytes each, text/x-phabricator-request
Originally the intent was to run these tests as a video streaming playback test, and as such I worked on bug 1539111, but as it turned out those tests fit better into the benchmark suite.
There is temporarily a copy of the tests on our own infrastructure at:
http://yttest.dev.mozaws.net/2019/main.html?test_type=playbackperf-test
How do we want to handle those tests:
- Don't run them from a local checkout as we do for other benchmarks, but from the above mirror site as live tests
- Run all the tests of the suite once (each will be played for 15s)
- Track each of the ~100 tests separately in Perfherder as subtests
- Subtests have measurements for dropped frames and total frames
- The top-level score is the percentage of test cases passing, which can be alerted on and reported to the health dashboard; a failure is > 0 dropped frames
The target is 100% passing.
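For clarity, here is a minimal Python sketch of the scoring scheme described above: a subtest passes only with 0 dropped frames, and the top-level score is the percentage of passing subtests. The test names and frame counts are invented for illustration, not taken from the real suite.

```python
# Sketch of the proposed scoring (illustrative only):
# each subtest reports dropped and total frames; a subtest passes
# only if it dropped 0 frames, and the suite score is the
# percentage of passing subtests.
def playback_score(subtests):
    """subtests: list of (name, dropped_frames, total_frames) tuples."""
    passed = sum(1 for _, dropped, _ in subtests if dropped == 0)
    return 100.0 * passed / len(subtests)

results = [
    ("H264.1080p30", 0, 450),
    ("VP9.2160p60", 12, 900),  # > 0 dropped frames counts as a failure
    ("VP9.1080p60", 0, 900),
]
print(playback_score(results))
```

With two of three subtests passing, the score is about 66.7%; the target stated above corresponds to every subtest reporting 0 dropped frames.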
Once the benchmark has landed on m-c we will have to figure out which specific platforms we want to run this kind of test on. Chris Pearce might be able to help us with that later.
Updated•5 years ago
Comment 1•5 years ago (Assignee)
To integrate those tests in CI I wonder on which platforms we actually want to run them. When I started to work on this about a month ago we primarily focused on the Fire TV Stick 4K (not doable in CI yet due to missing hardware) and Windows ARM64.
Since then some priorities have changed, but that shouldn't affect our testing matrix. As such it would be good to know which other platforms would benefit from such results. My idea would be to start running those tests for shippable (Nightly) builds on win64, win32, linux64, and macOS.
For mobile we might consider GeckoView-based applications, but those would only work once bug 1547717 is fixed.
Chris, do you have any feedback regarding which platforms to run this test suite on?
Comment 2•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #1)
Chris, do you have any feedback regarding which platforms to run this test suite on?
As you suggested: win64, win32, linux64, and macOS, plus some variant of GeckoView when it's available, sounds reasonable to me.
Comment 3•5 years ago (Assignee)
Perfect, so let's do it. I just pushed a very first WIP to try. Let's see if and how it works:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c50a8779f3f2520a0c14a318f557b29b7445d91
Comment 4•5 years ago (Assignee)
Here are the first results of the benchmark in perfherder:
Cross-platform graph (without Linux, which has busted builds):
We have very poor playback quality on macOS, but that's something we already know, most likely due to bug 1400787. The following results for all the contained videos show that:
Comment 5•5 years ago (Assignee)
Comment 6•5 years ago (Assignee)
Depends on D30483
Comment 7•5 years ago (Assignee)
With the new try build the profiling jobs fail, as does the normal Raptor job on aarch64:
The problem with profiling is most likely that the generated profiles are simply too large and take a long time to be sent to the Python harness.
What I will do next is figure out why we are failing on aarch64.
Comment 8•5 years ago (Assignee)
So I accidentally set too short a max-run-time for the jobs in CI when I copied/pasted entries from other Raptor jobs. As such the profiling jobs had only 15 min, and the rest 30 min, which is clearly too short.
Here is a new try build with much higher values so that we can see how long it actually takes:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4701975dfc42466a2fd1c7c8c7e09a2f00962821
Comment 9•5 years ago (Assignee)
Comment 10•5 years ago (Assignee)
Depends on D30530
Comment 11•5 years ago (Assignee)
To not collide with the built-in "filter" function, the local filter module should be named "filters".
Depends on D30531
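As a side note, the collision mentioned above can be demonstrated with a small self-contained Python snippet; the temporary "filter.py" module is created on the fly purely for illustration:

```python
import pathlib
import sys
import tempfile

# Demo of the name collision: once a local module named "filter.py"
# is importable and imported, the name "filter" no longer refers to
# the built-in filter() in that namespace.
tmp = tempfile.mkdtemp()
pathlib.Path(tmp, "filter.py").write_text("value = 42\n")
sys.path.insert(0, tmp)

import filter  # shadows the built-in name in this module

print(type(filter).__name__)  # "module"
print(callable(filter))       # False: filter(...) would now raise TypeError
```

Renaming the local module to "filters" sidesteps this entirely, since no built-in of that name exists.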
Updated•5 years ago
Comment 12•5 years ago (Assignee)
Retrieving the profile actually takes ages, which causes all the tests to timeout:
14:04:41 INFO - raptor-control-server received webext_status: retrieving gecko profile
14:21:11 INFO - raptor-main application timed out after 2583 seconds
With that behavior we cannot have the profiling job enabled by default.
I will see if I can improve the performance when transferring the data.
Comment 13•5 years ago (Assignee)
By default the Raptor jobs take about 27 minutes, except on aarch64, which needs 36 minutes. As such a job timeout of 45 minutes seems fine.
Comment 14•5 years ago (Assignee)
I would suggest that we don't hard-block landing this benchmark suite on the needed changes for Gecko profile data handling (bug 1550702). We can add the profiler jobs once that's done.
Comment 15•5 years ago (Assignee)
As just agreed in the streaming meeting, we don't want to run Raptor profiling jobs by default due to the amount of data to be collected. Instead everyone can run the tests locally, or push a reduced set of tests to try. This will all be documented.
Comment 16•5 years ago (Assignee)
I pushed a couple of try builds over the weekend:
As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms.
I'm not that eager to see all the alerts coming up for sheriffing. :/
Comment 17•5 years ago (Assignee)
Also, as discussed with Paul on IRC and in the streaming meeting on Friday, the aarch64 devices have a driver bug (see bug 1548410), and as such we currently fall back to software decoding, similar to what we do on macOS.
I would suggest disabling alerts on those platforms, and only keeping them enabled for the platforms which make use of hardware decoding, i.e. Linux and Windows.
Ionut, what's the best place to disable alerts? Does it have to be part of the Perfherder data as sent by Raptor, or can this be enabled/disabled via Perfherder?
Comment 18•5 years ago (Assignee)
I just tested this patch with an Android device and there is a problem loading the remote page due to "ERROR_PROXY_CONNECTION_REFUSED". Maybe I should really fix the proxy issue before landing this patch at all.
Comment 19•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #16)
I pushed a couple of try builds over the weekend:
As you can see the noise is very high, especially for aarch64. It means that using the mean of all dropped frames might not be possible (yet), at least on the affected platforms.
I'm not that eager to see all the alerts coming up for sheriffing. :/
Yes, you are right. The noise levels are quite high. Not suited for being sheriffed.
Comment 20•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #17)
Ionut, what's the best place to disable alerts? Does it have to be part of the Perfherder data as sent by Raptor, or can this be enabled/disabled via Perfherder?
Perfherder only shows what should be alerted. The tests are the ones that decide that, inside the PERFHERDER_DATA they dump, via the should_alert and subtest_should_alert fields.
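To illustrate, a sketch of such a payload follows. The suite name, subtest name, and values are invented, and the exact key names (such as "shouldAlert" and "lowerIsBetter") follow the Perfherder performance artifact schema as I understand it, so treat them as assumptions to verify against the schema.

```python
import json

# Hedged sketch of a PERFHERDER_DATA payload: the per-suite and
# per-subtest "shouldAlert" flags are what controls whether
# Perfherder generates regression alerts for the series.
perfherder_data = {
    "framework": {"name": "raptor"},
    "suites": [
        {
            "name": "raptor-youtube-playback-firefox",  # invented name
            "value": 98.0,            # top-level score: % of subtests passing
            "shouldAlert": False,     # suppress alerting for this suite
            "subtests": [
                {
                    "name": "VP9.2160p60",  # invented subtest
                    "value": 0,             # dropped frames
                    "lowerIsBetter": True,
                    "shouldAlert": False,
                },
            ],
        }
    ],
}

# The log line the Perfherder ingestion pipeline parses:
print("PERFHERDER_DATA: " + json.dumps(perfherder_data))
```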
Comment 21•5 years ago (Assignee)
To integrate disabling of alerts based on the platform, we need the oskey as available in raptor.py also in this method:
Rob, I could pass it down the full stack from Raptor through the result handler and the output class, or I could just request it here again via mozinfo. What would be your preferred solution? Thanks!
Comment 22•5 years ago (Assignee)
We no longer need bug 1549557, given that we always operate on "lower is better" now.
Comment 23•5 years ago (Assignee)
Also, this benchmark test will only be run on mozilla-central for the time being. This means that no alerting will be active, and as such the work to disable alerts on macOS and aarch64 can be moved to a follow-up bug.
We will enable the tests on integration and beta branches only if the noise is low.
Comment 24•5 years ago (Assignee)
I moved the needinfo from Rob forward to bug 1552439.
Here is hopefully the final try build of my patch series. If all goes well, we can land it today!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a307b8b3fef4e67f0fca94119421a05d662ac9d6
Updated•5 years ago (Assignee)
Comment 25•5 years ago
Comment 26•5 years ago
Backed out 5 changesets (Bug 1548845) for failing new youtube playback raptor tests.
Backout: https://hg.mozilla.org/integration/autoland/rev/f72947acdfcd662c26a8e84efac58e703b2ce2ec
Push that started the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=934d2f88195de26cc114451e6511613d27f997aa&selectedJob=247036985
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247036985&repo=autoland&lineNumber=528
Comment 27•5 years ago (Assignee)
First I have to blame myself for not disabling those jobs on integration and beta branches. Not sure how this slipped through. I will fix it and explicitly ask Rob for review.
But what I totally don't understand are the following lines:
13:53:45 INFO - raptor-manifest abort: specified test name doesn't exist
13:53:45 INFO - raptor-main abort: no tests found
The test job was running properly on try this morning, so I'm not sure why it cannot find the file anymore.
Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything what is needed.
Comment 28•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #27)
Rob, do you have an idea what's going wrong here? For me https://hg.mozilla.org/integration/autoland/rev/609f489bdc8c contains everything what is needed.
Hey Henrik,
Yes, what has happened there is the Raptor 'fail-safe' ensuring that tests with 'use_live_sites' enabled don't run on any repos except 'try' (or locally, of course). This was to make sure we don't accidentally add any data from tp6 page load with live sites to Perfherder, because we don't want automated regression detection for live page load sites given the data noise.
The 'fail-safe' will filter out any tests with live sites enabled if not running locally or on 'try', see [0]. Since we don't do automated regression detection on mozilla-central, you could update this filter to allow running live sites on mozilla-central if you like (that was an oversight on my part). Thanks, and sorry for the hassle here.
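For illustration, here is a hedged Python sketch of such a fail-safe. This is not the actual Raptor implementation; the dict keys, the function name, and the allowed-repo set are assumptions made up for the example.

```python
# Hypothetical sketch of the live-sites fail-safe described above:
# drop any test with live sites enabled unless we run locally or on
# the 'try' repository, and log each discarded test.
ALLOWED_REPOS = {"try"}

def filter_live_sites(tests, repo, running_locally):
    """Drop live-site tests on repos where they must not run."""
    kept = []
    for test in tests:
        if test.get("use_live_sites") and not (running_locally or repo in ALLOWED_REPOS):
            print("INFO - discarding live-site test: %s" % test["name"])
            continue
        kept.append(test)
    return kept

tests = [
    {"name": "raptor-youtube-playback", "use_live_sites": True},
    {"name": "raptor-tp6-recorded", "use_live_sites": False},
]
print([t["name"] for t in filter_live_sites(tests, "autoland", False)])
# -> ['raptor-tp6-recorded']
```

Allowing mozilla-central, as suggested above, would amount to adding it to the allowed-repo set; logging the discard (as requested in comment 29) is what makes the behavior debuggable.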
Comment 29•5 years ago (Assignee)
Oh! That would totally explain it. Sadly the filter doesn't log anything when it discards a test. As discussed with Rob on IRC, I will just add an INFO line.
Also, we want to have a whitelist of tests which are allowed to run as live tests on integration and beta branches. But those should always be at most tier-2, never tier-1. So until we find a solution for those video tests, they will have to keep running as tier-2.
Comment 30•5 years ago (Assignee)
Depends on D30532
Comment 31•5 years ago (Assignee)
Depends on D31681
Comment 32•5 years ago (Assignee)
New try build with the latest additions included:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4904bb60de0464748516d6705524b6263f909a0
Comment 33•5 years ago
Comment 34•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7dc6dbc72a41
https://hg.mozilla.org/mozilla-central/rev/3f21ab675585
https://hg.mozilla.org/mozilla-central/rev/8ac4327262ce
https://hg.mozilla.org/mozilla-central/rev/c66e71b65b55
https://hg.mozilla.org/mozilla-central/rev/6977ccf2ff65
Comment 35•5 years ago (Assignee)
Rob, when landing those patches you didn't actually land all of them, because you weren't at the top-most revision when clicking the Lando button. This means the two most important patches are missing, and I'm going to land them now.
Updated•5 years ago (Assignee)
Updated•5 years ago (Assignee)
Comment 36•5 years ago
Comment 37•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a757325b7690
https://hg.mozilla.org/mozilla-central/rev/0a796fa7c16f
Comment 38•5 years ago
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #35)
Rob, when landing those patches you didn't actually land all of them, because you weren't at the top-most revision when clicking the Lando button. This means the two most important patches are missing, and I'm going to land them now.
Ugh, apologies, I didn't realize that was even possible. I'll watch out for that in the future; thanks for pointing it out!