Closed Bug 1649819 Opened 5 years ago Closed 5 years ago

Perma raptor task timeouts for youtube widevine related tests

Categories

(Core :: Audio/Video: Playback, defect, P1)

80 Branch
defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox77 --- unaffected
firefox78 --- unaffected
firefox79 --- unaffected
firefox80 --- fixed

People

(Reporter: aryx, Assigned: Gijs)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Flags: needinfo?(gmierz2)
Summary: raptor task timeouts for youtube widevine related tests → Perma raptor task timeouts for youtube widevine related tests

Thanks for letting me know about this, I'm testing to see if increasing the timeout will fix it (the test is working locally, but the videos are longer than what we had with the 2019 branch): https://treeherder.mozilla.org/#/jobs?repo=try&revision=b58895719dd5a28d37466aad8fcc60972f56e940

(leaving the ni? open for myself)

Severity: -- → S2
Flags: needinfo?(gmierz2)
Priority: -- → P2
Flags: needinfo?(gmierz2)

This is very odd. I noticed that the failures started in this push so it seems like we may be able to find a commit that caused it: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=widevine&revision=79d69f36a2202fadd572b0243b618e100d29b525

I found one patch which made changes to something involving DRM content: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=youtube&revision=e319075a4c0e7a4569518311a1b4a4688b2637b4

I ran the same tests on it but they didn't fail. Then I ran the tests on the child commits in that push and they still weren't failing. I tested locally as well but I couldn't reproduce it, I'm testing it once more today though in case I missed something.

:bryce, :gijs would either of you have an idea of what might be happening here? The widevine youtube playback test failures started with the push that contained your patches and it seems like the most likely culprit in all the commits.

Flags: needinfo?(gmierz2)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bvandyk)

(In reply to Greg Mierzwinski [:sparky] from comment #2)

This is very odd. I noticed that the failures started in this push so it seems like we may be able to find a commit that caused it: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=widevine&revision=79d69f36a2202fadd572b0243b618e100d29b525

I found one patch which made changes to something involving DRM content: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=youtube&revision=e319075a4c0e7a4569518311a1b4a4688b2637b4

I ran the same tests on it but they didn't fail. Then I ran the tests on the child commits in that push and they still weren't failing. I tested locally as well but I couldn't reproduce it, I'm testing it once more today though in case I missed something.

:bryce, :gijs would either of you have an idea of what might be happening here? The widevine youtube playback test failures started with the push that contained your patches and it seems like the most likely culprit in all the commits.

No. The actor code that got changed should only be used if the DRM/EME stuff is disabled or unavailable, I think.

What does the test actually test? I tried looking for documentation (e.g. https://firefox-source-docs.mozilla.org/testing/perfdocs/raptor.html ) but found none. I assume it's not against an actual real, web-based youtube instance, but against something more static?

It seems super suspicious that autoland is green (as was your trypush with a longer timeout) and mozilla-central is not. Have we figured out what the difference is yet? IIRC the perf numbers can be substantially different because of different branding and thus different behaviour in some components.

Of course, the profiler jobs are also green, so that's no help.

As an aside, this is a practical case of "the perf test says bad things but we have no idea why" as discussed in our recent meeting...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(gmierz2)

(In reply to :Gijs (he/him) from comment #3)

No. The actor code that got changed should only be used if the DRM/EME stuff is disabled or unavailable, I think.

What does the test actually test? I tried looking for documentation (e.g. https://firefox-source-docs.mozilla.org/testing/perfdocs/raptor.html ) but found none. I assume it's not against an actual real, web-based youtube instance, but against something more static?

When I run the tests locally, I do get a banner about it being installed but I don't think that falls into your classifications. That's correct, these tests run a video with various settings to monitor dropped frames. You can see this here, we'll have these in the docs in the future (it's a work in progress atm): https://yttest.dev.mozaws.net/testing_2020/perf-youtube-playback/2019/main.html?test_type=playbackperf-widevine-sfr-h264-test&raptor=true&exclude=1,2&muted=true&command=run

It seems super suspicious that autoland is green (as was your trypush with a longer timeout) and mozilla-central is not. Have we figured out what the difference is yet? IIRC the perf numbers can be substantially different because of different branding and thus different behaviour in some components.

Of course, the profiler jobs are also green, so that's no help.

As an aside, this is a practical case of "the perf test says bad things but we have no idea why" as discussed in our recent meeting...

Right now I'm thinking it's related to what you mentioned different branding and thus different behaviour in some components. in mozilla-central. I tried it out again today just in case and it's certainly passing on try without any issues (it didn't get anywhere near the new task timeout value): https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=ce2ddb9b847d6454cf6d9613d2065280fe435187

I checked here to see if conditioned profiles could be causing this and they aren't: https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=833d0287d08f5e2edf137f61a7543df00fcb6992

I don't see any changes related to CI either at or near that day either: https://changelog.dev.mozaws.net/

:gijs, :aryx do either of you see any patches in that mozilla-central commit that could have changed things related to branding or know who I could ask about that?

CC tomprince in case you see anything running in that mozilla-central commit which may be causing this m-c only failure.

Flags: needinfo?(gmierz2)
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(aryx.bugmail)

(In reply to Greg Mierzwinski [:sparky] from comment #4)

(In reply to :Gijs (he/him) from comment #3)

No. The actor code that got changed should only be used if the DRM/EME stuff is disabled or unavailable, I think.

What does the test actually test? I tried looking for documentation (e.g. https://firefox-source-docs.mozilla.org/testing/perfdocs/raptor.html ) but found none. I assume it's not against an actual real, web-based youtube instance, but against something more static?

When I run the tests locally, I do get a banner about it being installed but I don't think that falls into your classifications.

Huh. No, that banner does have to do with the code that got modified, so I guess that's plausible? But it still doesn't make much sense that this is working on the integration branches. If the patch broke something, I'd expect the breakage to be consistent... Basically, we changed the way the content process communicates with the parent about DRM stuff.

Do we intend for this test to measure the cost of setting up the DRM stuff? That is, we normally have to download CDMs from the different providers, which happens async on first run. I would expect for the perf test, we measure that separately, not as part of a dropped frames test...

Of course, if the changes somehow broke the CDM download then that would be bad... but I can't think what would have done that (and why it would only do it some of the time, but consistently on m-c!)... Bryce might have a better idea.

Flags: needinfo?(gijskruitbosch+bugs)

(In reply to :Gijs (he/him) from comment #5)

Do we intend for this test to measure the cost of setting up the DRM stuff? That is, we normally have to download CDMs from the different providers, which happens async on first run. I would expect for the perf test, we measure that separately, not as part of a dropped frames test...

No we don't measure that cost, only the dropped frames per video that played. In this case the DRM stuff for widevine is set up once we navigate to the test page and a video starts playing - we use some preferences to get the CDMs downloaded and installed. In other words, it's not preinstalled.

Following those patches the hope is that things should behave the same as before, but also in a Fission friendly way. So I've overlooked something if behaviour has changed (which is entirely possible). I don't have a clear idea of what's going on, but let me add some ideas to try and help us get there.

  • Those changes touch messages that go between the UI and lower level code concerned with the CDM being enabled and installed.
    • They shouldn't, if working as anticipated, add much run time to the tests. They certainly shouldn't make these test runs go from ~22 mins to 45 mins and time out.
    • If I've made a mistake somewhere I could envision problems such as the CDM (DRM thingy) not being installed, or us not resolving DRM promises correctly when the CDM becomes installed. Depending on what happens these could manifest in differently, but if the yellow DRM installing notification never goes away that could indicate something is wrong, similarly if the promise requesting media key system access never resolves in the test. I've just manually tested this works in my local build and don't see anything immediately broken.
  • Different build configs interact with DRM being enabled or not. I wonder if there's anything the is difference between the builds that could be tickling a bug that was introduced in the above changes.

Do we have mix of passing and failing runs of these tests on MC? I see Linux ones reliably failing in the link from comment 0, but other operating systems look intermittent. E.g. Windows 7 shippable appears to be broken sometimes and other times not. I don't know what this suggest, if anything.

Flags: needinfo?(bvandyk)

(In reply to Bryce Seager van Dyk (:bryce) from comment #7)

Do we have mix of passing and failing runs of these tests on MC? I see Linux ones reliably failing in the link from comment 0, but other operating systems look intermittent. E.g. Windows 7 shippable appears to be broken sometimes and other times not. I don't know what this suggest, if anything.

Yes it is an intermittent failure, but it's a near-perma failure on linux since it's rarely passed. I haven't been able to find a difference between the passing and failing runs yet.

So running locally, I managed to reproduce an error that would cause the task to timeout:

TestExecutor:  Test 12.2.2.1:PlaybackPerfH264240p30fps@1X threw an error: NotSupportedError: Timed out while waiting for a CDM update 

Here's what I did, maybe one of you can reproduce it too:

hg update -r 31fb4a2a69124b6a21e5d8f44a0407d489749407
./mach build
./mach raptor -t raptor-youtube-playback-widevine-h264-sfr --post-startup-delay 1 --project mozilla-central        # TEST PASSING
hg update -r 79d69f36a2202fadd572b0243b618e100d29b525
./mach build
./mach raptor -t raptor-youtube-playback-widevine-h264-sfr --post-startup-delay 1 --project mozilla-central
1. When the browser starts, and the test page is loading click on the chrome to get focus on the window.
2. Wait for the yellow banner to show, then immediately move the browser window - make sure you hold it for a good bit of time 5-15s (I moved mine to the top-left corner, but it doesn't seem to matter).

I haven't seen it pass yet after following these steps and on the previous mozilla-central commit (31fb4a2a69124b6a21e5d8f44a0407d489749407), I can freely move the browser and the CDM is still installed.

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bvandyk)
Flags: needinfo?(aryx.bugmail)

I don't have a linux build/machine to hand, I will try and get back to this on Monday.

Blocks: 1589796
Assignee: nobody → gmierz2
Status: NEW → ASSIGNED
Attachment #9161603 - Attachment description: Bug 1649819 - Temporarily disable youtube widevine tests. r?#perftest → Bug 1649819 - Temporarily disable youtube widevine tests on linux and windows7. r?#perftest

We're going to disable the linux/windows7 tests temporarily while this is being investigated since they are wasting CI time.

I'm not sure why, or how, but I was able to get a failure on try yesterday: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3b3a785cfe92d93528ef76299d7fc3e37ac5b40c

Pushed by gmierz2@outlook.com: https://hg.mozilla.org/integration/autoland/rev/99e22250e61d Temporarily disable youtube widevine tests on linux and windows7. r=tarek

Thanks for the steps in comment #8. I'm actually unsure what the window moving has to do with anything (and perhaps that deserves further investigation in a separate bug? Feels like it's interfering with idle callback running...), but I did finally figure out what the issue is.

We stopped communicating over the message manager, and instead communicate over the actors' IPC channel. But I missed that there was non-browser-media.js code that depended on this communication, too, and it didn't get converted, so now it's just sitting there and never gets any messages. This is https://searchfox.org/mozilla-central/rev/91d82d7cbf05a71954dfa49d0e43824c7c973e62/toolkit/mozapps/extensions/internal/GMPProvider.jsm#561,572 , which explicitly/proactively forces a check for updates to the CDM immediately when a request comes in that tries to use the CDM. Now that we no longer do that, the CDM gets installed only when some timer kicks in sometime after startup and goes "gee, maybe we should install some CDMs, let's have a look". That explains why installs aren't happening quickly anymore and breaking the test.

This is just an implementation bug so I'm gonna move the bug.

Assignee: gmierz2 → gijskruitbosch+bugs
Component: Raptor → Audio/Video: Playback
Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(bvandyk)
Keywords: regression
Priority: P2 → P1
Product: Testing → Core
Regressed by: 1642465
Has Regression Range: --- → yes

I pushed to try to confirm we can re-enable: https://treeherder.mozilla.org/#/jobs?repo=try&revision=22bc9268c811eee47aeb27c986c857c8afa098fe

FWIW, I investigated the window movement stuff. It turns out the post-first-startup download of the CDM lives behind a check for user interaction activity (!), so if you keep moving the mouse / typing, then no download will happen. Still unclear why that'd impact the linux and win7 builds, particularly, but I'm not super bothered about that. I filed bug 1651161 and co to evaluate if/how we can be more clear about how _scheduleArbitrarilyLateStartupTasks works and what guarantees it (doesn't) provide(s), and to check if all the things that live there are really best placed there or need to move somewhere else.

For this particular test, though it found a real bug, I wonder if we should get a separate bug on file to try to somehow ensure that the GMP add-ons are already present when it runs, given that checking that those bits work isn't really what the perf test is designed for! :sparky, what do you think?

Flags: needinfo?(gmierz2)
See Also: → 1651230
Pushed by gijskruitbosch@gmail.com: https://hg.mozilla.org/integration/autoland/rev/732b3cb93098 fix triggering gmp/widevine/openh264 eme CDM installs by attempting to use them, r=bryce
Keywords: leave-open

:gijs, thanks a lot for fixing the issue! I've filed a bug to get the tests re-enabled.

I agree, we should have the installation done separately. I'm thinking we could make use of conditioned profiles for this but I'm not sure if the installation would be stored in the profiles or if we have to do it on the machine we are running the tests on. Another question with this solution would be, are there any legal implications of having a publicly accessible profile with the CDM installed on it? (It sounds like it could fall under redistribution).

That said, could we block that change on bug 1651230 so that we have some coverage for these components until some functional tests are added?

Flags: needinfo?(gmierz2) → needinfo?(gijskruitbosch+bugs)

(In reply to Greg Mierzwinski [:sparky] from comment #23)

:gijs, thanks a lot for fixing the issue! I've filed a bug to get the tests re-enabled.

I agree, we should have the installation done separately. I'm thinking we could make use of conditioned profiles for this but I'm not sure if the installation would be stored in the profiles or if we have to do it on the machine we are running the tests on.

They get stored in the profile, yes. Try running await AddonManager.getAddonsByTypes(["plugin"]) in the browser console in your usual user profile, and you'll see the gmpPath is a profile folder path.

Another question with this solution would be, are there any legal implications of having a publicly accessible profile with the CDM installed on it? (It sounds like it could fall under redistribution).

I'm not a lawyer, so I don't know about this. If this is a problem, you could obviously also always download them from the official location using a python script or whatever, prior to then having the Firefox profile use them, but you'd need to check with the add-ons team whether that's likely to be a viable option in terms of manually/outside-of-Firefox "prepping" the profile to load these. Lastly, I guess you could also consider making the step of installing the CDMs an explicit part of the perf test, checking if they are there and failing the test before doing any attempts at playback if not.

That said, could we block that change on bug 1651230 so that we have some coverage for these components until some functional tests are added?

That sounds good to me.

Flags: needinfo?(gijskruitbosch+bugs)

I guess given bug 1651353 we can close this? Though if it helps anything we can reopen until the tests are re-enabled...

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Version: unspecified → 80 Branch

(In reply to Greg Mierzwinski [:sparky] from comment #27)

This has started again today: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedTaskRun=ObpAMA1uSKi9WTz6kCnnZg.0&tier=1%2C2%2C3&searchStr=raptor&revision=2c8bc998c107bd6a6182b5fc4740fe77fc07cfca

So is there a new regressor? bug 1651353 suggests the affected tests were re-enabled 5 days ago, and the failures do look like they started with this morning's merge which you highlighted... Probably worth filing a new bug and bisecting that down again? :-(

Flags: needinfo?(gmierz2)

The failures aren't related to any patch, the server needs its certificate updated apparently. :bebe, would you or alex be able to update it?

Flags: needinfo?(gmierz2) → needinfo?(fstrugariu)

Alex Can you take care of this?

Flags: needinfo?(fstrugariu) → needinfo?(aionescu)
See Also: → 1652451

(In reply to :Gijs (he/him) from comment #28)

(In reply to Greg Mierzwinski [:sparky] from comment #27)

This has started again today: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&selectedTaskRun=ObpAMA1uSKi9WTz6kCnnZg.0&tier=1%2C2%2C3&searchStr=raptor&revision=2c8bc998c107bd6a6182b5fc4740fe77fc07cfca

So is there a new regressor? bug 1651353 suggests the affected tests were re-enabled 5 days ago, and the failures do look like they started with this morning's merge which you highlighted... Probably worth filing a new bug and bisecting that down again? :-(

Oh I missed your message! No new regressor, it's a server issue - thanks for your quick reply though :)
Filed a new bug for this here: https://bugzilla.mozilla.org/show_bug.cgi?id=1652451

Flags: needinfo?(aionescu)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: