Closed Bug 1558966 Opened 5 years ago Closed 2 months ago

Intermittent /encrypted-media/clearkey-mp4-playback-temporary-multikey-sequential-readyState.https.html | org.w3.clearkey, successful playback, temporary, mp4, multiple keys, sequential, readyState - assert_equals: It should require all keys to reach end

Categories

(Core :: Audio/Video: Playback, defect, P5)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell disabled])

Attachments

(1 file)

Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=251475240&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/OpO7Hl5MSXyykmDaO75Szg/runs/0/artifacts/public/logs/live_backing.log


20:26:50 INFO - TEST-START | /encrypted-media/clearkey-mp4-playback-temporary-multikey-sequential-readyState.https.html
20:26:50 INFO - Closing window 34359738369
20:26:50 INFO - PID 10988 | [Child 3272, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_156[GMP 10800, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:50 INFO - PID 10988 | [Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:50 INFO - PID 10988 | [Child 11244, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO -
20:26:52 INFO - TEST-UNEXPECTED-FAIL | /encrypted-media/clearkey-mp4-playback-temporary-multikey-sequential-readyState.https.html | org.w3.clearkey, successful playback, temporary, mp4, multiple keys, sequential, readyState - assert_equals: It should require all keys to reach end of content expected 2 but got 1
20:26:52 INFO - onTimeupdate@https://web-platform.test:8443/encrypted-media/scripts/playback-temporary-multikey-sequential.js:95:17
20:26:52 INFO - Test.prototype.step@https://web-platform.test:8443/resources/testharness.js:1594:25
20:26:52 INFO - Test.prototype.step_func/<@https://web-platform.test:8443/resources/testharness.js:1618:35
20:26:52 INFO - EventListener.handleEventrunTest/</<@https://web-platform.test:8443/encrypted-media/scripts/playback-temporary-multikey-sequential.js:110:20
20:26:52 INFO - promise callback
runTest/<@https://web-platform.test:8443/encrypted-media/scripts/playback-temporary-multikey-sequential.js:107:12
20:26:52 INFO - Test.prototype.step@https://web-platform.test:8443/resources/testharness.js:1594:25
20:26:52 INFO - async_test@https://web-platform.test:8443/resources/testharness.js:576:22
20:26:52 INFO - runTest@https://web-platform.test:8443/encrypted-media/scripts/playback-temporary-multikey-sequential.js:17:5
20:26:52 INFO - @https://web-platform.test:8443/encrypted-media/clearkey-mp4-playback-temporary-multikey-sequential-readyState.https.html:52:9
20:26:52 INFO - TEST-OK | /encrypted-media/clearkey-mp4-playback-temporary-multikey-sequential-readyState.https.html | took 2159ms
20:26:52 INFO - PID 10988 | [Child 11244, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/1560371212653 Marionette INFO Stopped listening on port 2828
20:26:52 INFO - PID 10988 | [GMP 8292, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [GMP 8292, Chrome_ChildThread] WARNING: pipe erro[Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 6104, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 6104, Chrome_ChildThread] WARNING: [Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 9080, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 90[Child 10340, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [[Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 2472, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 2472, Chrome_ChildThread] WARNING: pipe erro[Parent 7272, Gecko_IOThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:52 INFO - PID 10988 | [Child 1716, Chrome_ChildThread] WARNING: pipe error: 109: file z:/task_1560362683/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 341
20:26:53 INFO - PID 10988 | [Child 1716, Chrome_ChildThread] WARNING: pipe error:[GPU 5524, Ch
20:26:53 INFO - Browser exited with return code 0
20:26:53 INFO - PROCESS LEAKS None
20:26:53 INFO - Closing logging queue
20:26:53 INFO - queue closed
20:26:53 INFO - Setting up ssl

Flags: needinfo?(apehrson)
Whiteboard: [retriggered][stockwell needswork]

The only patches in bug 1554699 I see that could cause this are [1] and [2].
They both made HTMLMediaElement get up to spec on its play algorithm, by not firing some "timeupdate" events anymore. Does this sound reasonable? Is there a latent bug here?

Bryce, would you be the right person to look at this?

[1] https://hg.mozilla.org/mozilla-central/rev/7a6d1ef04809
[2] https://hg.mozilla.org/mozilla-central/rev/e0a7360fcc7c

Flags: needinfo?(apehrson) → needinfo?(bvandyk)

EME related stuff sounds like me. Holding NI so I can investigate after the all hands.

There are 55 failures associated to this bug in the last 7 days. These are occurring on windows10-64 debug and windows10-64-shippable opt.

:bryce any updates on this?

Not yet. I have a couple of higher priority issues I'm looking into at the moment, but this is still fairly high up my queue and I hope to get to it in the next week.

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered]

Starting to look at this.

Notes

  • Windows only failures (win7 + win10).
  • Debug and opt failures.
  • No luck reproing locally after an hour of soaking with --reapeat-until-unexpected.

Analysis

The test is failing on this check. The function containing the check is triggered by timeupdate events, which is interesting given comment 4.

The failed check is run if (_video.currentTime > config.duration) {, i.e. when the video is beyond a certain playback time as specified in the config (5 seconds). The failing check is asserting that the test has a session for every init data in the test file. It expects two, but in the cases we're failing we only have 1. Looking at the test it appears this would happen if the onWaitingForKey function is not being called -- this function looks like the only path to add the second session to the session list.

So if we fire timeupdate after the video has reached 5 seconds and we haven't yet got two sessions we fail. If we're firing less timeupdates it seems to me we should have less opportunities for the check we're failing to be run, which makes a new race unlikely. Andreas, do the changes that mean we fire less timeupdate events change the timing of any of the remaining events?

If this isn't a race, perhaps we're not firing waitingforkey as we should be? I'm currently tracing changes in the regressing bug to see if I can find anything like that.

It's not clear to me why this is Windows specific yet.

Flags: needinfo?(bvandyk)

See comment 4, the regression seems to have happened because there's no longer a timeupdate dispatched from PlayInternal. No other events affected.

Flags: needinfo?(apehrson)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork]

I think this is the result of bug 1554699 changing the internal state of HTMLMediaElement and in turn that altering our timing of the remaining timeupdates we fire. I.e. not only are we not firing timeupdate in certain cases following bug 1554699, we have modified the timing of the timeupdates we fire outside of those cases. Why this causes the bug is not yet clear.

I've done a number of try runs to try and track this down:

Since the bug is intermittent, it's possible that I've been lucky/unlucky, have not done enough runs to repro, and that the bug is not actually fixed in some of the above cases. Assuming that is not the case, the last run is interesting because it suggests the issue is not us firing events, but changing the internal state.

The only place that the state we're changing above is used is when we call FireTimeUpdate with the aPeriodic flag set to true. This happens in two places I can see:

So my hypothesis is that the in state and thus the change in timing around these is what is causing the intermittent.

Regressed by: 1554699

It could be triggering some obscure, latent, race I guess.

Have you managed to repro with logs, e.g., MOZ_LOG=timestamp,nsMediaElement:5,nsMediaElementEvents:5 ?

Moving the check that the failing test is doing on timeupdate to ended seemingly fixes the issue also and seems like a better way to detect that playback has ended than checking in timeupdate with a duration. However, it would be good to understand why this is happening before we update the test.

Can we set logs in the JS of the wpt such that we would get those logs on try? That would be the easiest way to confirm further.

Try run where I overload the assert to do some logging. It looks like we're not ever firing waitingForKey which is a real head-scratcher.

Trimmed log from when the assert fires:

video.duration: 5.083333, video.currentTime: 5.077332, everWaitingForKey:false, waitingForKeyTime: 0

In local runs I see waitingForKey fired just before 2 seconds. So the test is not seeing this event, and we're also continuing to play the video to the end.

I'm going to retrigger some more test runs on the cases mentioned in comment 4 to see if we get any more fails with a bigger sample.

(In reply to Bryce Seager van Dyk (:bryce) from comment #17)

Can we set logs in the JS of the wpt such that we would get those logs on try? That would be the easiest way to confirm further.

You should be able to set them by prefs:
logging.config.timestamp:true
logging.config.sync:true
logging.nsMediaElement:5
logging.nsMediaElementEvents:5

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork]
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fb8fbe6000f3 disabled test on win for frequent failures r=jmaher
Keywords: leave-open
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disabled]
Has Regression Range: --- → yes
Severity: normal → S3
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: