Open Bug 1536156 Opened 2 years ago Updated 1 month ago

Intermittent dom/media/test/test_cloneElementVisually_ended_video.html | Visual clone should display final frame.

Categories

(Core :: Audio/Video: Playback, defect, P5)

defect

Tracking

()

REOPENED
mozilla72
Tracking Status
firefox-esr68 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:product])

Attachments

(7 files, 1 obsolete file)

#[markdown(off)]
Filed by: shindli [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=234576174&repo=autoland

https://queue.taskcluster.net/v1/task/COoDzJlqSVesvsC3BKCUbw/runs/0/artifacts/public/logs/live_backing.log

16:39:24 INFO - GECKO(980) | MEMORY STAT | vsize 19405005MB | vsizeMaxContiguous 65083296MB | residentFast 1334MB
16:39:24 INFO - TEST-OK | dom/media/test/test_cueless_webm_seek-3.html | took 551ms
16:39:24 INFO - TEST-START | dom/media/test/test_currentTime.html
16:39:24 INFO - GECKO(980) | MEMORY STAT | vsize 19405006MB | vsizeMaxContiguous 65083296MB | residentFast 1335MB
16:39:24 INFO - TEST-OK | dom/media/test/test_currentTime.html | took 130ms
16:39:24 INFO - TEST-START | dom/media/test/test_decode_error.html
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f765e780 state=DECODING_METADATA Decode metadata failed, shutting down decoder: file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 350
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f765e780 Decode error: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006): file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 3296
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 34: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 35: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #2] WARNING: Decoder=12a9f7defb80 state=DECODING_METADATA Decode metadata failed, shutting down decoder: file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 350
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #2] WARNING: Decoder=12a9f7defb80 Decode error: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006): file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 3296
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f7de5680 state=DECODING_METADATA Decode metadata failed, shutting down decoder: file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 350
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f7de5680 Decode error: NS_ERROR
16:39:24 INFO - GECKO(980) | _DOM_MEDIA_METADATA_ERR (0x806e0006): file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 3296
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 34: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 35: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 34: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 35: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f7de0e80 state=DECODING_METADATA Decode metadata failed, shutting down decoder: file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 350
16:39:24 INFO - GECKO(980) | [Child 3028, MediaPlayback #1] WARNING: Decoder=12a9f7de0e80 Decode error: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006): file z:/build/build/src/dom/media/MediaDecoderStateMachine.cpp, line 3296
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 34: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | JavaScript warning: http://mochi.test:8888/tests/dom/media/test/test_decode_error.html, line 35: Error: This error message will be blank when privacy.resistFingerprinting = true. If it is really necessary, please add it to the whitelist in MediaError::GetMessage: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)
16:39:24 INFO - GECKO(980) | MEMORY STAT | vsize 19405014MB | vsizeMaxContiguous 65083296MB | residentFast 1342MB
16:39:24 INFO - TEST-OK | dom/media/test/test_decode_error.html | took 564ms
16:39:24 INFO - TEST-START | dom/media/test/test_decoder_disable.html
16:39:25 INFO - GECKO(980) | MEMORY STAT | vsize 19405015MB | vsizeMaxContiguous 65083296MB | residentFast 1342MB
16:39:25 INFO - TEST-OK | dom/media/test/test_decoder_disable.html | took 218ms
16:39:25 INFO - TEST-START | dom/media/test/test_defaultMuted.html
16:39:25 INFO - GECKO(980) | MEMORY STAT | vsize 19405015MB | vsizeMaxContiguous 65083296MB | residentFast 1343MB
16:39:25 INFO - TEST-OK | dom/media/test/test_defaultMuted.html | took 399ms
16:39:25 INFO - TEST-START | dom/media/test/test_delay_load.html
16:39:26 INFO - GECKO(980) | MEMORY STAT | vsize 19405028MB | vsizeMaxContiguous 65083296MB | residentFast 1351MB
16:39:26 INFO - GECKO(980) | JavaScript error: resource://gre/modules/WebProgressChild.jsm, line 61: TypeError: this.mm.content is null
16:39:26 INFO - TEST-OK | dom/media/test/test_delay_load.html | took 1111ms
16:39:26 INFO - TEST-START | dom/media/test/test_duration_after_error.html

First occurrences here are when Bug 1533099 landed. Mike can you please take a look?

There are 91 total failures since the bug was filed 2 days ago, on windows all build types and few occurrences on linux64

Flags: needinfo?(mconley)
Whiteboard: [stockwell needswork:owner]

There are 195 total failures in the last 7 days all on windows10-64 all build types.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=235389465&repo=autoland&lineNumber=1849

07:36:35 INFO - TEST-START | dom/media/test/test_cloneElementVisually_ended_video.html

07:36:36 INFO - TEST-INFO | started process screenshot
07:36:36 INFO - TEST-INFO | screenshot: exit 0
07:36:36 INFO - Buffered messages logged at 07:36:35
07:36:36 INFO - AddTask.js | Entering test
07:36:36 INFO - Buffered messages finished
07:36:36 INFO - TEST-UNEXPECTED-FAIL | dom/media/test/test_cloneElementVisually_ended_video.html | Visual clone should display final frame.
07:36:36 INFO - SimpleTest.ok@SimpleTest/SimpleTest.js:275:18
07:36:36 INFO - @dom/media/test/test_cloneElementVisually_ended_video.html:40:5
07:36:36 INFO - asyncwithNewClone@https://example.com/tests/dom/media/test/cloneElementVisually_helpers.js:139:11
07:36:36 INFO - @dom/media/test/test_cloneElementVisually_ended_video.html:37:9
07:36:36 INFO - async
nextTick/<@SimpleTest/AddTask.js:70:34
07:36:36 INFO - nextTick@SimpleTest/AddTask.js:86:11
07:36:36 INFO - setTimeout handler*SimpleTest_setTimeoutShim@SimpleTest/SimpleTest.js:684:43
07:36:36 INFO - add_task@SimpleTest/AddTask.js:30:7
07:36:36 INFO - @dom/media/test/test_cloneElementVisually_ended_video.html:29:1
07:36:36 INFO - AddTask.js | Leaving test
07:36:36 INFO - GECKO(2768) | MEMORY STAT | vsize 19404994MB | vsizeMaxContiguous 65716605MB | residentFast 1290MB
07:36:36 INFO - TEST-OK | dom/media/test/test_cloneElementVisually_ended_video.html | took 1050ms

Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2f870f3026ed
disabled test_cloneElementVisually_ended_video.htm on win64 r=jmaher
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disableed]
Whiteboard: [stockwell disableed] → [stockwell disabled]

In the last 7 days there have been 26 occurrences on windows 7 32, build types debug and opt.

Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Blocks: 1532047

This seems to be what tests expect.

Depends on D53731

Blocks: 1596777
Attachment #9109822 - Attachment is obsolete: true

This patch does the following:

  • Makes cloneElementVisually() return a promise
  • Plumbs an event from the MediaDecoderStateMachine's VideoSink to
    HTMLVideoElement
  • Hooks the event up to resolve the promise from cloneElementVisually()
  • Updates tests and their expectations.

Depends on D53731

Unlink must not cause an AddRef on the unlinked object, or it is at risk of
leaking. See 1593739 for an example of where this happens.

EndCloningVisually is dangerous in that it tries to do more than just unlink the
video element. It does AddRef in NotifyUAWidgetSetupOrChange, so that must be
avoided.

Previous patches to this bug make sure that MediaDecoder shutdown takes care of
clearing the secondary video container in the MediaDecoder stack, so this is not
actually necessary anymore.

Depends on D53733

I'm still seeing some failures like this on windows with the patches. The log shows the original video displaying frame 18 and the clone frame 17. To me it seems like the original media element fired "ended" too soon.

(In reply to Andreas Pehrson [:pehrsons] from comment #53)

I'm still seeing some failures like this on windows with the patches. The log shows the original video displaying frame 18 and the clone frame 17. To me it seems like the original media element fired "ended" too soon.

The funny thing is that while frame 17 and 18 were captured by the test (after "ended" !), the screenshot of the same failed test shows the original video element displaying frame 22. That means we got "ended" way too soon.

To see frame 17 and 18 you can modify the log output somewhat and pop it into the reftest-analyzer. Use this log snippet for the log linked above. In summary one can use this template to create snippets parsable by the reftest-analyzer:

TEST-UNEXPECTED-FAIL | dom/media/test/test_cloneElementVisually_ended_video.html ||
IMAGE 1 (original): 
IMAGE 2 (clone): 

Frame 22 is visible in the failure screenshot of the run above.

Alastor, do you have any clue what's going on here?

Flags: needinfo?(alwu)

FWIW I have a retrigger running this test on repeat with full MediaDecoder:5 logs that I hope will yield something: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2acdadfcdf304435bd7ce4692db2d926072d42f&selectedJob=277038611

How about enabling log with MediaDecoder:5,nsMediaElementEvents:5 first to see how many frames we rendered via VideoSink? Because the last frame should be 47, so painting frame 17, 18 or 22 are all really weird, which means we only play like half time of video duration. Does this issue also happen without appling D53733?

Flags: needinfo?(alwu)

(In reply to Andreas Pehrson [:pehrsons] from comment #56)

FWIW I have a retrigger running this test on repeat with full MediaDecoder:5 logs that I hope will yield something: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2acdadfcdf304435bd7ce4692db2d926072d42f&selectedJob=277038611

weird, why the full log didn't show in either log viewer or raw log.

I see the log but it didn't run just this test on repeat as advertised. Ok, let's dumb this down a bit.

This worked better:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=277161459&revision=0033090e57094b7f9c811c76adbe1f0cc0e3227a

original shows frame 20, clone frame 19, screenshot frame 22

I dumped debug info from the original video and it shows MDSM and the sinks having completed playback. Time is as expected at the end (1.6s).


Looking at a failure on autoland (so without D53733 as suggested) I see the original video showing frame 16, the clone frame 24 and the screenshot frame 18.

The difference here is in the clone's frame, which makes sense, because D53733 changed from using the latest frame to the current one. Still, this is far from frame 47, which the VideoSink claims to have appended already?

The AudioSink seems to play 1.5s of data in ~0.5s of wall time.

[task 2019-11-20T10:46:19.004Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: Unnamed thread 000012DAF0C30080]: V/MediaDecoder AudioSink=000012E0F1CADF80 playing audio at time=0 offset=0 length=2160
...
[task 2019-11-20T10:46:19.498Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: Unnamed thread 000012DAF0C30080]: V/MediaDecoder AudioSink=000012E0F1CADF80 playing audio at time=1532516 offset=0 length=69
...
[task 2019-11-20T10:46:19.529Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: Unnamed thread 000012DAF0C30080]: D/MediaDecoder AudioSink=000012E0F1CADF80 Drained

But the MDSM is not having it:

[task 2019-11-20T10:46:19.436Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: Unnamed thread 000012DAF0C30080]: V/MediaDecoder AudioSink=000012E0F1CADF80 playing audio at time=1393197 offset=0 length=1356
[task 2019-11-20T10:46:19.436Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: MediaDecoderStateMachine #1]: V/MediaDecoder AudioSink=000012E0F1CADF80 AudioStream has used an audio packet.
[task 2019-11-20T10:46:19.441Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[000012E8F0D41880] Decoder=12daf053a380 UpdatePlaybackPositionInternal(433916)

The AudioSink has played ~1.4s and is running the MDSM clock, but MDSM claims it's still at ~0.4s.

Here is where video gets confused:

[task 2019-11-20T10:46:19.525Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[000012E8F0D41880] Decoder=12daf053a380 UpdatePlaybackPositionInternal(521812)
[task 2019-11-20T10:46:19.529Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: Unnamed thread 000012DAF0C30080]: D/MediaDecoder AudioSink=000012E0F1CADF80 Drained
[task 2019-11-20T10:46:19.529Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: MediaDecoderStateMachine #1]: D/MediaDecoder MediaDecoderStateMachine[000012E8F0D41880] Decoder=12daf053a380 [OnMediaSinkAudioComplete]
[task 2019-11-20T10:46:19.529Z] 10:46:19     INFO - GECKO(4248) | [Child 4036: MediaDecoderStateMachine #1]: V/MediaDecoder MediaDecoderStateMachine[000012E8F0D41880] Decoder=12daf053a380 UpdatePlaybackPositionInternal(1578955)

Audio claims to have finished playing out so MDSM seems to skip the current time right to the end of the audio stream. The video sink sees this and renders frames starting at that point. It ends up dropping all that remain and jump to the completed state. Whatever was set in the ImageContainer last time it rendered frames remains there (and there are frames in the (wall-clock) future!), which is why the ImageContainer keeps rendering frames after the ended event.

Now this could be fixed in the VideoSink, so that it clears the frames on ending, or just sets the last frame.

But it's a form of wallpapering over the root cause. Why is the audio clock going that fast? And why does MDSM still think it's not that fast? A samplerate mismatch/conversion issue?

We're querying cubeb for the stream position, and cubeb is notifying us when we've drained. I don't understand this. I've repushed with verbose cubeb logs to see if that yields some clues.

I got the cubeb logs for a failure and it looks like we're getting callbacks from windows unexpectedly fast.
This is condensed a bit. The first number on each row is the wall clock timestamp in seconds (there were hours and minutes too but they didn't change).

    30.980 Output callback: output frames requested: 4800, got 4800
    30.985 Output callback: output frames requested: 1440, got 1440
    30.995 Output callback: output frames requested: 1440, got 1440
    31.006 Output callback: output frames requested: 1440, got 1440
    31.006 Output callback: output frames requested: 1440, got 1440
    31.013 Output callback: output frames requested: 1440, got 1440
    31.023 Output callback: output frames requested: 1440, got 1440
    31.035 Output callback: output frames requested: 1440, got 1440
    31.047 Output callback: output frames requested: 1440, got 1440
    31.056 Output callback: output frames requested: 1440, got 1440
    31.066 Output callback: output frames requested: 1440, got 1440
    31.078 Output callback: output frames requested: 1440, got 1440
    31.089 Output callback: output frames requested: 1440, got 1440
    31.101 Output callback: output frames requested: 1440, got 1440
    31.112 Output callback: output frames requested: 1440, got 1440
    31.123 Output callback: output frames requested: 1440, got 1440
    31.132 Output callback: output frames requested: 1440, got 1440
    31.143 Output callback: output frames requested: 1440, got 1440
    31.153 Output callback: output frames requested: 1440, got 1440
    31.153 Output callback: output frames requested: 1440, got 1440
    31.163 Output callback: output frames requested: 1440, got 1440
    31.174 Output callback: output frames requested: 1440, got 1440
    31.185 Output callback: output frames requested: 1440, got 1440
    31.195 Output callback: output frames requested: 1440, got 1440
    31.207 Output callback: output frames requested: 1440, got 1440
    31.216 Output callback: output frames requested: 1440, got 1440
    31.231 Output callback: output frames requested: 1440, got 1440
    31.239 Output callback: output frames requested: 1440, got 1440
    31.249 Output callback: output frames requested: 1440, got 1440
    31.260 Output callback: output frames requested: 1440, got 1440
    31.273 Output callback: output frames requested: 1440, got 1440
    31.280 Output callback: output frames requested: 1440, got 1440
    31.292 Output callback: output frames requested: 1440, got 1440
    31.292 Output callback: output frames requested: 1440, got 1440
    31.302 Output callback: output frames requested: 1440, got 1440
    31.313 Output callback: output frames requested: 1440, got 1440
    31.323 Output callback: output frames requested: 1440, got 1440
    31.341 Output callback: output frames requested: 1440, got 1440
    31.347 Output callback: output frames requested: 1440, got 1440
    31.357 Output callback: output frames requested: 1440, got 1440
    31.367 Output callback: output frames requested: 1440, got 1440
    31.379 Output callback: output frames requested: 1440, got 1440
    31.389 Output callback: output frames requested: 1440, got 1440
    31.404 Output callback: output frames requested: 1440, got 1440
    31.415 Output callback: output frames requested: 1440, got 1440
    31.425 Output callback: output frames requested: 1440, got 1440
    31.435 Output callback: output frames requested: 1440, got 1440
    31.445 Output callback: output frames requested: 1440, got 1440
    31.445 Output callback: output frames requested: 1440, got 1440
    31.456 Output callback: output frames requested: 1440, got 1440
    31.465 Output callback: output frames requested: 1440, got 430

Per discussion offline with kinetik, I'll file a followup for investigating this further. I'll fix this in the VideoSink for now, per comment 62.

See Also: → 1598143
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/8283eed414d2
Remove unused MediaSink::ClearSecondaryVideoContainer. r=mconley
https://hg.mozilla.org/integration/autoland/rev/d540f1802ff6
Let cloneElementVisually() return a promise that resolves when frames have been rendered. r=alwu,mconley,bzbarsky
https://hg.mozilla.org/integration/autoland/rev/77c16444e714
Simplify setting the current frame in the secondary video container. r=alwu
https://hg.mozilla.org/integration/autoland/rev/29dd64930421
Stop doing EndCloningVisually in Unlink. r=mconley
https://hg.mozilla.org/integration/autoland/rev/a3fa99d936f3
Clear future frames when VideoSink is finished. r=padenot

I think the problem is https://searchfox.org/mozilla-central/rev/581466eef9269afb03d8a0dba2f50215f3a9026c/dom/media/VideoFrameContainer.cpp#223 where we clear everything but the first frame, when in fact the current frame is the last frame.

And if I run this reftest locally it looks better with this fixed. But this is failing for me locally even without the patch stack I landed here so I'm not sure what the harness thinks. Waiting for try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aced759507f2e57220193ec46246c1261d2106b6

Flags: needinfo?(apehrson)
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/fbae3fddc26f
Remove unused MediaSink::ClearSecondaryVideoContainer. r=mconley
https://hg.mozilla.org/integration/autoland/rev/3a71396093c4
Let cloneElementVisually() return a promise that resolves when frames have been rendered. r=alwu,mconley,bzbarsky
https://hg.mozilla.org/integration/autoland/rev/041cc5d3049e
Simplify setting the current frame in the secondary video container. r=alwu
https://hg.mozilla.org/integration/autoland/rev/c250c5f5059b
Stop doing EndCloningVisually in Unlink. r=mconley
https://hg.mozilla.org/integration/autoland/rev/ab56a61b9a2f
Clear future frames when VideoSink is finished. r=padenot
https://hg.mozilla.org/integration/autoland/rev/9e3e7cb95ee1
Make VideoFrameContainer::ClearFutureFrames keep the current frame instead of the first frame. r=padenot
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Flags: needinfo?(mconley)
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla72
Duplicate of this bug: 1532047
Whiteboard: [stockwell disabled] → [stockwell fixed:product]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
You need to log in before you can comment on or make changes to this bug.