Closed Bug 1407650 Opened 2 years ago Closed 9 months ago

Intermittent dom/media/test/test_mediarecorder_record_changing_video_resolution.html | Expected number of resize events - got 2, expected 3

Categories

(Core :: Audio/Video: Recording, defect, P5)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

References

Details

(Keywords: intermittent-failure, Whiteboard: [retriggered][stockwell unknown])

Attachments

(2 files)

Component: Audio/Video → Audio/Video: Recording
https://wiki.mozilla.org/Bugmasters#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
New failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=171272016&repo=mozilla-inbound&lineNumber=4880
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → INCOMPLETE
We captured this bug and made it debuggable in the Pernosco debugger:
https://pernos.co/debug/eYXBZg-wQLPVre1fsk6Azw/index.html
Access details here: https://mana.mozilla.org/wiki/display/TS/Debugging+with+Pernosco
Email me if you need any help.
I'll take a look.
Assignee: nobody → bugmail
Flags: needinfo?(bugmail)
I spent some time debugging the recording from comment 12, and while the tool is really useful, I think I lack the necessary mental model/code familiarity of the media recorder code to debug this quickly.

Just based on the test output and reading through the test code, it looks like the recording completes and the 'dataavailable' event is supposed to provide the blob that holds the webm video. I was able to inspect that blob in gdb in the recording [1], and it's a 442 byte blob that starts with 0x1a 0x45 0xdf 0xa3 which is the magic header for webm. However when I save that blob locally and try opening it in VLC it seems to be an empty video (no frames). So I'm not sure if the blob is bad or if the real data is stored elsewhere. I was hoping that inspecting the video would allow me to bisect the problem as either on the recording side or on the playback side of the test, but this doesn't match my expectations for either of those cases, so I must be doing something wrong.

The test output also has two instances of "Seeking to next frame finished. width=150, height=150" where I would expect that second one to be 100x100. However my attempts at stepping through the relevant code didn't get me very far because I'm not familiar with the multithreaded/promise architecture of the code and was having a hard time following stuff across threads.

Andreas, do you have any cycles to look at this, or provide any guesses as to where I can focus my debugging efforts?

[1] gdb command "x/442b 0x7f69c694c000" while the blob is alive should do it
Flags: needinfo?(apehrson)
Flags: needinfo?(gbrown)
In the last 7 days, there have been 33 failures.

All of the failures are on linux64-qr, opt and debug build type.

An example of a recent log file:
https://treeherder.mozilla.org/logviewer.html#?job_id=180902327&repo=autoland&lineNumber=2110

And the relevant part of the log:
[task 2018-05-30T13:07:32.164Z] 13:07:32     INFO - TEST-PASS | dom/media/test/test_mediarecorder_record_changing_video_resolution.html | onresize height should be as expected 
[task 2018-05-30T13:07:32.166Z] 13:07:32     INFO - Seeking to next frame finished. width=150, height=150
[task 2018-05-30T13:07:32.168Z] 13:07:32     INFO - Seeking to next frame
[task 2018-05-30T13:07:32.169Z] 13:07:32     INFO - Seeking to next frame finished. width=150, height=150
[task 2018-05-30T13:07:32.171Z] 13:07:32     INFO - Buffered messages finished
[task 2018-05-30T13:07:32.180Z] 13:07:32     INFO - TEST-UNEXPECTED-FAIL | dom/media/test/test_mediarecorder_record_changing_video_resolution.html | Expected number of resize events - got 2, expected 3
[task 2018-05-30T13:07:32.181Z] 13:07:32     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:312:5
[task 2018-05-30T13:07:32.182Z] 13:07:32     INFO -     startTest/mediaRecorder.onstop/video.onended@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:97:7
[task 2018-05-30T13:07:32.184Z] 13:07:32     INFO -     EventHandlerNonNull*startTest/mediaRecorder.onstop@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:96:5
[task 2018-05-30T13:07:32.185Z] 13:07:32     INFO -     EventHandlerNonNull*startTest@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:62:3
[task 2018-05-30T13:07:32.186Z] 13:07:32     INFO - GECKO(1566) | MEMORY STAT | vsize 1838MB | residentFast 147MB | heapAllocated 20MB

:drno As you are the triage owner of this component, could you please take a look at this?
Thank you!
Flags: needinfo?(drno)
Whiteboard: [retriggered] → [stockwell needswork][retriggered]
FWIW I do intend to look at this, but I haven't had time yet.
Unassigning, this is waiting for Andreas to respond and I'm not doing anything with it. If the failures become too frequent we can disable the test for webrender; I'm pretty sure the problem is with the test and not with webrender based on the intermittent failures on non-WR platforms.
Assignee: bugmail → nobody
The issue seems to be that we stop recording before the last frame from the canvas makes it through the MediaStream.

We change the size of the canvas the first time to 100x100 at time 26.075.
We change the size of the canvas the third and last time to 100x100 at time 26.380.

We encode the first frame of 100x100 at time 26.436.
We encode the second frame of 150x150 at time 26.560.

We reach end of stream at time 26.560. At this point mIncomingBuffer contains the third frame, but mOutgoingBuffer does not, so it does not get encoded.

I need to ponder a bit about this but I think it might make sense to flush through mIncomingBuffer on NotifyEndOfStream. Video is pushed so it could have data reaching into the future. With the current data producers we have that shouldn't be too much (10 frames) so it's likely not a problem.

For semantics I think a producer that ends and wants to retract frames should call ClearCurrentFrames(). The question is what removing a listener should mean. That is triggered by MediaRecorder::Stop() and not the frame producer. Currently that is shortcut to VideoTrackEncoder::NotifyEndOfStream and is what's triggering this bug.
Assignee: nobody → apehrson
Flags: needinfo?(drno)
Flags: needinfo?(apehrson)

In the last 7 days, there have been 31 failures.

Most of the failures are on linux64-qr and osx-10-10.

Affected build types: debug, opt.

An example of a recent log file: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=222409505&repo=autoland&lineNumber=2530

And the relevant part of the log:
[task 2019-01-17T10:35:11.893Z] 10:35:11 INFO - TEST-PASS | dom/media/test/test_mediarecorder_record_changing_video_resolution.html | onresize height should be as expected
[task 2019-01-17T10:35:11.894Z] 10:35:11 INFO - Seeking to next frame finished. width=150, height=150
[task 2019-01-17T10:35:11.896Z] 10:35:11 INFO - Seeking to next frame
[task 2019-01-17T10:35:11.898Z] 10:35:11 INFO - Seeking to next frame finished. width=150, height=150
[task 2019-01-17T10:35:11.899Z] 10:35:11 INFO - Buffered messages finished
[task 2019-01-17T10:35:11.901Z] 10:35:11 INFO - TEST-UNEXPECTED-FAIL | dom/media/test/test_mediarecorder_record_changing_video_resolution.html | Expected number of resize events - got 2, expected 3
[task 2019-01-17T10:35:11.901Z] 10:35:11 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:320:5
[task 2019-01-17T10:35:11.902Z] 10:35:11 INFO - startTest/mediaRecorder.onstop/video.onended@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:97:7
[task 2019-01-17T10:35:11.907Z] 10:35:11 INFO - EventHandlerNonNullstartTest/mediaRecorder.onstop@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:96:5
[task 2019-01-17T10:35:11.909Z] 10:35:11 INFO - EventHandlerNonNull
startTest@dom/media/test/test_mediarecorder_record_changing_video_resolution.html:62:3
[task 2019-01-17T10:35:11.911Z] 10:35:11 INFO - GECKO(1832) | MEMORY STAT | vsize 1649MB | residentFast 145MB | heapAllocated 20MB

:[drno] As you are the triage owner of this component, could you please take a look at this?
Thank you!

Flags: needinfo?(drno)
Whiteboard: [retriggered][stockwell unknown] → [retriggered][stockwell unknown][stockwell needswork]
Whiteboard: [retriggered][stockwell unknown][stockwell needswork] → [retriggered][stockwell needswork]

If the analysis in comment 21 still holds, there's a race between capturing a frame on main thread, piping it through the MediaStreamGraph; and MediaRecorder::Stop running on the main thread, then running as a ControlMessage on the graph thread.

The stop message overtakes the frame because ControlMessage run at the beginning of a graph iteration [1].

Medium-term this should be fixed by bug 1423253 (because the frame wouldn't be piped via the graph anymore).

Short-term I could delay the stop() a bit to make this intermittent much less likely.

[1] https://searchfox.org/mozilla-central/rev/dac799c9f4e9f5f05c1071cba94f2522aa31f7eb/dom/media/MediaStreamGraph.cpp#1392

Depends on: 1423253
Flags: needinfo?(drno)
Keywords: leave-open
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/f8a2de33c25f
Avoid test_mr_record_changing_video_resolution.html by delaying stop(). r=jib

This has been fixed long-term by bug 1423253, and the temporary workaround here has been removed.

Status: REOPENED → RESOLVED
Closed: 2 years ago9 months ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla68
You need to log in before you can comment on or make changes to this bug.