Open Bug 1557980 Opened 5 years ago Updated 2 years ago

MediaStream returned by canvas.captureStream() using "bitmaprenderer" and transferFromImageBitmap not recorded by MediaRecorder; MediaStreamTrack mute event not fired

Categories

(Core :: WebRTC, defect, P3)

Version: 69 Branch
Type: defect


People

(Reporter: guest271314, Unassigned)

Details

Attachments

(3 files)

Attached file index.html

User Agent: Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/76.0.3806.1 Chrome/76.0.3806.1 Safari/537.36

Steps to reproduce:

  1. Within a JavaScript module:
     a) create an HTML <video> element;
     b) create a ReadableStream instance;
     c) export the ReadableStream instance to a <script type="module">;
     d) when playback of the media begins, pass the <video> element to createImageBitmap() and pass the resulting ImageBitmap to ReadableStreamDefaultController.enqueue();
     e) repeat d) using requestAnimationFrame until playback of the media is paused.

  2. Within the <script type="module">:
     a) import the exported ReadableStream;
     b) create a <video> and a <canvas>;
     c) set the context of the <canvas> to "bitmaprenderer";
     d) create a MediaStream using canvas.captureStream(0), create a MediaRecorder instance with the MediaStream passed as a parameter, and set the MediaStream instance as the srcObject of a <video> element in the HTML document;
     e) for each read of enqueued data, call canvas.transferFromImageBitmap() and requestFrame() of the MediaStream or MediaStreamTrack;
     f) repeat e) until the ReadableStreamDefaultReader's done is true;
     g) call video.pause();
     h) at the video pause event or the MediaStreamTrack mute event, call MediaRecorder.stop() (a minimal sketch of this pipeline follows these steps).
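
A minimal sketch of the step 2 pipeline, assuming the ReadableStream exported in step 1 is imported as frames; the name frames and the wiring around it are illustrative, not the attached code:

<script type="module">
  import { frames } from "./module.js"; // hypothetical: the exported ReadableStream of ImageBitmaps
  (async () => {
    const canvas = document.createElement("canvas");
    canvas.width = canvas.height = 500;
    const ctx = canvas.getContext("bitmaprenderer");
    // captureStream(0) delivers a frame only when requestFrame() is called
    const stream = canvas.captureStream(0);
    const [track] = stream.getVideoTracks();
    const recorder = new MediaRecorder(stream);
    recorder.ondataavailable = e => console.log("recorded Blob:", e.data.size, e.data.type);
    document.querySelector("video").srcObject = stream;
    recorder.start();
    const reader = frames.getReader();
    let result;
    while (!(result = await reader.read()).done) {
      ctx.transferFromImageBitmap(result.value);
      // requestFrame() is defined on CanvasCaptureMediaStreamTrack; some implementations expose it on the stream
      (track.requestFrame ? track : stream).requestFrame();
    }
    recorder.stop();
  })();
</script>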

Actual results:

  1. The <video> element at HTML document renders only transparent frames
  2. The mute event of MediaStreamTrack is not fired when images are no longer transferred to the <canvas> element
  3. The resulting Blob at the MediaRecorder dataavailable event has a size of 217 bytes and does not play when passed to URL.createObjectURL(); a "No video with supported format and MIME type found." error is rendered at the <video> element where the Blob URL is set as src

Expected results:

  1. The images transferred to the <canvas> by transferFromImageBitmap() should be rendered by the <video> element whose srcObject is set to the MediaStream returned by canvas.captureStream()
  2. The mute event of MediaStreamTrack should be fired when "The MediaStreamTrack object's source is temporarily unable to provide data.", and the ended event of MediaStreamTrack should also be fired when "The MediaStreamTrack object's source will no longer provide any data, either because the user revoked the permissions, or because the source device has been ejected, or because the remote peer permanently stopped sending data."
    https://w3c.github.io/mediacapture-main/#event-summary
  3. The resulting Blob at the MediaRecorder dataavailable event should be a webm file consisting of the images recorded using canvas.transferFromImageBitmap() and requestFrame() of the MediaStream or MediaStreamTrack

plnkr consisting of attached files (index.html and script.js) to demonstrate the issue

Product: Firefox → Core
Component: Untriaged → Canvas: 2D
Attachment #9070808 - Attachment description: script.js → script.js https://plnkr.co/edit/5bvp9xv0ciMYfVzG?p=preview
Attachment #9070808 - Attachment description: script.js https://plnkr.co/edit/5bvp9xv0ciMYfVzG?p=preview → script.js (https://plnkr.co/edit/5bvp9xv0ciMYfVzG?p=preview)

Essentially the same code as index.html and script.js in one HTML file

<!DOCTYPE html>
<html>
<head>
</head>
<body>
  <video id="source" autoplay muted controls></video><br>
  <video id="sink" autoplay muted controls></video>
  <script>
    (async() => {
      const [source, sink] = document.querySelectorAll("video");
      const canvas = document.createElement("canvas");
      canvas.width = canvas.height = 500;
      // "bitmaprenderer" context: frames are supplied via transferFromImageBitmap()
      const ctx = canvas.getContext("bitmaprenderer");
      // capture the canvas at up to 60 fps
      const canvasStream = canvas.captureStream(60);
      const [videoTrack] = canvasStream.getVideoTracks();
      console.log(videoTrack);
      // log mute/unmute events on the captured track
      videoTrack.onunmute = e => console.log(e);
      videoTrack.onmute = e => console.log(e);
      // fetch the source media as a Blob so the source <video> uses a same-origin blob: URL
      const response = await (await fetch("https://upload.wikimedia.org/wikipedia/commons/d/d9/120-cell.ogv")).blob();
      source.src = URL.createObjectURL(response);
      sink.srcObject = canvasStream;
      source.onplay = e => {
        // transfer each frame of the playing source <video> to the canvas
        requestAnimationFrame(async function draw() {
          if (source.paused) {
            return;
          } else {
            const imageBitmap = await createImageBitmap(source);
            console.log(imageBitmap);
            ctx.transferFromImageBitmap(imageBitmap);
            requestAnimationFrame(draw);
          }
        });
      }
    })();
  </script>
</body>
</html>

What is the reason the ImageBitmap is not drawn onto the <canvas>?

Component: Canvas: 2D → WebRTC

It seems to me that ImageBitmapRenderingContext::GetSurfaceSnapshot() [1] doesn't return any image for us when we're capturing this canvas.
Unclear so far whether this is a bug in the context or the captureStream code.

(In reply to guest271314 from comment #0)

  1. The mute event of MediaStreamTrack should be fired when "The MediaStreamTrack object's source is temporarily unable to provide data.", and the ended event of MediaStreamTrack should also be fired when "The MediaStreamTrack object's source will no longer provide any data, either because the user revoked the permissions, or because the source device has been ejected, or because the remote peer permanently stopped sending data."
    https://w3c.github.io/mediacapture-main/#event-summary

These events are not expected. The canvas doesn't know there won't be more frames coming. One could come at any time. The application might know, but since we cannot read its mind, we don't know that we should dispatch any events.

An argument could be made for some event on a tainted webgl context, but I think that's about it.

[1] https://searchfox.org/mozilla-central/rev/e42e1a1be58abbd81efd8d78d6d0699979f43138/dom/canvas/ImageBitmapRenderingContext.cpp#163

Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P3

(In reply to Andreas Pehrson [:pehrsons] from comment #4)

An argument could be made for some event on a tainted webgl context, but I think that's about it.

[1] https://searchfox.org/mozilla-central/rev/e42e1a1be58abbd81efd8d78d6d0699979f43138/dom/canvas/ImageBitmapRenderingContext.cpp#163

The canvas in the code should not be tainted.

That is, the <video> identified as source has its src set to a Blob URL, which is local to the document, not cross-origin.

Just noticed

These events are not expected. The canvas doesn't know there won't be more frames coming. One could come at any time. The application might know, but since we cannot read its mind, we don't know that we should dispatch any events.

The canvas does know that more frames are not coming at that specific point in time, proven by the fact that no frames are coming at that moment in time. You can read the mind of the application - by reading its input and output. No input means mute should be dispatched; if input does come, unmute should be dispatched. That is a rational expectation given that the mute and unmute events are described to do just that.

For example, running different variations of code for this question https://stackoverflow.com/q/56510151 (at Chromium) will dispatch mute and unmute several times in succession, as pointed out by Kaiido https://stackoverflow.com/questions/56510151/change-playout-delay-in-webrtc-stream#comment99624787_56510151 (note also the issue of a DOMException error still being thrown at Firefox even when unmute of MediaStreamTrack has already been fired). Am simply stating that the various implementations are inconsistent with the actual language of the specifications, depending on whether WebRTC or HTMLMediaElement and <canvas> are used.

Attached file data URL of canvas

If canvas.toDataURL() is called on the <canvas> after ctx.transferFromImageBitmap(imageBitmap), the image is output.
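
For illustration, a minimal check of that observation, assuming the canvas, ctx and imageBitmap variables from the single-file example above:

// Assumes canvas, ctx and imageBitmap from the example above.
ctx.transferFromImageBitmap(imageBitmap);
// The frame is present in the canvas backing store...
console.log(canvas.toDataURL("image/png").length); // a non-trivial data URL, not a blank frame
// ...yet the <video> playing the captured MediaStream renders only transparent frames.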

(In reply to guest271314 from comment #7)

Just noticed

These events are not expected. The canvas doesn't know there won't be more frames coming. One could come at any time. The application might know, but since we cannot read its mind, we don't know that we should dispatch any events.

The canvas does know that more frames are not coming at that specific point in time, proven by the fact that no frames are coming at that moment in time. You can read the mind of the application - by reading its input and output. No input means mute should be dispatched; if input does come, unmute should be dispatched. That is a rational expectation given that the mute and unmute events are described to do just that.

Define "input"?

(In reply to Andreas Pehrson [:pehrsons] from comment #10)

(In reply to guest271314 from comment #7)

Just noticed

These events are not expected. The canvas doesn't know there won't be more frames coming. One could come at any time. The application might know, but since we cannot read its mind, we don't know that we should dispatch any events.

The canvas does know that more frames are not coming at that specific point in time, proven by the fact that no frames are coming at that moment in time. You can read the mind of the application - by reading its input and output. No input means mute should be dispatched; if input does come, unmute should be dispatched. That is a rational expectation given that the mute and unmute events are described to do just that.

Define "input"?

In this case, "input" is ctx.transferFromImageBitmap(imageBitmap); being called repeatedly per const canvasStream = canvas.captureStream(60); (the same holds for captureStream(0) with requestFrame() being called, compared to requestFrame() not being called) within requestAnimationFrame(async function draw() {}), compared to return; (no input) within the same animation frame provider function.
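
A sketch of the distinction being described, using captureStream(0) so that a frame is only delivered when requestFrame() is called; the names here are illustrative:

const canvas = document.createElement("canvas");
const ctx = canvas.getContext("bitmaprenderer");
const stream = canvas.captureStream(0); // 0 fps: frames are delivered only via requestFrame()
const [track] = stream.getVideoTracks();

async function draw(source) {
  if (source.paused) {
    // "no input": nothing is transferred and requestFrame() is not called;
    // the expectation stated above is that the track would fire "mute" here
    return;
  }
  // "input": a frame is transferred to the canvas and explicitly pushed into the stream
  ctx.transferFromImageBitmap(await createImageBitmap(source));
  (track.requestFrame ? track : stream).requestFrame();
  requestAnimationFrame(() => draw(source));
}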

(In reply to Andreas Pehrson [:pehrsons] from comment #10)

(In reply to guest271314 from comment #7)

Just noticed

These events are not expected. The canvas doesn't know there won't be more frames coming. One could come at any time. The application might know, but since we cannot read its mind, we don't know that we should dispatch any events.

Relevant to what input is, or how the implementation interprets input, perhaps you are more well-suited than this user to explain why mute and unmute events are dispatched in succession at this jsfiddle https://jsfiddle.net/be5672am/ (a fork of the code at the above linked SO question where only mute and unmute event handlers were added to the code). Note that the code needs to be run at Chromium/Chrome to observe the cycle of mute/unmute events being fired, as Firefox is still broken with respect to the NotSupportedError: Operation is not supported error, referencing

dispatchEvent resource://gre/modules/media/PeerConnection.jsm:748
_fireLegacyAddStreamEvents resource://gre/modules/media/PeerConnection.jsm:1403
onSetRemoteDescriptionSuccess resource://gre/modules/media/PeerConnection.jsm:1768
haveSetRemote resource://gre/modules/media/PeerConnection.jsm:1079
haveSetRemote resource://gre/modules/media/PeerConnection.jsm:1076
InterpretGeneratorResume self-hosted:1284
AsyncFunctionNext self-hosted:839

that prevents code from running

That is, it prevents the MediaRecorder code from running.

#10 To be clear, after a modicum of experimenting with WebRTC within the context of trying to use MediaRecorder specifically to record multiple MediaStreamTracks captured using captureStream() to a single webm file as output, have found that mute, unmute and ended events are not reliably fired in a consistent manner compared to WebRTC dispatching the same events.

This user's primary focus at this point, in lieu of "multiple tracks" being clearly defined re MediaRecorder, is to find a way to implement mkvmerge in JavaScript, where a user can execute mkvmerge({w: true, o: "int_all.webm", i: ["int.webm", "int1.webm", ...."intN.webm"]}) in JavaScript code (https://twitter.com/guest271314/status/1127605675910025216; https://github.com/emscripten-core/emsdk/issues/260) instead of $ mkvmerge -w -o int_all.webm int.webm + int1.webm (which also sets the duration of the resulting webm or mkv file) at the terminal - instead of waiting for MediaRecorder to have the ability to record multiple MediaStreamTracks of kind "video" to a single file (not necessarily multiple video tracks in the same container; more the effect of using RTCRtpSender.replaceTrack() https://github.com/w3c/mediacapture-record/issues/167).

Have not yet learned how to use Emscripten to achieve that requirement, though have previously used Native Messaging to run espeak and espeak-ng at Chromium to output a file of the synthesized audio. Unfortunately lost the OS where the tests were performed, though have a rough model of the process preserved at https://stackoverflow.com/q/48219981. Would prefer to have access to the executable without having to use a "Chrome app". Eventually (re-reading https://github.com/GoogleChromeLabs/webm-wasm/blob/master/build.sh right now) will determine how to use LLVM and/or Emscripten to perform the port of the C++ code to JavaScript.

Just happened to encounter this bug along the way.

Interestingly, trying to merge a webm file created by Firefox with a webm file created by Chromium will log an error when using mkvmerge: Error: The track number 0 from the file 'chromium.webm' cannot be appended to the track number 0 from the file 'firefox.webm'. The formats do not match.

(In reply to guest271314 from comment #11)

In this case, "input" is ctx.transferFromImageBitmap(imageBitmap); being called repeatedly per const canvasStream = canvas.captureStream(60); (the same holds for captureStream(0) with requestFrame() being called, compared to requestFrame() not being called) within requestAnimationFrame(async function draw() {}), compared to return; (no input) within the same animation frame provider function.

So you're saying that if a CanvasCaptureMediaStreamTrack track had its internal frameCaptureRequested property set to true when the canvas was rendered, without anything having been drawn to the canvas since the last time it was rendered, the track should fire "mute"?

That could easily be a very spammy event.

Since the application is fully aware that it didn't draw anything to the canvas, why does it need to be told by the "mute" event?

IMHO the sole purpose of "mute" is to signal that the source of a track is unable to provide data, when this was outside the control of the application - because that's the only way the application could become aware of this. In the case you're describing it's not outside the control of the application, since the application is controlling the drawing to the canvas.

But, if you want this functionality I suggest you work to get it into the spec for HTMLCanvasElement.captureStream(). That's the only way this could end up interoperable, as the language around when to mute and unmute in mediacapture-main is vague.
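
To illustrate the point that the application already knows when it stops drawing, a sketch of an application-level substitute for mute; this is not spec behavior, only something a page could do on its own:

// The application controls the drawing, so it can signal "no more input" itself.
const signals = new EventTarget();
let lastDraw = performance.now();

// Call this after each ctx.transferFromImageBitmap()/requestFrame() pair.
function drewFrame() {
  lastDraw = performance.now();
}

// Periodically check whether the draw loop has gone quiet.
setInterval(() => {
  if (performance.now() - lastDraw > 1000) {
    signals.dispatchEvent(new Event("appmute"));
  }
}, 1000);

signals.addEventListener("appmute", () => console.log("no frames drawn for 1 second"));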

(In reply to guest271314 from comment #12)

Relevant to what input is, or how the implementation interprets input, perhaps you are more well-suited than this user to explain why mute and unmute events are dispatched in succession at this jsfiddle https://jsfiddle.net/be5672am/

Didn't look at the events, but fixed some bugs in that fiddle. Can't answer questions on MSE though. https://jsfiddle.net/pehrsons/kj5gm09c/

(In reply to guest271314 from comment #15)

Interestingly, trying to merge a webm file created by Firefox with a webm file created by Chromium will log an error when using mkvmerge: Error: The track number 0 from the file 'chromium.webm' cannot be appended to the track number 0 from the file 'firefox.webm'. The formats do not match.

VP8 vs VP9?

#16

Didn't look at the events, but fixed some bugs in that fiddle.

That was the point of sharing the link: to illustrate how mute and unmute are dispatched by the Chromium implementation.

VP8 vs VP9?

VP8. The files, represented as XML (using vi's mkv2xml at https://github.com/vi/mkvparse), are at https://gist.github.com/guest271314/f942fb9febd9c197bd3824973794ba3e and https://gist.github.com/guest271314/17d62bf74a97d3f111aa25605a9cd1ca. The webm output by Firefox is always of greater size than the file output by Chromium (https://stackoverflow.com/q/54651869).

Changes the landscape. The files are not interoperable. Am considering filing an issue as to that point.
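
One thing that may narrow the gap between the two browsers' output is to request an explicit codec when constructing the MediaRecorder. A sketch, assuming both browsers support VP8 in WebM; this does not by itself guarantee that mkvmerge will treat the tracks as matching:

// Ask both browsers for the same codec, falling back to the default if unsupported.
const mimeType = "video/webm;codecs=vp8";
const options = MediaRecorder.isTypeSupported(mimeType) ? { mimeType } : {};
const recorder = new MediaRecorder(canvasStream, options); // canvasStream from the example above
recorder.ondataavailable = e => console.log(e.data.type, e.data.size);
recorder.start();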

For the SO question MSE will probably not help with what the OP is trying to achieve, and the learning curve with MSE "segments" mode is not sharp, not even compared to WebRTC; "sequence" mode has synchronization issues. At least MSE "works" with MediaRecorder at Firefox; the lingering crash issue at Chromium still exists. Wound up creating a version which stores N seconds of images, then continues to push and shift from the array: https://plnkr.co/edit/gGFQHS?p=preview.

As you can probably estimate, was trying to find a way to write <SimpleBlock>s (as XML or HTML) to a Matroska container, to overcome MediaRecorder not being able to record "multiple tracks".

We should be able to do something like MediaRecorder.write('<SimpleBlock><track>2</track><timecode>0.0</timecode><keyframe/><data>...</SimpleBlock>')

(In reply to Andreas Pehrson [:pehrsons] from comment #16)

(In reply to guest271314 from comment #11)
That's the only way this could end up interoperable, as the language around when to mute and unmute in mediacapture-main is vague.

Agree. Further, the purpose and capability of front-end usage of the various WebRTC extensions can be confusing to users who were not previously expecting to be confused by language in the relevant specification(s); see, in pertinent part, https://gist.github.com/jimmywarting/2296342bfb7a31ad26068659d1fd9fc9#gistcomment-2945243 and https://github.com/w3c/webrtc-pc/issues/2213.

BTW, the gist of the original two file attachments "index.html" and "script.js" was an attempt to implement <video> essentially "offline", or in a Worker thread (experimented with using ES Modules first); see, for example, https://github.com/guest271314/mediacapture-worker, from which, if the links are followed correctly, the createImageBitmap() implementation at Firefox was spawned.

Continuing the same line of reasoning, have been considering what would be necessary to load an HTML document with only a single element, similar to how the paintWorklet (stateless) global scope has a limited set of defined objects at globalThis; that is, a videoWorklet.

Of course, loading <video> means loading the Web Media Player code implemented at both Firefox and Chromium, and the associated decoders and encoders, which leads to the concept of exposing the Media Decoder, Media Encoder and WebM Writer code as APIs available to the front-end - which would allow users to bypass <video> and MediaRecorder.

At Nightly 73 the ImageBitmap is displayed on the <canvas> and MediaRecorder records when "bitmaprenderer" and transferFromImageBitmap() are used.

Severity: normal → S3