Closed Bug 1492038 Opened 6 years ago Closed 5 years ago

Subscribing to WebRTC H.264 video coming from Safari 12 freezes

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

62 Branch
Desktop
macOS
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 - wontfix
firefox62 + wontfix
firefox63 + wontfix
firefox64 + wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: adam, Assigned: dminor)

References

Details

Today Safari updated to 12 and I noticed that it broke WebRTC video going to Firefox.

Steps to repro:

1. Go to app.rtc (or talky.io, or meet.tokbox.com) on Safari 12 and join a room.
2. Join the same room in Firefox 62 (or 60)

Result: The video displays and then immediately freezes in Firefox.

The same issue is not there for Safari 11, it is also not there if you use Safari on iOS 12, only on Desktop from Mac OS.
I'm also able to reproduce this issue on our end (Confrere) with Firefox Nightly 64.0a1 (2018-09-17), as well as stable. We are currently having to reroute Firefox users to other browsers in the meantime, as our Safari usage is quite high (30%) on our platform.
Unfortunately, I'm not on latest OSX. Dan, can you reproduce and have a look?
Rank: 12
Flags: needinfo?(dminor)
Priority: -- → P2
I reproduced it on High Sierra, and I believe Safari 12 was released for Sierra as well (if that gets you closer to the version you're on Jan-Ivar).

Also, just tested with Safari Technology Preview Release 65 (Safari 12.1, WebKit 13607.1.5.2) and found no issue with Firefox. Seems this issue is only with Version 12.0 (13606.2.11)
(In reply to Jan-Ivar Bruaroey [:jib] (needinfo? me) from comment #2)
> Unfortunately, I'm not on latest OSX. Dan, can you reproduce and have a look?

I'm trying to get the webrtc.org update landed in Firefox 64 so I don't have a lot of extra time to take on a high priority bug at the moment. Maybe Andreas can have a look?
Flags: needinfo?(dminor) → needinfo?(apehrson)
I'll give it a go.
Assignee: nobody → apehrson
Status: NEW → ASSIGNED
Flags: needinfo?(apehrson)
I'm wondering if this is related to the H264 NACK problems which have been reported.
[Tracking Requested - why for this release]: Possibly a Safari issue, but worth tracking.
Whiteboard: [webcompat]
Flags: webcompat?
Whiteboard: [webcompat]
Just for reference: maybe this is related to bug 1489757 on the Firefox side?
See Also: → 1489757
So I just tested a local build with the patch from bug 1489757. And that seems to fix the issue.
Depends on: 1489757
Very unlikely we'll ship a fix for this before Fx63, but tracking for Fx62 on the off chance that we end up shipping another dot release before then.
So here is what Firefox RTP logging tells me:

Fx 64 <-> STP (65 (Safari 12.1, WebKit 13607.1.5.2))  = NACKs & PLIs both work => video works normal
Fx 64 <-> Safari 12 (12.0 (13606.2.11))          = No NACKs, No PLIs           => video freezes
Fx 64 with patch from 1489757 <-> Safari 12      = No NACKs, PLIs work         => video works, but yerky
What I can see in rr with frozen video is that RTP packets are coming in, are decoded through GMP (thus OpenH264), and enter the MediaStreamGraph as expected. For 24 frames during ~0.6s (per TimeStamp::Now()), then it stops.

It looks like we keep feeding frames to GMP-OpenH264 but we're not getting any decoded ones back. Instead OpenH264 tells us there has been a GMPDecodeErr through [1]. Even though a comment [2] says the error was catastrophic and we will stop sending frames to decode, it appears that we keep sending them.

I don't think I can go deeper here.

Nils, who can debug OpenH264?


[1] https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.h#431-433
[2] https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/dom/media/gmp/gmp-api/gmp-video-decode.h#61-63
Assignee: apehrson → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(drno)
Sijia and Karina, please help look into the decoding error. Thanks.
Flags: needinfo?(sijchen)
Flags: needinfo?(ruil2)
Flags: needinfo?(drno)
Flags: needinfo?(sijchen) → needinfo?(huili2)
I finally managed to get a setup with a custom-built openh264 library working. The only thing of value so far is this from the log,

> Error: Decoding error dState=34

I can continue looking tomorrow.
Dan, can you please take this over from Andreas? I think this has higher priority then the webrtc.org update.
Assignee: nobody → dminor
(In reply to Andreas Pehrson [:pehrsons] from comment #13)
> What I can see in rr with frozen video is that RTP packets are coming in,
> are decoded through GMP (thus OpenH264), and enter the MediaStreamGraph as
> expected. For 24 frames during ~0.6s (per TimeStamp::Now()), then it stops.
> 
> It looks like we keep feeding frames to GMP-OpenH264 but we're not getting
> any decoded ones back. Instead OpenH264 tells us there has been a
> GMPDecodeErr through [1]. Even though a comment [2] says the error was
> catastrophic and we will stop sending frames to decode, it appears that we
> keep sending them.

Chris - I think you added a stipulation about not sending any more frames to decode that does not exist in the webrtc OpenH264 input stream....  Not sure if that's *why* it's failing, but at minimum the comment is wrong.  See link [2]

> [1]
> https://searchfox.org/mozilla-central/rev/
> a0333927deabfe980094a14d0549b589f34cbe49/media/webrtc/signaling/src/media-
> conduit/WebrtcGmpVideoCodec.h#431-433
> [2]
> https://searchfox.org/mozilla-central/rev/
> a0333927deabfe980094a14d0549b589f34cbe49/dom/media/gmp/gmp-api/gmp-video-
> decode.h#61-63
Flags: needinfo?(cpearce)
I think the code comment about Gecko not sending more input after receiving an error callback was true for the playback GMP consumer of this API, but evidently the WebrtcGmpVideoDecoder does not honour that.

I think the WebRTCGmpVideoDecoder should be propagating this error. It looks like it's trying to [1] but maybe that's not happening for some reason here, or maybe the upstream caller isn't handling the propagated error.

I suspect the problem here is that Safari is sending video packets that OpenH264 can't decode. I'll let Cisco's engineers comment on that.

[1] https://searchfox.org/mozilla-central/rev/a0333927deabfe980094a14d0549b589f34cbe49/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#912
Flags: needinfo?(cpearce)
Also, the intention for implementations of the GMPVideoDecoder API for a non-catastrophic error would be to discard the GOP and try to start decoding again at the start next GOP. To achieve this the GMPVideoDecoder impl would need to keep calling the InputDataExhausted() callback at least until Gecko delivered the keyframe at the start of the next GOP.
(In reply to Andreas Pehrson [:pehrsons] from comment #15)
> I finally managed to get a setup with a custom-built openh264 library
> working. The only thing of value so far is this from the log,
> 
> > Error: Decoding error dState=34
> 
> I can continue looking tomorrow.

Replaying this recording now. 34 means `dsRefLost | dsDataErrorConcealed`.
The dsRefLost error happens first, at [1]. The previous frame number is 21 and the current frame number is 27 in my recording.

Subclause 8.2.5.2 in the spec says that when gaps_in_frame_num_value_allowed_flag is 0 (it is) and frame_num (27) is not equal to PrevRefFrameNum (21) and is not equal to ( PrevRefFrameNum + 1 ) (22) % MaxFrameNum (4096), the decoding process should infer an unintentional loss of pictures.

And, well, it does result in a loss of pictures. Without more context this seems like an encoder issue on Safari's part. I'll notify Apple with this information.


[1] https://github.com/cisco/openh264/blob/3c93d6bedfb712109899755b6d9626772cee3847/codec/decoder/core/src/decoder_core.cpp#L2517-L2519
I added a link to these details to the corresponding webkit ticket https://bugs.webkit.org/show_bug.cgi?id=189690#c7
(In reply to Chris Pearce (:cpearce) from comment #19)
> Also, the intention for implementations of the GMPVideoDecoder API for a
> non-catastrophic error would be to discard the GOP and try to start decoding
> again at the start next GOP. To achieve this the GMPVideoDecoder impl would
> need to keep calling the InputDataExhausted() callback at least until Gecko
> delivered the keyframe at the start of the next GOP.

The webrtc code doesn't necessarily have GOPs (typically it doesn't); IDRs/Keyframes occur in response to unrecoverable losses or decoding errors (like in this case).  It's made worst (until andj223's patch lands/landed) by the WebRTC GMP Decode call having been moved from sync to async, which lost the return value (error), and so the webrtc code assumed all-was-well, and no IDR is requested.

In realtime communication, a viable option is to NOT stop decoding on loss of data - yes there will be errors in the video until an IDR is received, but it's better to see the other person moving with a few wrong pixels/blocks than to have everything freeze for a longish time.  (though I'll admit this isn't what upstream webrtc code does, or what we do.  Also, recovery-via-nack-and-rexmit is pretty effective and avoids needing an IDR in most packetloss cases, and for that you need to freeze. (Ok, you don't *have* to, but not doing so is tricky/expensive.)
I found another freezing issue with H.264 in Firefox, I'm not sure if it's related because it's happening even Firefox to Firefox, Safari 12 does not need to be involved. 
https://bugzilla.mozilla.org/show_bug.cgi?id=1493079
I started looking into this today. I'm seeing a different error than what Andreas encountered:

[OpenH264] this = 0x0x7fba8834e780, Debug:reference picture introduced by this frame is lost during transmission! uiTId: 0

which occurs at [1]. Digging in, this is caused by an error in WelsReorderRefList at [2].

Interestingly, if I comment out the error handling if statement around that error at [3] and instead continue as if no error occurred, the video runs perfectly smoothly, even with the patch from Bug 1489757 removed. Perhaps this means that the error handling in [2] is stricter than it needs to be. I'll dig in further.

[1] https://github.com/cisco/openh264/blob/openh264v1.7.1/codec/decoder/core/src/decoder_core.cpp#L2395
[2] https://github.com/cisco/openh264/blob/openh264v1.7.1/codec/decoder/core/src/manage_dec_ref.cpp#L247
[3] https://github.com/cisco/openh264/blob/openh264v1.7.1/codec/decoder/core/src/decoder_core.cpp#L2390
@Dan

How did you reproduce this issue? if possible, could you send the bitstream to us. we can do some analysis on this. thanks!
Flags: needinfo?(ruil2)
Hi Karina,

To reproduce this:
1) On an OS X laptop with Safari 12, connect to an appear.in room, e.g. https://appear.in/dminor
2) From an another machine (I was using Ubuntu 18.04) connect to the same room.

With the patch from Bug 1489757 applied, you will see that the video is slow and choppy. If you remove that patch, the video will freeze after a few seconds, which makes it easier to see the problem.

If you want to test openh264 changes in Firefox, all you need to do is build the plugin and copy it over to the user's profile directory, under gmp-gmpopenh264/1.7.1.

Andreas forwarded the bitstream to you by email. Thanks for looking at this!
thanks for this info.

had received bitstream, is digging into openh264 decoder. will update if have any progress.
After digging into decoder, we found above code not exactly the root cause. Actually it is due to this syntax "max_long_term_frame_idx_plus1" not correctly set in Safari 12 encoder. It should be set by encoder through MMCO=4 operation, missed in encoder.
We can prevent this simply by commenting here:
https://github.com/cisco/openh264/blob/cbf45b2275dc733c52cd122e51f51f26fd54c341/codec/decoder/core/src/manage_dec_ref.cpp#L415
Flags: needinfo?(huili2)
(In reply to wayne from comment #29)
> After digging into decoder, we found above code not exactly the root cause.
> Actually it is due to this syntax "max_long_term_frame_idx_plus1" not
> correctly set in Safari 12 encoder. It should be set by encoder through
> MMCO=4 operation, missed in encoder.
> We can prevent this simply by commenting here:
> https://github.com/cisco/openh264/blob/
> cbf45b2275dc733c52cd122e51f51f26fd54c341/codec/decoder/core/src/
> manage_dec_ref.cpp#L415

Thanks a lot for digging in!

Is this something which needs to get fixed on the Apple encoder side, or can we easily make OpenH264 flexible enough to accept this without breaking too much?
QA Contact: jib
Bug 1496529 would provide an alternative to all those problems (and lower CPU usage too)
See Also: → 1496529
QA Contact: jib
I suggest it be fixed on Apple encoder side. Otherwise, the above modification can also be an alternation as in comment 29.
Punting this for ESR60 to the next cycle, though it sounds like we may end up just calling it wontfix there.
Blocks: 1486988
Depends on: 1505284
From my point of view bug 1505284 can not be the solution for this problem for two reasons:
- I don't see us uplifting changes from bug 1505284 fast enough to address this issue quickly
- Resolving bug 1505284 would only solve the problem for Firefox users on Mac, but the problem remains on all other platforms.
No longer depends on: 1505284
(In reply to Nils Ohlmeier [:drno] from comment #35)
> From my point of view bug 1505284 can not be the solution for this problem
> for two reasons:
> - I don't see us uplifting changes from bug 1505284 fast enough to address
> this issue quickly
> - Resolving bug 1505284 would only solve the problem for Firefox users on
> Mac, but the problem remains on all other platforms.

bug 1505284 will resolve the issue on all platforms (mac, windows, linux, bsd) not just mac but about < 1% of users not installing Windows Media Pack or not having ffmpeg installed (and there's no guarantee that this 1% overlap webrtc users).

1505284 is also just a pref change.

I do agree that we need the openh264 resolution still
See Also: → 1505284
(In reply to wayne from comment #32)
> I suggest it be fixed on Apple encoder side. Otherwise, the above
> modification can also be an alternation as in comment 29.

Has someone managed to contact Apple about their encoder issue?

In the meantime, if it doesn't cause regressions with other streams in practice, can this be wallpapered in OpenH264 while also putting pressure on Apple?
Flags: needinfo?(ruil2)
Flags: needinfo?(huili2)
Flags: needinfo?(drno)
(In reply to Randell Jesup [:jesup] from comment #37)
> (In reply to wayne from comment #32)
> > I suggest it be fixed on Apple encoder side. Otherwise, the above
> > modification can also be an alternation as in comment 29.
> 
> Has someone managed to contact Apple about their encoder issue?

I talked to their team during FOMS last month, they are aware of it and intend to fix it.

However, this is likely not going to be in until next year macOS version due to their release schedule.
I presume that means O(1 year), so not soon enough
All, we will release alternative openh264 version patch based on ver 1.8 to temporarily fix this issue.
Flags: needinfo?(huili2)
Yes Apple is aware of the issue. And AFAIK they are planing on shipping a a fix in their next major update (which might be quite some time away). That's why Cisco agreed to give us an update of OpenH264 plugin with a workaround until Apple has shipped a fix on their end.
Flags: needinfo?(drno)
What's the timeline for the openh264 update?
Flags: needinfo?(huili2)
Blocks: 1462786
If this does make it into 64 we can take it for ESR as well. Otherwise looks like this will slip to 65/ esr60.5.0.
We planned to release a fix in early next week, if everything goes well.
Flags: needinfo?(huili2)
Blocks: 1512756
No longer blocks: 1486988
Blocks: 1486988
No longer blocks: 1512756
The release v1.8.1 is to fix this issue in openh264, see Bug 1486988.
No longer blocks: 1462786

This is fixed by the 1.8.1 OpenH264 update in Bug 1486988, which is currently available for Nightly and on Beta.

Status: NEW → RESOLVED
Closed: 5 years ago
Flags: needinfo?(ruil2)
Resolution: --- → WORKSFORME

OpenH264 is no longer used to decode H264 stream since FF 66.

It also works with media.navigator.mediadatadecoder_h264_enabled set to false, which would use the OpenH264 decoder.

You need to log in before you can comment on or make changes to this bug.