Closed Bug 1538508 Opened 5 years ago Closed 5 years ago

FF 66.0 breaks H.264 basline constrained for WebRTC

Categories

(Core :: WebRTC, defect, P2)

66 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox66 --- wontfix
firefox67 --- fixed
firefox68 --- verified

People

(Reporter: foreverneilyoung, Assigned: jya)

References

(Regression)

Details

(Keywords: regression)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36

Steps to reproduce:

https://dragonfly-demo.accuware.com:8443/?room=pi1&mode=1

This URL fetches a video from a Raspberry Pi, which is configured to deliver H.264 only

Actual results:

The remote video is not shown

Expected results:

The remote video should be shown, as in Chrome and Safari

Symptom: RTCPeerConnection.onloadedmetadata is not thrown in this case.

This wouldn't be a regression introduced in 66, but one introduce in 67.

Those streams never have a SPS/PPS NAL, and so a decoder is never instantiated and the frame is silently dropped.
The first frame is an IDR by itself, without a SPS/PSS. That frame is silently dropped causing no error with the system decoder.

OpenH264 previously would have caused an error to be raised.
That error leads to the server only then properly sending SPS/PPS and so decoding can start.

The way to fix this on your side is to make your stream correct, the first packet with an IDR must also contain a SPS/PPS.

Until we can provide a work around in our code.

Hi Jean-Yves,

thanks for your answer. The issue is, it is not my H.264 encoder which causes the trouble. I'm using UV4L on a Raspberry PI here.

I'm not quite sure what you mean by this:

This wouldn't be a regression introduced in 66, but one introduce in 67.

I'm having this issue, since I have upgraded to FF 66 on all machines. The H.264 display works fine on Ubunutu (FF 65.01) and after downgrading on my Mac to FF 64.02 it works well here too again.

I will try to let the UV4L guys know about your comment, but basically I think they are just taking the hardware encoded H.264 which comes from the Raspberry CSI cam.

(In reply to foreverneilyoung from comment #2)

Hi Jean-Yves,

thanks for your answer. The issue is, it is not my H.264 encoder which causes the trouble. I'm using UV4L on a Raspberry PI here.

I'm not quite sure what you mean by this:

This wouldn't be a regression introduced in 66, but one introduce in 67.

I'm having this issue, since I have upgraded to FF 66 on all machines. The H.264 display works fine on Ubunutu (FF 65.01) and after downgrading on my Mac to FF 64.02 it works well here too again.

Sorry, I meant a regression introduced in FF 65.
I don't see how it could work for in 65 though. This particular webrtc code didn't change between 65 and 66.

I will try to let the UV4L guys know about your comment, but basically I think they are just taking the hardware encoded H.264 which comes from the Raspberry CSI cam.

They need to send the SPS/PPS NAL along the first IDR.

In the mean time, I'll look for a workaround, but that won't be available until FF 67 at best :(

Thanks for clarification. I can assure you that it worked on Ubuntu 65.01. Meanwhile there is also 66 running, so it doesn't work any longer. While I was filing this bug I have made a lot of attempts with 65.01 and all worked fine.

As said, I don't think that I have any means to control what happens in either the Raspicam, the raspicam driver or the closed source UV4L code. At least I have forwarded your comment regarding SPS/PPS to UV4L. Answer still to come.. Will let you know.

Thanks for taking care. In the end it is just about to restore the same operability, as was before and as is with other browsers.

Regards.

Here is a statement of the UV4L group:

<snip>
It's very likely sps/pps sent by uv4l is in a different rtp packet than the rtp packet containing the idr. In other words, uv4l does not prepend the sps/sps before the idr.

"The way to fix this on your side is to make your stream correct, the first packet with an IDR must also contain a SPS/PPS."

but I am not sure it's a must: I think the first packet can contain the SPS/PPS without the IDR in the same packet.
</snip>

Regards

I hope you noticed that I had to teleport my RPI into another room, since the formerly used room ("pi1") was continuously occupied by one person.

The new room is "pitest", which changes the a.m. test URL to

https://dragonfly-demo.accuware.com:8443/?room=pitest&mode=1

There is a Raspberry PI for your tests and usage. Please don't forget to CLOSE the browser after your test in order to release the room again, otherwise the PI remains allocated.

Thanks

(In reply to foreverneilyoung from comment #6)

I hope you noticed that I had to teleport my RPI into another room, since the formerly used room ("pi1") was continuously occupied by one person.

The new room is "pitest", which changes the a.m. test URL to

https://dragonfly-demo.accuware.com:8443/?room=pitest&mode=1

There is a Raspberry PI for your tests and usage. Please don't forget to CLOSE the browser after your test in order to release the room again, otherwise the PI remains allocated.

My bad it was me, I left my desk with the stream open, in the middle of a debugging session

No issue. Just note the other room now.

This is all just experimental stuff and I'm not really secure against DOS by sleeping away in a debug session :))

Just kidding

UV4L confirms that...

"prepending the sps/pps in the same packet makes firefox happy."

..but...

"I am not convinced it's a bug in the first place. afaik the sps/pps can well be in a separate packet from the idr"

So now. Let's get ready to rumble...

(In reply to foreverneilyoung from comment #9)

UV4L confirms that...

"prepending the sps/pps in the same packet makes firefox happy."

..but...

"I am not convinced it's a bug in the first place. afaik the sps/pps can well be in a separate packet from the idr"

The SPS/PPS doesn't have to be in the same packet, it just needs to be sent before the IDR.

Right now it's not.
No frame can be decoded until the SPS and PPS have been received.

Your page send over 9 frames before a packet with the SPS/PPS is received.

We limited the search for a SPS/PPS change to keyframe only in bug 1469257.
However this causes issues if the first frame containing a SPS/PPS isn't a keyframe.

We also need to error on content with no SPS/PPS as to inform the caller that something is amiss. Such content was invalid to start with.

While not required in the two examples provided, should those streams change resolution and continue to use the same type of bytstreams we would miss the changes as the keyframe never contains the new SPS/PPS NALs.

So we add an option to handle this case, so we can separate the cases where this could be needed without regressing bug 1469257

Depends on D24853

Assignee: nobody → jyavenard
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Response from UV4L:

The answer is subtle, because it is implictly implying the packet immediately after the SPS/PPS was not an IDR.

If, on the other hand, the packet immediately after the sps/pps was an IDR, Firefox should have started decoding at the moment the IDR was received. And I have never seen the Raspberry Pi encoder or uv4l producing a stream where an IDR did not immediately follow an SPS/PPS.

Turns out we have a religious debate here?

(In reply to foreverneilyoung from comment #14)

Response from UV4L:

The answer is subtle, because it is implictly implying the packet immediately after the SPS/PPS was not an IDR.

If, on the other hand, the packet immediately after the sps/pps was an IDR, Firefox should have started decoding at the moment the IDR was received. And I have never seen the Raspberry Pi encoder or uv4l producing a stream where an IDR did not immediately follow an SPS/PPS.

This is not a religious debate, I'm not sure why you're so keen on creating one. Not very constructive nevertheless.
What you describe is simply not true and completely remote to reality.

In this case we have the first frame is an IDR (NAL type 5), followed by P-slices. If we do not error internally to force webrtc.org to request another IDR, then no SPS/PPS will ever be sent. This is currently what's happening with Firefox 66. No SPS/PPS is ever sent.

If we error on the first IDR decoding as no SPS/PPS has been received and we request another IDR then we will get another 9 P-slices on average before a packet containing a SPS/PPS is received, and following that packet it will continue to send another 2 P-frames before an IDR is finally seen.

IDR do not immediately follow a SPS/PPS and the first IDR is certainly not following a packet containing SPS and PPS NAL. If it did, this regression wouldn't exist.

Attached file nallog-1.txt

Log of the frame types received if we never request a new IDR and accept the incoming data as-is

415 frames, 1 IDR, 414 P-frames. Totally undecodable as no SPS/PPS is ever sent.

This is not a religious debate, I'm not sure why you're so keen on creating one. Not very constructive nevertheless.

LOL, I was just kidding and now you are getting rude? Great and thanks.

What you describe is simply not true and completely remote to reality.

You are totally misunderstanding my role here: I'm NOT UV4L and NOT in charge for dealing with H.264 codecs. I'm the man in the middle, dealing with UV4L on the left and you on the right. And now I'm getting the messages from both sides: I'm not doing anything wrong here. And the problem is not getting solved.

Anyway. I don't care whether or not you are providing a fix or not anymore. I can live with telling the people "Don't use Firefox" if you want to see a WebRTC video from a Raspberry PI.

Out here

(In reply to Jean-Yves Avenard [:jya] from comment #17)

Created attachment 9053529 [details]
nallog-1.txt

Log of the frame types received if we never request a new IDR and accept the incoming data as-is

415 frames, 1 IDR, 414 P-frames. Totally undecodable as no SPS/PPS is ever sent.

Please tell it UV4L directly.

(In reply to foreverneilyoung from comment #18)

What you describe is simply not true and completely remote to reality.

You are totally misunderstanding my role here: I'm NOT UV4L and NOT in charge for dealing with H.264 codecs. I'm the man in the middle, dealing with UV4L on the left and you on the right. And now I'm getting the messages from both sides: I'm not doing anything wrong here. And the problem is not getting solved.

I knew you've been acting as the middle man, and I thank you for that.
The stream you provided allowed to easily and consistently reproduce the problem. I couldn't have done it that easily without your help.

Anyway. I don't care whether or not you are providing a fix or not anymore. I can live with telling the people "Don't use Firefox" if you want to see a WebRTC video from a Raspberry PI.

Out here

Unfortunate that you shut down the stream, it means that our QA team will not be able to verify the fix, slowing down its integration.

Would appreciate if you could turn it back on for now.

Thanks

Attachment #9053506 - Attachment description: Bug 1538508 - P1. Don't limit search for SPS/PPS on keyframe only. r?bryce → Bug 1538508 - P2. Don't limit search for SPS/PPS on keyframe only. r?bryce
Attachment #9053507 - Attachment description: Bug 1538508 - P2. Add options to scan all frames for SPS/PPS change. r?bryce → Bug 1538508 - P3. Add options to scan all frames for SPS/PPS change. r?bryce
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8a573000a49d
P1. Reset mError upon success. r=bryce
https://hg.mozilla.org/integration/autoland/rev/ff707483b5d4
P2. Don't limit search for SPS/PPS on keyframe only. r=bryce
https://hg.mozilla.org/integration/autoland/rev/2fc1fe023957
P3. Add options to scan all frames for SPS/PPS change. r=bryce
Rank: 11
Priority: -- → P2
Attachment #9053535 - Attachment description: Bug 1538508 - P1. Reset mError upon success. r?bryce → Bug 1538508 - P1. Reset mError upon success. r=bryce
Attachment #9053506 - Attachment description: Bug 1538508 - P2. Don't limit search for SPS/PPS on keyframe only. r?bryce → Bug 1538508 - P2. Don't limit search for SPS/PPS on keyframe only. r=bryce
Attachment #9053507 - Attachment description: Bug 1538508 - P3. Add options to scan all frames for SPS/PPS change. r?bryce → Bug 1538508 - P3. Add options to scan all frames for SPS/PPS change. r=bryce
Flags: needinfo?(jyavenard)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ee7d1a2f8ebd
P1. Reset mError upon success. r=bryce
https://hg.mozilla.org/integration/autoland/rev/ea116addae50
P2. Don't limit search for SPS/PPS on keyframe only. r=bryce
https://hg.mozilla.org/integration/autoland/rev/994589bf1bfb
P3. Add options to scan all frames for SPS/PPS change. r=bryce
Has STR: --- → yes

Do you still need my PI?

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

(In reply to foreverneilyoung from comment #26)

Do you still need my PI?

How long can you keep it up?

Just until our QA team marks it as verified. We have another report with another site, but yours is easier to reproduce.

Basically as long as you need it. Will be out for a couple of days, so the artifical light will be off.

Comment on attachment 9053535 [details]
Bug 1538508 - P1. Reset mError upon success. r=bryce

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1505284
  • User impact if declined: Some H264 webrtc stream will fail to decode
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: URL provided in comment 6, or bug 1538079
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Error in streams were being ignored, leading the server to continue serving us invalid content.
    Code was designed to minimize impact on other media playback
  • String changes made/needed: none
  • Do you want to request approval of these patches as well?: on, on
Attachment #9053535 - Flags: approval-mozilla-beta?
Flags: qe-verify?

I would like our QA to verify the patch on Nightly before evaluating a potential uplift, adjusting flags accordingly.

Flags: qe-verify? → qe-verify+
Keywords: regression
QA Whiteboard: [qa-triaged]

Build ID: 20190327175114
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:68.0) Gecko/20100101 Firefox/68.0

Verified as fixed on the latest Nightly build.

Comment on attachment 9053535 [details]
Bug 1538508 - P1. Reset mError upon success. r=bryce

Fix for a regression affecting our ability to decode some H264 webrtc streams, patch verified on Nightly by our QA, uplift approved for 67 beta 7. Thanks folks.

Attachment #9053535 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Thanks for fixing it.

Regards

I have tested the fix with a nightly build of 68 (68.0a1 (2019-03-31) (64 bit)) on macOS Mojave.

At the first glance it is OK, the H.264 video appears. But after a second the video freezes. The developer console shows this:

PC ICE connection state change connected
Remote video play
Remote video playing
PC ICE connection state change disconnected
PC ICE connection state change failed
ICE failed, add a TURN server and see about:webrtc for more details

Might be unrelated (since I have seen this a couple of times already and only with Firefox browsers exclusively, btw). How to proceed with this issue?

There is no TURN server configured, indeed. My ICE configuration is traced in the console:

Instantiate peer connection using ICE server config: {"iceServers":[{"urls":["stun:54.152.156.46:3478","stun:stun.l.google.com:19302"]}]}

All exchanged ICE candidates are NOT of "typ relay", so why should there be a need for TURN?

All other browsers have no issue (namely Chrome and even Safari)

The url has changed and point to different content now, I get not connection regardless of the browser used.

Oh, yes, sorry. I forgot that I changed the room again.

Here you are.

https://dragonfly-demo.accuware.com:8443/?room=pi&mode=1

I added some disco light illumination. I noticed the video stop only after this change, since the only moving part in this video is the second counter in the clock otherwise...

Hope you will be able to confirm. The PI is yours.

No idea what got changed (This is different to the original page) but previously there was no stall of any kind, I had the page showing with the clock progressing for over 1h straight.

In any case, this is a different issue to your original problem as now toggling media.navigator.mediadatadecoder_h264_enabled makes no difference in playback working or not which was what this bug was about.

I suggest you open a new bug so it can be looked at properly.

Yes, this was my impression too. This is an ICE issue, and since I have seen that before (before the H.264 thing) I tend to agree.

Thanks. I will file a new bug.

Please don't forget to leave the room :)

Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: