Closed Bug 1489757 Opened Last year Closed Last year

Changeset 549f0b8075d5 causes video streams to take a very long time to recover from packet loss

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr60 63+ fixed
firefox62 --- wontfix
firefox63 + fixed
firefox64 --- verified

People

(Reporter: ajohnson, Assigned: ajohnson, NeedInfo)

References

Details

(Keywords: regression)

Attachments

(1 file, 4 obsolete files)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36

Steps to reproduce:

Changeset 549f0b8075d5 causes video streams to take a very long time to recover from packet loss. The repro details are private, but I've had some conversations about this with FF devs on IRC and I'll attach a proposed patch.
Here is a proposed patch as previously discussed.
Attachment #9007436 - Flags: review?(rjesup)
Attachment #9007436 - Flags: review?(docfaraday)
Comment on attachment 9007436 [details] [diff] [review]
fix_for_549f0b8075d5_packetloss_handling_issue_rev4.patch

Review of attachment 9007436 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit fixes

::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
@@ +790,5 @@
>    }
>  
> +  if(mDecoderStatus != GMPNoErr){
> +    mDecoderStatus = GMPNoErr;
> +    LOG(LogLevel::Error, ("%s: Decoder status is bad (%u)!",

It's an interesting question whether to reset the error code, or wait for an IDR to force it to reset.  However, until we get an IDR it will likely keep getting set back to an error, and an IDR will stop that (though we may send a FIR or two past when it has actually arrived).  Given that, resetting it to false here doesn't actually do much useful.

Also: this needs a comment as to why this is being done, and a pointer to the bug that willfix this the right way.  Like this: // XXX Bug NNNNNNN - this is an ugly solution to asynchronous decoding errors not being returned to the synchronous Decode() method. 
(Or something like that)

@@ +827,5 @@
>    }
>  
> +  if(mDecoderStatus != GMPNoErr){
> +    mDecoderStatus = GMPNoErr;
> +    LOG(LogLevel::Error, ("%s: Decoder status is bad (%u)!",

ditto
Attachment #9007436 - Flags: review?(rjesup) → review+
(In reply to Randell Jesup [:jesup] from comment #2)
> Comment on attachment 9007436 [details] [diff] [review]
> fix_for_549f0b8075d5_packetloss_handling_issue_rev4.patch
> 
> Review of attachment 9007436 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with nit fixes
> 
> ::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
> @@ +790,5 @@
> >    }
> >  
> > +  if(mDecoderStatus != GMPNoErr){
> > +    mDecoderStatus = GMPNoErr;
> > +    LOG(LogLevel::Error, ("%s: Decoder status is bad (%u)!",
> 
> It's an interesting question whether to reset the error code, or wait for an
> IDR to force it to reset.  However, until we get an IDR it will likely keep
> getting set back to an error, and an IDR will stop that (though we may send
> a FIR or two past when it has actually arrived).  Given that, resetting it
> to false here doesn't actually do much useful.

Actually, if I comment out the "mDecoderStatus = GMPNoErr;" lines, we're right back where we started -- the video stream won't recover from packet loss in a timely way. Grepping the code base for "mDecoderStatus", I didn't see any other places where it gets reset to GMPNoErr (unless it's happening through an alias). So if an IDR is supposed to clear that error flag elsewhere, it's not currently happening. If you think it should be, let me know whereabouts and I can take a look.

> Also: this needs a comment as to why this is being done, and a pointer to
> the bug that willfix this the right way.  Like this: // XXX Bug NNNNNNN -
> this is an ugly solution to asynchronous decoding errors not being returned
> to the synchronous Decode() method. 
> (Or something like that)

For the "bug that willfix this the right way", should I create a new bug, or will this bug (1489757) become the 'willfix'?
Attachment #9007436 - Attachment is obsolete: true
Attachment #9007436 - Flags: review?(docfaraday)
Attachment #9007525 - Flags: review?(rjesup)
Comment on attachment 9007525 [details] [diff] [review]
fix_for_549f0b8075d5_packetloss_handling_issue_rev5.patch

Review of attachment 9007525 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the changes below

::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
@@ +932,5 @@
>                                      codecSpecificInfo,
>                                      aDecodeData->mRenderTimeMs);
>    if (NS_FAILED(rv)) {
>      LOG(LogLevel::Error, ("%s: Decode failed (rv=%u)!",
>          __PRETTY_FUNCTION__, static_cast<unsigned>(rv)));

Set mDecoderStatus here to an error, and also in the other spots above in Decode_g() that fail and return.
Attachment #9007525 - Flags: review?(rjesup) → review+
This unsets the error flag at the top of Decode_g and sets it in the body as appropriate. Anecdotally this sped up recovery a little bit.

I debated on whether this particular bit is a good/bad idea:
>     // destroyed via Terminate(), failed to init, or just not initted yet
>     LOGD(("GMP Decode: not initted yet"));
>+
>+    aThis->mDecoderStatus = GMPDecodeErr;
>     return;
>   }

I'm not sure if setting the error flag when init has failed/been terminated is  appropriate. I assume this is a fairly unlikely branch (aThis->mInitting is probably true more often). Anyhow, please let me know if you have any thoughts here.
Attachment #9007525 - Attachment is obsolete: true
Attachment #9007538 - Flags: review?(rjesup)
Attachment #9007538 - Flags: review?(rjesup) → review+
Keywords: leave-open
Setting leave-open on this, because even with this patch in place, it will eventually need to be fixed properly and this bug can be used to track that.
Assignee: nobody → ajohnson
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: Untriaged → WebRTC: Audio/Video
Product: Firefox → Core
Comment on attachment 9007538 [details] [diff] [review]
fix_for_549f0b8075d5_packetloss_handling_issue_rev6.patch

Checking with Byron; note this is an imperfect patch to stop the pain (and uplift to beta)
Attachment #9007538 - Flags: review?(docfaraday)
Comment on attachment 9007538 [details] [diff] [review]
fix_for_549f0b8075d5_packetloss_handling_issue_rev6.patch

Review of attachment 9007538 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with fix

::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
@@ +862,5 @@
>  void
>  WebrtcGmpVideoDecoder::Decode_g(const RefPtr<WebrtcGmpVideoDecoder>& aThis,
>                                  nsAutoPtr<GMPDecodeData> aDecodeData)
>  {
> +  aThis->mDecoderStatus = GMPNoErr;

Shouldn't we clear this error after Decode succeeded, not before?
Attachment #9007538 - Flags: review?(docfaraday) → review+
(In reply to Byron Campen [:bwc] from comment #9)
> Comment on attachment 9007538 [details] [diff] [review]
> fix_for_549f0b8075d5_packetloss_handling_issue_rev6.patch
> 
> Review of attachment 9007538 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r+ with fix
> 
> ::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
> @@ +862,5 @@
> >  void
> >  WebrtcGmpVideoDecoder::Decode_g(const RefPtr<WebrtcGmpVideoDecoder>& aThis,
> >                                  nsAutoPtr<GMPDecodeData> aDecodeData)
> >  {
> > +  aThis->mDecoderStatus = GMPNoErr;
> 
> Shouldn't we clear this error after Decode succeeded, not before?

The problem is that nothing clears the DecoderStatus on success, only sets it on failure, and I don't trust all failure paths to end up returning an rv that's a failure.

We could move it, yes, so long as we believe rv is accurate.  Here's the IRC discussion that led to this:
https://mozilla.logbot.info/media/20180909#c15285519

Do you still want the change?  If so, we'll need to have andj2223 verify the change still recovers correctly.
Flags: needinfo?(docfaraday)
Priority: -- → P3
(In reply to Randell Jesup [:jesup] from comment #10)
> (In reply to Byron Campen [:bwc] from comment #9)
> > Comment on attachment 9007538 [details] [diff] [review]
> > fix_for_549f0b8075d5_packetloss_handling_issue_rev6.patch
> > 
> > Review of attachment 9007538 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > r+ with fix
> > 
> > ::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp
> > @@ +862,5 @@
> > >  void
> > >  WebrtcGmpVideoDecoder::Decode_g(const RefPtr<WebrtcGmpVideoDecoder>& aThis,
> > >                                  nsAutoPtr<GMPDecodeData> aDecodeData)
> > >  {
> > > +  aThis->mDecoderStatus = GMPNoErr;
> > 
> > Shouldn't we clear this error after Decode succeeded, not before?
> 
> The problem is that nothing clears the DecoderStatus on success, only sets
> it on failure, and I don't trust all failure paths to end up returning an rv
> that's a failure.
> 
> We could move it, yes, so long as we believe rv is accurate.  Here's the IRC
> discussion that led to this:
> https://mozilla.logbot.info/media/20180909#c15285519
> 
> Do you still want the change?  If so, we'll need to have andj2223 verify the
> change still recovers correctly.

We used to set mDecoderStatus to GMPNoErr when Decode finished... why is that no longer what we want to do? Every early return in Decode_g happens on an error condition, so it would seem to me that we set mDecoderStatus to an error if something bad happened, and if we make it to the end of the function, it must be fine?
Flags: needinfo?(docfaraday)
Andrew can you check if you address Byron's review comment it still fixes the problem for you?

Thanks a lot for your effort here in debugging and submitting a patch!!
Flags: needinfo?(ajohnson)
Priority: P3 → P1
drno: Yes, the problem is still fixed after Byron's revision. I've attached the revised patch.
Attachment #9007538 - Attachment is obsolete: true
Flags: needinfo?(ajohnson)
See Also: → 1492038
I’m still curious on Rjesup’s question — how confident are we that all of the functions called within Decode_g() return an error value?

Note that mDecoderStatus can also be set to an error state inside of RecvErr() in an entirelu different source file (grep for mDecoderStatus and you’ll find it). Is there a poasibility that mDecoderStatus can be set to an error state within RecvErr() yet none of the functions called by Decode_g() return an error?

If so, we run the risk of masking an error by resetting mDecoderStatus in this manner. Dont get me wrong, this is still an improvement ovet the prior code, bit it’d be good to have a better understanding of how/when/why mDecoderStatus is set to an error state within RecvErr() and if we run the risk of masking it by accident, here.
(Typed with my thumbs, excuse the typos)
Blocks: 1492038
Attachment #9010081 - Flags: review?(docfaraday)
Attachment #9010081 - Flags: review?(docfaraday) → review+
Sorry we didn't have a try run yet.
Keywords: checkin-needed
The change appears to be crashing in treeherder when attempting to do H264 calls:

[task 2018-09-19T18:08:02.866Z] 18:08:02     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozalloc_abort]
[task 2018-09-19T18:08:02.868Z] 18:08:02     INFO - Crash dump filename: /tmp/tmpebK_5Y.mozrunner/minidumps/792079e7-4e72-5bfb-5b94-ab4bf5e0335d.dmp
[task 2018-09-19T18:08:02.868Z] 18:08:02     INFO - Operating system: Linux
[task 2018-09-19T18:08:02.869Z] 18:08:02     INFO -                   0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2018-09-19T18:08:02.869Z] 18:08:02     INFO - CPU: x86
[task 2018-09-19T18:08:02.870Z] 18:08:02     INFO -      GenuineIntel family 6 model 62 stepping 4
[task 2018-09-19T18:08:02.871Z] 18:08:02     INFO -      2 CPUs
[task 2018-09-19T18:08:02.872Z] 18:08:02     INFO - 
[task 2018-09-19T18:08:02.872Z] 18:08:02     INFO - GPU: UNKNOWN
[task 2018-09-19T18:08:02.873Z] 18:08:02     INFO - 
[task 2018-09-19T18:08:02.874Z] 18:08:02     INFO - Crash reason:  SIGSEGV
[task 2018-09-19T18:08:02.875Z] 18:08:02     INFO - Crash address: 0x0
[task 2018-09-19T18:08:02.876Z] 18:08:02     INFO - Process uptime: not available
[task 2018-09-19T18:08:02.877Z] 18:08:02     INFO - 
[task 2018-09-19T18:08:02.878Z] 18:08:02     INFO - Thread 26 (crashed)
[task 2018-09-19T18:08:02.880Z] 18:08:02     INFO -  0  firefox-bin!mozalloc_abort [mozalloc_abort.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 35 + 0x0]
[task 2018-09-19T18:08:02.880Z] 18:08:02     INFO -     eip = 0x566690e1   esp = 0xe6875b30   ebp = 0xe6875b38   ebx = 0x56694000
[task 2018-09-19T18:08:02.881Z] 18:08:02     INFO -     esi = 0xf753fdf8   edi = 0xe6875b58   eax = 0x00000000   ecx = 0xf7540864
[task 2018-09-19T18:08:02.882Z] 18:08:02     INFO -     edx = 0x00000000   efl = 0x00210282
[task 2018-09-19T18:08:02.883Z] 18:08:02     INFO -     Found by: given as instruction pointer in context
[task 2018-09-19T18:08:02.884Z] 18:08:02     INFO -  1  firefox-bin!abort [mozalloc_abort.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 82 + 0x1c]
[task 2018-09-19T18:08:02.885Z] 18:08:02     INFO -     eip = 0x56669158   esp = 0xe6875b40   ebp = 0xe6875b48   ebx = 0x56694000
[task 2018-09-19T18:08:02.886Z] 18:08:02     INFO -     esi = 0xf753fdf8   edi = 0xe6875b58
[task 2018-09-19T18:08:02.886Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.888Z] 18:08:02     INFO -  2  libxul.so!rtc::FatalMessage::~FatalMessage() [checks.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 109 + 0x5]
[task 2018-09-19T18:08:02.888Z] 18:08:02     INFO -     eip = 0xf17017a6   esp = 0xe6875b50   ebp = 0xe6875b68   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.889Z] 18:08:02     INFO -     esi = 0xf753fdf8   edi = 0xe6875b58
[task 2018-09-19T18:08:02.889Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.890Z] 18:08:02     INFO -  3  libxul.so!webrtc::RtpPacketizerH264::PacketizeSingleNalu(unsigned int) [rtp_format_h264.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 250 + 0x9]
[task 2018-09-19T18:08:02.891Z] 18:08:02     INFO -     eip = 0xf17bae2d   esp = 0xe6875b70   ebp = 0xe6875c48   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.892Z] 18:08:02     INFO -     esi = 0xe6875b80   edi = 0xe3142410
[task 2018-09-19T18:08:02.893Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.893Z] 18:08:02     INFO -  4  libxul.so!webrtc::RtpPacketizerH264::GeneratePackets() [rtp_format_h264.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 173 + 0xa]
[task 2018-09-19T18:08:02.894Z] 18:08:02     INFO -     eip = 0xf17bac46   esp = 0xe6875c50   ebp = 0xe6875c88   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.895Z] 18:08:02     INFO -     esi = 0x00000001   edi = 0xe1df3a00
[task 2018-09-19T18:08:02.895Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.896Z] 18:08:02     INFO -  5  libxul.so!webrtc::RtpPacketizerH264::SetPayloadData(unsigned char const*, unsigned int, webrtc::RTPFragmentationHeader const*) [rtp_format_h264.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 166 + 0xb]
[task 2018-09-19T18:08:02.896Z] 18:08:02     INFO -     eip = 0xf17ba9dc   esp = 0xe6875c90   ebp = 0xe6875db8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.897Z] 18:08:02     INFO -     esi = 0x00000000   edi = 0xe6875cd0
[task 2018-09-19T18:08:02.898Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.898Z] 18:08:02     INFO -  6  libxul.so!webrtc::RTPSenderVideo::SendVideo(webrtc::RtpVideoCodecTypes, webrtc::FrameType, signed char, unsigned int, long long, unsigned char const*, unsigned int, webrtc::RTPFragmentationHeader const*, webrtc::RTPVideoHeader const*, webrtc::StreamId const*, webrtc::StreamId const*) [rtp_sender_video.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 367 + 0x10]
[task 2018-09-19T18:08:02.898Z] 18:08:02     INFO -     eip = 0xf17d0890   esp = 0xe6875dc0   ebp = 0xe6875f18   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.899Z] 18:08:02     INFO -     esi = 0xe1df3a00   edi = 0xe6876a68
[task 2018-09-19T18:08:02.900Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.900Z] 18:08:02     INFO -  7  libxul.so!webrtc::RTPSender::SendOutgoingData(webrtc::FrameType, signed char, unsigned int, long long, unsigned char const*, unsigned int, webrtc::RTPFragmentationHeader const*, webrtc::RTPVideoHeader const*, unsigned int*) [rtp_sender.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 461 + 0x29]
[task 2018-09-19T18:08:02.901Z] 18:08:02     INFO -     eip = 0xf17cd69c   esp = 0xe6875f20   ebp = 0xe6876058   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.905Z] 18:08:02     INFO -     esi = 0xe6875f68   edi = 0xe39bf008
[task 2018-09-19T18:08:02.906Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.907Z] 18:08:02     INFO -  8  libxul.so!webrtc::ModuleRtpRtcpImpl::SendOutgoingData(webrtc::FrameType, signed char, unsigned int, long long, unsigned char const*, unsigned int, webrtc::RTPFragmentationHeader const*, webrtc::RTPVideoHeader const*, unsigned int*) [rtp_rtcp_impl.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 419 + 0x29]
[task 2018-09-19T18:08:02.908Z] 18:08:02     INFO -     eip = 0xf17cd37c   esp = 0xe6876060   ebp = 0xe68760d8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.909Z] 18:08:02     INFO -     esi = 0xe39bf008   edi = 0xe39bf328
[task 2018-09-19T18:08:02.910Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.910Z] 18:08:02     INFO -  9  libxul.so!webrtc::PayloadRouter::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [payload_router.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 137 + 0x37]
[task 2018-09-19T18:08:02.912Z] 18:08:02     INFO -     eip = 0xf180da8e   esp = 0xe68760e0   ebp = 0xe6876848   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.912Z] 18:08:02     INFO -     esi = 0xe6876118   edi = 0xe6876a80
[task 2018-09-19T18:08:02.914Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.914Z] 18:08:02     INFO - 10  libxul.so!webrtc::internal::VideoSendStreamImpl::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [video_send_stream.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 1032 + 0x14]
[task 2018-09-19T18:08:02.917Z] 18:08:02     INFO -     eip = 0xf1826cd3   esp = 0xe6876850   ebp = 0xe6876938   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.918Z] 18:08:02     INFO -     esi = 0xe186d8e8   edi = 0xe6876a80
[task 2018-09-19T18:08:02.920Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.920Z] 18:08:02     INFO - 11  libxul.so!non-virtual thunk to webrtc::internal::VideoSendStreamImpl::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [video_send_stream.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 0 + 0x5]
[task 2018-09-19T18:08:02.920Z] 18:08:02     INFO -     eip = 0xf1826de9   esp = 0xe6876940   ebp = 0xe6876968   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.920Z] 18:08:02     INFO -     esi = 0xe6876a58   edi = 0xe39be000
[task 2018-09-19T18:08:02.921Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.921Z] 18:08:02     INFO - 12  libxul.so!webrtc::ViEEncoder::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [vie_encoder.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 610 + 0x15]
[task 2018-09-19T18:08:02.921Z] 18:08:02     INFO -     eip = 0xf182dd6a   esp = 0xe6876970   ebp = 0xe68769a8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO -     esi = 0xe6876a80   edi = 0xe39be000
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO - 13  libxul.so!non-virtual thunk to webrtc::ViEEncoder::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [vie_encoder.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 0 + 0x5]
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO -     eip = 0xf182de1d   esp = 0xe68769b0   ebp = 0xe68769d8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO -     esi = 0xe6876a58   edi = 0xe39be3d0
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.922Z] 18:08:02     INFO - 14  libxul.so!webrtc::VCMEncodedFrameCallback::OnEncodedImage(webrtc::EncodedImage const&, webrtc::CodecSpecificInfo const*, webrtc::RTPFragmentationHeader const*) [generic_encoder.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 162 + 0x11]
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     eip = 0xf17e1a38   esp = 0xe68769e0   ebp = 0xe6876a28   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     esi = 0xe6876a58   edi = 0xe39be3d0
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO - 15  libxul.so!mozilla::WebrtcGmpVideoEncoder::Encoded(GMPVideoEncodedFrame*, nsTArray<unsigned char> const&) [WebrtcGmpVideoCodec.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 663 + 0x2e]
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     eip = 0xeefb29d2   esp = 0xe6876a30   ebp = 0xe6876b08   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     esi = 0xe6876a68   edi = 0x00000002
[task 2018-09-19T18:08:02.923Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO - 16  libxul.so!mozilla::gmp::GMPVideoEncoderParent::RecvEncoded(mozilla::gmp::GMPVideoEncodedFrameData const&, nsTArray<unsigned char>&&) [GMPVideoEncoderParent.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 263 + 0xd]
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     eip = 0xf0535874   esp = 0xe6876b10   ebp = 0xe6876b38   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     esi = 0xe3152150   edi = 0xe1df1540
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO - 17  libxul.so!non-virtual thunk to mozilla::gmp::GMPVideoEncoderParent::RecvEncoded(mozilla::gmp::GMPVideoEncodedFrameData const&, nsTArray<unsigned char>&&) [GMPVideoEncoderParent.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 0 + 0x5]
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     eip = 0xf0535949   esp = 0xe6876b40   ebp = 0xe6876b58   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     esi = 0xe6876bb8   edi = 0xe3152154
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.924Z] 18:08:02     INFO - 18  libxul.so!mozilla::gmp::PGMPVideoEncoderParent::OnMessageReceived(IPC::Message const&) [PGMPVideoEncoderParent.cpp: : 321 + 0x15]
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     eip = 0xeec88bd3   esp = 0xe6876b60   ebp = 0xe6876c08   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     esi = 0xe6876bb8   edi = 0xe3152154
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO - 19  libxul.so!mozilla::gmp::PGMPContentParent::OnMessageReceived(IPC::Message const&) [PGMPContentParent.cpp: : 278 + 0xf]
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     eip = 0xeec7949d   esp = 0xe6876c10   ebp = 0xe6876c28   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     esi = 0x00000002   edi = 0xe697e390
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO - 20  libxul.so!mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) [MessageChannel.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 2248 + 0xc]
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     eip = 0xeeb50420   esp = 0xe6876c30   ebp = 0xe6876c58   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.925Z] 18:08:02     INFO -     esi = 0x00000001   edi = 0xe52cad28
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO - 21  libxul.so!mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) [MessageChannel.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 2175 + 0xc]
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     eip = 0xeeb4f4d2   esp = 0xe6876c60   ebp = 0xe6876cc8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     esi = 0xe52cad28   edi = 0xe3823f30
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO - 22  libxul.so!mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) [MessageChannel.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 2012 + 0xc]
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     eip = 0xeeb4f9b5   esp = 0xe6876cd0   ebp = 0xe6876cf8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     esi = 0xe52cad28   edi = 0xe3823f30
[task 2018-09-19T18:08:02.926Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.927Z] 18:08:02     INFO - 23  libxul.so!mozilla::ipc::MessageChannel::MessageTask::Run() [MessageChannel.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 2045 + 0xc]
[task 2018-09-19T18:08:02.927Z] 18:08:02     INFO -     eip = 0xeeb4fe19   esp = 0xe6876d00   ebp = 0xe6876d18   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.928Z] 18:08:02     INFO -     esi = 0xe69aa700   edi = 0xe3823f00
[task 2018-09-19T18:08:02.928Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.928Z] 18:08:02     INFO - 24  libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 1161 + 0x8]
[task 2018-09-19T18:08:02.928Z] 18:08:02     INFO -     eip = 0xee684dc8   esp = 0xe6876d20   ebp = 0xe6877228   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.929Z] 18:08:02     INFO -     esi = 0x00000000   edi = 0xe9389c80
[task 2018-09-19T18:08:02.929Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.929Z] 18:08:02     INFO - 25  libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 519 + 0x11]
[task 2018-09-19T18:08:02.930Z] 18:08:02     INFO -     eip = 0xee6876a8   esp = 0xe6877230   ebp = 0xe6877258   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.930Z] 18:08:02     INFO -     esi = 0xe687724f   edi = 0xe8b464d0
[task 2018-09-19T18:08:02.931Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.931Z] 18:08:02     INFO - 26  libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 364 + 0x13]
[task 2018-09-19T18:08:02.933Z] 18:08:02     INFO -     eip = 0xeeb52f94   esp = 0xe6877260   ebp = 0xe6877288   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.934Z] 18:08:02     INFO -     esi = 0xe8bd49e0   edi = 0xe8b464d0
[task 2018-09-19T18:08:02.934Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.935Z] 18:08:02     INFO - 27  libxul.so!MessageLoop::RunInternal() [message_loop.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 325 + 0xb]
[task 2018-09-19T18:08:02.935Z] 18:08:02     INFO -     eip = 0xeeb1ac64   esp = 0xe6877290   ebp = 0xe68772b8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.935Z] 18:08:02     INFO -     esi = 0xe8b464d0   edi = 0xe68772c8
[task 2018-09-19T18:08:02.936Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.937Z] 18:08:02     INFO - 28  libxul.so!MessageLoop::Run() [message_loop.cc:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 318 + 0x8]
[task 2018-09-19T18:08:02.938Z] 18:08:02     INFO -     eip = 0xeeb1abdb   esp = 0xe68772c0   ebp = 0xe68772e8   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.938Z] 18:08:02     INFO -     esi = 0xe8b464d0   edi = 0xe68772c8
[task 2018-09-19T18:08:02.939Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.940Z] 18:08:02     INFO - 29  libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 464 + 0x8]
[task 2018-09-19T18:08:02.940Z] 18:08:02     INFO -     eip = 0xee682a58   esp = 0xe68772f0   ebp = 0xe6877338   ebx = 0xf54b6000
[task 2018-09-19T18:08:02.940Z] 18:08:02     INFO -     esi = 0xe9389ca0   edi = 0xe9389c80
[task 2018-09-19T18:08:02.941Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.941Z] 18:08:02     INFO - 30  libnspr4.so!_pt_root [ptthread.c:a919cff62d7dee9f7e18e1cc7a60b6008fcab72b : 201 + 0x6]
[task 2018-09-19T18:08:02.942Z] 18:08:02     INFO -     eip = 0xf7379868   esp = 0xe6877340   ebp = 0xe6877368   ebx = 0xf7389000
[task 2018-09-19T18:08:02.942Z] 18:08:02     INFO -     esi = 0xf738b540   edi = 0xe8b3e1c0
[task 2018-09-19T18:08:02.942Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.942Z] 18:08:02     INFO - 31  libpthread-2.23.so + 0x6295
[task 2018-09-19T18:08:02.943Z] 18:08:02     INFO -     eip = 0xf7749295   esp = 0xe6877370   ebp = 0xe6877428   ebx = 0x00000000
[task 2018-09-19T18:08:02.943Z] 18:08:02     INFO -     esi = 0x00000000   edi = 0x003d0f00
[task 2018-09-19T18:08:02.943Z] 18:08:02     INFO -     Found by: call frame info
[task 2018-09-19T18:08:02.948Z] 18:08:02     INFO - 32  libc-2.23.so + 0xe70ae
[task 2018-09-19T18:08:02.948Z] 18:08:02     INFO -     eip = 0xf74740ae   esp = 0xe6877430   ebp = 0x00000000
[task 2018-09-19T18:08:02.948Z] 18:08:02     INFO -     Found by: previous frame's frame pointer
Looks like it's failing in here https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_format_h264.cc#250

I'm assuming that is with the fake H264 plugin. Dan do you have any idea if this is related to the problems you encountered during the webrtc.org update?
Flags: needinfo?(dminor)
Blocks: 1448863
Flags: needinfo?(dminor)
This has not reproduced for me locally (I ran the affected test 1000 times) and retriggers on try seem green so far. It looks like this is an intermittent failure. It's possible this patch is causing it to occur more often, but given the severity of the problem it is fixing, I'm inclined to land the patch as is. I'll give it another hour or so to see if any more failures turn up.
Rather than leaving this bug open to find a clean way of fixing this, I've filed Bug 1492852 as a follow up.
Keywords: leave-open
See Also: → 1492852
I've updated the patch to include mercurial author information and a commit description and to refer to Bug 1492852 as a follow up rather than leaving this bug open. Carrying forward r+ from bwc.
Attachment #9010081 - Attachment is obsolete: true
Attachment #9010644 - Flags: review+
Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4bc0d86a17a8
Bug 1448863 causes video streams to take a very long time to recover from packet loss; r=bwc
https://hg.mozilla.org/mozilla-central/rev/4bc0d86a17a8
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Given the severity, it would be good to see an uplift to 63. If another 62 point release ships, it would be good to see there as well if possible. Thanks!
Comment on attachment 9010644 [details] [diff] [review]
bug-1489757-fix.patch

Approval Request Comment
[Feature/Bug causing the regression]: 1448863
[User impact if declined]: Video in WebRTC calls with Safari and others using H264 freeze pretty often.
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Not yet.
[Needs manual test from QE? If yes, steps to reproduce]:
1. Open https://appr.tc with Safari 12 and join a new meeting room
2. Join the same URL with Firefox and wait a >= 5 seconds
3. Video from the Safari should not freeze
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: Not really.
[Why is the change risky/not risky?]: It only tries to ensure that decoding errors are propagated back to the sender in case of H264.
[String changes made/needed]: N/A
Attachment #9010644 - Flags: approval-mozilla-beta?
Depends on: 1493183
FYI - Good news, perfherder is showing improvements from this fix:

== Change summary for alert #16055 (as of Thu, 20 Sep 2018 20:08:49 GMT) ==

Improvements:

 22%  tp5o_webext responsiveness windows7-32 pgo e10s stylo         0.98 -> 0.77
 20%  tp5o_webext responsiveness windows10-64 pgo e10s stylo        1.04 -> 0.83
 19%  tp5o_webext responsiveness windows7-32 opt e10s stylo         1.03 -> 0.83
 16%  tp5o_webext responsiveness windows10-64 opt e10s stylo        1.11 -> 0.94
 14%  tp5o_webext responsiveness windows10-64-qr opt e10s stylo     1.10 -> 0.95
 13%  tp5o_webext responsiveness linux64-qr opt e10s stylo          1.71 -> 1.49

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16055
I'd like to see this verified on nightly before uplifting. Brindusa could your team verify the fix on Nightly on Monday so as to include this uplift in beta 9? Thanks
Flags: qe-verify+
Flags: needinfo?(brindusa.tot)
(In reply to Pascal Chevrel:pascalc from comment #28)
> I'd like to see this verified on nightly before uplifting. Brindusa could
> your team verify the fix on Nightly on Monday so as to include this uplift
> in beta 9? Thanks

Note: with the latest Nightly which include the patch from bug 1486012 you can't verify this, because the code from bug 1486012 breaks interop with Safari. So you can only test this with a Nightly from which the code in bug 1486012 has been removed.
(In reply to Nils Ohlmeier [:drno] from comment #29)
> (In reply to Pascal Chevrel:pascalc from comment #28)
> > I'd like to see this verified on nightly before uplifting. Brindusa could
> > your team verify the fix on Nightly on Monday so as to include this uplift
> > in beta 9? Thanks
> 
> Note: with the latest Nightly which include the patch from bug 1486012 you
> can't verify this, because the code from bug 1486012 breaks interop with
> Safari. So you can only test this with a Nightly from which the code in bug
> 1486012 has been removed.

Turns out this was only a temporary problem as bug 1486012 introduced a problem, which got fixed by a subsequent patch from another bug which didn't made it into the same Nightly build.

I was able to verify this is fixed in the latest Nightly.
Duplicate of this bug: 1493079
I managed to somehow reproduce the issue using an older version of Nightly (2018-09-08) by recreating the steps from comment 26. The moment I entered the room from Firefox, the video went black.
I retested everything using latest Nightly 64.0a1 on macOS 10.13, but the bug is still reproducing. The video still becomes black when I enter the room from Firefox. At first I thought that it was because I update the browser while the room was opened, but now I am not so sure. The issues reproduced even after I quit the browser and open it again (with and without a new profile). 
Is this another issue?
Flags: needinfo?(brindusa.tot)
Flags: needinfo?(ajohnson)
Hi Oana,

The "video going black before the call can even start" problem that you describe isn't something I've seen throughout my testing with this issue, and I don't think anyone else has been seeing a black video stream with respect to this issue, so I'm inclined to believe that you're seeing something different, but I can't be totally sure. I've been testing on linux rather than mac, so perhaps the same issue can manifest as a black stream rather than a frozen one on that platform.

The fact that the problem you're seeing always happens right at the beginning of the call with no video stream ever getting displayed also makes me believe it's probably something different. This issue (1489757) can sometimes happen right at the beginning of the call, but often happens in the middle of an already-in-progress call where some video was successfully displayed for at least a moment or two.

You may want to run firefox with logging enabled and post those logs somewhere. Something like this can work:
MOZ_LOG=signaling:4,timestamp,jsep:4,webrtc_trace:5,GMP:5,sync MOZ_LOG_FILE=firefox_debug.log WEBRTC_TRACE=nspr NSPR_LOG_MODULES=webrtc_trace:4,sync /path/to/your/firefox
Flags: needinfo?(ajohnson)
Thank you for your quick answer.
The "video going black before the call can even start" problem might be intermittent, I think I will log it separately after I investigate it more.

I managed to reproduce the issue by opening the call in Safari 12 on a macOS 10.14 and copying the url in Nightly from 2018-09-08 on macOS 10.13.

I retested everything by opening the call in Safari 12 on a macOS 10.14 and copying the url in the latest Nightly 64.0a1 using different platforms. I got different results:

1. Safari (macOS 10.14) - > Latest Nightly 64.0a1 (Ubuntu 16.04) - the image was lagging on latest Nightly
2. Safari (macOS 10.14) - > Latest Nightly 64.0a1 (macOS 10.13) - the image was lagging on latest Nightly
3. Safari (macOS 10.14) - > Latest Nightly 64.0a1 (macOS 10.12) - the image was frozen on latest Nightly
Flags: needinfo?(ajohnson)
The problem described in Comment 26 is actually Bug 1492038. Andrew wrote this patch to fix a different bug, but while testing, :drno noticed that it helped with the problem in Bug 1492038 because it fixes error reporting from the H.264 plugin. I don't think it's fair to ask Andrew for help with Safari / Nightly video problems since that was not the problem he originally encountered, and he's not a Mozilla employee :)

For what it's worth, the testing I did (Safari on OS X to Nightly on Ubuntu 18.04) was consistent with Oana's. With this patch applied, instead of the video freezing completely, I saw updated frames every few seconds. I think that is as best as we can expect from this patch. The H.264 bitstream from Safari is still triggering errors in the Cisco codec, but with this patch applied, we at least signal those errors, which I'm guessing causes Safari to send a new keyframe and let the video progress a little bit.

Based on comment#0, the original repro details for this bug are private, so it's not really possible to verify this patch. Based upon my testing and what Oana reported in Comment 34, it seems like it is working as expected to me.
Flags: needinfo?(ajohnson)
[Tracking Requested - why for this release]: The release of Safari 12 breaks WebRTC interop with Firefox video calling. As this patch improves the situation we should consider uplifting this patch as high as possible.
Comment on attachment 9010644 [details] [diff] [review]
bug-1489757-fix.patch

Mitigation for a major WebRTC regression, uplift approved for 63 beta 10, thanks
Attachment #9010644 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
Please request ESR60 approval on this when you get a chance.
Flags: qe-verify+
Flags: needinfo?(ajohnson)
I'm not familiar with how to request ESR60 approval, so I will turn that task over to drno. Please let me know if something is needed from me. Thanks.
Flags: needinfo?(ajohnson) → needinfo?(drno)
Comment on attachment 9010644 [details] [diff] [review]
bug-1489757-fix.patch

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This fixes error reporting for the H.264 plugin. Without error reporting, any problem encountered causes significant delays or frozen video.
User impact if declined: 
Frozen video on H.264 WebRTC calls, e.g. Safari 12 to Firefox
Fix Landed on Version:
64
Risk to taking this patch (and alternatives if risky): 
Low.
String or UUID changes made by this patch: 
None.

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #9010644 - Flags: approval-mozilla-esr60?
Comment on attachment 9010644 [details] [diff] [review]
bug-1489757-fix.patch

Fixes WebRTC interop issues with Safari 12 (at least). Approved for ESR 60.3.
Attachment #9010644 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
QA Contact: jib
I retested everything by making a call from Safari (macOS 10.14) to Latest Nightly 64.0a1 (Windows 10). The image is still lagging. 
According to this and the info from comment 34 and comment 35, there is nothing more I can do to test the issue, so I will take out the qe-verify+ flag.
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.