Closed
Bug 1075640
Opened 10 years ago
Closed 10 years ago
WebRTC H.264 sporadic crash in WebrtcGmpVideoDecoder
Categories
(Core :: WebRTC: Audio/Video, defect)
Tracking
()
RESOLVED
FIXED
mozilla36
People
(Reporter: mzanaty, Assigned: ehugg)
References
Details
Attachments
(3 files, 3 obsolete files)
1.16 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
3.75 KB,
patch
|
Details | Diff | Splinter Review | |
2.43 KB,
patch
|
ehugg
:
review+
mzanaty
:
review+
lmandel
:
approval-mozilla-aurora+
lmandel
:
approval-mozilla-beta+
Sylvestre
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
Seeing the following sporadic crash in WebrtcGmpVideoDecoder when testing real H.264 WebRTC services. Not able to reproduce in simple test apps. https://crash-stats.mozilla.com/report/index/af34c7f1-657d-4d9b-9b29-2f9772140924
Looking at the crashing line 609: http://hg.mozilla.org/releases/mozilla-beta/annotate/10a34974f747/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#l609 *(reinterpret_cast<uint32_t*>(frame->Buffer())) = frame->Size(); Should this be frame->Size()-4 (to exclude the size itself)?
Assignee | ||
Comment 2•10 years ago
|
||
It looks like the -4 might be handled three lines later in the memcpy: http://hg.mozilla.org/releases/mozilla-beta/annotate/10a34974f747/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#l612
The memcpy size-4 is what got me thinking the size poked into the buffer header should also be size-4. What gets poked into the header should probably be the size of what remains excluding the size field itself. That's how most protocol/buffer formats work, including AVCC. But I'm not sure if this internal FF GMP format is expected to be exactly like AVCC or not. If it is intended to be exactly like AVCC, we may have more problems, since the code is not scanning the entire buffer to replace all 4-byte AnnexB start codes (0x0) with 4-byte AVCC lengths. That is no problem for the OpenH264 decoder, since it expects and handles AnnexB start codes just fine. But I'm not sure if it can be a problem for any OMX HW decoder which may expect strict AVCC with no AnnexB start codes anywhere in the frame.
Assignee | ||
Comment 4•10 years ago
|
||
We may have a problem there, but this particular crash looks like a null deref. My first guess would be that setting the members to null here: http://hg.mozilla.org/releases/mozilla-beta/annotate/10a34974f747/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#l667 Is racing with the usage here: http://hg.mozilla.org/releases/mozilla-beta/annotate/10a34974f747/media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp#l635 Perhaps dispatching to the GMP thread to set these to nullptr on release is needed, or I could be way off-base since I don't have a repro case. Any hints on reproducing this?
Meant (0x00000001) not (0x0) in c#3. To clarify the full pipeline of buffer mangling that happens from RTP to OpenH264 decoder: 1. RTP has no AnnexB start codes or AVCC lengths. 2. JB adds AnnexB start codes between NALs in the frame buffer. 3. GMP converts the leading AnnexB start code in the frame buffer to an AVCC length covering the entire frame (with potentially embedded AnnexB start codes if there are multiple NALs, which seems wrong). 4. OpenH264 GMP plugin wrapper (part of the OpenH264 project/download) converts AVCC lengths to AnnexB start codes (but due to 3 it only has the leading length to convert since it covers the entire frame). 5. OpenH264 decoder expects and handles AnnexB start codes if there are multiple NALs in the frame. All of this works fine if line 609 is adjusted to size-4. But I'm not sure if it works fine for OMX HW decoders which may expect strict AVCC with no AnnexB start codes within the frame. Also not sure if this size-4 issue is related to the crash at all. Don't see how since the buffer pointer would need to be invalid to cause the crash, regardless of what you're poking into that pointer.
Comment 6•10 years ago
|
||
(In reply to mzanaty from comment #5) > Meant (0x00000001) not (0x0) in c#3. > > To clarify the full pipeline of buffer mangling that happens from RTP to > OpenH264 decoder: > 1. RTP has no AnnexB start codes or AVCC lengths. > 2. JB adds AnnexB start codes between NALs in the frame buffer. > 3. GMP converts the leading AnnexB start code in the frame buffer to an AVCC > length covering the entire frame (with potentially embedded AnnexB start > codes if there are multiple NALs, which seems wrong). This wasn't intended to be an AVCC length; it was intended to be an OMX length. (Though it's still just as wrong.) The code for receiving buffers back uses the same "length includes the size field" definition. Unfortunately, it appears we implemented a different definition than OMX (or AVCC) uses, though the conversion is trivial. We can look to switch later. That code assumed it would get called with a single NAL (and even comments that it doesn't handle mode 0 data). When session_info/etc changed to pass multiple NALs, this should have been adjusted. The "better" way to support mode 0 data would likely be to use the FragmentationHeader stuff, IIRC how it works, instead of (re)scanning. > All of this works fine if line 609 is adjusted to size-4. But I'm not sure > if it works fine for OMX HW decoders which may expect strict AVCC with no > AnnexB start codes within the frame. In the plugin, the initial size is replaced with a start code; it doesn't look for multiple AVCC (or OMX-like GMP funky) units within the buffer even you you were to attempt to encode multiple units that way - it only converts the first to a start code. This all needs some coordinated cleanup. > Also not sure if this size-4 issue is related to the crash at all. Don't see > how since the buffer pointer would need to be invalid to cause the crash, > regardless of what you're poking into that pointer. Exactly.
Assignee | ||
Comment 7•10 years ago
|
||
Here's another in the same place: https://crash-stats.mozilla.com/report/index/aaba3711-4c61-4920-be3f-906352141006
Comment 8•10 years ago
|
||
Ok, looking closely at the code, I see only one obvious possibility for a null-deref here: aInputImage._length == 0 If so, CreateEmptyFrame will call DestroyBuffer() instead of allocating a buffer. We could add a MOZ_ASSERT(aInputImage._length > 0) or if (aInputImage._length == 0) { return WEBRTC_VIDEO_CODEC_OK; } However, *why* that might be 0 is an interesting question.
Updated•10 years ago
|
Assignee: nobody → rjesup
Assignee | ||
Comment 9•10 years ago
|
||
Here's a stack from a debug version: [10001] ###!!! ABORT: NULL segment: 'mSegment', file /Users/ehugg/mozilla/mozilla-beta/ipc/glue/Shmem.cpp, line 320 mozilla::ipc::Shmem::AssertInvariants() const+0x0000004A [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x007BF57A] unsigned char* mozilla::ipc::Shmem::get<unsigned char>() const+0x00000019 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x0123A509] mozilla::gmp::GMPVideoEncodedFrameImpl::Buffer()+0x0000001C [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x020E396C] mozilla::WebrtcGmpVideoDecoder::Decode_g(webrtc::EncodedImage const&, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long)+0x0000019D [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00FC17BD] mozilla::runnable_args_m_5_ret<mozilla::WebrtcGmpVideoDecoder*, int (mozilla::WebrtcGmpVideoDecoder::*)(webrtc::EncodedImage const&, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long), webrtc::EncodedImage, bool, web+0x000000BD [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00FC303D] mozilla::SyncRunnable::Run()+0x00000036 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00F3A1F6] nsThread::ProcessNextEvent(bool, bool*)+0x000006AA [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x0016639A] NS_ProcessNextEvent(nsIThread*, bool)+0x00000097 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00044957] mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*)+0x000003D5 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x007BB9A5] MessageLoop::RunInternal()+0x00000076 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00732586] MessageLoop::RunHandler()+0x00000015 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00732495] MessageLoop::Run()+0x0000002D [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x0073243D] nsThread::ThreadFunc(void*)+0x00000166 [/Users/ehugg/mozilla/mozilla-beta/./obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x00164D16]
Assignee | ||
Comment 10•10 years ago
|
||
I can get this to happen now almost every time. I could either set you up to see it happen or try out other ideas you have. I'll be back on it this afternoon.
Comment 11•10 years ago
|
||
Can you catch it in gdb and look at aInputImage._length in the ::Decode_g() stack frame?
Comment 12•10 years ago
|
||
Also, try putting a breakpoint on WebrtcGMPVideoCodec.cpp:602 with a conditional of aInputImage._length == 0 and see where this came from and why. Thanks!
Assignee | ||
Comment 13•10 years ago
|
||
Yes your guess is correct. aInputImage._length is zero in this case. It's zero clear back here in the WebRTC video_receiver. http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/main/source/video_receiver.cc#465 Also when this happens a lot of these warnings pop out of the OpenH264 decoder: PPS id is invalid! SPS id is invalid! So perhaps we're receiving junk or lost our place in the stream. I put a check like this at the top of decode_g and it doesn't crash but the video gets pretty choppy: if (!aInputImage._length) { return WEBRTC_VIDEO_CODEC_ERROR; } Also the team working on this claims that it worked OK a few weeks ago with 33.
Assignee | ||
Comment 14•10 years ago
|
||
Assignee | ||
Updated•10 years ago
|
Assignee: rjesup → ethanhugg
Status: NEW → ASSIGNED
Assignee | ||
Comment 15•10 years ago
|
||
Using the this length-check patch, call worked for a bit and got these results from about:webrtc using two instance of FF on the same machine. Side that initiated the call: PeerConnection:1412722593909716 (id=24 url=https://dev-2.wbx2.com/) 16:02:56 GMT-0700 (PDT) ICE statistics 1412722593909716 (id=24 url=https://dev-2.wbx2.com/): stream1/audio Local candidate Remote candidate ICE State Priority Nominated Selected 192.168.1.2:52557/udp(host) 166.78.47.126:33488/udp(host) failed 9149908167411368000 166.137.214.117:62502/udp(peerreflexive) 166.78.47.126:33488/udp(host) succeeded 7996986662804521000 * * Local candidate addr Type 192.168.1.2:52557/udp host 166.137.214.117:32547/udp serverreflexive 166.137.214.117:32542/udp peerreflexive Remote candidate addr Type 166.78.47.126:33488/udp host 1412722593909716 (id=24 url=https://dev-2.wbx2.com/): stream2/video Local candidate Remote candidate ICE State Priority Nominated Selected 192.168.1.2:52230/udp(host) 166.78.47.126:33568/udp(host) failed 9149908167411368000 166.137.214.117:62502/udp(peerreflexive) 166.78.47.126:33568/udp(host) succeeded 7996986662804521000 * * Local candidate addr Type 192.168.1.2:52230/udp host 166.137.214.117:35438/udp serverreflexive 166.137.214.117:62502/udp peerreflexive Remote candidate addr Type 166.78.47.126:33568/udp host Local SDP v=0 o=Mozilla-SIPUA-33.0 21185 0 IN IP4 0.0.0.0 s=SIP Call t=0 0 a=ice-ufrag:b3aa1151 a=ice-pwd:c4604c1ad0301b3ad9ae94bfc483b264 a=fingerprint:sha-256 00:C4:BB:7E:AC:8F:25:03:E9:F4:87:6F:30:28:B9:3A:D5:7B:42:E7:3E:42:62:DA:A0:6E:D2:1A:FD:A9:BE:D0 m=audio 52557 RTP/SAVPF 109 0 8 101 c=IN IP4 192.168.1.2 a=rtpmap:109 opus/48000/2 a=ptime:20 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-15 a=sendrecv a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=setup:actpass a=candidate:0 1 UDP 2130379007 192.168.1.2 52557 typ host a=candidate:0 2 UDP 2130379006 192.168.1.2 49455 typ host a=rtcp-mux m=video 52230 RTP/SAVPF 120 126 97 c=IN IP4 192.168.1.2 a=rtpmap:120 VP8/90000 a=rtpmap:126 H264/90000 a=fmtp:126 profile-level-id=42e01f;packetization-mode=1 a=rtpmap:97 H264/90000 a=fmtp:97 profile-level-id=42e01f a=sendrecv a=rtcp-fb:120 nack a=rtcp-fb:120 nack pli a=rtcp-fb:120 ccm fir a=rtcp-fb:126 nack a=rtcp-fb:126 nack pli a=rtcp-fb:126 ccm fir a=rtcp-fb:97 nack a=rtcp-fb:97 nack pli a=rtcp-fb:97 ccm fir a=setup:actpass a=candidate:0 1 UDP 2130379007 192.168.1.2 52230 typ host a=candidate:0 2 UDP 2130379006 192.168.1.2 51049 typ host a=rtcp-mux Remote SDP v=0 o=linus 0 0 IN IP4 166.78.47.126 s=- t=0 0 a=ice-lite m=audio 33488 RTP/AVP 109 c=IN IP4 166.78.47.126 b=TIAS:64000 a=sendrecv a=rtpmap:109 opus/48000/2 a=fmtp:109 PROFILE=0;LEVEL=0;packetization-mode=0;level-asymmetry-allowed=1;parameter-add=1;maxaveragebitrate=64000;usedtx=0;stereo=1;useinbandfec=0;cbr=0 a=rtcp-mux a=label:100 a=ice-ufrag:NT++ViZb a=ice-pwd:nT2cTHzxKuhL1IeGsLy9BvhvV+dZvvs5 a=candidate:0 1 UDP 2130706431 166.78.47.126 33488 typ host a=candidate:8 1 TCP 1962934271 166.78.47.126 33488 typ host tcptype passive a=setup:passive a=fingerprint:sha-1 49:5F:57:B5:99:ED:31:0A:05:AF:C6:AE:93:AA:34:02:3E:5B:FF:85 m=video 33568 RTP/AVP 126 c=IN IP4 166.78.47.126 b=TIAS:256000 a=sendrecv a=rtpmap:126 H264/90000 a=fmtp:126 PROFILE=0;LEVEL=0;profile-level-id=42000C;packetization-mode=1;level-asymmetry-allowed=1;max-mbps=27600;max-fs=920;max-br=1000;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0 a=rtcp-fb:* nack pli a=rtcp-fb:* ccm fir a=rtcp-mux a=label:200 a=ice-ufrag:aCzvw9fW a=ice-pwd:mliaZK7PxfKG9mmkI2dEwz8TQYCiLh+2 a=candidate:0 1 UDP 2130706431 166.78.47.126 33568 typ host a=candidate:8 1 TCP 1962934271 166.78.47.126 33568 typ host tcptype passive a=setup:passive a=fingerprint:sha-1 49:5F:57:B5:99:ED:31:0A:05:AF:C6:AE:93:AA:34:02:3E:5B:FF:85 RTP statistics outbound_rtp_audio_0 Local: 16:02:56 GMT-0700 (PDT) outboundrtp SSRC: 1830038791 Sent: 18992 packets (1145.35 Kb) Remote: 16:02:55 GMT-0700 (PDT) inboundrtp SSRC: 1830038791 Received: 17457 packets (664.87 Kb) Lost: 1472 Jitter: 0 RTT: 408 ms outbound_rtp_video_1 Encoder: Avg. bitrate: 1.32 Mbps (0.47 SD) Avg. framerate: 16.21 fps (1.19 SD) Dropped frames: 4294935271 Local: 16:02:56 GMT-0700 (PDT) outboundrtp SSRC: 1400586114 Sent: 55936 packets (60566.03 Kb) Remote: 16:02:55 GMT-0700 (PDT) inboundrtp SSRC: 1400586114 Received: 46438 packets (49249.68 Kb) Lost: 9302 Jitter: 0 RTT: 448 ms inbound_rtp_audio_1 A/V sync: 0 ms Jitter-buffer delay: 523 ms Local: 16:02:56 GMT-0700 (PDT) inboundrtp SSRC: 4184994738 Received: 16256 packets (884.56 Kb) Lost: 0 Jitter: 0 Remote: 16:02:52 GMT-0700 (PDT) outboundrtp SSRC: 4184994738 Sent: 17644 packets (685.2 Kb) inbound_rtp_video_2 Decoder: Avg. bitrate: 0.02 Mbps (0.22 SD) Avg. framerate: 0.02 fps (0.26 SD) Discarded packets: 0 Local: 16:02:56 GMT-0700 (PDT) inboundrtp SSRC: 3135148106 Received: 33848 packets (35580.11 Kb) Lost: 8925 Jitter: 9.315 Remote: 16:02:55 GMT-0700 (PDT) outboundrtp SSRC: 3135148106 Sent: 42788 packets (44687.62 Kb) Side that accepted the call: PeerConnection:1412722616374064 (id=25 url=https://dev-2.wbx2.com/) 16:03:13 GMT-0700 (PDT) ICE statistics 1412722616374064 (id=25 url=https://dev-2.wbx2.com/): stream1/audio Local candidate Remote candidate ICE State Priority Nominated Selected 192.168.1.2:60124/udp(host) 166.78.47.126:33548/udp(host) failed 9149908167411368000 166.137.214.117:60604/udp(peerreflexive) 166.78.47.126:33548/udp(host) succeeded 7996986662804521000 * * Local candidate addr Type 192.168.1.2:60124/udp host 166.137.214.117:41544/udp serverreflexive 166.137.214.117:49661/udp peerreflexive Remote candidate addr Type 166.78.47.126:33548/udp host 1412722616374064 (id=25 url=https://dev-2.wbx2.com/): stream2/video Local candidate Remote candidate ICE State Priority Nominated Selected 192.168.1.2:61927/udp(host) 166.78.47.126:33440/udp(host) failed 9149908167411368000 166.137.214.117:60604/udp(peerreflexive) 166.78.47.126:33440/udp(host) succeeded 7996986662804521000 * * Local candidate addr Type 192.168.1.2:61927/udp host 166.137.214.117:43694/udp serverreflexive 166.137.214.117:60604/udp peerreflexive Remote candidate addr Type 166.78.47.126:33440/udp host Local SDP v=0 o=Mozilla-SIPUA-33.0 16753 0 IN IP4 0.0.0.0 s=SIP Call t=0 0 a=ice-ufrag:04aed478 a=ice-pwd:94b9fbe2fd56549ffce840d486227520 a=fingerprint:sha-256 72:18:C6:92:B3:37:68:A8:EA:3D:10:F1:F6:BA:FA:23:77:15:9D:FE:E2:61:FF:CC:E9:81:08:F0:0F:2D:BE:5D m=audio 41544 RTP/SAVPF 109 0 8 101 c=IN IP4 166.137.214.117 a=rtpmap:109 opus/48000/2 a=ptime:20 a=rtpmap:0 PCMU/8000 a=rtpmap:8 PCMA/8000 a=rtpmap:101 telephone-event/8000 a=fmtp:101 0-15 a=sendrecv a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level a=setup:actpass a=candidate:0 1 UDP 2130379007 192.168.1.2 60124 typ host a=candidate:1 1 UDP 1694236671 166.137.214.117 41544 typ srflx raddr 192.168.1.2 rport 60124 a=candidate:0 2 UDP 2130379006 192.168.1.2 57558 typ host a=candidate:1 2 UDP 1694236670 166.137.214.117 48694 typ srflx raddr 192.168.1.2 rport 57558 a=rtcp-mux m=video 43694 RTP/SAVPF 120 126 97 c=IN IP4 166.137.214.117 a=rtpmap:120 VP8/90000 a=rtpmap:126 H264/90000 a=fmtp:126 profile-level-id=42e01f;packetization-mode=1 a=rtpmap:97 H264/90000 a=fmtp:97 profile-level-id=42e01f a=sendrecv a=rtcp-fb:120 nack a=rtcp-fb:120 nack pli a=rtcp-fb:120 ccm fir a=rtcp-fb:126 nack a=rtcp-fb:126 nack pli a=rtcp-fb:126 ccm fir a=rtcp-fb:97 nack a=rtcp-fb:97 nack pli a=rtcp-fb:97 ccm fir a=setup:actpass a=candidate:0 1 UDP 2130379007 192.168.1.2 61927 typ host a=candidate:1 1 UDP 1694236671 166.137.214.117 43694 typ srflx raddr 192.168.1.2 rport 61927 a=candidate:0 2 UDP 2130379006 192.168.1.2 64821 typ host a=candidate:1 2 UDP 1694236670 166.137.214.117 56949 typ srflx raddr 192.168.1.2 rport 64821 a=rtcp-mux Remote SDP v=0 o=linus 0 0 IN IP4 166.78.47.126 s=- t=0 0 a=ice-lite m=audio 33548 RTP/AVP 109 c=IN IP4 166.78.47.126 b=TIAS:64000 a=sendrecv a=rtpmap:109 opus/48000/2 a=fmtp:109 PROFILE=0;LEVEL=0;packetization-mode=0;level-asymmetry-allowed=1;parameter-add=1;maxaveragebitrate=64000;usedtx=0;stereo=1;useinbandfec=0;cbr=0 a=rtcp-mux a=label:100 a=ice-ufrag:YKUmCYoF a=ice-pwd:NAqEOvtNauhF+Q/OkxY9Jn+4s6ANEWOQ a=candidate:0 1 UDP 2130706431 166.78.47.126 33548 typ host a=candidate:8 1 TCP 1962934271 166.78.47.126 33548 typ host tcptype passive a=setup:passive a=fingerprint:sha-1 49:5F:57:B5:99:ED:31:0A:05:AF:C6:AE:93:AA:34:02:3E:5B:FF:85 m=video 33440 RTP/AVP 126 c=IN IP4 166.78.47.126 b=TIAS:256000 a=sendrecv a=rtpmap:126 H264/90000 a=fmtp:126 PROFILE=0;LEVEL=0;profile-level-id=42000C;packetization-mode=1;level-asymmetry-allowed=1;max-mbps=27600;max-fs=920;max-br=1000;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0 a=rtcp-fb:* nack pli a=rtcp-fb:* ccm fir a=rtcp-mux a=label:200 a=ice-ufrag:M1icM74s a=ice-pwd:r73dtAvkOH6BOpctajBt+oa8JlD9Gnwa a=candidate:0 1 UDP 2130706431 166.78.47.126 33440 typ host a=candidate:8 1 TCP 1962934271 166.78.47.126 33440 typ host tcptype passive a=setup:passive a=fingerprint:sha-1 49:5F:57:B5:99:ED:31:0A:05:AF:C6:AE:93:AA:34:02:3E:5B:FF:85 RTP statistics outbound_rtp_audio_0 Local: 16:03:13 GMT-0700 (PDT) outboundrtp SSRC: 4184994738 Sent: 18699 packets (1128.03 Kb) Remote: 16:03:13 GMT-0700 (PDT) inboundrtp SSRC: 4184994738 Received: 16840 packets (641.37 Kb) Lost: 1528 Jitter: 0 RTT: 720 ms outbound_rtp_video_1 Encoder: Avg. bitrate: 1.07 Mbps (0.37 SD) Avg. framerate: 15.83 fps (1.21 SD) Dropped frames: 4294946714 Local: 16:03:13 GMT-0700 (PDT) outboundrtp SSRC: 3135148106 Sent: 44965 packets (47955.65 Kb) Remote: 16:03:13 GMT-0700 (PDT) inboundrtp SSRC: 3135148106 Received: 36214 packets (37805.44 Kb) Lost: 8532 Jitter: 0 RTT: 1524 ms inbound_rtp_audio_1 A/V sync: 0 ms Jitter-buffer delay: 60 ms Local: 16:03:13 GMT-0700 (PDT) inboundrtp SSRC: 1830038791 Received: 17076 packets (929.07 Kb) Lost: 0 Jitter: 0 Remote: 16:03:05 GMT-0700 (PDT) outboundrtp SSRC: 1830038791 Sent: 19423 packets (754.24 Kb) inbound_rtp_video_2 Decoder: Avg. bitrate: 0.00 Mbps (0.00 SD) Avg. framerate: 0.00 fps (0.00 SD) Discarded packets: 0 Local: 16:03:13 GMT-0700 (PDT) inboundrtp SSRC: 1400586114 Received: 47120 packets (50653.75 Kb) Lost: 9686 Jitter: 4.641 Remote: 16:03:11 GMT-0700 (PDT) outboundrtp SSRC: 1400586114 Sent: 58298 packets (61890.17 Kb)
Comment 16•10 years ago
|
||
Ok, H.264 mode 1 was selected at both ends. from the code (by inspection), a=rtcp-fb:* should work. I'll also note the profile-level-id value seems very low (42000c) - 1.2. I'll need more logs/traces and maybe adding log statements about the NAL sequence received. signaling:5,gmp:6 would be a good start
Assignee | ||
Comment 17•10 years ago
|
||
Logs from a successful but choppy call. Initiating side, Windows with signaling:5,GMP:6,webrtc_trace:65535
Assignee | ||
Comment 18•10 years ago
|
||
Logs from a successful, but choppy call. Answering side, OSX with signaling:5,GMP:5,webrtc_trace:65535
Assignee | ||
Comment 19•10 years ago
|
||
Also from the run that created those logs. Both sides had the message "PPS id is invalid" many times in the console which comes from here: https://github.com/cisco/openh264/blob/v1.1-Firefox33/codec/decoder/core/src/decoder_core.cpp#L539
Assignee | ||
Comment 20•10 years ago
|
||
Logs with pcap of both sides put up here: https://drive.google.com/folderview?id=0B3jogUA4XhcTOE9ubDBsb044VjA&usp=sharing Send me a note with your google account ID if you want to view and I'll add you to the list.
Assignee | ||
Updated•10 years ago
|
Attachment #8501805 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8501806 -
Attachment is obsolete: true
Assignee | ||
Comment 21•10 years ago
|
||
nspr log of crash on OSX with signaling:6,GMP:5,webrtc_trace:65535,timestamp and WEBRTC_TRACE_FILE=nspr
Assignee | ||
Comment 22•10 years ago
|
||
This crash appears to not be caused by a particular incoming RTP stream, but rather packet loss. It showed up in this particular application because all of the video streams are being routed through a server and most of our other tests are peer-to-peer. On this line we are setting the video engine to continue decoding a frame with errors: http://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/video_engine/vie_channel.cc#175 This is why we were getting errors like "PPS id is invalid" from the OpenH264 decoder during high packet loss. In a packet loss situation we can then end up with a zero-length frame coming out of VCMJitterBuffer::ExtractAndSetDecode. WebrtcGmpVideoDecoder::Decode_g will then crash on a zero-length frame. Three possible places we could fix this (there are probably more): 1. Always set decode error mode to kNoErrors for H.264 2. Have WebrtcGmpVideoDecoder::Decode_g return an error on zero-length frames. 3. Have VCMReceiver::FrameForDecoding return NULL instead of any zero-length frames.
Reporter | ||
Comment 23•10 years ago
|
||
(In reply to Ethan Hugg [:ehugg] from comment #22) > Three possible places we could fix this (there are probably more): > 1. Always set decode error mode to kNoErrors for H.264 > 2. Have WebrtcGmpVideoDecoder::Decode_g return an error on zero-length > frames. > 3. Have VCMReceiver::FrameForDecoding return NULL instead of any zero-length > frames. Maybe we should do all 3 for extra safety in the short term. #3 might need to be specific for H.264. While I can’t imagine why video engine would think it makes sense to invoke decode of an empty frame for any codec, it seems safest to leave other codecs alone since they may have some specific hacks or odd behavior that relies on empty frames.
Assignee | ||
Updated•10 years ago
|
Attachment #8502210 -
Attachment is obsolete: true
Assignee | ||
Comment 24•10 years ago
|
||
I would prefer to simply catch them in Decode_g (#2) which is what the attached patch does. This has very little chance of side effects so the testing would be simpler and it would be easier to uplift.
Reporter | ||
Comment 25•10 years ago
|
||
(In reply to Ethan Hugg [:ehugg] from comment #24) That should be sufficient. Any of them alone should be sufficient. More than one is just healthy paranoia. If I had to pick one, #1 seems to catch more things earliest. But again, any of them should be sufficient.
Comment 26•10 years ago
|
||
DecodeWithErrors actually has a purpose, especially with mode 0 - it could pass all the "good" NALs through and you'd get a partial update instead of total freeze. For mode 1, IF you you only include data up to the first missing packet, you also could get a partial decode. However - what I think would work better (at least for h.264) here is "decode good frames even if we skipped a frame due to loss". In talking-head usage, this rarely causes major visual bugs and recovery wipes any small ones away in 0.1-0.3 seconds; often one barely would notice unless the screen is large and bandwidth is high. kNoErrors (and disabling dual-decode) implies "freeze on error"; I don't like it. Really there should be an explicit "session->ExtractPacketWithErrors())" that is codec-specific. As it stands we *cannot* allow decoding with errors because of how Mode 1 NALs are accumulated in the buffer (not to mention it doesn't check HaveFirstPacket()). If the "dual" decoder used "kDecodeAllCompleteFrames" (i.e. skip frames with errors), that would be better. kSelectiveErrors would be smarter to use than kWithErrors or kNoErrors, but it has an evil test where it won't decode a frame-with-loss if it has received between between 0.2 and 0.8 of the "average" number of packets/frame. At least it does check HaveFirstPacket. I'll note nothing ever sets kSelectiveErrors right now. So I think that's out for now, OR we modify it to decode complete frames only (but not stall on error).
Comment 27•10 years ago
|
||
Comment 28•10 years ago
|
||
Ethan - can you give that a try, without the 0-length-buffer check (and with it)? I think this might yield a better experience (beyond just "not crashing"). To land, I'd take this and #2 or #3 as a safety valve (if this is better than #1, which would be a bit painful to do for H.264 only). We could also run this one where we leave it kWithErrors, but change the code for kWithErrors from decodable_ = true to decodable_ = (is_h264 ? complete_ : true).
Assignee | ||
Comment 29•10 years ago
|
||
OK, I ran a bunch of tests 1. Without either patch it crashes within the first minute. 2. with the 0-length-frame patch it breaks up, freezes, recovers every few seconds but doesn't crash. Ran for over 20 mins. 3. With the selective errors patch it doesn't crash after over 20 mins but seems less able to recover. Part of the screen will stay wrong for a long time and eventually it won't recover at all and just freeze. 4. With both patches the result is the same as #3. I ran #3 a bunch of times thinking it might be chance, but it really seems to recover differently.
Comment 30•10 years ago
|
||
I think this is the 'right' fix, plus comments as to why. Basically, I found that session_info.cc MakeDecodable() strips partial NALs and packets after a mising one (good!) but if the first packet of a NAL was missing it strips the entire NAL, which can leave 0 bytes to decode. I would land this and the patch to filter them at the GMP decoder as belt-and-suspenders. Ethan, can you try this alone and see if it's equivalent to your existing patch?
Updated•10 years ago
|
Attachment #8505253 -
Flags: review?(ethanhugg)
Comment 31•10 years ago
|
||
Comment on attachment 8501415 [details] [diff] [review] Check for zero length input before decode Review of attachment 8501415 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/signaling/src/media-conduit/WebrtcGmpVideoCodec.cpp @@ +591,5 @@ > // destroyed via Terminate() > return WEBRTC_VIDEO_CODEC_ERROR; > } > > + if (!aInputImage._length) { As I usually only use ! for booleans and pointers, I'll likely land this as "if (aInputImage._length == 0)"
Attachment #8501415 -
Flags: review+
Assignee | ||
Comment 32•10 years ago
|
||
Comment on attachment 8505253 [details] [diff] [review] Don't return 0-length frames for decoding; comments Review of attachment 8505253 [details] [diff] [review]: ----------------------------------------------------------------- Tested this patch alone as well as both patches together and they worked as expected. No crashes after 20mins and video periodically recovers.
Attachment #8505253 -
Flags: review?(ethanhugg) → review+
Assignee | ||
Comment 33•10 years ago
|
||
Try run based on 33 primarily for the wider team to test: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e0da21982b46
Reporter | ||
Comment 34•10 years ago
|
||
Comment on attachment 8505253 [details] [diff] [review] Don't return 0-length frames for decoding; comments Review of attachment 8505253 [details] [diff] [review]: ----------------------------------------------------------------- Changes look good to fix the crash. However, MakeDecodable has more problems, at least for the OpenH264 decoder. It assumes the decoder can always handle truncated NALs reliably. Some can, most can't. The OpenH264 decoder can't, and therefore should only be fed complete NALs. That would be safest for OMX HW decoders as well. I'll file a separate bug for this.
Attachment #8505253 -
Flags: review+
Assignee | ||
Comment 35•10 years ago
|
||
Treeherder seems unable to find this try run so here it is on tbpl: https://tbpl.mozilla.org/?tree=Try&rev=e0da21982b46
Comment 36•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/8a1aa0399dca https://hg.mozilla.org/integration/mozilla-inbound/rev/adde0b51f5b2
Comment 37•10 years ago
|
||
(In reply to mzanaty from comment #34) > Changes look good to fix the crash. However, MakeDecodable has more > problems, at least for the OpenH264 decoder. It assumes the decoder can > always handle truncated NALs reliably. Some can, most can't. The OpenH264 > decoder can't, and therefore should only be fed complete NALs. That would be > safest for OMX HW decoders as well. I'll file a separate bug for this. Ethan's test shows it must handle truncated NALs fairly well (his test is mode 1, so tons of NALs are getting truncated). An earlier test he did for me where I would only mark frames as decodable if they were complete worked, but not as well. (For mode 1, the only place where that would be different than modifying MakeDecodable to be whole-nals would be on iframes, maybe).
Comment 38•10 years ago
|
||
[Tracking Requested - why for this release]: Important crash fix for a crash that can happen at any time in an h.264 call (if you're unlucky on packet loss). Will also want to ride-along in any 33 updates
status-firefox33:
--- → affected
status-firefox34:
--- → affected
status-firefox35:
--- → affected
status-firefox36:
--- → affected
status-firefox-esr31:
--- → unaffected
tracking-firefox34:
--- → ?
Whiteboard: [openh264-uplift]
Reporter | ||
Comment 39•10 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #37) Verified the OpenH264 v1.1 decoder can indeed reliably handle truncated NALs. So MakeDecodable works fine and no separate bug needed. (At least for OpenH264; can't speak for OMX HW decoders which may be hard to generalize.)
Comment 40•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/8a1aa0399dca https://hg.mozilla.org/mozilla-central/rev/adde0b51f5b2
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Updated•10 years ago
|
tracking-firefox33:
--- → +
Comment 41•10 years ago
|
||
Comment on attachment 8505253 [details] [diff] [review] Don't return 0-length frames for decoding; comments Would like to see this in 33.0.x if we build one Approval Request Comment [Feature/regressing bug #]: H264 support/OpenH264 [User impact if declined]: Random crashes if there's any packet loss not recovered with NACK [Describe test coverage new/current, TBPL]: We have no tests for real network conditions in our infra yet. Manually tested by Cisco on a connection with ~50% packet loss and 1s RTT, where it normally crashes within a minute. [Risks and why]: Extremely low risk. If we want no risk at all, we can take just the "ignore 0-length frames" patch alone on release, but the risk of the full fix here is virtually 0. [String/UUID change made/needed]: None
Attachment #8505253 -
Flags: approval-mozilla-release?
Attachment #8505253 -
Flags: approval-mozilla-beta?
Attachment #8505253 -
Flags: approval-mozilla-aurora?
Comment 42•10 years ago
|
||
Comment on attachment 8505253 [details] [diff] [review] Don't return 0-length frames for decoding; comments Approved for Beta and Aurora. Sylvestre will need to make a call on including this in any point release.
Attachment #8505253 -
Flags: approval-mozilla-beta?
Attachment #8505253 -
Flags: approval-mozilla-beta+
Attachment #8505253 -
Flags: approval-mozilla-aurora?
Attachment #8505253 -
Flags: approval-mozilla-aurora+
Comment 43•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/e0616172775d https://hg.mozilla.org/releases/mozilla-aurora/rev/d0081f5a668f https://hg.mozilla.org/releases/mozilla-beta/rev/bb26f4854630 https://hg.mozilla.org/releases/mozilla-beta/rev/452bc7db811e
Whiteboard: [openh264-uplift]
Comment 44•10 years ago
|
||
Comment on attachment 8505253 [details] [diff] [review] Don't return 0-length frames for decoding; comments I didn't want to take this in 33.0.1 to make sure it is only about OMTC & top crashes. However, we are probably going to make a 33.0.2 and I am going to take this one. Can this be landed in GECKO330_2014101104_RELBRANCH too? Thanks
Attachment #8505253 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Comment 45•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-release/rev/be43cc1b2373 https://hg.mozilla.org/releases/mozilla-release/rev/45dd53a5354b https://hg.mozilla.org/releases/mozilla-release/rev/71f3ac3edb10 https://hg.mozilla.org/releases/mozilla-release/rev/498e36b0685a
Comment 46•10 years ago
|
||
Do you need help from the Manual QA team to verify this fix for 33.0.1? If yes, can you also please provide some clear instructions to test?
Comment 47•10 years ago
|
||
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #46) > Do you need help from the Manual QA team to verify this fix for 33.0.1? If > yes, can you also please provide some clear instructions to test? From https://hg.mozilla.org/releases/mozilla-release/ It seems 33.0.1 build1 doesn't contain the fix.
Comment 49•10 years ago
|
||
(In reply to mzanaty from comment #39) > (In reply to Randell Jesup [:jesup] from comment #37) > Verified the OpenH264 v1.1 decoder can indeed reliably handle truncated > NALs. So MakeDecodable works fine and no separate bug needed. (At least for > OpenH264; can't speak for OMX HW decoders which may be hard to generalize.) Is there a way for other people to verify this, or does this require special hardware/devices (that's what it sounded like from the initial report)?
Flags: needinfo?(mzanaty)
Assignee | ||
Comment 50•10 years ago
|
||
This does not require special hardware or devices, simply a case where there is lots of packet loss. From the duplicate bug above you can see we are not the only ones that ran into this. You need to have enough packet loss so that you run into the right combination of packets to cause this. Ping me on IRC if you'd like to duplicate my test scenario.
Flags: needinfo?(mzanaty)
Reporter | ||
Comment 51•10 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #49) This can be verified by inducing specific packet loss patterns between two Firefox browsers in a WebRTC H.264 call. The first packet(s) of a frame must be lost ("leading loss") to reproduce the crash. With high enough packet loss rates, this may happen within seconds or minutes. To verify the fix, video must continue and recover despite losing the first packet(s) of a frame which can be verified in wireshark.
You need to log in
before you can comment on or make changes to this bug.
Description
•