1054624 - OpenH264 calls between two machines shows streaming issues

Reporter

Description

•

11 years ago

The video in calls between two hosts in the Mozilla MV office if OpenH264 is used hangs a lot and only updates the picture every couple of seconds, or works for a seconds but hangs then again. A screen recording of such a call: https://dl.dropboxusercontent.com/u/17831554/h264/H264video.mov about:webrtc copies from both sides of the call: https://dl.dropboxusercontent.com/u/17831554/h264/offer.txt https://dl.dropboxusercontent.com/u/17831554/h264/answer.txt For this call between Win and Mac both hosts were on the Mozila WiFi (not Mozilla Guest). Both machines were running the public build Nightly 34.0a1 (2014-08-15). Ping time between was 3ms (with few exceptions). According to the RTCP reports in the about:webrtc dumps there was some packet loss. This happened for me for calls between my Win <-> Mac machines and Win <-> Linux. For some unknown reason I was not able to reproduce the problem for Linux <-> Mac. Direction of the call does not seem to matter. I had very few calls which were running fine for a few seconds, before the video would freeze on both ends and then only update the picture only every couple of seconds and never recover from this bad video streaming state. Calls on just one browser on all OS's work fine on webrtc-landing/pc-test.html. A call between two browsers on one Mac works fine as well.

Nils Ohlmeier [:drno]

Reporter

Comment 1

•

11 years ago

A call between two Mac's in my home WiFi shows the same problem, but not as bad as in the office. The video rendering freezes from time to time on end. If it freezes it is usually frozen for several seconds, but it always recovered. In my home network most of the time only one remote stream at a time would freeze, where in the office network usually both streams froze at the same time. According to the RTCP reports there was packet loss as well, but less then in the office network. about:webrtc dumps: https://dl.dropboxusercontent.com/u/17831554/h264/offer2.txt https://dl.dropboxusercontent.com/u/17831554/h264/answer2.txt

Nils Ohlmeier [:drno]

Reporter

Comment 2

•

11 years ago

I think the common nominator here seems to be packet loss. Maire can you find the best person to answer: How does our implementation handle delays and/or queuing of packets over the IPC to and from the Plugin?

Blocks: WebRTC-OpenH264

Severity: normal → critical

Flags: needinfo?(mreavy)

Maire Reavy [:mreavy]

Comment 3

•

11 years ago

Thanks for the report, Nils. Can you verify that Aurora has the same problem? We did check basic packet loss recovery a couple of weeks ago (artificially injecting packet loss), and it seemed to work fine. webrtc.org logs would help a lot. During a call that's showing this problem, can you open about:webrtc logs and click "Start debug mode" from both sides of the call and upload the logs here? If this is a bug in packet loss recovery, Randell's the best person to start looking at this. I've given him a heads up on this bug and assigned it to him for now. He said he should be able to take a look at it later today. The webrtc.org logs should help get us the visibility we need into what's going on.

Assignee: nobody → rjesup

Flags: needinfo?(mreavy)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 4

•

11 years ago

What is the build source(s)? From the SDP, I see this: a=fmtp:126 PROFILE=0;LEVEL=0;profile-level-id=42e00d;packetization-mode=1;level-asymmetry-allowed=1;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0 Nightly (or any other build) shouldn't show that; when I do a pc_test call forcing H.264 I see: a=fmtp:126 profile-level-id=42e00d;packetization-mode=1;max-fs=600 There was a time when we'd get the PROFILE=0;LEVEL=0 stuff, but IIRC it was early in the h.264 work. ALso, note it's getting audio fmtp values on a video codec line.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 5

•

11 years ago

Tried locally with induced packet loss and with packet loss plus disabling NACK - fine (minor stutters when recovering without NACK). (pc_test.html, modded to strip NACK negotiation if wanted) Tried Linux->Mac nightly (fine) with induced packet loss on Talky Tried Linus->Windows Debug (inbound) with induced packet loss on Talky - fine. a short freeze or two that recovered, but windows debug builds have trouble keeping up at all.

Nils Ohlmeier [:drno]

Reporter

Comment 6

•

11 years ago

(In reply to Randell Jesup [:jesup] from comment #4) > What is the build source(s)? Build source is our public builds. I don't know how to extract more precise build related information out of these then the version number from "About Firefox" > From the SDP, I see this: > > a=fmtp:126 > PROFILE=0;LEVEL=0;profile-level-id=42e00d;packetization-mode=1;level- > asymmetry-allowed=1;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0 > > Nightly (or any other build) shouldn't show that; when I do a pc_test call > forcing H.264 I see: > a=fmtp:126 profile-level-id=42e00d;packetization-mode=1;max-fs=600 > > There was a time when we'd get the PROFILE=0;LEVEL=0 stuff, but IIRC it was > early in the h.264 work. Right now I don't know what is actually being used because of bug 1055080. All I can say is that the profile 0 thing is not showing in the SDP which crosses the wire. But until we know which part of the system is lying to us here... > ALso, note it's getting audio fmtp values on a > video codec line. I don't get what you mean. I see fmtp 101, 126 and 97 in the offers and answers all properly in the section where I would expect them.

Nils Ohlmeier [:drno]

Reporter

Comment 7

•

11 years ago

Attached file about:webrtc and debug log files — Details

H264 video call where video gets super choppy. About:webrtc and debug logs are in the tarball. Again public Nightly's from today 34.0a1 (2014-08-18) which according to about:buildconfig translates to https://hg.mozilla.org/mozilla-central/rev/0aaa2d3d15cc

Eric Rescorla (:ekr)

Comment 8

•

11 years ago

I see the same anomalous stuff as noted in c4 in working loopback scenarios. So, I suspect that this is just a separate issue about about:webrtc

Eric Rescorla (:ekr)

Comment 9

•

11 years ago

FWIW, here is what I see being configured in this setting: (gdb) p *$1 $2 = { codecType = webrtc::kVideoCodecH264, plName = "H264", '\0' <repeats 27 times>, plType = 126 '~', width = 320, height = 240, startBitrate = 300, maxBitrate = 2000, minBitrate = 200, maxFramerate = 30 '\036', codecSpecific = { VP8 = { pictureLossIndicationOn = 66, feedbackModeOn = 224, complexity = webrtc::kComplexityNormal, resilience = webrtc::kResilienceOff, numberOfTemporalLayers = 0 '\0', denoisingOn = false, errorConcealmentOn = false, automaticResizeOn = false, frameDroppingOn = false, keyFrameInterval = 0 }, H264 = { profile = 66 'B', constraints = 224 '?', level = 13 '\r', packetizationMode = 1 '\001', frameDroppingOn = false, keyFrameInterval = 0, spsData = 0x0, spsLen = 0, ppsData = 0x0, ppsLen = 0 } },

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 10

•

11 years ago

(In reply to Nils Ohlmeier [:drno] from comment #6) > > ALso, note it's getting audio fmtp values on a > > video codec line. > > I don't get what you mean. I see fmtp 101, 126 and 97 in the offers and > answers all properly in the section where I would expect them. For the other bug, but: usedtx=0;stereo=0 and others are audio fmtp values.

Nils Ohlmeier [:drno]

Reporter

Comment 11

•

11 years ago

So here is a Pcap and a webrtc debug log from a call again with todays Nightly. This time between Win and Linux. Both machines are on cable. As the file is too big here is a link: http://people.mozilla.org/~nohlmeier/bz1054624/h264_dump_log.tar.bz2

Maire Reavy [:mreavy]

Updated

•

11 years ago

Whiteboard: [openh264-uplift]

Nils Ohlmeier [:drno]

Reporter

Comment 12

•

11 years ago

So here is a PCAP and WebRTC.log (sorry don't know which of the two browsers succeeded in creating this file) from a working call between Aurora and Nightly on my Mac. http://people.mozilla.org/~nohlmeier/bz1054624/loopback.tar.bz2

Nils Ohlmeier [:drno]

Reporter

Comment 13

•

11 years ago

This tarball contains PCAP, WebRTC.log and about:webrtc from a call between two Nightly's from today. Both machines on MV office WiFi. Picture froze pretty quickly. http://people.mozilla.org/~nohlmeier/bz1054624/full-logs-mac-win.tar.bz2

Nils Ohlmeier [:drno]

Reporter

Comment 14

•

11 years ago

FYI the code for my test call can be found here: https://github.com/nils-ohlmeier/mozwebrtctools/tree/master/manual_call The pages depend on a simplesignlaing server (https://github.com/luser/simplesignalling) running somewhere (and the hard coded address in both pages to the manually updated to that server address). I just double checked that ICE connection gets established reasonably fast, but H264 rendering sometimes takes several seconds after ICE connection to start rendering the remote video.

Nils Ohlmeier [:drno]

Reporter

Comment 15

•

11 years ago

I replaced modifying the SDP offer in offer.html with setting the user pref media.navigator.video.preferred_codec to 126. Verification shows that the SDP looks pretty much the same as before (the m=video line shows only 126, but the attribute lines for all video codecs are still there). Advantage is that you can use apprtc.appspot.com or any other WebRTC video calling service. When I tried it yesterday with my own signaling server and with apprtc it still showed the same result.

Nils Ohlmeier [:drno]

Reporter

Comment 16

•

11 years ago

Sorry forgot to mention in my last comment that my test was again between my MacBook Pro and my Windows ThinkPad laptop in the office network.

Nils Ohlmeier [:drno]

Reporter

Comment 17

•

11 years ago

Starring at the stats from comment #12 at little bit more I think we can conclude two things: 1) We are not really looking at packet loss here. Tcpdump's and RTC stats agree that this call had at maximum 1.3% packet loss. I assume that should not result is such bad video quality. 2) What I find interesting though is that both sides of the call say in the about:webrtc stats the local encoder dropped 10% on one side and 14% on the other side. I don't know how these numbers get calculated, but could this indicate some form of over load?

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 18

•

11 years ago

(In reply to Nils Ohlmeier [:drno] from comment #17) > Starring at the stats from comment #12 at little bit more I think we can > conclude two things: > > 1) We are not really looking at packet loss here. Tcpdump's and RTC stats > agree that this call had at maximum 1.3% packet loss. I assume that should > not result is such bad video quality. It shouldn't on it's own. > > 2) What I find interesting though is that both sides of the call say in the > about:webrtc stats the local encoder dropped 10% on one side and 14% on the > other side. I don't know how these numbers get calculated, but could this > indicate some form of over load? That can depend on the cause. It may drop packets after an IDR (for error recovery) to keep the bitrate stable, for example, or if the previous frame took too long to encode, etc. If the camera is running 30fps, that would drop it to ~27fps-ish (average). It also might be a burst when it's having some problem. Can't tell from a single number

Nils Ohlmeier [:drno]

Reporter

Comment 19

•

11 years ago

Syd, could you please try to reproduce this? Basically you need two make a WebRTC call between two machines. For me the problem seems to be worse if one of the machines is Windows. Start with fresh profiles on both browsers (Aurora or Nightly). Verify that you see the Cisco H264 in your about:plugins (should show up ~1min after starting with a fresh profile). Then go to about:config add a new integer preference by right clicking into the list of prefs. Name: media.navigator.video.preferred_codec Value: 126 Then go to https://apprtc.appspot.com in one browser and visit the new URL with the /?r=XXXX with the browser on the other machine. Now the question is: how does the video look like if you run it for 1 minute. I get long lasting video freezes all the time.

Flags: needinfo?(spolk)

Nils Ohlmeier [:drno]

Reporter

Comment 20

•

11 years ago

If I set media.navigator.video.default_height to 200 and media.navigator.video.default_width to 320 the problem seems to appear less frequent.

Nils Ohlmeier [:drno]

Reporter

Comment 21

•

11 years ago

A call between two Macs worked fine for a couple of minutes. So maybe we are this problem only shows across different platforms?!

Nils Ohlmeier [:drno]

Reporter

Comment 22

•

11 years ago

Disregard c#21 the last test was flawed.

Nils Ohlmeier [:drno]

Reporter

Comment 23

•

11 years ago

@Syd: to verify that H264 is being used got to the about:webrtc page and search for the "m=video" line in the SDP's. We want to see something with 126 in there, and NOT 120. The lines below the "m=video" line should also say something with the string "H264" in it and NOT "VP8". Maire also indicated that H264 on Nightly might be broken with existing user profiles which have downloaded the plugin before already. So it might be worth trying it with fresh user profiles.

Syd Polk :sydpolk

Comment 24

•

11 years ago

I followed the steps Nils outlined. Was not able to get the H264 preference to stick until I created new profiles on both machines. I initiated a session via apprtc.appspot.com on my Windows VM, and connected to my other Mac. Verified that H264 was indeed in use. The video ran for about 3 minutes with only slight hicoughs. Then about 3 minutes in, the Windows machine displayed "Channel error", and the video sent by the Mac froze on the Windows screen. Then the video sent by Windows froze on the Mac screen. After about 10 seconds of that, both video streams started working. A few seconds later, the Windows machine displayed "Channel Error" again, and the Mac side lost connection ("Waiting for some to join..."). The Windows side is showing the last frame it got from the Mac. After 30 or so more seconds, the Windows side displayed "Channel error" again. Now things are not working at all. The Windows VM is on a NAT inside my home network; the Mac is not. My home network itself is behind a NAT. I am not running a TURN server. Don't know if this is important or not, but I thought I would mention it.

Syd Polk :sydpolk

Comment 25

•

11 years ago

I tried with a new profile because trying with an older profile did not set the codec to 126. Verified that the codec was being used correctly by looking at about:webrtc like Nils suggested.

Flags: needinfo?(spolk)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 26

•

11 years ago

Please retry first with VP8 until you have multiple good calls, then switch. The Double-NAT may be part of the problem (and if the VM is slow, it could cause other issues). "Channel Error" implies to me a networking/TURN/ICE issue with apprtc. You can also try talky.io. I find that simpler in many ways (less cut-and-paste URLs). http://talky.io/choose_any_name_for_room - go there from both sides

Flags: needinfo?(spolk)

Syd Polk :sydpolk

Comment 27

•

11 years ago

OK, I ran with NSPR logging, WebRTC debugging and Wireshark on both machines last night. The feed from the Windows machine to the Mac was mostly ok; it went for 3 or 4 minutes and then froze. And then, about 30 seconds later, started again. The feed from the Mac to the Windows machine was bad. I would see a frame update every 30 seconds. This went on throughout the entire several minute run. The logs that were generated were big, I uploaded them to my Google Drive: https://drive.google.com/a/mozilla.com/#folders/0B1BaMtdCeAxdUXVfT2E3dl9BRXM

Syd Polk :sydpolk

Updated

•

11 years ago

Flags: needinfo?(spolk)

Syd Polk :sydpolk

Comment 28

•

11 years ago

Exact steps I am using: Setup: Mac - Mac OS X 10.9 native OS running on my home network, behind the NAT provided by AT&T UVerse. Wifi IP address: 192.168.1.94 Windows 7 VM running on Mac OS X 10.9 running on home network. VM network Bridged with IP address of 192.168.1.104; host is running both Ethernet (primary) 192.168.1.80 and Wi-Fi 192.168.1.68. No NAT between Windows VM and Mac. On, the Mac, export NSPR_LOG_MODULES=signaling:5,mtransport:5,timestamp:1,gmp:5,webrtc_trace:65535 export NSPR_LOG_FILE=/Users/spolk/Desktop/nspr.log export WEBRTC_TRACE_FILE=/Users/spolk/Desktop/WebRTC.log ./firefox -p On Windows: cmd /c "set NSPR_LOG_MODULES=signaling:5,mtransport:5,timestamp:1,gmp:5,webrtc_trace:65535 && set WEBRTC_TRACE_FILE=c:\logs\WebRTC.log && set NSPR_LOG_FILE=c:\logs\nspr.log && firefox -p On the Mac, I navigate to http://talky.io/syd_polk. I choose NOT to share the microphone, but I shared the builtin camera. On Windows, I navigate to the same site. I choose NOT to share the microphone, but I do share my Logitech Webcam which is attached to the Windows VM.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 29

•

11 years ago

Attached patch Added debugs for rtp packet reception error handling — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 30

•

11 years ago

Attached patch reinstate timestamp-modification kludges (improved) for OpenH264 rtp reception — Details — Splinter Review

seems to work locally with both mode 1 and mode 0, though I haven't challenged it with loss yet. Based on an improved version of the timestamp-modification kludge. Might still be mode 0 issues to deal with.

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Attachment #8488053 - Flags: review?(pkerr)

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 31

•

11 years ago

Comment on attachment 8488055 [details] [diff] [review] reinstate timestamp-modification kludges (improved) for OpenH264 rtp reception f? to Mo - this would be an alternative to options that try to set (or reset after a loss recovery) isFirstPacket in the current code reliably.

Attachment #8488055 - Flags: feedback?(mzanaty)

Syd Polk :sydpolk

Comment 32

•

11 years ago

I reproduced the steps with the try build at: https://tbpl.mozilla.org/?tree=Try&rev=9d1b38852d6a There were no video problems at all. I uploaded the logs from it: https://drive.google.com/a/mozilla.com/file/d/0B1BaMtdCeAxdTFJxNE9IemdVMFk/ https://drive.google.com/a/mozilla.com/file/d/0B1BaMtdCeAxdZ0JJOEhMalRFSm8/

Paul Kerr [:pkerr]

Updated

•

11 years ago

Attachment #8488053 - Flags: review?(pkerr) → review+

mzanaty

Comment 33

•

11 years ago

Attached patch Fix isFirstPacket for loss/reorder/nack cases (obsolete) — Details — Splinter Review

I prefer the solution in the attached patch. It should work more deterministically in more cases.

mzanaty

Comment 34

•

11 years ago

Attached patch Fix isFirstPacket for loss/reorder/nack cases (obsolete) — Details — Splinter Review

Oops, wrong patch previously attached. Correct one here.

Attachment #8488214 - Attachment is obsolete: true

Ethan Hugg [:ehugg]

Comment 35

•

11 years ago

Comment on attachment 8488223 [details] [diff] [review] Fix isFirstPacket for loss/reorder/nack cases Any chance something is missing from this Fix isFirstPacket patch? I get this crash on both Windows and OSX when applied to M-C and running either the landing page test or talky.io. Hit MOZ_CRASH() at /Users/ehugg/mozilla/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30 Process 77516 stopped * thread #25: tid = 0x2392f1, 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30, name = 'GMPThread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) frame #0: 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30 27 #else 28 __android_log_print(ANDROID_LOG_ERROR, "Gecko", "mozalloc_abort: %s", msg); 29 #endif -> 30 MOZ_CRASH(); 31 } 32 33 #if defined(XP_UNIX) (lldb) bt * thread #25: tid = 0x2392f1, 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30, name = 'GMPThread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0) * frame #0: 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30 frame #1: 0x00000001015e6c25 XUL`Abort(aMsg=0x0000000117ab53a8) + 21 at nsDebugImpl.cpp:473 frame #2: 0x00000001015e6654 XUL`NS_DebugBreak(aSeverity=3, aStr=0x0000000107d8f86d, aExpr=0x0000000107d8f87a, aFile=0x0000000107d8f883, aLine=317) + 1204 at nsDebugImpl.cpp:430 frame #3: 0x0000000101d29daa XUL`mozilla::ipc::Shmem::AssertInvariants(this=0x000000014645c698) const + 74 at Shmem.cpp:317 frame #4: 0x00000001029ef899 XUL`unsigned char* mozilla::ipc::Shmem::get<unsigned char>(this=0x000000014645c698) const + 25 at Shmem.h:158 frame #5: 0x000000010449a97c XUL`mozilla::gmp::GMPVideoEncodedFrameImpl::Buffer(this=0x000000014645c660) + 28 at GMPVideoEncodedFrameImpl.cpp:302 frame #6: 0x000000010257f78a XUL`mozilla::WebrtcGmpVideoDecoder::Decode_g(this=0x000000012fac4920, aInputImage=0x000000011bf49928, aMissingFrames=false, aFragmentation=0x000000012d9059f0, aCodecSpecificInfo=0x000000012d9059c0, aRenderTimeMs=298987112) + 426 at WebrtcGmpVideoCodec.cpp:647 frame #7: 0x0000000102581cbd XUL`mozilla::runnable_args_m_5_ret<mozilla::WebrtcGmpVideoDecoder*, int (this=0x000000011bf498f0)(webrtc::EncodedImage const&, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long), webrtc::EncodedImage, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long, int>::Run() + 189 at runnable_utils_generated.h:481 frame #8: 0x0000000102583b16 XUL`mozilla::SyncRunnable::Run(this=0x000000014645bcc0) + 54 at SyncRunnable.h:75 frame #9: 0x00000001016cafd8 XUL`nsThread::ProcessNextEvent(this=0x0000000117ea5030, aMayWait=true, aResult=0x0000000117ab5c4e) + 2088 at nsThread.cpp:823 frame #10: 0x0000000101722a57 XUL`NS_ProcessNextEvent(aThread=0x0000000117ea5030, aMayWait=true) + 151 at nsThreadUtils.cpp:265 frame #11: 0x0000000101d27d4d XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000117eff080, aDelegate=0x00000001003b3ae0) + 925 at MessagePump.cpp:355 frame #12: 0x0000000101cbdf46 XUL`MessageLoop::RunInternal(this=0x00000001003b3ae0) + 118 at message_loop.cc:229 frame #13: 0x0000000101cbde55 XUL`MessageLoop::RunHandler(this=0x00000001003b3ae0) + 21 at message_loop.cc:222 frame #14: 0x0000000101cbddfd XUL`MessageLoop::Run(this=0x00000001003b3ae0) + 45 at message_loop.cc:196 frame #15: 0x00000001016c9456 XUL`nsThread::ThreadFunc(aArg=0x0000000117ea5030) + 358 at nsThread.cpp:350 frame #16: 0x000000010136356f libnss3.dylib`_pt_root(arg=0x0000000117e41370) + 463 at ptthread.c:212 frame #17: 0x00007fff98cf0899 libsystem_pthread.dylib`_pthread_body + 138 frame #18: 0x00007fff98cf072a libsystem_pthread.dylib`_pthread_start + 137

mzanaty

Comment 36

•

11 years ago

Attached patch Real Fix isFirstPacket for loss/reorder/nack cases — Details — Splinter Review

Sorry, real patch with different file name (v2) attached properly.

Attachment #8488223 - Attachment is obsolete: true

mzanaty

Comment 37

•

11 years ago

The patch is easier to review without whitespace, due to a large indent block. Here it is without whitespace for easier review. --- a/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc +++ b/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc int32_t RTPReceiverVideo::ReceiveH264Codec(WebRtcRTPHeader* rtp_header, const uint8_t* payload_data, uint16_t payload_data_length) { size_t offset = RtpFormatH264::kNalHeaderOffset; uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask; rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type; + rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame // get original NAL type if FU-A or STAP-A switch (nal_type) { case RtpFormatH264::kFuA: offset = RtpFormatH264::kFuAHeaderOffset; if (offset >= payload_data_length) return -1; // malformed nal_type = payload_data[offset] & RtpFormatH264::kTypeMask; + if (!(payload_data[offset] & RtpFormatH264::kFragStartBit)) { + rtp_header->type.Video.isFirstPacket = false; + } break; --- a/media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc +++ b/media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc int VCMSessionInfo::InsertPacket(const V... // Check for duplicate packets. if (rit != packets_.rend() && (*rit).seqNum == packet.seqNum && (*rit).sizeBytes > 0) return -2; PacketIterator packet_list_it; + if (packet.codec == kVideoCodecH264) { + // H.264 can have leading or trailing non-VCL NALUs, + // and the RTP marker bit is not reliable for the last packet of a frame. + // So allow out-of-order packets to update the first and last seq# range. + // Also mark as a key frame if any packet is of that type. + if (frame_type_ != kVideoFrameKey) { + frame_type_ = packet.frameType; + } + if ((!HaveFirstPacket() || + IsNewerSequenceNumber(first_packet_seq_num_, packet.seqNum)) && + packet.isFirstPacket) { + first_packet_seq_num_ = packet.seqNum; + } + if ((!HaveLastPacket() && packet.markerBit) || + (HaveLastPacket() && + IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) { + last_packet_seq_num_ = packet.seqNum; + } + } else { ...existing VP8 handling... + }

Ethan Hugg [:ehugg]

Comment 38

•

11 years ago

Try for builds with Mo's patch above: https://tbpl.mozilla.org/?tree=Try&rev=60739e2f9906

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 39

•

11 years ago

Comment on attachment 8488419 [details] [diff] [review] Real Fix isFirstPacket for loss/reorder/nack cases Review of attachment 8488419 [details] [diff] [review]: ----------------------------------------------------------------- r-, but if the investigation shows it's safe to only set KeyFrame when the 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then this will be a trivial review of just the mods to do that. Note: please file a followup to update the in-tree webrtc unittests to include H.264 packetization tests. This will be required to upstream these fixes. (Modulo they may be independently implementing this; upstreaming the older patch stalled and didn't have tests anyways.) ::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc @@ +234,5 @@ > uint16_t payload_data_length) { > size_t offset = RtpFormatH264::kNalHeaderOffset; > uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask; > rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type; > + rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame I'd expand the comment, move to above the line: // For H.264, isFirstPacket means first in NAL unit, not first in the timestamp, // which elsewhere is referred to as a 'frame' or 'session' Then also update the comment in module_comment_types.h with "or NAL for H.264" ::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc @@ +486,5 @@ > > PacketIterator packet_list_it; > > + if (packet.codec == kVideoCodecH264) { > + // H.264 can have leading or trailing non-VCL NALUs, Some people reading this will be network people, not codec people, and the meaning of VCL may not be obvious. Expand or define @@ +487,5 @@ > PacketIterator packet_list_it; > > + if (packet.codec == kVideoCodecH264) { > + // H.264 can have leading or trailing non-VCL NALUs, > + // and the RTP marker bit is not reliable for the last packet of a frame. A reference to this would be good. (RFC 6184 5.1 - Decoders [] MUST NOT rely on this property). @@ +491,5 @@ > + // and the RTP marker bit is not reliable for the last packet of a frame. > + // So allow out-of-order packets to update the first and last seq# range. > + // Also mark as a key frame if any packet is of that type. > + if (frame_type_ != kVideoFrameKey) { > + frame_type_ = packet.frameType; This is a little iffy though likely works in most cases if there's no loss. The jitter buffer in recovery mode will "fast-forward" to a KeyFrame. In normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work; however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to fast forward to any "session" with a KeyFrame. This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice> (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible), <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will cause the jitterbuffer to assume it can fast-forward to the timestamp/"session" and start decoding there. I'm not sure I want to think about what a STAP-A with a leading p-frame and later IDR all with the same timestamp would cause, assuming this is (semi)-legal. Also (and perhaps more dangerously), if there's loss and the decoder tries to decode anyways, the fast-forward code might fail. It may check "HaveFirstPacket() && HaveLastPacket()", which with this change will always be true even if we're missing packets. So <SPS><PPS><loss> would be seen as a valid fast-forward target. Or <loss><PPS><loss>. or <loss><iframe> where the SPS/PPS were different than the last one (admittedly rare, and a known failure mode of H.264 depacketization). Setting KeyFrame only if an <iframe> is present removes all but the last (well-known, and almsot unavoidable) failure mode. A more complete fix would be to update the fast-forward code to query codec/packetization-knowledgable session code instead of relying on a precalculated bool. However, that's out-of-scope currently. A simple fix here would be to consider only iframes KeyFrames, and then use the logic here. This presumes SPS/PPS would be sent (if they're sent) with the same timestamp, which they should be. Please check if that would mess anything else up. Please check the fast-forward code and see if it needs to be updated, or if setting KeyFrame only on <iframe>s will work. @@ +502,5 @@ > + if ((!HaveLastPacket() && packet.markerBit) || > + (HaveLastPacket() && > + IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) { > + last_packet_seq_num_ = packet.seqNum; > + } In theory there are packet sequences that would confuse this, such as: TimeStamp: N N N N+1 N+1 N N+1 N+2 However that wouldn't apply to Mode 0 or Mode 1. Add a comment at the front of the "if ((!HaveFirstPacket() ||" block stating this assumes the original packet order by sequence number has non-decreasing timestamps, which is valid for Mode 0 or Mode 1 @@ +523,5 @@ > + // Update the frame type with the type of the first media packet. > + // TODO(mikhal): Can this trigger? > + frame_type_ = packet.frameType; > + } > + remove trailing whitespace

Attachment #8488419 - Flags: review-

mzanaty

Comment 40

•

11 years ago

(In reply to Randell Jesup [:jesup] from comment #39) > Comment on attachment 8488419 [details] [diff] [review] > Real Fix isFirstPacket for loss/reorder/nack cases > > Review of attachment 8488419 [details] [diff] [review]: > ----------------------------------------------------------------- > > r-, but if the investigation shows it's safe to only set KeyFrame when the > 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then > this will be a trivial review of just the mods to do that. That would be safe for FF-to-FF but not for general H.264 peers which may use GDR (Gradual Decoder Refresh) or AIR (Adaptive Intra Refresh) techniques instead of IDR for recovery. GDR often sends a Recovery Point SEI followed by several (e.g. 5) P frames with some percentage (e.g. 20%) of intra-macroblocks in each P frame, after which the entire frame has been intra-refreshed. AIR is often even more gradual (e.g. 100 frames with 1% IMB each). Both GDR and AIR typically send SPS/PPS/SEI before the first P frame of the refresh cycle, but no IDR or I frames. > Note: please file a followup to update the in-tree webrtc unittests to > include H.264 packetization tests. This will be required to upstream these > fixes. (Modulo they may be independently implementing this; upstreaming the > older patch stalled and didn't have tests anyways.) Filed 1066616 for this. Let me know if it needs more details about the tests. > ::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc > @@ +234,5 @@ > > uint16_t payload_data_length) { > > size_t offset = RtpFormatH264::kNalHeaderOffset; > > uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask; > > rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type; > > + rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame > > I'd expand the comment, move to above the line: > // For H.264, isFirstPacket means first in NAL unit, not first in the > timestamp, > // which elsewhere is referred to as a 'frame' or 'session' > > Then also update the comment in module_comment_types.h with "or NAL for > H.264" Agreed, will do. > ::: > media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc > @@ +486,5 @@ > > > > PacketIterator packet_list_it; > > > > + if (packet.codec == kVideoCodecH264) { > > + // H.264 can have leading or trailing non-VCL NALUs, > > Some people reading this will be network people, not codec people, and the > meaning of VCL may not be obvious. Expand or define Agreed, will do. > @@ +487,5 @@ > > PacketIterator packet_list_it; > > > > + if (packet.codec == kVideoCodecH264) { > > + // H.264 can have leading or trailing non-VCL NALUs, > > + // and the RTP marker bit is not reliable for the last packet of a frame. > > A reference to this would be good. (RFC 6184 5.1 - Decoders [] MUST NOT > rely on this property). Agreed, will do. > @@ +491,5 @@ > > + // and the RTP marker bit is not reliable for the last packet of a frame. > > + // So allow out-of-order packets to update the first and last seq# range. > > + // Also mark as a key frame if any packet is of that type. > > + if (frame_type_ != kVideoFrameKey) { > > + frame_type_ = packet.frameType; > > This is a little iffy though likely works in most cases if there's no loss. > > The jitter buffer in recovery mode will "fast-forward" to a KeyFrame. In > normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work; > however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to > fast forward to any "session" with a KeyFrame. The only non-VCL NALUs that get marked as KeyFrame are SPS, PPS and Recovery Point SEI. The only VCL NALU that gets marked as KeyFrame is IDR. These all explicitly signal that recovery can start here, so it makes sense to "fast forward" here if necessary for recovery. All other non-VCL and VCL NALUs don't get marked as KeyFrame. > This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice> > (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible), > <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will > cause the jitterbuffer to assume it can fast-forward to the > timestamp/"session" and start decoding there. I'm not sure I want to think > about what a STAP-A with a leading p-frame and later IDR all with the same > timestamp would cause, assuming this is (semi)-legal. This is intentional. While odd, or just uncommon, some H.264 gear does this. > Also (and perhaps more dangerously), if there's loss and the decoder tries > to decode anyways, the fast-forward code might fail. It may check > "HaveFirstPacket() && HaveLastPacket()", which with this change will always > be true even if we're missing packets. So <SPS><PPS><loss> would be seen as > a valid fast-forward target. Or <loss><PPS><loss>. or <loss><iframe> where > the SPS/PPS were different than the last one (admittedly rare, and a known > failure mode of H.264 depacketization). Setting KeyFrame only if an > <iframe> is present removes all but the last (well-known, and almsot > unavoidable) failure mode. Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR (see earlier comment). And it still has the same loss issues. Only leading loss is an issue, regardless if we set KeyFrame only for IDR or also SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and middle loss will fail the InSequence check. Here is an idea to fix the leading loss case, similar to the VP8 code. Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs (1,5). Then the HaveFirstPacket checks would fail for leading loss. > A more complete fix would be to update the fast-forward code to query > codec/packetization-knowledgable session code instead of relying on a > precalculated bool. However, that's out-of-scope currently. > > A simple fix here would be to consider only iframes KeyFrames, and then use > the logic here. This presumes SPS/PPS would be sent (if they're sent) with > the same timestamp, which they should be. Please check if that would mess > anything else up. It would only mess up GDR/AIR. But it still won't fix leading loss cases. > Please check the fast-forward code and see if it needs to be updated, or if > setting KeyFrame only on <iframe>s will work. Looks like it will work using the idea above (first_mb=0), similar to VP8. > @@ +502,5 @@ > > + if ((!HaveLastPacket() && packet.markerBit) || > > + (HaveLastPacket() && > > + IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) { > > + last_packet_seq_num_ = packet.seqNum; > > + } > > In theory there are packet sequences that would confuse this, such as: > TimeStamp: N N N N+1 N+1 N N+1 N+2 Do you mean this is the network sequence or actual RTP seq# sequence? InSequence will fail if there are seq# gaps within a timestamp. The JB handles separate timestamps separately, so the network sequence shouldn't matter. > However that wouldn't apply to Mode 0 or Mode 1. > Add a comment at the front of the "if ((!HaveFirstPacket() ||" block stating > this assumes the original packet order by sequence number has non-decreasing > timestamps, which is valid for Mode 0 or Mode 1 The JB may explode elsewhere on such streams, but not here. This code only deals with the packets queued under the same timestamp. An earlier seq# for that same timestamp can come later on the wire and still be accepted. This is unrelated to the relative seq# between different timestamps. > @@ +523,5 @@ > > + // Update the frame type with the type of the first media packet. > > + // TODO(mikhal): Can this trigger? > > + frame_type_ = packet.frameType; > > + } > > + > > remove trailing whitespace Yep

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 41

•

11 years ago

(In reply to mzanaty from comment #40) > (In reply to Randell Jesup [:jesup] from comment #39) > > Comment on attachment 8488419 [details] [diff] [review] > > Real Fix isFirstPacket for loss/reorder/nack cases > > > > Review of attachment 8488419 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > r-, but if the investigation shows it's safe to only set KeyFrame when the > > 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then > > this will be a trivial review of just the mods to do that. > > That would be safe for FF-to-FF but not for general H.264 peers which may > use GDR (Gradual Decoder Refresh) or AIR (Adaptive Intra Refresh) techniques > instead of IDR for recovery. GDR often sends a Recovery Point SEI followed > by several (e.g. 5) P frames with some percentage (e.g. 20%) of > intra-macroblocks in each P frame, after which the entire frame has been > intra-refreshed. AIR is often even more gradual (e.g. 100 frames with 1% IMB > each). Both GDR and AIR typically send SPS/PPS/SEI before the first P frame > of the refresh cycle, but no IDR or I frames. So a more nuanced codec-aware fast-forward would be needed to support these in general. In the meantime, you could set KeyFrame on <iframe> or <sei> (AIR/GDR). > > @@ +491,5 @@ > > > + // and the RTP marker bit is not reliable for the last packet of a frame. > > > + // So allow out-of-order packets to update the first and last seq# range. > > > + // Also mark as a key frame if any packet is of that type. > > > + if (frame_type_ != kVideoFrameKey) { > > > + frame_type_ = packet.frameType; > > > > This is a little iffy though likely works in most cases if there's no loss. > > > > The jitter buffer in recovery mode will "fast-forward" to a KeyFrame. In > > normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work; > > however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to > > fast forward to any "session" with a KeyFrame. > > The only non-VCL NALUs that get marked as KeyFrame are SPS, PPS and Recovery > Point SEI. The only VCL NALU that gets marked as KeyFrame is IDR. These all > explicitly signal that recovery can start here, so it makes sense to "fast > forward" here if necessary for recovery. All other non-VCL and VCL NALUs > don't get marked as KeyFrame. SPS and PPS don't directly indicate recovery can start there, though in practice they can if followed by iframe (or as you point out SEI). > > > This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice> > > (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible), > > <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will > > cause the jitterbuffer to assume it can fast-forward to the > > timestamp/"session" and start decoding there. I'm not sure I want to think > > about what a STAP-A with a leading p-frame and later IDR all with the same > > timestamp would cause, assuming this is (semi)-legal. > > This is intentional. While odd, or just uncommon, some H.264 gear does this. Right, but are all of those valid fast-forward points? > > Also (and perhaps more dangerously), if there's loss and the decoder tries > > to decode anyways, the fast-forward code might fail. It may check > > "HaveFirstPacket() && HaveLastPacket()", which with this change will always > > be true even if we're missing packets. So <SPS><PPS><loss> would be seen as > > a valid fast-forward target. Or <loss><PPS><loss>. or <loss><iframe> where > > the SPS/PPS were different than the last one (admittedly rare, and a known > > failure mode of H.264 depacketization). Setting KeyFrame only if an > > <iframe> is present removes all but the last (well-known, and almsot > > unavoidable) failure mode. > > Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR > (see earlier comment). And it still has the same loss issues. Only leading > loss is an issue, regardless if we set KeyFrame only for IDR or also > SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and > middle loss will fail the InSequence check. I see now; you're only setting HaveLastPacket on marker bits, and then if that's set and a later same-timestamp packet comes in, updating it to include that. What about missing Marker bits? Per above, we shouldn't depend on the Marker bit at all except to indicate we _can_ end frame accumulation at that point and push it down for decode (i.e. optimization; we don't have to wait for the next packet (which may be a while) to see if it has a different timestamp.) Does HaveLastPacket on the last packet get set when the next packet comes in and has a different timestamp? This is edge-casey enough we could just comment in the code and maybe file a bug on it, unless you believe missing (as opposed to early) Marker bits are a real possibility. > > Here is an idea to fix the leading loss case, similar to the VP8 code. > Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs > (1,5). Then the HaveFirstPacket checks would fail for leading loss. Does that fail for i-slices? Or FMO? (are those allowed in baseline? Of course, that assumes it's always baseline) Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged as fast-forward target, as it will fail the same way with incremental decode-with-errors, and in theory the system should notice a decoder failure and ask for an IDR - of course, it probably doesn't do that. And there's no true way to know if <loss><iframe> is a valid decode point or not (ok, no easy way without getting into the short of the h264 decoder and breaking into the SPS/PPS and the iframe). (SPS/PPS aren't required before an iframe, and in sprop-parameter-set usage are never required - but still could be sent.) > > A more complete fix would be to update the fast-forward code to query > > codec/packetization-knowledgable session code instead of relying on a > > precalculated bool. However, that's out-of-scope currently. > > > > A simple fix here would be to consider only iframes KeyFrames, and then use > > the logic here. This presumes SPS/PPS would be sent (if they're sent) with > > the same timestamp, which they should be. Please check if that would mess > > anything else up. > > It would only mess up GDR/AIR. But it still won't fix leading loss cases. Would SEI or iframe == keyframe cover most or all of it? (can we key on SEI decode points /GDR only? I'm fuzzy on what can be in an SEI) > > Please check the fast-forward code and see if it needs to be updated, or if > > setting KeyFrame only on <iframe>s will work. > > Looks like it will work using the idea above (first_mb=0), similar to VP8. May be an option modulo the answers to my questions above. > > > @@ +502,5 @@ > > > + if ((!HaveLastPacket() && packet.markerBit) || > > > + (HaveLastPacket() && > > > + IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) { > > > + last_packet_seq_num_ = packet.seqNum; > > > + } > > > > In theory there are packet sequences that would confuse this, such as: > > TimeStamp: N N N N+1 N+1 N N+1 N+2 > > Do you mean this is the network sequence or actual RTP seq# sequence? > InSequence will fail if there are seq# gaps within a timestamp. The JB > handles separate timestamps separately, so the network sequence shouldn't > matter. that's in-sequence RTP (no gaps) with timestamps that are per above (i.e. timestamp goes backwards from Seq# X to Seq # X+1). And likely that will blow the brains of the current jitterbuffer code. As I said, I don't think we need to care about this for Mode 0/1.

mzanaty

Comment 42

•

11 years ago

(In reply to Randell Jesup [:jesup] from comment #41) > SPS and PPS don't directly indicate recovery can start there, though in > practice they can if followed by iframe (or as you point out SEI). Agreed. No major objection to excluding SPS/PPS from KeyFrame marking, since they should appear in the same timestamp as IDR or SEI which will then mark as KeyFrame. Excluding them would cause things to fail if they use a different timestamp than the IDR or SEI, but I've never seen that. The only corner case I've actually seen where this would fail is SPS/PPS/P without SEI. GDR/AIR often include SEI, but I've seen cases of AIR without SEI. That's why I included SPS/PPS, but it is such a low runner scenario that we can probably ignore. OTOH, I see no harm in including them. > > > This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice> > > > (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible), > > > <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will > > > cause the jitterbuffer to assume it can fast-forward to the > > > timestamp/"session" and start decoding there. I'm not sure I want to think > > > about what a STAP-A with a leading p-frame and later IDR all with the same > > > timestamp would cause, assuming this is (semi)-legal. > > > > This is intentional. While odd, or just uncommon, some H.264 gear does this. > > Right, but are all of those valid fast-forward points? Yes. > > > Also (and perhaps more dangerously), if there's loss and the decoder tries > > > to decode anyways, the fast-forward code might fail. It may check > > > "HaveFirstPacket() && HaveLastPacket()", which with this change will always > > > be true even if we're missing packets. So <SPS><PPS><loss> would be seen as > > > a valid fast-forward target. Or <loss><PPS><loss>. or <loss><iframe> where > > > the SPS/PPS were different than the last one (admittedly rare, and a known > > > failure mode of H.264 depacketization). Setting KeyFrame only if an > > > <iframe> is present removes all but the last (well-known, and almsot > > > unavoidable) failure mode. > > > > Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR > > (see earlier comment). And it still has the same loss issues. Only leading > > loss is an issue, regardless if we set KeyFrame only for IDR or also > > SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and > > middle loss will fail the InSequence check. > > I see now; you're only setting HaveLastPacket on marker bits, and then if > that's set and a later same-timestamp packet comes in, updating it to > include that. > > What about missing Marker bits? Per above, we shouldn't depend on the > Marker bit at all except to indicate we _can_ end frame accumulation at that > point and push it down for decode (i.e. optimization; we don't have to wait > for the next packet (which may be a while) to see if it has a different > timestamp.) Does HaveLastPacket on the last packet get set when the next > packet comes in and has a different timestamp? No. That would require snooping into another "session" from the current one. But it would completely avoid any dependence on reliable marker bits. Let me see how easy or intrusive this is. > This is edge-casey enough we could just comment in the code and maybe file a > bug on it, unless you believe missing (as opposed to early) Marker bits are > a real possibility. I've never seen missing markers. But as you pointed out, the spec allows it. > > Here is an idea to fix the leading loss case, similar to the VP8 code. > > Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs > > (1,5). Then the HaveFirstPacket checks would fail for leading loss. > > Does that fail for i-slices? Or FMO? (are those allowed in baseline? Of > course, that assumes it's always baseline) It works for everything except FMO/ASO, which is not allowed in CBP (or CHP/HP). > Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged > as fast-forward target, as it will fail the same way with incremental > decode-with-errors, and in theory the system should notice a decoder failure > and ask for an IDR - of course, it probably doesn't do that. And there's no > true way to know if <loss><iframe> is a valid decode point or not (ok, no > easy way without getting into the short of the h264 decoder and breaking > into the SPS/PPS and the iframe). (SPS/PPS aren't required before an > iframe, and in sprop-parameter-set usage are never required - but still > could be sent.) Agreed. Knowing for sure something is a valid and complete refresh point takes very deep inspection. We only use a simpler heuristic, which should always work unless the peer is bad (buggy or rogue). > > > A more complete fix would be to update the fast-forward code to query > > > codec/packetization-knowledgable session code instead of relying on a > > > precalculated bool. However, that's out-of-scope currently. > > > > > > A simple fix here would be to consider only iframes KeyFrames, and then use > > > the logic here. This presumes SPS/PPS would be sent (if they're sent) with > > > the same timestamp, which they should be. Please check if that would mess > > > anything else up. > > > > It would only mess up GDR/AIR. But it still won't fix leading loss cases. > > Would SEI or iframe == keyframe cover most or all of it? > (can we key on SEI decode points /GDR only? I'm fuzzy on what can be in an > SEI) Yes, most but not all, see first comment about including/excluding SPS/PPS. We already key on SEI only if it is a Recovery Point, not just any SEI, in ReceiveH264Codec. case RtpFormatH264::kSei: // check if it is a Recovery Point SEI (aka GDR) if (offset+1 >= payload_data_length) return -1; // malformed if (payload_data[offset+1] != RtpFormatH264::kSeiRecPt) break; > > > Please check the fast-forward code and see if it needs to be updated, or if > > > setting KeyFrame only on <iframe>s will work. > > > > Looks like it will work using the idea above (first_mb=0), similar to VP8. > > May be an option modulo the answers to my questions above. What do you think about this option after these answers?

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 43

•

11 years ago

(In reply to mzanaty from comment #42) > (In reply to Randell Jesup [:jesup] from comment #41) > > SPS and PPS don't directly indicate recovery can start there, though in > > practice they can if followed by iframe (or as you point out SEI). > > Agreed. No major objection to excluding SPS/PPS from KeyFrame marking, since > they should appear in the same timestamp as IDR or SEI which will then mark > as KeyFrame. Excluding them would cause things to fail if they use a > different timestamp than the IDR or SEI, but I've never seen that. The only > corner case I've actually seen where this would fail is SPS/PPS/P without > SEI. GDR/AIR often include SEI, but I've seen cases of AIR without SEI. > That's why I included SPS/PPS, but it is such a low runner scenario that we > can probably ignore. OTOH, I see no harm in including them. So, thinking some more: the only case where SPS/PPS causing it to be marked as KeyFrame will break are: <SPS><PPS><pframe> (no SEI - either AIR without SEI, or <SPS><PPS> that aren't part of a refresh). And to that point - the more important thing in fast-forward is to not incorrectly skip over an IDR, especially if the decoder can return an error to mean "I need an IDR". So, Now that I know more details about some of the SEI uses (and the HaveLastFrame() is understood to require a Marker bit somewhere in the frame/session/timestamp), I think we can just go with SPS/PPS/iframe/SEI-with-refresh as it's coded. > > What about missing Marker bits? Per above, we shouldn't depend on the > > Marker bit at all except to indicate we _can_ end frame accumulation at that > > point and push it down for decode (i.e. optimization; we don't have to wait > > for the next packet (which may be a while) to see if it has a different > > timestamp.) Does HaveLastPacket on the last packet get set when the next > > packet comes in and has a different timestamp? > > No. That would require snooping into another "session" from the current one. > But it would completely avoid any dependence on reliable marker bits. Let me > see how easy or intrusive this is. Sure, but follow-on bug. > > This is edge-casey enough we could just comment in the code and maybe file a > > bug on it, unless you believe missing (as opposed to early) Marker bits are > > a real possibility. > > I've never seen missing markers. But as you pointed out, the spec allows it. Right > > > Here is an idea to fix the leading loss case, similar to the VP8 code. > > > Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs > > > (1,5). Then the HaveFirstPacket checks would fail for leading loss. > > > > Does that fail for i-slices? Or FMO? (are those allowed in baseline? Of > > course, that assumes it's always baseline) > > It works for everything except FMO/ASO, which is not allowed in CBP (or > CHP/HP). Can we add that to the first-packet setting? So it would be Set/Update first-packet on: in-order-but-new-timestamp || non-VCL-NAL || VCL(mb==0) (and even the first there is likely optional). And comments! > > Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged > > as fast-forward target, as it will fail the same way with incremental > > decode-with-errors, and in theory the system should notice a decoder failure > > and ask for an IDR - of course, it probably doesn't do that. And there's no > > true way to know if <loss><iframe> is a valid decode point or not (ok, no > > easy way without getting into the short of the h264 decoder and breaking > > into the SPS/PPS and the iframe). (SPS/PPS aren't required before an > > iframe, and in sprop-parameter-set usage are never required - but still > > could be sent.) > > Agreed. Knowing for sure something is a valid and complete refresh point > takes very deep inspection. We only use a simpler heuristic, which should > always work unless the peer is bad (buggy or rogue). Right. And if we're overly permissive in deciding is_refresh, that's better.

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 44

•

11 years ago

Attached patch nit-fixes and comment updates per review (interdiff) — Details — Splinter Review

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 45

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/123ddef31d44

status-firefox32: --- → unaffected

status-firefox33: --- → affected

status-firefox34: --- → affected

status-firefox35: --- → affected

Target Milestone: --- → mozilla35

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 46

•

11 years ago

Comment on attachment 8488419 [details] [diff] [review] Real Fix isFirstPacket for loss/reorder/nack cases After extensive discussion and more analysis, r+ with the previous nits

Attachment #8488419 - Flags: review- → review+

Phil Ringnalda (:philor)

Comment 47

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/123ddef31d44

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

status-firefox35: affected → fixed

mzanaty

Updated

•

11 years ago

Attachment #8488055 - Flags: feedback?(mzanaty)

Syd Polk :sydpolk

Comment 48

•

11 years ago

I ran it again with the 9/14 Nightly debug build. Worked great. Logs are here: https://drive.google.com/a/mozilla.com/file/d/0B1BaMtdCeAxdNWdORGFqcllEdGM/edit?usp=sharing

Randell Jesup [:jesup] (needinfo me)

Assignee

Comment 49

•

11 years ago

Comment on attachment 8488419 [details] [diff] [review] Real Fix isFirstPacket for loss/reorder/nack cases Approval Request Comment [Feature/regressing bug #]: bug 1050461 (mode 0 support) [User impact if declined]: Failure under packet loss; the only other option would be to back out a fairly large patch (and turn off mode 0 support, which external equipment we expect to work with uses). [Describe test coverage new/current, TBPL]: On central/nightlies; verified by one of the two people who could reliably reproduce this (Syd). [Risks and why]: Hard to make it work any worse; low risk - basically just adjusts the setting of the "first packet of frame/last packet of frame" bits. [String/UUID change made/needed]: none

Attachment #8488419 - Flags: approval-mozilla-beta?

Attachment #8488419 - Flags: approval-mozilla-aurora?

Lawrence Mandel [:lmandel] (use needinfo)

Updated

•

11 years ago

tracking-firefox33: --- → +

tracking-firefox34: --- → +

Lawrence Mandel [:lmandel] (use needinfo)

Comment 50

•

11 years ago

Comment on attachment 8488419 [details] [diff] [review] Real Fix isFirstPacket for loss/reorder/nack cases Beta+ Aurora+

Attachment #8488419 - Flags: approval-mozilla-beta?

Attachment #8488419 - Flags: approval-mozilla-beta+

Attachment #8488419 - Flags: approval-mozilla-aurora?

Attachment #8488419 - Flags: approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Comment 51

•

11 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/a67e850f9376 https://hg.mozilla.org/releases/mozilla-beta/rev/75eddbd6dc80

status-firefox33: affected → fixed

status-firefox34: affected → fixed

Randell Jesup [:jesup] (needinfo me)

Assignee

Updated

•

11 years ago

Whiteboard: [openh264-uplift]

about:webrtc and debug log files 11 years ago Nils Ohlmeier [:drno] 687.68 KB, application/x-bzip2		Details
Added debugs for rtp packet reception error handling 11 years ago Randell Jesup [:jesup] (needinfo me) 7.36 KB, patch	pkerr : review+	Details \| Diff \| Splinter Review
reinstate timestamp-modification kludges (improved) for OpenH264 rtp reception 11 years ago Randell Jesup [:jesup] (needinfo me) 5.58 KB, patch		Details \| Diff \| Splinter Review
Fix isFirstPacket for loss/reorder/nack cases 11 years ago mzanaty 4.56 KB, patch		Details \| Diff \| Splinter Review
Fix isFirstPacket for loss/reorder/nack cases 11 years ago mzanaty 4.54 KB, patch		Details \| Diff \| Splinter Review
Real Fix isFirstPacket for loss/reorder/nack cases 11 years ago mzanaty 6.18 KB, patch	jesup : review+ lmandel : approval-mozilla-aurora+ lmandel : approval-mozilla-beta+	Details \| Diff \| Splinter Review
nit-fixes and comment updates per review (interdiff) 11 years ago Randell Jesup [:jesup] (needinfo me) 8.91 KB, patch		Details \| Diff \| Splinter Review