Closed Bug 1054624 Opened 6 years ago Closed 6 years ago

OpenH264 calls between two machines shows streaming issues

Categories

(Core :: WebRTC: Audio/Video, defect)

34 Branch
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla35
Tracking Status
firefox32 --- unaffected
firefox33 + fixed
firefox34 + fixed
firefox35 --- fixed

People

(Reporter: drno, Assigned: jesup)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 2 obsolete files)

The video in calls between two hosts in the Mozilla MV office if OpenH264 is used hangs a lot and only updates the picture every couple of seconds, or works for a seconds but hangs then again.
A screen recording of such a call: https://dl.dropboxusercontent.com/u/17831554/h264/H264video.mov
about:webrtc copies from both sides of the call:
https://dl.dropboxusercontent.com/u/17831554/h264/offer.txt
https://dl.dropboxusercontent.com/u/17831554/h264/answer.txt

For this call between Win and Mac both hosts were on the Mozila WiFi (not Mozilla Guest). Both machines were running the public build Nightly 34.0a1 (2014-08-15). Ping time between was 3ms (with few exceptions). According to the RTCP reports in the about:webrtc dumps there was some packet loss.

This happened for me for calls between my Win <-> Mac machines and Win <-> Linux. For some unknown reason I was not able to reproduce the problem for Linux <-> Mac.

Direction of the call does not seem to matter.
I had very few calls which were running fine for a few seconds, before the video would freeze on both ends and then only update the picture only every couple of seconds and never recover from this bad video streaming state.

Calls on just one browser on all OS's work fine on webrtc-landing/pc-test.html. A call between two browsers on one Mac works fine as well.
A call between two Mac's in my home WiFi shows the same problem, but not as bad as in the office. The video rendering freezes from time to time on end. If it freezes it is usually frozen for several seconds, but it always recovered. In my home network most of the time only one remote stream at a time would freeze, where in the office network usually both streams froze at the same time.

According to the RTCP reports there was packet loss as well, but less then in the office network. about:webrtc dumps:
https://dl.dropboxusercontent.com/u/17831554/h264/offer2.txt
https://dl.dropboxusercontent.com/u/17831554/h264/answer2.txt
I think the common nominator here seems to be packet loss.

Maire can you find the best person to answer: How does our implementation handle delays and/or queuing of packets over the IPC to and from the Plugin?
Blocks: OpenH264
Severity: normal → critical
Flags: needinfo?(mreavy)
Thanks for the report, Nils.  Can you verify that Aurora has the same problem?  We did check basic packet loss recovery a couple of weeks ago (artificially injecting packet loss), and it seemed to work fine.

webrtc.org logs would help a lot.  During a call that's showing this problem, can you open about:webrtc logs and click "Start debug mode" from both sides of the call and upload the logs here?

If this is a bug in packet loss recovery, Randell's the best person to start looking at this.  I've given him a heads up on this bug and assigned it to him for now.  He said he should be able to take a look at it later today.  The webrtc.org logs should help get us the visibility we need into what's going on.
Assignee: nobody → rjesup
Flags: needinfo?(mreavy)
What is the build source(s)?

From the SDP, I see this:

a=fmtp:126
PROFILE=0;LEVEL=0;profile-level-id=42e00d;packetization-mode=1;level-asymmetry-allowed=1;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0

Nightly (or any other build) shouldn't show that; when I do a pc_test call forcing H.264 I see:
a=fmtp:126 profile-level-id=42e00d;packetization-mode=1;max-fs=600

There was a time when we'd get the PROFILE=0;LEVEL=0 stuff, but IIRC it was early in the h.264 work.  ALso, note it's getting audio fmtp values on a video codec line.
Tried locally with induced packet loss and with packet loss plus disabling NACK - fine (minor stutters when recovering without NACK).  (pc_test.html, modded to strip NACK negotiation if wanted)

Tried Linux->Mac nightly (fine) with induced packet loss on Talky
Tried Linus->Windows Debug (inbound) with induced packet loss on Talky - fine.  a short freeze or two that recovered, but windows debug builds have trouble keeping up at all.
(In reply to Randell Jesup [:jesup] from comment #4)
> What is the build source(s)?

Build source is our public builds. I don't know how to extract more precise build related information out of these then the version number from "About Firefox"

> From the SDP, I see this:
> 
> a=fmtp:126
> PROFILE=0;LEVEL=0;profile-level-id=42e00d;packetization-mode=1;level-
> asymmetry-allowed=1;parameter-add=1;usedtx=0;stereo=0;useinbandfec=0;cbr=0
> 
> Nightly (or any other build) shouldn't show that; when I do a pc_test call
> forcing H.264 I see:
> a=fmtp:126 profile-level-id=42e00d;packetization-mode=1;max-fs=600
> 
> There was a time when we'd get the PROFILE=0;LEVEL=0 stuff, but IIRC it was
> early in the h.264 work. 

Right now I don't know what is actually being used because of bug 1055080.
All I can say is that the profile 0 thing is not showing in the SDP which crosses the wire. But until we know which part of the system is lying to us here...

> ALso, note it's getting audio fmtp values on a
> video codec line.

I don't get what you mean. I see fmtp 101, 126 and 97 in the offers and answers all properly in the section where I would expect them.
H264 video call where video gets super choppy. About:webrtc and debug logs are in the tarball.
Again public Nightly's from today 34.0a1 (2014-08-18) which according to about:buildconfig translates to https://hg.mozilla.org/mozilla-central/rev/0aaa2d3d15cc
I see the same anomalous stuff as noted in c4 in working loopback scenarios. So, I suspect that this is just a separate issue about about:webrtc
FWIW, here is what I see being configured in this setting:

(gdb) p *$1
$2 = {
  codecType = webrtc::kVideoCodecH264, 
  plName = "H264", '\0' <repeats 27 times>, 
  plType = 126 '~', 
  width = 320, 
  height = 240, 
  startBitrate = 300, 
  maxBitrate = 2000, 
  minBitrate = 200, 
  maxFramerate = 30 '\036', 
  codecSpecific = {
    VP8 = {
      pictureLossIndicationOn = 66, 
      feedbackModeOn = 224, 
      complexity = webrtc::kComplexityNormal, 
      resilience = webrtc::kResilienceOff, 
      numberOfTemporalLayers = 0 '\0', 
      denoisingOn = false, 
      errorConcealmentOn = false, 
      automaticResizeOn = false, 
      frameDroppingOn = false, 
      keyFrameInterval = 0
    }, 
    H264 = {
      profile = 66 'B', 
      constraints = 224 '?', 
      level = 13 '\r', 
      packetizationMode = 1 '\001', 
      frameDroppingOn = false, 
      keyFrameInterval = 0, 
      spsData = 0x0, 
      spsLen = 0, 
      ppsData = 0x0, 
      ppsLen = 0
    }
  },
(In reply to Nils Ohlmeier [:drno] from comment #6)
> > ALso, note it's getting audio fmtp values on a
> > video codec line.
> 
> I don't get what you mean. I see fmtp 101, 126 and 97 in the offers and
> answers all properly in the section where I would expect them.

For the other bug, but: usedtx=0;stereo=0 and others are audio fmtp values.
So here is a Pcap and a webrtc debug log from a call again with todays Nightly. This time between Win and Linux. Both machines are on cable.
As the file is too big here is a link: http://people.mozilla.org/~nohlmeier/bz1054624/h264_dump_log.tar.bz2
Whiteboard: [openh264-uplift]
So here is a PCAP and WebRTC.log (sorry don't know which of the two browsers succeeded in creating this file) from a working call between Aurora and Nightly on my Mac.
http://people.mozilla.org/~nohlmeier/bz1054624/loopback.tar.bz2
This tarball contains PCAP, WebRTC.log and about:webrtc from a call between two Nightly's from today. Both machines on MV office WiFi. Picture froze pretty quickly.
http://people.mozilla.org/~nohlmeier/bz1054624/full-logs-mac-win.tar.bz2
FYI the code for my test call can be found here: https://github.com/nils-ohlmeier/mozwebrtctools/tree/master/manual_call
The pages depend on a simplesignlaing server (https://github.com/luser/simplesignalling) running somewhere (and the hard coded address in both pages to the manually updated to that server address).

I just double checked that ICE connection gets established reasonably fast, but H264 rendering sometimes takes several seconds after ICE connection to start rendering the remote video.
I replaced modifying the SDP offer in offer.html with setting the user pref media.navigator.video.preferred_codec to 126.
Verification shows that the SDP looks pretty much the same as before (the m=video line shows only 126, but the attribute lines for all video codecs are still there). Advantage is that you can use apprtc.appspot.com or any other WebRTC video calling service.
When I tried it yesterday with my own signaling server and with apprtc it still showed the same result.
Sorry forgot to mention in my last comment that my test was again between my MacBook Pro and my Windows ThinkPad laptop in the office network.
Starring at the stats from comment #12 at little bit more I think we can conclude two things:

1) We are not really looking at packet loss here. Tcpdump's and RTC stats agree that this call had at maximum 1.3% packet loss. I assume that should not result is such bad video quality.

2) What I find interesting though is that both sides of the call say in the about:webrtc stats the local encoder dropped 10% on one side and 14% on the other side. I don't know how these numbers get calculated, but could this indicate some form of over load?
(In reply to Nils Ohlmeier [:drno] from comment #17)
> Starring at the stats from comment #12 at little bit more I think we can
> conclude two things:
> 
> 1) We are not really looking at packet loss here. Tcpdump's and RTC stats
> agree that this call had at maximum 1.3% packet loss. I assume that should
> not result is such bad video quality.

It shouldn't on it's own.

> 
> 2) What I find interesting though is that both sides of the call say in the
> about:webrtc stats the local encoder dropped 10% on one side and 14% on the
> other side. I don't know how these numbers get calculated, but could this
> indicate some form of over load?

That can depend on the cause.  It may drop packets after an IDR (for error recovery) to keep the bitrate stable, for example, or if the previous frame took too long to encode, etc.  If the camera is running 30fps, that would drop it to ~27fps-ish (average).  It also might be a burst when it's having some problem.  Can't tell from a single number
Syd, could you please try to reproduce this?
Basically you need two make a WebRTC call between two machines. For me the problem seems to be worse if one of the machines is Windows.
Start with fresh profiles on both browsers (Aurora or Nightly). Verify that you see the Cisco H264 in your about:plugins (should show up ~1min after starting with a fresh profile).
Then go to about:config add a new integer preference by right clicking into the list of prefs. Name: media.navigator.video.preferred_codec Value: 126
Then go to https://apprtc.appspot.com in one browser and visit the new URL with the /?r=XXXX with the browser on the other machine.
Now the question is: how does the video look like if you run it for 1 minute. I get long lasting video freezes all the time.
Flags: needinfo?(spolk)
If I set media.navigator.video.default_height to 200 and media.navigator.video.default_width to 320 the problem seems to appear less frequent.
A call between two Macs worked fine for a couple of minutes. So maybe we are this problem only shows across different platforms?!
Disregard c#21 the last test was flawed.
@Syd: to verify that H264 is being used got to the about:webrtc page and search for the "m=video" line in the SDP's. We want to see something with 126 in there, and NOT 120. The lines below the "m=video" line should also say something with the string "H264" in it and NOT "VP8".

Maire also indicated that H264 on Nightly might be broken with existing user profiles which have downloaded the plugin before already. So it might be worth trying it with fresh user profiles.
I followed the steps Nils outlined. Was not able to get the H264 preference to stick until I created new profiles on both machines. I initiated a session via apprtc.appspot.com on my Windows VM, and connected to my other Mac. Verified that H264 was indeed in use. The video ran for about 3 minutes with only slight hicoughs. Then about 3 minutes in, the Windows machine displayed "Channel error", and the video sent by the Mac froze on the Windows screen. Then the video sent by Windows froze on the Mac screen. After about 10 seconds of that, both video streams started working. A few seconds later, the Windows machine displayed "Channel Error" again, and the Mac side lost connection ("Waiting for some to join..."). The Windows side is showing the last frame it got from the Mac. After 30 or so more seconds, the Windows side displayed "Channel error" again. Now things are not working at all.

The Windows VM is on a NAT inside my home network; the Mac is not. My home network itself is behind a NAT. I am not running a TURN server. Don't know if this is important or not, but I thought I would mention it.
I tried with a new profile because trying with an older profile did not set the codec to 126. Verified that the codec was being used correctly by looking at about:webrtc like Nils suggested.
Flags: needinfo?(spolk)
Please retry first with VP8 until you have multiple good calls, then switch.  The Double-NAT may be part of the problem (and if the VM is slow, it could cause other issues).  "Channel Error" implies to me a networking/TURN/ICE issue with apprtc.

You can also try talky.io.  I find that simpler in many ways (less cut-and-paste URLs).
http://talky.io/choose_any_name_for_room - go there from both sides
Flags: needinfo?(spolk)
OK, I ran with NSPR logging, WebRTC debugging and Wireshark on both machines last night.

The feed from the Windows machine to the Mac was mostly ok; it went for 3 or 4 minutes and then froze. And then, about 30 seconds later, started again.

The feed from the Mac to the Windows machine was bad. I would see a frame update every 30 seconds. This went on throughout the entire several minute run.

The logs that were generated were big, I uploaded them to my Google Drive:

https://drive.google.com/a/mozilla.com/#folders/0B1BaMtdCeAxdUXVfT2E3dl9BRXM
Flags: needinfo?(spolk)
Exact steps I am using:

Setup:

Mac - Mac OS X 10.9 native OS running on my home network, behind the NAT provided by AT&T UVerse. Wifi IP address: 192.168.1.94
Windows 7 VM running on Mac OS X 10.9 running on home network. VM network Bridged with IP address of 192.168.1.104; host is running both Ethernet (primary) 192.168.1.80 and Wi-Fi 192.168.1.68.

No NAT between Windows VM and Mac.

On, the Mac, 

export NSPR_LOG_MODULES=signaling:5,mtransport:5,timestamp:1,gmp:5,webrtc_trace:65535
export NSPR_LOG_FILE=/Users/spolk/Desktop/nspr.log
export WEBRTC_TRACE_FILE=/Users/spolk/Desktop/WebRTC.log
./firefox -p

On Windows:

cmd /c "set NSPR_LOG_MODULES=signaling:5,mtransport:5,timestamp:1,gmp:5,webrtc_trace:65535 && set WEBRTC_TRACE_FILE=c:\logs\WebRTC.log && set NSPR_LOG_FILE=c:\logs\nspr.log && firefox -p

On the Mac, I navigate to http://talky.io/syd_polk. I choose NOT to share the microphone, but I shared the builtin camera.

On Windows, I navigate to the same site. I choose NOT to share the microphone, but I do share my Logitech Webcam which is attached to the Windows VM.
seems to work locally with both mode 1 and mode 0, though I haven't challenged it with loss yet.  Based on an improved version of the timestamp-modification kludge.  Might still be mode 0 issues to deal with.
Attachment #8488053 - Flags: review?(pkerr)
Comment on attachment 8488055 [details] [diff] [review]
reinstate timestamp-modification kludges (improved) for OpenH264 rtp reception

f? to Mo - this would be an alternative to options that try to set (or reset after a loss recovery) isFirstPacket in the current code reliably.
Attachment #8488055 - Flags: feedback?(mzanaty)
Attachment #8488053 - Flags: review?(pkerr) → review+
I prefer the solution in the attached patch. It should work more deterministically in more cases.
Oops, wrong patch previously attached. Correct one here.
Attachment #8488214 - Attachment is obsolete: true
Comment on attachment 8488223 [details] [diff] [review]
Fix isFirstPacket for loss/reorder/nack cases

Any chance something is missing from this Fix isFirstPacket patch?  I get this crash on both Windows and OSX when applied to M-C and running either the landing page test or talky.io.


Hit MOZ_CRASH() at /Users/ehugg/mozilla/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30
Process 77516 stopped
* thread #25: tid = 0x2392f1, 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30, name = 'GMPThread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30
   27  	#else
   28  	    __android_log_print(ANDROID_LOG_ERROR, "Gecko", "mozalloc_abort: %s", msg);
   29  	#endif
-> 30  	    MOZ_CRASH();
   31  	}
   32  	
   33  	#if defined(XP_UNIX)
(lldb) bt
* thread #25: tid = 0x2392f1, 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30, name = 'GMPThread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001000da31d libmozalloc.dylib`mozalloc_abort(msg=0x0000000117ab53a8) + 93 at mozalloc_abort.cpp:30
    frame #1: 0x00000001015e6c25 XUL`Abort(aMsg=0x0000000117ab53a8) + 21 at nsDebugImpl.cpp:473
    frame #2: 0x00000001015e6654 XUL`NS_DebugBreak(aSeverity=3, aStr=0x0000000107d8f86d, aExpr=0x0000000107d8f87a, aFile=0x0000000107d8f883, aLine=317) + 1204 at nsDebugImpl.cpp:430
    frame #3: 0x0000000101d29daa XUL`mozilla::ipc::Shmem::AssertInvariants(this=0x000000014645c698) const + 74 at Shmem.cpp:317
    frame #4: 0x00000001029ef899 XUL`unsigned char* mozilla::ipc::Shmem::get<unsigned char>(this=0x000000014645c698) const + 25 at Shmem.h:158
    frame #5: 0x000000010449a97c XUL`mozilla::gmp::GMPVideoEncodedFrameImpl::Buffer(this=0x000000014645c660) + 28 at GMPVideoEncodedFrameImpl.cpp:302
    frame #6: 0x000000010257f78a XUL`mozilla::WebrtcGmpVideoDecoder::Decode_g(this=0x000000012fac4920, aInputImage=0x000000011bf49928, aMissingFrames=false, aFragmentation=0x000000012d9059f0, aCodecSpecificInfo=0x000000012d9059c0, aRenderTimeMs=298987112) + 426 at WebrtcGmpVideoCodec.cpp:647
    frame #7: 0x0000000102581cbd XUL`mozilla::runnable_args_m_5_ret<mozilla::WebrtcGmpVideoDecoder*, int (this=0x000000011bf498f0)(webrtc::EncodedImage const&, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long), webrtc::EncodedImage, bool, webrtc::RTPFragmentationHeader const*, webrtc::CodecSpecificInfo const*, long long, int>::Run() + 189 at runnable_utils_generated.h:481
    frame #8: 0x0000000102583b16 XUL`mozilla::SyncRunnable::Run(this=0x000000014645bcc0) + 54 at SyncRunnable.h:75
    frame #9: 0x00000001016cafd8 XUL`nsThread::ProcessNextEvent(this=0x0000000117ea5030, aMayWait=true, aResult=0x0000000117ab5c4e) + 2088 at nsThread.cpp:823
    frame #10: 0x0000000101722a57 XUL`NS_ProcessNextEvent(aThread=0x0000000117ea5030, aMayWait=true) + 151 at nsThreadUtils.cpp:265
    frame #11: 0x0000000101d27d4d XUL`mozilla::ipc::MessagePumpForNonMainThreads::Run(this=0x0000000117eff080, aDelegate=0x00000001003b3ae0) + 925 at MessagePump.cpp:355
    frame #12: 0x0000000101cbdf46 XUL`MessageLoop::RunInternal(this=0x00000001003b3ae0) + 118 at message_loop.cc:229
    frame #13: 0x0000000101cbde55 XUL`MessageLoop::RunHandler(this=0x00000001003b3ae0) + 21 at message_loop.cc:222
    frame #14: 0x0000000101cbddfd XUL`MessageLoop::Run(this=0x00000001003b3ae0) + 45 at message_loop.cc:196
    frame #15: 0x00000001016c9456 XUL`nsThread::ThreadFunc(aArg=0x0000000117ea5030) + 358 at nsThread.cpp:350
    frame #16: 0x000000010136356f libnss3.dylib`_pt_root(arg=0x0000000117e41370) + 463 at ptthread.c:212
    frame #17: 0x00007fff98cf0899 libsystem_pthread.dylib`_pthread_body + 138
    frame #18: 0x00007fff98cf072a libsystem_pthread.dylib`_pthread_start + 137
Sorry, real patch with different file name (v2) attached properly.
Attachment #8488223 - Attachment is obsolete: true
The patch is easier to review without whitespace, due to a large indent block. Here it is without whitespace for easier review.

--- a/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc
+++ b/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc
 
 int32_t RTPReceiverVideo::ReceiveH264Codec(WebRtcRTPHeader* rtp_header,
                                           const uint8_t* payload_data,
                                           uint16_t payload_data_length) {
   size_t offset = RtpFormatH264::kNalHeaderOffset;
   uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask;
   rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type;
+  rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame
   // get original NAL type if FU-A or STAP-A
   switch (nal_type) {
     case RtpFormatH264::kFuA:
       offset = RtpFormatH264::kFuAHeaderOffset;
       if (offset >= payload_data_length) return -1; // malformed
       nal_type = payload_data[offset] & RtpFormatH264::kTypeMask;
+      if (!(payload_data[offset] & RtpFormatH264::kFragStartBit)) {
+        rtp_header->type.Video.isFirstPacket = false;
+      }
       break;

--- a/media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc
+++ b/media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc

 int VCMSessionInfo::InsertPacket(const V...
 
   // Check for duplicate packets.
   if (rit != packets_.rend() &&
       (*rit).seqNum == packet.seqNum && (*rit).sizeBytes > 0)
     return -2;
 
   PacketIterator packet_list_it;
 
+  if (packet.codec == kVideoCodecH264) {
+    // H.264 can have leading or trailing non-VCL NALUs,
+    // and the RTP marker bit is not reliable for the last packet of a frame.
+    // So allow out-of-order packets to update the first and last seq# range.
+    // Also mark as a key frame if any packet is of that type.
+    if (frame_type_ != kVideoFrameKey) {
+      frame_type_ = packet.frameType;
+    }
+    if ((!HaveFirstPacket() ||
+        IsNewerSequenceNumber(first_packet_seq_num_, packet.seqNum)) &&
+        packet.isFirstPacket) {
+      first_packet_seq_num_ = packet.seqNum;
+    }
+    if ((!HaveLastPacket() && packet.markerBit) ||
+        (HaveLastPacket() &&
+        IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) {
+      last_packet_seq_num_ = packet.seqNum;
+    }
+  } else {
 ...existing VP8 handling...
+  }
Try for builds with Mo's patch above:

https://tbpl.mozilla.org/?tree=Try&rev=60739e2f9906
Comment on attachment 8488419 [details] [diff] [review]
Real Fix isFirstPacket for loss/reorder/nack cases

Review of attachment 8488419 [details] [diff] [review]:
-----------------------------------------------------------------

r-, but if the investigation shows it's safe to only set KeyFrame when the 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then this will be a trivial review of just the mods to do that.

Note: please file a followup to update the in-tree webrtc unittests to include H.264 packetization tests.  This will be required to upstream these fixes.  (Modulo they may be independently implementing this; upstreaming the older patch stalled and didn't have tests anyways.)

::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc
@@ +234,5 @@
>                                            uint16_t payload_data_length) {
>    size_t offset = RtpFormatH264::kNalHeaderOffset;
>    uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask;
>    rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type;
> +  rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame

I'd expand the comment, move to above the line:
// For H.264, isFirstPacket means first in NAL unit, not first in the timestamp, 
// which elsewhere is referred to as a 'frame' or 'session'

Then also update the comment in module_comment_types.h with "or NAL for H.264"

::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc
@@ +486,5 @@
>  
>    PacketIterator packet_list_it;
>  
> +  if (packet.codec == kVideoCodecH264) {
> +    // H.264 can have leading or trailing non-VCL NALUs,

Some people reading this will be network people, not codec people, and the meaning of VCL may not be obvious.  Expand or define

@@ +487,5 @@
>    PacketIterator packet_list_it;
>  
> +  if (packet.codec == kVideoCodecH264) {
> +    // H.264 can have leading or trailing non-VCL NALUs,
> +    // and the RTP marker bit is not reliable for the last packet of a frame.

A reference to this would be good.  (RFC 6184 5.1 - Decoders [] MUST NOT rely on this property).

@@ +491,5 @@
> +    // and the RTP marker bit is not reliable for the last packet of a frame.
> +    // So allow out-of-order packets to update the first and last seq# range.
> +    // Also mark as a key frame if any packet is of that type.
> +    if (frame_type_ != kVideoFrameKey) {
> +      frame_type_ = packet.frameType;

This is a little iffy though likely works in most cases if there's no loss.

The jitter buffer in recovery mode will "fast-forward" to a KeyFrame.  In normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work; however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to fast forward to any "session" with a KeyFrame.  

This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice> (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible), <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will cause the jitterbuffer to assume it can fast-forward to the timestamp/"session" and start decoding there.  I'm not sure I want to think about what a STAP-A with a leading p-frame and later IDR all with the same timestamp would cause, assuming this is (semi)-legal.

Also (and perhaps more dangerously), if there's loss and the decoder tries to decode anyways, the fast-forward code might fail.  It may check "HaveFirstPacket() && HaveLastPacket()", which with this change will always be true even if we're missing packets.  So <SPS><PPS><loss> would be seen as a valid fast-forward target.  Or <loss><PPS><loss>.  or <loss><iframe> where the SPS/PPS were different than the last one (admittedly rare, and a known failure mode of H.264 depacketization).  Setting KeyFrame only if an <iframe> is present removes all but the last (well-known, and almsot unavoidable) failure mode.

A more complete fix would be to update the fast-forward code to query codec/packetization-knowledgable session code instead of relying on a precalculated bool.  However, that's out-of-scope currently.  

A simple fix here would be to consider only iframes KeyFrames, and then use the logic here.  This presumes SPS/PPS would be sent (if they're sent) with the same timestamp, which they should be.  Please check if that would mess anything else up.

Please check the fast-forward code and see if it needs to be updated, or if setting KeyFrame only on <iframe>s will work.

@@ +502,5 @@
> +    if ((!HaveLastPacket() && packet.markerBit) ||
> +        (HaveLastPacket() &&
> +        IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) {
> +      last_packet_seq_num_ = packet.seqNum;
> +    }

In theory there are packet sequences that would confuse this, such as: 
TimeStamp: N N N N+1 N+1 N N+1 N+2 
However that wouldn't apply to Mode 0 or Mode 1.
Add a comment at the front of the "if ((!HaveFirstPacket() ||" block stating this assumes the original packet order by sequence number has non-decreasing timestamps, which is valid for Mode 0 or Mode 1

@@ +523,5 @@
> +      // Update the frame type with the type of the first media packet.
> +      // TODO(mikhal): Can this trigger?
> +      frame_type_ = packet.frameType;
> +    }
> +    

remove trailing whitespace
Attachment #8488419 - Flags: review-
(In reply to Randell Jesup [:jesup] from comment #39)
> Comment on attachment 8488419 [details] [diff] [review]
> Real Fix isFirstPacket for loss/reorder/nack cases
> 
> Review of attachment 8488419 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r-, but if the investigation shows it's safe to only set KeyFrame when the
> 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then
> this will be a trivial review of just the mods to do that.

That would be safe for FF-to-FF but not for general H.264 peers which may use GDR (Gradual Decoder Refresh) or AIR (Adaptive Intra Refresh) techniques instead of IDR for recovery. GDR often sends a Recovery Point SEI followed by several (e.g. 5) P frames with some percentage (e.g. 20%) of intra-macroblocks in each P frame, after which the entire frame has been intra-refreshed. AIR is often even more gradual (e.g. 100 frames with 1% IMB each). Both GDR and AIR typically send SPS/PPS/SEI before the first P frame of the refresh cycle, but no IDR or I frames.
 
> Note: please file a followup to update the in-tree webrtc unittests to
> include H.264 packetization tests.  This will be required to upstream these
> fixes.  (Modulo they may be independently implementing this; upstreaming the
> older patch stalled and didn't have tests anyways.)

Filed 1066616 for this. Let me know if it needs more details about the tests.
 
> ::: media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_receiver_video.cc
> @@ +234,5 @@
> >                                            uint16_t payload_data_length) {
> >    size_t offset = RtpFormatH264::kNalHeaderOffset;
> >    uint8_t nal_type = payload_data[offset] & RtpFormatH264::kTypeMask;
> >    rtp_header->type.Video.codecHeader.H264.nalu_header = nal_type;
> > +  rtp_header->type.Video.isFirstPacket = true; // start of NALU not frame
> 
> I'd expand the comment, move to above the line:
> // For H.264, isFirstPacket means first in NAL unit, not first in the
> timestamp, 
> // which elsewhere is referred to as a 'frame' or 'session'
> 
> Then also update the comment in module_comment_types.h with "or NAL for
> H.264"

Agreed, will do.

> :::
> media/webrtc/trunk/webrtc/modules/video_coding/main/source/session_info.cc
> @@ +486,5 @@
> >  
> >    PacketIterator packet_list_it;
> >  
> > +  if (packet.codec == kVideoCodecH264) {
> > +    // H.264 can have leading or trailing non-VCL NALUs,
> 
> Some people reading this will be network people, not codec people, and the
> meaning of VCL may not be obvious.  Expand or define

Agreed, will do.

> @@ +487,5 @@
> >    PacketIterator packet_list_it;
> >  
> > +  if (packet.codec == kVideoCodecH264) {
> > +    // H.264 can have leading or trailing non-VCL NALUs,
> > +    // and the RTP marker bit is not reliable for the last packet of a frame.
> 
> A reference to this would be good.  (RFC 6184 5.1 - Decoders [] MUST NOT
> rely on this property).

Agreed, will do.

> @@ +491,5 @@
> > +    // and the RTP marker bit is not reliable for the last packet of a frame.
> > +    // So allow out-of-order packets to update the first and last seq# range.
> > +    // Also mark as a key frame if any packet is of that type.
> > +    if (frame_type_ != kVideoFrameKey) {
> > +      frame_type_ = packet.frameType;
> 
> This is a little iffy though likely works in most cases if there's no loss.
> 
> The jitter buffer in recovery mode will "fast-forward" to a KeyFrame.  In
> normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work;
> however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to
> fast forward to any "session" with a KeyFrame.  

The only non-VCL NALUs that get marked as KeyFrame are SPS, PPS and Recovery Point SEI. The only VCL NALU that gets marked as KeyFrame is IDR. These all explicitly signal that recovery can start here, so it makes sense to "fast forward" here if necessary for recovery. All other non-VCL and VCL NALUs don't get marked as KeyFrame.

> This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice>
> (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible),
> <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will
> cause the jitterbuffer to assume it can fast-forward to the
> timestamp/"session" and start decoding there.  I'm not sure I want to think
> about what a STAP-A with a leading p-frame and later IDR all with the same
> timestamp would cause, assuming this is (semi)-legal.

This is intentional. While odd, or just uncommon, some H.264 gear does this.

> Also (and perhaps more dangerously), if there's loss and the decoder tries
> to decode anyways, the fast-forward code might fail.  It may check
> "HaveFirstPacket() && HaveLastPacket()", which with this change will always
> be true even if we're missing packets.  So <SPS><PPS><loss> would be seen as
> a valid fast-forward target.  Or <loss><PPS><loss>.  or <loss><iframe> where
> the SPS/PPS were different than the last one (admittedly rare, and a known
> failure mode of H.264 depacketization).  Setting KeyFrame only if an
> <iframe> is present removes all but the last (well-known, and almsot
> unavoidable) failure mode.

Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR (see earlier comment). And it still has the same loss issues. Only leading loss is an issue, regardless if we set KeyFrame only for IDR or also SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and middle loss will fail the InSequence check.

Here is an idea to fix the leading loss case, similar to the VP8 code. Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs (1,5). Then the HaveFirstPacket checks would fail for leading loss.

> A more complete fix would be to update the fast-forward code to query
> codec/packetization-knowledgable session code instead of relying on a
> precalculated bool.  However, that's out-of-scope currently.  
> 
> A simple fix here would be to consider only iframes KeyFrames, and then use
> the logic here.  This presumes SPS/PPS would be sent (if they're sent) with
> the same timestamp, which they should be.  Please check if that would mess
> anything else up.

It would only mess up GDR/AIR. But it still won't fix leading loss cases.

> Please check the fast-forward code and see if it needs to be updated, or if
> setting KeyFrame only on <iframe>s will work.

Looks like it will work using the idea above (first_mb=0), similar to VP8.

> @@ +502,5 @@
> > +    if ((!HaveLastPacket() && packet.markerBit) ||
> > +        (HaveLastPacket() &&
> > +        IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) {
> > +      last_packet_seq_num_ = packet.seqNum;
> > +    }
> 
> In theory there are packet sequences that would confuse this, such as: 
> TimeStamp: N N N N+1 N+1 N N+1 N+2 

Do you mean this is the network sequence or actual RTP seq# sequence? InSequence will fail if there are seq# gaps within a timestamp. The JB handles separate timestamps separately, so the network sequence shouldn't matter.

> However that wouldn't apply to Mode 0 or Mode 1.
> Add a comment at the front of the "if ((!HaveFirstPacket() ||" block stating
> this assumes the original packet order by sequence number has non-decreasing
> timestamps, which is valid for Mode 0 or Mode 1

The JB may explode elsewhere on such streams, but not here. This code only deals with the packets queued under the same timestamp. An earlier seq# for that same timestamp can come later on the wire and still be accepted. This is unrelated to the relative seq# between different timestamps.

> @@ +523,5 @@
> > +      // Update the frame type with the type of the first media packet.
> > +      // TODO(mikhal): Can this trigger?
> > +      frame_type_ = packet.frameType;
> > +    }
> > +    
> 
> remove trailing whitespace

Yep
(In reply to mzanaty from comment #40)
> (In reply to Randell Jesup [:jesup] from comment #39)
> > Comment on attachment 8488419 [details] [diff] [review]
> > Real Fix isFirstPacket for loss/reorder/nack cases
> > 
> > Review of attachment 8488419 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > r-, but if the investigation shows it's safe to only set KeyFrame when the
> > 'session' includes an <iframe> (by changing rtp_receiver_video.cc), then
> > this will be a trivial review of just the mods to do that.
> 
> That would be safe for FF-to-FF but not for general H.264 peers which may
> use GDR (Gradual Decoder Refresh) or AIR (Adaptive Intra Refresh) techniques
> instead of IDR for recovery. GDR often sends a Recovery Point SEI followed
> by several (e.g. 5) P frames with some percentage (e.g. 20%) of
> intra-macroblocks in each P frame, after which the entire frame has been
> intra-refreshed. AIR is often even more gradual (e.g. 100 frames with 1% IMB
> each). Both GDR and AIR typically send SPS/PPS/SEI before the first P frame
> of the refresh cycle, but no IDR or I frames.

So a more nuanced codec-aware fast-forward would be needed to support these in general.

In the meantime, you could set KeyFrame on <iframe> or <sei> (AIR/GDR).


> > @@ +491,5 @@
> > > +    // and the RTP marker bit is not reliable for the last packet of a frame.
> > > +    // So allow out-of-order packets to update the first and last seq# range.
> > > +    // Also mark as a key frame if any packet is of that type.
> > > +    if (frame_type_ != kVideoFrameKey) {
> > > +      frame_type_ = packet.frameType;
> > 
> > This is a little iffy though likely works in most cases if there's no loss.
> > 
> > The jitter buffer in recovery mode will "fast-forward" to a KeyFrame.  In
> > normal H.264 sequences <SPS><PPS><iframe> or <pframe> this will work;
> > however, since all non-VCL NALs are marked as "KeyFrame" now, it will try to
> > fast forward to any "session" with a KeyFrame.  
> 
> The only non-VCL NALUs that get marked as KeyFrame are SPS, PPS and Recovery
> Point SEI. The only VCL NALU that gets marked as KeyFrame is IDR. These all
> explicitly signal that recovery can start here, so it makes sense to "fast
> forward" here if necessary for recovery. All other non-VCL and VCL NALUs
> don't get marked as KeyFrame.

SPS and PPS don't directly indicate recovery can start there, though in practice they can if followed by iframe (or as you point out SEI).

> 
> > This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice>
> > (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible),
> > <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will
> > cause the jitterbuffer to assume it can fast-forward to the
> > timestamp/"session" and start decoding there.  I'm not sure I want to think
> > about what a STAP-A with a leading p-frame and later IDR all with the same
> > timestamp would cause, assuming this is (semi)-legal.
> 
> This is intentional. While odd, or just uncommon, some H.264 gear does this.

Right, but are all of those valid fast-forward points?

> > Also (and perhaps more dangerously), if there's loss and the decoder tries
> > to decode anyways, the fast-forward code might fail.  It may check
> > "HaveFirstPacket() && HaveLastPacket()", which with this change will always
> > be true even if we're missing packets.  So <SPS><PPS><loss> would be seen as
> > a valid fast-forward target.  Or <loss><PPS><loss>.  or <loss><iframe> where
> > the SPS/PPS were different than the last one (admittedly rare, and a known
> > failure mode of H.264 depacketization).  Setting KeyFrame only if an
> > <iframe> is present removes all but the last (well-known, and almsot
> > unavoidable) failure mode.
> 
> Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR
> (see earlier comment). And it still has the same loss issues. Only leading
> loss is an issue, regardless if we set KeyFrame only for IDR or also
> SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and
> middle loss will fail the InSequence check.

I see now; you're only setting HaveLastPacket on marker bits, and then if that's set and a later same-timestamp packet comes in, updating it to include that.

What about missing Marker bits?  Per above, we shouldn't depend on the Marker bit at all except to indicate we _can_ end frame accumulation at that point and push it down for decode (i.e. optimization; we don't have to wait for the next packet (which may be a while) to see if it has a different timestamp.)  Does HaveLastPacket on the last packet get set when the next packet comes in and has a different timestamp?

This is edge-casey enough we could just comment in the code and maybe file a bug on it, unless you believe missing (as opposed to early) Marker bits are a real possibility.

> 
> Here is an idea to fix the leading loss case, similar to the VP8 code.
> Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs
> (1,5). Then the HaveFirstPacket checks would fail for leading loss.

Does that fail for i-slices?  Or FMO? (are those allowed in baseline?  Of course, that assumes it's always baseline)
Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged as fast-forward target, as it will fail the same way with incremental decode-with-errors, and in theory the system should notice a decoder failure and ask for an IDR - of course, it probably doesn't do that.  And there's no true way to know if <loss><iframe> is a valid decode point or not (ok, no easy way without getting into the short of the h264 decoder and breaking into the SPS/PPS and the iframe).  (SPS/PPS aren't required before an iframe, and in sprop-parameter-set usage are never required - but still could be sent.)

> > A more complete fix would be to update the fast-forward code to query
> > codec/packetization-knowledgable session code instead of relying on a
> > precalculated bool.  However, that's out-of-scope currently.  
> > 
> > A simple fix here would be to consider only iframes KeyFrames, and then use
> > the logic here.  This presumes SPS/PPS would be sent (if they're sent) with
> > the same timestamp, which they should be.  Please check if that would mess
> > anything else up.
> 
> It would only mess up GDR/AIR. But it still won't fix leading loss cases.

Would SEI or iframe == keyframe cover most or all of it?
(can we key on SEI decode points /GDR only?   I'm fuzzy on what can be in an SEI)
 
> > Please check the fast-forward code and see if it needs to be updated, or if
> > setting KeyFrame only on <iframe>s will work.
> 
> Looks like it will work using the idea above (first_mb=0), similar to VP8.

May be an option modulo the answers to my questions above.

> 
> > @@ +502,5 @@
> > > +    if ((!HaveLastPacket() && packet.markerBit) ||
> > > +        (HaveLastPacket() &&
> > > +        IsNewerSequenceNumber(packet.seqNum, last_packet_seq_num_))) {
> > > +      last_packet_seq_num_ = packet.seqNum;
> > > +    }
> > 
> > In theory there are packet sequences that would confuse this, such as: 
> > TimeStamp: N N N N+1 N+1 N N+1 N+2 
> 
> Do you mean this is the network sequence or actual RTP seq# sequence?
> InSequence will fail if there are seq# gaps within a timestamp. The JB
> handles separate timestamps separately, so the network sequence shouldn't
> matter.

that's in-sequence RTP (no gaps) with timestamps that are per above (i.e. timestamp goes backwards from Seq# X to Seq # X+1).  And likely that will blow the brains of the current jitterbuffer code.

As I said, I don't think we need to care about this for Mode 0/1.
(In reply to Randell Jesup [:jesup] from comment #41)
> SPS and PPS don't directly indicate recovery can start there, though in
> practice they can if followed by iframe (or as you point out SEI).

Agreed. No major objection to excluding SPS/PPS from KeyFrame marking, since they should appear in the same timestamp as IDR or SEI which will then mark as KeyFrame. Excluding them would cause things to fail if they use a different timestamp than the IDR or SEI, but I've never seen that. The only corner case I've actually seen where this would fail is SPS/PPS/P without SEI. GDR/AIR often include SEI, but I've seen cases of AIR without SEI. That's why I included SPS/PPS, but it is such a low runner scenario that we can probably ignore. OTOH, I see no harm in including them. 

> > > This means <SPS><PPS><pframe> (admittedly odd), <SPS><PPS><i-slice>
> > > (probably odd), <SPS><PPS> (odd, maybe 6184 violation, but possible),
> > > <pframe><SEI-or-other-random-NAL> (likely odd, but possible I suppose) will
> > > cause the jitterbuffer to assume it can fast-forward to the
> > > timestamp/"session" and start decoding there.  I'm not sure I want to think
> > > about what a STAP-A with a leading p-frame and later IDR all with the same
> > > timestamp would cause, assuming this is (semi)-legal.
> > 
> > This is intentional. While odd, or just uncommon, some H.264 gear does this.
> 
> Right, but are all of those valid fast-forward points?

Yes.

> > > Also (and perhaps more dangerously), if there's loss and the decoder tries
> > > to decode anyways, the fast-forward code might fail.  It may check
> > > "HaveFirstPacket() && HaveLastPacket()", which with this change will always
> > > be true even if we're missing packets.  So <SPS><PPS><loss> would be seen as
> > > a valid fast-forward target.  Or <loss><PPS><loss>.  or <loss><iframe> where
> > > the SPS/PPS were different than the last one (admittedly rare, and a known
> > > failure mode of H.264 depacketization).  Setting KeyFrame only if an
> > > <iframe> is present removes all but the last (well-known, and almsot
> > > unavoidable) failure mode.
> > 
> > Setting KeyFrame only for IDR won't work for all H.264 peers due to GDR/AIR
> > (see earlier comment). And it still has the same loss issues. Only leading
> > loss is an issue, regardless if we set KeyFrame only for IDR or also
> > SPS/PPS/Rec.Pt.SEI. Trailing loss will fail the HaveLastPacket check, and
> > middle loss will fail the InSequence check.
> 
> I see now; you're only setting HaveLastPacket on marker bits, and then if
> that's set and a later same-timestamp packet comes in, updating it to
> include that.
> 
> What about missing Marker bits?  Per above, we shouldn't depend on the
> Marker bit at all except to indicate we _can_ end frame accumulation at that
> point and push it down for decode (i.e. optimization; we don't have to wait
> for the next packet (which may be a while) to see if it has a different
> timestamp.)  Does HaveLastPacket on the last packet get set when the next
> packet comes in and has a different timestamp?

No. That would require snooping into another "session" from the current one. But it would completely avoid any dependence on reliable marker bits. Let me see how easy or intrusive this is.

> This is edge-casey enough we could just comment in the code and maybe file a
> bug on it, unless you believe missing (as opposed to early) Marker bits are
> a real possibility.

I've never seen missing markers. But as you pointed out, the spec allows it.

> > Here is an idea to fix the leading loss case, similar to the VP8 code.
> > Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs
> > (1,5). Then the HaveFirstPacket checks would fail for leading loss.
> 
> Does that fail for i-slices?  Or FMO? (are those allowed in baseline?  Of
> course, that assumes it's always baseline)

It works for everything except FMO/ASO, which is not allowed in CBP (or CHP/HP).

> Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged
> as fast-forward target, as it will fail the same way with incremental
> decode-with-errors, and in theory the system should notice a decoder failure
> and ask for an IDR - of course, it probably doesn't do that.  And there's no
> true way to know if <loss><iframe> is a valid decode point or not (ok, no
> easy way without getting into the short of the h264 decoder and breaking
> into the SPS/PPS and the iframe).  (SPS/PPS aren't required before an
> iframe, and in sprop-parameter-set usage are never required - but still
> could be sent.)

Agreed. Knowing for sure something is a valid and complete refresh point takes very deep inspection. We only use a simpler heuristic, which should always work unless the peer is bad (buggy or rogue).
 
> > > A more complete fix would be to update the fast-forward code to query
> > > codec/packetization-knowledgable session code instead of relying on a
> > > precalculated bool.  However, that's out-of-scope currently.  
> > > 
> > > A simple fix here would be to consider only iframes KeyFrames, and then use
> > > the logic here.  This presumes SPS/PPS would be sent (if they're sent) with
> > > the same timestamp, which they should be.  Please check if that would mess
> > > anything else up.
> > 
> > It would only mess up GDR/AIR. But it still won't fix leading loss cases.
> 
> Would SEI or iframe == keyframe cover most or all of it?
> (can we key on SEI decode points /GDR only?   I'm fuzzy on what can be in an
> SEI)

Yes, most but not all, see first comment about including/excluding SPS/PPS. We already key on SEI only if it is a Recovery Point, not just any SEI, in ReceiveH264Codec.

    case RtpFormatH264::kSei: // check if it is a Recovery Point SEI (aka GDR)
      if (offset+1 >= payload_data_length) return -1; // malformed
      if (payload_data[offset+1] != RtpFormatH264::kSeiRecPt) break;

> > > Please check the fast-forward code and see if it needs to be updated, or if
> > > setting KeyFrame only on <iframe>s will work.
> > 
> > Looks like it will work using the idea above (first_mb=0), similar to VP8.
> 
> May be an option modulo the answers to my questions above.

What do you think about this option after these answers?
(In reply to mzanaty from comment #42)
> (In reply to Randell Jesup [:jesup] from comment #41)
> > SPS and PPS don't directly indicate recovery can start there, though in
> > practice they can if followed by iframe (or as you point out SEI).
> 
> Agreed. No major objection to excluding SPS/PPS from KeyFrame marking, since
> they should appear in the same timestamp as IDR or SEI which will then mark
> as KeyFrame. Excluding them would cause things to fail if they use a
> different timestamp than the IDR or SEI, but I've never seen that. The only
> corner case I've actually seen where this would fail is SPS/PPS/P without
> SEI. GDR/AIR often include SEI, but I've seen cases of AIR without SEI.
> That's why I included SPS/PPS, but it is such a low runner scenario that we
> can probably ignore. OTOH, I see no harm in including them. 

So, thinking some more: the only case where SPS/PPS causing it to be marked as KeyFrame will break are:
<SPS><PPS><pframe> (no SEI - either AIR without SEI, or <SPS><PPS> that aren't part of a refresh).
And to that point - the more important thing in fast-forward is to not incorrectly skip over an IDR, especially if the decoder can return an error to mean "I need an IDR".

So, Now that I know more details about some of the SEI uses (and the HaveLastFrame() is understood to require a Marker bit somewhere in the frame/session/timestamp), I think we can just go with SPS/PPS/iframe/SEI-with-refresh as it's coded.

> > What about missing Marker bits?  Per above, we shouldn't depend on the
> > Marker bit at all except to indicate we _can_ end frame accumulation at that
> > point and push it down for decode (i.e. optimization; we don't have to wait
> > for the next packet (which may be a while) to see if it has a different
> > timestamp.)  Does HaveLastPacket on the last packet get set when the next
> > packet comes in and has a different timestamp?
> 
> No. That would require snooping into another "session" from the current one.
> But it would completely avoid any dependence on reliable marker bits. Let me
> see how easy or intrusive this is.

Sure, but follow-on bug.
 
> > This is edge-casey enough we could just comment in the code and maybe file a
> > bug on it, unless you believe missing (as opposed to early) Marker bits are
> > a real possibility.
> 
> I've never seen missing markers. But as you pointed out, the spec allows it.

Right

> > > Here is an idea to fix the leading loss case, similar to the VP8 code.
> > > Enforce isFirstPacketOfTimestamp by looking for first_mb=0 in VCL NALUs
> > > (1,5). Then the HaveFirstPacket checks would fail for leading loss.
> > 
> > Does that fail for i-slices?  Or FMO? (are those allowed in baseline?  Of
> > course, that assumes it's always baseline)
> 
> It works for everything except FMO/ASO, which is not allowed in CBP (or
> CHP/HP).

Can we add that to the first-packet setting?  So it would be 
Set/Update first-packet on: in-order-but-new-timestamp || non-VCL-NAL || VCL(mb==0)
(and even the first there is likely optional).  And comments!

> > Note that I'm ok with <lost-new-sps><pps><iframe> being incorrectly tagged
> > as fast-forward target, as it will fail the same way with incremental
> > decode-with-errors, and in theory the system should notice a decoder failure
> > and ask for an IDR - of course, it probably doesn't do that.  And there's no
> > true way to know if <loss><iframe> is a valid decode point or not (ok, no
> > easy way without getting into the short of the h264 decoder and breaking
> > into the SPS/PPS and the iframe).  (SPS/PPS aren't required before an
> > iframe, and in sprop-parameter-set usage are never required - but still
> > could be sent.)
> 
> Agreed. Knowing for sure something is a valid and complete refresh point
> takes very deep inspection. We only use a simpler heuristic, which should
> always work unless the peer is bad (buggy or rogue).

Right.  And if we're overly permissive in deciding is_refresh, that's better.
Comment on attachment 8488419 [details] [diff] [review]
Real Fix isFirstPacket for loss/reorder/nack cases

After extensive discussion and more analysis, r+ with the previous nits
Attachment #8488419 - Flags: review- → review+
https://hg.mozilla.org/mozilla-central/rev/123ddef31d44
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Attachment #8488055 - Flags: feedback?(mzanaty)
I ran it again with the 9/14 Nightly debug build. Worked great. Logs are here:

https://drive.google.com/a/mozilla.com/file/d/0B1BaMtdCeAxdNWdORGFqcllEdGM/edit?usp=sharing
Comment on attachment 8488419 [details] [diff] [review]
Real Fix isFirstPacket for loss/reorder/nack cases

Approval Request Comment
[Feature/regressing bug #]: bug 1050461 (mode 0 support)

[User impact if declined]: Failure under packet loss; the only other option would be to back out a fairly large patch (and turn off mode 0 support, which external equipment we expect to work with uses).

[Describe test coverage new/current, TBPL]: On central/nightlies; verified by one of the two people who could reliably reproduce this (Syd).

[Risks and why]: Hard to make it work any worse; low risk - basically just adjusts the setting of the "first packet of frame/last packet of frame" bits.

[String/UUID change made/needed]: none
Attachment #8488419 - Flags: approval-mozilla-beta?
Attachment #8488419 - Flags: approval-mozilla-aurora?
Comment on attachment 8488419 [details] [diff] [review]
Real Fix isFirstPacket for loss/reorder/nack cases

Beta+
Aurora+
Attachment #8488419 - Flags: approval-mozilla-beta?
Attachment #8488419 - Flags: approval-mozilla-beta+
Attachment #8488419 - Flags: approval-mozilla-aurora?
Attachment #8488419 - Flags: approval-mozilla-aurora+
Whiteboard: [openh264-uplift]
You need to log in before you can comment on or make changes to this bug.