Closed Bug 996379 Opened 6 years ago Closed 6 years ago

OpenH264 decoder fail to decode the frame from B2G HW h264 encoder

Categories

(Core :: WebRTC: Audio/Video, defect, critical)

x86
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 985254
2.0 S2 (23may)
feature-b2g 2.0

People

(Reporter: rlin, Assigned: rlin)

References

Details

(Whiteboard: [ft:multimedia-platform][c=webrtc, s=fx32])

Attachments

(2 files, 2 obsolete files)

The OpenH264 decoder can't recovery the frame loss without the key frame with IDR.
The OpenH264 responds to keyframe requests with SPS + PPS + IDR.
Need some change for the HW encoder on mobile devices.
The receiver side found encoded frames drop and cause decoder can't recover.
Can you explain this in more detail please? I can't fully follow here. And what exactly is the fix here?
I'm trying to test the compatibility between Openh264 and mobile device. 
With bug 911046, We found the OpenH264 can't decode the frame from mobile device after several successful decode result.
Check the log and found encoded frame can drop in the jitter buffer with this msg (kOldPacket = -5)
Asked in OpenH264 maillist, the OpenH264 decoder required the IDR frame to recover if the deltaFrame got miss.
So we would try to generate the keyframe with SPS + PPS + IDR as OpenH264 decoder to fix this problem.
Assignee: nobody → rlin
Can we fix this on the OpenH264 side?
Typically, H.264 is used in either external negotiation mode (sprop-parameter-sets in the SDP fmtp line, which have copies of the SPS/PPS settings), or in-band parameter sets (SPS and PPS before every IDR (iframe)).  Note that on every SPS/PPS change the resolution can change. 

It sounds like OpenH264 is being used in in-band mode (which is by far the most common in videoconferencing), and the default mode for the OpenMax code is out-of-band.  Likely this is because the primary usecase for the encoder is for video-capture to SD card, where there's no packet loss.

a) OpenH264 should not error out if an IDR does not have an SPS/PPS so long as the frames reference a SPS and PPS that have been received (frames have a parameter-set index to use).  It already has the SPS/PPS; it should use them.  This avoids freezes on a dropped SPS/PPS packet.

b) the OMX encoder should (if used in in-band mode) generate SPS/PPS before each IDR.  This is important for front-of-stream losses (common!), and for IDRs sent to sync a new participant in a conference.
Found the sender send strange rtp timestamp to decoder. That may affect jitter buffer behavior.
236	3.994492	10.247.29.162	10.247.26.174	RTP	90	PT=DynamicRTP-Type-97, SSRC=0x5A559332, Seq=4349, Time=65362159, Mark  
^^^^^^^^^^^^^^^^^^^^^^^^^^^ first frame's timestamp should be wrong...
239	4.000320	10.247.29.162	10.247.26.174	RTP	880	PT=DynamicRTP-Type-97, SSRC=0x5A559332, Seq=4350, Time=880031669 
240	4.000481	10.247.29.162	10.247.26.174	RTP	880	PT=DynamicRTP-Type-97, SSRC=0x5A559332, Seq=4351, Time=880031669 
....
457	4.478651	10.247.29.162	10.247.26.174	RTP	1200	PT=DynamicRTP-Type-97, SSRC=0x5A559332, Seq=4381, Time=880075139, Mark 

An "workaround" was found when disable timing error check (set timing_error = false;)
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.cc#166
http://mxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.cc#173
This is for testing...I will toward to find why cause this issue.

BTW, I'm not the h264 code expert and I'm not sure how to fix this....
Length implies that is an SPS or PPS H.264 packet/NALU

That timestamp is wildly wrong, perhaps uninitialized.

From the 911046 patch, it should give the SPS/PPS the same timestamp as the rest of the IDR; if it doesn't there's a bug in the patch.  Add debugging there or use a debugger to examine it
We reached out to the OMX driver vendor to get this fixed on their end but in addition we also have to talk to cisco to fix this on the OpenH264, plus we need to get the timestamp issue fixed.
The wrong SPS/PPS timestamp only happens in obsoleted github branch. Current patch in bug 911046 doesn't have this problem.
rlin told me that seperating SPS/PPS and I-frame in two 'encoded image' with same timestamp also has problem because jitter buffer thinks it's duplicated. I think the final solution should be prepending SPS/PPS to all I-frames and pack that in one 'encoded image'.
(In reply to John Lin[:jolin][:jhlin] from comment #9)
> The wrong SPS/PPS timestamp only happens in obsoleted github branch. Current
> patch in bug 911046 doesn't have this problem.
> rlin told me that seperating SPS/PPS and I-frame in two 'encoded image' with
> same timestamp also has problem because jitter buffer thinks it's
> duplicated. I think the final solution should be prepending SPS/PPS to all
> I-frames and pack that in one 'encoded image'.
I will try if this prepending solution can solve the 
1. duplicated timestamp 2.packet loss and OpenH264 decoder can't recovery with those error msg. 
0x04(dsRefLos) >= 0x02(dsNoParamSets)->0x10(dsNoParamSets) ->0x10...
Assumption: I'm basing this on the data in this bug, and the latest patches on bug 911046. which are still under review, and discussion on the OpenH264 list.  rlin tells me on IRC that the timing error reported here was in an old version of  the OMX patches, so the timing issue is (now) a red herring.

Note that in the current (newer) patchset from bug 911046, the timestamp doesn't actually come from the OMX code or hardware; it comes from the code in the patch there.  The time (in us) is passed into mOMX->Encode(), but output encoded NALUs have their timestamp set from the input frame's timestamp (timeUs from GetNextEncodedFrame() is unused).

The RTP reception code, when faced with a wildly incorrect timestamp will throw the packet away as an error -- as rlin's comment 6 shows.  When that happens, it will flush the jitter buffer an restart decoding; if this happens on every IDR/SPS/PPS, it will never recover.  This will never get passed to OpenH264.


The remaining issue appears to be the OpenH264 decoder considering a frame_num jump to be a pack loss requiring recovery, and it doesn't allow the application to decide whether it should decode anyways.  (Most decoders leave it up to the application.)  Combine that with the OMX encoder not inserting SPS/PPS before subsequent iframes, and you never recover (I think because OpenH264 is looking for SPS/PPS/iframe combo, though it shouldn't need to).

a) the OMX HW encoder should not have frame_num jump on a stream without errors, even if this is allowed in the spec
b) the OMX interface code we're writing should insert cached SPS/PPS as needed when an iframe is output
c) OpenH264 should relax the decoding requirements to allow decoding past a frame drop.
d) (if this is actually the case) OpenH264 should allow recovery with just iframe (or just p-frame from an LTR or other reference frame)

I think a combination of b) and c) will get us working quickly
(In reply to John Lin[:jolin][:jhlin] from comment #9)
> rlin told me that seperating SPS/PPS and I-frame in two 'encoded image' with
> same timestamp also has problem because jitter buffer thinks it's
> duplicated. 

it shouldn't - so long as they have different consecutive RTP values, that should be ok.  Duplicates have the same sequence number, not the same timestamp.  (could be a bug somewhere - can you provide a wireshark trace, and/or a NSPR_LOG_MODULES=webrtc_trace:65535 WEBRTC_TRACE_FILE=whatever log from the openh264 side?

>I think the final solution should be prepending SPS/PPS to all
> I-frames and pack that in one 'encoded image'.

You can "pack" them into a single packet if it's packetization mode 1 and you use a STAP-A packet.  Otherwise, no.  That's probably not what you're referring to above.
(In reply to Randell Jesup [:jesup] from comment #12)
> (In reply to John Lin[:jolin][:jhlin] from comment #9)
> > rlin told me that seperating SPS/PPS and I-frame in two 'encoded image' with
> > same timestamp also has problem because jitter buffer thinks it's
> > duplicated. 
> 
> it shouldn't - so long as they have different consecutive RTP values, that
> should be ok.  Duplicates have the same sequence number, not the same
> timestamp.  (could be a bug somewhere - can you provide a wireshark trace,
> and/or a NSPR_LOG_MODULES=webrtc_trace:65535 WEBRTC_TRACE_FILE=whatever log
> from the openh264 side?
> 
> >I think the final solution should be prepending SPS/PPS to all
> > I-frames and pack that in one 'encoded image'.
> 
> You can "pack" them into a single packet if it's packetization mode 1 and
> you use a STAP-A packet.  Otherwise, no.  That's probably not what you're
> referring to above.
OK, I will attach the wireshark and NSPR logs on this bug.
Attached file webrtc nspr log (obsolete) —
Comment on attachment 8407410 [details]
webrtc nspr log

some problem in wireshark log, need new one.
Attachment #8407410 - Attachment is obsolete: true
Attached file webrtc.tar.gz
encoder send
I/     ( 1033):  encoded._timeStamp 998765978 size 25 ft 0 ps
I/     ( 1033):  encoded._timeStamp 998765978 size 9471 ft 0 ps <-- miss this.
I/     ( 1033):  encoded._timeStamp 998771918 size 7645 ft 1 ps
I/     ( 1033):  encoded._timeStamp 998778488 size 23924 ft 1 ps
I/     ( 1033):  encoded._timeStamp 998785238 size 17051 ft 1 ps

decoder recv
Decoding frame input length= 25  cnt = 1 rv = 0 ts = 1248301187
Decoding frame input length= 7645  cnt = 2 rv = 0 ts = 1248307127
Decoding frame input length= 23924  cnt = 3 rv = 16 ts = 1248313697   
Decoding frame input length= 17051  cnt = 4 rv = 16 ts = 1248320447
DESKTOP ADDRESS = 10.247.29.118
WIP patch to make OpenH264 work with patches in bug 911046.
Comment on attachment 8407465 [details] [diff] [review]
[WIP] Prepends SPS/PPS to I-frames generated by OMX encoder.

Review of attachment 8407465 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/media/omx/OMXCodecWrapper.cpp
@@ +779,5 @@
>        mAMRCSDProvided = true;
>      } else {
> +      if (mCodecType == AVC_ENC) {
> +        PrependParamSets(aOutputBuf, mCodec);
> +      }

You can't just drop multiple NALUs into a single "EncodedFrame" buffers.  Each NALU must either be sent as a separate packet (mode 0 or 1), or combined using an aggregation packet (STAP-A, mode 1 only).  For STAP-A, you can combine SPS, PPS and (if small enough) one or more pframe/iframe NALs, but this would be unusual unless encoding black.

You need to insert them as if they came out of the encoder as separate NALs.  Note also if the encoder outputs SPS and PPS in one output, they must be broken into two RTP packets before sending (which might be related to the problems with the "frame number" reported by Openh264)
Attachment #8407465 - Flags: feedback-
this WIP can recover if OpenH264 got decode error #0x10.
Status: NEW → ASSIGNED
Whiteboard: [ft:multimedia-platform][webrtc]
Target Milestone: --- → 2.0 S1 (9may)
I found 
1. When receiver receive the first packet. the VCMDecodingState::IsOldFrame always return false and avoid to check the timestamp. Because of in_initial_state_ == false. and this flag would set to be true while deliver to decoder. 
2. When receiver receive the 2nd packet with same timestamp, the VCMDecodingState::IsOldFrame would return true because the IsNewerTimestamp return false. And drop it on VCMJitterBuffer::InsertPacket.

So the openH264 decoder would fail to decode by missing the 2nd IDR encoded frame.
Hi Randell,
Does decoder should receive the first EncodedImage with SPS/PPS/IDR content?
If yes, the problem looks why decoder can get 22 bytes of SPS/PPS encoded frame from jitter buffer...
Flags: needinfo?(rjesup)
(In reply to Randy Lin [:rlin] from comment #22)
> Hi Randell,
> Does decoder should receive the first EncodedImage with SPS/PPS/IDR content?
> If yes, the problem looks why decoder can get 22 bytes of SPS/PPS encoded
> frame from jitter buffer...

Discussed on IRC; you should get one packet for SPS, one for PPS, and one for IDR.  Normally they're all the same timestamp, but the webrtc.org code dislikes this.  The patch in bug 985254 has a hack to avoid this problem.
Depends on: 985254
Flags: needinfo?(rjesup)
Hi Randell,
Does the openH264 encoder always generate the NLUs start code? 
I found the codec would automatic append it. (Does firefox browser implementation would handle this?). That would cause problem between with mobile hw h264 and openh264.
Flags: needinfo?(rjesup)
Start codes should never be shipped in RTP - for the OMX encoder, we must strip them before sending, and when receiving data, we must insert them before decoding.  Per jhlin and the patches I've reviewed, we do this (and I'm pretty sure OpenH264 doesn't generate/require start codes).
Flags: needinfo?(rjesup)
Whiteboard: [ft:multimedia-platform][webrtc] → [ft:multimedia-platform][c=webrtc, s=fx32]
Can decode data from frame device, but I found we still have problem on long decoding test on 1.0 version of openH264. Request them to figure out this issue.
feature-b2g: --- → 2.0
Target Milestone: 2.0 S1 (9may) → 2.0 S2 (23may)
This was covered in the packetization bug (bug 985254).
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 985254
This one would be also covered in bug 999704.
Duplicate of bug: 999704
Duplicate of bug: 985254
You need to log in before you can comment on or make changes to this bug.