Closed Bug 1435025 Opened 6 years ago Closed 6 years ago

Rtp padding packets dropped as invalid packets

Categories

(Core :: WebRTC, defect, P2)

58 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: trisfald, Assigned: drno, NeedInfo)

References

(Blocks 1 open bug)

Details

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20180128191252

Steps to reproduce:

Hello!
We use a SFU to do webrtc calls with simulcast enabled. 
When we limit the receiver's bandwidth the video switches to a lower quality and this's correct. Meanwhile, we send padding packets in the video stream to probe the bandwidth. After a bit, we switch off the limiter.

Firefox displays the video correctly but there are issues with the receiver reports.


Actual results:

Upon deactivating the bandwidth limiter, the rembs sent by Firefox stay very low (forever). Also, the Receiver report has a high value of fraction lost, even though there aren't lost packets and no nacks from the receiver. 
Low rembs prevents us from switching to a higher quality layer. However, when we disable rembs we cannot increase the estimate because of the high fraction lost in RR.
If we disable the padding generator, rembs have the same problem and don't work, but the value fraction lost reported in RRs is 0.

These issues are not present with Chrome.

The video stream is displayed correctly.


Expected results:

In the presence of padding packets rembs should increase progressively when available bandwidth increases. Receiver reports should not indicate fraction lost if there are no losses.


Thanks!
Michael, can you help triage this?
Flags: needinfo?(mfroman)
Please see the instructions here[1] to get a dump of the packets when the issue occurs.  Also, after the call, go to about:webrtc and click the 'save the page' button.  Please attach both to the bug here to give us more info to go on.

[1] https://blog.mozilla.org/webrtc/debugging-encrypted-rtp-is-more-fun-than-it-used-to-be/
Flags: needinfo?(mfroman) → needinfo?(trisfald)
Priority: -- → P2
If we can get a URL/login/whatever, we could try it ourselves as well, and also take an 'rr' trace perhaps (though because of single-threading execution, that can slow things down enough to warp realtime code).  But even without rr being able to debug against it would be even better than a trace.

You can also get a log of a call with setting these env vars:
WEBRTC_TRACE_FILE=nspr MOZ_LOG=webrtc_trace:5,signaling:5 MOZ_LOG_FILE=some temporary file

Note: it will be long.  Compress it and upload here, if you can.  Thanks!
Ok so, I did various tests and found some interesting things:
1) the moz.log doesn't work as expected when trying to convert it to pcap since RTP_PACKET or RTCP_packet isn't printed in every packet received so the egrep method mentioned in the link [1] wasn't very helpful.
So I had to make a regexp and replace all 00 00\n with 00 00 RTP_PACKET\n and 00 0\n with 00 0 RTP_PACKET\n.
I don't know if it's a bug or not, if you notice that too I can open an issue.

So, after replacing everything I finally have one pcap with everything inside.

Meanwhile I did a wireshark probe and saved everything in a pcap.

Confronting the firefox pcap and the wireshark one I noticed that all dropped packets seems to be our last padding packet sent in a row (that contains a NON full padding packet), but not every last padding packet sent is dropped by Firefox *1.

Notice that these lost packet correctly arrived to the client since in the wireshark probe they are present.

At this point I cannot explain to me if there's something wrong when we generate the last padding packet or Firefox can't parse it right (since chrome seems not be affected).

To verify my theory, I commented out the line where we sent the last padding packet and worked like a charm!
The line that seems to broke everything is this [2]. If I comment this out works perfectly.

[1] https://blog.mozilla.org/webrtc/debugging-encrypted-rtp-is-more-fun-than-it-used-to-be/
[2] https://github.com/lynckia/licode/blob/ac9b138fbf80a87444365a8f86f6dec0628d28a9/erizo/src/erizo/rtp/RtpPaddingGeneratorHandler.cpp#L113

To do this test I used Licode as SFU and firefox as a subscribe for a simulcast stream generated from chrome.
I capped the bandwidth at the middle of the test to make firefox switch to low quality, then released the bandwidth to let him try to grow thanks to padding.
So if you look at the pcap, I suggest you to start from the bottom, where there was plenty of bandwidth and you can rapidly see that every dropped packet is a last padding packet.

* During other tests I noticed that the lasts packets which passes are the ones with the last bit (which indicates the length of the padding inside the rtp packet) is equal o inferior to the packet length (obviously). But the problem is that the last bit which i’m reading is in encrypted state by SRTP so it’s not meaningful if the packet isn’t decrypted before reading the whole packet.

So, my question is only one: Does Firefox decrypt every packet before checking if is a padding packet?

For example, take my pcaps and search for seq.num: 29910
In firefox this packet was dropped and seen as packet lost. in wireshark.pcap we can see the packet exist but the padding length is seen as 122 (which is > of the total length of the packet (107)).
this is because the 122 is seen in an encrypted bit (7a).

But if you look at the packet 29914 you can see that the padding count is 211 but the packet length is 311 (which can handle 211 bytes of padding if we remove 42 for ip and 20 for the header). In fact this packet wasn’t dropped by Firefox.
We cannot think of other things.

That’s will explain why the full padding packets always pass (cause they’re always long enough to be > padding length seen by inspecting the last bit in encrypted way).
Can this explain the whole thing?

I'm sorry for the bad english but this is so complicated :/

(attached file)
Attached file discussion.zip
Flags: needinfo?(rjesup)
Flags: needinfo?(drno)
Yeah this is strange. I'm wondering if for some unknown reason the RTP parsing here https://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp#1110 fails. That is one spot where incoming RTP packets get dropped without any log message.

To answer your question: as you can see in that MediaPipeline code we don't actually pay any attention to padding.

Can you provide us (maybe outside of Bugzilla if that helps) with a URL where we can test this our self?

If not the next best option would probably be to build a version of Firefox for you with additional log message in MediaPipeline where we currently don't log anything.
Flags: needinfo?(drno)
Yes, all packets are decrypted (in mtransport) before the RTP is looked at (first in MediaPipeline, then in the webrtc.org code directly).

Unfortunately, the RTP dumping code doesn't let you dump the full contents of RTP packets... (nils, can we change this by at least adding a pref?)  There's no security impact I believe, though you can run it by someone (dveditz?))

This could be failing in the SRTP decrypt, or here, or perhaps after being passed to webrtc.org code as an input packet (but that should be logged.)

HOWEVER:

Looking at seq 29121 (lost) and 29120 (not lost):
29120 has 0xEC (236) padding bytes starting at 0x3e (60 bytes of header and header-extension before the padding).
29121 has 0x5E (94) passing bytes.  They would thus start at 0x2a (42) -- but there are still 60 bytes of header/header-extension.  

This is an invalid RTP packet.  Chrome shouldn't accept it...
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Flags: needinfo?(rjesup)
Resolution: --- → INVALID
(In reply to Randell Jesup [:jesup] from comment #7)
> Yes, all packets are decrypted (in mtransport) before the RTP is looked at
> (first in MediaPipeline, then in the webrtc.org code directly).
> 
> Unfortunately, the RTP dumping code doesn't let you dump the full contents
> of RTP packets... (nils, can we change this by at least adding a pref?) 
> There's no security impact I believe, though you can run it by someone
> (dveditz?))
> 
> This could be failing in the SRTP decrypt, or here, or perhaps after being
> passed to webrtc.org code as an input packet (but that should be logged.)
> 
> HOWEVER:
> 
> Looking at seq 29121 (lost) and 29120 (not lost):
> 29120 has 0xEC (236) padding bytes starting at 0x3e (60 bytes of header and
> header-extension before the padding).
> 29121 has 0x5E (94) passing bytes.  They would thus start at 0x2a (42) --
> but there are still 60 bytes of header/header-extension.  
> 
> This is an invalid RTP packet.  Chrome shouldn't accept it...

Yeah but 5E is the last byte in encrypted state. the wireshark.pcap dump has the traffic still encrypted, so this 5e is a "random" crypted data and doesn't reflect the actual value in this last byte field. Looking at my server info this byte should be 64.

SENDING LAST PADDING: 29121, header_length: 20, payload: 120, padding size: 64

However if you can make me a build with some logs where there might be dropping would be great. also this night (CET) I'll manage to make you a link to a demo page where you can actually test.
Looking for example at the packet 29426 (which is miracolously passed) from the firefox.pcap and count the padding bytes, it's exactly 221 and looking at the server console:

SENDING LAST PADDING: 29426, header_length: 20, payload: 120, padding size: 221

But if you look at the encrypted packed in wireshark.pcap the padding count is 16. that's why passed. because the length of the total packet is > 16 (which is again a random value due to encryption).

So, I'm seeing something strange, I don't question about the fact that maybe we are sending something wrong, but until now seems everything correct.
Following Nils suggestion I looked inside Mediapipeline.cpp. To me, it seems very likely that some padding packets get dropped at line 1110.

The reason is that inside the call to mRtpParser->Parse there's a call to the Parse function that bring us here: https://searchfox.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtp_utility.cc#182. Here, the payload bit P is read as well as header->paddingLength, which is used few lines below (line 227) to perform a check.

However, the packet payload is still encrypted at this point, so the value of the last padding byte is garbage and the check might fail even for correct packets. The srtp unprotect happens later in Mediapipeline.cpp https://searchfox.org/mozilla-central/source/media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp#1162 after the rtp parse check.

We built firefox from source, moving the check at Mediapipeline 1110 after the packet unprotect. Everything works better: we don't see anymore missing padding packets in the dump generated by the browser nor in the receiver reports.
Flags: needinfo?(trisfald)
Thanks for tracking this down. I'll have a look at how we can fix this.
Assignee: nobody → drno
Status: RESOLVED → REOPENED
Rank: 15
Ever confirmed: true
Resolution: INVALID → ---
(In reply to Nils Ohlmeier [:drno] from comment #12)
> Created attachment 8948905 [details]
> Bug 1435025: decrypt SRTP to allow parsing of padding.
> 
> Review commit: https://reviewboard.mozilla.org/r/218312/diff/#index_header
> See other reviews: https://reviewboard.mozilla.org/r/218312/

Tested the patch and seems very good to us! thanks!!
Comment on attachment 8948905 [details]
Bug 1435025: don't parse padding on SRTP packets

https://reviewboard.mozilla.org/r/218312/#review224214

::: media/webrtc/signaling/src/mediapipeline/MediaPipeline.cpp:1115
(Diff revision 1)
> +  // Make a copy rather than cast away constness
> +  auto innerData = MakeUnique<unsigned char[]>(aLen);
> +  memcpy(innerData.get(), aData, aLen);
> +  int outLen = 0;
> +  nsresult res =
> +    mRtp.mRecvSrtp->UnprotectRtp(innerData.get(), aLen, aLen, &outLen);

Ooof! We're going to be decrypting the same packet multiple times when using bundle. Is there no other way to do this?
Attachment #8948905 - Flags: review?(docfaraday) → review-
Comment on attachment 8948905 [details]
Bug 1435025: don't parse padding on SRTP packets

https://reviewboard.mozilla.org/r/218312/#review224214

> Ooof! We're going to be decrypting the same packet multiple times when using bundle. Is there no other way to do this?

Ugh. That is a good point I hadn't considered.

The other options which came to my mind:
1) Add an option to the webrtc.org RTP parser to only parse the header
2) Ignore the return value of the RTP parser (and potentially parse it again after successful decryption to ensure it's a valid RTP packet)
3) Build our own minimal RTP parser
Summary: Rtp padding packets considered as packet lost → Rtp padding packets dropped as invalid packets
I wanted to see how hard option #1 would be and it turned out to be pretty easy. What do you think about landing this as a short term fix and opening a new bug for re-factoring the SRTP parsing down into the transportlayer code?
Flags: needinfo?(docfaraday)
Comment on attachment 8948905 [details]
Bug 1435025: don't parse padding on SRTP packets

https://reviewboard.mozilla.org/r/218312/#review224508

Wow that's a lot of classes you need to touch for such a simple change.
Attachment #8948905 - Flags: review?(docfaraday) → review+
(In reply to Nils Ohlmeier [:drno] from comment #17)
> I wanted to see how hard option #1 would be and it turned out to be pretty
> easy. What do you think about landing this as a short term fix and opening a
> new bug for re-factoring the SRTP parsing down into the transportlayer code?

I guess that's ok with me.
Since we are not easily able to reproduce this can I ask you to test the builds with the second patch?

Linux 64: https://queue.taskcluster.net/v1/task/Zn9QrhpYRUmKlLXaKdBzUQ/runs/0/artifacts/public/build/target.tar.bz2
OS X: https://queue.taskcluster.net/v1/task/aiehMGMPSZG9SJxJVCnMIg/runs/0/artifacts/public/build/target.dmg
Win 64: https://queue.taskcluster.net/v1/task/Rf6u23-4TAWiK5etPR2nQg/runs/0/artifacts/public/build/target.zip
Flags: needinfo?(work)
Flags: needinfo?(trisfald)
Flags: needinfo?(docfaraday)
Sure!
We'll setup a test server tomorrow morning (CET).
Have a nice day!
(In reply to Nils Ohlmeier [:drno] from comment #20)
> Since we are not easily able to reproduce this can I ask you to test the
> builds with the second patch?
> 
> Linux 64:
> https://queue.taskcluster.net/v1/task/Zn9QrhpYRUmKlLXaKdBzUQ/runs/0/
> artifacts/public/build/target.tar.bz2
> OS X:
> https://queue.taskcluster.net/v1/task/aiehMGMPSZG9SJxJVCnMIg/runs/0/
> artifacts/public/build/target.dmg
> Win 64:
> https://queue.taskcluster.net/v1/task/Rf6u23-4TAWiK5etPR2nQg/runs/0/
> artifacts/public/build/target.zip

Works like a charm! 
no more dropped rtp padding packets!
If you want to inspect I will attach the pcap. every malformed packet with a marker (note that we added a marker only to distinguish between the rtp padding full packet and the last non-full padding packet) is a last padding!

Note that as I said before (maybe is a bug), the moz-log is somehow "malformed".
Tons of packet doesn't have the "RTP_PACKET" string after the zeroes so we had to use a regexp and replace all 00 00\n with 00 00 RTP_PACKET\n and 00 0\n with 00 0 RTP_PACKET\n.

and since text2pcap use the regex 

> egrep '(RTP_PACKET|RTCP_PACKET)' moz.log.child-1 | text2pcap -D -n -l 1 -i 17 -u 1234,1235 -t '%H:%M:%S.' - rtp.pcap

lots of packet won't be translated into .pcap

Have a nice day guys!
Flags: needinfo?(work)
Attached file working_build_pcap.zip
This is a dump of the wireshark probe containing all padding data (no more dropped packets!)
Comment on attachment 8948905 [details]
Bug 1435025: don't parse padding on SRTP packets

https://reviewboard.mozilla.org/r/218312/#review224866

r+ here too; we should wrap this up in a patch for upstream (dminor)
Attachment #8948905 - Flags: review+
Setting needinfo to remind me to look at getting this upstreamed.
Flags: needinfo?(dminor)
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/a9cf23c66a7e
don't parse padding on SRTP packets r=bwc,jesup
Blocks: 1437199
https://hg.mozilla.org/mozilla-central/rev/a9cf23c66a7e
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
I think it makes more sense for patch authors to get their changes upstreamed rather than relying upon me as they will be in a better position to make a case for their changes. In this case, since it is (supposed to be!) a short term fix, it doesn't seem like something we would attempt to upstream.
Flags: needinfo?(dminor)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: