Closed Bug 1317946 Opened 8 years ago Closed 7 years ago

[e10s] Zero size UDP packets close sockets

Categories

(Core :: WebRTC: Networking, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: carvello.thomas, Assigned: drno)

References

Details

Attachments

(7 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2906.0 Safari/537.36

Steps to reproduce:

Im connecting to my webrtc page with a fresly new installed Firefox
Im calling my webrtc gateway
Restart Firefox
Make the same procedure than before
Reboot and repaire firefox
Make the same procedure than before


Actual results:

Call is successfull  the first time
On a restart , Call failed with Ice Failed messages and no DTLS Exchange after the reception of  the server message.
On a reboot , call is successfull again until first restart


Expected results:

Call successfull every time
Do you have a simple testcase to reproduce the issue? It would help to find a regression.
Component: WebRTC → WebRTC: Networking
Flags: needinfo?(carvello.thomas)
Keywords: testcase-wanted
Attached file DTLS good exchange
Attached file DTLS bad exchange
in fact i have new informations:

the dtls negotiation failed.

when Firefox send a DTLS client HELLO with extended master secret extension , it works.

when Firefox didn't send it, It failed with ICE failed message.

i have attached two pcap traces describing the two cases.

Thank you for your time
Carvello I think you are looking at the wrong layer for the failure here. From looking at your working pcap it looks like in the working case two different transports get established. We see two DTLS handshakes in there.
But in the bad case we only see a single DTLS handshake which probably means ICE did not succeed for the second transport (which results in DTLS not kicking off).

Could you please attach a copy of about:webrtc of the working and ideally also of the not working case?
Whiteboard: [needinfo 2016/11/16 to reporter]
Attached file TraceIceFailed.txt
Attached file TraceGood.txt
i've sent the two cases.
for further informations, it's not  happening every time but only  with certain browser and pcs.
Thanks for the logs. The logs show this is definitely an ICE problem.

So I can see that for some reason Firefox in bad case starts with not being able to use the IPv6 addresses the machine appears to have. But that is probably unrelated to the root cause here.

The logs show some really strange error messages when ICE tries to find a working path for the second (video?) stream.
I would be most interested in PCAP files showing the STUN messages, before we get to the DTLS. Feel free to attach them here on send them to me via email if you are not comfortable sharing these publicly here.
Ok thank you , i have sent you an email directly.
What I find suspicious is that if Firefox sends out the DTLS client hello everything appears to be fine. But in the bad case the other side replies with a STUN binding response, followed immediately by a STUN binding request and an empty UDP packet (some kind of keep alive?).

I think either our ICE stack does not like the STUN request, suggested by these logs messages:

(stun/INFO) STUN-CLIENT(7Orz|IP4:172.19.1.126:54658/UDP|IP4:172.21.100.14:12002/UDP(host(IP4:172.19.1.126:54658/UDP)|candidate:1 1 UDP 33554431 172.21.100.14 12002 typ host)): Received response; processing

(ice/INFO) ICE-PEER(PC:1479323744758000 (id=2147483658 url=XXXXXXXX?hash=f6d4a026a5989ef438ceff6005d889da&co:default)/CAND-PAIR(7Orz): setting pair to state FAILED: 7Orz|IP4:172.19.1.126:54658/UDP|IP4:172.21.100.14:12002/UDP(host(IP4:172.19.1.126:54658/UDP)|candidate:1 1 UDP 33554431 172.21.100.14 12002 typ host)

Or the empty UDP message causes problems (which I currently don't see how - but you never know).

Could you please get us ICE logs from Firefox for the bad case. Here is the description how https://wiki.mozilla.org/Media/WebRTC/Logging#ICE.2FSTUN.2FTURN
I think I would like to see R_LOG_LEVEL=9 in this case. Again feel free to send directly via email if your prefer that.
Thank you_ for your return, i've sent you the traces with extra log level.
Thanks for the traces.
They show that for some yet unknown reason our ICE UDP socket get closed while we are trying to establish the connection.
Assignee: nobody → drno
Status: UNCONFIRMED → NEW
backlog: --- → webrtc/webaudio+
Rank: 25
Ever confirmed: true
Priority: -- → P2
Carvello can you please check on 'about:support' what the row with 'Multiprocess Windows' says?
well i have  my browser in french so it returns "1/1 (Activé par défaut)"


here are all the traces from about:support

Paramètres de base de l’application
-----------------------------------

Nom: Firefox
Version: 50.0
Identifiant de compilation: 20161104212021
Canal de mise à jour: release
Agent utilisateur: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0
Système d’exploitation: Windows_NT 10.0
Fenêtres multi-processus: 1/1 (Activé par défaut)
Mode sans échec: false

Rapports de plantage des 3 derniers jours
-----------------------------------------

Tous les rapports de plantage

Extensions
----------

Nom: Application Update Service Helper
Version: 1.0
Activée: true
ID: aushelper@mozilla.org

Nom: Multi-process staged rollout
Version: 1.5
Activée: true
ID: e10srollout@mozilla.org

Nom: Pocket
Version: 1.0.5
Activée: true
ID: firefox@getpocket.com

Nom: Web Compat
Version: 1.0
Activée: true
ID: webcompat@mozilla.org

Nom: CipherFox
Version: 4.1.0
Activée: false
ID: cipherfox@mkfly

Nom: CipherFox Secure
Version: 0.9.1-signed.1-signed
Activée: false
ID: Cipherfoxsec@none

Nom: Enable-Disable weak SSL ciphers
Version: withIcons.1-signed.1-signed
Activée: false
ID: jid0-Cvxju9QjyfrnsooZFieYymDBAV0@jetpack

Nom: KeepVid Pro
Version: 4.10.0
Activée: false
ID: KVAllmytube@KeepVid.com

Accélération graphique
----------------------

Fonctionnalités
Composition: Direct3D 11
Zoom/Panoramique asynchrones: entrée molette activée
Rendu WebGL: Google Inc. -- ANGLE (Intel(R) HD Graphics 530 Direct3D11 vs_5_0 ps_5_0)
Rendu WebGL2: (no info)
Décodage matériel H264: Yes; Using D3D11 API
Backend audio: wasapi
Direct2D: true
DirectWrite: true (10.0.14393.351)
GPU 1
Actif: Oui
Description: Intel(R) HD Graphics 530
ID du vendeur: 0x8086
ID du périphérique: 0x191b
Version du pilote: 20.19.15.4474
Date du pilote: 6-13-2016
Pilotes: igdumdim64 igd10iumd64 igd10iumd64 igd12umd64 igdumdim32 igd10iumd32 igd10iumd32 igd12umd32
ID du sous-système: 116d1462
RAM: Unknown
GPU 2
Actif: Non
Description: NVIDIA GeForce GTX 970M
ID du vendeur: 0x10de
ID du périphérique: 0x13d8
Version du pilote: 21.21.13.6909
Date du pilote: 8-1-2016
Pilotes: C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvd3dumx,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2umx,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2umx,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2umx C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvd3dum,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2um,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2um,C:\WINDOWS\System32\DriverStore\FileRepository\nvmiwu.inf_amd64_bd786fe53bff67f7\nvwgf2um
ID du sous-système: 116d1462
RAM: 3072
Diagnostics
AzureCanvasAccelerated: 0
AzureCanvasBackend: direct2d 1.1
AzureContentBackend: direct2d 1.1
AzureFallbackCanvasBackend: cairo
Journal des décisions
D3D9_COMPOSITING:
disabled by default: Disabled by default




Préférences modifiées importantes
---------------------------------

accessibility.typeaheadfind.flashBar: 0
browser.cache.disk.capacity: 358400
browser.cache.disk.filesystem_reported: 1
browser.cache.disk.smart_size.first_run: false
browser.cache.disk.smart_size.use_old_max: false
browser.cache.frecency_experiment: 4
browser.download.importedFromSqlite: true
browser.places.smartBookmarksVersion: 8
browser.sessionstore.upgradeBackup.latestBuildID: 20161104212021
browser.startup.homepage_override.buildID: 20161104212021
browser.startup.homepage_override.mstone: 50.0
browser.tabs.remote.autostart.2: true
browser.urlbar.daysBeforeHidingSuggestionsPrompt: 1
browser.urlbar.lastSuggestionsPromptDate: 20161118
extensions.lastAppVersion: 50.0
gfx.crash-guard.d3d11layers.appVersion: 50.0
gfx.crash-guard.d3d11layers.deviceID: 0x191b
gfx.crash-guard.d3d11layers.driverVersion: 20.19.15.4474
gfx.crash-guard.d3d11layers.feature-d2d: true
gfx.crash-guard.d3d11layers.feature-d3d11: true
gfx.crash-guard.status.d3d11layers: 2
gfx.crash-guard.status.d3d11video: 2
media.gmp-eme-adobe.abi: x86-msvc-x64
media.gmp-eme-adobe.lastUpdate: 1479304946
media.gmp-eme-adobe.version: 17
media.gmp-gmpopenh264.abi: x86-msvc-x64
media.gmp-gmpopenh264.lastUpdate: 1479304954
media.gmp-gmpopenh264.version: 1.6
media.gmp-manager.buildID: 20161104212021
media.gmp-manager.lastCheck: 1479400197
media.gmp-widevinecdm.abi: x86-msvc-x64
media.gmp-widevinecdm.lastUpdate: 1479304960
media.gmp-widevinecdm.version: 1.4.8.903
media.gmp.storage.version.observed: 1
media.hardware-video-decoding.failed: false
media.webrtc.debug.log_file: C:\Users\kepch\AppData\Local\Temp\WebRTC.log
network.cookie.prefsMigrated: true
network.predictor.cleaned-up: true
places.database.lastMaintenance: 1479306513
places.history.expiration.transient_current_max_pages: 104858
plugin.disable_full_page_plugin_for_types: application/pdf
plugin.importedState: true
security.sandbox.content.tempDirSuffix: {663c8266-4f94-4437-a717-6bb6c32b3d99}
services.sync.declinedEngines:
storage.vacuum.last.index: 1
storage.vacuum.last.places.sqlite: 1479306513
ui.osk.debug.keyboardDisplayReason: IKPOS: Touch screen not found.

Préférences verrouillées importantes
------------------------------------

Base de données « Places »
--------------------------

JavaScript
----------

Ramasse-miettes incrémentiel: true

Accessibilité
-------------

Activée: false
Empêcher l’accessibilité: 0

Versions des bibliothèques
--------------------------

NSPR
Version minimale attendue: 4.12
Version utilisée: 4.12

NSS
Version minimale attendue: 3.26.2
Version utilisée: 3.26.2

NSSSMIME
Version minimale attendue: 3.26.2
Version utilisée: 3.26.2

NSSSSL
Version minimale attendue: 3.26.2
Version utilisée: 3.26.2

NSSUTIL
Version minimale attendue: 3.26.2
Version utilisée: 3.26.2

Fonctionnalités expérimentales
------------------------------

Bac à sable
-----------

Content Process Sandbox Level: 1
Flags: needinfo?(carvello.thomas)
Keywords: testcase-wanted
Whiteboard: [needinfo 2016/11/16 to reporter]
Carvello could you please download Firefox Nightly from here https://nightly.mozilla.org/ and disable the "Enable multi-process" option on the first page of the Preferences and check if you still have the problem?

If Nightly without the multi-process should work then you download my try build from here:
https://archive.mozilla.org/pub/firefox/try-builds/drno@ohlmeier.org-a2004d1132a7b99dfe36235fac0774e285440318/try-win32/firefox-53.0a1.en-US.win32.zip
and check if it works in that one with and without the multi-process option.
i can confirm that it works without the multi-process option for nightly version.

for your try build ,do i have to remove the nighlty build or just unzip and launch the firefox.exe?

if it is the right thing to do then it works without the multi-process option but still have the ICE failed with the multi-process option.
Do you have any news for me regarding that issue?
So I'm pretty sure it is our packet filter code which causes the problem. But I still don't quite understand what is going wrong in your case.

Could I ask you to get me one more log file: could you get a NSPR log file for this
https://wiki.mozilla.org/Media/WebRTC/Logging#Signaling_.28SDP_offer.2Fanswer_handling.29
With the following NSPR_LOG_MODULES line:
  NSPR_LOG_MODULES=timestamp,signaling:5,mtransport:5,UDPSocket:5
i will send you traces right away.
Sorry I'm still not sure what exactly is going on here. I created a new build for testing purposes in which I have disabled the internal packet filter. Could please download this build https://archive.mozilla.org/pub/firefox/try-builds/drno@ohlmeier.org-466eef1d4ce00cd88241b5506309110d6bf17f52/try-win32/firefox-53.0a1.en-US.win32.zip
and let me know if this works or not?

I want to find out if the problem is caused by our multi-process networking code, or actually by the packet filter. I assume it will take a few more rounds of trying with special builds until we will have an better idea what exactly is causing the problem here.
Ok thank you for your time!

i have downloaded this build and confirm that it still doesn't work  so maybe the internal packet filter is not the issue.
any news regarding this issue?
Hi Carvello,

can I ask you one more time to test another build for me: https://archive.mozilla.org/pub/firefox/try-builds/drno@ohlmeier.org-54d01937da6f2879f140fa91d4cd657c302a190f/try-win32/firefox-53.0a1.en-US.win32.zip

This is a new theory that this is somehow linked to ICE consent, which in fact shipped in Firefox 49.
still have the ice failed issue with this version.
Any news from this issue? My customer is a little bit impatient. If we have t change something from our side , we are okay with that but we still don't know what is cuasing this issue.

thank you.
Have you any news for me?
we are still stucked with this issue and we don't know what to do.
Flags: needinfo?(drno)
Any news for me?
Attachment #8811949 - Attachment is obsolete: true
Hi Carvello,

can you please check if this build is any better: https://archive.mozilla.org/pub/firefox/try-builds/drno@ohlmeier.org-d047dc1ef645ead5283b956ebbfc983cae71b5c8/try-win32/firefox-53.0a1.en-US.win32.zip

I found that in our mutli-process code we have a received length check which could get triggered through the empty UDP packets your implementation appears to send. In the build above I commented that length check out. Another way for you to test this would be if you could make your gateway stop sending these empty UDP packets and see if that makes the problem dis-appear.
Flags: needinfo?(drno)
Seems to work!! I don't have the ice failed message and my video appears!
(In reply to Carvello Thomas from comment #38)
> Seems to work!! I don't have the ice failed message and my video appears!

Great. Patch is up for review and should hopefully get into Firefox 53.
Attachment #8823825 - Flags: review?(mcmanus) → review?(schien)
Summary: Webrtc: ICE failed , NO media since Firefox 49.0 → [e10s] Zero size UDP packets close sockets
Comment on attachment 8823825 [details]
Bug 1317946: allow receiving zero size UDP messages.

https://reviewboard.mozilla.org/r/102312/#review103282

lgtm. BTW can you add a test for it in netwerk/test/unit/test_udpsocket.js and dom/network/tests/test_udpsocket.html?
Attachment #8823825 - Flags: review?(schien) → review+
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #41)
> lgtm. BTW can you add a test for it in netwerk/test/unit/test_udpsocket.js

Done. See updated patches.

> and dom/network/tests/test_udpsocket.html?

I created a test but it just hangs. My current guess is that maybe because this actually goes through DOM the empty message actually never makes it to nsUDPSocket.cpp.

@schien: do you think getting the empty message working from mochitest/DOM level is needed?
Flags: needinfo?(schien)
Comment on attachment 8824617 [details]
Bug 1317946: allow sending of zero length buffers.

https://reviewboard.mozilla.org/r/103038/#review104042

::: netwerk/base/nsUDPSocket.cpp
(Diff revision 1)
>  NS_IMETHODIMP
>  nsUDPSocket::Send(const nsACString &aHost, uint16_t aPort,
>                    const uint8_t *aData, uint32_t aDataLength,
>                    uint32_t *_retval)
>  {
> -  NS_ENSURE_ARG(aData);

How about `if (!aData && !aDataLength) { return NS_ERROR_INVALID_ARG; }`?
Attachment #8824617 - Flags: review?(schien)
Comment on attachment 8824618 [details]
Bug 1317946: unit test for zero length UDP messages.

https://reviewboard.mozilla.org/r/103040/#review104044

lgtm
Attachment #8824618 - Flags: review?(schien) → review+
Comment on attachment 8824619 [details]
Bug 1317946: mochitest for zero size messages

https://reviewboard.mozilla.org/r/103042/#review104048

lgtm

::: dom/network/tests/test_udpsocket.html:130
(Diff revision 1)
> +    info('setting up listener');
> +    socket.addEventListener('message', function recv_callback(msg) {
> +      info('received message');
> +      socket.removeEventListener('message', recv_callback);
> +      is(msg.remotePort, socket.localPort, 'expected packet from ' + socket.localPort);
> +      //ok(is_same_buffer(msg.data, EMPTY_DATA_ARRAY_BUFFER), 'expected same buffer data');

remember to uncomment before land it.
Attachment #8824619 - Flags: review+
(In reply to Nils Ohlmeier [:drno] from comment #46)
> (In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment
> #41)
> > and dom/network/tests/test_udpsocket.html?
> 
> I created a test but it just hangs. My current guess is that maybe because
> this actually goes through DOM the empty message actually never makes it to
> nsUDPSocket.cpp.
> 
> @schien: do you think getting the empty message working from mochitest/DOM
> level is needed?

UDPSocket WebAPI is implemented based on `nsUDPSocket::SendBinaryStream`. If we want to allow nsUDPSocket::Send accepting empty array, I'll prefer to align this behavior in `SendBinaryStream`.

I guess it will take some time to implement sending 0-size stream. Feel free to file a follow-up bug if send-empty-array is blocking your other works (and add comment in IDL to note the difference between send and sendBinaryStream).
Flags: needinfo?(schien)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #50)
> UDPSocket WebAPI is implemented based on `nsUDPSocket::SendBinaryStream`. If
> we want to allow nsUDPSocket::Send accepting empty array, I'll prefer to
> align this behavior in `SendBinaryStream`.
> 
> I guess it will take some time to implement sending 0-size stream. Feel free
> to file a follow-up bug if send-empty-array is blocking your other works
> (and add comment in IDL to note the difference between send and
> sendBinaryStream).

No we currently have no dependency on being able to write 0-size message into streams. So I'll just abandon the mochitest (which I had only commit to show what I had done to try to get it working).
Attachment #8824619 - Attachment is obsolete: true
Comment on attachment 8824617 [details]
Bug 1317946: allow sending of zero length buffers.

https://reviewboard.mozilla.org/r/103038/#review104042

> How about `if (!aData && !aDataLength) { return NS_ERROR_INVALID_ARG; }`?

(!aData && !aDataLenght) would still reject the empty UDP message. But the new revision still checks for the remaining errors beside intentionally empty packets.
Comment on attachment 8824617 [details]
Bug 1317946: allow sending of zero length buffers.

https://reviewboard.mozilla.org/r/103038/#review105058

::: netwerk/base/nsUDPSocket.cpp:1192
(Diff revision 2)
>                    uint32_t *_retval)
>  {
> -  NS_ENSURE_ARG(aData);
>    NS_ENSURE_ARG_POINTER(_retval);
> +  if ((!aData && aDataLength) ||
> +      (aData && !aDataLength)) {

How about
```
if (!((aData && aDataLength > 0) ||
      (!aData && !aDataLength)))
````
, if you also think it is more readable.
Attachment #8824617 - Flags: review?(schien) → review+
Looks like Win 7 has problems receiving 0-size UDP packets. I'll have to double check on a real machine next week.
Depends on: 1331810
Since I identified that the xpcshell test on Windows in blocked by a bug in NSPR (bug 1331810) the question is now: wait for that bug to be resolved before this can be fixed and landed?

Or is there a way to disable an xpcshell test for Windows only? If that is the case we could land this with the test being disable on Windows (the bug reporter confirmed already that it fixes the actual problem on Windows) and enable only the test part once bug 1331810 is fixed.

Opinions?
Flags: needinfo?(schien)
Flags: needinfo?(mcmanus)
I'll prefer to fix NSPR bug first, otherwise we'll have different behavior/restriction on windows. This will make the user of nsUDPSocket/UDPSocketChild more confused.

Do you have any WIP patch that can fix or workaround that NSPR bug?
Flags: needinfo?(schien) → needinfo?(drno)
I'm not saying you should, but you can disable a test for windows with skip-if = os == "win" in xcpshell.ini

It doesn't sound like the nspr patch is hard, is it? Normally ted will let it land in m-c and handle upstreaming/release of nspr async so that things proceed quickly enough. NI him is a good policy.
Flags: needinfo?(mcmanus)
Yeah I provided a patch for NSPR today. Lets see how quickly Ted gets to review it.

Looking at the Windows NSPR code actually made me realize that it has more/other problems (bug 1332065), but luckily these are not blocking for this problem here.
Flags: needinfo?(drno)
Pushed by drno@ohlmeier.org:
https://hg.mozilla.org/integration/autoland/rev/c38320df0f33
allow receiving zero size UDP messages. r=schien
https://hg.mozilla.org/integration/autoland/rev/8de1cf1be14b
allow sending of zero length buffers. r=schien
https://hg.mozilla.org/integration/autoland/rev/7cc0f50b9bd5
unit test for zero length UDP messages. r=schien
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: