Closed
Bug 842283
Opened 12 years ago
Closed 12 years ago
data over reliable data channel drops
Categories
(Core :: WebRTC: Networking, defect, P2)
RESOLVED
DUPLICATE
of bug 896228
People
(Reporter: shacharz, Assigned: jesup)
Details
(Whiteboard: [webrtc][blocking-webrtc-])
Attachments
(1 file)
3.76 MB, application/octet-stream
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.84 Safari/537.22
Steps to reproduce:
Created a test page: sharefest.peer5.com
A reliable dataChannel is created there; peer1 then asks peer2 for chunks of the file that peer2 has, and peer2 replies by sending those chunks of data.
Actual results:
Sometimes those chunks of data aren't received.
Expected results:
Chunks of data sent over a reliable channel should "always" be received.
("Always" means under reasonable conditions; of course, if the connection to the other peer is lost, then there could be packet loss.)
Updated•12 years ago
Component: Untriaged → WebRTC: Networking
Product: Firefox → Core
QA Contact: jsmith
Version: 21 Branch → Trunk
Comment 1•12 years ago (Assignee)
shachar - can you verify this is seen in a build made after but 837103 was fixed? It would have caused these exact symptoms. It landed in m-c on Feb 6, and would have been in the Feb 7 nightly. Changeset https://hg.mozilla.org/mozilla-central/rev/0383bb82c925
Flags: needinfo?(shacharz)
Comment 2•12 years ago (Assignee)
bug 837103 of course...
Comment 3•12 years ago (Reporter)
Yes, and although it is more likely to happen between 2 computers and over wireless, it sometimes happens on localhost.
Flags: needinfo?(shacharz)
Comment 4•12 years ago (Assignee)
Ok, thanks.
Can you try a debug build, and set NSPR_LOG_MODULES=datachannel:5,sctp:5 and NSPR_LOG_FILE=whatever and then attach the output here?
Assignee: nobody → rjesup
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P2
Whiteboard: [webrtc][blocking-webrtc+]
Comment 5•12 years ago
Shachar,
can you be a bit more specific about how to reproduce the problem? Just try one file?
Which size? Anything else?
Best regards
Michael
Comment 6•12 years ago (Reporter)
Michael: if you're using sharefest, then yes, share 1 file (e.g. 20MB) and open a new tab with the dynamic link that is created. In the receiver's console you'll see "expire..", which happens when a chunk doesn't reach its destination within 1 second.
Comment 7•12 years ago
Shachar: I can reproduce the problem you are describing. However, it doesn't seem to be related to unreliable transfer. The Wireshark trace shows no indication that PR-SCTP is being used.
What I see is the usage of two datachannels, both transferring strings, not binary data. On one data channel you are transferring small strings, about 400 bytes. After the transmission on the other data channel stops, these messages are sent every 2 seconds. So do you have an app timer running? The other datachannel transfers messages of size about 6800 bytes. After some number of messages (2730 in my case), no further messages are sent on that data channel. So something is missing. The SCTP trace looks fine, so I think the problem is not within the SCTP stack.
I do see datachannel log entries like:
DataChannelOnMessageAvailable (5) with null Listener!
I have to look up what this means... And need to double check the code path for DOMSTRING.
Comment 8•12 years ago
Shachar: BTW: Are you closing data channels?
Best regards
Michael
Comment 9•12 years ago (Reporter)
(In reply to Michael Tüxen from comment #7)
> Shachar: I can reproduce the problem you are describing. However, it doesn't
> seem to be related to unreliable transfer. The Wireshark trace shows no indication
> that PR-SCTP is being used.
> What I see is the usage of two datachannels, both transferring strings, not
> binary data. On one data channel you are transferring small strings, about
> 400 bytes. After the transmission on the other data channel stops, these
> messages are sent every 2 seconds. So do you have an app timer running? The
> other datachannel transfers messages of size about 6800 bytes. After some
> number of messages (2730 in my case),
> no further messages are sent on that data channel. So something is missing.
> The SCTP trace looks fine, so I think the problem is not within the SCTP
> stack.
>
> I do see datachannel log entries like:
> DataChannelOnMessageAvailable (5) with null Listener!
> I have to look up what this means... And need to double check the code path
> for DOMSTRING.
I guess you're seeing 2 data channels because you're running both tabs on the same computer, so 1 for each.
1: "the receiver" is sending chunk requests - probably the small messages
2: "the sender" is sending the chunk data - probably the larger ones
The app doesn't exactly run a timer. It requests more chunks once the earlier chunks have arrived, and if they don't arrive after a while (currently 1 second) a timer "expires" the chunks.
(In reply to Michael Tüxen from comment #8)
> Shachar: BTW: Are you closing data channels?
>
> Best regards
> Michael
the data channels aren't closed currently
I'm not sure if this is the same bug, but I could reproduce the phenomenon between 2 win7-64bit computers (over wireless) with file size > 50KB in this demo:
http://masweb.ics.es.osaka-u.ac.jp/~k-nkgwj/webrtc/test/multihost-datachannel/
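The request-and-expire behaviour described in this comment could be sketched roughly as below. This is only an illustration, not the sharefest code: the class name ChunkRequests, the onExpire callback and the use of a Map are assumptions, and only the 1 second timeout comes from the description above.

```javascript
// Hypothetical sketch: ChunkRequests and onExpire are illustrative names and
// are not taken from the sharefest source.
class ChunkRequests {
  constructor(timeoutMs, onExpire) {
    this.timeoutMs = timeoutMs; // 1000 ms in the description above
    this.onExpire = onExpire;   // called with the id of a chunk that never arrived
    this.pending = new Map();   // chunk id -> timer handle
  }

  // Record that a chunk was requested and start its expiry timer.
  request(id) {
    const timer = setTimeout(() => {
      this.pending.delete(id);
      this.onExpire(id);
    }, this.timeoutMs);
    this.pending.set(id, timer);
  }

  // Call when a chunk arrives; returns false for unknown or already expired ids.
  received(id) {
    const timer = this.pending.get(id);
    if (timer === undefined) return false;
    clearTimeout(timer);
    this.pending.delete(id);
    return true;
  }
}
```

With something like this, an "expire.." log line would correspond to onExpire firing for a chunk that was requested but never passed to received().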
Comment 10•12 years ago
(In reply to Shachar from comment #9)
> (In reply to Michael Tüxen from comment #7)
> > Shachar: I can reproduce the problem you are describing. However, it doesn't
> > seem to be related to unreliable transfer. The Wireshark trace shows no indication
> > that PR-SCTP is being used.
> > What I see is the usage of two datachannels, both transferring strings, not
> > binary data. On one data channel you are transferring small strings, about
> > 400 bytes. After the transmission on the other data channel stops, these
> > messages are sent every 2 seconds. So do you have an app timer running? The
> > other datachannel transfers messages of size about 6800 bytes. After some
> > number of messages (2730 in my case),
> > no further messages are sent on that data channel. So something is missing.
> > The SCTP trace looks fine, so I think the problem is not within the SCTP
> > stack.
> >
> > I do see datachannel log entries like:
> > DataChannelOnMessageAvailable (5) with null Listener!
> > I have to look up what this means... And need to double check the code path
> > for DOMSTRING.
> I guess you're seeing 2 data channels, because you're running both tabs in
> the same computer, so 1 for each.
> 1: "the receiver" is sending chunk requests - probably the small messages
> 2: "the sender" is sending the chunk data - probably the larger ones
Correct. Thanks for the clarification. The tracefile shows this, I can
understand this now.
>
> the app doesn't exactly run a timer, it requests more chunks once the
> earlier chunks arrived, and if they don't arrive after a while (1 second
> currently) there's a timer to "Expire" the chunks
>
> (In reply to Michael Tüxen from comment #8)
> > Shachar: BTW: Are you closing data channels?
> >
> > Best regards
> > Michael
> the data channels aren't closed currently
Great! I know that we have a bug related to closing...
>
> I'm not sure if this is the same bug, but I could reproduce the phenomenon
> between 2 win7-64bit computers (over wireless) with file size > 50KB in this
> demo:
> http://masweb.ics.es.osaka-u.ac.jp/~k-nkgwj/webrtc/test/multihost-datachannel/
Just to double check: If (for whatever reason) the small chunk requests weren't received anymore by your application, you wouldn't send the large ones anymore, right? My current guess is that somehow messages received by the SCTP stack aren't delivered to the JS application anymore at some point. Not sure. Just a guess, but this would make sense from looking at the Wireshark tracefile and your explanations.
Best regards
Michael
Comment 11•12 years ago (Reporter)
(In reply to Michael Tüxen from comment #10)
> Just to double check: If (for whatever reason) the small chunk request
> wouldn't
> be received anymore by your application, you wouldn't send the large ones
> anymore,
> right?
that's correct
Comment 12•12 years ago
OK, the logfile shows:
1961159008[100469660]: DataChannelOnMessageAvailable (5) with null Listener!
This means that during the transfer, mChannel->mListener becomes NULL. I don't know why. But once that happens, no messages will be delivered to the JS layer, which produces the behavior you are seeing.
Randell: Any idea why mListener becomes NULL?
Comment 13•12 years ago (Assignee)
mListener should be set to NULL only if the DOM object went away and was garbage-collected. Assign the DataChannel to a var so that a reference to it is kept. There's an open bug to implement the WebSockets behavior, where the object is not GC'd while there's still an active listener on it.
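A minimal sketch of the suggestion above, assuming the page simply needs to hold a strong reference to each DataChannel for as long as it is open. The helper name retainChannel and the liveChannels set are hypothetical; the only thing the channel is assumed to provide is the standard addEventListener('close', ...).

```javascript
// Hypothetical helper: keeps a strong reference to every channel until it closes,
// so the DOM object cannot be garbage-collected while it is still in use.
const liveChannels = new Set();

function retainChannel(channel) {
  liveChannels.add(channel);
  // Drop the reference once the channel has closed so it can be collected.
  channel.addEventListener('close', () => liveChannels.delete(channel));
  return channel;
}

// Usage (in a page with an RTCPeerConnection `pc`):
//   const dc = retainChannel(pc.createDataChannel('chunks'));
//   pc.ondatachannel = (event) => retainChannel(event.channel);
```

Holding the channels in a module-level set, rather than in a local variable that goes out of scope, is one way to make sure unused channels created in parallel are also kept alive.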
Comment 14•12 years ago
Shachar: Can you try what Randell suggested and report if the behavior changes?
If it does, could you also retry
http://masweb.ics.es.osaka-u.ac.jp/~k-nkgwj/webrtc/test/multihost-datachannel/
Thanks a lot!
Best regards
Michael
Comment 15•12 years ago
Shachar: Retesting today with sharefest showed no problem in contrast to testing before. Did you change anything on the JS side?
Best regards
Michael
Comment 16•12 years ago (Reporter)
Yeah, I can't reproduce it on localhost anymore either.
There are race conditions where more than one dataChannel is created, so although the other DCs are not used, I saved a pointer to them (maybe that's what solved it?). I can still reproduce the problem between 2 different computers (tried on wireless).
(In reply to Michael Tüxen from comment #15)
> Shachar: Retesting today with sharefest showed no problem in contrast to
> testing before. Did you change anything on the JS side?
>
> Best regards
> Michael
Comment 17•12 years ago
Did some testing between Mac OS X, Windows 7 and Linux Ubuntu with Firefox Nightly.
I copied a 100MB file over WLAN from each platform and it worked fine.
Any idea what I can do to reproduce it?
Best regards
Michael
Comment 18•12 years ago
Shachar, can you provide steps to reproduce? Thanks.
Flags: needinfo?(shacharz)
Comment 19•12 years ago (Reporter)
Sorry for taking so long.
Try using a 90MB file (there's a 100MB or so limitation on sharefest right now) between 2 computers, win7 64bit to win7 64bit, both over wireless (I can reproduce it with both computers on the same wifi).
This scenario leads to data drops on both demo pages mentioned above.
(In reply to Maire Reavy [:mreavy] from comment #18)
> Shachar, can you provide steps to reproduce? Thanks.
Flags: needinfo?(shacharz)
Comment 20•12 years ago
Can you provide the logging as described in comment 4?
That would allow us to figure out what is going on...
Best regards
Michael
Comment 21•12 years ago
I just tested transferring a 100MB file over the WLAN at the IETF and it worked fine. However, this was done between two Mac OS X machines, not Windows...
Best regards
Michael
Comment 22•12 years ago (Reporter)
Attachment of the logs from the following scenario:
Using the sharefest.peer5.com application
Sender (win7 64bit, wired connection) sends a 5MB file to
Receiver (win7 64bit, wireless connection)
Updated•12 years ago
Whiteboard: [webrtc][blocking-webrtc+] → [webrtc][blocking-webrtc-]
Comment 23•12 years ago
I have looked at the traces. At some point in time, both sides stop receiving any SCTP packets. It seems like the receive threads are somehow blocked or packets are not received anymore. This happens on both sides! However, there is no indication why this happens.
This problem is strange, since I can't reproduce it (I really tried hard); I even used a Windows VM. So there must be something specific to your test setup that I didn't have. No idea what it could be... Do you have NAT boxes between the sender and receiver?
Comment 24•12 years ago (Assignee)
Is there any chance a NAT rebooted, or an IP address changed at one end, or one end lost connectivity, etc?
Comment 25•12 years ago (Reporter)
Don't think so,
It's 2 computers connected to the same router, 1 wired and 1 wireless.
Comment 26•12 years ago (Reporter)
Ok, I seem to have narrowed down the problem, and I got it to work now (updated in sharefest.me, so it'll be harder to reproduce there).
Scenario 1 (works):
receiver: request 1 chunk of data
sender: dc.send() the requested chunk
...
and so on until the receiver has the entire file.
Scenario 2 (doesn't work):
receiver: request 100 chunks of data
sender: dc.send() the requested chunks of data one after the other.
...
and so on until the receiver has the entire file.
In scenario 2 the connection immediately drops and stops sending.
(only reproduced when the sender is Windows 7; couldn't reproduce when the sender is OSX)
Also, it happens with both reliable and unreliable DCs.
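One possible way to avoid handing the channel 100 chunks at once in scenario 2 is to pace the sender on the channel's bufferedAmount. This is a rough sketch under assumptions, not a confirmed fix for this bug: sendPaced and HIGH_WATER_MARK are made-up names, the 1 MiB limit is arbitrary, and bufferedAmountLowThreshold / onbufferedamountlow come from the current WebRTC API and may not exist in the builds discussed here.

```javascript
// Hypothetical sketch: sendPaced and HIGH_WATER_MARK are illustrative names,
// and the 1 MiB limit is an arbitrary assumption, not a value from this bug.
const HIGH_WATER_MARK = 1024 * 1024;

// Sends queued chunks until the channel's send buffer reaches the limit.
// Returns the number of chunks sent; sent chunks are removed from `queue`.
function sendPaced(dc, queue, limit = HIGH_WATER_MARK) {
  let sent = 0;
  while (queue.length > 0 && dc.bufferedAmount < limit) {
    dc.send(queue.shift());
    sent += 1;
  }
  return sent;
}

// Usage (in a page): call once, then again each time the buffer drains.
//   dc.bufferedAmountLowThreshold = HIGH_WATER_MARK / 2;
//   dc.onbufferedamountlow = () => sendPaced(dc, queue);
//   sendPaced(dc, queue);
```

The sender still answers a request for many chunks, but only as fast as the channel reports that its buffer has room, which sits somewhere between scenario 1 and scenario 2.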
Comment 27•12 years ago (Assignee)
Aha. Please get a log with datachannel:5,sctp:5 !!
I suspect a socket-buffer-overflow is aborting the association on windows
Comment 28•12 years ago (Reporter)
I thought that's what I did (in the attachment)
(In reply to Randell Jesup [:jesup] from comment #27)
> Aha. Please get a log with datachannel:5,sctp:5 !!
>
> I suspect a socket-buffer-overflow is aborting the association on windows
Comment 29•12 years ago (Assignee)
I believe we have fixed this now, and the fix will be in FF 23. Please verify if possible. Thanks!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE