Closed Bug 1118398 Opened 5 years ago Closed 5 years ago

Datachannels don't fire onclose event always

Categories

(Core :: WebRTC: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox37 --- fixed
firefox38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: drno, Assigned: mt)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-needed)

Attachments

(2 files, 2 obsolete files)

As Byron pointed out in bug 1112382 we had since a long time a wrong check if the data channels have been closed properly or not.

After fixing that one line it turns out that the same test case (test_dataChannels_basicAudioVideo.html) works fine on regular desktop runs, but in case of IPC calling datachannel.close() only moves the datachannel into the closing state where it sit and waits (for ever?). This results in the test case timeout to kick in and close the data channel from the far, which then results in the local data channel to finally switch to the closed state.
Blocks: 1115769
Blocks: 1118283
I'm seeing this every time now on my local machine.  The remote onclose fires, but the local onclose never does.
Note that this was originally limited to IPC, that is not the case here.
Summary: IPC Datachannels never switch from closing to closed → Datachannels don't fire onclose event always
This fixes the issue of missing onclose events for me when I execute:
./mach mochitest dom/media/tests/mochitest/ipc/
Attachment #8546831 - Flags: feedback?(tuexen)
Comment on attachment 8546831 [details] [diff] [review]
bug_1118398__fire_onclose_for_closing_datachannels.patch

Review of attachment 8546831 [details] [diff] [review]:
-----------------------------------------------------------------

I assume that you want to remove all this logging stuff.

::: dom/media/tests/mochitest/ipc/ipc.json
@@ -1,4 @@
>  {
>  "runtests":{
> -    "dom/media/tests/mochitest/test_getUserMedia_basicVideoAudio.html":"included",
> -    "dom/media/tests/mochitest/test_peerConnection_basicAudioVideo.html":"included",

Revert this too.

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +1772,5 @@
>                          channel->mState == DataChannel::WAITING_TO_OPEN);
>            if (channel->mState == DataChannel::OPEN ||
>                channel->mState == DataChannel::WAITING_TO_OPEN) {
>              ResetOutgoingStream(channel->mStream);
> +            //SendOutgoingStreamReset();

Since this is called below, only conditional on there being outstanding resets, I think that this is a good change.

@@ +1776,5 @@
> +            //SendOutgoingStreamReset();
> +          }
> +          if (channel->mState == DataChannel::OPEN ||
> +              channel->mState == DataChannel::WAITING_TO_OPEN ||
> +              channel->mState == DataChannel::CLOSING) {

if (channel->mState != DataChannel::CONNECTING) {

Though thinking about this more, I don't think that this needs to be conditional at all.  I'm not confident about the protocol invariants though: maybe this would risk the event being dispatched from other places in the code as well.
Yeah this is patch is not meant to be landing. I left the ton of debug messages in there so it would match up with this log:

331366400[10cf1d5d0]: Connection 120423ac0: n = 1
331366400[10cf1d5d0]: Incoming: Channel 1 on connection 120423ac0 closed, state 1
331366400[10cf1d5d0]: Connection 120423ac0: Resetting outgoing stream 1
331366400[10cf1d5d0]: Disconnected DataChannel 1243d0560 from connection 120423ac0
331366400[10cf1d5d0]: Destroying Data channel 1
331366400[10cf1d5d0]: Connection 120423ac0: Sending 1 pending resets
331366400[10cf1d5d0]: Connection 120423ac0: Sending outgoing stream reset for 1 streams
1993302784[10cf1c070]: 1255aac80(1243d0560): OnChannelClosed - Dispatching
530 INFO DataChannelWrapper (pcRemote_channel_0): 'onclose' event fired
531 INFO TEST-PASS | dom/media/tests/mochitest/ipc/test_ipc.html | test_dataChannel_basicAudioVideo.html | remoteChannel is in state 'closed' 
532 INFO Still waiting for DataChannel closure
331366400[10cf1d5d0]: Connection 10cf8b2e0: n = 1
331366400[10cf1d5d0]: Incoming: Channel 1 on connection 10cf8b2e0 closed, state 2
331366400[10cf1d5d0]: Disconnected DataChannel 120a696a0 from connection 10cf8b2e0
331366400[10cf1d5d0]: Destroying Data channel 1
331366400[10cf1d5d0]: Connection 10cf8b2e0: Sending 2 pending resets
331366400[10cf1d5d0]: Connection 10cf8b2e0: Sending outgoing stream reset for 2 streams
331366400[10cf1d5d0]: ***failed: setsockopt RESET, errno 37
331366400[10cf1d5d0]: Don't worry it was just an EALREADY
1993302784[10cf1c070]: 120483000(120a696a0): OnChannelClosed - Dispatching
331366400[10cf1d5d0]: Connection 10cf8b2e0: n = 1
331366400[10cf1d5d0]: Connection 10cf8b2e0: not a reset incoming ssn
331366400[10cf1d5d0]: Connection 10cf8b2e0: Sending 2 pending resets
331366400[10cf1d5d0]: Connection 10cf8b2e0: Sending outgoing stream reset for 2 streams

So what worries me is that the SendOutgoingStreamReset() call at the end of HandleStreamResetEvent()
https://dxr.mozilla.org/mozilla-central/source/netwerk/sctp/datachannel/DataChannel.cpp#1790
always seem to get the EALREADY error.
And only because HandleStreamResetEvent() seems to get called with another reset, but no reset_incoming_ssn this time, the pending outgoing resets get successfully flushed. Is that second reset message the confirmation for the first outgoing reset (not shown in this log) which then unblocks the setsockopt?
Flags: needinfo?(tuexen)
Attached file MozReview Request: bz://1118398/mt (obsolete) —
Attachment #8546857 - Flags: review?(rjesup)
/r/2317 - Bug 1118398 - Dispatch data channel onclose unconditionally on reset

Pull down this commit:

hg pull review -r 75793576bba525bed4df766be4b4e49f1b20759f
Attachment #8546857 - Flags: review?(rjesup) → review+
Should the onclose event only be fired if the peer called close() or is it also fired if close() was called
locally? If it is only called when the peer initiates the closing, that the patch look right to me.
Flags: needinfo?(tuexen)
(In reply to Michael Tüxen from comment #9)
> Should the onclose event only be fired if the peer called close() or is it
> also fired if close() was called
> locally? If it is only called when the peer initiates the closing, that the
> patch look right to me.

Well, the consequence of the change is that the side that calls close() gets the onclose event.  That is what the spec says should happen (and that is a general thing across web APIs, even though you initiate an action, you still get an event for the corresponding state change).

The problem previously was that the event was being suppressed if you called close(), since the readyState was "closing" then.

Now I looked again, and it seems like there might be some more funny business going on here.  Reading RFC 6525, there are two types of message, but we are only paying calling onclose when the peer asks us to stop sending.  If we issue the outgoing reset, is that triggering a reciprocal incoming reset for the same stream?  Otherwise, I don't understand how this is working.

The Re-configuration Response should be all that we see in response to an outgoing SSN reset, but that won't trigger onclose.
(In reply to Martin Thomson [:mt] from comment #11)
> Now I looked again, and it seems like there might be some more funny
> business going on here.  Reading RFC 6525, there are two types of message,
> but we are only paying calling onclose when the peer asks us to stop
> sending.  If we issue the outgoing reset, is that triggering a reciprocal
> incoming reset for the same stream?  Otherwise, I don't understand how this
> is working.

My understanding is that this here:
https://dxr.mozilla.org/mozilla-central/source/netwerk/sctp/datachannel/DataChannel.cpp#1768
Basically takes the Reset from the peer for the incoming stream and sends out the matching reset for your own outgoing stream.

What I don't understand is where the outgoing reset gets "converted" into the incoming reset. Because so far I see code for handling incoming resets and code for sending outgoing resets. But the matching vice versa code does not seem to be there. Is the SCTP stack swapping the direction for us so we don't have to mess with that?
 
> The Re-configuration Response should be all that we see in response to an
> outgoing SSN reset, but that won't trigger onclose.

If I understand it correctly the confirmation for one close can be delivered in the reset request for your own outgoing reset request.
(In reply to Martin Thomson [:mt] from comment #11)
> (In reply to Michael Tüxen from comment #9)
> > Should the onclose event only be fired if the peer called close() or is it
> > also fired if close() was called
> > locally? If it is only called when the peer initiates the closing, that the
> > patch look right to me.
> 
> Well, the consequence of the change is that the side that calls close() gets
> the onclose event.  That is what the spec says should happen (and that is a
> general thing across web APIs, even though you initiate an action, you still
> get an event for the corresponding state change).
> 
> The problem previously was that the event was being suppressed if you called
> close(), since the readyState was "closing" then.
> 
> Now I looked again, and it seems like there might be some more funny
> business going on here.  Reading RFC 6525, there are two types of message,
> but we are only paying calling onclose when the peer asks us to stop
> sending.  If we issue the outgoing reset, is that triggering a reciprocal
> incoming reset for the same stream?  Otherwise, I don't understand how this
> is working.
> 
> The Re-configuration Response should be all that we see in response to an
> outgoing SSN reset, but that won't trigger onclose.

I would like to understand first, what happens at the JS layer. Then we can
see how to get that from the events provided by the SCTP layer.

Do I understand you correctly, that the onclose() event is fired on both sides
of the data channel?

Is is correct, that no onmessage() event is fired after an onclose() event got
fired?

Is the JS app expected to be able to call successfully the send() method
anytime before it called the close() method or before the onclose() method
has fired?

In other words: What is the semantic of the onclose() event? How does it
relate to sending and receiving of messages?

From reading the spec this is unclear to me...
Comment on attachment 8546831 [details] [diff] [review]
bug_1118398__fire_onclose_for_closing_datachannels.patch

Review of attachment 8546831 [details] [diff] [review]:
-----------------------------------------------------------------

I would like to understand the semantics of the onclose event before implementing it.

Assuming both sides get the onclose() event and also assuming that it is an indication
that you will not receive data anymore (so no more onmessage() events) and that you are
not allowed to send anymore.

So is the following correct:

Side A calls the close() method. From now on, calling the send method will fail. However,
messages can still be received. The outgoing stream will be reset. Once we get the incoming
stream reset request, we will fire onclose().

Side B receives the request for the peer to reset it's incoming stream. It will fire the
onclose() event and no more send() calls are possible. It will send a reset request for
the corresponding outgoing stream.

Is this semantic correct?

The patch contains a lot of changes related to debugging.
Attachment #8546831 - Flags: feedback?(tuexen) → feedback-
(In reply to Michael Tüxen from comment #14)
> Comment on attachment 8546831 [details] [diff] [review]
> bug_1118398__fire_onclose_for_closing_datachannels.patch
> 
> Review of attachment 8546831 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I would like to understand the semantics of the onclose event before
> implementing it.
> 
> Assuming both sides get the onclose() event and also assuming that it is an
> indication
> that you will not receive data anymore (so no more onmessage() events) and
> that you are
> not allowed to send anymore.

You're not allowed to send() once the closing procedure has started.  onmessage is badly under-specified; it's unclear when you can receive onmessage.  Nothing states it can't be received in the closed state, for example.

> 
> So is the following correct:
> 
> Side A calls the close() method. From now on, calling the send method will
> fail. However,
> messages can still be received.

It appears so, though it's not specified.

> The outgoing stream will be reset. Once we get the incoming
> stream reset request, we will fire onclose().

That would be consistent with the above assertion.  If close() is speced to stop onmessage, then we could transition to 'closed' immediately and fire onclose.  However, right now it appears that onmessage in the closing state would be valid, so waiting for the mirrored reset from the peer makes sense.
 
> Side B receives the request for the peer to reset it's incoming stream. It
> will fire the
> onclose() event and no more send() calls are possible. It will send a reset
> request for
> the corresponding outgoing stream.
> 
> Is this semantic correct?

This is consistent with the assumption that onmessage is valid until we transition to closed and fire onclose.

> The patch contains a lot of changes related to debugging.
I think that I can summarize this issue more succinctly:

_If a channel transitions to the closed state for any reason, then the onclose event MUST be fired._

The points about what can be sent and received in various states is relevant, and Randell has it right: if you ask to close, you should not be able to send.  Until this is acknowledged, you should expect to be able to receive messages.

I don't think that we should be asking for the suppression of onmessage in the closing state.  However, there are corner cases: what about a message that is in transit when a close request appears (either from JS or from the remote peer).  Do we wait until the message is sent?  But then, what if the packet is lost?  Do we wait until the message is acknowledged (if that is the case)?

I don't think that this is a hugely critical detail, but my expectation would be that a message sent on a reliable stream would be delivered if the state of the stream was not closed at the point that it was sent.  The -data-channel draft implies that RFC6525 guarantees that this is what happens, but my brief reading of 6525 couldn't confirm it.

>  [RFC6525] also guarantees that all the messages are delivered (or
>  abandoned) before the stream is reset.
Blocks: 1119593
Michael, what do you think about attachment 8546857 [details] in light of comment 16?  That certainly seems to make the onclose fire appropriately.  Empirical evidence being the only evidence that matters when it comes to tests.  I want to land something that will allow tests to pass; and even if this isn't the *right* change, it's certainly an improvement.

BTW, :jesup, we probably need to submit a PR on the spec to get those holes patched.
Attachment #8546857 - Flags: feedback?(tuexen)
Attachment #8546831 - Attachment is obsolete: true
Comment on attachment 8546857 [details]
MozReview Request: bz://1118398/mt

Looks good.
Attachment #8546857 - Flags: feedback?(tuexen) → feedback+
(In reply to Martin Thomson [:mt] from comment #17)
> Michael, what do you think about attachment 8546857 [details] in light of
> comment 16?  That certainly seems to make the onclose fire appropriately. 
> Empirical evidence being the only evidence that matters when it comes to
> tests.  I want to land something that will allow tests to pass; and even if
> this isn't the *right* change, it's certainly an improvement.
I agree.
> 
> BTW, :jesup, we probably need to submit a PR on the spec to get those holes
> patched.
That would be very good.
(In reply to Martin Thomson [:mt] from comment #16)
> I think that I can summarize this issue more succinctly:
> 
> _If a channel transitions to the closed state for any reason, then the
> onclose event MUST be fired._
> 
> The points about what can be sent and received in various states is
> relevant, and Randell has it right: if you ask to close, you should not be
> able to send.  Until this is acknowledged, you should expect to be able to
> receive messages.
> 
> I don't think that we should be asking for the suppression of onmessage in
> the closing state.  However, there are corner cases: what about a message
> that is in transit when a close request appears (either from JS or from the
> remote peer).  Do we wait until the message is sent?  But then, what if the
> packet is lost?  Do we wait until the message is acknowledged (if that is
> the case)?
> 
> I don't think that this is a hugely critical detail, but my expectation
> would be that a message sent on a reliable stream would be delivered if the
> state of the stream was not closed at the point that it was sent.  The
> -data-channel draft implies that RFC6525 guarantees that this is what
> happens, but my brief reading of 6525 couldn't confirm it.
At least this is what is intended... On the sender side you stop assigning new
SSNs for a stream, which is reset, as described in A1 of
https://tools.ietf.org/html/rfc6525#section-5.1.2
On the receiver side you must buffer to deliver them correctly as described in E2 of
https://tools.ietf.org/html/rfc6525#section-5.2.2

The corresponding notifications should be provided in a way the the reset notification
is indication that the SSN starts from zero again.

I do know that the SCTP implementation has a bug related to this and I hope to fix it soon...

Best regards
Michael
> 
> >  [RFC6525] also guarantees that all the messages are delivered (or
> >  abandoned) before the stream is reset.
Keywords: checkin-needed
The last Try run I see in this bug from comment 10 shows test failures, no?
Keywords: checkin-needed
Quite right; my bad.  I think that I will have to land this coincident with the test fixes in bug 1119593.
Landed coincident with the test fixes this needs from bug 1119593:

https://hg.mozilla.org/integration/mozilla-inbound/rev/5970fced4222
https://hg.mozilla.org/mozilla-central/rev/47f17e4219ea
Assignee: nobody → martin.thomson
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
We're going to need this (and supporting patches) on Aurora as well. Martin, can you please nominate them for aurora approval when you're reasonably confident that they're baked enough to do so? :)
Flags: needinfo?(martin.thomson)
Happy to do so.  This turns the tests perma-orange without bug 1119593 as well, but that should also remove a swathe of other infrequent oranges that I discovered while doing the work.  I'll flag both together and let you manage the landing timeframe.  You probably have a better view of whether there are more or less failures now, if there is any info at all.
Flags: needinfo?(martin.thomson)
Approval Request Comment
[Feature/regressing bug #]: 1118283
[User impact if declined]: none, tests only
[Describe test coverage new/current, TreeHerder]: lots
[Risks and why]: this removes a persistent orange, but will perma-orange if not taken with bug 1119593
[String/UUID change made/needed]: none

As landed.  Don't forget bug 1119593 patches (all 11 of them) at the same time.  Best to land -r 47f17e4219ea::7f03823a577e from inbound instead I think.
Attachment #8556879 - Flags: review+
Attachment #8556879 - Flags: approval-mozilla-aurora?
Comment on attachment 8556879 [details] [diff] [review]
01 Bug_1118398___Dispatch_data_channel_onclose_unconditionally_on_reset__r_jesup.patch

Small fix for a test failure. Aurora+
Attachment #8556879 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Flags: in-testsuite+
Attachment #8546857 - Attachment is obsolete: true
Attachment #8619051 - Flags: review+
I need to check to see what shipped versions of Firefox had this issue, if any, and note them in developer documentation. Marking as doc update needed.
Keywords: dev-doc-needed
Blocks: 1050930
You need to log in before you can comment on or make changes to this bug.