Sending data over an RTCDataChannel sometimes fails for an operation-specific reason
Categories
(Core :: WebRTC, defect)
Tracking
| | Tracking | Status |
|---|---|---|
| firefox146 | --- | fixed |
People
(Reporter: alex, Assigned: bwc)
References
(Blocks 1 open bug)
Details
Attachments
(11 files)
| Size | Type |
|---|---|
| 26.49 KB | text/plain |
| 5.90 KB | text/html |
| 48 bytes (9 files) | text/x-phabricator-request |
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36
Steps to reproduce:
I create two RTCPeerConnection objects in the same browser process. The initiating peer creates a datachannel, they exchange SDP offers/answers/ICE candidates until both have a connectionState of "connected".
The initiating peer closes the first datachannel and opens a second.
The receiving peer stores a reference to the second incoming datachannel to ensure it is not garbage collected and registers a "message" event listener.
The initiating peer waits for the channel's readyState to be "open" and writes some data into it.
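The initiator side of these steps can be sketched roughly as follows. This is not the reporter's code: `waitForOpen` and `reopenAndSend` are hypothetical helper names, the label `"second"` is arbitrary, and `pc` is assumed to be an `RTCPeerConnection` that has already reached `connectionState === "connected"`.

```javascript
// Hypothetical helper: resolves once `channel` reports readyState "open".
// `channel` is anything shaped like an RTCDataChannel.
function waitForOpen(channel) {
  return new Promise((resolve, reject) => {
    if (channel.readyState === "open") {
      resolve(channel);
      return;
    }
    if (channel.readyState !== "connecting") {
      reject(new Error(`channel is ${channel.readyState}`));
      return;
    }
    const cleanup = () => {
      channel.removeEventListener("open", onOpen);
      channel.removeEventListener("close", onClose);
    };
    const onOpen = () => {
      cleanup();
      resolve(channel);
    };
    const onClose = () => {
      cleanup();
      reject(new Error("channel closed before opening"));
    };
    channel.addEventListener("open", onOpen);
    channel.addEventListener("close", onClose);
  });
}

// Hypothetical helper mirroring the steps above: close the first channel,
// open a second one, wait for "open", then write into it.
async function reopenAndSend(pc, firstChannel, data) {
  firstChannel.close();
  const second = pc.createDataChannel("second"); // label is an assumption
  await waitForOpen(second);
  second.send(data); // the call that intermittently throws in this report
  return second;
}
```

Note that the first channel is closed without waiting for its "close" event, which matches the sequence described in the steps.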
Actual results:
Calling .send intermittently throws a "DOMException" with the error message "The operation failed for an operation-specific reason" and a .code of 0.
The "close" event fires on the datachannel shortly afterwards, though the peer connection retains a connectionState of "connected".
Expected results:
The .send method should not throw and the remote peer should receive the data.
The same code works in Chrome without any errors.
The connection log from about:webrtc during the test run is attached.
From what I can see, there are two peer connections: the receiver, d1ae5ae4-616c-4119-8fde-345dc93878fc, and the initiator, 97cd2ee9-b0e9-47ec-9932-5a307b33ca3a. I can't see any obvious errors in the log, though maybe I'm missing something.
Is there a way to find out what the "operation-specific" reason was?
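From script, the exception seems to expose only its name, message and code, so the most a page can do is record those next to the channel's own state at the moment of failure. A minimal sketch, assuming a hypothetical wrapper called `trySend` (the fields read from the channel are standard `RTCDataChannel` attributes):

```javascript
// Hypothetical wrapper: attempts a send and, on failure, captures what the
// page can observe. It cannot reveal the browser-internal cause.
function trySend(channel, data) {
  try {
    channel.send(data);
    return { ok: true };
  } catch (err) {
    return {
      ok: false,
      name: err.name,
      message: err.message,
      code: err.code,
      readyState: channel.readyState,
      id: channel.id,
      label: channel.label,
      bufferedAmount: channel.bufferedAmount,
    };
  }
}
```

Logging the channel `id` alongside the error is useful here because it makes it possible to see whether failures correlate with a reused id.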
Comment 1•4 months ago
The Bugbug bot thinks this bug should belong to the 'Core::WebRTC' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 2•4 months ago
Thanks for the report! That error (in 140) comes from the SendDataMsgCommon return value being anything other than 0 or EMSGSIZE. Looking at Nightly, the plumbing has shifted a bit here but I think the same logic applies. Might be worth a try in Nightly, though.
If Nightly exhibits the same behavior, I think the easiest way forward would be for you to provide a minimal example that reproduces this, which we can then debug. Please attach an HTML file that shows the issue, or link a jsfiddle or similar to the same effect.
I'm trying to isolate the code into a runnable example, but it's proving very hard to trigger the same behaviour without the WebRTC code running in the context of a larger application.
I think the problem may be that the datachannel IDs become eligible for reuse immediately after .close is called on a channel?
What seems to be happening is that one side opens a datachannel, waits for the 'open' event, writes several pieces of data into it and closes it. It then does this again immediately.
Could it be that the remote peer sends confirmation of the first closure while the local peer is in the middle of writing into the second channel (which has the same ID as the first)?
That is, if I wait for the 'close' event after closing the first channel but before opening a new channel, the "operation-specific error" seems to occur less frequently.
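Waiting for the "close" event before opening the next channel could look like the sketch below. `closeAndWait` is a hypothetical helper name; per the observation above this only makes the error less frequent, so it is a mitigation to experiment with, not a fix.

```javascript
// Hypothetical helper: calls close() and resolves once the "close" event
// has fired (or immediately if the channel is already closed).
function closeAndWait(channel) {
  return new Promise((resolve) => {
    if (channel.readyState === "closed") {
      resolve();
      return;
    }
    // Register the listener before calling close() so the event is not missed.
    channel.addEventListener("close", () => resolve(), { once: true });
    channel.close();
  });
}
```

A caller would `await closeAndWait(first)` before calling `createDataChannel` again.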
Sets up initiator and receiver peer connections.
The receiver listens for incoming datachannels. When one is opened it waits for the first message event, then echoes the received data back to the sender and closes the channel.
The initiator opens a channel, sends a message and closes the channel. It then opens a second channel, sends a message and waits for the receiver to close the channel.
It does this in a loop until an error occurs.
In Firefox Nightly 144.0a1 this runs a couple of times then the second datachannel receives a message sent to the first datachannel.
This appears to be because both the initiator and the receiver close the datachannel. If line 154 is commented out (i.e. the initiator does not close the channel) the messages always arrive at the correct datachannel, but the loop eventually grinds to a halt after 10-40k iterations when it should run forever, given that both datachannels are closed before the loop continues.
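One way to make this kind of misdelivery visible is to tag every message with the label of the channel it was written to and compare on receipt. The sketch below is not taken from the attached reproduction; `tagMessage` and `checkDelivery` are hypothetical names and the JSON envelope is an assumption.

```javascript
// Hypothetical: wrap a payload with the label of the channel it is sent on.
function tagMessage(channelLabel, payload) {
  return JSON.stringify({ channel: channelLabel, payload });
}

// Hypothetical: on receipt, compare the embedded label with the label of the
// channel whose "message" event delivered the data.
function checkDelivery(receivingLabel, raw) {
  const { channel, payload } = JSON.parse(raw);
  return {
    misdelivered: channel !== receivingLabel,
    sentOn: channel,
    receivedOn: receivingLabel,
    payload,
  };
}
```

In a "message" handler this would be called as `checkDelivery(event.target.label, event.data)`, assuming string messages and a unique label per channel.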
I've added a reproduction file. It doesn't show this problem directly (i.e. it doesn't throw a "The operation failed for an operation-specific reason" error) but it does show that sometimes datachannels will receive messages sent to other datachannels if previously the same datachannel was closed by both ends of the connection. Should I open a new bug for this?
"if previously the same datachannel was closed by both ends of the connection"
I mean if a previous datachannel was closed by both ends of the connection. Sorry for the confusion, I can't edit my comments here.
I've opened https://bugzilla.mozilla.org/show_bug.cgi?id=1988454 as I'm not sure these two problems are related.
The "operation-specific reason" error only seems to happen when the channel IDs are the same, the "wrong channel delivery" can happen with different IDs and is probably serious enough to warrant tracking it separately.
Comment 8•4 months ago
Byron, could you take a look? (both here and bug 1988454 ideally)
Comment 9 (Assignee)•4 months ago
I think I see what's happening here, but there's a spec wrinkle; selecting an already-in-use id when calling createDataChannel does not cause an error according to the spec, even though it cannot work and is clearly invalid. That's really strange, and means we can't test this fully in wpt right now. I think I can at least test that this specific bug does not occur though.
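Since the spec does not reject an in-use id, an application that passes `id` to `createDataChannel` itself would have to track which ids are still live. The sketch below is an application-side illustration only, not the Firefox change; `ChannelIdPool` is a hypothetical name, and the default upper bound of 65534 is an assumption based on 65535 not being a usable id.

```javascript
// Hypothetical application-level bookkeeping for explicitly chosen ids.
class ChannelIdPool {
  constructor(maxId = 65534) {
    this.maxId = maxId;
    this.inUse = new Set();
  }

  // Returns the lowest id not currently marked as in use.
  acquire() {
    for (let id = 0; id <= this.maxId; id++) {
      if (!this.inUse.has(id)) {
        this.inUse.add(id);
        return id;
      }
    }
    throw new Error("no free data channel id");
  }

  // Call only after the channel's "close" event has fired.
  release(id) {
    this.inUse.delete(id);
  }
}
```

A caller would pass the acquired value as `createDataChannel(label, { negotiated: true, id })` and call `release(id)` from the channel's "close" handler, so an id is never handed out again while the previous channel is still closing.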
Comment 10 (Assignee)•4 months ago
I was hoping that I could get a try run today, but there was some infra bustage that ate my pushes. Trying again...
https://treeherder.mozilla.org/jobs?repo=try&landoCommitID=153845
https://treeherder.mozilla.org/jobs?repo=try&landoCommitID=153846
Comment 11•4 months ago
The severity field is not set for this bug.
:mjf, could you have a look please?
For more information, please visit BugBot documentation.
Comment 13 (Assignee)•3 months ago
Comment 14 (Assignee)•3 months ago
Also, make sure that we don't fire close events until streams have been reset
in both directions.
Depends on D269062
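The rule in this patch description (no close event until the stream has been reset in both directions) can be pictured as a small gate. This is an illustration of the idea only and does not reflect how the Firefox patch is written; `StreamCloseGate` and its method names are hypothetical.

```javascript
// Hypothetical illustration: fire the close callback exactly once, and only
// after both the outgoing and the incoming stream reset have been observed.
class StreamCloseGate {
  constructor(onClose) {
    this.onClose = onClose;
    this.outgoingReset = false;
    this.incomingReset = false;
    this.fired = false;
  }

  markOutgoingReset() {
    this.outgoingReset = true;
    this.maybeFire();
  }

  markIncomingReset() {
    this.incomingReset = true;
    this.maybeFire();
  }

  maybeFire() {
    if (!this.fired && this.outgoingReset && this.incomingReset) {
      this.fired = true;
      this.onClose();
    }
  }
}
```

With a gate like this, a page that reacts to "close" by opening a new channel cannot do so while one direction of the old stream is still live.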
Comment 15 (Assignee)•3 months ago
Depends on D269063
Comment 16 (Assignee)•3 months ago
Add some missing test case cleanup, mark a test as long, use promise_test instead of async_test in one place.
Depends on D269064
Comment 17 (Assignee)•3 months ago
Depends on D269065
Comment 18 (Assignee)•3 months ago
This helps ensure that these runnables (and all of their lambda captures)
aren't leaked during shutdown.
Depends on D269066
Comment 19 (Assignee)•3 months ago
Mostly this is logging crucial lifecycle events at INFO, not DEBUG.
Depends on D269067
Comment 20 (Assignee)•3 months ago
Depends on D269068
Comment 21 (Assignee)•3 months ago
Depends on D269069
Comment 22•3 months ago
Comment 24 (bugherder)•3 months ago
https://hg.mozilla.org/mozilla-central/rev/2c7b813bc96c
https://hg.mozilla.org/mozilla-central/rev/eac802dc63b4
https://hg.mozilla.org/mozilla-central/rev/bfacca5f1eec
https://hg.mozilla.org/mozilla-central/rev/e686cc99b7e5
https://hg.mozilla.org/mozilla-central/rev/738a2847ad05
https://hg.mozilla.org/mozilla-central/rev/ce0c3db06950
https://hg.mozilla.org/mozilla-central/rev/d48fbf91af28
https://hg.mozilla.org/mozilla-central/rev/eaae699d5b93
https://hg.mozilla.org/mozilla-central/rev/87544e60cda8
Updated•2 months ago