Closed Bug 886417 Opened 7 years ago Closed 7 years ago

Intermittent test_dataChannel_basicAudio.html,test_dataChannel_basicVideo.html,test_dataChannel_basicAudioVideo.html,test_dataChannel_basicAudioVideoCombined.html | Channel is in state: 'closed' - got open, expected closed

Categories

(Core :: WebRTC, defect)

24 Branch
ARM
Android
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla25
Tracking Status
firefox23 --- fixed
firefox24 --- fixed
firefox25 --- fixed

People

(Reporter: whimboo, Assigned: jesup)

References

()

Details

(Keywords: intermittent-failure, Whiteboard: [webrtc][blocking-webrtc-])

Attachments

(1 file, 1 obsolete file)

After I pushed my patch for the datachannel framework to inbound, we hit a single failure so far on Android when closing the second datachannel at the end of the test.

https://tbpl.mozilla.org/php/getParsedLog.php?id=24522549&tree=Mozilla-Inbound

9726 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | Channel is in state: 'closed' - got open, expected closed

9723 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | Run step: CLOSE_LAST_OPENED_DATA_CHANNEL
9724 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | DataChannelWrapper (pcLocal_channel_1): Closing channel
9725 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | DataChannelWrapper (pcRemote_channel_1): 'onclose' event fired
9726 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | Channel is in state: 'closed' - got open, expected closed
9727 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | DataChannelWrapper (pcLocal_channel_0): Closing channel
9728 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | DataChannelWrapper (pcRemote_channel_0): 'onclose' event fired
9729 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | Closing peer connections. Connection state=true
9730 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | PeerConnectionWrapper (pcLocal): Closed connection.
9731 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | PeerConnectionWrapper (pcRemote): Closed connection.
9732 INFO TEST-INFO | /tests/dom/media/tests/mochitest/test_dataChannel_basicAudio.html | Test finished

Randell, mentioned that he most likely found the reason for it. He can give more details soon.
https://tbpl.mozilla.org/php/getParsedLog.php?id=24526729&tree=Mozilla-Inbound
Summary: Intermittent failure in test_dataChannel_basicAudio.html: Channel is in state: 'closed' - got open, expected closed → Intermittent test_dataChannel_basicAudio.html,test_dataChannel_basicVideo.html | Channel is in state: 'closed' - got open, expected closed
This is likely lack of locking around access to the mState (readystate) value, combined with some cases Dispatching the onclose before setting mState to CLOSED (though under lock).  Patch in a sec that should solve it.
https://tbpl.mozilla.org/php/getParsedLog.php?id=24534934&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=24533792&tree=Mozilla-Inbound
Summary: Intermittent test_dataChannel_basicAudio.html,test_dataChannel_basicVideo.html | Channel is in state: 'closed' - got open, expected closed → Intermittent test_dataChannel_basicAudio.html,test_dataChannel_basicVideo.html,test_dataChannel_basicAudioVideo.html,test_dataChannel_basicAudioVideoCombined.html | Channel is in state: 'closed' - got open, expected closed
Blocks: 796894
Comment on attachment 766834 [details] [diff] [review]
lock before accessing DataChannel->mState

Review of attachment 766834 [details] [diff] [review]:
-----------------------------------------------------------------

Randell, any idea why this code only fails on Android? Shouldn't it affect all platforms? That's interesting.
(In reply to Henrik Skupin (:whimboo) [away 06/28 - 07/07] from comment #10)
> Randell, any idea why this code only fails on Android? Shouldn't it affect
> all platforms? That's interesting.

Please drop this question. We are now getting those failures on other platforms too.
Randell, is this patch ready for review or what is still needed? Shall we do some try server runs?
Attachment #766834 - Attachment is obsolete: true
Comment on attachment 768743 [details] [diff] [review]
lock before accessing DataChannel->mState

Trivial patch to avoid accessing the DataChannel state var without the lock held (and remove unused SetReadyState - ReadyState is readonly in webidl).  Green Try, fixes a frequent Orange
Attachment #768743 - Flags: review?(mcmanus)
Attachment #768743 - Flags: review?(mcmanus) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/edde24398dd0
Whiteboard: [WebRTC] → [WebRTC][webrtc-uplift]
Target Milestone: --- → mozilla25
https://hg.mozilla.org/mozilla-central/rev/edde24398dd0
Assignee: nobody → rjesup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Randell, are the other branches also affected? I would suspect so. If that's the case we have to backport the patch to aurora and beta. Otherwise I'm not able to backport the datachannel tests.
Flags: needinfo?(rjesup)
Comment on attachment 768743 [details] [diff] [review]
lock before accessing DataChannel->mState

[Approval Request Comment]
Bug caused by (feature/regressing bug #): NA

User impact if declined: Causes random mochitest failures.  Applications that check the state immediately after close might get confused.  In theory this may be a tsan threading issue, if a compiler generated "dangerous" code for the access.  This is also blocking uplifting some tests to Aurora; which makes Aurora probably more important.  Beta would allow us to uplift tests to beta.

Testing completed (on m-c, etc.): on m-c for some time

Risk to taking this patch (and alternatives if risky): Virtually none; just locks access to the state.  

String or IDL/UUID changes made by this patch: none
Attachment #768743 - Flags: approval-mozilla-beta?
Attachment #768743 - Flags: approval-mozilla-aurora?
Comment on attachment 768743 [details] [diff] [review]
lock before accessing DataChannel->mState

low risk patch fixing frequent oranges. Looks good to uplift.
Attachment #768743 - Flags: approval-mozilla-beta?
Attachment #768743 - Flags: approval-mozilla-beta+
Attachment #768743 - Flags: approval-mozilla-aurora?
Attachment #768743 - Flags: approval-mozilla-aurora+
Whiteboard: [WebRTC][webrtc-uplift] → [webrtc][blocking-webrtc-]
You need to log in before you can comment on or make changes to this bug.