Closed Bug 1411977 Opened 3 years ago Closed 2 years ago

RUN_ON_THREAD() should not "queue jump" when dispatching to same-thread

Categories

(Core :: WebRTC: Networking, enhancement, P2)

52 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox60 --- fixed

People

(Reporter: jesup, Assigned: bwc)

References

(Depends on 3 open bugs)

Details

Attachments

(7 files, 2 obsolete files)

1.37 KB, patch
drno
: review+
Details | Diff | Splinter Review
59 bytes, text/x-review-board-request
drno
: review+
Details
59 bytes, text/x-review-board-request
drno
: review+
Details
59 bytes, text/x-review-board-request
drno
: review+
Details
59 bytes, text/x-review-board-request
drno
: review+
Details
59 bytes, text/x-review-board-request
drno
: review+
Details
59 bytes, text/x-review-board-request
drno
: review+
Details
RUN_ON_THREAD gained a same-thread "queue jump" property due to bug 824851, which landed in dec 2012.  We short-circuit when dispatching to the same thread (which 'hops' to the front of the event queue, basically). That "solved" the bug at the time, but was the wrong fix - the real bug there was "RUN_ON_THREAD(..., WrapRunnableRet(this,...));" -- 'this' didn't hold a ref to the pipeline, so no surprise it crashed in some cases. We avoid that now, and should explicitly document the lifetime guarantee for calls passing a bare 'this' to WrapRunnable.

The unexpected queue-jumping has led to a number of bits of code that are fragile or having timing issues, depending on if there's a related runnable already in the queue or not (that'll get jumped).  Disabling queue-jumping has exposed a number of bugs or hidden assumptions, like that a runnable will run and take a ref before the caller can drop it's ref, or that a particular state will be exposed to JS.  Also, we know that with e10s, negotiation of peerconnections and connection of ICE fails when we remove this.

We should remove queue-jumping-by-default, and if there is a need for some calling code to queue-jump, it should explicitly request that (via a different call or an optional parameter).  Need for queue jumping should be clearly exposed in the calling source code and justified.

The belief is that removing this and cleaning up the accidental dependencies may cut into our unexplained-timeouts in tests, and will reduce the chance of introducing new timing bugs.


This will be the master bug for removing queue-jumping, and will have a number of blocking bugs.  I expect it will make sense to land it ASAP at the start of 59 Nightly.
Depends on: 1411982
Depends on: 1412122
Comment on attachment 8922337 [details] [diff] [review]
don't bypass event queue in RUN_ON_THREAD

Review of attachment 8922337 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #8922337 - Flags: review?(drno) → review+
Depends on: 1412123
Depends on: 1413356
Attachment #8924014 - Attachment is obsolete: true
Attachment #8924014 - Flags: review?(rjesup)
Attachment #8924014 - Attachment is obsolete: false
Attachment #8924014 - Attachment is obsolete: true
Comment on attachment 8924014 [details]
Bug 1411977: dispatch inside SingletonThreadHolder only if needed.

https://reviewboard.mozilla.org/r/195222/#review200288

r+, though it doesn't seem to give me the option..
reviewed on the right bug...
Assignee: nobody → docfaraday
Comment on attachment 8945583 [details]
Bug 1411977 - Part 1: Stop queue jumping in RUN_ON_THREAD.

https://reviewboard.mozilla.org/r/215724/#review221492
Attachment #8945583 - Flags: review?(drno) → review+
Comment on attachment 8945584 [details]
Bug 1411977 - Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder.

https://reviewboard.mozilla.org/r/215726/#review221506

LGMT

::: media/mtransport/nr_socket_prsock.cpp:1134
(Diff revision 1)
> -  // also guarantees socket_child_ is released from the io_thread, and
> -  // tells the SingletonThreadHolder we're done with it
> -
>  #if defined(MOZILLA_INTERNAL_API)
>    // close(), but transfer the socket_child_ reference to die as well
> +  // dispatches back to STS to call ReleaseUse, to avoid shutting down the IO

The comment threw me off for a little bit as it clearly does not dispatch to the STS thread right here. Maybe change to something like "destroy_i() dispatches internally back to STS...".
Attachment #8945584 - Flags: review?(drno) → review+
Comment on attachment 8945585 [details]
Bug 1411977 - Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started.

https://reviewboard.mozilla.org/r/215728/#review221510
Attachment #8945585 - Flags: review?(drno) → review+
Comment on attachment 8945586 [details]
Bug 1411977 - Part 4: Only try to dispatch the release of TransportLayers when there is a target thread.

https://reviewboard.mozilla.org/r/215730/#review221512
Attachment #8945586 - Flags: review?(drno) → review+
Comment on attachment 8945587 [details]
Bug 1411977 - Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m.

https://reviewboard.mozilla.org/r/215732/#review221528
Attachment #8945587 - Flags: review?(drno) → review+
Comment on attachment 8945588 [details]
Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange.

https://reviewboard.mozilla.org/r/215734/#review221568

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3217
(Diff revision 1)
>    if (!pco) {
>      return;
>    }
> +
>    WrappableJSErrorResult rv;
> -  RUN_ON_THREAD(mThread,
> +  pco->OnStateChange(PCObserverStateType::IceConnectionState, rv);

A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD).

Which of the two is right? Can we make them to them same thing either way please?
Attachment #8945588 - Flags: review?(drno) → review+
Comment on attachment 8945588 [details]
Bug 1411977 - Part 6: Don't unwind the stack when firing onsignalingstatechange.

https://reviewboard.mozilla.org/r/215734/#review221568

> A couple of lines below in PeerConnectionImpl::IceGatheringStateChange() we call pco->OnStateChange() only via mThread->Dispatch() (to avoid queue jumping no longer RUN_ON_THREAD).
> 
> Which of the two is right? Can we make them to them same thing either way please?

Maybe we can do this. There's some convoluted logic around making sure that gathering state changes, events for trickle candidates, and callbacks for SLD success happen in the proper order here. Maybe transceivers will let us untangle some of it?
Comment on attachment 8946310 [details]
Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes.

https://reviewboard.mozilla.org/r/216280/#review222142

::: dom/media/PeerConnection.js:382
(Diff revision 1)
>      // canTrickle == null means unknown; when a remote description is received it
>      // is set to true or false based on the presence of the "trickle" ice-option
>      this._canTrickle = null;
>  
>      // States
> -    this._iceGatheringState = this._iceConnectionState = "new";
> +    this._iceConnectionState = "new";

What is the reason to delete the iceGatheringState, but not the iceConnectionState?

::: dom/media/PeerConnection.js:1798
(Diff revision 1)
>    //
>    //   complete   The ICE agent has completed gathering. Events such as adding
>    //              a new interface or a new TURN server will cause the state to
>    //              go back to gathering.
>    //
> -  handleIceGatheringStateChange(gatheringState) {
> +  handleIceGatheringStateChange() {

Lets delete this function all together and call directly notifyIceGatheringStateChange() instead.

::: dom/media/PeerConnection.js:1813
(Diff revision 1)
>        case "IceConnectionState":
>          this.handleIceConnectionStateChange(this._dompc._pc.iceConnectionState);
>          break;
>  
>        case "IceGatheringState":
> -        this.handleIceGatheringStateChange(this._dompc._pc.iceGatheringState);
> +        this.handleIceGatheringStateChange();

I think this could now directly call this._dompc.notifyIceGatheringStateChange().

::: media/webrtc/signaling/src/peerconnection/PeerConnectionImpl.cpp:3239
(Diff revision 1)
> -                    NS_DISPATCH_NORMAL);
>  
>    if (mIceGatheringState == PCImplIceGatheringState::Complete) {
>      SendLocalIceCandidateToContent(0, "", "");
>    }
> +

According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it.
Attachment #8946310 - Flags: review?(drno) → review+
Comment on attachment 8946310 [details]
Bug 1411977 - Part 7: Stop using extra dispatches for ICE candidates and gathering state changes.

https://reviewboard.mozilla.org/r/216280/#review222142

> What is the reason to delete the iceGatheringState, but not the iceConnectionState?

Scope creep. ;)

> According to spec https://www.w3.org/TR/webrtc/#update-the-ice-gathering-state we are suppose to first fire the event and then the empty candidate. Let's make that change here while we are at it.

This makes our test-cases sad. Let's drop part 7, and instead do this work in a new bug.
Attachment #8946310 - Attachment is obsolete: true
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4480819ddb2b
Part 1: Stop queue jumping in RUN_ON_THREAD. r=drno
https://hg.mozilla.org/integration/autoland/rev/d6c47acf39b0
Part 2: Stop using sync dispatch and queue jumping with SingletonThreadHolder. r=drno
https://hg.mozilla.org/integration/autoland/rev/481ba7607ae2
Part 3: Clear the SingletonThreadHolder _after_ thread shutdowns are finished, not before they are started. r=drno
https://hg.mozilla.org/integration/autoland/rev/2a59e9ad8ba0
Part 4: Only try to dispatch the release of TransportLayers when there is a target thread. r=drno
https://hg.mozilla.org/integration/autoland/rev/ef56214b5e2c
Part 5: Don't pass a pointer to a temporary to NotifyDataChannel_m. r=drno
https://hg.mozilla.org/integration/autoland/rev/d0cd0ad9ca5d
Part 6: Don't unwind the stack when firing onsignalingstatechange. r=drno
You need to log in before you can comment on or make changes to this bug.