Closed Bug 1320744 Opened 7 years ago Closed 7 years ago

HttpChannelChild should implement nsIThreadRetargetableRequest

Categories

(Core :: Networking, defect)

50 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: baku, Assigned: schien)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-active][PBg-HTTP-M1])

Attachments

(7 files, 6 obsolete files)

59 bytes, text/x-review-board-request
mayhemer
: review+
Details
59 bytes, text/x-review-board-request
mayhemer
: review+
Details
59 bytes, text/x-review-board-request
mayhemer
: review+
Details
105.77 KB, image/png
Details
42.38 KB, patch
Details | Diff | Splinter Review
22.97 KB, patch
Details | Diff | Splinter Review
59 bytes, text/x-review-board-request
mrbkap
: review+
Details
nsHttpChannel implements this interface. We should be consistent between e10s and non-e10s.
Blocks: 1267903
SC, FYI...
See Also: → 1015466
Whiteboard: [necko-next]
SC says he expects this bug to be fixed in 2017 Q2.
Blocks: Quantum
See Also: 1015466
Assignee: nobody → schien
Attachment #8823211 - Flags: review?(honzab.moz)
Attachment #8823212 - Flags: review?(honzab.moz)
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review105712

the patch name should be like "make refcounter of httpchannelchild thread safe"

not sure if this or the followup causes a perma leak of HttpBaseChannel.

::: netwerk/protocol/http/HttpChannelChild.h:112
(Diff revision 1)
>  
>    mozilla::ipc::IPCResult RecvNotifyTrackingProtectionDisabled() override;
>    mozilla::ipc::IPCResult RecvNotifyTrackingResource() override;
>    void FlushedForDiversion();
>  
> +  bool SendDeletingChannel();

override?

::: netwerk/protocol/http/HttpChannelChild.cpp:205
(Diff revision 1)
> -// Override nsHashPropertyBag's AddRef: we don't need thread-safe refcnt
> +NS_IMPL_ADDREF_INHERITED(HttpChannelChild, HttpBaseChannel)
> -NS_IMPL_ADDREF(HttpChannelChild)
>  
>  NS_IMETHODIMP_(MozExternalRefCountType) HttpChannelChild::Release()
>  {
> -  NS_PRECONDITION(0 != mRefCnt, "dup release");
> +  MOZ_ASSERT(int32_t(mRefCnt) > 0, "dup release");

do this assertion on the local |count| var (>=0).

::: netwerk/protocol/http/HttpChannelChild.cpp:2848
(Diff revision 1)
>  }
>  
> +bool
> +HttpChannelChild::SendDeletingChannel()
> +{
> +  if(NS_IsMainThread()) {

if (

::: netwerk/protocol/http/HttpChannelChild.cpp:2855
(Diff revision 1)
> +  }
> +
> +  nsresult rv;
> +  rv = NS_DispatchToMainThread(
> +         NewRunnableMethod(this, &HttpChannelChild::SendDeletingChannel));
> +  return NS_SUCCEEDED(rv);

you should assert a success here.  because otherwise you leak?
Attachment #8823211 - Flags: review?(honzab.moz) → review+
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review105712

I've updated the commit log and I'll fix the perma leak before check-in.

> override?

No, PHttpChannelChild::SendDeletingChannel is not a virtual function.

> do this assertion on the local |count| var (>=0).

done.

> if (

done.

> you should assert a success here.  because otherwise you leak?

done.
The memory leak on try is false alarm. BloatView use `NS_LOG_ADDREF` / `NS_LOG_RELEASE` to track the refcount of each object. In my patch I've changed the `HttpChannelChild::AddRef` implementation from `NS_IMPL_ADDREF` to `NS_IMPL_ADDREF_INHERITED`. Therefore, we need to introduce the same semantic as in `NS_IMPL_RELEASE_INHERITED` into our own `HttpChannelChild::Release`.
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

The memory leak is fixed in this revision. See comment #12 for more detail.
Attachment #8823211 - Flags: review+ → review?(honzab.moz)
Comment on attachment 8823212 [details]
Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

https://reviewboard.mozilla.org/r/101796/#review105714

(sorry, this was left in reviewboard on the previous version of the patch, had to publish.. not sure the comments are still current, but some probably are)

before I can do a review here, please add good explanatory comments to the code and please add a description of the patch to bugzilla.  it's hard to try to understand it w/o a word of clarification.  thanks.

::: netwerk/protocol/http/HttpChannelChild.h:196
(Diff revision 1)
>    // Sets the event target for future IPC messages. Messages will either be
>    // directed to the TabGroup or DocGroup, depending on the LoadInfo associated
>    // with the channel. Should be called when a new channel is being set up,
>    // before the constructor message is sent to the parent.
>    void SetEventTarget();
> +  already_AddRefed<nsIEventTarget> GetEventTargetForNetwork();

comment

::: netwerk/protocol/http/HttpChannelChild.h:287
(Diff revision 1)
>    // is synthesized.
>    bool mSuspendParentAfterSynthesizeResponse;
>  
> +  // Target thread for delivering ODA.
> +  nsCOMPtr<nsIEventTarget> mTargetThread;
> +  nsCOMPtr<nsIEventTarget> mNetworkEventTarget;

and this one is?

::: netwerk/protocol/http/HttpChannelChild.cpp:726
(Diff revision 1)
>  
>    DoOnStatus(this, transportStatus);
>    DoOnProgress(this, progress, progressMax);
>  
> +  if (mTargetThread && NS_GetCurrentThread() != mTargetThread) {
> +    // Suspend channel event processing while doing OMT ODA

very nice comment :))

please say why...

::: netwerk/protocol/http/HttpChannelChild.cpp:732
(Diff revision 1)
> +    mEventQ->Suspend();
> +
> +    RefPtr<HttpChannelChild> self = this;
> +    nsCOMPtr<nsIRunnable> resumeRunnable = NS_NewRunnableFunction([self]() {
> +      MOZ_ASSERT(NS_IsMainThread());
> +      // Resume channel event processing on original thread.

and a nice one again :D

why on the original thread..?
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review105942

::: netwerk/protocol/http/HttpChannelChild.cpp:208
(Diff revision 3)
>  NS_IMETHODIMP_(MozExternalRefCountType) HttpChannelChild::Release()
>  {
> -  NS_PRECONDITION(0 != mRefCnt, "dup release");
> -  NS_ASSERT_OWNINGTHREAD(HttpChannelChild);
> -  --mRefCnt;
> -  NS_LOG_RELEASE(this, mRefCnt, "HttpChannelChild");
> +  // Copied from NS_IMPL_RELEASE_INHERITED to align the refcount logging
> +  // with NS_IMPL_ADDREF_INHERITED.
> +  nsrefcnt count = HttpBaseChannel::Release();
> +  NS_LOG_RELEASE(this, count, "HttpChannelChild");

1. |this| can be no longer valid.  
2. I believe the NS_LOG_RELEASE here should be removed, as HttpBaseChannel::Release() takes care of it.
3. sad that when HttpChannelChild will leak now, we won't find out believing it's HttpBaseChannel...

::: netwerk/protocol/http/HttpChannelChild.cpp:214
(Diff revision 3)
>  
>    // Normally we Send_delete in OnStopRequest, but when we need to retain the
>    // remote channel for security info IPDL itself holds 1 reference, so we
>    // Send_delete when refCnt==1.  But if !mIPCOpen, then there's nobody to send
>    // to, so we fall through.
> -  if (mKeptAlive && mRefCnt == 1 && mIPCOpen) {
> +  if (mKeptAlive && count == 1 && mIPCOpen) {

hmm... and what if:

Thread 1:

nsrefcnt count = HttpBaseChannel::Release();
count == 1

--CONTEXT SWITCH--

Thread 2:

nsrefcnt count = HttpBaseChannel::Release();
count == 0
delete this;

--CONTEXT SWITCH--

Thread 1:
this->SendDeletingChannel();

BOOOOM!!!!!!


the last ref is held only by IPC, but still, it may happen we get the last call on a different thread.  I think the release method cannot land as is.
Attachment #8823211 - Flags: review?(honzab.moz) → review-
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review106148

::: netwerk/protocol/http/HttpChannelChild.cpp:208
(Diff revision 3)
>  NS_IMETHODIMP_(MozExternalRefCountType) HttpChannelChild::Release()
>  {
> -  NS_PRECONDITION(0 != mRefCnt, "dup release");
> -  NS_ASSERT_OWNINGTHREAD(HttpChannelChild);
> -  --mRefCnt;
> -  NS_LOG_RELEASE(this, mRefCnt, "HttpChannelChild");
> +  // Copied from NS_IMPL_RELEASE_INHERITED to align the refcount logging
> +  // with NS_IMPL_ADDREF_INHERITED.
> +  nsrefcnt count = HttpBaseChannel::Release();
> +  NS_LOG_RELEASE(this, count, "HttpChannelChild");

1. Yes, `this` is not valid for dereference. However `NS_LOG_RELEASE` only use `this` as an id of that object. Code here is aligned with https://searchfox.org/mozilla-central/rev/c477aa8bd99278962998adba1c5e4b15a02c42c7/xpcom/glue/nsISupportsImpl.h#963

2. This `NS_LOG_RELEASE` cannot be removed if we use `NS_IMPL_ADDREF_INHERITED` to implement AddRef() because `NS_LOG_ADDREF` for 'HttpChannelChild' will be called. Therefore, `NS_LOG_RELEASE` here is necessary to keep BloatView happy.

3. In this revision, HttpChannelChild will be tracked by BloatView so we'll see both `HttpChannelChild` and `HttpBaseChannel` appeared in BloatView if leak happened.

If we keep `NS_IMPL_ADDREF` instead of using the inherited implementation. We can keep the same approach to implement HttpChannelChild::Release.

::: netwerk/protocol/http/HttpChannelChild.cpp:214
(Diff revision 3)
>  
>    // Normally we Send_delete in OnStopRequest, but when we need to retain the
>    // remote channel for security info IPDL itself holds 1 reference, so we
>    // Send_delete when refCnt==1.  But if !mIPCOpen, then there's nobody to send
>    // to, so we fall through.
> -  if (mKeptAlive && mRefCnt == 1 && mIPCOpen) {
> +  if (mKeptAlive && count == 1 && mIPCOpen) {

This should never happen because in the case we need to call `SendDeletingChannel()`, the last refcount is held by the `AddIPDLReference()`. Only `ReleaseIPDLReference()` can make refcount to 0 and it means we already complete the `SendDeletingChannel -> SendDeleteSelf -> send__delete__` procedure.
(didn't look at the new patch yet)


(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #16)
> Comment on attachment 8823211 [details]
> Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.
> 
> https://reviewboard.mozilla.org/r/101794/#review106148
> 
> ::: netwerk/protocol/http/HttpChannelChild.cpp:208
> (Diff revision 3)
> >  NS_IMETHODIMP_(MozExternalRefCountType) HttpChannelChild::Release()
> >  {
> > -  NS_PRECONDITION(0 != mRefCnt, "dup release");
> > -  NS_ASSERT_OWNINGTHREAD(HttpChannelChild);
> > -  --mRefCnt;
> > -  NS_LOG_RELEASE(this, mRefCnt, "HttpChannelChild");
> > +  // Copied from NS_IMPL_RELEASE_INHERITED to align the refcount logging
> > +  // with NS_IMPL_ADDREF_INHERITED.
> > +  nsrefcnt count = HttpBaseChannel::Release();
> > +  NS_LOG_RELEASE(this, count, "HttpChannelChild");
> 
> 1. Yes, `this` is not valid for dereference. However `NS_LOG_RELEASE` only
> use `this` as an id of that object. 

And that is an argument?  No.  Never pass an invalid ptr to any API!  The reason is that when a new object is created in the meantime ending up with the same address as |this| on a different thread, you will screw the counter, obviously.

> Code here is aligned with
> https://searchfox.org/mozilla-central/rev/
> c477aa8bd99278962998adba1c5e4b15a02c42c7/xpcom/glue/nsISupportsImpl.h#963

Then that code is wrong too.

> 
> 2. This `NS_LOG_RELEASE` cannot be removed if we use
> `NS_IMPL_ADDREF_INHERITED` to implement AddRef() because `NS_LOG_ADDREF` for
> 'HttpChannelChild' will be called. Therefore, `NS_LOG_RELEASE` here is
> necessary to keep BloatView happy.

understand.  I think just doing NS_LOG_RELEASE before you call super::Release() would do.

> 
> 3. In this revision, HttpChannelChild will be tracked by BloatView so we'll
> see both `HttpChannelChild` and `HttpBaseChannel` appeared in BloatView if
> leak happened.

good.

> 
> If we keep `NS_IMPL_ADDREF` instead of using the inherited implementation.
> We can keep the same approach to implement HttpChannelChild::Release.

I would personally prefer that, if possible.  Could be confusing when the leak happens.

> 
> ::: netwerk/protocol/http/HttpChannelChild.cpp:214
> (Diff revision 3)
> >  
> >    // Normally we Send_delete in OnStopRequest, but when we need to retain the
> >    // remote channel for security info IPDL itself holds 1 reference, so we
> >    // Send_delete when refCnt==1.  But if !mIPCOpen, then there's nobody to send
> >    // to, so we fall through.
> > -  if (mKeptAlive && mRefCnt == 1 && mIPCOpen) {
> > +  if (mKeptAlive && count == 1 && mIPCOpen) {
> 
> This should never happen because in the case we need to call
> `SendDeletingChannel()`, the last refcount is held by the
> `AddIPDLReference()`. Only `ReleaseIPDLReference()` can make refcount to 0
> and it means we already complete the `SendDeletingChannel -> SendDeleteSelf
> -> send__delete__` procedure.

OK, let's see if this is going to crash in release or not ;)  honestly I don't much trust the code as is, but that unfortunately is also the legacy of our broken IPC referencing logic...
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review109312

still smelling something wrong here...

::: netwerk/protocol/http/HttpChannelChild.cpp:2845
(Diff revision 5)
> +    return PHttpChannelChild::SendDeletingChannel();
> +  }
> +
> +  nsresult rv;
> +  rv = NS_DispatchToMainThread(
> +         NewRunnableMethod(this, &HttpChannelChild::SendDeletingChannel));

so... this runnable will addref |this|, making the refcnt == 2 and after it's dispatched, ran and released will drop the counter back to 1 leading to call of SendDeletingChannel from Release() again?

I can't locally test this since this patch deps on number of other patches that are yet incomplete.

::: netwerk/protocol/http/HttpChannelChild.cpp:2848
(Diff revision 5)
> +  nsresult rv;
> +  rv = NS_DispatchToMainThread(
> +         NewRunnableMethod(this, &HttpChannelChild::SendDeletingChannel));
> +  MOZ_ASSERT(NS_SUCCEEDED(rv));
> +
> +  return NS_SUCCEEDED(rv);

why returning something when it's not checked anywhere?
Attachment #8823211 - Flags: review?(honzab.moz) → review-
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review111000
Whiteboard: [necko-next] → [necko-active]
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review109312

> so... this runnable will addref |this|, making the refcnt == 2 and after it's dispatched, ran and released will drop the counter back to 1 leading to call of SendDeletingChannel from Release() again?
> 
> I can't locally test this since this patch deps on number of other patches that are yet incomplete.

In `Release()`, `mKeptAlive` is used to guarantee that `SendDeletingChannel()` is only called at most once. However I'm thinking about using `NewNonOwningRunnableMethod` to avoid holding additional refcount.

> why returning something when it's not checked anywhere?

Fixed. I should have remove the return value when I realize this function cannot be override.
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review115454

::: netwerk/protocol/http/HttpChannelChild.cpp:2860
(Diff revision 6)
> +    DebugOnly<bool> success = PHttpChannelChild::SendDeletingChannel();
> +    MOZ_ASSERT(success);
> +    return;
> +  }
> +
> +  nsresult rv;

move to the top as the first declaration in the method
Attachment #8823211 - Flags: review?(honzab.moz) → review+
Comment on attachment 8839572 [details] [diff] [review]
[for splinter]  Bug 1320744 - Part 2, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8839572 [details] [diff] [review]:
-----------------------------------------------------------------

few questions:

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +730,5 @@
>    const int64_t progress = offset + count;
>    DoOnProgress(this, progress, progressMax);
>  
> +  if (mTargetThread && NS_GetCurrentThread() != mTargetThread) {
> +    // Suspend channel event processing while doing OMT ODA

usual question: why? :)

@@ +736,5 @@
> +
> +    RefPtr<HttpChannelChild> self = this;
> +    nsCOMPtr<nsIRunnable> resumeRunnable = NS_NewRunnableFunction([self]() {
> +      MOZ_ASSERT(NS_IsMainThread());
> +      // Resume channel event processing on original thread.

....why?

@@ +755,5 @@
> +        self->DoOnDataAvailable(self, self->mListenerContext, stringStream, offset, count);
> +        stringStream->Close();
> +
> +        nsCOMPtr<nsIEventTarget> originalTarget = self->GetChannelEventTarget();
> +        originalTarget->Dispatch(resumeRunnable, NS_DISPATCH_NORMAL);

I'm lost a bit (lack of knowledge here).  can you tell me what's the difference between mEventQ->GetTargetThread() and mTargetThread?  I may have a hunch, but I want to make sure I understand.

@@ +1884,5 @@
>    LOG(("HttpChannelChild::Cancel [this=%p]\n", this));
> +
> +  nsCOMPtr<nsIEventTarget> originalTarget = GetChannelEventTarget();
> +  bool isOnThread;
> +  originalTarget->IsOnCurrentThread(&isOnThread);

Could non-main-thread mEventQ->GetTargetThread() ever become null during the time the channel is pending?  Since then we would access mCanceled etc members potentially concurrently.

@@ +1888,5 @@
> +  originalTarget->IsOnCurrentThread(&isOnThread);
> +  if (!isOnThread) {
> +    return originalTarget->Dispatch(
> +      NewRunnableMethod<nsresult>(this, &HttpChannelChild::Cancel, status),
> +      NS_DISPATCH_NORMAL);

nit: indent or leave on the same line

@@ +1889,5 @@
> +  if (!isOnThread) {
> +    return originalTarget->Dispatch(
> +      NewRunnableMethod<nsresult>(this, &HttpChannelChild::Cancel, status),
> +      NS_DISPATCH_NORMAL);
> +  }

add a good comment why you are doing this change.  if we delay cancellation by posting it, the steam listener (onstart/oda) could still be called on the target after the channel has been canceled (or, the channel's consumer thought it was).  

I'm not sure how big this could be a problem, but something tells me it may bring issues, hard to reproduce, hard to identify, with hard consequences.

@@ +2137,5 @@
>  
> +already_AddRefed<nsIEventTarget>
> +HttpChannelChild::GetChannelEventTarget()
> +{
> +  nsCOMPtr<nsIEventTarget> target = mEventQ->GetTargetThread();

where from is mEventQ->GetTargetThread() set?  what is it exactly and how does it change?

@@ +2909,5 @@
> +  nsCOMPtr<nsIThreadRetargetableStreamListener> retargetableListener =
> +      do_QueryInterface(mListener, &rv);
> +  if (!retargetableListener || NS_FAILED(rv)) {
> +    NS_WARNING("Listener is not retargetable");
> +    return NS_OK;

this is non fatal?

@@ +2915,5 @@
> +
> +  rv = retargetableListener->CheckListenerChain();
> +  if (NS_FAILED(rv)) {
> +    NS_WARNING("Subsequent listeners are not retargetable");
> +    return NS_OK;

as well this?  maybe it's correct, just checking.  maybe worth a comment if there is nothing in IDL for this method (didn't look)?
Comment on attachment 8839572 [details] [diff] [review]
[for splinter]  Bug 1320744 - Part 2, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8839572 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +730,5 @@
>    const int64_t progress = offset + count;
>    DoOnProgress(this, progress, progressMax);
>  
> +  if (mTargetThread && NS_GetCurrentThread() != mTargetThread) {
> +    // Suspend channel event processing while doing OMT ODA

This is to avoid OnStopRequest to be processed while ODA is still sit in the event queue of the target thread.

@@ +736,5 @@
> +
> +    RefPtr<HttpChannelChild> self = this;
> +    nsCOMPtr<nsIRunnable> resumeRunnable = NS_NewRunnableFunction([self]() {
> +      MOZ_ASSERT(NS_IsMainThread());
> +      // Resume channel event processing on original thread.

ChannelEventQueue will processing the pending queue right on the thread invoking Resume(). However all the channel events are designed to be processed on main thread/network EventTarget, since they will trigger OnStartRequest/OnStopRequest/OnProgress/OnStatus. Therefore, we can only process mEventQ on main thread.

@@ +755,5 @@
> +        self->DoOnDataAvailable(self, self->mListenerContext, stringStream, offset, count);
> +        stringStream->Close();
> +
> +        nsCOMPtr<nsIEventTarget> originalTarget = self->GetChannelEventTarget();
> +        originalTarget->Dispatch(resumeRunnable, NS_DISPATCH_NORMAL);

mEventQ->GetTargetThread() is the same thread/EventTarget HttpChannelChild sits on, which is main thread/network EventTargat.
mTargetThread is the thread for ODA delivery, e.g. image decoder thread.

@@ +1889,5 @@
> +  if (!isOnThread) {
> +    return originalTarget->Dispatch(
> +      NewRunnableMethod<nsresult>(this, &HttpChannelChild::Cancel, status),
> +      NS_DISPATCH_NORMAL);
> +  }

This change is made because we call `Cancel()` when mListener->OnDataAvailable return failure. In nsHttpChannel we doesn't have this behavior and I think we shouldn't close the channel because of a callback function. In that case, we can keep `Cancel` main thread only. How do you think?

@@ +2137,5 @@
>  
> +already_AddRefed<nsIEventTarget>
> +HttpChannelChild::GetChannelEventTarget()
> +{
> +  nsCOMPtr<nsIEventTarget> target = mEventQ->GetTargetThread();

TargetThread is assigned in https://searchfox.org/mozilla-central/rev/39e4b25a076c59d2e7820297d62319f167871449/netwerk/protocol/http/HttpChannelChild.cpp#2072. This is introduced by Quantum DOM modification in bug 1318506.

@@ +2909,5 @@
> +  nsCOMPtr<nsIThreadRetargetableStreamListener> retargetableListener =
> +      do_QueryInterface(mListener, &rv);
> +  if (!retargetableListener || NS_FAILED(rv)) {
> +    NS_WARNING("Listener is not retargetable");
> +    return NS_OK;

This is copied from nsHttpChannel::RetargetDeliveryTo(). There are two types of error handling in caller: 1) NS_ENSURE_SUCCESS, 2) NS_WARNING if failed.
Therefore, returning NS_OK is the only choice to make current code work but hiding the warning message as a side effect.

@@ +2915,5 @@
> +
> +  rv = retargetableListener->CheckListenerChain();
> +  if (NS_FAILED(rv)) {
> +    NS_WARNING("Subsequent listeners are not retargetable");
> +    return NS_OK;

The documentation in IDL is vague. https://searchfox.org/mozilla-central/rev/39e4b25a076c59d2e7820297d62319f167871449/netwerk/base/nsIThreadRetargetableRequest.idl#29
It only states that there is no guarantee retargeting will success, but not describing how to detect it earlier. I thinking adding "@throws NS_ERROR_NOT_AVAILABLE if failed to retarget" in IDL will provide more clear error handling to both caller and implementer. We can do that in a separate bug to change both IDL and code if you also think this is the way to do it.
Comment on attachment 8839572 [details] [diff] [review]
[for splinter]  Bug 1320744 - Part 2, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8839572 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +2909,5 @@
> +  nsCOMPtr<nsIThreadRetargetableStreamListener> retargetableListener =
> +      do_QueryInterface(mListener, &rv);
> +  if (!retargetableListener || NS_FAILED(rv)) {
> +    NS_WARNING("Listener is not retargetable");
> +    return NS_OK;

I somehow compare to the wrong code while writing previous comment. Here I should just return NS_ERROR_NO_INTERFACE.
Comment on attachment 8823211 [details]
Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.

https://reviewboard.mozilla.org/r/101794/#review115454

> move to the top as the first declaration in the method

This one is actually `DebugOnly<nsresult> rv`, so I would prefer to leave the declaration as close as the assignment. In fact I plan to rewrite these lines to:

```
DebugOnly<nsresult> rv =
  NS_DispatchToMainThread(
    NewNonOwningRunnableMethod(this, &HttpChannelChild::SendDeletingChannel));
MOZ_ASSERT(NS_SUCCEEDED(rv));
```
Attached patch main-thread-only-cancel.patch (obsolete) — Splinter Review
The major issue in previous comment is about allowing Cancel() to be called on other threads. With this patch we can avoid calling Cancel() on other threads for OMT ODA.
Attachment #8839859 - Flags: feedback?(honzab.moz)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #38)
> Comment on attachment 8839572 [details] [diff] [review]
> [for splinter]  Bug 1320744 - Part 2, implement nsIThreadRetargetableRequest
> in HttpChannelChild.
> 
> Review of attachment 8839572 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: netwerk/protocol/http/HttpChannelChild.cpp
> @@ +730,5 @@
> >    const int64_t progress = offset + count;
> >    DoOnProgress(this, progress, progressMax);
> >  
> > +  if (mTargetThread && NS_GetCurrentThread() != mTargetThread) {
> > +    // Suspend channel event processing while doing OMT ODA
> 
> This is to avoid OnStopRequest to be processed while ODA is still sit in the
> event queue of the target thread.

OK, I'd like to talk about this face to face (after/during the necko meeting) to make sure I fully understand.  As well for other two replies.

> @@ +1889,5 @@
> > +  if (!isOnThread) {
> > +    return originalTarget->Dispatch(
> > +      NewRunnableMethod<nsresult>(this, &HttpChannelChild::Cancel, status),
> > +      NS_DISPATCH_NORMAL);
> > +  }
> 
> This change is made because we call `Cancel()` when
> mListener->OnDataAvailable return failure. 

Which is correct, I think.

> In nsHttpChannel we doesn't have
> this behavior 

Because the result of mListener->ODA is returned by nsHttpChannel::ODA and thus goes down to the input stream pump which then cancels itself and fires only OnStopRequest with status = that error.

> and I think we shouldn't close the channel because of a
> callback function. 

We must keep to the nsIStreamListener contract.  If we cancel asynchronously, I'm scared that we can get ODA calls before we finally get OnStop(error)

> In that case, we can keep `Cancel` main thread only. How
> do you think?

The nsIStreamListener contract is:

call OnStartRequest once
if it throws, go to [cancel]
call 0-N times OnDataAvailable
if any of them throws, prevent any other call to ODA, go to [cancel]
[cancel]: set status = error from the callback
call OnStopRequest(status) once
release references


but I'm afraid that the following can happen:

OnStartRequest on main thread
retarget to T2
T2: ODA
T2: ODA -> failure -> post cancel to main thread

followed possibly with something like

T2: ODA consumer: what is this???
Cancel(error)
T2: ODA consumer: what is this again???
OnStopRequest(error)
T2: ODA consumer: what is this again???

Not sure how bad this can be, but all those ODAs after the second one that threw must be prevented.  That's my point.

> 
> @@ +2137,5 @@
> >  
> > +already_AddRefed<nsIEventTarget>
> > +HttpChannelChild::GetChannelEventTarget()
> > +{
> > +  nsCOMPtr<nsIEventTarget> target = mEventQ->GetTargetThread();
> 
> TargetThread is assigned in
> https://searchfox.org/mozilla-central/rev/
> 39e4b25a076c59d2e7820297d62319f167871449/netwerk/protocol/http/
> HttpChannelChild.cpp#2072. This is introduced by Quantum DOM modification in
> bug 1318506.

Aha!!  So mEventQueue->GetEventTarget() is likely to be the main thread (with prioritization), right?

And nsHttpChannel->mTargetThread is the worker thread we want to retarget ODA to, right?  I was confused by those two.

> 
> @@ +2909,5 @@
> > +  nsCOMPtr<nsIThreadRetargetableStreamListener> retargetableListener =
> > +      do_QueryInterface(mListener, &rv);
> > +  if (!retargetableListener || NS_FAILED(rv)) {
> > +    NS_WARNING("Listener is not retargetable");
> > +    return NS_OK;
> 
> This is copied from nsHttpChannel::RetargetDeliveryTo(). There are two types
> of error handling in caller: 1) NS_ENSURE_SUCCESS, 2) NS_WARNING if failed.
> Therefore, returning NS_OK is the only choice to make current code work but
> hiding the warning message as a side effect.

OK

> 
> @@ +2915,5 @@
> > +
> > +  rv = retargetableListener->CheckListenerChain();
> > +  if (NS_FAILED(rv)) {
> > +    NS_WARNING("Subsequent listeners are not retargetable");
> > +    return NS_OK;
> 
> The documentation in IDL is vague.
> https://searchfox.org/mozilla-central/rev/
> 39e4b25a076c59d2e7820297d62319f167871449/netwerk/base/
> nsIThreadRetargetableRequest.idl#29
> It only states that there is no guarantee retargeting will success, but not
> describing how to detect it earlier. I thinking adding "@throws
> NS_ERROR_NOT_AVAILABLE if failed to retarget" in IDL will provide more clear
> error handling to both caller and implementer. We can do that in a separate
> bug to change both IDL and code if you also think this is the way to do it.

Leave it as it is.  If there will ever be a consumer needing to know if retarget is successful we may change it.

(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #40)
> Comment on attachment 8823211 [details]
> Bug 1320744 - Part 1, make refcounter of HttpChannelChild thread-safe.
> 
> https://reviewboard.mozilla.org/r/101794/#review115454
> 
> > move to the top as the first declaration in the method
> 
> This one is actually `DebugOnly<nsresult> rv`, 

Agree!
Comment on attachment 8839859 [details] [diff] [review]
main-thread-only-cancel.patch

Review of attachment 8839859 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +739,5 @@
>    // rest.
>    nsCOMPtr<nsIInputStream> stringStream;
>    nsresult rv = NS_NewByteInputStream(getter_AddRefs(stringStream), data.get(),
> +                                      count, (needOMT ? NS_ASSIGNMENT_COPY
> +                                                      : NS_ASSIGNMENT_DEPEND));

I missed this!  yes!

@@ +746,5 @@
>      return;
>    }
>  
> +  if (needOMT) {
> +    // Suspend channel event processing while doing OMT ODA

so, just add here the explanatory comment you added to bugzilla (with whatever tweaks needed).  same for other places.

@@ +844,5 @@
>                                      uint64_t offset, uint32_t count)
>  {
>    LOG(("HttpChannelChild::DoOnDataAvailable [this=%p]\n", this));
>    if (mCanceled)
>      return;

here it is!  is mCanceled an atomic?  I think rel-acq would do.

@@ +847,5 @@
>    if (mCanceled)
>      return;
>  
>    nsresult rv = mListener->OnDataAvailable(aRequest, aContext, aStream, offset, count);
> +  Unused << NS_WARN_IF(NS_FAILED(rv));

totally bad!  :X

this has to be stored to mStatus (if there is not already).  the above if (mCanceled) return; condition has to be extended to if (mCanceled || NS_FAILED(mStatus)) return;

and the failure mStatus value must be preserved (not rewritten by Cancel()) and must be passed to OnStop as the status arg.

and we definitely must send cancel to the parent channel with the status, yes, from here and here only, and post to the main thread when needed.

or something like that..

@@ +1899,5 @@
>  NS_IMETHODIMP
>  HttpChannelChild::Cancel(nsresult status)
>  {
>    LOG(("HttpChannelChild::Cancel [this=%p]\n", this));
> +  MOZ_ASSERT(NS_IsMainThread());

We do the same in nsHttpChannel, so it's probably alright.  ODA should never cancel the channel then.  Not happy, but I think it's good enough for keeping this simple.
Attachment #8839859 - Flags: feedback?(honzab.moz) → feedback-
Thanks @mayheymer for the detailed explanation of the nsIStreamListener contract.

I think we can prepend a CancelEvent in mEventQ when running DoOnDataAvailable on the retargeted thread. It'll be guaranteed to be processed first before other ODA/OnStopRequest since mEventQ is suspended at that time.
(In reply to Honza Bambas (:mayhemer) from comment #43)
> 
> @@ +844,5 @@
> >                                      uint64_t offset, uint32_t count)
> >  {
> >    LOG(("HttpChannelChild::DoOnDataAvailable [this=%p]\n", this));
> >    if (mCanceled)
> >      return;
> 
> here it is!  is mCanceled an atomic?  I think rel-acq would do.
> 
Solely making mCanceled atomic might not be enough. We should make entire Cancel() atomic since there are more actions happened after `mCanceled = true;` in both nsHttpChannel [1] and HttpChannelChild [2].

[1] https://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/nsHttpChannel.cpp#5765

[2] https://searchfox.org/mozilla-central/rev/b1044cf7c2000c3e75e8181e893236a940c8b6d2/netwerk/protocol/http/HttpChannelChild.cpp#1859
update according to review comment.

[summary of changes]
1. HttpChannelChild implements nsIThreadRetargetableRequest
2. Dispatch ODA to retarget thread
3. Allow channel cancel on other thread preempts following ODA/OnStopRequest
4. make mCanceled atomic with rel_acq ordering.
Attachment #8839859 - Attachment is obsolete: true
Attachment #8840360 - Flags: feedback?(honzab.moz)
Comment on attachment 8840360 [details] [diff] [review]
http-channel-child-retargetable.patch

try looks good, so move to reviewboard for continuing the review process.
Attachment #8840360 - Attachment is obsolete: true
Attachment #8840360 - Flags: feedback?(honzab.moz)
Attachment #8839572 - Attachment is obsolete: true
:sc, please add the latest patch (part 2) as a normal patch to bugzilla.  I'm totally lost in what I see in rb.  Thanks.
Flags: needinfo?(schien)
Ah!  the name of the diff when you download it from rb remains the same among revisions - god, it's so stupid.  OK, no problem.  I found myself again :)
Flags: needinfo?(schien)
Comment on attachment 8841550 [details] [diff] [review]
(for spliter) Part 2, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8841550 [details] [diff] [review]:
-----------------------------------------------------------------

I think I finally understand how the stuff works here.

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +768,5 @@
> +        // target to continue processing the following ODA/OnStopRequest.
> +        nsCOMPtr<nsIEventTarget> originalTarget = self->GetChannelEventTarget();
> +        originalTarget->Dispatch(NS_NewRunnableFunction([self]() {
> +          MOZ_ASSERT(NS_IsMainThread());
> +          self->mEventQ->Resume();

this is not a good design from the performance point of view.  first, OnTransportAndData is called on the main thread.  you find out the channel's ODA has been retargetted.  you post this event (the outer one) to that retarget thread, while the event queue (mEventQ) is suspended, so that no other calls to HttpChannelChild::OnTransportAndData can happen (are just put to the queue).  then you process ODA on the retarget thread, and only after that you post back to the main thread to resume the queue and start this dance again.  So, this will not get rid of the queue starvation delays, it will double them.

I think we may need to modify retargetting capability of ChannelEventQueue and use it instead.  Now retargeting implementation is pretty lame in that class.   My idea is to enhance the RunOrEnqueue method by one argument - an event target.  Each ChannelEvent helper class object will carry the event target with it.  ChannelEventQueue::FlushQueue will process the events, but before Run is called, the target thread will be checked.  If not equal to the current thread, FlushQueue will re-post itself to that thread.  This will ensure sequencing of all the events.  Also, when there is something in the queue but the target thread passed to RunOrEnqueue is the current thread, the even must be enqueued to preserve order, obviously.

Makes sense?  mEventQ->RetargetDeliveryTo is used only on two places and both I believe can be easily changed to use the suggested API change.

@@ +3015,5 @@
> +  // any ODA/OnStopRequest callbacks.
> +  nsTArray<UniquePtr<ChannelEvent>> cancelEvent;
> +  cancelEvent.AppendElement(MakeUnique<CancelEvent>(this, aRv));
> +
> +  mEventQ->PrependEvents(cancelEvent);

yes, this is a good solution to put in front.

but with the suggested changes above this method would look just like:

{
  nsTArray<UniquePtr<ChannelEvent>> cancelEvent;
  cancelEvent.AppendElement(MakeUnique<CancelEvent>(NS_GetMainThread(), this, aRv));
  mEventQ->PrependEvents(cancelEvent);
}

nit: maybe add a method "PrependEvent" for a single event addition?
Comment on attachment 8823212 [details]
Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

https://reviewboard.mozilla.org/r/101796/#review117136

see comment 55 (would give f+ but rb is limited even just in that simple thing...)
Attachment #8823212 - Flags: review?(honzab.moz) → review-
(In reply to Honza Bambas (:mayhemer) from comment #55)
sgtm, this will allow ChannelEventQueue to continue event process on multiple threads.
The proposed change of ChannelEventQueue in comment #55 have one restriction: the producer of ChannelEvent is always on main thread. 

In HTTP ODA retargeting scenario, retarget thread is provided during `OnStartRequest` callback, which happens on main thread. If ODA event is produced on other thread, you'll not be guaranteed to have the retarget thread to call `RunOrEnqueue` unless the thread is blocked until `OnStartRequest` finished.

This is going to cause trouble when I move content process Http to STS thread in bug 1338493.

nsHttpChannel/nsInputStreamPump doesn't have this issue because nsInputStreamPump requires explicit `AsyncWait` to obtain ODA after OnStartRequest, so the retarget thread can be specified at that time. However, we don't do pulling for receiving IPC message. The solution I can think of is to store the IPC message in a separate queue and process it after OnStartRequest. In this case we can delay the `RunOrEnqueue` until the retarget thread is decided.
Whiteboard: [necko-active] → [necko-active][PBg-HTTP-M1]
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #58)
> The proposed change of ChannelEventQueue in comment #55 have one
> restriction: the producer of ChannelEvent is always on main thread. 
> 
> In HTTP ODA retargeting scenario, retarget thread is provided during
> `OnStartRequest` callback, which happens on main thread. If ODA event is
> produced on other thread, you'll not be guaranteed to have the retarget
> thread to call `RunOrEnqueue` unless the thread is blocked until
> `OnStartRequest` finished.

Yes!  Thanks for catching this so soon.

I can imagine the architecture will be as following:

HttpChannelChild will have a member nsIEventTarget (say, mTargetThread) that will be set (changed?) from RetargetDeliveryTo.  In RecvODA you will call mEventQ->RunOnEnqueue(mTargetThread, event).  Right?

So... what if the nsIEventTarget reference to RunOrEnqueue ODAs on would be your own implementation of that interface that will be assign a real thread later, in this case somewhere from OnStartRequest.  

like (very schematically):

class MyLazyTarget : nsIEventTarget // held as HttpChannelChild { RefPtr<> mTargetThread }
{
  mTarget;
  nsresult IsOnCurrentThread(bool *res) { mTarget ? mTarget->IsOnCurrentThread(res) : *res = false; return OK }
  FORWARD_ALL_OTHER_METHODS(mTarget->)
public:
  SetTarget(aTarget) { mTarget = aTarget } // called from: RetargetDeliveryTo(t) { mTargetThread->SetTarget(t) }
};

The ChannelEventQueue class is using (calling on) the target thread in a later stage (after you are out of OnStartRequest) anyway, so I think it could solve the problem.


Also, (if not already implemented) you need to put stuff to the queue if any thread is inside of a ChannelEvent(derivate)::Run().  A re-entrant locked counter in ChannelEventQueue will be needed.  When it's non-null, RunOrEnqueue will always enqueue.

Makes sense?

> 
> This is going to cause trouble when I move content process Http to STS
> thread in bug 1338493.
> 
> nsHttpChannel/nsInputStreamPump doesn't have this issue because
> nsInputStreamPump requires explicit `AsyncWait` to obtain ODA after
> OnStartRequest, so the retarget thread can be specified at that time.

Actually, must be.

> However, we don't do pulling for receiving IPC message. The solution I can
> think of is to store the IPC message in a separate queue and process it
> after OnStartRequest. In this case we can delay the `RunOrEnqueue` until the
> retarget thread is decided.

If the nsIEventTarget wrapper idea has some serious flaw, then this could do, but separate queues sounds like something rather over complicated.
(In reply to Honza Bambas (:mayhemer) from comment #59)
> (In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment
> #58)
> > The proposed change of ChannelEventQueue in comment #55 have one
> > restriction: the producer of ChannelEvent is always on main thread. 
> > 
> > In HTTP ODA retargeting scenario, retarget thread is provided during
> > `OnStartRequest` callback, which happens on main thread. If ODA event is
> > produced on other thread, you'll not be guaranteed to have the retarget
> > thread to call `RunOrEnqueue` unless the thread is blocked until
> > `OnStartRequest` finished.
> 
> Yes!  Thanks for catching this so soon.
> 
> I can imagine the architecture will be as following:
> 
> HttpChannelChild will have a member nsIEventTarget (say, mTargetThread) that
> will be set (changed?) from RetargetDeliveryTo.  In RecvODA you will call
> mEventQ->RunOnEnqueue(mTargetThread, event).  Right?
> 
> So... what if the nsIEventTarget reference to RunOrEnqueue ODAs on would be
> your own implementation of that interface that will be assign a real thread
> later, in this case somewhere from OnStartRequest.  
> 
> like (very schematically):
> 
> class MyLazyTarget : nsIEventTarget // held as HttpChannelChild { RefPtr<>
> mTargetThread }
> {
>   mTarget;
>   nsresult IsOnCurrentThread(bool *res) { mTarget ?
> mTarget->IsOnCurrentThread(res) : *res = false; return OK }
>   FORWARD_ALL_OTHER_METHODS(mTarget->)
> public:
>   SetTarget(aTarget) { mTarget = aTarget } // called from:
> RetargetDeliveryTo(t) { mTargetThread->SetTarget(t) }
> };
> 
> The ChannelEventQueue class is using (calling on) the target thread in a
> later stage (after you are out of OnStartRequest) anyway, so I think it
> could solve the problem.
The major flaw is that all the nsIEventTarget operation on MyLazyTarget is not guaranteed to happen on the same event target object, since `SetTarget` can be called on other thread simultaneously. This will make the event queue processing more unpredictable.

We can do it the other way around: make each ChannelEvent to provide the nsIEventTarget they want. In this case, we can get the event target at the beginning of the event processing in ChannelEventQueue.h/.cpp and guarantee all the event target operations are on the same object in `RunOrEnqueue` and `FlushQueue`.
> 
> 
> Also, (if not already implemented) you need to put stuff to the queue if any
> thread is inside of a ChannelEvent(derivate)::Run().  A re-entrant locked
> counter in ChannelEventQueue will be needed.  When it's non-null,
> RunOrEnqueue will always enqueue.
Sure, will be included in my implementation.
Attached patch p2-channel-event-queue.patch (obsolete) — Splinter Review
summary of the changes:

1. make ChannelEventQueue to be able to run, enqueue, and flush on multiple threads.
2. leverage queue suspension to restrict single event execution over multiple threads.
3. allow AutoEventEnqueuer on multiple threads.
Attachment #8841550 - Attachment is obsolete: true
Attachment #8843862 - Flags: review?(honzab.moz)
Attachment #8843863 - Flags: review?(honzab.moz)
Comment on attachment 8843862 [details] [diff] [review]
p2-channel-event-queue.patch

Review of attachment 8843862 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +89,5 @@
>  
>    if (!--mSuspendCount) {
>      RefPtr<Runnable> event =
> +      NewCancelableRunnableMethod(this, &ChannelEventQueue::CompleteResume);
> +    Unused << NS_WARN_IF(NS_FAILED(NS_DispatchToCurrentThread(event.forget())));

hit bug 1345319 here because Fetch API will retarget to nsStreamTransportService, which is a thread pool.
:sc, can you please give a proper description on the patches which part is which?  like "part 1 of 4" or something.  it's hard to figure out which patch to review first, second etc.  Thanks.
Flags: needinfo?(schien)
Ah!  I see it now, you made copies in spliter :D  got it.  Thanks.
I'll be doing splinter copies (if necessary) myself, you don't need to do it.
Flags: needinfo?(schien)
Comment on attachment 8843862 [details] [diff] [review]
p2-channel-event-queue.patch

Review of attachment 8843862 [details] [diff] [review]:
-----------------------------------------------------------------

I'm a bit puzzled here.  you were talking about problems when ondata can arrive sooner than we finish onstart so that you can't pass target as an arg to RunOrEnque.  but this patch shows something else.  but the patch works according my local testing.

can you summarize what problems you are facing?  can you file any bugs and make them blocking this one of there are any issues left to fix first?

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +89,5 @@
>  
>    if (!--mSuspendCount) {
>      RefPtr<Runnable> event =
> +      NewCancelableRunnableMethod(this, &ChannelEventQueue::CompleteResume);
> +    Unused << NS_WARN_IF(NS_FAILED(NS_DispatchToCurrentThread(event.forget())));

Note that the dispatch instead of a direct call of CompleteResume has been introduced in bug 870564.

::: netwerk/ipc/ChannelEventQueue.h
@@ +63,4 @@
>    // @param aAssertionWhenNotQueued - this optional param will be used in an
>    //   assertion when the event is executed directly.
>    inline void RunOrEnqueue(ChannelEvent* aCallback,
> +                           nsIEventTarget* aTarget = nullptr,

I though you wanted to let the target hang off of the event and not pass it as an arg?

@@ +105,4 @@
>  
>    uint32_t mSuspendCount;
> +  bool mSuspended;
> +  uint32_t mForcedCount;

write a comment what this is

@@ +168,5 @@
>  ChannelEventQueue::StartForcedQueueing()
>  {
>    MutexAutoLock lock(mMutex);
>    mForced = true;
> +  ++mForcedCount;

doesn't just one flag do?  why do you need a bool and a counter?  for mSuspend/mSuspendCount it makes sense, because those are used somewhat independently.  But looking at the code, you would do with just if (!!mForcedCount) instead of if (mForced) on all places.

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +305,5 @@
>  {
>    LOG(("HttpChannelChild::RecvAssociateApplicationCache [this=%p]\n", this));
>    mEventQ->RunOrEnqueue(new AssociateApplicationCacheEvent(this, groupID,
> +                                                           clientID),
> +                        mNeckoTarget);

I think this should always end up on the main thread.

@@ +844,5 @@
>    MOZ_RELEASE_ASSERT(!mFlushedForDiversion,
>      "Should not be receiving any more callbacks from parent!");
>  
>    mEventQ->RunOrEnqueue(new StopRequestEvent(this, channelStatus, timing),
> +                        mNeckoTarget,

shouldn't this always end up on the main thread too?

@@ +1249,5 @@
>    RefPtr<HttpChannelChild> self(this);
>    Send__delete__(this);
>  
> +  // XXX
> +  mNeckoTarget = nullptr;

needs a comment.  and how do you ensure this is thread-safe thing to do?

@@ +1343,5 @@
>    mEventQ->RunOrEnqueue(new Redirect1Event(this, registrarId, newUri,
>                                             redirectFlags, responseHead,
>                                             securityInfoSerialization,
> +                                           channelId),
> +                        mNeckoTarget);

this must happen on the main thread

@@ +1498,5 @@
>  mozilla::ipc::IPCResult
>  HttpChannelChild::RecvRedirect3Complete()
>  {
>    LOG(("HttpChannelChild::RecvRedirect3Complete [this=%p]\n", this));
> +  mEventQ->RunOrEnqueue(new Redirect3Event(this), mNeckoTarget);

this as well.

@@ +2120,5 @@
>    if (dispatcher) {
>      nsCOMPtr<nsIEventTarget> target =
>        dispatcher->EventTargetFor(TaskCategory::Network);
>      gNeckoChild->SetEventTargetForActor(this, target);
> +    mNeckoTarget = target;

how are you ensuring atomicity or sequential access to this member?
Attachment #8843862 - Flags: review?(honzab.moz) → feedback+
Comment on attachment 8843863 [details] [diff] [review]
p3-retargetable-http-child.patch

Review of attachment 8843863 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +829,5 @@
>      return;
>  
>    nsresult rv = mListener->OnDataAvailable(aRequest, aContext, aStream, offset, count);
>    if (NS_FAILED(rv)) {
> +    TryCancel(rv);

could the method be named "CancelOnMainThread" ?

@@ +2993,5 @@
> +    nsCOMPtr<nsIThread> mainThread;
> +    NS_GetMainThread(getter_AddRefs(mainThread));
> +    target = mainThread.forget();
> +  }
> +  mEventQ->PrependEvents(cancelEvent, target);

should use the PrependEvent method?
Attachment #8843863 - Flags: review?(honzab.moz) → review+
Comment on attachment 8823212 [details]
Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

https://reviewboard.mozilla.org/r/101796/#review120552
Attachment #8823212 - Flags: review?(honzab.moz) → review+
Comment on attachment 8843871 [details]
Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

https://reviewboard.mozilla.org/r/117488/#review120554
Attachment #8843871 - Flags: review?(honzab.moz)
Attachment #8843862 - Attachment is obsolete: true
Attachment #8843863 - Attachment is obsolete: true
Comment on attachment 8843863 [details] [diff] [review]
p3-retargetable-http-child.patch

Review of attachment 8843863 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +829,5 @@
>      return;
>  
>    nsresult rv = mListener->OnDataAvailable(aRequest, aContext, aStream, offset, count);
>    if (NS_FAILED(rv)) {
> +    TryCancel(rv);

Sure. That makes the code more self-explanatory.

@@ +2993,5 @@
> +    nsCOMPtr<nsIThread> mainThread;
> +    NS_GetMainThread(getter_AddRefs(mainThread));
> +    target = mainThread.forget();
> +  }
> +  mEventQ->PrependEvents(cancelEvent, target);

Good catch. Totally forgot to modify this part...
Comment on attachment 8843862 [details] [diff] [review]
p2-channel-event-queue.patch

Review of attachment 8843862 [details] [diff] [review]:
-----------------------------------------------------------------

Here is the summary of issues I encountered:

1. the main-thread-producer-only limitation I mentioned in comment 58 is not going to take effect until I start working on bug 1338493 (move HttpChannel IPC from main thread to STS thread in content process). That's why my current patch works. My first step is to make sure multi-thread queue consuming works. Next step is to decide whether to 

2. the DispatchToCurrentThread-in-thread-pool issue will be triggered by current patch. Bug 1345319 is filed but I was not able to solve it by myself. In despite I'm trying to workaround it by not calling DispatchToCurrentThread in Resume.

3. While I'm trying to write the patch for the workaround, I hit the ServiceWorker+Diversion issue because the I change the timing of queue flushing for Resume. It reveals a issue that synthesized ODA/OnStopRequest is not aware by ChannelEventQueue, so the guard event `HttpFlushedForDiversionEvent` cannot guarantee all the ODA/OnStopRequest are diverted to parent. I'll explain it in more detail in next comment.

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +89,5 @@
>  
>    if (!--mSuspendCount) {
>      RefPtr<Runnable> event =
> +      NewCancelableRunnableMethod(this, &ChannelEventQueue::CompleteResume);
> +    Unused << NS_WARN_IF(NS_FAILED(NS_DispatchToCurrentThread(event.forget())));

Thanks, good to know the background of this line of code. In my local patch I tried to workaround bug 1345319 by making a direct call to CompleteResume. I'll make sure we don't hit the original multiple suspend issue.

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +305,5 @@
>  {
>    LOG(("HttpChannelChild::RecvAssociateApplicationCache [this=%p]\n", this));
>    mEventQ->RunOrEnqueue(new AssociateApplicationCacheEvent(this, groupID,
> +                                                           clientID),
> +                        mNeckoTarget);

I should have added a clear comment about mNeckoTarget. This is the event target for Network events on main thread (Quantum DOM stuff, see bug 1318506).

@@ +1249,5 @@
>    RefPtr<HttpChannelChild> self(this);
>    Send__delete__(this);
>  
> +  // XXX
> +  mNeckoTarget = nullptr;

My stupid mistake...I should restore the original comment because it is for the same purpose. mNeckoTarget is used only on main thread so no thread-safty issue (This is not the retargeted thread for ODA to be clear).
This diagram illustrates the issue I see in ServiceWorker diversion scenario.

In current implementation, mSynthesizedResponsePump->Resume() will dispatch nsInputStreamPump::OnInputStreamReady to main thread and ChannelEventQueue::Resume will also dispatch a task back to current thread to complete resume. So the synthesized ODA/OnStopRequest (which generated by OnInputStreamReady) happens to be executed before FlushedForDiversion (which called after even queue resumption). HttpChannelParent will recv ODA/OnStopRequest/DivertComplete in correct order.

If I make ChannelEventQueue not dispatching a task but call CompleteResume directly. The FlushedForDiversion will be executed first which cause DivertComplete be received before ODA/OnStopRequest.

One solution is that DivertComplete should be sent after both synthesized OnStopRequest is diverted and HttpFlushedForDiversionEvent is executed in ServiceWorker scenario. IMO this is legit because the ODA/OnStopRequest is generated from content process in this scenario, DivertComplete should only be sent after all events are populated.

Need @jdm and @mayhemer to take a look at my proposal.
Flags: needinfo?(josh)
Flags: needinfo?(honzab.moz)
Thanks for the graph!  It really helped.

OK, so you are calling CompleteResume() directly from Resume() to w/a bug 1345319.  But it is causing reorder of events.  Should we then rather invest time to fix bug 1345319?  It's a bug that should be fixed anyway.  I think it's not a good idea to remove the dispatch of CompleteResume() from Resume(), because it could cause other regressions we may not yet know about.  This is a large change already, so we should keep most of the behavior as it was before the patch, if possible.

I can help fixing bug 1345319 if you want me to.
Flags: needinfo?(honzab.moz) → needinfo?(schien)
Agreed that we should preserve the behavior as much as we can.

@mayhemer, it will help me a lot if you can take a look at bug 1345319, since I have no clue on what that patch might look like.
Flags: needinfo?(schien)
(In reply to Shih-Chiang Chien [:schien] (UTC+8) (use ni? plz) from comment #79)
> Agreed that we should preserve the behavior as much as we can.
> 
> @mayhemer, it will help me a lot if you can take a look at bug 1345319,
> since I have no clue on what that patch might look like.

Ok, I found another way to workaround bug 1345319. With the fix I mentioned in comment #60, we can now always get the correct event target and call `nsIEventTarget::Dispatch`. In this way we can avoid NS_DispatchToCurrentThread in ChannelEventQueue.
@mayhemer, you may start review my patches again. To be noticed that there are still one docshell leak/HttpChannelChild leak in browser mochitest (https://treeherder.mozilla.org/#/jobs?repo=try&revision=5da83f9fcc7f1c1a22335d8faee5540ce950deed).
It is related to enabling thread retargeting in nsHTML5StreamParser, try shows green if I manually turn off HTTP retargeting in nsHTML5StreamParser (https://treeherder.mozilla.org/#/jobs?repo=try&revision=914ef087543fd82878ca11c37923602ac050d933). I'm still trying to figure out the root cause.
Flags: needinfo?(josh)
The root cause of docshell leak/HttpChannelChild leak in browser chrome tests is that DoOnStatus/DoOnProgress cannot be run after DoOnStopRequest. OnStopRequest might trigger test case finish and will trigger test framework shutdown. The async DoOnStatus/DoOnProgress might cause the delay of CC on nsDocShell. This will result fails alarm in ShutdownLeak.

A workaround is to postpone OnStopRequest processing by resubmitting it to the end of the main thread event target, if thread retargeting is enabled. In this case we can guarantee that all the DoOnStatus/DoOnProgress is done before DoOnStopRequest. This workaround can be removed after bug 1338493 is complete because we'll always submit OnStopRequest from STS thread to main thread.
Comment on attachment 8823212 [details]
Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

part 3 should be reviewed again.
Attachment #8823212 - Flags: review+ → review?(honzab.moz)
@mayhemer, any chance to provide review result today?
Flags: needinfo?(honzab.moz)
I'll see what I can do.  Pretty busy currently.  Sorry for delays.
Flags: needinfo?(honzab.moz)
Comment on attachment 8849952 [details] [diff] [review]
(for splinter review)  Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

Review of attachment 8849952 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.h
@@ +122,5 @@
>                                  bool aAssertionWhenNotQueued)
>  {
>    MOZ_ASSERT(aCallback);
>  
> +  // Events execution could be destruction of channel (and our own

'could be a destruction of the channel'

@@ +148,5 @@
> +
> +    nsCOMPtr<nsIEventTarget> target = event->GetEventTarget();
> +    MOZ_ASSERT(target);
> +
> +    bool isCurrentThread;

init to false

@@ +155,5 @@
> +
> +    if (!isCurrentThread) {
> +      SuspendInternal();
> +      mEventQueue.AppendElement(Move(event));
> +      ResumeInternal();

add a comment explaining why you need to suspend/resume

@@ +192,5 @@
> +inline nsresult
> +ChannelEventQueue::PrependEvent(UniquePtr<ChannelEvent>& aEvent)
> +{
> +  MutexAutoLock lock(mMutex);
> +  MOZ_ASSERT(mSuspended || !!mForcedCount);

explain well this assertion

@@ +208,5 @@
>  inline nsresult
>  ChannelEventQueue::PrependEvents(nsTArray<UniquePtr<ChannelEvent>>& aEvents)
>  {
>    MutexAutoLock lock(mMutex);
> +  MOZ_ASSERT(mSuspended || !!mForcedCount);

as well here.

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +51,5 @@
> +      MOZ_ASSERT(!mFlushing);
> +      mFlushing = true;
> +    }
> +
> +    while (true) {

few objections, definitely all for followups: 

- what if an event (on this thread) suspends the queue?  I don't see it being checked.  

- Suspend from a different thread is probably OK as it won't see any events until Resume (because of repost).  still a bit racy, but probably OK to leave as is (the code was racy before this patch already).

@@ +60,5 @@
> +
> +      nsCOMPtr<nsIEventTarget> target = event->GetEventTarget();
> +      MOZ_ASSERT(target);
> +
> +      bool isCurrentThread;

init to false

@@ +62,5 @@
> +      MOZ_ASSERT(target);
> +
> +      bool isCurrentThread;
> +      DebugOnly<nsresult> rv = target->IsOnCurrentThread(&isCurrentThread);
> +      MOZ_ASSERT(NS_SUCCEEDED(rv));

hmm.. we will just loop when this fails.  could we rather DIAGNOSTIC_ASSERT(false) and either throw the event away or exec on the current thread?

@@ +87,3 @@
>    }
>  
> +  // We can only resume after completely exits flushing state.

don't type obvious comments, say why you do this.

@@ +104,5 @@
> +{
> +  mMutex.AssertCurrentThreadOwns();
> +
> +  mSuspended = true;
> +  mSuspendCount++;

nit: a small assert that mSuspended == !!mSuspendCount before you do the changes?

@@ +138,4 @@
>  
> +    nsCOMPtr<nsIEventTarget> target;
> +      target = mEventQueue[0]->GetEventTarget();
> +    MOZ_ASSERT(target);

can you add a comment why you are picking the target of the next event?

::: netwerk/protocol/ftp/FTPChannelChild.cpp
@@ +259,5 @@
>  
> +  already_AddRefed<nsIEventTarget> GetEventTarget()
> +  {
> +    return do_GetMainThread();
> +  }

aren't you missing 'override' here?  and all other places you are implementing GetEventTarget()?  maybe you could have a base class implementing this method specific to each module.  there seems to be a lot of duplicated code among the other files too.

maybe helpful could be to have 

class ChannelMainThreadEvent : public ChannelEvent 
{ 
  already_AddRefed<nsIEventTarget> GetEventTarget() override
  {
    return do_GetMainThread();
  } 
};

in ChannelEventQueue.h and derive here (and where applicable) from it instead of ChannelEvent
Comment on attachment 8850522 [details] [diff] [review]
(for splinter review) Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8850522 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +761,3 @@
>    const int64_t progressMax = mResponseHead->ContentLength();
>    const int64_t progress = offset + count;
> +  if (NS_IsMainThread()) {

blank line before this line please

this whole block needs a good comment WHY it's been done.

@@ +958,5 @@
>      SendDivertOnStopRequest(channelStatus);
>      return;
>    }
>  
> +  // In thread retargeting is enabled, there might be DoOnStatus/DoOnProgress

// If ?

are references to DoOnStatus and DoOnProgress correct?  I don't see events for them, just direct calls.  Are you rather referring to events posted by mProgressSink?

Order of OnStatus/OnProgress/OnStopRequest calls on HttpChannelChild are ensured by dispatching them via mEventQ, so they are ensured to be sequenced even among threads.

Therefor, I'm not sure how we can even get call to HttpChannelChild::OnStopRequest when there are still pending some status and progress events.  This should be made more clear.  Unless I'm missing something (can be :)) there could be a bug in the event queue logic (part 2 patch).

@@ +961,5 @@
>  
> +  // In thread retargeting is enabled, there might be DoOnStatus/DoOnProgress
> +  // sit in the main thread event queue. We need to ensure OnStopRequest is
> +  // fired after that by resubmitting the StopRequestEvent at the end of main
> +  // thread event queueu.

queueu

@@ +3134,5 @@
> +    MOZ_ASSERT(!NS_IsMainThread());
> +    MOZ_ASSERT(aChild);
> +  }
> +
> +  void Run() { mChild->Cancel(mRv); }

assert thread, if applicable.

@@ +3144,5 @@
> +    return target.forget();
> +  }
> +
> +private:
> +  HttpChannelChild* mChild;

why isn't this strong?

::: parser/html/nsHtml5StreamParser.cpp
@@ -983,5 @@
>  
>    if (NS_FAILED(rv)) {
> -    // for now skip warning if we're on child process, since we don't support
> -    // off-main thread delivery there yet.  This will change with bug 1015466
> -    if (!XRE_IsContentProcess()) {

:))
Comment on attachment 8843871 [details]
Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

https://reviewboard.mozilla.org/r/117488/#review125564

https://bugzilla.mozilla.org/show_bug.cgi?id=1320744#c103
Attachment #8843871 - Flags: review?(honzab.moz) → review+
Comment on attachment 8823212 [details]
Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

https://reviewboard.mozilla.org/r/101796/#review125568

https://bugzilla.mozilla.org/show_bug.cgi?id=1320744#c104

I really sincerely hate review board!
Attachment #8823212 - Flags: review?(honzab.moz) → review+
Attachment #8850522 - Attachment description: (for splinter review) Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild. → (for splinter review) Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.
Comment on attachment 8850522 [details] [diff] [review]
(for splinter review) Bug 1320744 - Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild.

Review of attachment 8850522 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/protocol/http/HttpChannelChild.cpp
@@ +761,3 @@
>    const int64_t progressMax = mResponseHead->ContentLength();
>    const int64_t progress = offset + count;
> +  if (NS_IsMainThread()) {

Done. Following comment was added.

// OnTransportAndData will be run on retargeted thread if applicable, however
// OnStatus/OnProgress event can only be fired on main thread. We need to
// dispatch the status/progress event handling back to main thread with the
// appropriate event target for networking.

@@ +958,5 @@
>      SendDivertOnStopRequest(channelStatus);
>      return;
>    }
>  
> +  // In thread retargeting is enabled, there might be DoOnStatus/DoOnProgress

Here I tried reference to the NS_NewRunnableFunction dispatched in OnTransportAndData (line 771), not the individual StatusEvent/ProgressEvent.

Maybe I should use "event target" instead of "event queue" to avoid the confusion.

Here is the scenario I meant:

[on main thread]
> RecvOnTransportAndData
>  - RunOrEnqueue(TransportAndDataEvent)
[switch to retargeted thread]
>   - CompleteResume / FlushQueue
>    - OnTransportAndData
>     - dispatch runnable to main thread for DoOnStatus/DoOnProgress
[switch back to main thread]
> RecvOnStopRequest
>  - RunOrEnqueue(OnStopRequestEvent)
>   - OnStopRequest
>    - call mListener->OnStopRequest
[process next runnable on main thread]
> RunnableFunction::Run
>  - DoOnStatus / DoOnProgress
>   - call mListener->OnStatus / mListener->OnProgress

My code here will pending the ChannelEventQueue processing back to the end of main thread event target. Thus all the RunnableFunction we generated in OnTransportAndData will be executed before mListener->OnStopRequest

@@ +961,5 @@
>  
> +  // In thread retargeting is enabled, there might be DoOnStatus/DoOnProgress
> +  // sit in the main thread event queue. We need to ensure OnStopRequest is
> +  // fired after that by resubmitting the StopRequestEvent at the end of main
> +  // thread event queueu.

Done.

@@ +3134,5 @@
> +    MOZ_ASSERT(!NS_IsMainThread());
> +    MOZ_ASSERT(aChild);
> +  }
> +
> +  void Run() { mChild->Cancel(mRv); }

HttpChannelChild::Cancel already have thread assertion and looks like a convention in HttpChannelChild to not add extra thread assertion (assertion in callee). Though I can always add it here.

@@ +3144,5 @@
> +    return target.forget();
> +  }
> +
> +private:
> +  HttpChannelChild* mChild;

This is a convention of the ChannelEvent to avoid reference cycle (HttpChannelChild -> ChannelEventQueue -> ChannelEvent). Otherwise HttpChannelChild dtor will never be called if there is a ChannelEvent in queue, additional ChannelEventQueue::Clear will be needed and called all over the place when we abort HttpChannelChild.
Comment on attachment 8849952 [details] [diff] [review]
(for splinter review)  Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

Review of attachment 8849952 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.h
@@ +122,5 @@
>                                  bool aAssertionWhenNotQueued)
>  {
>    MOZ_ASSERT(aCallback);
>  
> +  // Events execution could be destruction of channel (and our own

Done.

@@ +148,5 @@
> +
> +    nsCOMPtr<nsIEventTarget> target = event->GetEventTarget();
> +    MOZ_ASSERT(target);
> +
> +    bool isCurrentThread;

Done.

@@ +155,5 @@
> +
> +    if (!isCurrentThread) {
> +      SuspendInternal();
> +      mEventQueue.AppendElement(Move(event));
> +      ResumeInternal();

Here I simply reuse the code in Resume to decide next thread to run event. Here is the comment I add:

// This event cannot be run on current thread so put it in queue and
// resume it on corresponding thread.

@@ +192,5 @@
> +inline nsresult
> +ChannelEventQueue::PrependEvent(UniquePtr<ChannelEvent>& aEvent)
> +{
> +  MutexAutoLock lock(mMutex);
> +  MOZ_ASSERT(mSuspended || !!mForcedCount);

I add following comment:

// Prepending event while no queue flush foreseen might cause the following
// channel events not run. This assertion here guarantee there must be a
// queue flush, either triggered by Resume or EndForcedQueueing, to execute
// the added events.

@@ +208,5 @@
>  inline nsresult
>  ChannelEventQueue::PrependEvents(nsTArray<UniquePtr<ChannelEvent>>& aEvents)
>  {
>    MutexAutoLock lock(mMutex);
> +  MOZ_ASSERT(mSuspended || !!mForcedCount);

same as above.

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +51,5 @@
> +      MOZ_ASSERT(!mFlushing);
> +      mFlushing = true;
> +    }
> +
> +    while (true) {

Suspend during flushing is done by TakeEvent [1], nullptr will be return in this case. So, suspend from any thread will not stop the running channel event but flush will be aborted after that.

[1] https://searchfox.org/mozilla-central/rev/c48398abd9f0f074c69f2223260939e30e8f99a8/netwerk/ipc/ChannelEventQueue.cpp#23

@@ +60,5 @@
> +
> +      nsCOMPtr<nsIEventTarget> target = event->GetEventTarget();
> +      MOZ_ASSERT(target);
> +
> +      bool isCurrentThread;

Done.

@@ +62,5 @@
> +      MOZ_ASSERT(target);
> +
> +      bool isCurrentThread;
> +      DebugOnly<nsresult> rv = target->IsOnCurrentThread(&isCurrentThread);
> +      MOZ_ASSERT(NS_SUCCEEDED(rv));

throwing away that event might be problematic. I'll add a DIAGNOSTIC_ASSERT and run it on current thread.

@@ +87,3 @@
>    }
>  
> +  // We can only resume after completely exits flushing state.

Invoking Resume() directly in the while loop might make CompleteResume being executed while mFlushing is still true. In this case we'll not do FlushQueue because we think queue flushing is running. So, this obvious comment is the actual reason, you should not try ask for queue resumption during flush. I'm going to extend the comment as:

// The flush procedure is aborted because next event cannot be run on current
// thread. We need to resume the event processing right after flush procedure
// is finished.
// Note: we cannot call Resume() while "mFlushing == true" because
// CompleteResume will not trigger FlushQueue while there is an ongoing flush.

@@ +104,5 @@
> +{
> +  mMutex.AssertCurrentThreadOwns();
> +
> +  mSuspended = true;
> +  mSuspendCount++;

This assertion cannot hold in following case:

>Suspend
> - mSuspend = true, mSuspendCount = 1
>Resume
> - ResumeInternal
>  - mSuspendCount = 0;
>  - dispatch CompleteResume to another thread
>Suspend
> - true != !!0

@@ +138,4 @@
>  
> +    nsCOMPtr<nsIEventTarget> target;
> +      target = mEventQueue[0]->GetEventTarget();
> +    MOZ_ASSERT(target);

Add following comment:

// Complete the resumption on the event target for the next event so that we
// can flush the queue directly without additional event dispatching.

::: netwerk/protocol/ftp/FTPChannelChild.cpp
@@ +259,5 @@
>  
> +  already_AddRefed<nsIEventTarget> GetEventTarget()
> +  {
> +    return do_GetMainThread();
> +  }

We are doing necko event labeling right now (e.g. bug 1343750 for FTPChannel), so we'll see less and less ChannelEvent dispatching to main thread directly. Thus the value of adding ChannelMainThreadEvent will become lower by time. However I can still do that to minimize my patch.
Comment on attachment 8849952 [details] [diff] [review]
(for splinter review)  Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

Review of attachment 8849952 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.h
@@ +155,5 @@
> +
> +    if (!isCurrentThread) {
> +      SuspendInternal();
> +      mEventQueue.AppendElement(Move(event));
> +      ResumeInternal();

What I wanted was to explain why you have to bother with suspending and resuming.  This just repeats what the code does (= a useless comment).

@@ +192,5 @@
> +inline nsresult
> +ChannelEventQueue::PrependEvent(UniquePtr<ChannelEvent>& aEvent)
> +{
> +  MutexAutoLock lock(mMutex);
> +  MOZ_ASSERT(mSuspended || !!mForcedCount);

this is a good comment!  thanks.

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +51,5 @@
> +      MOZ_ASSERT(!mFlushing);
> +      mFlushing = true;
> +    }
> +
> +    while (true) {

aha!  very well hidden ;)

@@ +104,5 @@
> +{
> +  mMutex.AssertCurrentThreadOwns();
> +
> +  mSuspended = true;
> +  mSuspendCount++;

yeah, right, had to realize that.

@@ +138,4 @@
>  
> +    nsCOMPtr<nsIEventTarget> target;
> +      target = mEventQueue[0]->GetEventTarget();
> +    MOZ_ASSERT(target);

Isn't the reason just to save some loops/reposts?  this looks to me simply like a shorthand.  maybe just don't bother with a comment here, it's not anything critical to understand at the first look.

::: netwerk/protocol/ftp/FTPChannelChild.cpp
@@ +259,5 @@
>  
> +  already_AddRefed<nsIEventTarget> GetEventTarget()
> +  {
> +    return do_GetMainThread();
> +  }

OK, a good arg to not add ChannelMainThreadEvent.  but still might be nice to have a base class for each module at least.  up to you.  work is already done.
Comment on attachment 8849952 [details] [diff] [review]
(for splinter review)  Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads.

Review of attachment 8849952 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/ipc/ChannelEventQueue.h
@@ +155,5 @@
> +
> +    if (!isCurrentThread) {
> +      SuspendInternal();
> +      mEventQueue.AppendElement(Move(event));
> +      ResumeInternal();

The reason I use suspend/resume here is to avoid create another path of triggering MayFlushQueue. Do you want me to put this down in comment?

::: netwerk/ipc/ChannelEventQueue.cpp
@@ +138,4 @@
>  
> +    nsCOMPtr<nsIEventTarget> target;
> +      target = mEventQueue[0]->GetEventTarget();
> +    MOZ_ASSERT(target);

The original code will also dispatch a runnable on the target thread, so basicly we are doing the same thing.

::: netwerk/protocol/ftp/FTPChannelChild.cpp
@@ +259,5 @@
>  
> +  already_AddRefed<nsIEventTarget> GetEventTarget()
> +  {
> +    return do_GetMainThread();
> +  }

Already add MainThreadChannelEvent in my latest local patch.
Pushed by schien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/afe830d17d3e
Part 1, make refcounter of HttpChannelChild thread-safe. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/fe2d707ec202
Part 2, Allow ChannelEventQueue to perform flush on multiple threads. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/fbe57838663a
Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild. r=mayhemer
Backed out for frequently failing test_bug440572.html on Windows 8 x64 opt and pgo:

https://hg.mozilla.org/integration/autoland/rev/1181e6fcf33f7525f6bc414e83ba0a1f3caf5231
https://hg.mozilla.org/integration/autoland/rev/2994cb1c71365634983039c6dd04bea5e2b83919
https://hg.mozilla.org/integration/autoland/rev/99ef7d4e960de742234cc8e3f9a0ba62bc3f0f9b

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=fbe57838663a9eaf66067c30eb72ff6d51584a81&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=86442021&repo=autoland

12:52:39     INFO - TEST-START | dom/tests/mochitest/bugs/test_bug440572.html
12:52:39     INFO - TEST-INFO | started process screenshot
12:52:39     INFO - TEST-INFO | screenshot: exit 0
12:52:39     INFO - Buffered messages logged at 12:52:39
12:52:39     INFO - TEST-PASS | dom/tests/mochitest/bugs/test_bug440572.html | wrong sender! 
12:52:39     INFO - TEST-PASS | dom/tests/mochitest/bugs/test_bug440572.html | wrong sender! 
12:52:39     INFO - TEST-PASS | dom/tests/mochitest/bugs/test_bug440572.html | wrong sender! 
12:52:39     INFO - TEST-PASS | dom/tests/mochitest/bugs/test_bug440572.html | received the right number of messages. 
12:52:39     INFO - Buffered messages finished
12:52:39     INFO - TEST-UNEXPECTED-FAIL | dom/tests/mochitest/bugs/test_bug440572.html | test in frame failed. - got "SecurityError: Permission denied to access property \"content\" on cross-origin object", expected "success"
12:52:39     INFO -     SimpleTest.is@SimpleTest/SimpleTest.js:310:5
12:52:39     INFO -     runtests@dom/tests/mochitest/bugs/test_bug440572.html:31:3
12:52:39     INFO -     onload@dom/tests/mochitest/bugs/test_bug440572.html:1:1
12:52:39     INFO - TEST-PASS | dom/tests/mochitest/bugs/test_bug440572.html | parent["content"] should be the WebIDL property of Window. 
12:52:39     INFO - Not taking screenshot here: see the one that was previously logged
12:52:39     INFO - TEST-UNEXPECTED-FAIL | dom/tests/mochitest/bugs/test_bug440572.html | parent["dump"] should be the WebIDL property of Window. - didn't expect "success", but got it
12:52:39     INFO -     SimpleTest.isnot@SimpleTest/SimpleTest.js:322:5
12:52:39     INFO -     runtests@dom/tests/mochitest/bugs/test_bug440572.html:33:3
12:52:39     INFO -     onload@dom/tests/mochitest/bugs/test_bug440572.html:1:1
Flags: needinfo?(schien)
The javascript execution of three iframes in test_bug440572.html cannot be guaranteed. I'll try fix this test case.
Flags: needinfo?(schien)
Try looks good (100% pass for 20 run on m-e10s chunk 3 on win8 x64 opt), waiting for @mrbkap's review for the test case modification.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0633d8ae942f590f3ccfaf219701c7b4b2936c0
Comment on attachment 8851367 [details]
Bug 1320744 - Part 4, Allow iframes in test_bug440572.html to be loaded out of sequence. .

https://reviewboard.mozilla.org/r/123678/#review126548
Attachment #8851367 - Flags: review?(mrbkap) → review+
Pushed by schien@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/de9bccfe404e
Part 1, make refcounter of HttpChannelChild thread-safe. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/5edaddffcc08
Part 2, Allow ChannelEventQueue to perform flush on multiple threads. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/3774a5d55ac2
Part 3, implement nsIThreadRetargetableRequest in HttpChannelChild. r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/e2a697abd5d3
Part 4, Allow iframes in test_bug440572.html to be loaded out of sequence. r=mrbkap.
Depends on: 1351558
Depends on: 1371203
Depends on: 1401459
You need to log in before you can comment on or make changes to this bug.