Open Bug 1248889 Opened 8 years ago Updated 2 years ago

Ensure subclasses of nsIEventTarget conform the "dispatch or leak" convention

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

People

(Reporter: xidorn, Unassigned)

References

Details

Attachments

(9 files)

58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
Details
58 bytes, text/x-review-board-request
khuey
: review-
Details
58 bytes, text/x-review-board-request
khuey
: review-
Details
In bug 1186745, the behavior of nsThread is changed to always leak when dispatch fails. This is also documented in nsIEventTarget.idl. But not all of subclasses of nsIEventTarget have been changed to conform that.

In bug 1237213, I'm going to convert the existing Dispatch method to fallible, and add an infallible Dispatch method for nsIEventTarget.

In addition to the fallible & infallible Dispatch, I think a DispatchOrRelease method would also be useful for cases where it is safe to release in the caller thread if dispatch fails. However, if an implementation of nsIEventTarget doesn't follow "dispatch or leak" convention, it would a general DispatchOrRelease method could be very dangerous.
Try push doesn't show any additional leak: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6048d2cd549

The workaround in part 8 is for the compiler internal error in Hazard Analysis Build in the try push above. The try push with this workaround is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dce5ade596b4
FWIW, since the behavior change of nsThread, many event targets already have an inconsistent behavior which sometimes leak but sometimes don't. So we need to make them consistent anyway.
Comment on attachment 8722301 [details]
MozReview Request: Bug 1248889 part 8 - Make worker-related targets work properly with leak-when-failure convention.

https://reviewboard.mozilla.org/r/35957/#review32595

I can't figure out how to reassign a review request in reviewboard. But could you reassign these to khuey. He knows these APIs better.
Attachment #8722301 - Flags: review?(jonas)
Comment on attachment 8722302 [details]
MozReview Request: Bug 1248889 part 9 - Make WebSocketImpl always leak event when dispatch fails.

https://reviewboard.mozilla.org/r/35959/#review32597

Same here
Attachment #8722302 - Flags: review?(jonas)
Attachment #8722301 - Flags: review?(khuey)
Attachment #8722302 - Flags: review?(khuey)
Maybe it's because it's late, but I don't understand why providing yet another method that people have to use correctly for dispatching events is an improvement over what we have now.
(In reply to Nathan Froyd [:froydnj] from comment #14)
> Maybe it's because it's late, but I don't understand why providing yet
> another method that people have to use correctly for dispatching events is
> an improvement over what we have now.

Do you mean the DispatchOrRelease?

I think it has been clear that we want to leak the event by default when we are uncertain, since releasing object in a wrong thread could be a disaster.

However, there indeed exist cases where it is safe to release the event in the current thread (some events are actually empty, and just for unlocking the target event loop). But the leak-by-default structure makes it hard/dirty to release it after detecting a dispatch failure. So DispatchOrRelease is just a helper function for that case.
No.

There was a bunch of existing code that existed for 15 years with the "not-leaking" behavior.  It is not ok to change the behavior out from under that code without auditing it.  It is also not ok to selectively change some of the call sites to go back to the old behavior but not others, unless you've shown that these are the only call sites that depend on the old behavior.

Most of Gecko was written with the assumption that a runnable could be destroyed on either the dispatching thread or the executing thread, but that dispatching runnables would not leak.  You simply cannot change that out from underneath the entire codebase for a small bit of code written lately that wants to make the opposite tradeoff.  The "opt-in" behavior should be for the latter group.
Comment on attachment 8722302 [details]
MozReview Request: Bug 1248889 part 9 - Make WebSocketImpl always leak event when dispatch fails.

While I'm r-ing these because this approach is not the right solution to the problem, also note that I do not review patches in mozreview, and will r- any patches sent to me through mozreview.
Attachment #8722302 - Flags: review?(khuey) → review-
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> There was a bunch of existing code that existed for 15 years with the
> "not-leaking" behavior.  It is not ok to change the behavior out from under
> that code without auditing it.  It is also not ok to selectively change some
> of the call sites to go back to the old behavior but not others, unless
> you've shown that these are the only call sites that depend on the old
> behavior.

I think we've reached a consensus that making Dispatch infallible by default is the way we can take in the mailing list. And this bug is a step preparing doing that.

This bug is mainly for closing the gap of inconsistency of behavior between different event target (and within event targets after changing the behavior of nsThread).

> Most of Gecko was written with the assumption that a runnable could be
> destroyed on either the dispatching thread or the executing thread, but that
> dispatching runnables would not leak.  You simply cannot change that out
> from underneath the entire codebase for a small bit of code written lately
> that wants to make the opposite tradeoff.  The "opt-in" behavior should be
> for the latter group.

C++ doesn't have compulsory ownership check, so I don't quite believe all people keep that in mind. And any place which doesn't follow the assumption would potentially cause security issue. So dispatch-or-release really requires more careful review then dispatch-or-leak.

In addition, if we want both leaking and non-leaking dispatch, we have to leak internally and release it later. Holding the ownership of an object in callsite for leaking will cause race condition which makes it possible to eventually release the object in a wrong thread when dispatch succeeds.

So given this, I think we want a leaking internal function anyway. If you think dispatch-or-leak should be the "opt-in" one, we can consider renaming the current Dispatch to DispatchOrLeak, and DispatchOrRelease to Dispatch. But that's a different topic.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #17)
> Comment on attachment 8722302 [details]
> MozReview Request: Bug 1248889 part 9 - Make WebSocketImpl always leak event
> when dispatch fails.
> 
> While I'm r-ing these because this approach is not the right solution to the
> problem, also note that I do not review patches in mozreview, and will r-
> any patches sent to me through mozreview.

If the argument in comment 18 can convince you to start reviewing, I'll submit the patches as attachment for you to review.
Flags: needinfo?(khuey)
Comment on attachment 8722297 [details]
MozReview Request: Bug 1248889 part 4 - Make nsSocketTransportService and nsStreamTransportService leak when dispatch fails.

I'm not the key reviewer here. When you get agreement with kyle you can renew the r?, or if kyle wants to approve the callsite changes I'm fine with that too.
Attachment #8722297 - Flags: review?(mcmanus)
(In reply to Xidorn Quan [:xidorn] (UTC+8) from comment #18)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #16)
> > There was a bunch of existing code that existed for 15 years with the
> > "not-leaking" behavior.  It is not ok to change the behavior out from under
> > that code without auditing it.  It is also not ok to selectively change some
> > of the call sites to go back to the old behavior but not others, unless
> > you've shown that these are the only call sites that depend on the old
> > behavior.
> 
> I think we've reached a consensus that making Dispatch infallible by default
> is the way we can take in the mailing list. And this bug is a step preparing
> doing that.

How?  Can you spell out the steps to the desired end state here?
Flags: needinfo?(khuey) → needinfo?(quanxunzhen)
1. Make it consistent for all nsIEventTarget to follow the "dispatch or leak" behavior. (this bug)
2. Convert the current Dispatch to accept a new parameter typed fallible_t, and add another Dispatch helper function which takes the current parameters but return void, which assert or crash when dispatch fails. (bug 1237213)
3. Examine crashes due to the change in step 2 case by case, and then fix each via allowing it leak, making it dispatch-or-release, or fix the race condition causing this.
Flags: needinfo?(quanxunzhen)
Attachment #8722300 - Flags: review?(jgilbert)
Comment on attachment 8722300 [details]
MozReview Request: Bug 1248889 part 7 - Make ContextLossWorkerEventTarget leak when dispatch fails.

https://reviewboard.mozilla.org/r/35955/#review33839

::: dom/canvas/WebGLContextLossHandler.cpp:85
(Diff revision 1)
> +    nsresult rv = mEventTarget->Dispatch(wrappedEvent.forget(), aFlags);

Why can we not use DispatchOrRelease here? The code to leak+manually-release below seems more obtuse.
https://reviewboard.mozilla.org/r/35955/#review33839

> Why can we not use DispatchOrRelease here? The code to leak+manually-release below seems more obtuse.

Because we want to leak the runnable passed in. We should release that only if we return NS_OK here, otherwise if someone calls DispatchOrRelease on ContextLossWorkerEventTarget, they would get a double free.
kyle, are you convinced by the steps in comment 22?
Flags: needinfo?(khuey)
Comment on attachment 8722296 [details]
MozReview Request: Bug 1248889 part 3 - Move LeakRefPtr to mfbt.

https://reviewboard.mozilla.org/r/35947/#review35059

I don't think we want this exposed for wider consumption.
Attachment #8722296 - Flags: review?(nfroyd)
Comment on attachment 8722294 [details]
MozReview Request: Bug 1248889 part 1 - Add DispatchOrRelease method in nsIEventTarget and use it in NS_DispatchToCurrentThread and DispatchFromScript of nsThread.

https://reviewboard.mozilla.org/r/35943/#review35085

OK, I understand this patch series a little better now, and if we were to continue doing this mostly-infallible Dispatch, we would want something like this.  But you're going to have to convince Kyle first, who already wants to rip out the patches this work builds on. :)
Attachment #8722294 - Flags: review?(nfroyd)
Attachment #8722295 - Flags: review?(nfroyd)
Comment on attachment 8722295 [details]
MozReview Request: Bug 1248889 part 2 - Make nsThreadPool and SharedThreadPool leak when dispatch fails.

https://reviewboard.mozilla.org/r/35945/#review35087
Comment on attachment 8722298 [details]
MozReview Request: Bug 1248889 part 5 - Make LazyIdleThread leak when dispatch fails.

https://reviewboard.mozilla.org/r/35951/#review35089
Attachment #8722298 - Flags: review?(nfroyd)
Comment on attachment 8722299 [details]
MozReview Request: Bug 1248889 part 6 - Make TimerThread use DispathOrRelease to dispatch TimerEvent.

https://reviewboard.mozilla.org/r/35953/#review35091
Attachment #8722299 - Flags: review?(nfroyd)
(In reply to Xidorn Quan [:xidorn] (UTC+8) from comment #22)
> 1. Make it consistent for all nsIEventTarget to follow the "dispatch or
> leak" behavior. (this bug)
> 2. Convert the current Dispatch to accept a new parameter typed fallible_t,
> and add another Dispatch helper function which takes the current parameters
> but return void, which assert or crash when dispatch fails. (bug 1237213)
> 3. Examine crashes due to the change in step 2 case by case, and then fix
> each via allowing it leak, making it dispatch-or-release, or fix the race
> condition causing this.

The problem with this is that we really should be auditing all of the call sites involved rather than just waiting for problems to show up in step 3.  But it's clear to me that none of the people involved in this project give a shit about doing it correctly, so we might as well do this, since it's better than nothing.
Flags: needinfo?(khuey)
Depends on: 1276549
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.