Closed Bug 330128 Opened 18 years ago Closed 17 years ago

Calling cancel() on a timer doesn't drop ref to the callback

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9beta1

People

(Reporter: bzbarsky, Assigned: sayrer)

References

Details

(Keywords: memory-leak, top-memory-leak)

Attachments

(2 files, 4 obsolete files)

As far as I can see, XPCOM timers don't drop their ref to the callback unless they are reinited or destroyed... I think that they should do it in the following cases:

1)  When cancel() is called.
2)  After firing, if they're ONE_SHOT.

Otherwise callers have to explicitly drop refs to the timer to avoid leaking...
This sounds like a good plan to me, although it might upset some existing callers that expect the timer to hold on to them.
Sayre, would you have the cycles (pun intended) to look at this?
Assignee: nobody → sayrer
Flags: blocking1.9?
Attached patch drop references [WIP] (obsolete) — Splinter Review
This gets through mochitest without crashing, but I think it broke a couple of prototype.js and scriptaculous testcases.
Attached patch drop references [WIP] (obsolete) — Splinter Review
Attachment #285136 - Attachment is obsolete: true
(In reply to comment #4)
> 
> This gets through mochitest without crashing, but I think it broke a couple of
> prototype.js and scriptaculous testcases.

Dropping callbacks from one shot timers in this patch breaks testPeriodicalExecuterStop in prototype.js for sure.

Thats' very interesting.  The DOM setInterval code (which is what that test seems to exercise to me) uses nsITimer::InitWithFuncCallback and reinits after every firing.  So this patch shouldn't affect it....
Attached file somewhat reduced test
(In reply to comment #7)
> Thats' very interesting.  The DOM setInterval code (which is what that test
> seems to exercise to me) uses nsITimer::InitWithFuncCallback and reinits after
> every firing.  So this patch shouldn't affect it....
> 

If the call to ReleaseCallback() is included, the order of events changes. The third call to peEventFired happens, but after the setTimeout call fires.
So in other words the interval timer no longer fires every 100ms?
(In reply to comment #10)
> So in other words the interval timer no longer fires every 100ms?
> 

Here's a detailed description. It looks like nsGlobalWindow is depending on some combination of its timer list and the TimerImpl to keep timers alive. If I take out the setTimeout call, the setInterval call only fires once. nsGlobalWindow::SetTimeoutOrInterval uses one shot timers:

    rv = timeout->mTimer->InitWithFuncCallback(TimerCallback, timeout,
                                               realInterval,
                                               nsITimer::TYPE_ONE_SHOT);

The firing sequence should be something like

100ms -- fire interval
200ms -- fire interval
300ms -- fire interval
...
1000ms -- fire setTimeout

I tried lengthening the setTimeout, and it made no difference. With a setTimeout(2000) call included, the firing sequence is something like

100ms -- fire
...
2000ms -- fire interval
2000ms -- fire setTimeout
2000ms -- fire interval

without the setTimeout call, I get

100ms -- fire
Oh, I see what's going on here.  The nsGlobalWindow code reinits the timer from inside the call into it.  In other words, the call sequence is:

  nsTimerImpl::Fire
    nsGlobalWindow::TimerCallback
      nsGlobalWindow::RunTimeout
        nsTimerImpl::InitWithFuncCallback
    nsTimerImpl::ReleaseCallback

So the ReleaseCallback at the end of Fire() nulls clears out the reiniting (or more importantly clears out the type-setting) that happened during the InitWithFuncCallback call earlier in Fire().  Then when the timer actually fires its type is unknown, so it does nothing.  Adding a setTimeout() "helps" because then we run all the timeouts/intervals that should have come before it.

I think it makes sense for one-shot timers to support this reinit pattern.  Furthermore, if this were not using InitWithFuncCallback the existing timer code could really screw us over: if using InitWithCallback(), say, a reinit of a one-shot timer from under Fire() will cause it to drop the ref to the object it's making a call on (a violation of XPCOM policy).

The right thing to do her might be to have Fire() make a copy of mCallback and mCallbackType, then ReleaseCallback(), _then_ actually make the callback, then if its type is still UNKNOWN and it's not one-shot restore mCallback and mCallbackType, and then release the local copies.  Or something like that.  That would allow all timers to be reinited while firing.
Attached patch drop references [WIP] (obsolete) — Splinter Review
OK, the problem was reseting CALLBACK_TYPE_FUNC callbacks in ReleaseCallback() to CALLBACK_TYPE_UNKNOWN. This patch is still pretty shady, but it no longer fails the test. I think we should add some explicit state to track initialization.
Attachment #285137 - Attachment is obsolete: true
Comment on attachment 285164 [details] [diff] [review]
drop references [WIP]

bz's analysis is correct, marking obsolete.
Attachment #285164 - Attachment is obsolete: true
Oh I do hate nsTimerImpl.cpp. Sayrer, you got this one? I have guilt and blame because I patched up thread safety after pavlov landed this XPCOM timer stuff way back when, but I would love to duck. I'll review FWIW.

/be
(In reply to comment #15)
> Oh I do hate nsTimerImpl.cpp. Sayrer, you got this one?

Yeah, I'll deal with it.
Attached patch allow reinitialization (obsolete) — Splinter Review
Attachment #285421 - Flags: review?(brendan)
Comment on attachment 285421 [details] [diff] [review]
allow reinitialization


>+  // Handle callbacks that re-init the timer, but avoid leaking
>+  // see bug 330128

Nit only: prevailing style uses full stop at end of sentences, capitalizes sentences.

>+  CallbackUnion callback = mCallback;
>+  PRUint8 callbackType = mCallbackType;

Old-school C lusers like me would use PRUintn callbackType, to avoid any chop-to-byte or pad-to-word hazards in comparisons, but that might actually make for fatter code. It shouldn't hurt to use the same type as the member, but if it does, PRUintn should fix that hurt.

>-      mCallback.o->Observe(static_cast<nsITimer*>(this),
>+      callback.o->Observe(static_cast<nsITimer*>(this),
>                            NS_TIMER_CALLBACK_TOPIC,
>                            nsnull);

Bonus points for unrelated, underhanging indentation fix here.

Nice work, thanks for doing this.

/be
Attachment #285421 - Flags: review?(brendan) → review+
s/unrelated, //

/be
Attachment #285421 - Attachment is obsolete: true
Attachment #285433 - Flags: superreview?(bzbarsky)
Comment on attachment 285433 [details] [diff] [review]
address brendan's comments

sr=bzbarsky
Attachment #285433 - Flags: superreview?(bzbarsky) → superreview+
Attachment #285433 - Flags: approval1.9?
Attachment #285433 - Flags: approval1.9? → approval1.9+
Attachment #285433 - Flags: approval1.9+ → approval1.9?
Comment on attachment 285433 [details] [diff] [review]
address brendan's comments

I don't think there's a prohibition on reviewer being approver. If I missed a memo, please point me to it.

/be
Attachment #285433 - Flags: approval1.9? → approval1.9+
Comment on attachment 285433 [details] [diff] [review]
address brendan's comments

No, but the tree rules are currently that only endgame drivers can give approval :(

Sayre, do you really feel safe enough about this patch that it can be shipped in the beta with just a few days of testing? I'm slightly worried, though only because there are so many users of timers.
Attachment #285433 - Flags: approval1.9+ → approval1.9?
(In reply to comment #23)
> (From update of attachment 285433 [details] [diff] [review])
> No, but the tree rules are currently that only endgame drivers can give
> approval :(
> 
> Sayre, do you really feel safe enough about this patch that it can be shipped
> in the beta with just a few days of testing?

It is pretty isolated, it will be easy to back out if we find it causes problems, and we also don't want to ship leaks in the beta.
The patch has good context. The first hunk is the leak fix. The second, big one is doing manual ref-counting but there are no returns or other non-local jumps -- it is all straight line code and you can see the counting balances once you figure out the logic. The third hunk contains a crucial fix to reset the callback type to unknown, and give the union a name so it can be used when declaring the local in the second hunk to save the current callback.

What's not to love? Seriously, this is about as easy to analyze a patch as you are likely to get in nsTimerImpl.cpp.

/be
I'm not worried about bugs in the patch, I'm worried about any of the numerous clients of timers behaving awkwardly due to this change. For example if someone happens to rely on the timer keeping them alive past the timer firing.
Thinking about it more it seems pretty unlikely that that'll happen. I'm slightly worried still that someone will get .callback after the timer has fired. It would be great if you could check on that.

Other than that this patch looks fine to me for beta.
So can I check this in to hg.mozilla.org?
If you checked the .callback users I think we should take this on trunk. Sorry if I wasn't clear on that before.
Yep, I'll check it in to CVS whenever and endgame-driver approves it.
Oh for crying out loud -- approve, already, or I will (again, and don't clear my approval if I do).

/be
+'ing this.  It's requesting approval as well.  End game drivers will go through blockers that are requesting approval and we will approve this for the beta.
Flags: blocking1.9? → blocking1.9+
Comment on attachment 285433 [details] [diff] [review]
address brendan's comments

a=endgame drivers before brendan could carry out his threat ;)
Attachment #285433 - Flags: approval1.9? → approval1.9+
(that's approval for M9 landing, to be clear!)
Target Milestone: --- → mozilla1.9 M9
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Depends on: 401137
I don't think this should go on branch for the reason noted in comment 1 (especially as applied to extensions).
Given that we haven't had a single caller rely on that in our internal code, i'm not that worried that extensions will.
I'm very interested in this patch. However I found a couple of bugs with it:
1) This does not work for ONE_SHOT timers
2) This does not work when cancel() is called when the callback is fired.

In order to fix those replace in Fire()
if (mCallbackType == CALLBACK_TYPE_UNKNOWN && callbackType != TYPE_ONE_SHOT) {

with

if (mCallbackType == CALLBACK_TYPE_UNKNOWN && mType != TYPE_ONE_SHOT && !mCanceled) {
> 1) This does not work for ONE_SHOT timers

Uh... how so?

> 2) This does not work when cancel() is called when the callback is fired.

Indeed.  We should fix that.
The reason the callback is not always released for ONE_SHOT timers is because mType should be compared to TYPE_ONE_SHOT instead of comparing callbackType to TYPE_ONE_SHOT (like comparing apples and oranges)
(In reply to comment #39)
> The reason the callback is not always released for ONE_SHOT timers is because
> mType should be compared to TYPE_ONE_SHOT instead of comparing callbackType to
> TYPE_ONE_SHOT (like comparing apples and oranges)

Doh, typo. I'll run these bug fixes through the test suite.
Depends on: 407201
Jonas: help us out here with reasons for your nomination for 1.8.1.x

We already know 1.8 is leaky, and FF3 fixes tons and tons of other leaks, so it'd have to be a huge leak and a very safe patch to be worth considering. Also given there appear to be a couple regressions linked to this bug, are there others? Are you or Sayre willing to write up a branch patch that includes the regression fixes?
Flags: blocking1.8.1.15?
Don't recall, probably happened by mistake.
Flags: wanted1.8.1.x?
OK, then I think we don't want this on the branch. A known leak that's not particularly drawing complaints is better than risking a use-after-free crash if we get too aggressive. For people worried about memory use they should upgrade to FF3.
Flags: blocking1.8.1.15? → blocking1.8.1.15-
Blocks: 472258
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: