Closed Bug 252324 Opened 20 years ago Closed 20 years ago

timers with very long delays block timer queue

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8alpha4

People

(Reporter: bryner, Assigned: brendan)

References

Details

(Keywords: fixed-aviary1.0, fixed1.7.5)

Attachments

(2 files)

We recently ran into a situation where an nsITimer was created with a delay of
604,800,000 ms (that's 7 days).  Often this timer would end up at the head of
the timer queue instead of at the end, which blocked any more timers from firing
until the app is shut down.

Ideally we'd be able to deal with a timer like this, but if not, it should be
documented and initializing the timer should fail.
I believe I am also experiencing this same problem due to Calendar timers being
very long.

I think I've identified the problem.  In TimerThread::AddTimerInternal(), the
TIMER_LESS_THAN test returns true if the timer at the front of the queue has
gone past it's delay and the timer being added is DELAY_INTERVAL_MAX.  This
results in a difference larger than DELAY_INTERVAL_LIMIT and thus
TIMER_LESS_THAN returns true.

I noticed that in TimerThread::Run(), the use of TIMER_LESS_THAN is always
negated.  So maybe a fix might be to swap the order of the timers being compared
and negating the return value.  The code would then change from:

     if (TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

to:
     if (!TIMER_LESS_THAN(timer->mTimeout, aTimer->mTimeout)) {

Another fix might be to first make sure the timer hasn't expired yet, and if it
has then skip to the next timer.  That code would be something like this:

     if (TIMER_LESS_THAN(now, timer->mTimeout) &&
         TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

TIMER_LESS_THAN was created as the fix for bug 138791.  That bug report might
contain some useful information that could help in solving this problem.
Additional testing has convinced me that my hypothesis is correct.  (See comment
#1 for details.)  I have also reproduced this on RedHat 8.0, Win2K, and WinNT. 
A calendar event 24 days in the future on RedHat and 5 days in the future for
both Win2K and WinNT almost always reproduced the problem for me.  I did have
logging enabled and was using a debug build, so it's possible that a release
build cannot reproduce this as reliably.

In TimerThread::AddTimerInternal(), I have changed:

      if (TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

to:
      if (TIMER_LESS_THAN(now, timer->mTimeout) &&
          TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

in order to skip over any timers that are past due.  This appears to solve the
problem.
Woohoo!  Thanks, lcook.  Taking bug for aviary1.0 and 1.7.x release fixage.

/be
Assignee: pavlov → brendan
Flags: blocking1.7.x+
Flags: blocking-aviary1.0+
You're welcome.  Thanks for checking in the fix.  Can you also get it into
1.8a4, or is that implied by getting it into the branches?
For completeness, here's a patch aginst the trunk.
I had a cosmetic cleanup patch hanging around in my trunk tree, it's trivially
safe (comment and spacing changes only).

/be
Comment on attachment 159527 [details] [diff] [review]
Larry's patch merged with combined whitespace/comment/brace cleanup

r+sr=me.

/be
Attachment #159527 - Flags: superreview+
Attachment #159527 - Flags: review+
Attachment #159527 - Flags: approval1.7.x?
Attachment #159527 - Flags: approval-aviary?
Fixed on the trunk; waiting for branch approvals.

/be
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.8alpha4
Comment on attachment 159527 [details] [diff] [review]
Larry's patch merged with combined whitespace/comment/brace cleanup

a=dveditz for 1.7.x branch
Attachment #159527 - Flags: approval1.7.x? → approval1.7.x+
Comment on attachment 159527 [details] [diff] [review]
Larry's patch merged with combined whitespace/comment/brace cleanup

a=asa for aviary checkin.
Attachment #159527 - Flags: approval-aviary? → approval-aviary+
Comment on attachment 159527 [details] [diff] [review]
Larry's patch merged with combined whitespace/comment/brace cleanup

a=chofmann for aviary branch
bz points out a buglet: we should add mTimeoutAdjustment to timer->mTimeout when
comparing to now in AddTimerInternal:

    if (TIMER_LESS_THAN(now, timer->mTimeout) &&
        TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

TIMER_LESS_THAN requires that its two operands differ by less than half the
domain of a PRIntervalTime, 2^31.  But the two operands must be translated using
the same origin.  All mTimeouts are so translated, but the current interval-time
|now| must be translated by -mTimeoutAdjustment (note that positive timeout
adjustment means earlier, negative means later) to have the same origin as any
mTimeout.  So the above should be:

    if (TIMER_LESS_THAN(now, timer->mTimeout + mTimeoutAdjustment) &&
        TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) {

bz found this as we were looking into bug 257454 and chatting over IRC.

For this bug, the above change should not matter, because this bug is about the
case where

  aTimer->mTimeout is now+0x7fffffff and timer->mTimeout is (now - slop)

so TIMER_LESS_THAN computes

  0x7fffffff + slop > 0x80000000

and mTimeoutAdjustment does is increase slop slightly.

I'd like to get this fix in too.  lcook, bz, anyone see something wrong with the
above analysis?

/be
OS: Windows 2000 → All
Hardware: PC → All
I could be wrong, but I think your proposed change of adding the adjustment will
reintroduce the problem I tried to fix.  The point of adding the comparison with
'now' was to determine if a timer is past due and therefore should not be
compared with the timer being added, thus forcing the new timer to be added
after ones that are past due.  Adding the adjustment, if it's a positive value,
will allow a past due timer to be compared with the timer being set.  And if the
timer being set is 0x7fffffff, we now have the original problem where the new
timer gets put on the front of the timer queue.
Oops, lcook is right again -- the problem is not 0x7fffffff or any magic value
(mTimeout wraps around 25 days, we care only about uint32 subtraction here). 
The problem is that AddTimerInternal is being called with aTimer->mTimeout = now
+ delay, where delay <= 0x7fffffff.  That means we should not try to translate
now to the mTimeoutAdjustment-biased axis when comparing now and timer->mTimeout.

Another reason not to: mTimeoutAdjustment as sampled in AddTimerInternal differs
from the updated value used by a subsequent TimerThread::Run call.

That last point is obvious.  The first is subtle, and not really clear just from
looking at the code -- you have to know how AddTimerInternal's callers compute
aTimer->mTimeout.

I'm still thinking there's a hole for TYPE_REPEATING_PRECISE timers, but I need
to think about it tomorrow.  Too tired now.

I made sure the branches and trunk use TIMER_LESS_THAN(now, timer->mTimeout),
not bogo-adjusted time.  Fixed on trunk and branches.

/be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 259053 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: