Closed Bug 252324 Opened 20 years ago Closed 20 years ago

timers with very long delays block timer queue

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8alpha4

People

(Reporter: bryner, Assigned: brendan)

References

Details

(Keywords: fixed-aviary1.0, fixed1.7.5)

Attachments

(2 files)

We recently ran into a situation where an nsITimer was created with a delay of 604,800,000 ms (that's 7 days). Often this timer would end up at the head of the timer queue instead of at the end, which blocked any more timers from firing until the app is shut down. Ideally we'd be able to deal with a timer like this, but if not, it should be documented and initializing the timer should fail.
I believe I am also experiencing this same problem due to Calendar timers being very long. I think I've identified the problem. In TimerThread::AddTimerInternal(), the TIMER_LESS_THAN test returns true if the timer at the front of the queue has gone past it's delay and the timer being added is DELAY_INTERVAL_MAX. This results in a difference larger than DELAY_INTERVAL_LIMIT and thus TIMER_LESS_THAN returns true. I noticed that in TimerThread::Run(), the use of TIMER_LESS_THAN is always negated. So maybe a fix might be to swap the order of the timers being compared and negating the return value. The code would then change from: if (TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { to: if (!TIMER_LESS_THAN(timer->mTimeout, aTimer->mTimeout)) { Another fix might be to first make sure the timer hasn't expired yet, and if it has then skip to the next timer. That code would be something like this: if (TIMER_LESS_THAN(now, timer->mTimeout) && TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { TIMER_LESS_THAN was created as the fix for bug 138791. That bug report might contain some useful information that could help in solving this problem.
Additional testing has convinced me that my hypothesis is correct. (See comment #1 for details.) I have also reproduced this on RedHat 8.0, Win2K, and WinNT. A calendar event 24 days in the future on RedHat and 5 days in the future for both Win2K and WinNT almost always reproduced the problem for me. I did have logging enabled and was using a debug build, so it's possible that a release build cannot reproduce this as reliably. In TimerThread::AddTimerInternal(), I have changed: if (TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { to: if (TIMER_LESS_THAN(now, timer->mTimeout) && TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { in order to skip over any timers that are past due. This appears to solve the problem.
Woohoo! Thanks, lcook. Taking bug for aviary1.0 and 1.7.x release fixage. /be
Assignee: pavlov → brendan
Flags: blocking1.7.x+
Flags: blocking-aviary1.0+
You're welcome. Thanks for checking in the fix. Can you also get it into 1.8a4, or is that implied by getting it into the branches?
For completeness, here's a patch aginst the trunk.
I had a cosmetic cleanup patch hanging around in my trunk tree, it's trivially safe (comment and spacing changes only). /be
Comment on attachment 159527 [details] [diff] [review] Larry's patch merged with combined whitespace/comment/brace cleanup r+sr=me. /be
Attachment #159527 - Flags: superreview+
Attachment #159527 - Flags: review+
Attachment #159527 - Flags: approval1.7.x?
Attachment #159527 - Flags: approval-aviary?
Fixed on the trunk; waiting for branch approvals. /be
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → mozilla1.8alpha4
Comment on attachment 159527 [details] [diff] [review] Larry's patch merged with combined whitespace/comment/brace cleanup a=dveditz for 1.7.x branch
Attachment #159527 - Flags: approval1.7.x? → approval1.7.x+
Comment on attachment 159527 [details] [diff] [review] Larry's patch merged with combined whitespace/comment/brace cleanup a=asa for aviary checkin.
Attachment #159527 - Flags: approval-aviary? → approval-aviary+
Comment on attachment 159527 [details] [diff] [review] Larry's patch merged with combined whitespace/comment/brace cleanup a=chofmann for aviary branch
bz points out a buglet: we should add mTimeoutAdjustment to timer->mTimeout when comparing to now in AddTimerInternal: if (TIMER_LESS_THAN(now, timer->mTimeout) && TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { TIMER_LESS_THAN requires that its two operands differ by less than half the domain of a PRIntervalTime, 2^31. But the two operands must be translated using the same origin. All mTimeouts are so translated, but the current interval-time |now| must be translated by -mTimeoutAdjustment (note that positive timeout adjustment means earlier, negative means later) to have the same origin as any mTimeout. So the above should be: if (TIMER_LESS_THAN(now, timer->mTimeout + mTimeoutAdjustment) && TIMER_LESS_THAN(aTimer->mTimeout, timer->mTimeout)) { bz found this as we were looking into bug 257454 and chatting over IRC. For this bug, the above change should not matter, because this bug is about the case where aTimer->mTimeout is now+0x7fffffff and timer->mTimeout is (now - slop) so TIMER_LESS_THAN computes 0x7fffffff + slop > 0x80000000 and mTimeoutAdjustment does is increase slop slightly. I'd like to get this fix in too. lcook, bz, anyone see something wrong with the above analysis? /be
OS: Windows 2000 → All
Hardware: PC → All
I could be wrong, but I think your proposed change of adding the adjustment will reintroduce the problem I tried to fix. The point of adding the comparison with 'now' was to determine if a timer is past due and therefore should not be compared with the timer being added, thus forcing the new timer to be added after ones that are past due. Adding the adjustment, if it's a positive value, will allow a past due timer to be compared with the timer being set. And if the timer being set is 0x7fffffff, we now have the original problem where the new timer gets put on the front of the timer queue.
Oops, lcook is right again -- the problem is not 0x7fffffff or any magic value (mTimeout wraps around 25 days, we care only about uint32 subtraction here). The problem is that AddTimerInternal is being called with aTimer->mTimeout = now + delay, where delay <= 0x7fffffff. That means we should not try to translate now to the mTimeoutAdjustment-biased axis when comparing now and timer->mTimeout. Another reason not to: mTimeoutAdjustment as sampled in AddTimerInternal differs from the updated value used by a subsequent TimerThread::Run call. That last point is obvious. The first is subtle, and not really clear just from looking at the code -- you have to know how AddTimerInternal's callers compute aTimer->mTimeout. I'm still thinking there's a hole for TYPE_REPEATING_PRECISE timers, but I need to think about it tomorrow. Too tired now. I made sure the branches and trunk use TIMER_LESS_THAN(now, timer->mTimeout), not bogo-adjusted time. Fixed on trunk and branches. /be
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 259053 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: