Closed Bug 138791 Opened 23 years ago Closed 23 years ago

nsTimer does not handle interval rollover

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: emaijala+moz, Assigned: emaijala+moz)

References

Details

Attachments

(5 files, 4 obsolete files)

proposed fix to repeating precise timer overflow 23 years ago Brendan Eich [:brendan] 667 bytes, patch		Details \| Diff \| Splinter Review
Proposed fix v1 to fix the main issue and others 23 years ago Ere Maijala (slow) 6.32 KB, patch		Details \| Diff \| Splinter Review
Same patch diffed against current cvs 23 years ago Ere Maijala (slow) 6.18 KB, patch	jesup : review+	Details \| Diff \| Splinter Review
Added comment as requested by rjesup 23 years ago Ere Maijala (slow) 6.35 KB, patch		Details \| Diff \| Splinter Review
alterna-patch incorporating Ere's non-wraparound fixes 23 years ago Brendan Eich [:brendan] 4.60 KB, patch	jesup : review+ waterson : superreview+	Details \| Diff \| Splinter Review
alterna-patch with rjesup and waterson fixes 23 years ago Brendan Eich [:brendan] 4.65 KB, patch		Details \| Diff \| Splinter Review
Ere scores again: TIMER_LESS_THAN was wrong, plus the SetDelayInternal ONE_SHOT fix 23 years ago Brendan Eich [:brendan] 4.76 KB, patch	jesup : review+ waterson : superreview+ blizzard : approval+	Details \| Diff \| Splinter Review
forgot to include the diff for TimerThread.h, for the comment rjesup wanted 23 years ago Brendan Eich [:brendan] 735 bytes, patch		Details \| Diff \| Splinter Review
quick followup fix for efficiency, truth, and beauty 23 years ago Brendan Eich [:brendan] 929 bytes, patch	jesup : review+ waterson : superreview+ asa : approval+	Details \| Diff \| Splinter Review

Ere Maijala (slow)

Assignee

Description

•

23 years ago

I was debugging biff to find out why it stops working after a while and found out that if a timer is added when PR_IntervalTime()+delay is more than 0xFFFFFFFF, the timer will never fire. I'm in the process of making this work.

Ere Maijala (slow)

Assignee

Updated

•

23 years ago

Blocks: 124051

R.K.Aa.

Comment 1

•

23 years ago

*** Bug 138789 has been marked as a duplicate of this bug. ***

R.K.Aa.

Updated

•

23 years ago

Summary: nsTimer doesn't handle interval rollover → nsTimer does not handle interval rollover

Ere Maijala (slow)

Assignee

Comment 2

•

23 years ago

There seems to be another problem also. If an existing timer is given a new delay, it might be set to past. I will need to ask the author of this change why it is done the way it is (I think author is brendan@mozilla.org).

Brendan Eich [:brendan]

Comment 3

•

23 years ago

The reason NS_TYPE_REPEATING_PRECISE timers update their timeout based only on the programmed delay is so that they don't accumulate error (the skew between when nsTimerImpl::PostTimerEvent enqueues a plevent and when nsTimerImpl::Fire calls SetDelayInternal for an NS_TYPE_REPEATING_SLACK timer. Oops, a caller to SetDelay for an NS_TYPE_REPEATING_PRECISE timer should update mTimeout = now first. Patch coming up. /be

Brendan Eich [:brendan]

Comment 4

•

23 years ago

Attached patch proposed fix to repeating precise timer overflow (obsolete) — Details — Splinter Review

Does this help? I wonder if someone is using NS_TYPE_REPEATING_PRECISE when they ought to be using NS_TYPE_REPEATING_SLACK. PRECISE combined with event-loop hogging (say, by Gecko) could easily lead to timers always firing late. /be

Ere Maijala (slow)

Assignee

Comment 5

•

23 years ago

I'll try it. There's also another case: anyone who calls SetDelay() for an existing timer. There are many of these and I think it's correct from the caller to assume that calling SetDelay() is sufficient.

Ere Maijala (slow)

Assignee

Comment 6

•

23 years ago

Comment on attachment 80301 [details] [diff] [review] proposed fix to repeating precise timer overflow Note: this should fix the precise timer but not the original issue

Attachment #80301 - Attachment description: proposed fix → proposed fix to repeating precise timer overflow

Ere Maijala (slow)

Assignee

Comment 7

•

23 years ago

Why is the precise timer reinitialized in PostTimerEvent() and also in Fire()? There's also another problem in Fire(): mDelay should be converted to interval before decrementing mTimeout, right? Could this cause the delay problem seen in 117061 as milliseconds are (in Win2K at least) a lot less than same time in intervals? My suggestion would be to always set mTimeout to now + delay in SetDelayInternal() unless it is precise repeating timer and mTimeout != 0. How does that sound? Note: the changes to make timers fully handle interval rollovers are many. I'll get a patch together soon when these issues are solved.

Ere Maijala (slow)

Assignee

Comment 8

•

23 years ago

This is starting to have quite a few things that might affect 117061 also.

Blocks: 117061

Ere Maijala (slow)

Assignee

Updated

•

23 years ago

Blocks: 129953

Ere Maijala (slow)

Assignee

Comment 9

•

23 years ago

I guess it's time for me to take this..

Assignee: dougt → ere

Ere Maijala (slow)

Assignee

Comment 10

•

23 years ago

Brendan, I had to already get rid of 'if (mTimeout < now) mTimeout = 0xffffffff;' in SetDelayInternal to be able to even try handling rollover. I will create a patch today (which for me is Tuesday)

Ere Maijala (slow)

Assignee

Comment 11

•

23 years ago

Sorry, the reinitialization part of my comment #7 is bogus.

Ere Maijala (slow)

Assignee

Comment 12

•

23 years ago

Attached patch Proposed fix v1 to fix the main issue and others (obsolete) — Details — Splinter Review

This patch also incorporates Brendan's patch. I'd love to have input about it.

Attachment #80301 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Comment 13

•

23 years ago

The reason the 'if (mTimeout < now) mTimeout = 0xffffffff;' code was introduced (by me, I'm afraid) was that some web pages were setting huge refresh timeouts that overflowed the timer and led to immediate refresh (in an infinite loop). See bug 124103, which has a description of how to construct a testcase in case the original page has changed. Does this patch handle that situation gracefully? If not, we should consider moving that logic out of timers and into the docshell, where the page-refresh timer gets set....

Ere Maijala (slow)

Assignee

Comment 14

•

23 years ago

Yes, they are now handled differently as the timer thread can now handle timers that won't fire before the interval has rolled over. The problem before was that when the interval time was near the upper boundary, even a moderately short timeout could overflow it.

Ere Maijala (slow)

Assignee

Comment 15

•

23 years ago

Attached patch Same patch diffed against current cvs — Details — Splinter Review

I've been using this now for one and a half days. Feels good.

Attachment #80553 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Comment 16

•

23 years ago

Comment on attachment 80766 [details] [diff] [review] Same patch diffed against current cvs Add a 1-line comment to AddTimerInternal() documenting the return values and r=rjesup@wgate.com (I'm also going to mark this as All instead of Win2000).

Attachment #80766 - Flags: review+

Ere Maijala (slow)

Assignee

Comment 17

•

23 years ago

Attached patch Added comment as requested by rjesup — Details — Splinter Review

Thanks, comment added.

Brendan Eich [:brendan]

Comment 18

•

23 years ago

I'm sorry I got behind on my bugmail, because I've been hacking a patch for this that's a little bit simpler, although I didn't catch the lack of mDelay scaling when decreasing the timeout local for the NS_TYPE_REPEATING_PRECISE case that Ere fixed in nsTimerImpl::Fire; nor did I see the missing Notify calls that he added (pavlov!!!). I've incorporated Ere's fine fixes into my patch. My patch clamps the maximum delay at just under half the PRIntervalTime domain, so that no two timers (any two) can differ by 0x80000000 or more. Thus, to tell whether a timer t is "less than" a timer u (t's deadline comes before u's), we can test (t < u || t - u > 0x80000000). I think it's best to clamp delay at some reasonable upper bound, rather than pretend that we implement arbitrarily large delay correctly. Without a larger PRLongIntervalTime from NSPR, or a wraparound callback, we can't distinguish wrapping twice from once. Patch coming up, comments and testing appreciated. /be

Brendan Eich [:brendan]

Comment 19

•

23 years ago

Attached patch alterna-patch incorporating Ere's non-wraparound fixes (obsolete) — Details — Splinter Review

If this works as well, I favor it (not out of ego, I'm very grateful to Ere for quickly finding so many latent bugs, as well as addressing wraparound directly) because it avoids adding a member to nsTimerImpl, and because the TIMER_LESS_THAN condition is cheaper than the triple-disjunction test in Ere's patch. But I sure hope my line-number proof is correct! /be

Brendan Eich [:brendan]

Comment 20

•

23 years ago

Pav did point out yesterday, that PRIntervalTime is higher resolution than ms by 1 to 2 decimal orders (per NSPR min and max), so we could raise the upper bound on aDelay that my patch implements by storing mTimeout in ms, not in interval ticks. I didn't make that change, but I could if people think it's important for 1.0. The main thing for 1.0 is to fix testcases in dependent bugs, I say. /be

Brendan Eich [:brendan]

Comment 21

•

23 years ago

Direct mozilla1.0 RC2 dependency, and mozilla1.0+ preapproval to fix. /be

Blocks: 138000

Keywords: mozilla1.0+

Chris Waterson

Comment 22

•

23 years ago

Comment on attachment 80796 [details] [diff] [review] alterna-patch incorporating Ere's non-wraparound fixes sr=waterson. I'd make TIMER_LESS_THAN be a `static inline' fn (instead of macro) to avoid double-evaluations.

Attachment #80796 - Flags: superreview+

Randell Jesup [:jesup] (needinfo me)

Updated

•

23 years ago

Attachment #80796 - Flags: review+

Randell Jesup [:jesup] (needinfo me)

Comment 23

•

23 years ago

Comment on attachment 80796 [details] [diff] [review] alterna-patch incorporating Ere's non-wraparound fixes r=rjesup@wgate.com if a comment documenting AddTimerInternal's return values is added

Brendan Eich [:brendan]

Comment 24

•

23 years ago

Attached patch alterna-patch with rjesup and waterson fixes (obsolete) — Details — Splinter Review

I'll check this into the trunk later today if everyone's ok with it. I still crave Ere's review. /be

Brendan Eich [:brendan]

Updated

•

23 years ago

Attachment #80796 - Attachment is obsolete: true

Ere Maijala (slow)

Assignee

Comment 25

•

23 years ago

This is interesting. I wrote a comment before, but I've no idea where it went. A A few things: I strongly suggest we also take my change from SetDelayInternal() where mTimeout is always set unless it is a precise timer and mTimeout is already set. i.e.: - if (mTimeout == 0 || mType == NS_TYPE_REPEATING_SLACK) - mTimeout = now; + if (mTimeout == 0 || mType != NS_TYPE_REPEATING_PRECISE) + mTimeout = now; While debugging the code (I've used a good amount of time in that :) I saw underflows before this change. Another problem is that with this code it is still possible for a timer near the 0xFFFFFFFF boundary to not fire, right? If PR_IntervalNow() rolls over before the thread checks for this, it's too late and the timer is left dangling. This will kill biff and others (if there are such) that rely on a timer firing. About resolution, the smallest interval resolution is 1000 ticks per second. Isn't that exactly the same as milliseconds?

Brendan Eich [:brendan]

Comment 26

•

23 years ago

Attached patch Ere scores again: TIMER_LESS_THAN was wrong, plus the SetDelayInternal ONE_SHOT fix — Details — Splinter Review

Wow, I was not thinking clearly. Regarding the 1ms minimum resolution of PRIntervalTime, Ere is also right -- I was going by pav's whiteboard numbers, which were 10000 and 100000 -- should be 1000 and 100000 ticks/sec. So we should store mTimeout in ms, always, but I'll leave that for the 1.1alpha trunk. Resoliciting r= and sr=. /be

Attachment #80808 - Attachment is obsolete: true

Chris Waterson

Comment 27

•

23 years ago

Comment on attachment 80842 [details] [diff] [review] Ere scores again: TIMER_LESS_THAN was wrong, plus the SetDelayInternal ONE_SHOT fix sr=waterson

Attachment #80842 - Flags: superreview+

Randell Jesup [:jesup] (needinfo me)

Comment 28

•

23 years ago

Comment on attachment 80842 [details] [diff] [review] Ere scores again: TIMER_LESS_THAN was wrong, plus the SetDelayInternal ONE_SHOT fix Add that comment documenting the return values from AddTimerInternal() for real this time, and r=rjesup@wgate.com Good catch by ere

Attachment #80842 - Flags: review+

Christopher Blizzard (:blizzard)

Comment 29

•

23 years ago

Comment on attachment 80842 [details] [diff] [review] Ere scores again: TIMER_LESS_THAN was wrong, plus the SetDelayInternal ONE_SHOT fix a=blizzard on behalf of drivers for the 1.0 branch

Attachment #80842 - Flags: approval+

Brendan Eich [:brendan]

Comment 30

•

23 years ago

Attached patch forgot to include the diff for TimerThread.h, for the comment rjesup wanted — Details — Splinter Review

Just a comment change, nothing else to see. /be

Brendan Eich [:brendan]

Comment 31

•

23 years ago

Checked into trunk and 1.0 branch. Ere, can you mark VERIFIED once you've verified with biff? Many thanks, /be

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

Ere Maijala (slow)

Assignee

Comment 32

•

23 years ago

I've run my own build for two days without problems. I'll run the trunk build overnight and verify then.

Asa Dotzler [:asa]

Comment 33

•

23 years ago

adding branch resolution keyword.

Keywords: fixed1.0.0

Ere Maijala (slow)

Assignee

Comment 34

•

23 years ago

Marking verified. Of course this is difficult to verify 100% but it has been working for several days now.

Status: RESOLVED → VERIFIED

neil@parkwaycc.co.uk

Comment 35

•

23 years ago

Just a nitpick: u - t < DELAY_INTERVAL_LIMIT (u <> t) is equivalent to ~(u - t) > ~DELAY_INTERVAL_LIMIT (u <> t) is equivalent to (t - u - 1) > (DELAY_INTERVAL_LIMIT - 1) (u <> t) is equivalent to t - u > DELAY_INTERVAL_LIMIT (u <> t) Since u - t < DELAY_INTERVAL_LIMIT is never evaluated when u == t the condition is unnecessary and t - u > DELAY_INTERVAL_LIMIT can be used in all cases.

Ere Maijala (slow)

Assignee

Comment 36

•

23 years ago

I think that's not true. Say, if t=0xfffffff0 and u=1, TIMER_LESS_THAN must return false (current interval rolled over before we managed to trigger the timer at 0xfffffff0). Note that t can be > DELAY_INTERVAL_LIMIT and often is.

David :Bienvenu

Comment 37

•

23 years ago

*** Bug 127693 has been marked as a duplicate of this bug. ***

neil@parkwaycc.co.uk

Comment 38

•

23 years ago

> return (t < u) > ? u - t < DELAY_INTERVAL_LIMIT > : t - u > DELAY_INTERVAL_LIMIT; Ere, when t=0xfffffff0 and u=1 we don't have t < u so this calculates t - u > DELAY_INTERVAL_LIMIT anyway.

Ere Maijala (slow)

Assignee

Comment 39

•

23 years ago

Crap, I had the variables mixed. New try: Say, if t=1 and u=0xfffffff0, TIMER_LESS_THAN must return false (current interval rolled over before we managed to trigger the timer at 0xfffffff0).

neil@parkwaycc.co.uk

Comment 40

•

23 years ago

Ere, when t=1 and u=0xfffffff0 we have t < u, so you then calculate u - t = 0xffffffef which is not < DELAY_INTERVAL_LIMIT as you say. I say calculate t - u = 0x11 which is not > DELAY_INTERVAL_LIMIT thus the answer is still correct. This might be easier to understand if t and u were integers. Assume t = (int)t64 and u = (int)u64. We want abs(t64-u64) < DELAY_INTERVAL_LIMIT by definition of wrap. Then if t64 < u64 then t64-u64 < 0 and (int)(t64-u64) < 0 and (int)t64-(int)u64 < 0 thus t - u < 0!

Ere Maijala (slow)

Assignee

Comment 41

•

23 years ago

Right you are, it seems. There was a problem with the original one, and the latest version seemed to work in all cases, so I didn't even think that it might have been even easier. Additionally, in comments #36 and #39 I had a bad thinko.. Should it be fixed? Brendan?

Ere Maijala (slow)

Assignee

Comment 42

•

23 years ago

*** Bug 141131 has been marked as a duplicate of this bug. ***

Brendan Eich [:brendan]

Comment 43

•

23 years ago

neil: thanks, ere and I were cross-eyed from the case analysis, and missed the neat fact that ~D == D - 1 if D is 0x80000000 and represented by an unsigned 32 bit type. I'll fix fast with drivers' blessing. /be

Brendan Eich [:brendan]

Comment 44

•

23 years ago

Attached patch quick followup fix for efficiency, truth, and beauty — Details — Splinter Review

Looking to get this into 1.0 and the trunk. /be

Brendan Eich [:brendan]

Comment 45

•

23 years ago

Not gonna reopen, because not broken -- just less efficient than the fix we want. /be

OS: Windows 2000 → All

Priority: -- → P1

Hardware: PC → All

Target Milestone: --- → mozilla1.0

Randell Jesup [:jesup] (needinfo me)

Comment 46

•

23 years ago

Comment on attachment 81911 [details] [diff] [review] quick followup fix for efficiency, truth, and beauty r=rjesup@wgate.com

Attachment #81911 - Flags: review+

Chris Waterson

Comment 47

•

23 years ago

Comment on attachment 81911 [details] [diff] [review] quick followup fix for efficiency, truth, and beauty sr=waterson

Attachment #81911 - Flags: superreview+

Asa Dotzler [:asa]

Comment 48

•

23 years ago

Comment on attachment 81911 [details] [diff] [review] quick followup fix for efficiency, truth, and beauty a=asa (on behalf of drivers) for checkin to the 1.0 branch

Attachment #81911 - Flags: approval+

Brendan Eich [:brendan]

Comment 49

•

23 years ago

Followup fix is in branch and trunk. Thanks, neil@parkwaycc.co.uk. /be

Brendan Eich [:brendan]

Comment 50

•

21 years ago

See bug 252324, which was left in the wake of this bug's patches. /be

You need to log in before you can comment on or make changes to this bug.