Last Comment Bug 318419 - window.setTimeout() goes crazy periodically - functions are immediately called repeatedly
: window.setTimeout() goes crazy periodically - functions are immediately calle...
Status: RESOLVED FIXED
[rft-dl]
: fixed1.8.1, testcase, verified1.8.0.2
Product: Core
Classification: Components
Component: DOM: Core & HTML (show other bugs)
: 1.8 Branch
: All All
: -- major with 5 votes (vote)
: ---
Assigned To: Johnny Stenback (:jst, jst@mozilla.com)
: Hixie (not reading bugmail)
:
Mentors:
http://bonsai.mozilla.org/cvsblame.cg...
: 306759 316491 (view as bug list)
Depends on: 291386
Blocks: 319219
  Show dependency treegraph
 
Reported: 2005-11-30 14:19 PST by tom mutdosch
Modified: 2012-07-25 08:38 PDT (History)
32 users (show)
dveditz: blocking1.8.0.2+
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase showing timer get called too quickly (25.01 KB, text/html)
2005-12-31 13:08 PST, Ratan LaL
no flags Details
checks for rollover when inserting into the list (3.94 KB, patch)
2006-01-29 15:37 PST, Samuel Sieb
no flags Details | Diff | Splinter Review
convert dom timers to 64-bit (9.63 KB, patch)
2006-02-03 20:50 PST, Samuel Sieb
no flags Details | Diff | Splinter Review
updated patch (11.56 KB, patch)
2006-02-04 16:50 PST, Samuel Sieb
no flags Details | Diff | Splinter Review
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps. (8.98 KB, patch)
2006-02-07 21:55 PST, Johnny Stenback (:jst, jst@mozilla.com)
bugzilla-mozilla-20000923: review+
Details | Diff | Splinter Review
Same change w/o nsInt64 (8.09 KB, patch)
2006-02-13 17:29 PST, Johnny Stenback (:jst, jst@mozilla.com)
no flags Details | Diff | Splinter Review
Fix comments, and prevent the interval from wrapping as well. (10.34 KB, patch)
2006-02-16 16:19 PST, Johnny Stenback (:jst, jst@mozilla.com)
jst: review+
brendan: superreview+
dveditz: approval‑branch‑1.8.1+
dveditz: approval1.8.0.2+
Details | Diff | Splinter Review

Description tom mutdosch 2005-11-30 14:19:11 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

I have seen this every now and then since using the 1.5 betas and Release Candidates (and now the final build).  
I have an extension that sets a timeout to run a function every hour:
window.setTimeout(reminderFox_initializeReminderFoxHourly, 3600000);

Occasionally when that timeout is set, the target function gets called immediately several times.  Even more strange is that if I open a new tab, or change focus from the browser, the target function gets called again.  It seems something goes haywire with how the function is registered.  

I am not sure what causes this or when, but others have seen it when using my extension (reminderfox) as well.  And bug 316491 has some similarities, though in my case I do not have a really large timeout - the length does not seem to matter.  

The problem is pretty severe when it happens as it keeps calling the function many times and in my case, that function pops up an alert dialog so it's very noticable.


Reproducible: Sometimes
Comment 1 Nickolay_Ponomarev 2005-12-09 05:12:36 PST
This is a dupe of bug 312972 actually, but I'm not sure whether that bug is correctly duped to bug 291386, so I won't resolve this one yet.
Comment 2 James Ross 2005-12-09 05:32:17 PST
I'm confirming this, because it is precicely this broken setTimeout/setInterval behaviour that causes problems in ChatZilla in bug 319219.

This is basically just bug 291386 (I don't think it's necessarily a dup though), even though bug 291386 comment 6 indicates it shouldn't happen to DOM timers.
Comment 3 Ratan LaL 2005-12-31 13:08:14 PST
Created attachment 207277 [details]
testcase showing timer get called too quickly
Comment 4 James Ross 2006-01-08 13:59:32 PST
Alright, I think I got this one sussed. It's the DOM timer code in nsGlobalWindow failing to take into account the interval time (from NSPR) wrapping.

In the old days - and on some non-Windows platforms (Linux for example) - the value only wraps after 49.7 days (or thereabouts), but on a modern Windows system at least, the value NSPR uses can wrap in as little as 19 hours (not sure how much lower it can go - that is the wrap point for my system).

This is, it seems, totally unrelated to bug 291386 (which is about XPCOM timers), as they appear to be working fine throughout the wrapping (which messes up the DOM timers).

I will explain the details in a bit, after some more debugging...
Comment 5 Wayne Mery (:wsmwk, NI for questions) 2006-01-08 18:04:56 PST
does this point then to multiple issues affecting chatzilla?  because it cz sometimes has trouble within a few minutes of starting up.  Also, I'm shocked that I have 2 cz sessions up now for almost 2 weeks and not a hint trouble.
Comment 6 James Ross 2006-01-08 18:41:50 PST
No, it's just one issue, but it's a rather odd one. Allow me to explain...

When a timeout (interval or normal) is added, or in the case of an interval re-scheduled, the DOM code stores when it is expected to fire in an unsigned-32bit int. This is fine normally.

If the NSPR interval time (used for measuring everything) is coming up to wrap (every 19 hours here, 49 days on some systems), the value of "now" gets close to 0xFFFFFFFF. This means that when DOM stores the expected trigger time, it will end up with a wrapped value (when now + delay > 0xFFFFFFFF), which is stored normally. This is where things go wrong...

The new timer is (obviously) not supposed to fire until after the 'now' value has wrapped, but it gets stored at the start of the timer queue with a very low expected fire time (because it wrapped). This means that the next time *any* timeout fires, the wrapped timeout will be considered "expired" (because it is earlier in the queue, and has a lower expected fire time), and will be fired. Ooops!

What this means for ChatZilla, is that if you are running /or start/ it within the 5 minutes before the timers wrap, it will go wrong, as there is a 5m timer (which then gets fired instantly until the 'now' time wraps). This applies to all timeouts and intervals, but ChatZilla is the most visible place to be affected.
Comment 7 Samuel Sieb 2006-01-08 18:44:50 PST
I've managed to make it happen on Linux too.  It's not as easy or as obvious and it recovers much quicker because it uses a millisecond timer.
Comment 8 Jeffrey Knight 2006-01-16 12:16:06 PST
this does sound like bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=316491">316491</a>
Comment 9 James Ross 2006-01-16 12:22:43 PST
Bug 316491 could be considered a duplicate of this one, but I would note that there is probably more to that bug than this one - this one is about nice, normal timers wrapping and causing problems (which can be fixed simply by observing the wrapping and handling it); that one is about timers which are too big to store in the first place (which would need much more drastic changes).

They both occur because the XPCOM and DOM timer code sucks, though. ;)
Comment 10 Wayne Mery (:wsmwk, NI for questions) 2006-01-16 12:46:42 PST
Ian, the problem is now well defined. Is there enough information to pass it to a developer?  Seems straight forward.

should the sunbird bug 306834 blocked by this rather than bug 291386, based on what we now know from James' report here about behavior with chatzilla and the "ugly patch" in bug 306834?
Comment 11 Samuel Sieb 2006-01-29 15:37:40 PST
Created attachment 210086 [details] [diff] [review]
checks for rollover when inserting into the list

there are a couple other places in the file where it needs to account for rollover, but this should fix the major bug
Comment 12 Samuel Sieb 2006-01-29 16:22:01 PST
Comment on attachment 210086 [details] [diff] [review]
checks for rollover when inserting into the list

A much simpler method would be to use the TIMER_LESS_THAN macro from nsTimerImpl.h
Comment 13 Mike Schroepfer 2006-02-02 17:17:42 PST
Samuel - is this something you could get an updated patch for in time for the next .0.x release - i.e. in the next couple o weeks?
Comment 14 Samuel Sieb 2006-02-03 14:43:29 PST
Yes, I'm working on a patch for 64-bit timers.
Comment 15 Samuel Sieb 2006-02-03 20:50:52 PST
Created attachment 210671 [details] [diff] [review]
convert dom timers to 64-bit

I couldn't convince the nspr people to let me add a PR_IntervalNow64 so I had to do it this way instead.  I've tested it at roll-over and everything seems to hold together.  The other side effect of this is that you can now create timeouts up to  the full range of a 32-bit signed integer which is around 24.5 days if you really want to. ;-)  (The previous limit could be as low as 6 hours on certain windows computers.)
Comment 16 Samuel Sieb 2006-02-04 16:06:59 PST
Comment on attachment 210671 [details] [diff] [review]
convert dom timers to 64-bit

A patch from another bug conflicted with this one and I also didn't compile with DEBUG_TIMERS set.
Comment 17 Samuel Sieb 2006-02-04 16:50:30 PST
Created attachment 210723 [details] [diff] [review]
updated patch
Comment 18 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-06 16:09:38 PST
Comment on attachment 210723 [details] [diff] [review]
updated patch

+PRUint64
+nsGlobalWindow::GetIntervalNow()
+{
+  static PRUint64 upper = 0;
+  static PRUint32 lastnow = 0;
+
+  PRUint32 now = PR_IntervalNow();
+  if (now < lastnow)
+    upper++;
+  lastnow = now;
+  return (upper << 32) + now;
+}

Seems like this patch relies on this function (and the one in the timer code as well) firing at least once between every time PRIntervalTimer's wrap around, otherwise timeouts scheduled more than one wrap out could fire late since the absolute time (upper + now) wouldn't match reality. Plus, "upper" could potentially remain unchanged over several PRIntervalTimer wraps even if this function is called if "now < lastnow" happens to not be true when it does get called (i.e. if the time since the last call to this method is greater than the time it takes for PRIntervalTime to wrap).

Also, the timer code in XPCOM can run on any number of threads (AFAIK), so updating static's w/o a lock isn't safe, and locking there is not necessarily something we'd want to do.

Seems like we need another approach here to solve this :(
Comment 19 James Ross 2006-02-06 16:23:43 PST
As far as I can see, there are only two other solutions:
 A. Make the DOM code handle wrapping, a la XPCOM timers.
 B. Make NSPR return a 64bit time.

The shortest (theoretical) wrap period for NSPR currently is 11h55m. Although I'm not really keen on the idea of making the times 64bit by manually computing the high 32bits outside of NSPR, in practise I'm not sure it is really a problem. It's not my call, though, and I'd have gone for solution A above anyway.
Comment 20 Daniel Veditz [:dveditz] 2006-02-07 16:09:06 PST
Would need a trunk patch landed soon to get the required baking time, otherwise it misses the 1.8.0.2 train.
Comment 21 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-07 21:41:00 PST
To the best of my knowledge there's two problems here.

The first one, which I'm arguing is in practical terms not a problem, is that our XPCOM timers can't fire farther out into the future than 2^32/PR_TicksPerSecond() seconds, which in the worst case is 11h plus change. I can't imagine this fact causing DOM timers to ever fire immediately etc.

The second problem is that the DOM code deals with PRIntervalTime, but it doesn't at all take into account that PRIntervalTime's can wrap fairly easily (depending on the time of day and the requested delay, essentially). The problem here is that when a DOM timeout is scheduled, we set mWhen in the DOM timeout to PR_IntervalNow() + delay, which can wrap and leave mWhen way in the past (relative to a current PR_IntervalNow()). When this happens and any DOM timeout fires, any of the DOM timeouts whose mWhen wrapped (or most, in the most likely case at least) will fire since we walk the whole list of DOM timeouts looking for timeout's whose mWhen is less than PR_IntervalNow(). This could easily lead to DOM timeouts to fire at unexpected times, and when they do, a ton of them could fire at once.

This second problem is fixable in the DOM code by making the DOM code use PRTime instead of PRIntervalTime (this is only one of many ways around this, but it's the one I chose to test out). That way a DOM timeout's mWhen wouldn't wrap (for years at least), and once a DOM timeout does fire we'd only actually execute the DOM timeout's that have been scheduled to fire before the current time. I'll attach a patch that implements these changes in the DOM code in a bit.
Comment 22 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-07 21:55:00 PST
Created attachment 211103 [details] [diff] [review]
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps.

This does what I talked about in my earlier comment.

James, does this make sense to you? Let me know if you see any problems with this approach, and I'd really appreciate if you could have a look at this and maybe even run it and try to reproduce the problems you've seen in the past.
Comment 23 James Ross 2006-02-08 05:01:24 PST
JST, there really was no need to repeat the problem I described a month a go in this bug. We all all well aware of the problem here.

As for the patch, if it works fine. I would rather have just fixed the wrapping.

Also, I can't easily test this any more because NSPR has removed the code that wrapped frequently, so the original problem has basically gone away (the bug remains, of course).
Comment 24 James Ross 2006-02-08 05:13:58 PST
I guess I'll just back out the rather pointless NSPR changes and test this patch. Easier than trying to explain why they shouldn't have changed NSPR. ;)
Comment 25 James Ross 2006-02-08 05:54:27 PST
Comment on attachment 211103 [details] [diff] [review]
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps.

>       // Add a "secret" final argument that indicates timeout lateness
>       // in milliseconds
>-      PRIntervalTime lateness =
>-        PR_IntervalToMilliseconds(now - timeout->mWhen);
>-      timeout->mArgv[timeout->mArgc] = INT_TO_JSVAL((jsint) lateness);
>+      nsInt64 lateness = now - timeout->mWhen;
>+
>+      timeout->mArgv[timeout->mArgc] =
>+        INT_TO_JSVAL((jsint)(lateness / PR_USEC_PER_MSEC_64));

mozilla\dom\src\base\nsGlobalWindow.cpp(6208) : error C2440: 'type cast' : cannot convert from 'const nsTInt64' to 'jsint'
        with
        [
            T=PRInt64
        ]
        Ambiguous user-defined-conversion

Ideas?
Comment 26 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-08 14:15:58 PST
Sorry for repeating what had mostly already been said, just wanted to make it clear what I was doing in the context of this patch in my own words as well.

As for the compiler error, try changing the offending line(s) to:

      timeout->mArgv[timeout->mArgc] =
        INT_TO_JSVAL((jsint)(PRInt32)(lateness / PR_USEC_PER_MSEC_64));

nsInt64 really sort of sucks, but so do all our other portable 64-bit helpers.
Comment 27 James Ross 2006-02-10 06:21:51 PST
Comment on attachment 211103 [details] [diff] [review]
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps.

r=silver with the one extra typecast to compile.

I'm not a huge fan of all those 32/64 typecasts, but I'm not a huge fan of C++ either. The patch does fix the problem here, in so far as it has stopped the timers spamming on the original wrap points, and I haven't seen it do it at any other time, either.
Comment 28 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-13 17:29:57 PST
Created attachment 211806 [details] [diff] [review]
Same change w/o nsInt64

This is the same change as the previous (with the exception of the removal of a redundant delay < 0 check) w/o the mess that nsInt64 introduces. We no longer need nsInt64 to work with PRTime since all compilers we now care about have native 64-bit values that work right.
Comment 29 Brendan Eich [:brendan] 2006-02-13 22:53:57 PST
Comment on attachment 211806 [details] [diff] [review]
Same change w/o nsInt64

>+  // Non-zero interval in milli seconds if repetitive timeout
>   PRInt32 mInterval;
> 
>-  // Nominal time to run this timeout
>-  PRIntervalTime mWhen;
>+  // Nominal time (in micro seconds since the epoch) to run this timeout

Nit: milliseconds and microseconds are each one word ;-).

>         // Reschedule the OS timer. Don't bother returning any error
>         // codes if this fails since nobody who cares about them is
>         // listening anyways.
>         nsresult rv =
>-          timeout->mTimer->InitWithFuncCallback(TimerCallback, timeout, delay,
>+          timeout->mTimer->InitWithFuncCallback(TimerCallback, timeout,
>+                                                delay / PR_USEC_PER_MSEC,
>                                                 nsITimer::TYPE_ONE_SHOT);

Here (above)...

>+    PRUint32 delay =
>+      PR_MAX(((PRUint32)(t->mWhen / PR_USEC_PER_MSEC)),
>+              DOM_MIN_TIMEOUT_VALUE);
>+
>+    // Set mWhen back to the time when the timer is supposed to
>+    // fire.
>     t->mWhen += now;
> 
>     t->mTimer = do_CreateInstance("@mozilla.org/timer;1");
>     NS_ENSURE_TRUE(t->mTimer, NS_ERROR_OUT_OF_MEMORY);
> 
>     rv = t->mTimer->InitWithFuncCallback(TimerCallback, t, delay,
>                                          nsITimer::TYPE_ONE_SHOT);
>     if (NS_FAILED(rv)) {
>       t->mTimer = nsnull;

and here, it seems possible that delay/t->mWhen could be > 2^32 * 1e6 (current PRTime plus 200 years is 53 bits, fits in a JS double, but overflows a PRUint32 by one bit when divided by 1000000).  Check my math, please.  This suggests having a max delay constant too.

I am sorry I didn't follow up on this problem in the code -- I noticed it when reviewing bryner's bfcache changes, which led to bug 306759 being filed.  Still on my list, I'm duping it against this one now.

/be
Comment 30 Brendan Eich [:brendan] 2006-02-13 22:54:09 PST
*** Bug 306759 has been marked as a duplicate of this bug. ***
Comment 31 Brendan Eich [:brendan] 2006-02-13 22:55:19 PST
I wrote:

> and here, it seems possible that delay/t->mWhen could be > 2^32 * 1e6 (current

The / between delay and t->mWhen meant "or", not "divided by".

/be
Comment 32 Daniel Veditz [:dveditz] 2006-02-16 12:21:32 PST
provisionally want, pending these patches being landed and tested elsewhere
Comment 33 Felix Miata 2006-02-16 15:42:15 PST
I'm one of those routinely affected by bug 319219 using the 1.8 branch nightlies on OS/2. It just happened again minutes ago using yesterday's, so I'd like to see this checked in there to see if it's a fix that works.
Comment 34 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-16 16:19:26 PST
Created attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

This prevents the given interval from ever causing PRTime to wrap or for the difference between now and when the timer should fire from ever being set to a value greater than what our timer code can handle (which is system dependent due to the timer code's use of PRIntervalTime).
Comment 35 Brendan Eich [:brendan] 2006-02-16 19:08:07 PST
Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

sr=me, thanks.

/be
Comment 36 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-17 15:24:44 PST
Taking bug.
Comment 37 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-17 15:25:13 PST
Fixed on the trunk.
Comment 38 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-17 15:27:22 PST
Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

Pushing silver's r= forward.
Comment 39 Daniel Veditz [:dveditz] 2006-02-21 17:52:08 PST
Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

approved for 1.8.0 branch, a=dveditz
Comment 40 Johnny Stenback (:jst, jst@mozilla.com) 2006-02-22 16:50:45 PST
Fixed on the branches.
Comment 41 Dave Liebreich [:davel] 2006-03-01 16:19:23 PST
Marking [rft-dl] (ready for testing in Firefox 1.5.0.2 release candidates)
Comment 42 Jay Patel [:jay] 2006-03-08 14:33:42 PST
v.fixed on 1.8.0 branch with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2, testcase executes as expected, I don't see any unexpected first setTimeouts fired.
Comment 43 Johnny Stenback (:jst, jst@mozilla.com) 2006-03-10 08:28:13 PST
*** Bug 316491 has been marked as a duplicate of this bug. ***
Comment 44 Andre-John Mas 2006-03-11 12:31:55 PST
From what I understand this issue has been fixed by providing a maximum value for each platform. Does the ECMA define a maximum value? If it does not would this not in some way break specification compliance? Surely we need to try providing the web developers a dependable common solution across platforms. If I test against Firefox on Windows, for the most part I expect it to behave the same on Firefox for Mac, but if there are variations on the rules applied to Javascript for each platform this potentially could lead to varying behaviour.

Comment 45 Brendan Eich [:brendan] 2006-03-11 13:29:36 PST
(In reply to comment #44)
> From what I understand this issue has been fixed by providing a maximum value
> for each platform. Does the ECMA define a maximum value?

ECMA, presumably you mean ECMA-262 aka ECMAScript, does not specify anything but the core JS language.  It does not specify window.setTimeout, window.alert, or any such "DOM level 0" methods.

Neither does any w3c spec (blame MS and Netscape back in the day for not trying to standardize level 0).  However the w3c "web API" working group, a recently added w.g., and the WHAT w.g. (http://www.whatwg.org/), are both taking on the task of specifying parts of DOM level 0.  Cc'ing sicking, to help get this point addressed by any webapiwg-produced spec; Hixie for whatwg is already QA contact for this bug.

> If it does not would
> this not in some way break specification compliance? Surely we need to try
> providing the web developers a dependable common solution across platforms. If
> I test against Firefox on Windows, for the most part I expect it to behave the
> same on Firefox for Mac, but if there are variations on the rules applied to
> Javascript for each platform this potentially could lead to varying behaviour.

The upper bound on interval-time is generous enough that I doubt it will matter, and there is no cross-browser standard that pages count on.  I would hope that any spec not require all browsers to have the same upper bound, only that script should not count on a time >= SOME_BIG_LIMIT, where SOME_BIG_LIMIT is a constant for all platforms.

/be
Comment 46 Jonas Sicking (:sicking) No longer reading bugmail consistently 2006-03-14 17:34:42 PST
The webapi wg has not yet started work on the timer functions.

I do believe that it is resonable to allow a limit, and one should be specified in the spec once a spec is written. However the limit should be fairly high, measured in days. I would say somewhere around 2-7 days is a resonable limit. It should be higher then a day though since it's resonable to consider a page that does something once a day.

And the implementation should definitely throw if it is given a time that is bigger then allowed.

Note You need to log in before you can comment on or make changes to this bug.