Closed Bug 318419 Opened 19 years ago Closed 19 years ago

window.setTimeout() goes crazy periodically - functions are immediately called repeatedly

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: tommut, Assigned: jst)

References

(
URL
)

Details

(Keywords: fixed1.8.1, testcase, verified1.8.0.2, Whiteboard: [rft-dl])

Attachments

(2 files, 5 obsolete files)

testcase showing timer get called too quickly 19 years ago Ratan LaL 25.01 KB, text/html		Details
checks for rollover when inserting into the list 19 years ago Samuel Sieb 3.94 KB, patch		Details \| Diff \| Splinter Review
convert dom timers to 64-bit 19 years ago Samuel Sieb 9.63 KB, patch		Details \| Diff \| Splinter Review
updated patch 19 years ago Samuel Sieb 11.56 KB, patch		Details \| Diff \| Splinter Review
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps. 19 years ago Johnny Stenback (:jst) 8.98 KB, patch	bugzilla-mozilla-20000923 : review+	Details \| Diff \| Splinter Review
Same change w/o nsInt64 19 years ago Johnny Stenback (:jst) 8.09 KB, patch		Details \| Diff \| Splinter Review
Fix comments, and prevent the interval from wrapping as well. 19 years ago Johnny Stenback (:jst) 10.34 KB, patch	jst : review+ brendan : superreview+ dveditz : approval-branch-1.8.1+ dveditz : approval1.8.0.2+	Details \| Diff \| Splinter Review

tom mutdosch

Reporter

Description

•

19 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

I have seen this every now and then since using the 1.5 betas and Release Candidates (and now the final build).  
I have an extension that sets a timeout to run a function every hour:
window.setTimeout(reminderFox_initializeReminderFoxHourly, 3600000);

Occasionally when that timeout is set, the target function gets called immediately several times.  Even more strange is that if I open a new tab, or change focus from the browser, the target function gets called again.  It seems something goes haywire with how the function is registered.  

I am not sure what causes this or when, but others have seen it when using my extension (reminderfox) as well.  And bug 316491 has some similarities, though in my case I do not have a really large timeout - the length does not seem to matter.  

The problem is pretty severe when it happens as it keeps calling the function many times and in my case, that function pops up an alert dialog so it's very noticable.


Reproducible: Sometimes

tom mutdosch

Reporter

Updated

•

19 years ago

Version: unspecified → 1.5 Branch

Reed Loden [:reed]

Updated

•

19 years ago

Component: OS Integration → General

QA Contact: os.integration → general

Nickolay_Ponomarev

Updated

•

19 years ago

Assignee: nobody → general

Component: General → DOM: Level 0

Product: Firefox → Core

QA Contact: general → ian

Version: 1.5 Branch → 1.8 Branch

Nickolay_Ponomarev

Comment 1

•

19 years ago

This is a dupe of bug 312972 actually, but I'm not sure whether that bug is correctly duped to bug 291386, so I won't resolve this one yet.

James Ross

Comment 2

•

19 years ago

I'm confirming this, because it is precicely this broken setTimeout/setInterval behaviour that causes problems in ChatZilla in bug 319219.

This is basically just bug 291386 (I don't think it's necessarily a dup though), even though bug 291386 comment 6 indicates it shouldn't happen to DOM timers.

Blocks: 319219

Status: UNCONFIRMED → NEW

Depends on: 291386

Ever confirmed: true

Ratan LaL

Comment 3

•

19 years ago

Attached file testcase showing timer get called too quickly — Details

James Ross

Comment 4

•

19 years ago

Alright, I think I got this one sussed. It's the DOM timer code in nsGlobalWindow failing to take into account the interval time (from NSPR) wrapping.

In the old days - and on some non-Windows platforms (Linux for example) - the value only wraps after 49.7 days (or thereabouts), but on a modern Windows system at least, the value NSPR uses can wrap in as little as 19 hours (not sure how much lower it can go - that is the wrap point for my system).

This is, it seems, totally unrelated to bug 291386 (which is about XPCOM timers), as they appear to be working fine throughout the wrapping (which messes up the DOM timers).

I will explain the details in a bit, after some more debugging...

Wayne Mery (:wsmwk)

Comment 5

•

19 years ago

does this point then to multiple issues affecting chatzilla?  because it cz sometimes has trouble within a few minutes of starting up.  Also, I'm shocked that I have 2 cz sessions up now for almost 2 weeks and not a hint trouble.

James Ross

Comment 6

•

19 years ago

No, it's just one issue, but it's a rather odd one. Allow me to explain...

When a timeout (interval or normal) is added, or in the case of an interval re-scheduled, the DOM code stores when it is expected to fire in an unsigned-32bit int. This is fine normally.

If the NSPR interval time (used for measuring everything) is coming up to wrap (every 19 hours here, 49 days on some systems), the value of "now" gets close to 0xFFFFFFFF. This means that when DOM stores the expected trigger time, it will end up with a wrapped value (when now + delay > 0xFFFFFFFF), which is stored normally. This is where things go wrong...

The new timer is (obviously) not supposed to fire until after the 'now' value has wrapped, but it gets stored at the start of the timer queue with a very low expected fire time (because it wrapped). This means that the next time *any* timeout fires, the wrapped timeout will be considered "expired" (because it is earlier in the queue, and has a lower expected fire time), and will be fired. Ooops!

What this means for ChatZilla, is that if you are running /or start/ it within the 5 minutes before the timers wrap, it will go wrong, as there is a 5m timer (which then gets fired instantly until the 'now' time wraps). This applies to all timeouts and intervals, but ChatZilla is the most visible place to be affected.

Samuel Sieb

Comment 7

•

19 years ago

I've managed to make it happen on Linux too.  It's not as easy or as obvious and it recovers much quicker because it uses a millisecond timer.

OS: Windows XP → All

Hardware: PC → All

Jeffrey Knight

Comment 8

•

19 years ago

this does sound like bug <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=316491">316491</a>

James Ross

Comment 9

•

19 years ago

Bug 316491 could be considered a duplicate of this one, but I would note that there is probably more to that bug than this one - this one is about nice, normal timers wrapping and causing problems (which can be fixed simply by observing the wrapping and handling it); that one is about timers which are too big to store in the first place (which would need much more drastic changes).

They both occur because the XPCOM and DOM timer code sucks, though. ;)

Wayne Mery (:wsmwk)

Comment 10

•

19 years ago

Ian, the problem is now well defined. Is there enough information to pass it to a developer?  Seems straight forward.

should the sunbird bug 306834 blocked by this rather than bug 291386, based on what we now know from James' report here about behavior with chatzilla and the "ugly patch" in bug 306834?

Markus Hübner

Updated

•

19 years ago

Keywords: testcase

Samuel Sieb

Updated

•

19 years ago

URL: http://lxr.mozilla.org/mozilla/source...

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

19 years ago

URL: http://lxr.mozilla.org/mozilla/source... → http://bonsai.mozilla.org/cvsblame.cg...

Samuel Sieb

Comment 11

•

19 years ago

Attached patch checks for rollover when inserting into the list (obsolete) — Details — Splinter Review

there are a couple other places in the file where it needs to account for rollover, but this should fix the major bug

Samuel Sieb

Comment 12

•

19 years ago

Comment on attachment 210086 [details] [diff] [review]
checks for rollover when inserting into the list

A much simpler method would be to use the TIMER_LESS_THAN macro from nsTimerImpl.h

Attachment #210086 - Attachment is obsolete: true

Mike Schroepfer

Updated

•

19 years ago

Flags: blocking1.8.0.2?

Mike Schroepfer

Comment 13

•

19 years ago

Samuel - is this something you could get an updated patch for in time for the next .0.x release - i.e. in the next couple o weeks?

Samuel Sieb

Comment 14

•

19 years ago

Yes, I'm working on a patch for 64-bit timers.

Assignee: general → samuel

Samuel Sieb

Updated

•

19 years ago

Status: NEW → ASSIGNED

Samuel Sieb

Comment 15

•

19 years ago

Attached patch convert dom timers to 64-bit (obsolete) — Details — Splinter Review

I couldn't convince the nspr people to let me add a PR_IntervalNow64 so I had to do it this way instead.  I've tested it at roll-over and everything seems to hold together.  The other side effect of this is that you can now create timeouts up to  the full range of a 32-bit signed integer which is around 24.5 days if you really want to. ;-)  (The previous limit could be as low as 6 hours on certain windows computers.)

Samuel Sieb

Comment 16

•

19 years ago

Comment on attachment 210671 [details] [diff] [review]
convert dom timers to 64-bit

A patch from another bug conflicted with this one and I also didn't compile with DEBUG_TIMERS set.

Attachment #210671 - Attachment is obsolete: true

Samuel Sieb

Comment 17

•

19 years ago

Attached patch updated patch (obsolete) — Details — Splinter Review

Johnny Stenback (:jst)

Assignee

Comment 18

•

19 years ago

Comment on attachment 210723 [details] [diff] [review]
updated patch

+PRUint64
+nsGlobalWindow::GetIntervalNow()
+{
+  static PRUint64 upper = 0;
+  static PRUint32 lastnow = 0;
+
+  PRUint32 now = PR_IntervalNow();
+  if (now < lastnow)
+    upper++;
+  lastnow = now;
+  return (upper << 32) + now;
+}

Seems like this patch relies on this function (and the one in the timer code as well) firing at least once between every time PRIntervalTimer's wrap around, otherwise timeouts scheduled more than one wrap out could fire late since the absolute time (upper + now) wouldn't match reality. Plus, "upper" could potentially remain unchanged over several PRIntervalTimer wraps even if this function is called if "now < lastnow" happens to not be true when it does get called (i.e. if the time since the last call to this method is greater than the time it takes for PRIntervalTime to wrap).

Also, the timer code in XPCOM can run on any number of threads (AFAIK), so updating static's w/o a lock isn't safe, and locking there is not necessarily something we'd want to do.

Seems like we need another approach here to solve this :(

James Ross

Comment 19

•

19 years ago

As far as I can see, there are only two other solutions:
 A. Make the DOM code handle wrapping, a la XPCOM timers.
 B. Make NSPR return a 64bit time.

The shortest (theoretical) wrap period for NSPR currently is 11h55m. Although I'm not really keen on the idea of making the times 64bit by manually computing the high 32bits outside of NSPR, in practise I'm not sure it is really a problem. It's not my call, though, and I'd have gone for solution A above anyway.

Daniel Veditz [:dveditz]

Comment 20

•

19 years ago

Would need a trunk patch landed soon to get the required baking time, otherwise it misses the 1.8.0.2 train.

Johnny Stenback (:jst)

Assignee

Comment 21

•

19 years ago

To the best of my knowledge there's two problems here.

The first one, which I'm arguing is in practical terms not a problem, is that our XPCOM timers can't fire farther out into the future than 2^32/PR_TicksPerSecond() seconds, which in the worst case is 11h plus change. I can't imagine this fact causing DOM timers to ever fire immediately etc.

The second problem is that the DOM code deals with PRIntervalTime, but it doesn't at all take into account that PRIntervalTime's can wrap fairly easily (depending on the time of day and the requested delay, essentially). The problem here is that when a DOM timeout is scheduled, we set mWhen in the DOM timeout to PR_IntervalNow() + delay, which can wrap and leave mWhen way in the past (relative to a current PR_IntervalNow()). When this happens and any DOM timeout fires, any of the DOM timeouts whose mWhen wrapped (or most, in the most likely case at least) will fire since we walk the whole list of DOM timeouts looking for timeout's whose mWhen is less than PR_IntervalNow(). This could easily lead to DOM timeouts to fire at unexpected times, and when they do, a ton of them could fire at once.

This second problem is fixable in the DOM code by making the DOM code use PRTime instead of PRIntervalTime (this is only one of many ways around this, but it's the one I chose to test out). That way a DOM timeout's mWhen wouldn't wrap (for years at least), and once a DOM timeout does fire we'd only actually execute the DOM timeout's that have been scheduled to fire before the current time. I'll attach a patch that implements these changes in the DOM code in a bit.

Johnny Stenback (:jst)

Assignee

Comment 22

•

19 years ago

Attached patch Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps. (obsolete) — Details — Splinter Review

This does what I talked about in my earlier comment.

James, does this make sense to you? Let me know if you see any problems with this approach, and I'd really appreciate if you could have a look at this and maybe even run it and try to reproduce the problems you've seen in the past.

Attachment #211103 - Flags: superreview?(brendan)

Attachment #211103 - Flags: review?(silver)

James Ross

Comment 23

•

19 years ago

JST, there really was no need to repeat the problem I described a month a go in this bug. We all all well aware of the problem here.

As for the patch, if it works fine. I would rather have just fixed the wrapping.

Also, I can't easily test this any more because NSPR has removed the code that wrapped frequently, so the original problem has basically gone away (the bug remains, of course).

James Ross

Comment 24

•

19 years ago

I guess I'll just back out the rather pointless NSPR changes and test this patch. Easier than trying to explain why they shouldn't have changed NSPR. ;)

James Ross

Comment 25

•

19 years ago

Comment on attachment 211103 [details] [diff] [review]
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps.

>       // Add a "secret" final argument that indicates timeout lateness
>       // in milliseconds
>-      PRIntervalTime lateness =
>-        PR_IntervalToMilliseconds(now - timeout->mWhen);
>-      timeout->mArgv[timeout->mArgc] = INT_TO_JSVAL((jsint) lateness);
>+      nsInt64 lateness = now - timeout->mWhen;
>+
>+      timeout->mArgv[timeout->mArgc] =
>+        INT_TO_JSVAL((jsint)(lateness / PR_USEC_PER_MSEC_64));

mozilla\dom\src\base\nsGlobalWindow.cpp(6208) : error C2440: 'type cast' : cannot convert from 'const nsTInt64' to 'jsint'
        with
        [
            T=PRInt64
        ]
        Ambiguous user-defined-conversion

Ideas?

Johnny Stenback (:jst)

Assignee

Comment 26

•

19 years ago

Sorry for repeating what had mostly already been said, just wanted to make it clear what I was doing in the context of this patch in my own words as well.

As for the compiler error, try changing the offending line(s) to:

      timeout->mArgv[timeout->mArgc] =
        INT_TO_JSVAL((jsint)(PRInt32)(lateness / PR_USEC_PER_MSEC_64));

nsInt64 really sort of sucks, but so do all our other portable 64-bit helpers.

Samuel Sieb

Updated

•

19 years ago

Attachment #210723 - Attachment is obsolete: true

James Ross

Comment 27

•

19 years ago

Comment on attachment 211103 [details] [diff] [review]
Change nsTimeout::mWhen to PRTime to avoid firing DOM timeouts early when PRIntervalTime wraps.

r=silver with the one extra typecast to compile.

I'm not a huge fan of all those 32/64 typecasts, but I'm not a huge fan of C++ either. The patch does fix the problem here, in so far as it has stopped the timers spamming on the original wrap points, and I haven't seen it do it at any other time, either.

Attachment #211103 - Flags: review?(silver) → review+

Johnny Stenback (:jst)

Assignee

Comment 28

•

19 years ago

Attached patch Same change w/o nsInt64 (obsolete) — Details — Splinter Review

This is the same change as the previous (with the exception of the removal of a redundant delay < 0 check) w/o the mess that nsInt64 introduces. We no longer need nsInt64 to work with PRTime since all compilers we now care about have native 64-bit values that work right.

Attachment #211806 - Flags: superreview?(brendan)

Brendan Eich [:brendan]

Comment 29

•

19 years ago

Comment on attachment 211806 [details] [diff] [review]
Same change w/o nsInt64

>+  // Non-zero interval in milli seconds if repetitive timeout
>   PRInt32 mInterval;
> 
>-  // Nominal time to run this timeout
>-  PRIntervalTime mWhen;
>+  // Nominal time (in micro seconds since the epoch) to run this timeout

Nit: milliseconds and microseconds are each one word ;-).

>         // Reschedule the OS timer. Don't bother returning any error
>         // codes if this fails since nobody who cares about them is
>         // listening anyways.
>         nsresult rv =
>-          timeout->mTimer->InitWithFuncCallback(TimerCallback, timeout, delay,
>+          timeout->mTimer->InitWithFuncCallback(TimerCallback, timeout,
>+                                                delay / PR_USEC_PER_MSEC,
>                                                 nsITimer::TYPE_ONE_SHOT);

Here (above)...

>+    PRUint32 delay =
>+      PR_MAX(((PRUint32)(t->mWhen / PR_USEC_PER_MSEC)),
>+              DOM_MIN_TIMEOUT_VALUE);
>+
>+    // Set mWhen back to the time when the timer is supposed to
>+    // fire.
>     t->mWhen += now;
> 
>     t->mTimer = do_CreateInstance("@mozilla.org/timer;1");
>     NS_ENSURE_TRUE(t->mTimer, NS_ERROR_OUT_OF_MEMORY);
> 
>     rv = t->mTimer->InitWithFuncCallback(TimerCallback, t, delay,
>                                          nsITimer::TYPE_ONE_SHOT);
>     if (NS_FAILED(rv)) {
>       t->mTimer = nsnull;

and here, it seems possible that delay/t->mWhen could be > 2^32 * 1e6 (current PRTime plus 200 years is 53 bits, fits in a JS double, but overflows a PRUint32 by one bit when divided by 1000000).  Check my math, please.  This suggests having a max delay constant too.

I am sorry I didn't follow up on this problem in the code -- I noticed it when reviewing bryner's bfcache changes, which led to bug 306759 being filed.  Still on my list, I'm duping it against this one now.

/be

Brendan Eich [:brendan]

Comment 30

•

19 years ago

*** Bug 306759 has been marked as a duplicate of this bug. ***

Brendan Eich [:brendan]

Comment 31

•

19 years ago

I wrote:

> and here, it seems possible that delay/t->mWhen could be > 2^32 * 1e6 (current

The / between delay and t->mWhen meant "or", not "divided by".

/be

Daniel Veditz [:dveditz]

Comment 32

•

19 years ago

provisionally want, pending these patches being landed and tested elsewhere

Flags: blocking1.8.0.2? → blocking1.8.0.2+

Felix Miata

Comment 33

•

19 years ago

I'm one of those routinely affected by bug 319219 using the 1.8 branch nightlies on OS/2. It just happened again minutes ago using yesterday's, so I'd like to see this checked in there to see if it's a fix that works.

Johnny Stenback (:jst)

Assignee

Comment 34

•

19 years ago

Attached patch Fix comments, and prevent the interval from wrapping as well. — Details — Splinter Review

This prevents the given interval from ever causing PRTime to wrap or for the difference between now and when the timer should fire from ever being set to a value greater than what our timer code can handle (which is system dependent due to the timer code's use of PRIntervalTime).

Attachment #212163 - Flags: superreview?(brendan)

Brendan Eich [:brendan]

Comment 35

•

19 years ago

Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

sr=me, thanks.

/be

Attachment #212163 - Flags: superreview?(brendan) → superreview+

Johnny Stenback (:jst)

Assignee

Comment 36

•

19 years ago

Taking bug.

Assignee: samuel → jst

Status: ASSIGNED → NEW

Johnny Stenback (:jst)

Assignee

Comment 37

•

19 years ago

Fixed on the trunk.

Status: NEW → RESOLVED

Closed: 19 years ago

Resolution: --- → FIXED

Johnny Stenback (:jst)

Assignee

Updated

•

19 years ago

Attachment #211103 - Attachment is obsolete: true

Attachment #211103 - Flags: superreview?(brendan)

Johnny Stenback (:jst)

Assignee

Updated

•

19 years ago

Attachment #211806 - Attachment is obsolete: true

Attachment #211806 - Flags: superreview?(brendan)

Johnny Stenback (:jst)

Assignee

Comment 38

•

19 years ago

Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

Pushing silver's r= forward.

Attachment #212163 - Flags: review+

Attachment #212163 - Flags: approval1.8.0.2?

Attachment #212163 - Flags: approval-branch-1.8.1?

Daniel Veditz [:dveditz]

Comment 39

•

19 years ago

Comment on attachment 212163 [details] [diff] [review]
Fix comments, and prevent the interval from wrapping as well.

approved for 1.8.0 branch, a=dveditz

Attachment #212163 - Flags: approval1.8.0.2?

Attachment #212163 - Flags: approval1.8.0.2+

Attachment #212163 - Flags: approval-branch-1.8.1?

Attachment #212163 - Flags: approval-branch-1.8.1+

Johnny Stenback (:jst)

Assignee

Comment 40

•

19 years ago

Fixed on the branches.

Keywords: fixed1.8.0.2, fixed1.8.1

Dave Liebreich [:davel]

Comment 41

•

19 years ago

Marking [rft-dl] (ready for testing in Firefox 1.5.0.2 release candidates)

Whiteboard: [rft-dl]

Jay Patel [:jay]

Comment 42

•

19 years ago

v.fixed on 1.8.0 branch with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2, testcase executes as expected, I don't see any unexpected first setTimeouts fired.

Keywords: fixed1.8.0.2 → verified1.8.0.2

Johnny Stenback (:jst)

Assignee

Comment 43

•

19 years ago

*** Bug 316491 has been marked as a duplicate of this bug. ***

Andre-John Mas

Comment 44

•

19 years ago

From what I understand this issue has been fixed by providing a maximum value for each platform. Does the ECMA define a maximum value? If it does not would this not in some way break specification compliance? Surely we need to try providing the web developers a dependable common solution across platforms. If I test against Firefox on Windows, for the most part I expect it to behave the same on Firefox for Mac, but if there are variations on the rules applied to Javascript for each platform this potentially could lead to varying behaviour.

Brendan Eich [:brendan]

Comment 45

•

19 years ago

(In reply to comment #44)
> From what I understand this issue has been fixed by providing a maximum value
> for each platform. Does the ECMA define a maximum value?

ECMA, presumably you mean ECMA-262 aka ECMAScript, does not specify anything but the core JS language.  It does not specify window.setTimeout, window.alert, or any such "DOM level 0" methods.

Neither does any w3c spec (blame MS and Netscape back in the day for not trying to standardize level 0).  However the w3c "web API" working group, a recently added w.g., and the WHAT w.g. (http://www.whatwg.org/), are both taking on the task of specifying parts of DOM level 0.  Cc'ing sicking, to help get this point addressed by any webapiwg-produced spec; Hixie for whatwg is already QA contact for this bug.

> If it does not would
> this not in some way break specification compliance? Surely we need to try
> providing the web developers a dependable common solution across platforms. If
> I test against Firefox on Windows, for the most part I expect it to behave the
> same on Firefox for Mac, but if there are variations on the rules applied to
> Javascript for each platform this potentially could lead to varying behaviour.

The upper bound on interval-time is generous enough that I doubt it will matter, and there is no cross-browser standard that pages count on.  I would hope that any spec not require all browsers to have the same upper bound, only that script should not count on a time >= SOME_BIG_LIMIT, where SOME_BIG_LIMIT is a constant for all platforms.

/be

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 46

•

19 years ago

The webapi wg has not yet started work on the timer functions.

I do believe that it is resonable to allow a limit, and one should be specified in the spec once a spec is written. However the limit should be fairly high, measured in days. I would say somewhere around 2-7 days is a resonable limit. It should be higher then a day though since it's resonable to consider a page that does something once a day.

And the implementation should definitely throw if it is given a time that is bigger then allowed.

You need to log in before you can comment on or make changes to this bug.