Last Comment Bug 291386 - Timers fire way too quickly in some cases
: Timers fire way too quickly in some cases
Status: NEW
[FF2][no l10n impact]
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: x86 Windows XP
: P2 critical with 28 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
: 312972 335968 (view as bug list)
Depends on: 129953 nsIThreadManager
Blocks: 21762 203448 257982 280922 306834 317491 318419 329775
  Show dependency treegraph
Reported: 2005-04-21 17:29 PDT by Darin Fisher
Modified: 2015-05-06 13:19 PDT (History)
50 users (show)
asa: blocking1.8b3-
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---

timeout values (910 bytes, text/plain)
2005-04-21 17:30 PDT, Darin Fisher
no flags Details
Sample JS component that demonstrates the problem (4.88 KB, text/plain)
2005-04-26 22:27 PDT, Darin Fisher
no flags Details
Simplified demonstartion of setInterval time oscillation. (2.22 KB, text/html)
2007-05-05 16:54 PDT, Mike Koss
no flags Details

Description Darin Fisher 2005-04-21 17:29:24 PDT
Timers fire way too quickly in some cases.

During heavy page load, our timer code gets horribly confused and starts firing
timers very quickly (a requested delay of 50ms becomes 1ms in some cases).  This
happens with all types of timers, including TYPE_ONE_SHOT!

From a quick glance at TimerThread.cpp, I suspect the problem has to do with
mTimeoutAdjustment growing wildly large.

My code runs a timer on the main thread, and records the interval between Notify
calls.  I exercised it while the browser was loading and other large
pages.  On those pages, I expect large delays before my Notify method is called,
but I do not expect Notify calls to come in so much faster than the requested 50ms.
Comment 1 Darin Fisher 2005-04-21 17:30:57 PDT
Created attachment 181478 [details]
timeout values

This attachment contains a log of timeout values generated using a
TYPE_ONE_SHOT timer, where I created a new timer each time the previous one
fired.	Each timer was configured to fire after a 50ms delay.  Notice that the
observed intervals however frequently dip below 50ms.
Comment 2 Brendan Eich [:brendan] 2005-04-21 17:49:26 PDT
darin: can you instrument the filter a bit to show mTimeoutAdjustment?  It's
trying to keep deadlines met, even if that means shrinking delay a bit.

Probably this IIR filter is just hopelessly wrong in light of the *hard to fix*
design flaw and unintended consequence of pavlov's timer rewrite of late 2001:
by using PR_WaitCondVar, timers now have up to 10ms of error.  Before, with
event- or message-loop native timers, it was more like 1ms.

Ideas on fixing the underlying problem?

If there's a bug in the filter implementation, let's fix it too.

Comment 3 Darin Fisher 2005-04-21 17:54:28 PDT
I'll try to instrument the code when I can.  Right now, I'm just testing Firefox
1.0.  I think the solution involves not firing ONE_SHOT and REPEATING_SLACK
timers faster than the interval requested.  1ms vs 50ms is a big deal.  The
current algorithm probably only makes sense for REPEATING_PRECISE.
Comment 4 Boris Zbarsky [:bz] 2005-04-21 22:29:00 PDT
Note that DOM timeouts and intervals _do_ want this adjustment, generally
speaking, and are ONE_SHOT timers (timeouts have to be; intervals are probably
done that way to share code with timeouts as much as possible?  Or because they
do some of their own adjustment for delays?  See
Comment 5 Darin Fisher 2005-04-22 00:56:17 PDT
> Note that DOM timeouts and intervals _do_ want this adjustment, generally

Are you saying that it is okay for setTimeout(foo, 50) to call foo 1 ms later?
Comment 6 Darin Fisher 2005-04-22 00:58:38 PDT
Ah, it looks like the DOM timer code deals with timers firing too early:
Comment 7 Boris Zbarsky [:bz] 2005-04-22 08:49:10 PDT
> Are you saying that it is okay for setTimeout(foo, 50) to call foo 1 ms later?

No, but by far the #1 use of setTimeout is for the foo to set the same timeout
again.  In that case, we really do want to average out to the timeout length in
the long run, or at least try to.

But yes, it's a problem in general...
Comment 8 Darin Fisher 2005-04-22 13:47:09 PDT
> No, but by far the #1 use of setTimeout is for the foo to set the same timeout
> again.  In that case, we really do want to average out to the timeout length in
> the long run, or at least try to.

I think the DOM code (setTimeout) does not have the problem described in this
bug.  It seems to have code to workaround the bug I'm talking about here.

I think that each ONE_SHOT timer should be independent of previous ONE_SHOT
timers.  It's absurd for there to be any shared state between different nsITimer
instances.  We need to ensure that ONE_SHOT timers fire no faster than the
requested timeout (within some delta at least).  For TYPE_REPEATING_PRECISE and
TYPE_REPEATING_SLACK there should be some minimal allowed timeout for when we
are in 'catch up' mode.  1 ms is a bad choice for that minimum ;-)
Comment 9 Boris Zbarsky [:bz] 2005-04-24 14:17:56 PDT
> I think that each ONE_SHOT timer should be independent of previous ONE_SHOT
> timers.

To a certain extent... But the problem this code is trying to address is the
following.  Say we are at time X.  We are asked to fire a timer at time X+N.  We
know that in general there are M units of systematic lag in the system so that
if we set the timer for X+N it'll fire at X+N+M.  So we should really set the
timer for X+N-M so that it will fire at the right time.  This is the general
idea.  The rest is implementation -- to measure the systematic lag we use past

The problem we have is that the lag is not quite systematic and is largely due
to the poor resolution of the PR_WaitCondvar code.  So we can end up with N
approximately equal to M and a call aligned on 10ms boundaries in such a way
that for it the lag is close to 0.  Then we get the effect you describe.

The solution should be to fix the resolution of the timer code and then
reevaluate how good a job the current algorithm does when it doesn't have to
contend with people wanting timers that fire every 10ms using a timing mechanism
that is only accurate to 10ms....
Comment 10 Brendan Eich [:brendan] 2005-04-24 19:14:48 PDT
Did the system as a whole regress, without the timer code changing?  Darin,
would you be able to provide the patch you used to test, so helpers can try it
on older milestones?

Comment 11 Darin Fisher 2005-04-24 20:41:16 PDT

i appreciate that the code is trying to make up for delays that may be observed
with past timers, and that is a good thing.  but, we know when we are about to
issue a timer's callback, and we know when the consumer initialized the timer. 
we can therefore determine that it is not yet time for the timer to expire. 
(the DOM setTimeout code does exactly this!)  for example, we can tell that 1ms
is no where near to the 50ms requested.  we can make the timer wait a little
longer before issuing the timer's callback in an attempt to better hit the
desired timeout value.  this is what the DOM code does, and i think it is what
we should do in the nsITimer code.

i doubt this problem has anything to do with the resolution of PR_WaitCondVar. 
that may be a problem that matters for much shorter (than 50ms) timeouts.  i'm
pretty sure the main problem with timers is that we simply fail to process timer
PLEvents on the main thread when we are deep inside layout for some lengthy
period of time.  recall: the problem described in this bug occurs when the
system is heavily loaded (while trying to render for example).

i think we should solve this bug by implementing logic that makes the timer code
go back and wait again if it has expired significantly too soon (based on the
predictive logic).

brendan: my test code is part of a larger extension.  but, it's easy enough to
make a component that can be dropped into mozilla to test this bug.  i'll see
about putting something together for that.
Comment 12 Brendan Eich [:brendan] 2005-04-24 20:57:55 PDT
pav and I were thinking starvation of main thread events is a problem now on
more sites than before, or some such change.  Or maybe it's always been a
problem, but we thought this would have been noticed by DHTML gurus sooner.

Darin: agreed on relieving every nsITimer user from having to reinvent the DOM
timeout/interval code that avoids running too soon.  Any impulse response filter
is not going to cope well with starvation-induced delays of the order you saw. 
So we should fix this feedback mechanism and push the DOM logic down into XPCOM
at the same time, whether we fix the starvation problem any time soon, or not.

Pavlov is probably not the right owner.  I won't have time for this till later
in the week.  Guess we can live with it for 1.1a/1.8b2.

Comment 13 Boris Zbarsky [:bz] 2005-04-24 22:44:32 PDT
Darin, I see what you're saying.  Yeah, your proposal makes sense.
Comment 14 Darin Fisher 2005-04-26 22:27:22 PDT
Created attachment 181936 [details]
Sample JS component that demonstrates the problem

Download this JS file, edit the LOG_PATH variable, and then drop it into your
components directory.  To reproduce excessively small timeouts, load  Then click the link to the CNN money page.  Now, press back and
forward a bunch of times switching between those two pages.  Then take a peak
at the log file.

On my system, I get output that looks like this:


Normally, when the browser is just idling, the timeouts hit the 50 ms mark
pretty reliably.
Comment 15 Darin Fisher 2005-04-27 21:19:10 PDT
reassigning to brendan ;-)
Comment 16 Brendan Eich [:brendan] 2005-04-27 21:32:07 PDT
Good idea!

Comment 17 Frank 2005-07-20 18:37:59 PDT
Sorry to come in here with an issue that might or might not be topical..

I am noticing that on ver 20050713 of Thunderbird ( 1.0+ Alpha 2 ), after
*something happens*, timers are triggering at nearly 8 seconds, no matter what
the setTimeout OR setInterval interval parameter is set to..

I can enter any of these in the JS console and it will trigger in ~ 8secs


If I close down and then restart TB, timers are not firing early ( until that
*something* happens again...

Is this the same issue or is there a bug filed for this issue
Comment 18 Asa Dotzler [:asa] 2005-08-18 11:42:25 PDT
brendan, are we likely to get a not risky patch in time for b4? This seems like
it probably shouldn't block us. 
Comment 19 Brendan Eich [:brendan] 2005-08-19 11:01:50 PDT
This can be done for 1.8b4, by imitating with the DOM setTimeout/setInterval
code does to deal with early timers (formerly from buggy OSes such as Win2k,
when we used native widget-kit timers instead of Pav's XPCOM portable timers).

Comment 20 Brendan Eich [:brendan] 2005-09-14 12:16:41 PDT
Can't block at this point without a patch.  If someone has time to figure out a
solution here, please take this bug.

Comment 21 Jorge Villalobos [:jorgev] 2005-10-19 16:48:33 PDT
*** Bug 312972 has been marked as a duplicate of this bug. ***
Comment 22 Simon Paquet [:sipaq] 2005-11-04 00:17:25 PST
This hit us (Sunbird/Calendar/Lightning) pretty hard in bug 306834, resulting in a pretty ugly workaround hack (see attachment 201791 [details] [diff] [review]). Therefore we would greatly appreciate a fix here.
Comment 23 :Gijs Kruitbosch (away 26-29 incl.) 2005-12-06 08:20:21 PST
This bug also causes ChatZilla to go completely numb in some cases.

Timers start firing basically instantly, which means that the automatic who check (every 2m) and automatic ISON check (every 5m) start spamming the send queue and we quickly end up with tens of thousands of items queued, which means anything you try to send will basically never get there (we send 1 item ever 1.5s).

We also don't ping out because ping responses are handled independantly from the sendqueue.

Needless to say, like the sunbird people, we would *really* appreciate a fix.
Comment 24 :Gijs Kruitbosch (away 26-29 incl.) 2005-12-06 09:05:48 PST
Removing clearly bogus ETA.
Comment 25 Wayne Mery (:wsmwk, NI for questions) 2005-12-07 06:18:21 PST
changing sev to critical because, as comment 23 notes, chatzilla becomes useless when sending to server ceases (bug 319219). cpu loop also probable.

question - if sendqueue length goes very high what might happen?  For example 200,000 or a million?

(?milestone mozilla1.8beta4 is past?)
Comment 26 Nickolay_Ponomarev 2005-12-09 05:14:53 PST
Note that setTimeout has a similar problem (bug 318419 and bug 312972), despite the code Darin mentioned in comment 6.
Comment 27 Wayne Mery (:wsmwk, NI for questions) 2005-12-20 13:33:08 PST
How certain is it that load - previously described as "heavy page load" and "system is heavily loaded" - is a necessary condition to induce this problem?  

When chatzilla induced timer bug hits (bug 319219) the system is often unattended, idle.

Other symptoms when chatzilla tanks - in the browser:

* tab switch very slow (10-30 sec)
* URLbar contents lag when window finally displays after tab switch  (10-30 sec)
* scrolling page is choppy
* URLbar autocomplete stops working
* mouseover / hover does not display title / tooltip 
Comment 28 Alex Staubo 2005-12-20 14:04:04 PST
(In reply to comment #27)
> How certain is it that load - previously described as "heavy page load" and
> "system is heavily loaded" - is a necessary condition to induce this problem?  

This might be a different bug altogether, but in my experience, load has nothing to do with it. I have a page which runs a function, using setInterval(), twice a minute; the function in turn does an AJAX call. It turns out that whenever unrelated effects (which steps through animation frames using setInterval()) are set in motion, the first function's scheduling becomes erratic and unpredictable, being called much too soon. For this reason I have been forced to add a guard which skips the interval logic if the time difference is too low. I notice that effects in general are choppy (as opposed to smooth) on Firefox.
Comment 29 Rob Marshall [tH] 2005-12-30 12:26:50 PST
This bug affected me 3 days in a row with ChatZilla on XULRunner.
ssieb suggested that I comment out the first if block in NSPR's ntinrval.c <>, and after making that change I haven't seen the bug occur for 4 days.
Comment 30 James Ross 2006-01-01 19:08:55 PST
Since the original comment mentions mTimeoutAdjustment, I think this is a (partially) relevant...

There I was, logging nsTimerImpl:5, and I noticed that all the timers that fire before smoothSlack (and thus mTimeoutAdjustment) are set up [1] were all only 0-2ms late - pretty damn good.

However, once smoothSlack got involved, the timers started varying wildly between 8ms *early* and 50ms late mostly (this was with nothing appreciable going on).

smoothSlack itself varied between -600 and -250 for most of it, which seems ok (equates to attempting to wake up 4-10ms early).

[0] FWIW, this is Windows XP 64bit, on an AMD Athlon 64 3200+ with:
  QueryPerformanceFrequency: 2009000000
            _nt_ticksPerSec: 61309
               _nt_bitShift: 15

[1] They wait until they have a certain number of samples first.
Comment 31 Brendan Eich [:brendan] 2006-02-02 13:35:10 PST
Anyone want to take this bug from me?  I'll advise.  I've been a bad owner and I don't expect to have the time to fix it, important though it is.  That makes me a useless pointy-hair, I guess.

Comment 32 Johnny Stenback (:jst, 2006-02-06 17:26:40 PST
For the record, AFAIK the DOM code doesn't do anything to adjust for timers that fire early. What it *does* do though is to make sure it always fires JS timeouts in the order for which they were scheduled by the page, even if that means firing the early due to an OS timer for a JS timeout fired early (and before other OS timers fired).
Comment 33 Darin Fisher 2006-02-06 17:44:31 PST
OK -> me
Comment 34 Mike Shaver (:shaver -- probably not reading bugmail closely) 2006-02-07 08:30:47 PST
Thanks, Darin!  If we end up with a well-baked patch for this, I would be supportive of it on 1.8.0.x as well as 1.8.1; it's a pretty sour bug.
Comment 35 Darin Fisher 2006-03-10 15:59:47 PST
So, I think I found the cause of this bug.  It's basically triggered by the fact that all timer objects are by default restricted to only firing when the main loop is idle.  As a result, timers simply get horribly delayed during page load, especially over a fast network connection.  This is mainly an issue on Windows because we only do idle timers on Windows.

I think we need to account for this explicit starvation when we update the TimerThread's filter.  I'm not sure what exactly though.
Comment 36 Darin Fisher 2006-03-10 16:12:48 PST
I noticed that places had to workaround this bug (see the patch in bug 318052) because suppressing timers during pageload (even chrome pageload) makes it very difficult to implement smooth drag-n-drop.
Comment 37 Martijn Wargers [:mwargers] (not working for Mozilla) 2006-03-10 16:19:24 PST
Well, timers don't work at all in windows during a drag, see bug 203573.
Comment 38 Darin Fisher 2006-03-10 16:23:21 PST
> Well, timers don't work at all in windows during a drag, see bug 203573.

I think that's because the timer events, which drive marquee, are being suppressed because they are only permitted to fire at idle time.  During a drag operation, the main event loop is very busy servicing the mouse movement enents!  Setting a timer to fire normally, and not just when the main event loop is idle, is the only way to workaround this problem.  I think it's really bad that the behavior of the timer system is so platform dependent.  We should try to at least make the timer behavior similar on all platforms, especially given the fact that the timer system was designed to be cross-platform ;-)
Comment 39 Darin Fisher 2006-04-07 13:49:23 PDT
This bug will be fixed by my patch for bug 326273.
Comment 40 Serge Gautherie (:sgautherie) 2006-04-30 02:53:48 PDT
*** Bug 335968 has been marked as a duplicate of this bug. ***
Comment 41 Darin Fisher 2006-05-19 11:45:43 PDT
Actually, my patch for bug 326273 does not completely resolve this bug.  It is still possible for the filter to force timers to fire too quickly.  I notice the problem on Linux mostly where we have difficulty maintaining 10ms timers on DHTML workloads.

See this testcase:
Comment 42 Serge Gautherie (:sgautherie) 2006-08-02 02:24:24 PDT
[Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1b1) Gecko/20060728 SeaMonkey/1.1a] (nightly) (W98SE)

(I tried Windows suspend mode again: "bug 335968" still there.)
Comment 43 Serge Gautherie (:sgautherie) 2006-10-02 10:53:26 PDT
[Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.8.1) Gecko/20061002 SeaMonkey/1.1b] (nightly) (W98SE)

(I tried Windows suspend mode again: "bug 335968" still there.)
Comment 44 Mike Koss 2007-05-05 16:54:11 PDT
Created attachment 263884 [details]
Simplified demonstartion of setInterval time oscillation.

Setting multiple setInterval()'s on one page causes timers to behave erratically (oscillate between near instant firing, and very delayed firing).  I think this is related to this existing bug, so I'm attaching the test case here.  You can also run this test from:
Comment 45 Serge Gautherie (:sgautherie) 2007-05-31 04:06:50 PDT
(I wonder if bug 376643 fix will help here ?)
Comment 46 Simon Paquet [:sipaq] 2007-06-04 03:51:43 PDT
Adding bug 329775, which has been fixed with a workaround. But it would be much better, if we could remove the workaround, when this is fixed here.
Comment 47 Adrian 2007-10-04 01:23:53 PDT
Why can't this bug be fixed? It's open for YEARS now.
Comment 48 Samuel Sieb 2007-10-15 21:45:05 PDT
The first time I tried the testcase, it worked fine.  But repeatedly after that, I get something like:
f1: 998ms
f2: 999ms
f1: 1504ms-> Greater than 10% error!
f2: 1504ms-> Greater than 10% error!
f1: 489ms-> Greater than 10% error!
f2: 489ms-> Greater than 10% error!
f1: 1005ms
f2: 1005ms
f1: 1001ms
f2: 1002ms
f1: 1003ms
f2: 1003ms
f1: 997ms
f2: 1000ms
f1: 1001ms
f2: 999ms
f1: 1001ms
f2: 1001ms
f1: 1000ms
f2: 1000ms
f1: 1000ms
f2: 1000ms

Always close to 1500 on the second firing and the third firing will of course be very close to 500 to compensate.  The rest are all very close to 1000.  I'm using a somewhat recent trunk Seamonkey.
Comment 49 Egon Knapen 2007-10-28 13:45:02 PDT
I noticed this problem for years and forgot about it until this week. I'm having this problem, to the point I can't use firefox for my project anymore.

I need it for the timeouts and requests for XmlHttpRequest. Which is working fine on all other browsers (MSIE, Opera, Safari) except for firefox 2.0.7.

If it worked like samuel above wrote, than I could live with it, it doesn't look good however having a div-tag blink like that. 

But having 3 or 4 timers, some just stop working after a few seconds... 
Comment 50 Samuel Sieb 2007-11-15 14:00:50 PST
On latest trunk seamonkey on Linux, the worst I get is:
f1: 998ms
f2: 999ms
f1: 1024ms
f2: 1024ms
f1: 975ms
f2: 976ms
f1: 1001ms
f2: 1002ms
f1: 1000ms
f2: 1000ms
f1: 999ms
f2: 1000ms
f1: 1001ms
f2: 1001ms
f1: 1000ms
f2: 1001ms
f1: 1001ms
f2: 1000ms
f1: 999ms
f2: 1000ms
f1: 999ms
f2: 1000ms

I wonder why the second firing for both timers is off like that.
Comment 51 Egon Knapen 2007-11-15 14:21:23 PST
Hi Samuel, if I look at your list #50, I you would be very happy, what i'm getting with firefox 2:

f1: 1000ms
f2: 1000ms
f1: 0ms-> Greater than 10% error!
f2: 0ms-> Greater than 10% error!
f1: 2016ms-> Greater than 10% error!
f2: 2016ms-> Greater than 10% error!
f1: 0ms-> Greater than 10% error!
f2: 0ms-> Greater than 10% error!
f1: 1984ms-> Greater than 10% error!
f2: 1984ms-> Greater than 10% error!
f1: 0ms-> Greater than 10% error!
f2: 0ms-> Greater than 10% error!
f1: 2016ms-> Greater than 10% error!
f2: 2016ms-> Greater than 10% error!
f1: 0ms-> Greater than 10% error!
f2: 0ms-> Greater than 10% error!
f1: 2000ms-> Greater than 10% error!
f2: 2000ms-> Greater than 10% error!
f1: 0ms-> Greater than 10% error!
f2: 0ms-> Greater than 10% error!
f1: 2000ms-> Greater than 10% error!
f2: 2000ms-> Greater than 10% error!

Which is close, to totally useless...
Comment 52 Egon Knapen 2007-11-15 14:37:57 PST
I saw the text at the attachment already said the same thing. The other browsers in the field (IE, Opera, Safari) manage to get a perfect 1000 for most almost every events... 

Would be nice to have firefox show atleast 1000, not 999 or 998, that should be possible I think... Sometimes show a little bit more like 1030 could be excused because of some other thing happening at the same time in the computer.
Comment 53 Peter Swan 2008-02-11 11:26:08 PST
any update on this bug? this is the first time i'm having to write workaround code for firefox and it is not pleasant!
Comment 54 Egon Knapen 2008-02-11 12:47:57 PST
well it works fine in firefox 3, ff2 is a complete catastroph on this point...
Comment 55 Evan Parker 2008-08-14 15:10:32 PDT
On ff3 (3.0.1) on windows XP I got this on one run:

f1: 993ms
f2: 993ms
f1: 993ms
f2: 994ms
f1: 1344ms-> Greater than 10% error!
f2: 1344ms-> Greater than 10% error!
f1: 667ms-> Greater than 10% error!
f2: 667ms-> Greater than 10% error!
f1: 973ms
f2: 973ms
f1: 1015ms
f2: 1015ms
f1: 1013ms
f2: 1014ms
f1: 1019ms
f2: 1019ms
f1: 976ms
f2: 976ms
f1: 996ms
f2: 996ms
f1: 1000ms
f2: 1000ms

Though I couldn't reproduce it.

Also, FF3 still fires timers too early repeatedly.  See this test page:

Note You need to log in before you can comment on or make changes to this bug.