Closed
Bug 697250
Opened 14 years ago
Closed 14 years ago
An interval timer set for a longer delay can fire twice between firings of an interval timer set for a shorter delay
Categories
(Core :: DOM: Core & HTML, defect, P1)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
mozilla10
People
(Reporter: codemonkey49, Assigned: bzbarsky)
References
()
Details
Attachments
(2 files, 1 obsolete file)
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a1) Gecko/20111025 Firefox/10.0a1
Build ID: 20111025031041
Steps to reproduce:
Please visit this site with a floating-point heavy javascript simulation using Nightly:
http://tholman.com/series/flash-forward/ports/multiphase-flow/02/
Hold the mouse button down in the rectangle to add points. Keep the mouse button pressed for a while until ~1/4th full of points when they settle.
Actual results:
The points behave erratically as if there is some kind of floating point errors in the operations (they have too much 'energy' compared to other browsers).
Expected results:
On other browsers tested the points move fluidly (IE9, Chrome 16.0.912.4 dev-m, Opera, Safari).
When I tried this last ~1-3 weeks ago using Nightly the points moved fluidly so I believe there is some recent change that broke math in some subtle way. I notice on arewefast.com that TM+JM got a lot slower recently so I assume some something significant changed in the engine.
Sorry but I don't have the time or experience to narrow this down to a test case that fails. I am not the author of this demo.
Also I forgot to note another difference, when holding the mouse down sometimes the 'streeam' of new particles becomes straight with no randomness at which point all the other particles no longer repel each other and fall to the bottom of the screen. I didn't observe this on any other browser.
So maybe something to do with the random number generator?
| Assignee | ||
Comment 2•14 years ago
|
||
I get the "no repelling" failure condition every time I try this, so I never get a chance to reproduce the movement aspect.
And I get that failure condition on builds going back to at least Firefox 4...
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 3•14 years ago
|
||
The console is also spammed with warnings about an undefined value. It looks like we end up with a NaN value somewhere. I can track this down tomorrow if nobody beats me to it..
Comment 4•14 years ago
|
||
OK, here's what causes the "no repelling" failure. The script uses 2 different intervals (functions called by setInterval):
1) Update: called every 20 ms. Adds particles, calculates various properties (neighbors, force, etc)
2) Draw: called every ~17 ms (1000 / 60). Moves and draws all particles.
This scheme fails if the first interval is called twice without an intermediate call to draw (the second interval). When this happens we will have two particles at the same position (the second interval is supposed to move the particles added by the first interval). The function setParticle does this:
this.nx = p1.x - p2.x;
this.ny = p1.y - p2.y;
this.distance = Math.sqrt(this.nx * this.nx + this.ny * this.ny);
// ... snip ...
var invDistance = 1 / this.distance;
this.nx *= invDistance;
this.ny *= invDistance;
If particles p1 and p2 have the same position, this.distance is 0. 1/0 is Infinity so this.nx becomes 0 * Infinity = NaN.
I added some logging and sometimes we indeed have two "update" calls without a "draw" call. Also verified that changing the "draw" interval from 17 ms to 22 ms causes the same problem in Chrome.
I wouldn't be surprised if this is also responsible for the problem in comment 0. Note that the script does not use Math.random or Date; it depends on the mouse input and setInterval but for the rest it's deterministic.
Comment 5•14 years ago
|
||
Moving to DOM.
Assignee: general → nobody
Component: JavaScript Engine → DOM
OS: Windows 7 → All
QA Contact: general → general
Hardware: x86_64 → All
| Assignee | ||
Updated•14 years ago
|
| Assignee | ||
Comment 6•14 years ago
|
||
Even without the delay loop we get there pretty quickly, but the delay loop nearly guarantees it on the first iteration....
| Assignee | ||
Comment 7•14 years ago
|
||
Comment on attachment 569688 [details]
Testcase that triggers the timeout issue pretty reliably
This isn't quite right...
Attachment #569688 -
Attachment is obsolete: true
| Assignee | ||
Comment 8•14 years ago
|
||
| Assignee | ||
Comment 9•14 years ago
|
||
So here's what I see when I reproduced on that last testcase. For everything below, t is the time in milliseconds, with 0 being when those two setInterval calls in the testcase happen:
t=0: f scheduled for t=20
t=0: g scheduled for t=17
t=17: g called, finishes. 17 < 17+17, so g scheduled for t=34
t=20: f called
t=40: f finishes. 20 < 20+20, so f scheduled for t=40
t=41: g called, finishes. 41 < 34+17, g scheduled for t=51
t=41: f called
t=61: f finishes. 41 < 40+20, so we try to schedule f for t=60, clamp to t=61
t=82: g called, finishes. 82 > 51+17, so we skip ahead; g scheduled for t=99.
t=82: f called
t=102: f finishes. 82 > 61+20, so we skip ahead; f scheduled for t=102.
t=111: g called, finishes. 111 < 99+17, g scheduled for t=116.
t=111: f called
t=131: f finishes. 111 < 102 + 20, so f scheduled for t=122.
t=135: g called, finishes. 135 > 116+17, so we skip ahead; g scheduled for t=152.
t=135: f called
t=155: f finishes. 135 < 122+20, so we try to schedule f for 142, clamp to 155.
t=157: f called
So the key part there is to hit a situation where a call to g ends up on the "target time + 17 < time when g started running" codepath, which resets g to run at "now + 17" while the immediately following call to f ends up on the "target time + 20 > time when f started running", which resets f to run at "target time + 20". Since g runs right before f, it will get scheduled for "time when f started running + 17", so as long as f started running more than 3ms after its target time f will end up scheduled before g....
| Assignee | ||
Comment 10•14 years ago
|
||
Looking at the spec at http://www.whatwg.org/specs/web-apps/current-work/multipage/timers.html#dom-windowtimers-setinterval the "skip ahead" behavior doesn't seem to be allowed, though step 8 sort of allows anything you want.
It looks like the "skip ahead" thing was added in bug 376643, since at the time this code was using PRTime for things like mWhen. Now that we're using TimeStamp there, would it be safe to back out this patch? Or can TimeStamp increase discontinuously over the things listed in the bug? I guess at least over SIGSTOP/SIGCONT it can...
If we do want to preserve the skipping behavior, I wonder how we can do it without causing issues like the one in this testcase.
| Reporter | ||
Comment 11•14 years ago
|
||
I'm probably missing something as I have zero knowledge of Mozilla code...
What if you just always schedule the next event to happen after "interval-(now%interval)" milliseconds ie sleep for however much is left of the interval period containing the current time? This should handle stop/start and suspend/resume, and not fire twice in a row.
But what time to do the first one could be a problem (if the interval was '1 minute' you'd expect it to run after 1 minute not 'on the next minute'). I guess you could store some offset based on when the interval was first set ie "interval-((now+offset)%interval)".
| Assignee | ||
Comment 12•14 years ago
|
||
> What if you just always schedule the next event to happen after
> "interval-(now%interval)" milliseconds
"now" is not a number. It doesn't really have a concept of % attached to it... We could fix that by measuring time from some fixed base (e.g. browser startup). This is a good idea! Let me think it through.
> I guess you could store some offset based on when the interval was first set ie
> "interval-((now+offset)%interval)"
Yeah, that's needed to correctly handle this testcase, which should ideally call f every 5ms:
setInterval(f, 10);
setTimeout(function() { setInterval(f, 10); }, 5);
| Assignee | ||
Comment 13•14 years ago
|
||
OK, so with that setup there is the open question of whether to set "now" to the time the timer starts firing or the time the timer stops firing.
1) As long as a timer is not irretrievably late (i.e. finishes firing before its next
scheduled firing) it will get scheduled for said next firing no matter what definition
of "now" is used.
2) If a timer starts firing in a situation where we would not skip right now (i.e timer
firing start is not later than its next firing, then if "now" is treated as firing
start the timer will be scheduled for "ASAP" just like it is now. If "now" is treated
as firing end, then it depends on how long the timer took to run.
3) If a timer starts firing in a situation where it would skip right now, it's
guaranteed to skip no matter which definition of "now" is used. It'll obviously skip
less with the "timer firing start" definition.
Also, when the timer skips it's guaranteed to skip by no more than it skips by right now.
An alternative approach is to rip out the "skip ahead" bit and do what precise-but-can-skip XPCOM timers do: when a timer finishes running after its next scheduled firing, schedule it for "ASAP" and, importantly, update its target time to "ASAP". In nsTimeout terms, that means |mWhen = delay + TimeStamp::Now()| instead of |mWhen = firingTime|. This might behave better in terms of trying to "catch up" if it falls a bit behind....
| Assignee | ||
Comment 14•14 years ago
|
||
This patch does that last suggestion: act like precise-but-can-skip XPCOM timers. It fixes the problem with the fluid just collapsing for me. Jamax, I started off a Windows optimized build with this patch on the try server; once it completes, would you be willing to try the resulting build and see whether it solves the 'too much energy' problem too?
Attachment #570464 -
Flags: review?(jst)
| Assignee | ||
Updated•14 years ago
|
Assignee: nobody → bzbarsky
Priority: -- → P1
Whiteboard: [need review]
| Assignee | ||
Comment 15•14 years ago
|
||
I believe the attached patch is actually pretty much equivalent to the comment 11 proposal with "now" measured at firing start, except it won't try to "drift back" to the right timing once it gets off from it, except insofar as the delay line stuff in nsTimerImpl will pull it back. So maybe the comment 11 proposal is better in that regard.
cjones, if you have any thoughts here, I'd love to hear them!
Comment 16•14 years ago
|
||
Try run for d934802342d4 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=d934802342d4
Results (out of 1 total builds):
success: 1
Builds available at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-d934802342d4
| Reporter | ||
Comment 17•14 years ago
|
||
Boris I tried out the patch and it does fix the double timer problem on my system (the demo doesn't ever 'fail to repel'). Your second testcase in comment 8 ran to a couple thousand before I stopped it. Fyi I did not try suspend/resume or anything else that might be affected by the patch.
But the too-much-energy problem is still there. Maybe it's just this demo relying on some undefined behavior, but it works 'correctly' in other browsers. I'll try to get a movie posted later today so you can at least see what the effect looks like.
| Assignee | ||
Comment 18•14 years ago
|
||
> But the too-much-energy problem is still there.
I'd love to see a demonstration of that. The behavior looks pretty identical to me in Firefox and Chrome....
| Assignee | ||
Comment 19•14 years ago
|
||
Comment on attachment 570464 [details] [diff] [review]
When an interval is very late, just go ahead and schedule it for firing ASAP and update its target time to 'now' (the latter to prevent misbehavior when the interval is very late due to the computer being suspended, say) instead of trying to schedule it f
Chris, I'd love your feedback here. See comment 15 and the preceding discussion.
Attachment #570464 -
Flags: feedback?(jones.chris.g)
| Reporter | ||
Comment 20•14 years ago
|
||
I cobbled together a screen capture at:
http://www.youtube.com/watch?v=T0ULuMOllig
BUT this is a problem in the demo not FF. I got the other browsers to do the same kind of thing (jittering, small exploding groups) by adding far fewer particles. I think the demo just has a logic problem relating to the refresh rate and I noticed it in Nightly because it's so much faster than the others.
Anyway I'm glad there was a real bug in there however small and I didn't completely waste you guys' time!
| Assignee | ||
Comment 21•14 years ago
|
||
Jamax, thanks!
Going to retitle this bug to make it cover the issue we're ending up fixing, which I wouldn't describe as "small".
Summary: Different behavior in a floating point javascript simulation than previously → An interval timer set for a longer delay can fire twice between firings of an interval timer set for a shorter delay
Comment on attachment 570464 [details] [diff] [review]
When an interval is very late, just go ahead and schedule it for firing ASAP and update its target time to 'now' (the latter to prevent misbehavior when the interval is very late due to the computer being suspended, say) instead of trying to schedule it f
I think this approach makes sense. The proposal of comment 11 sounds a bit more attractive to me for callbacks that persistently don't finish in less time than their interval, because it would tend to throttle back the callbacks by interval/2 on average. But this patch provides more consistent behavior. For non-degenerate callbacks, I think I slightly prefer the approach here of "rebasing" the interval after OS sleep etc, but I'm not sure how much it matters.
We really need to get folks on requestAnimationFrame() ... :)
Attachment #570464 -
Flags: feedback?(jones.chris.g) → feedback+
Comment 23•14 years ago
|
||
Comment on attachment 570464 [details] [diff] [review]
When an interval is very late, just go ahead and schedule it for firing ASAP and update its target time to 'now' (the latter to prevent misbehavior when the interval is very late due to the computer being suspended, say) instead of trying to schedule it f
This seems reasonable to me. Ultimately we'll need to get this out to users for a while to see how it performs though.
r=jst
Attachment #570464 -
Flags: review?(jst) → review+
| Assignee | ||
Comment 24•14 years ago
|
||
Flags: in-testsuite?
Whiteboard: [need review]
Target Milestone: --- → mozilla10
Comment 25•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•