Closed Bug 1410930 Opened 4 years ago Closed 3 years ago

TimeThread condition variable wait mechanism is innacurate on some windows devices

Categories

(Core :: XPCOM, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: bkelly, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 obsolete files)

Bug 528273 has two test attachments for checking accuracy of setTimeout() and setInterval():

setTimeout:
https://bugzilla.mozilla.org/attachment.cgi?id=412022

setInterval:
https://bugzilla.mozilla.org/attachment.cgi?id=412023

We do pretty well on these on linux/mac.  On windows, though, we show significant jitter/drift.

I did a bisection and this only started after bug 1363829 landed.

I suspect what is happening is that we are just barely missing a deadline on windows and triggering another nsITimer to be scheduled.  We enforce a minimum delay on the timers because its a deeply nested timeout which introduces 4+ ms delays.
We could try increasing the mAllowedEarlyFiringTime in TimeoutExecutor for the windows platform:

https://searchfox.org/mozilla-central/source/dom/base/TimeoutExecutor.h#33

This value is currently derived from the nsTimerImpl here, though:

https://searchfox.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#415

If nsTimerImpl is using the wrong value, then maybe it should be fixed there.
We should probably be taking mAllowedEarlyFiringMicroseconds into account or using forceRunNextTimer here:

https://searchfox.org/mozilla-central/source/xpcom/threads/TimerThread.cpp#438
I think maybe the TimerThread was busy looping on allowed early timers before instead of actually firing them.  Lets see if fixing that helps here.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=cabdc7662e4f1b574af9252ed848e95bc6084f1d
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Comment on attachment 8921120 [details] [diff] [review]
Make TimerThread fire allowed early timers instead of busy looping. r=froydnj

Oh, the next: label is after the if-statement, so we don't busy loop.  Sorry for my confusion here.
Attachment #8921120 - Attachment is obsolete: true
I did some investigating and this comes from rounding/accuracy issues on the windows platform.  Linux and mac actually can sleep for fractions of a millisecond while the API we are using on windows cannot.  I think the rounding logic in nsTimer and TimeoutExecutor do a poor job with the windows at the moment.

I'll try to think about the best way to fix this.
Implementing drift correction as described here helps this test on my windows machine:

https://github.com/whatwg/html/issues/3151#issuecomment-339000314
Before doing any drift correction, I've been trying to understand why I'm seeing such wildly inaccurate delays on my windows machine.  It seems TimerThread is getting woken up late quite frequently:

### ### TimeoutExecutor: scheduled timer with delay 9.976ms
### ### TimeThread: tried to wait for 10ms, but actually waited 14.574ms
This seems to fix the problem for me on my machine.

Its a bit difficult to know if its adequate in general, though.  Adding a printf_stderr() in the wrong path also seemed to fix the problem.  I don't know what kind of crazy algorithm windows is using to wake up threads blocked on condition variables.

At a minimum, though, this reduces the scope of the inaccurate timers on windows problem.
Attachment #8921530 - Flags: review?(nfroyd)
See Also: → 1411329
Comment on attachment 8921530 [details] [diff] [review]
Run TimerThread at PRIORITY_HIGH to improve accuracy on windows platform. r=froydnj

Review of attachment 8921530 [details] [diff] [review]:
-----------------------------------------------------------------

This seems pretty reasonable.
Attachment #8921530 - Flags: review?(nfroyd) → review+
I still get some weird delays on my windows machine, but with the priority change they are lessened.  For example, the setTimeout() test still shows many >4ms delays, but the setInterval() test does not any more.  I have no idea why, unfortunately.
Edge has similar accuracy issues on my machine.
If Edge suffers from the same issues, then I'd say the prioritization patch is probably not worth landing?  Or are our issues *without* the prioritization patch worse than Edge's, and the prioritization patch brings us in line with Edge's behavior?
Our behavior is different, but off by values of a similar magnitude.

For example, if I change the test in comment 0 to use a timeout of 13ms, then FF will still end up with 15ms delays.  Edge will hover around 16ms to 17ms delays.

It seems like sometimes the condition variable wait is quantized to ~5ms increments.  Maybe this is a new power savings thing.  I am running a recently updated Windows 10 Fall Creators Update.
Summary: win32 platform does worse on bug 528273 timeout/interval tests since bug 1363829 landed → TimeThread condition variable wait mechanism is innacurate on some windows devices
Component: DOM → XPCOM
Not working this at the moment.
Assignee: bkelly → nobody
Status: ASSIGNED → NEW

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:bkelly, could you have a look please?

Flags: needinfo?(ben)
Attachment #8921530 - Attachment is obsolete: true
Flags: needinfo?(ben)

At the time I decided I couldn't see a consistent improvement and decided not to land the patch.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.