TimeoutManager::RunTimeout() should limit consecutive callbacks using a time budget

RESOLVED FIXED in Firefox 55

Status

()

Core
DOM
RESOLVED FIXED
6 months ago
3 months ago

People

(Reporter: bkelly, Assigned: bkelly)

Tracking

(Blocks: 1 bug)

unspecified
mozilla55
Points:
---

Firefox Tracking Flags

(firefox55 fixed)

Details

Attachments

(5 attachments, 4 obsolete attachments)

3.56 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
15.07 KB, patch
Ehsan
: review+
Details | Diff | Splinter Review
1.58 KB, patch
Ehsan
: review+
Details | Diff | Splinter Review
8.85 KB, patch
Ehsan
: review+
Details | Diff | Splinter Review
5.78 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
(Assignee)

Description

6 months ago
In bug 1342854 I added a configurable limit to the number of timer callbacks that can run in a single event loop runnable.  Right now its based on a pref.

What we really want, though, is to impose a limit based on a CPU or time budget.  If the callbacks do nothing, then we might run more of them.  If a callback takes 100ms, then we probably should not run any more and yield the main thread.

I'm filing this as a follow-up bug because the current code makes it a bit tricky.  TimeoutManager::RunTimeout() is designed to figure out which timers will be run up front before it executes any callbacks.  A time budget would require changing this to such that the set of callbacks to run would be modified in the middle of the callback loop.

I think perhaps it would be safe to set the dummy insertion point after all the callbacks that *might* be run based on the deadline.  We can then stop at any time before that dummy insertion point.  It needs some investigation and testing, though.
Make sure you look at the js::SliceBudget class. It is very useful for this sort of incremental stuff. It is used in iGC and iCC.
(Assignee)

Comment 2

6 months ago
I should note that I don't plan to work on this immediately.  I assigned it to myself so I don't forget about it.
(Assignee)

Comment 3

3 months ago
I'm going to see if I can fit this in for FF55.
Blocks: 1362412
No longer depends on: 1342854
(Assignee)

Comment 4

3 months ago
Created attachment 8866513 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan

Step one is to cleanup a few things that look like bugs to me in RunTimeout().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b9071292221b56dce9bfb7e20abc454da5be3dc
(Assignee)

Comment 5

3 months ago
Created attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan
(Assignee)

Comment 6

3 months ago
Created attachment 8866527 [details] [diff] [review]
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan

https://treeherder.mozilla.org/#/jobs?repo=try&revision=512b137dffc7b27fe4eb622c32b8a202f3cdf01f
(Assignee)

Comment 7

3 months ago
Comment on attachment 8866513 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan

So, in order to do this work I need to be able to gracefully leave the main RunTimeout() loop where the handlers are called.  Because of this I was looking at the loop code and thought I saw some errors.  This patch tries to fix these problems before adding my new code.

So, this patch tries to fixes these problems:

1) If the window gets frozen, the old code will leave a bunch of timers with a non-zero firing depth.  If the window gets thawed these timers may not get marked as runnable again unless there is a new timer that fires for that depth.  (See the loop earlier in RunTimeouts that focuses on firing depth zero timers.)  So this patch allows frozen windows to take the same path as suspended windows where the firing depth is zeroed.

2) I thought maybe the `!scx` case had the same problem with firing depth, but it seems more likely these timers will just never run.  I modified the code to remove timers when we're in this partially torn down state.

3) We previously thought the early return could leave some dummy timers in the list.  Looking at the code, though, the list should always be empty when we hit this path.  This adds an assert to make that clear.
Attachment #8866513 - Flags: review?(ehsan)
(Assignee)

Comment 8

3 months ago
Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

This adds the code to measure time spent executing timeout handlers and to early execute if we exceed a threshold.

The threshold defaults to 4ms (25% of a frame).  Its controlled by a pref called "dom.timeout.max_consecutive_callback_ms".

The threshold is only checked once we fire the Timeout specifically targeted by this instance of RunTimeout().  In my testing for previous work this is necessary to avoid some obscure corner cases.  Its a restriction that will go away once bug 1363829 is implemented.

The threshold exit itself is implemented the same way as a suspended window.  We simply continue through the list marking timeout firing depth back to zero.
Attachment #8866526 - Flags: review?(ehsan)
(Assignee)

Comment 9

3 months ago
Comment on attachment 8866527 [details] [diff] [review]
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan

This patch removes the previous "max consecutive callbacks" limiting code.  This was a flat limit on the number of callbacks controlled by the "dom.timeout.max_consecutive_callbacks" pref.

The new mechanism is superior to this one because a time limit automatically adjusts for different levels of CPU power, etc.  We used to set the count limit differently for mobile, but we don't have to do that for the time limit.
Attachment #8866527 - Flags: review?(ehsan)
(Assignee)

Comment 10

3 months ago
Created attachment 8867794 [details] [diff] [review]
P4 Don't waste time calling TimeStamp::Now() after time budget has elapsed. r=ehsan

This is a small optimization to the previous patches.  Once the time budget has elapsed we don't need to call TimeStamp::Now() for every remaining timer.  This patch sets a flag instead and avoids calling this expensive operation when we are already running short on time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a10f60b4f43cdde64843c272d4cbb69db0d4e7b
Attachment #8867794 - Flags: review?(ehsan)
Comment on attachment 8866513 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan

Review of attachment 8866513 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!
Attachment #8866513 - Flags: review?(ehsan) → review+
Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

Review of attachment 8866526 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/init/all.js
@@ +5698,5 @@
>  pref("dom.timeout.max_consecutive_callbacks", 5);
>  
> +// Maximum amount of time in milliseconds consecutive setTimeout()/setInterval()
> +// callback are allowed to run before yielding the event loop.
> +pref("dom.timeout.max_consecutive_callback_ms", 4);

I'd like to get smaug's feedback on what default value we should use here.  I can't think of anything better than 4ms, but I'm wondering if we shouldn't start with something lower?
Attachment #8866526 - Flags: review?(ehsan)
Attachment #8866526 - Flags: review+
Attachment #8866526 - Flags: feedback?(bugs)
Attachment #8866527 - Flags: review?(ehsan) → review+
Attachment #8867794 - Flags: review?(ehsan) → review+
(Assignee)

Comment 13

3 months ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #12)
> I'd like to get smaug's feedback on what default value we should use here. 
> I can't think of anything better than 4ms, but I'm wondering if we shouldn't
> start with something lower?

Well, we don't want to make it too small either.  If you read the section called "What's the catch" here:

https://blog.wanderview.com/blog/2017/03/13/firefox-52-settimeout-changes/

I discuss some of the downsides of forcing timers to yield.  The goal is to strike a reasonable where a useful number of timers make it through while still yielding in time to maintain good fps.

Comment 14

3 months ago
Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

4ms is quite low, IMO. Not sure having even lower value would make much difference, especially once we prioritize input events over timer events (well, not prioritize but stick next to vsync.).

I wonder if we could add some telemetry to figure out reasonable value.
Probably not, since this depends so highly on a web site.
Attachment #8866526 - Flags: feedback?(bugs) → feedback+

Comment 15

3 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ff1878374dd
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/f370c47abb14
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/69a33656f4a3
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2a7fe15c578
P4 Don't waste time calling TimeStamp::Now() after time budget has elapsed. r=ehsan

Comment 16

3 months ago
backed this out for test_timer_flood.html perma-failing like https://treeherder.mozilla.org/logviewer.html#?job_id=99365259&repo=mozilla-inbound&lineNumber=1868

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&bugfiler&noautoclassify&filter-searchStr=Android 4.3 API15%2B debug Mochitests executed by TaskCluster test-android-4.3-arm7-api-15%2Fdebug-mochitest-6 tc-M(6)&fromchange=e5bb57513b43
Flags: needinfo?(bkelly)

Comment 17

3 months ago
Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f77644a7da56
Backed out 4 changesets for test_timer_flood.html perma-failing
(Assignee)

Comment 18

3 months ago
Thanks.  I'll take a look.

(And I'm really glad I wrote that test!)
Flags: needinfo?(bkelly)
(Assignee)

Comment 19

3 months ago
My best guess here is that the "targetTimeoutSeen" is not working like it used to.  If the timeout gets canceled or removed, then we may never throttle the loop.  I'm planning to remove this as part of bug 1363829, but maybe I can remove it here.
(Assignee)

Comment 20

3 months ago
Actually, it seems maybe the test just runs more slowly since we're permitting more time to be spent on executing setTimeout() callbacks.  I might need to tune the pref down a bit.
(Assignee)

Comment 21

3 months ago
After investigating some more it seems the changes to continue the loop to set the firing depth back to zero is taking a lot of time.  I'm going to try an alternative solution.
(Assignee)

Comment 22

3 months ago
And also we end up spending more time in the top loop where the flat limit was previously.  Perhaps we should have some large flat limit still, like 5000, and then further limit it with the time based threshold in the lower loop.
(Assignee)

Comment 23

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c387f16e20518389bc714cc8bf4bf2dd3543e944
(Assignee)

Comment 24

3 months ago
Setting a flat threshold of 1000 timers in addition to the time budget allows android to pass test_timer_flood.html.  It does still take close to the timeout value, though.  The test time jumps from ~160 seconds to ~315 seconds.  The timeout for the test is set at 330 seconds.

I have an idea to avoid crawling the loop to set mFiringDepth back to zero after time elapses.  I think we can use unique firing depth values that "expire" after we leave RunTimeout().  There is nothing that really requires them to be an increment/decrement.  This would let us just break out of the main RunTimeout() loop for all the various exit scenarios.

We will have to burn a little bit of memory to do this, though.  Since we can't just decrement when we leave a nested RunTimeout() we'll need another way to find the previous levels firing depth.  This can be handled by a stack structured implemented on a small AutoTArray<uint32_t, 4>.
Thanks for digging through this...  I'm super glad you'd added this test, otherwise this would probably regress real pages in bad ways and we wouldn't have known about it!
(Assignee)

Comment 26

3 months ago
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #25)
> Thanks for digging through this...  I'm super glad you'd added this test,
> otherwise this would probably regress real pages in bad ways and we wouldn't
> have known about it!

Well, I think most sites and devices would be fine.  The android emulator running a debug build is exceptionally slow and basic computation stuff.  Still, good to fix.

This works in light local testing.  Lets see what try says:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=24c32d63da3c6573ecc8c1860e212cc2da99f4af
(Assignee)

Comment 27

3 months ago
On my local testing with a nexus 4 debug build:

1. Current m-c: ~24 seconds
2. With original patches here: ~56 seconds
3. With patches in comment 23: ~30 seconds
4. With patches in comment 26: ~28 seconds

We'll have to see what the android emulator says, though.  It could be quite different.
(Assignee)

Comment 28

3 months ago
A few more tweaks.  Doesn't change much on my nexus, but might help on the slower emulator.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bbd1ac19c4e84751df5d5552980113166411f6e2
(Assignee)

Comment 29

3 months ago
Comment 26 brought the debug emulator back down to ~170 seconds from ~315 second.  So that's good.

Comment 28 made the time limit include time spent in the initial RunTimeout() loop.  That actually made the test worsen to ~240 seconds.

I think this means debug emulator can spend the majority of its time in the initial loop.  We should probably time limit that to be 25% of our total time allotment or something.  We want to spend our time running JS, not doing book keeping.
(Assignee)

Comment 30

3 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a13efc969443ca8e84a90eef8566ece0619ae9e4
(Assignee)

Comment 31

3 months ago
Comment 30 patch brings us about back to the ~160 seconds to complete test_timer_flood.html on android debug emulator.

I'll clean up the patches and flag for review on the new stuff.
(Assignee)

Comment 32

3 months ago
Created attachment 8868729 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan
Attachment #8866513 - Attachment is obsolete: true
Attachment #8868729 - Flags: review+
(Assignee)

Comment 33

3 months ago
Created attachment 8868735 [details] [diff] [review]
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan

Ehsan, this patch refactors our mFiringDepth to allow us to avoid having to iterate the entire list of timers to reset the value when we early-exit the RunTimeout() loop.

Previous mFiringDepth would essentially increment every time we enter RunTimeout() (or RunTimeoutHandler) and decrement when we leave the method.  This allows us to separate timers running within an event loop spin due to something like sync XHR within a timeout handler.

The new system "creates" a new FiringId integer when it enters RunTimeout().  This is appended on an AutoTArray<> stack.  When we leave RunTimeout() we pop it off the stack.

Every FiringId is an increment from the last created.  We never decrement.  Eventually we will wrap around and re-use FiringId values, but this patch assumes we'll never have that many nested RunTimeout() calls.  It seems likely we would overflow the call stack before that happens.

Instead of the old `mFiringDepth == 0` check, we now have an IsInvalidFiringId() method.  This returns true if the given FiringId is not in the AutoTArray<> stack.  It uses a number of fast paths to avoid having to search the array on every call.
Attachment #8868735 - Flags: review?(ehsan)
(Assignee)

Comment 34

3 months ago
Created attachment 8868737 [details] [diff] [review]
P3 Remove unused Timeout::mPrincipal member variable. r=ehsan

The last patch made us use a bit more memory.  This patch reclaims some memory by nuking an unused member variable from Timeout.
Attachment #8868737 - Flags: review?(ehsan)
(Assignee)

Comment 35

3 months ago
Created attachment 8868738 [details] [diff] [review]
P4 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

This is a modified version of the time limit patch.  It now does:

1. Measures elapsed time from the beginning of RunTimeout() instead of just the second loop where handlers are executed.
2. Aborts the initial loop that looks for what timers to run if it sees a significant amount of the time budget being used there.  This is set to 25% of the total time budget.  We only check for time expiration here every 100 Timeout objects examined to avoid too much overhead.
3. Exits the second loop after executing at least one handler if the time has expired.

The goal of (2) is to avoid spending too much of our time budget on book keeping such that we don't have any time to run js handlers.
Attachment #8866526 - Attachment is obsolete: true
Attachment #8867794 - Attachment is obsolete: true
Attachment #8868738 - Flags: review?(ehsan)
(Assignee)

Comment 36

3 months ago
Created attachment 8868739 [details] [diff] [review]
P5 Removed the fixed limit on consecutive Timeout callbacks. r=ehsan

Rebased version of the previously reviewed patch to remove the fixed limit.  With the time limit check on the initial loop, this is no longer needed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=662d276e8842273005d8c540469426d964d2314f
Attachment #8866527 - Attachment is obsolete: true
Attachment #8868739 - Flags: review+
Sorry I didn't find enough time to do the reviews today, I'll try to finish them tomorrow.
Comment on attachment 8868735 [details] [diff] [review]
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan

Review of attachment 8868735 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: dom/base/TimeoutManager.cpp
@@ +152,5 @@
> +TimeoutManager::CreateFiringId()
> +{
> +  uint32_t id = mNextFiringId;
> +  mNextFiringId += 1;
> +  if (mNextFiringId == InvalidFiringId) {

Can you please add some comment here explaining that an overflow is expected here?  It may not be super obvious to someone reading the code later.
Attachment #8868735 - Flags: review?(ehsan) → review+
Attachment #8868737 - Flags: review?(ehsan) → review+
Attachment #8868738 - Flags: review?(ehsan) → review+

Comment 39

3 months ago
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3cc2996b4900
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/7af3265734b8
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/598bd9d7eecc
P3 Remove unused Timeout::mPrincipal member variable. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb29b5cc977c
P4 Only execute consecutive timeout handlers for a limit period of time. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8582a3560cc
P5 Removed the fixed limit on consecutive Timeout callbacks. r=ehsan
https://hg.mozilla.org/mozilla-central/rev/3cc2996b4900
https://hg.mozilla.org/mozilla-central/rev/7af3265734b8
https://hg.mozilla.org/mozilla-central/rev/598bd9d7eecc
https://hg.mozilla.org/mozilla-central/rev/cb29b5cc977c
https://hg.mozilla.org/mozilla-central/rev/a8582a3560cc
Status: ASSIGNED → RESOLVED
Last Resolved: 3 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.