Closed Bug 1343912 Opened 7 years ago Closed 7 years ago

TimeoutManager::RunTimeout() should limit consecutive callbacks using a time budget

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla55

Tracking Flags:

Tracking

Status

firefox55

---

fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

Details

Attachments

(5 files, 4 obsolete files)

P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 3.56 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 5.71 KB, patch	ehsan.akhgari : review+ smaug : feedback+	Details \| Diff \| Splinter Review
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 6.68 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P4 Don't waste time calling TimeStamp::Now() after time budget has elapsed. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 2.15 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 3.56 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 15.07 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P3 Remove unused Timeout::mPrincipal member variable. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 1.58 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P4 Only execute consecutive timeout handlers for a limit period of time. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 8.85 KB, patch	ehsan.akhgari : review+	Details \| Diff \| Splinter Review
P5 Removed the fixed limit on consecutive Timeout callbacks. r=ehsan 7 years ago Ben Kelly [:bkelly, not reviewing] 5.78 KB, patch	bkelly : review+	Details \| Diff \| Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Description

•

7 years ago

In bug 1342854 I added a configurable limit to the number of timer callbacks that can run in a single event loop runnable.  Right now its based on a pref.

What we really want, though, is to impose a limit based on a CPU or time budget.  If the callbacks do nothing, then we might run more of them.  If a callback takes 100ms, then we probably should not run any more and yield the main thread.

I'm filing this as a follow-up bug because the current code makes it a bit tricky.  TimeoutManager::RunTimeout() is designed to figure out which timers will be run up front before it executes any callbacks.  A time budget would require changing this to such that the set of callbacks to run would be modified in the middle of the callback loop.

I think perhaps it would be safe to set the dummy insertion point after all the callbacks that *might* be run based on the deadline.  We can then stop at any time before that dummy insertion point.  It needs some investigation and testing, though.

Andrew McCreight [:mccr8]

Comment 1

•

7 years ago

Make sure you look at the js::SliceBudget class. It is very useful for this sort of incremental stuff. It is used in iGC and iCC.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 2

•

7 years ago

I should note that I don't plan to work on this immediately.  I assigned it to myself so I don't forget about it.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 3

•

7 years ago

I'm going to see if I can fit this in for FF55.

Blocks: 1362412

No longer depends on: 1342854

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 4

•

7 years ago

Attached patch P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan (obsolete) — Details — Splinter Review

Step one is to cleanup a few things that look like bugs to me in RunTimeout().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4b9071292221b56dce9bfb7e20abc454da5be3dc

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 5

•

7 years ago

Attached patch P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan (obsolete) — Details — Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

7 years ago

Attached patch P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan (obsolete) — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=512b137dffc7b27fe4eb622c32b8a202f3cdf01f

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 7

•

7 years ago

Comment on attachment 8866513 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan

So, in order to do this work I need to be able to gracefully leave the main RunTimeout() loop where the handlers are called.  Because of this I was looking at the loop code and thought I saw some errors.  This patch tries to fix these problems before adding my new code.

So, this patch tries to fixes these problems:

1) If the window gets frozen, the old code will leave a bunch of timers with a non-zero firing depth.  If the window gets thawed these timers may not get marked as runnable again unless there is a new timer that fires for that depth.  (See the loop earlier in RunTimeouts that focuses on firing depth zero timers.)  So this patch allows frozen windows to take the same path as suspended windows where the firing depth is zeroed.

2) I thought maybe the `!scx` case had the same problem with firing depth, but it seems more likely these timers will just never run.  I modified the code to remove timers when we're in this partially torn down state.

3) We previously thought the early return could leave some dummy timers in the list.  Looking at the code, though, the list should always be empty when we hit this path.  This adds an assert to make that clear.

Attachment #8866513 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

7 years ago

Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

This adds the code to measure time spent executing timeout handlers and to early execute if we exceed a threshold.

The threshold defaults to 4ms (25% of a frame).  Its controlled by a pref called "dom.timeout.max_consecutive_callback_ms".

The threshold is only checked once we fire the Timeout specifically targeted by this instance of RunTimeout().  In my testing for previous work this is necessary to avoid some obscure corner cases.  Its a restriction that will go away once bug 1363829 is implemented.

The threshold exit itself is implemented the same way as a suspended window.  We simply continue through the list marking timeout firing depth back to zero.

Attachment #8866526 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 9

•

7 years ago

Comment on attachment 8866527 [details] [diff] [review]
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan

This patch removes the previous "max consecutive callbacks" limiting code.  This was a flat limit on the number of callbacks controlled by the "dom.timeout.max_consecutive_callbacks" pref.

The new mechanism is superior to this one because a time limit automatically adjusts for different levels of CPU power, etc.  We used to set the count limit differently for mobile, but we don't have to do that for the time limit.

Attachment #8866527 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 10

•

7 years ago

Attached patch P4 Don't waste time calling TimeStamp::Now() after time budget has elapsed. r=ehsan (obsolete) — Details — Splinter Review

This is a small optimization to the previous patches.  Once the time budget has elapsed we don't need to call TimeStamp::Now() for every remaining timer.  This patch sets a flag instead and avoids calling this expensive operation when we are already running short on time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a10f60b4f43cdde64843c272d4cbb69db0d4e7b

Attachment #8867794 - Flags: review?(ehsan)

(no longer active)

Comment 11

•

7 years ago

Comment on attachment 8866513 [details] [diff] [review]
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan

Review of attachment 8866513 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

Attachment #8866513 - Flags: review?(ehsan) → review+

(no longer active)

Comment 12

•

7 years ago

Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

Review of attachment 8866526 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/libpref/init/all.js
@@ +5698,5 @@
>  pref("dom.timeout.max_consecutive_callbacks", 5);
>  
> +// Maximum amount of time in milliseconds consecutive setTimeout()/setInterval()
> +// callback are allowed to run before yielding the event loop.
> +pref("dom.timeout.max_consecutive_callback_ms", 4);

I'd like to get smaug's feedback on what default value we should use here.  I can't think of anything better than 4ms, but I'm wondering if we shouldn't start with something lower?

Attachment #8866526 - Flags: review?(ehsan)

Attachment #8866526 - Flags: review+

Attachment #8866526 - Flags: feedback?(bugs)

(no longer active)

Updated

•

7 years ago

Attachment #8866527 - Flags: review?(ehsan) → review+

(no longer active)

Updated

•

7 years ago

Attachment #8867794 - Flags: review?(ehsan) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

7 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #12)
> I'd like to get smaug's feedback on what default value we should use here. 
> I can't think of anything better than 4ms, but I'm wondering if we shouldn't
> start with something lower?

Well, we don't want to make it too small either.  If you read the section called "What's the catch" here:

https://blog.wanderview.com/blog/2017/03/13/firefox-52-settimeout-changes/

I discuss some of the downsides of forcing timers to yield.  The goal is to strike a reasonable where a useful number of timers make it through while still yielding in time to maintain good fps.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 14

•

7 years ago

Comment on attachment 8866526 [details] [diff] [review]
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan

4ms is quite low, IMO. Not sure having even lower value would make much difference, especially once we prioritize input events over timer events (well, not prioritize but stick next to vsync.).

I wonder if we could add some telemetry to figure out reasonable value.
Probably not, since this depends so highly on a web site.

Attachment #8866526 - Flags: feedback?(bugs) → feedback+

Pulsebot

Comment 15

•

7 years ago

Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ff1878374dd
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/f370c47abb14
P2 Only execute consecutive timeout handlers for a limit period of time. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/69a33656f4a3
P3 Remove fixed limit on maximum consecutive timeout handlers to execute. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/b2a7fe15c578
P4 Don't waste time calling TimeStamp::Now() after time budget has elapsed. r=ehsan

Iris Hsiao [:ihsiao]

Comment 16

•

7 years ago

backed this out for test_timer_flood.html perma-failing like https://treeherder.mozilla.org/logviewer.html#?job_id=99365259&repo=mozilla-inbound&lineNumber=1868

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&bugfiler&noautoclassify&filter-searchStr=Android 4.3 API15%2B debug Mochitests executed by TaskCluster test-android-4.3-arm7-api-15%2Fdebug-mochitest-6 tc-M(6)&fromchange=e5bb57513b43

Flags: needinfo?(bkelly)

Pulsebot

Comment 17

•

7 years ago

Backout by ihsiao@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f77644a7da56
Backed out 4 changesets for test_timer_flood.html perma-failing

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 18

•

7 years ago

Thanks.  I'll take a look.

(And I'm really glad I wrote that test!)

Flags: needinfo?(bkelly)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 19

•

7 years ago

My best guess here is that the "targetTimeoutSeen" is not working like it used to.  If the timeout gets canceled or removed, then we may never throttle the loop.  I'm planning to remove this as part of bug 1363829, but maybe I can remove it here.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 20

•

7 years ago

Actually, it seems maybe the test just runs more slowly since we're permitting more time to be spent on executing setTimeout() callbacks.  I might need to tune the pref down a bit.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 21

•

7 years ago

After investigating some more it seems the changes to continue the loop to set the firing depth back to zero is taking a lot of time.  I'm going to try an alternative solution.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 22

•

7 years ago

And also we end up spending more time in the top loop where the flat limit was previously.  Perhaps we should have some large flat limit still, like 5000, and then further limit it with the time based threshold in the lower loop.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 23

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c387f16e20518389bc714cc8bf4bf2dd3543e944

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 24

•

7 years ago

Setting a flat threshold of 1000 timers in addition to the time budget allows android to pass test_timer_flood.html.  It does still take close to the timeout value, though.  The test time jumps from ~160 seconds to ~315 seconds.  The timeout for the test is set at 330 seconds.

I have an idea to avoid crawling the loop to set mFiringDepth back to zero after time elapses.  I think we can use unique firing depth values that "expire" after we leave RunTimeout().  There is nothing that really requires them to be an increment/decrement.  This would let us just break out of the main RunTimeout() loop for all the various exit scenarios.

We will have to burn a little bit of memory to do this, though.  Since we can't just decrement when we leave a nested RunTimeout() we'll need another way to find the previous levels firing depth.  This can be handled by a stack structured implemented on a small AutoTArray<uint32_t, 4>.

(no longer active)

Comment 25

•

7 years ago

Thanks for digging through this...  I'm super glad you'd added this test, otherwise this would probably regress real pages in bad ways and we wouldn't have known about it!

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 26

•

7 years ago

(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #25)
> Thanks for digging through this...  I'm super glad you'd added this test,
> otherwise this would probably regress real pages in bad ways and we wouldn't
> have known about it!

Well, I think most sites and devices would be fine.  The android emulator running a debug build is exceptionally slow and basic computation stuff.  Still, good to fix.

This works in light local testing.  Lets see what try says:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=24c32d63da3c6573ecc8c1860e212cc2da99f4af

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 27

•

7 years ago

On my local testing with a nexus 4 debug build:

1. Current m-c: ~24 seconds
2. With original patches here: ~56 seconds
3. With patches in comment 23: ~30 seconds
4. With patches in comment 26: ~28 seconds

We'll have to see what the android emulator says, though.  It could be quite different.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 28

•

7 years ago

A few more tweaks.  Doesn't change much on my nexus, but might help on the slower emulator.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bbd1ac19c4e84751df5d5552980113166411f6e2

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 29

•

7 years ago

Comment 26 brought the debug emulator back down to ~170 seconds from ~315 second.  So that's good.

Comment 28 made the time limit include time spent in the initial RunTimeout() loop.  That actually made the test worsen to ~240 seconds.

I think this means debug emulator can spend the majority of its time in the initial loop.  We should probably time limit that to be 25% of our total time allotment or something.  We want to spend our time running JS, not doing book keeping.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 30

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a13efc969443ca8e84a90eef8566ece0619ae9e4

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 31

•

7 years ago

Comment 30 patch brings us about back to the ~160 seconds to complete test_timer_flood.html on android debug emulator.

I'll clean up the patches and flag for review on the new stuff.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 32

•

7 years ago

Attached patch P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan — Details — Splinter Review

Attachment #8866513 - Attachment is obsolete: true

Attachment #8868729 - Flags: review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 33

•

7 years ago

Attached patch P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan — Details — Splinter Review

Ehsan, this patch refactors our mFiringDepth to allow us to avoid having to iterate the entire list of timers to reset the value when we early-exit the RunTimeout() loop.

Previous mFiringDepth would essentially increment every time we enter RunTimeout() (or RunTimeoutHandler) and decrement when we leave the method.  This allows us to separate timers running within an event loop spin due to something like sync XHR within a timeout handler.

The new system "creates" a new FiringId integer when it enters RunTimeout().  This is appended on an AutoTArray<> stack.  When we leave RunTimeout() we pop it off the stack.

Every FiringId is an increment from the last created.  We never decrement.  Eventually we will wrap around and re-use FiringId values, but this patch assumes we'll never have that many nested RunTimeout() calls.  It seems likely we would overflow the call stack before that happens.

Instead of the old `mFiringDepth == 0` check, we now have an IsInvalidFiringId() method.  This returns true if the given FiringId is not in the AutoTArray<> stack.  It uses a number of fast paths to avoid having to search the array on every call.

Attachment #8868735 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 34

•

7 years ago

Attached patch P3 Remove unused Timeout::mPrincipal member variable. r=ehsan — Details — Splinter Review

The last patch made us use a bit more memory.  This patch reclaims some memory by nuking an unused member variable from Timeout.

Attachment #8868737 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 35

•

7 years ago

Attached patch P4 Only execute consecutive timeout handlers for a limit period of time. r=ehsan — Details — Splinter Review

This is a modified version of the time limit patch.  It now does:

1. Measures elapsed time from the beginning of RunTimeout() instead of just the second loop where handlers are executed.
2. Aborts the initial loop that looks for what timers to run if it sees a significant amount of the time budget being used there.  This is set to 25% of the total time budget.  We only check for time expiration here every 100 Timeout objects examined to avoid too much overhead.
3. Exits the second loop after executing at least one handler if the time has expired.

The goal of (2) is to avoid spending too much of our time budget on book keeping such that we don't have any time to run js handlers.

Attachment #8866526 - Attachment is obsolete: true

Attachment #8867794 - Attachment is obsolete: true

Attachment #8868738 - Flags: review?(ehsan)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 36

•

7 years ago

Attached patch P5 Removed the fixed limit on consecutive Timeout callbacks. r=ehsan — Details — Splinter Review

Rebased version of the previously reviewed patch to remove the fixed limit.  With the time limit check on the initial loop, this is no longer needed.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=662d276e8842273005d8c540469426d964d2314f

Attachment #8866527 - Attachment is obsolete: true

Attachment #8868739 - Flags: review+

(no longer active)

Comment 37

•

7 years ago

Sorry I didn't find enough time to do the reviews today, I'll try to finish them tomorrow.

(no longer active)

Comment 38

•

7 years ago

Comment on attachment 8868735 [details] [diff] [review]
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan

Review of attachment 8868735 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: dom/base/TimeoutManager.cpp
@@ +152,5 @@
> +TimeoutManager::CreateFiringId()
> +{
> +  uint32_t id = mNextFiringId;
> +  mNextFiringId += 1;
> +  if (mNextFiringId == InvalidFiringId) {

Can you please add some comment here explaining that an overflow is expected here?  It may not be super obvious to someone reading the code later.

Attachment #8868735 - Flags: review?(ehsan) → review+

(no longer active)

Updated

•

7 years ago

Attachment #8868737 - Flags: review?(ehsan) → review+

(no longer active)

Updated

•

7 years ago

Attachment #8868738 - Flags: review?(ehsan) → review+

Pulsebot

Comment 39

•

7 years ago

Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3cc2996b4900
P1 Cleanup logic in TimeoutManager::RunTimeout() to keep firing depth and other data consistent. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/7af3265734b8
P2 Change TimeoutManager firing depth to a unique ID that can be expired to optimize cleanup. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/598bd9d7eecc
P3 Remove unused Timeout::mPrincipal member variable. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb29b5cc977c
P4 Only execute consecutive timeout handlers for a limit period of time. r=ehsan
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8582a3560cc
P5 Removed the fixed limit on consecutive Timeout callbacks. r=ehsan

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 40

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/3cc2996b4900
https://hg.mozilla.org/mozilla-central/rev/7af3265734b8
https://hg.mozilla.org/mozilla-central/rev/598bd9d7eecc
https://hg.mozilla.org/mozilla-central/rev/cb29b5cc977c
https://hg.mozilla.org/mozilla-central/rev/a8582a3560cc

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55

Nobody; OK to take it and work on it

Updated

•

5 years ago

Component: DOM → DOM: Core & HTML

You need to log in before you can comment on or make changes to this bug.