Closed Bug 1318084 Opened 8 years ago Closed 8 years ago

http://militarymaps.info will not loaded completely

Categories

(Core :: DOM: Core & HTML, defect)

52 Branch
x86_64
Windows 10
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- fixed
firefox53 + fixed

People

(Reporter: speciesx, Assigned: bkelly)

References

()

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

Attached image Firefox Nightly
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0


STR:
Load this page http://militarymaps.info

Actual results:
Page is partially loaded (attached screenshot Firefox Nightly) 

Expected results:
Page is fully loaded (attached screenshot Firefox 50)
Keywords: regression
Attached image Firefox 50
I think there is a regression when loading the website, I stay stuck during the countdown, and the loading will finish only if I switch to another tab and go back. In that case, I'm able to see the map.

Reg range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d19cffeae1febc6d669a024340de25228e544475&tochange=24d97dc514d4b8fe608e9aee7527615c1e613606
Blocks: 1300659
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bkelly)
Version: Trunk → 52 Branch
The fact that switching tabs suggests it has something to do with the timer scheduling code changes I made.  When you take a tab from background back to foreground things get rescheduled again so that may be why that works around the problem.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Note, on my machine I don't make it as far as the comment 0 screenshot.  It gets stuck at the countdown around the number "7".
Note that while its stuck one core is pegged.
Tracking 53+ for this nightly regression.
So, this site floods the main thread with timers which is triggering the back pressure mechanism I put in place.  Unfortunately, the back pressure mechanism I built does not quite work as I thought.  It is a bit too strict and also burns a ton of CPU for no reason.  I'm writing a patch to use a better mechanism.
I need to do some cleanup, but this fixes the issue locally.
(Bill, Ollie has flags disabled due to PTO.  Can you take this?)

This patch removes the Suspend()/Resume() back pressure in favor of an increasing minimum timer delay.  When the TabGroup queue is 5000 runnables deep we start applying a 500ms delay.  This is scaled up by another 500ms for every 5000 more runnables.

In order to avoid running timers out of order and to avoid a lot of extra CPU work we don't reduce the delay amount until the window fully drops below 5000 runnables again.

At that point we run the expensive ResetTimers code to recalculate deadlines so we can avoid any out of order timer execution after that point.

The old Suspend()/Resume() mechanism did not really let the backlogged timers run.  We would fire on timer, then Suspend() making all queued timer runnables no-ops, then Resume() re-queueing more timers, and then repeat.  The result was a very slow execution of timers while burning a lot of CPU thrashing the state around.  This patch slows the rate at which timers can fire instead.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=93e21fe246c2963a05045fe63c9fd822a4cdc3ec
Attachment #8811946 - Attachment is obsolete: true
Attachment #8812045 - Flags: review?(wmccloskey)
Comment on attachment 8812045 [details] [diff] [review]
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug

Review of attachment 8812045 [details] [diff] [review]:
-----------------------------------------------------------------

Olli said he could review since he's not quite on PTO yet.
Attachment #8812045 - Flags: review?(wmccloskey) → review?(bugs)
Comment on attachment 8812045 [details] [diff] [review]
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug

> 
>+class nsGlobalWindow::MaybeRemoveBackPressureCallback final : public nsITimeoutHandler
>+{
>+  RefPtr<nsGlobalWindow> mWindow;
Why this doesn't need to be added to CC?

>+nsGlobalWindow::MaybeRemoveBackPressure(nsIDocument* aSuppressedDoc)
This is confusing method, at least its name, since it may either remove back pressure or increase it.
Perhaps call it CancelOrUpdateBackPressure or some such?


r- because of the CC issue.
Attachment #8812045 - Flags: review?(bugs) → review-
This updates the patch to use a simple Runnable instead of a timer callback.  I originally added the timer callback because I was worried about out-of-order timers, but using the ResetTimersForThrottleReduction() solved that problem.  I verified locally that this fixes the comment 0 problem and my original timer bomb scenario.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=02a4d2c96c3e3ac0f1e408c4ee1dbae9dbc45482
Attachment #8812045 - Attachment is obsolete: true
Comment on attachment 8812314 [details] [diff] [review]
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug

Please test how this works when a back pressured window goes from foreground to background and back to foreground


The Runnable approach is rather event loop heavy, since I assume we'll end up dispatching tons of runnables, but those should be really fast ones.
Attachment #8812314 - Flags: review+
(In reply to Olli Pettay [:smaug] (vacation Nov 19-26) from comment #13)
> Please test how this works when a back pressured window goes from foreground
> to background and back to foreground

Yes, I tested that locally.  Basically we will recalculate timers up to the background delay amount, but with current default prefs that has no effect because back pressure delay is greater.

> The Runnable approach is rather event loop heavy, since I assume we'll end
> up dispatching tons of runnables, but those should be really fast ones.

Well, the runnable is dispatched to the ThrottledEventQueue.  So the current queue backlog must complete before we update our back pressure state.  The greater the queue backlog the longer we stay in back pressure before updating back pressure state.
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f1a8ad430beb
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug
Comment on attachment 8812314 [details] [diff] [review]
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug

Approval Request Comment
[Feature/regressing bug #]: Bug 1300659
[User impact if declined]: Poorly behaved sites that abuse the main thread will trigger a back pressure mechanism as of bug 1300659.  That back pressure mechanism is too aggressive, however, and can cause these sites to no longer function.  This patch adjusts the back pressure mechanism to manage the abuse while still allowing useful amounts of work to be performed.
[Describe test coverage new/current, TreeHerder]: Unfortunately its difficult to write tests for this sort thing that would not be racy and intermittent.  This patch was tested locally.  We have extensive tests that use setTimeout() as well in our automated test suite.
[Risks and why]: Minimal given that the back pressure mechanism is only triggered when a site produces extreme stress on the browser.  It does not come into play the majority of the time.
[String/UUID change made/needed]: None
Attachment #8812314 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/f1a8ad430beb
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8812314 [details] [diff] [review]
Make nsGlobalWindow back pressure increase timer delays instead of calling Suspend(). r=smaug

fix a regression on some sites in aurora52
Attachment #8812314 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
There was a conflict on uplift so running a quick clobber build and some tests to make sure I did the merge correctly.
Blocks: 1319991
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: