Low-priority setTimeout() breaks slider on bestbuy.com

RESOLVED WORKSFORME

Status

()

P1
normal
RESOLVED WORKSFORME
2 months ago
a month ago

People

(Reporter: denschub, Assigned: jesup)

Tracking

({regression})

unspecified
regression
Points:
---
Bug Flags:
webcompat ?

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 unaffected, firefox66 fix-optional, firefox67 fix-optional)

Details

(Whiteboard: [webcompat], URL)

Attachments

(1 attachment)

(Reporter)

Description

2 months ago

This is a regression from bug bug 1270059, but I'm filing this in Firefox::General because it doesn't really fit into the performance component. Feel free to move around.

On https://www.bestbuy.com/?intl=nosplash, the header contains a slider which should be starting automatically on site load. Currently, it's not. In addition, the start/stop button is not working.

mozregression brought me to a small regression range, and I blame the change in setTimeout()s behavior.

(Reporter)

Comment 1

2 months ago

:jesup, given you worked on bug 1270059, is this something you could have a look at? Let me know if you want help identifying the JS that's causing the breakage, happy to help out.

Flags: needinfo?(rjesup)
(Reporter)

Comment 2

2 months ago

JFTR, this also breaks other slides on the same page, like the "Most-viewed items" or "Trending clearance and outlet products" sliders, for example.

Keywords: regression
status-firefox-esr60: --- → unaffected
tracking-firefox66: ? → +
(Assignee)

Updated

2 months ago
Assignee: nobody → rjesup
Flags: needinfo?(rjesup)
tracking-firefox67: --- → +
Flags: webcompat?
Whiteboard: [webcompat]

I'm investigating... it looks like a page error (invalid (per the spec) assumptions about timeout firing) - when I turn off deferral in a debug build, and add 4 seconds to all settimeout/interval calls, it fails in the same way (per the spec, we should always be able to fire timeouts later than requested). I'm working on opt builds with a configurable settimeout delay to try to verify this and see if I can track down the error.

I see it breaking with dom.timeout.defer_during_load = false if I set dom.timeout.delay_by_ms to 200, but works at 100 (on my machine, opt build).  This implies an incorrect assumption about when timeouts might fire (vs the load event probably)

Dennis, given jesup's comment, are you able to identify where bestbuy is making poor assumptions about timeouts in relation to load? We might be able to give them a hint on how they can fix this (if it's not too much of a rats nest to figure out).

Flags: needinfo?(dschubert)

Guessing at a priority but feel free to change it or move components.

Priority: -- → P1

Why is Chrome unaffected?

(In reply to Masatoshi Kimura [:emk] from comment #7)

Why is Chrome unaffected?

While they also deprioritize timeouts during load, they don't do it in exactly the same manner, and probably they fire the particular timeout before firing load. I am investigating the how they handle this page, but probably BestBuy simply happened to not get bitten by their deprioritization.

Looking at it in chrome, I see various timeouts firing before pageload; many of them seem to be 0ms timers firing from the imported es6-promise package - https://github.com/stefanpenner/es6-promise.git or maybe from some tracking scripts - hard to tell with minified JS

There's also an earlier timer in Bootstrap.js which may be interesting to look at

(Reporter)

Comment 10

a month ago

Yikes, what a debugging unfriendly codebase this site serves to us. :)

There are way too many setTimeout() calls to be looking at every single one. However, there is a hint in the console, a "TypeError: this is undefined" exception, but that's 51 frames deep in a stack full of wrapper functions.

This site seems to be using a framework on top of Backbone.js called Marionette, and the exception itself is happening in the _proxyMethods function of a module called Wreqr.Radio. This module is a global communication channel for components, so... that's fun. If we look at the unminified source for that function, we can see that this exception doesn't make sense here. _proxyMethods gets called from within Radios constructor, and they set this up with some empty objects straight away. Inside the function, there are a couple of callbacks, but given they call Underscore.js's each function, we should be somewhat confident that Underscore.js isn't messing with the scopes of these functions.

While looking at the stacks of calls to that _proxyMethods method in both a "working" scenario (i.e., Firefox with dom.timeout.defer_during_load turned off) and a "non-working" one, I noticed that the stacks look different. Early calls are identical, but as soon as we are within that method calling Underscore, we end up in different functions, one of which does things that end up breaking the scope of those callback functions.

And it turns out that the site is loading two versions of Lodash/Underscore, one in libraries-d3479d7f4631b94864264fdd7bdf91f3.js (v3.10.1, which is the one that should normally be used), and the other one in timeline-2.0.11.js (v4.17.10, which is used if the slider is broken). Both versions register themselves with the same name to the global RequireJS, so it's no wonder that some parts of the site suddenly end up somewhere else. I don't know exactly why they collide with each other in this way, but given it's one major version of Lodash apart, it's not surprising.

Although this is something the site could fix, it is still a bit concerning to me. The only asynchronous call in the whole stack here is a setTimeout(fn, 4);. Unfortunately, this is RequireJS' nextTick function. So while it is weird for a site to define the same module twice (with different versions), us delaying setTimeout is somehow influencing RequireJS, although only in cases where a module is defined twice.

Flags: needinfo?(dschubert)

Perhaps this is due to timing changes changing the ordering in which resources get accessed, allowing something that normally doesn't get loaded because a version of it is already loaded to get used. I.e. Lodash 3 normally gets installed off something triggered by a timeout, and with the change Lodash 4 gets installed first, perhaps.

I think in any case it's clear the code is racy.

Chrome does seem to be more willing to fire timers before load than we are now - we're re-using the IdleCallback queue decisions, such as avoiding vsync, budgeting, etc. We will fire timers during load if there are idle times, but BB may not be idle enough to allow these to run (or to run early enough).

One option would be to look at having a less-restrictive definition of "idle" used for the deferred timers queue, instead of the same definition used for IdleCallbacks

Olli - thoughts?

Flags: needinfo?(bugs)

So, BestBuy now works.... likely they fixed their timing bug. :-)

We still may want to do more looking at alternative definitions of 'idle' for timer use.

Status: NEW → RESOLVED
Last Resolved: a month ago
Resolution: --- → WORKSFORME
status-firefox66: affected → fix-optional
status-firefox67: affected → fix-optional
Flags: needinfo?(bugs)

[Tracking Requested - why for this release]:

Marking for re-triage - this is resolved; they fixed their bug apparently.

tracking-firefox66: + → ?
tracking-firefox67: + → ?
tracking-firefox66: ? → ---
tracking-firefox67: ? → ---
You need to log in before you can comment on or make changes to this bug.