Closed Bug 1129567 Opened 5 years ago Closed 5 years ago

Amazon preloader plus page-mod creates lots of page jank

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86
macOS
defect

Tracking

(firefox35 unaffected, firefox36+ fixed, firefox37+ fixed, firefox38+ fixed, firefox39+ fixed)

RESOLVED FIXED
Tracking Status
firefox35 --- unaffected
firefox36 + fixed
firefox37 + fixed
firefox38 + fixed
firefox39 + fixed

People

(Reporter: jryans, Assigned: mossop)

References

()

Details

(Keywords: regression)

Attachments

(2 files)

I've noticed for some time now that Amazon pages (specifically US product pages[1]) become quite hard to scroll for a while after loading.

After some investigation, it appears this is related to a combination of page-mod refactoring done in bug 1058698 and a resource preloader technique Amazon uses.

I've created a simple demo add-on[2] to reproduce this.

From the add-on README:

This problem only appears in Firefox 36 and later.

Bug 1058698 changed `PageMod` to use frame scripts and message passing in
support of e10s.  This work landed in Firefox 36.

Amazon's site uses a browser-specific preloader to pull in JS and CSS files they
believe will be needed on future pages.

In more detail:

1. The preloader waits until the main page has finished loading
2. It creates dummy elements to trigger extra JS and CSS files to load
  * For Firefox, `object` elements with a `data` attribute are used
  * For Chrome, `img` elements with a `src` attribute are used
3. The dummy elements are removed once any of the following happen:
  * `onload` handler is triggered
  * `onerror` handlers is triggered
  * 2500 - 2599 ms have passed

In Firefox, the use of `object` elements for this purpose means a tiny dummy
window is created for each preload request.  The page mod is offered each of
these windows to attach its code, but in most cases the window is already gone
before the first line of `onAttach` can execute, resulting in many errors and
page jank.

[1]: http://www.amazon.com/Saga-Vol-Brian-K-Vaughan/dp/1607069318/
[2]: https://github.com/jryans/page-mod-jank
Blocks: 1058698
Keywords: regression
This issues appears whether or not e10s is used.  I imagine this is because the SDK uses the same code path to handle page-mods in both cases now.
While the jank is occurring, you'll see many errors including:

* TypeError: worker.tab is null
* TypeError: can't access dead object
Bug 1129003 would probably help to ease this at least in the e10s case. I'm not sure why this would affect non-e10s though, there shouldn't be much else going on to what happened previously.
Depends on: 1129003
Irakli, any idea who could investigate this further?

I am worried that if we don't make some fix before 36 goes to release, then many users would be upset.
Flags: needinfo?(rFobic)
(In reply to J. Ryan Stinnett [:jryans] from comment #4)
> Irakli, any idea who could investigate this further?
> 
> I am worried that if we don't make some fix before 36 goes to release, then
> many users would be upset.

An option is to back out these changes from 36 and 37 to give us more time for a fix. They are pretty invasive changes though
[Tracking Requested - why for this release]: Bug 1058698 seems to have adversely affected the performance of loading Amazon when you have an add-on using page-mods.
(In reply to J. Ryan Stinnett [:jryans] from comment #2)
> While the jank is occurring, you'll see many errors including:
> 
> * TypeError: worker.tab is null
> * TypeError: can't access dead object

Were you seeing these with e10s on or off or both?
Flags: needinfo?(jryans)
(In reply to Dave Townsend [:mossop] from comment #7)
> (In reply to J. Ryan Stinnett [:jryans] from comment #2)
> > While the jank is occurring, you'll see many errors including:
> > 
> > * TypeError: worker.tab is null
> > * TypeError: can't access dead object
> 
> Were you seeing these with e10s on or off or both?

With e10s off, I see:

* TypeError: worker.tab is null
* TypeError: can't access dead object

With e10s on, I see:

* TypeError: worker.tab is null
Flags: needinfo?(jryans)
Tracking because of the impact on Amazon.
Attached patch patchSplinter Review
This reverts page-mod to using the old synchronous worker, essentially bringing us back to the page-mod that is currently shipping in Firefox 35. Ryan, can you test and see if this makes things better for you? It feels faster for me.

This patch applies to trunk but a slight variant can be made for aurora and beta where we should land this.
Attachment #8559365 - Flags: feedback?(jryans)
(In reply to Dave Townsend [:mossop] from comment #10)
> Created attachment 8559365 [details] [diff] [review]
> patch
> 
> This reverts page-mod to using the old synchronous worker, essentially
> bringing us back to the page-mod that is currently shipping in Firefox 35.
> Ryan, can you test and see if this makes things better for you? It feels
> faster for me.
> 
> This patch applies to trunk but a slight variant can be made for aurora and
> beta where we should land this.

Oh to be clear this probably won't help the e10s case at all but it should help the non-e10s case which is all I'm worried about for 36-37.
Comment on attachment 8559365 [details] [diff] [review]
patch

Review of attachment 8559365 [details] [diff] [review]:
-----------------------------------------------------------------

This works great in my testing!  Thanks for the quick patch!

With e10s off, there are no more errors and the page scrolls smoothly as expected.

With e10s on, I also see good behavior, but of course the page mod doesn't load at all in this case (which seems expected given the change).
Attachment #8559365 - Flags: feedback?(jryans) → feedback+
I've been trying to capture some more performance details of the jank, but having trouble with tools for the moment.  (Things keep crashing.)

From my initial glimpse at some profiles before they crash:

e10s on:

* AddonInterpositionService.p.interpose and related methods take up 50% of session time

e10s off:

* AddonInterpositionService.p.interpose is once again a large hit, about ~20% or so
* NukeCrossCompartmentWrappers also takes up a large amount of time

I'll try to grab a full profile if I can get things to stop crashing all the time... :(
Attached patch aurora patchSplinter Review
Assignee: nobody → dtownsend
Status: NEW → ASSIGNED
Attachment #8559503 - Flags: review?(jsantell)
Comment on attachment 8559503 [details] [diff] [review]
aurora patch

Review of attachment 8559503 [details] [diff] [review]:
-----------------------------------------------------------------

This is a hell of a find, jryans.
Attachment #8559503 - Flags: review?(jsantell) → review+
Comment on attachment 8559503 [details] [diff] [review]
aurora patch

Approval Request Comment
[Feature/regressing bug #]: Bug 1058698
[User impact if declined]: Amazon.com pages will cause a lot of jank when loading or scrolling if the user has an SDK based add-on installed.
[Describe test coverage new/current, TreeHerder]: This essentially reverts page-mod back to the version currently shipping in release, we should get this onto aurora and beta now to get real world coverage.
[Risks and why]: Carries low risk, this code has been used in release and many version before just fine.
[String/UUID change made/needed]: None
Flags: needinfo?(rFobic)
Attachment #8559503 - Flags: approval-mozilla-beta?
Attachment #8559503 - Flags: approval-mozilla-aurora?
Why did you ask for this to be landed? It regresses add-on support on trunk.
Flags: needinfo?(sledru)
Oups, sorry, my bad. 38 is marked as affected, we try to only accept patches which landed in m-c and since we are late in the beta cycle and with the different timezones we are dealing with, I using the checkin-needed to make sure that the patches lands asap to have some automatic coverage.
Ryan, sorry. Can you backout my mistake? Thanks
Flags: needinfo?(sledru) → needinfo?(ryanvm)
Backed out.
https://hg.mozilla.org/integration/fx-team/rev/7a3357ffc0e5
Flags: needinfo?(ryanvm)
Whiteboard: [fixed-in-fx-team]
Attachment #8559503 - Flags: approval-mozilla-beta?
Attachment #8559503 - Flags: approval-mozilla-beta+
Attachment #8559503 - Flags: approval-mozilla-aurora?
Attachment #8559503 - Flags: approval-mozilla-aurora+
I guess it is a wontfix for 38 then.
(In reply to Sylvestre Ledru [:sylvestre] from comment #21)
> I guess it is a wontfix for 38 then.

No we just need a better fix that doesn't regress e10s behaviour. Bug 1129003 may be all we need so I'm working on that now.
Blocks: 1124703
Assignee: dtownsend → nobody
Status: ASSIGNED → NEW
I've spun a test build with the changes from bug 1129003 and some other CPOW related fixes, can you test if it makes amazon faster for you? OSX builds aren't quite done yet but should show up later today.

https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/dtownsend@mozilla.com-1b928d1066fc/
Flags: needinfo?(jryans)
(In reply to Dave Townsend [:mossop] from comment #24)
> I've spun a test build with the changes from bug 1129003 and some other CPOW
> related fixes, can you test if it makes amazon faster for you? OSX builds
> aren't quite done yet but should show up later today.
> 
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/
> dtownsend@mozilla.com-1b928d1066fc/

Hooray!  Amazon pages are much smoother on this test build for both e10s off and on. \o/
Flags: needinfo?(jryans)
(In reply to J. Ryan Stinnett [:jryans] from comment #25)
> (In reply to Dave Townsend [:mossop] from comment #24)
> > I've spun a test build with the changes from bug 1129003 and some other CPOW
> > related fixes, can you test if it makes amazon faster for you? OSX builds
> > aren't quite done yet but should show up later today.
> > 
> > https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/
> > dtownsend@mozilla.com-1b928d1066fc/
> 
> Hooray!  Amazon pages are much smoother on this test build for both e10s off
> and on. \o/

I did a quick comparison of this build vs yesterday's nightly with the 1password extension installed and e10s enabled *or* disabled. Scrolling is much much improved!
Priority: -- → P1
Duplicate of this bug: 1132535
This bug can be marked fixed once all its dependencies are fixed.
Depends on: 1132753, 1133141, 1133834
Comment on attachment 8559365 [details] [diff] [review]
patch

We're going to need to land this on aurora now we've missed another cycle. Jordan, this is basically doign the same thing just reverting page-mod to it's pre-e10s support form.
Attachment #8559365 - Flags: review?(jsantell)
Attachment #8559365 - Flags: review?(jsantell) → review+
Comment on attachment 8559365 [details] [diff] [review]
patch

As before we don't want to land this in nightly as it will actively regress the e10s enabled state there but we should get this onto aurora. I hope to have the real fix on nightly in the next week.

Approval Request Comment
[Feature/regressing bug #]: Bug 1058698
[User impact if declined]: Amazon.com pages will cause a lot of jank when loading or scrolling if the user has an SDK based add-on installed.
[Describe test coverage new/current, TreeHerder]: This essentially reverts page-mod back to the version currently shipping in release.
[Risks and why]: Carries low risk, this code has been used in release and many version before just fine.
[String/UUID change made/needed]: None
Attachment #8559365 - Flags: approval-mozilla-aurora?
Comment on attachment 8559365 [details] [diff] [review]
patch

Thanks for staying on top of having this again, i'll adjust tracking for 39 too just in case we have to do it again in 4 weeks.
Attachment #8559365 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee: nobody → dtownsend
Bug 1132753 is fixed enough for the general case to not block this so we can call this fixed in github. Still needs an uplift though.
Status: NEW → RESOLVED
Closed: 5 years ago
No longer depends on: 1132753
Resolution: --- → FIXED
Dave what needs to happen here? This still hasn't landed in mozilla-central, right?  What needs an uplift (or a backport?)
Flags: needinfo?(dtownsend)
(In reply to Liz Henry :lizzard from comment #34)
> Dave what needs to happen here? This still hasn't landed in mozilla-central,
> right?  What needs an uplift (or a backport?)

Bug 1139189 tracks the uplift to bring the SDK code in to fix this. It just landed in fx-team.
Depends on: 1139189
Flags: needinfo?(dtownsend)
Florin can your team have a look at verifying this fix in 38? Thanks!
OK thanks Dave. Marking this status-firefox39:fixed since the fix in bug 1139189 has landed.
You need to log in before you can comment on or make changes to this bug.