Closed Bug 1256031 Opened 4 years ago Closed 1 year ago

Opening https://elements.polymer-project.org/guides/using-elements in the background hangs Firefox increasing badly CPU and memory use

Categories

(Web Compatibility :: Desktop, defect, P3)

Firefox 45
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cristiach, Unassigned)

References

()

Details

(Keywords: regression, Whiteboard: [country-all])

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160304114926
Summary: Webpage https://elements.polymer-project.org/guides/using-elements sometimes hangs Firefox with 100%CPU and increasing memory on Ubuntu Linux Firefox 45.0. → Webpage https://elements.polymer-project.org/guides/using-elements sometimes hangs Firefox with 100%CPU and increasing memory on Ubuntu Linux Firefox 45.0 no addons.
The memory leak is 100% reproducible when you open this link in the background without switching to the tab.
Old versions like 33 are affected too, not a regression, I think.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Webpage https://elements.polymer-project.org/guides/using-elements sometimes hangs Firefox with 100%CPU and increasing memory on Ubuntu Linux Firefox 45.0 no addons. → Opening https://elements.polymer-project.org/guides/using-elements in the background hangs Firefox increasing badly CPU and memory use
Whiteboard: [MemShrink]
Steps to reproduce:
1. "Clear Recent History" > Everything
2. Click link https://elements.polymer-project.org/guides/using-elements to open in background tab

Actual Results:
Eats up CPU, and Memory

Browser console:
Error: DuplicateDefinitionError: a type with name 'dom-module' is already registered
data:%2F**%0A%20*%20%40license%0A%20*%20Copyright%20(c)%202014%20The%20Polymer%20Project%20Authors.%:2228:13

Regression window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=9a7f3c82ab9b&tochange=066cdafa5ec3

Suspect:Bug 816180
Blocks: 816180
Keywords: regression
Flags: needinfo?(bzbarsky)
I can't reproduce this in a current debug build on Mac, though I do see this in an opt build from the same revision.  That's ... moderately concerning.
So in that opt build, I'm seeing the page run a script over and over again.  It's a huge (80k characters) script with a data: URI.  

Running this script triggers creation of a new <script> with the same exact data: URI, which itself runs, etc.

Still trying to figure out what this has to do with documentfragment, if anything, and why this is not an issue in debug builds.
So I expect the reason this works in debug is that we run out of stack and kill the recursion.  It's possible this is also what was happening before bug 816180 if the old-style bindings caused us to fall off from ion or just took up more stack or something....

As in, at this point I expect the bug is faulty logic in the site, and it just happens to fail in slightly different ways in different cases.
Note that I've tried to compile builds from around that checkin so I can get more data about what was going on before my changes, but builds from back then don't actually compile on modern Mac or Linux.  :(
One other thing I tried: running an opt build with the JITs turned off.  This should have much bigger stacks but _does_ reproduce the problem.

Alice0775, I know this is a tall order, but do you think you can try reducing this a bit?  If not, I will keep trying to figure out what the difference between opt and debug is here; the fact that there is one is pretty worrisome if it's not just about stack depths.
Flags: needinfo?(bzbarsky) → needinfo?(alice0775)
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #7)
> One other thing I tried: running an opt build with the JITs turned off. 
> This should have much bigger stacks but _does_ reproduce the problem.
> 
> Alice0775, I know this is a tall order, but do you think you can try
> reducing this a bit?

sorry, difficult.

But, I can reproduce** with debug build ( http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-debug/1458088008/ )

And also reproduce** with jit off.
user_pref("javascript.options.baselinejit", false);
user_pref("javascript.options.ion", false);
user_pref("javascript.options.ion.offthread_compilation", false);

(**for some reason, there was a need to open two background tab2)

STR
1. disable e10s
2. "Clear Recent History" > Everything
3. Ctrl+click link https://elements.polymer-project.org/guides/using-elements to open in background tab
4. Ctrl+click link https://elements.polymer-project.org/guides/using-elements to open in background tab
Flags: needinfo?(alice0775)
Thanks.  I'll see if I can reproduce in a debug build too...

The other thing that I can't figure out so far is why being in the background tab matters.  I tried to disable some of the obvious things (setTimeout clamping, document visibility state) that depend on that, but the problem still occurs in background tabs but not in foreground ones.

I also tried to create a local unreduced version of the page, because then I can just instrument the JS, but haven't succeeded at that either.  :(
Firefox: 48.0a1, Build ID: 20160316030233
User Agent: Mozilla/5.0 (X11; Linux i686; rv:48.0) Gecko/20100101 Firefox/48.0

Hi,

I have tested this issue on the latest Firefox (45.0) release, latest Nightly (48.0a1) build and I have managed to reproduce it. I have opened the provided URL in background tab, the CPU usage highly increased and after ~30s the browser crashed bp-e0cb2929-734c-425c-81fc-9328a2160318.
Component: Untriaged → DOM
Product: Firefox → Core
Whiteboard: [MemShrink]
OK, I finally hacked up the same-origin policy enough that I can basically run this thing from file://.  Once I have /guides/using-elements, /elements/elements.build.html, and /elements/elements.build.js as file:// URIs, I reproduce the problem in a foreground tab too.  This makes me nearly certain that this is a timing issue.
To further add to the race/timing thing: if I add an alert in parseNext() in the page the first time it sees a script, the problem goes away.
OK, so we're getting a load event for a <style> element. When this fires, we try to find the next thing to parse, via calling importParser.parseNext which calls importParser.nextToParse which calls nextToParseInDoc(rootDocument).

We then try to look for things to parse from the about:blank document loaded by the <link rel="import">... but here's where things get weird.  In an opt build, I see an about:blank document with 17 elements after a while, and then it's got weird nodes in it.

This 17-element doc is nominally https://elements.polymer-project.org/bower_components/paper-button/paper-button.html but when I say "weird nodes" I mean a <script> with an innerText that's more than 2000 chars.  Oh, and _two_ <script> nodes to start with!  Basically, the page manages to stick a copy of that big script from the main page inside paper-button.html and then things go south.

I'm going to stop wasting time on this; this looks pretty clearly like a polymer timing bug of some sort to me...
Component: DOM → Desktop
Flags: needinfo?(bzbarsky)
Product: Core → Tech Evangelism
Version: 45 Branch → Firefox 45
Oh, at least for my hacked-up local version, the fail happens when the page tries to load https://elements.polymer-project.org/guides/bower_components/paper-button/paper-button.html instead of https://elements.polymer-project.org/bower_components/paper-button/paper-button.html (note the lack of "guides" in the latter).  And then it fails: the former URI returns a page with that big <script> in it.
Yeah, I can confirm that on the site itself when the failure is observed there is an XHR for https://elements.polymer-project.org/guides/bower_components/paper-button/paper-button.html which comes from the elt.href of an HTMLLinkElement.  When the failure is not observed, there is no such XHR, nor is there an XHR for <https://elements.polymer-project.org/bower_components/paper-button/paper-button.html>.....
Seems the URL with guides/bower_components serves the same content as the main page, so it's basically loading itself recursively as a component or something.

Maybe something goes wrong in the interaction between this:
https://elements.polymer-project.org/elements/elements.build.html

<dom-module id="paper-button" assetpath="../bower_components/paper-button/">


and this:
https://elements.polymer-project.org/elements/elements.build.js

resolveUrl: function (url) {
var module = Polymer.DomModule.import(this.is);
var root = '';
if (module) {
var assetPath = module.getAttribute('assetpath') || '';
root = Polymer.ResolveUrl.resolveUrl(assetPath, module.ownerDocument.baseURI);
}
return Polymer.ResolveUrl.resolveUrl(url, root);
}
});

?

It may also be noteworthy that I see this error only happen when it fails:

TypeError: request.completes is undefined
.generateRequest()
elements.build.js:12777
.attached()
elements.build.js:12863
._invokeBehavior()
elements.build.js:402
._doBehavior()
elements.build.js:397
Polymer.Base.attachedCallback/<()
elements.build.js:200
Polymer.RenderStatus._makeReady()
elements.build.js:103
Polymer.RenderStatus._catchFirstRender/<()
elements.build.js:109

Will dig some more later.
Whiteboard: [needsdiagnosis] [country-all]
Testing in a recent nightly I don't see this anymore. Firefox is a bit sluggish while the background page opens, but soon the CPU returns to normal levels.
Anyone still seeing the problem?
Flags: needinfo?(bzbarsky)
Flags: needinfo?(alice0775)
The problem still exists for me, in all versions of FF on OSX.
STR
1. disable e10s
2. "Clear Recent History" > Everything
3. Ctrl+click link https://elements.polymer-project.org/guides/using-elements to open in background tab
4. Ctrl+click link https://elements.polymer-project.org/guides/using-elements to open in background tab


I can still reproduce the GB memory and 100% core usage on Latest Aurora48.0a2[1] and Nightly49.0a1[2].

[1]
https://hg.mozilla.org/mozilla-central/rev/25321494921c824703a605127fb1f99b1faf5910
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0 ID:20160601030219

[2]
https://hg.mozilla.org/mozilla-central/rev/25321494921c824703a605127fb1f99b1faf5910
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0 ID:20160601030219
Flags: needinfo?(alice0775)
Right, when I reproduced it was using the steps from comment 8.  So comment 17 doesn't seem relevant unless e10s was explicitly disabled in that nightly...

Note also that this is all racy and timing-dependent.  :(
Flags: needinfo?(bzbarsky)
Priority: -- → P3
Whiteboard: [needsdiagnosis] [country-all] → [needsdiagnosis] [country-all] [qf]
Alice0775: Sorry this went quiet for a little while - would you mind seeing if this still reproducible, and if so, capture a profile via http://perf-html.io/ ?
Flags: needinfo?(alice0775)
Per comment 13, it sounds like this is something pathological with the site; but if it's Firefox specific, still tentatively worth doing something about.

Do we know if the "disable e10s" step (from the various flavors of STR here) is just to make this *easier* to reproduce? Or does it seem like this *only* reproduces in disable-e10s configurations?  If it's the latter, then I'm not sure how much we should worry about this, given that e10s is now the default (and given that we're moving towards being even more multi-process in Project Fission, bug 1451850).
(In reply to Daniel Holbert [:dholbert] from comment #21)
> Alice0775: Sorry this went quiet for a little while - would you mind seeing
> if this still reproducible, and if so, capture a profile via
> http://perf-html.io/ ?

I can reproduce this on Firefox62.0b13 with STR of comment#19.
And I can also reproduce this on Nightly63.0a1 with the STR but I need open 4 background tabs at once.

Gecko profiler log
(Nightly63.0a1 /w new profile only gecko profiler installed + 4 background tabs): https://perfht.ml/2LLvpXD
Flags: needinfo?(alice0775)
This is only happening when the tab is opened in the background.  If you navigate to it in the foreground the site loads performantly.

Ted noted that the profile indicates that most of the time in the pathological case is spent in JS GC? This has to be related to some tuning that's different for background-loaded pages than for foreground-loaded pages.  Is GC policy for background tabs a potential culprit?
Whiteboard: [needsdiagnosis] [country-all] [qf] → [needsdiagnosis] [country-all] [qf:investigate][qf:jsinvestigate]
For me the steps to reproduce from comment 19 completely hang the content processes of those tabs.
Here's a profile without JS stacks: https://perfht.ml/2vSF97F
In a profile with JS stacks, the whole profile is dominated by profiler overhead (creating JS function strings).

It's possible that the page detects the fact that it's in the background and runs different JS.
I can no longer reproduce this. Alice, can you?
Flags: needinfo?(alice0775)
Whiteboard: [needsdiagnosis] [country-all] [qf:investigate][qf:jsinvestigate] → [needsdiagnosis] [country-all]
(In reply to Markus Stange [:mstange] from comment #26)
> I can no longer reproduce this. Alice, can you?

I cannot reproduce this problem anymore on Nightly66.0a1.
Flags: needinfo?(alice0775)
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME
Whiteboard: [needsdiagnosis] [country-all] → [country-all]
Product: Tech Evangelism → Web Compatibility
You need to log in before you can comment on or make changes to this bug.