Closed Bug 1118689 Opened 10 years ago Closed 10 years ago

[non-e10s] 100% CPU usage even if opened a blank tab only and browser process remains on exit

Categories

(Core :: DOM: Core & HTML, defect)

29 Branch
x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
e10s - ---
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 - wontfix
firefox37 + fixed
firefox38 - fixed
firefox-esr31 --- affected

People

(Reporter: alice0775, Assigned: smaug)

References

()

Details

(Keywords: hang, regression, reproducible)

Attachments

(4 files, 3 obsolete files)

[Tracking Requested - why for this release]: Browser hang up This problem happens with/without Flash Plugin, with/without e10s. Reproducible: always Steps To Reproduce: 1. Open a topic of http://game.watch.impress.co.jp/backno/news/index.html in Tab[1] 2. Reload the tab (F5) until CPU usage becomes 100% core maybe it needs at least 5-10times --- Browser hangs here sometimes 3. Open blank tab in Tab[2] 4. Close tab [1] --- 100% core forever. 5. Exit Browser --- firefox.exe remains in process list forever Actual Results: 100% core forever. http://people.mozilla.org/~bgirard/cleopatra/#report=cb475cb8b8ecee36eeec26c2d4aefd3472509286 And after exit browser, firefox.exe remains in process list forever. And browser becomes unresponsive completely sometimes. Expected Results: Browser should not hang up. Should CPU utilization settle to 0%. Process should disappear immediately after quit.
Steps 1. Open index.html of attachment 8545237 [details] in Tab[1] (network access should be available) 2. Wait for a while so that CPU usage becomes almost 100% core 3. Open about:blank in new tab Tab[2] 4. Close tab [1] ---- observe CPU still high usage --- Bug! 5. Close Browser ---- observe CPU still high usage and process remains --- also Bug! Regression window(m-c) Good: https://hg.mozilla.org/mozilla-central/rev/201feeaf31f3 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140323082913 Bad: https://hg.mozilla.org/mozilla-central/rev/b1b18e742a42 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140323171514 Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=201feeaf31f3&tochange=b1b18e742a42 Regression window(m-i) Good: https://hg.mozilla.org/integration/mozilla-inbound/rev/0f161bf4f415 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140323084414 Bad: https://hg.mozilla.org/integration/mozilla-inbound/rev/e15020347f1b Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0 ID:20140323085513 Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=0f161bf4f415&tochange=e15020347f1b regressed by: Bug 986542
Blocks: 986542
Component: JavaScript Engine → DOM
Flags: needinfo?(bobbyholley)
Version: Trunk → 29 Branch
Summary: 100% CPU usage even if opened a blank tab only and browser unresponsive → 100% CPU usage even if opened a blank tab only and browser process remains on exit
We have this regression for a while. I don't think we will do a new 35 build for this. However, if the patch is low risk, we will be happy to take a fix for 36.
Summary: 100% CPU usage even if opened a blank tab only and browser process remains on exit → 100% CPU usage even if opened a blank tab only and browser unresponsive
Summary: 100% CPU usage even if opened a blank tab only and browser unresponsive → 100% CPU usage even if opened a blank tab only and browser process remains on exit
Boris, do you have time to look at this?
Flags: needinfo?(bobbyholley) → needinfo?(bzbarsky)
Attached file google_ad.zip ( This is the culprit ) (obsolete) —
I can't seem to reproduce the bug, at least in a trunk debug build on Mac... In particular, loading the index.html from the attached zip file never sends CPU to 100% after the pageload finishes. :( I'll attach the patch I was going to use to debug this. If someone can reproduce this in their own build, it would be interesting to set a breakpoint in nsScriptSecurityManager::ScriptAllowed right before the return statement, condition it on "thingie.mDetached" being true, and see whether it's getting hit repeatedly after the 100% CPU tab has been closed. If so, it would be interesting to know what the C++ callstack looks like at that point, and which JS function we're about to call via the callback object we have in the parent frame's CallbackObject code.
Flags: needinfo?(bzbarsky)
I can't reproduce on linux (opt build).
I can reproduce on Ubuntu14.04 64bit too. https://hg.mozilla.org/mozilla-central/rev/2a193b7f395c Mozilla/5.0 (X11; Linux x86_64; rv:37.0) Gecko/20100101 Firefox/37.0 ID:20150106030201 So, The Google ad seems to be limited in Japan region... :(
OS: Windows 7 → All
I should mention that the problem does not occur in e10s window.
Summary: 100% CPU usage even if opened a blank tab only and browser process remains on exit → [non-e10s] 100% CPU usage even if opened a blank tab only and browser process remains on exit
Boris Zbarsky [:bz] Olli Pettay [:smaug] Could you test once with non-e10s window?
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
The testing I did in comment 8 was with a non-e10s window.
Flags: needinfo?(bzbarsky)
I was testing non-e10s too.
Flags: needinfo?(bugs)
I think this should be fixed in high priority because Google ad codes causes the problem. (Unfortunately, this maybe the region limited problem. and also I do not familiar windbg/gdb...)
Alice, is the cleopatra profile created using Nightly? If not, could you use Nightly (stackwalking enabled for the profiling).
Masayuki, do you know if anyone from Mozilla Japan could help with this a bit.
Flags: needinfo?(masayuki)
Attachment #8545445 - Attachment description: index_1.html minimum → index_1.html minimum, expired :(
Attachment #8545348 - Attachment is obsolete: true
Attachment #8545237 - Attachment is obsolete: true
Hmm, I've never experienced this and all of them are expired...
Flags: needinfo?(masayuki)
[Tracking Requested - why for this release]: Reproduced on https://mixi.jp/
The problem happens when the following Google ads is shown. "ブランド高く売れる" "車高く売れる" And the ad seems to go is spread widely on famous site in Japan.
Need some help here. Does Mozilla Japan happen to have a VPN to use?
Flags: needinfo?(masayuki)
Flags: needinfo?(VYV03354)
I have never worked at Mozilla Japan.
Flags: needinfo?(VYV03354)
(In reply to Olli Pettay [:smaug] from comment #25) > Need some help here. > > Does Mozilla Japan happen to have a VPN to use? Kato-san?
Flags: needinfo?(masayuki) → needinfo?(m_kato)
Sorry Masatoshi, my comment wasn't clear. I was wondering if you could help here, like reproduce this (and Masayuki might know whether there is a VPN).
Flags: needinfo?(VYV03354)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #27) > (In reply to Olli Pettay [:smaug] from comment #25) > > Need some help here. > > > > Does Mozilla Japan happen to have a VPN to use? > > Kato-san? I don't have VPN for MV. But maybe, I can setup proxy.
Flags: needinfo?(m_kato)
No longer seeing these ads.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Flags: needinfo?(VYV03354)
I have might-be-same issue. In my case advertising on issue is delivered by doubleclick.net. URL of the ad are https://ad.doubleclick.net/adi/N7586.150834.TURN/B8186874.109993605;sz=728x90;ord=%5Btimestamp%5D?:89 and https://ad.doubleclick.net/adi/N7586.150834.TURN/B8186874.109993604;sz=300x250;ord=%5Btimestamp%5D?:89 Reproduced on Japanese version of35.0.1, 37.0a2 (2015-01-29) and 38.0a1 (2015-01-29).
I can reproduce with the URLs in Comment#33. And get same regression window of comment#4 . The problem advertising seems to have spread widely. :bz Can you reproduce with the URLs in Comment#33?
Status: RESOLVED → REOPENED
Flags: needinfo?(bzbarsky)
Resolution: WORKSFORME → ---
Yes, thank you. Using the second of those urls, in non-e10s mode, I see the 100% CPU usage and the failure to quit. Looking into it.
Flags: needinfo?(bzbarsky)
OK, so while we're running the page, everything is dandy as far as I can tell. When I try to quit (in non-e10s mode), we end up in a loop running "load" events. The load event runs the .prototype<.handleLoad_.value function in <https://ad.doubleclick.net/adi/N7586.150834.TURN/B8186874.109993604;sz=300x250;ord=%5Btimestamp%5D?:89> which in turn runs .prototype<.applyScaling_.value in the same file. This sets the "src" attribute on the <img>, which then fires another load event, etc. So the real issue here is that we don't really have a distinction between "<browser> removed from the DOM because we're quitting" and "<iframe> removed from the DOM because web page". Bug 986542 made events work in the latter case, but this quit case looks identical from Gecko's point of view. So we end up in a situation where the mainthread event loop never empties out and shutdown can't proceed. Here's a trivial testcase that has this behavior: data:text/html,<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP" onload="this.src = this.src"> If I load this and then quit the browser (still in non-e10s mode), I get the shutdown hang (well, until the MOZ_CRASH from RunWatchdog kicks in) with 100% CPU usage. Olli, any thoughts on how we should handle this? We really need to figure out when we can just mark windows as "no more execution here, please".
Flags: needinfo?(bugs)
Does img.src = img.src need to do anything when the relevant docshell has been detached? That feels like a simpler thing to fix than trying to figure out when an event listener shouldn't be called anymore (since we don't want to regress Bug 986542).
Flags: needinfo?(bugs)
> Does img.src = img.src need to do anything when the relevant docshell has been detached? I don't know... We could try simply bailing out of LoadImage if we're in such a situation. We _do_ need to do it for image elements in data documents that never had a window, though maybe not if the global they're associated with has been detached...
What do you mean with "though maybe not if the global they're associated with has been detached..."
Actually, I was just confused. Images in data documents never perform loads anyway.
Assignee: nobody → bugs
Attached patch v1Splinter Review
Not super pretty. Tried to reduce risk for regressions, so didn't change data or chrome document handling. https://tbpl.mozilla.org/?tree=Try&rev=d99242b0bf0f
Attachment #8563586 - Flags: review?(bzbarsky)
Comment on attachment 8563586 [details] [diff] [review] v1 > nsImageLoadingContent::SetLoadingEnabled(bool aLoadingEnabled) Seems to me like setting false should be allowed even if our document (currently) can't load images. So like so: mLoadingEnabled = aLoadingEnabled && nsContentUtils::GetImgLoaderForChannel(nullptr, GetOurOwnerDoc()); though this does mean you can't reenable loading for an <img> that's bfcached before it comes out of bfcache.... Do we really need to condition mLoadingEnabled on the document state at all? > nsImageLoadingContent::FireEvent(const nsAString& aEventType) Why is the change here needed? This one seems a bit odd... I guess the problem is error events? Worth documenting. r=me modulo the above.
Attachment #8563586 - Flags: review?(bzbarsky) → review+
mLoadingEnabled is indeed very odd, and didn't want to change its behavior. The FireEvent case is there because there are code paths where we end up firing error event if LoadImage and such calls fail. I'll add a comment.
> mLoadingEnabled is indeed very odd, and didn't want to change its behavior. Then please don't? The current mLoadingEnabled behavior is that it's set based on whether image loading is enabled at all. Then if image loading is globally disabled, you can't toggle mLoadingEnabled. Just leave that behavior instead of introducing dependencies on the document.
Oh, crap, I had the constructor in my mind, not the setter. Will pass nullptr also in setter.
Attached patch v2 (obsolete) — Splinter Review
But need to wait for tryserver results before landing.
Looks like I need to still enable stuff for svg images.
[Tracking Requested - why for this release]: 100% cpu usage is bad. Need to get some feedback for the new behavior, so fixing this for FF36 might not be a good idea, but FF37 perhaps.
Too late for 36. However, Smaug, could you fill an uplift request for aurora? Thanks
Comment on attachment 8563715 [details] [diff] [review] v3 Approval Request Comment [Feature/regressing bug #]:bug 986542 [User impact if declined]: High cpu usage [Describe test coverage new/current, TreeHerder]: landed with a test. [Risks and why]: not very risky. It does prevent images to load in certain cases when web pages are going away, so if some web site relies on such behavior... [String/UUID change made/needed]:NA
Flags: needinfo?(bugs)
Attachment #8563715 - Flags: approval-mozilla-aurora?
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
(In reply to Olli Pettay [:smaug] from comment #52) > [Risks and why]: not very risky. It does prevent images to load in certain > cases when web pages are going away, so if some web site relies on such > behavior... Do you understand these cases and can you provide more specifics? Alice - Can you still reproduce on the latest Nightly build?
Flags: needinfo?(bugs)
Flags: needinfo?(alice0775)
I can not the problem on the following Nightly38.0a1. https://hg.mozilla.org/mozilla-central/rev/09f4968d5f42 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) Gecko/20100101 Firefox/38.0 ID:20150217030213 However, I can still reproduce the problem on the following Aurora37.0a2 and Beta36.0b9. https://hg.mozilla.org/releases/mozilla-aurora/rev/997fae1e1f88 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:37.0) Gecko/20100101 Firefox/37.0 ID:20150217004007 https://hg.mozilla.org/releases/mozilla-beta/rev/101d05cb5635 Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0 ID:20150212154903
Flags: needinfo?(alice0775)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #54) > (In reply to Olli Pettay [:smaug] from comment #52) > Do you understand these cases and can you provide more specifics? From comment 36 data:text/html,<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP" onload="this.src = this.src"> So yes, we understand pretty well how this can happen and why the regression happened.
Flags: needinfo?(bugs)
Comment on attachment 8563715 [details] [diff] [review] v3 Thanks for verifying. Aurora+
Attachment #8563715 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: