Closed
Bug 984838
Opened 10 years ago
Closed 10 years ago
100% CPU on yammer
Categories
(Firefox :: General, defect)
RESOLVED
WORKSFORME
People
(Reporter: markh, Unassigned)
References
Details
(Keywords: perf)
Attachments
(1 file)
681.88 KB,
text/plain
STR for me on Win 7, on 2 different machines:
* Open yammer; select the "all company" group; shift-reload.
* Alternatively: open yammer; select the "all company" group; restart and restore session.
Firefox starts shewing 100% of a core.
Also:
* Closing the tab fixes it; re-open and it comes back.
* Selecting the home tab on yammer stops it. Re-selecting "all company" does *not* restart the problem - it's necessary for the page-load to happen on the "all company" group to repro. Possibly all groups - didn't try that.
Possibly happens on the community yammer etc, but also didn't try that - so sadly the above requires access to the staff yammer :(
Comment 1•10 years ago
|
||
Mark, can you link to your Mozilla profiler output from when this happens? We already have Axel's in the Yammer thread.
Reporter | ||
Comment 2•10 years ago
|
||
Profile at http://people.mozilla.org/~mhammond/yammer_high_cpu.json
Comment 3•10 years ago
|
||
Any change for a gecko profiler link, not to .json. Something like http://people.mozilla.org/~bgirard/cleopatra/#report=xxxxxx
Comment 4•10 years ago
|
||
chance, not change, but anyhow, link to a profile which is easy to read. I don't know how to import that .json to anywhere.
Comment 5•10 years ago
|
||
Ah, that profile is from devtools. Please use the Gecko profiler or some other profiler which gives better information about what is happening inside Gecko.
Comment 6•10 years ago
|
||
I saw similar behaviour on http://telemetry.mozilla.org/#release/27/SIMPLE_MEASURES_AMI_STARTUP_END/saved_session/Fennec (not sure it's completely required to reproduce, but I had the "Show evolution over Calendar Dates" button selected).
In my case, dismissing the slow script dialog a few times eventually led to crashes:
bp-0160e74b-eea5-41ae-a478-7fbb92140317
bp-b856d64a-2b53-4a8f-9304-b08ef2140317
bp-93e62d47-59fc-4acc-b3fe-90e0c2140317
One note: running Nightly from the command line, I see the following warning:
JavaScript warning: http://cdnjs.cloudflare.com/ajax/libs/d3/3.3.9/d3.min.js, line 3: mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create
Here's a profile: http://people.mozilla.org/~bgirard/cleopatra/#report=8e78388aa31ba35b58dec7ecba48ca70af22d59e
Comment 7•10 years ago
|
||
That profile shows a bunch of JS running off timers, which then goes and creates tons of arrays, does something with Voronoi polygons, creates a ton of non-Array objects, etc. NVD3 is some sort of charting package. Is it possible that yammer is using this for something and continuously updating the charts?
Comment 8•10 years ago
|
||
bz: Comment 6 was referring to a similar symptom seen at a different URL; at smaug's suggestion I split it off to bug 984928.
Comment 9•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #7)
> That profile shows a bunch of JS running off timers, which then goes and
> creates tons of arrays, does something with Voronoi polygons, creates a ton
> of non-Array objects, etc.
...which leads to GCing very often, since we certainly spend quite a bit of time in GC.
Reporter | ||
Comment 10•10 years ago
|
||
Gecko profiler link: http://people.mozilla.org/~bgirard/cleopatra/#report=517f7807e93091b80d5155fb0e88c11d0956dce1
Comment 11•10 years ago
|
||
That's showing mostly a bunch of painting.
Reporter | ||
Comment 12•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #11)
> That's showing mostly a bunch of painting.
Yeah. Looking at it with devtools, if you enable "Highlight painted area" you can see the group icon flashing repeatedly, as fast as the flashing can be rendered. It *looked* to me like the completion of repeated XHR requests caused the image's src attribute to be set. However, no XHR requests could be seen in the network tab - I'm not sure if they were cached, or my attempts at reading the profile were simply wrong. So yeah, I think repeatedly painting the group icon is the symptom. Can't repro on Chrome (FWIW, which really isn't much).
Comment 13•10 years ago
|
||
Do we not support stackwalking on Win7? That profile seems to be from the pseudostack, which isn't too useful, since for example it doesn't tell which events are being dispatched. (Stuff under nsEventDispatcher::Dispatch takes quite a bit of time. srcset() is suspicious.)
Hmm, https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Availability hints that native stack might be disabled on release builds. Can you reproduce the issue on trunk?
Reporter | ||
Comment 14•10 years ago
|
||
Nightly generated profile: http://people.mozilla.org/~bgirard/cleopatra/#report=8c05146cb089d814d4fc65f62500285f56d572d5 Looks to me like a timer.
Comment 15•10 years ago
|
||
Hmm, *lots* of async events dispatched, and the listener for them calls event.add/y.handle() which ends up calling some srcset. Do we end up changing some img all the time, fire load event for it, and then change the src again, and fire load...? Or could be error event too. That is my guess from the profile. Lots of async DOM events, and HTMLImageElement::PreHandleEvent also in the profile, which certainly shouldn't be there at all unless it is called often.
Comment 16•10 years ago
|
||
Could be a Yammer bug. Have you reported the issue to them?
Comment 17•10 years ago
|
||
We've opened a support request with Yammer about this. Let's see what they come back with. Will let everyone know when I hear back.
Comment 18•10 years ago
|
||
I think on Yammer it was mentioned that this was happening on Nightly. Can someone give me a version that it is happening on? Is it just the latest? Are there any other versions on which people have been experiencing this issue?
Comment 19•10 years ago
|
||
Vien: for me it's happening in Nightly on OSX Mavericks, consistently. If you open a Yammer tab and click on "All Company" then leave it open for a while, Nightly proceeds to eat 100% of your CPU.
Comment 20•10 years ago
|
||
I think the latest is still affected by this.
Reporter | ||
Comment 21•10 years ago
|
||
mozregression tells me:
Got as far as we can go bisecting nightlies...
Ensuring we have enough metadata to get a pushlog...
Last good revision: 41d962d23e81 (2014-03-11)
First bad revision: 44ae8462d6ab (2014-03-12)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=41d962d23e81&tochange=44ae8462d6ab
Sadly though, mozregression hangs trying to use inbound builds to bisect further. That pushlog is fairly large and nothing springs out (except maybe ed30fc4d3e17, "Bug 982146 - Clean up SyncRunnable wait code, r=bsmedberg." but reversing that simple patch locally didn't have any effect).
Comment 22•10 years ago
|
||
Here is the output from the Mac Activity Monitor "Sample Process". Some interesting things to search for in there:
- 50 nsJSContext::GarbageCollectNow
- 43 imgRequestNotifyRunnable::Run()
Comment 23•10 years ago
|
||
Possibly bug 980243? Did we use to have the same 100% CPU behavior back in Firefox 26?
Reporter | ||
Comment 24•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #23)
> Possibly bug 980243?
As usual, bz hits the nail on the head! Reverting the patch from that bug locally solves the problem for me.
Blocks: 980243
Comment 25•10 years ago
|
||
OK, but then the important question is what did Firefox 26 do on this site? The behavior prior to bug 980243 was to not do image loads we really should have been doing, and it got broken in the Firefox 27 cycle. Note that WebKit does NOT handle those image loads quite the same way we do, so it's possible that the code was written to rely on their handling and effectively ends up in an infinite loop where it has a load handler on <img> which then sets src on that same image.
Reporter | ||
Comment 26•10 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #25)
> OK, but then the important question is what did Firefox 26 do on this site?
I figured I'd test this while waiting for hg to update. On 26 I see 100% CPU *and* Firefox is unresponsive (ie, I can't interact with it, the "spinners" are frozen; strangely though, Windows doesn't report it as "unresponsive" and doesn't offer to kill it when trying to close it as I'd expect - it must be hard-killed via the task manager).
Comment 27•10 years ago
|
||
Probably starving OS events; that's something we've changed around some since 26. Sounds like this is a bug in the yammer script, probably as described in comment 25: a load handler on an <img> setting that <img>'s src to the same value it already has, which triggers a forced load and a new load event for the same image, etc.
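A minimal sketch of the loop described in this comment, runnable under Node without a browser. It assumes, as the comment suggests, that every src assignment triggers a load and a load event even when the value is unchanged. FakeImage, drain, and SAME_URL are hypothetical stand-ins for illustration, not the real DOM or Yammer's code, and the 1000-event cap exists only so the demo terminates.

```javascript
// Minimal stand-in for an <img> element; NOT the real DOM.
// Assumption: assigning src always queues a "load" event,
// even if the new value equals the old one.
class FakeImage {
  constructor() {
    this.listeners = new Set();
    this.pending = 0; // load events queued but not yet dispatched
    this.loads = 0;   // load events dispatched so far
    this._src = "";
  }
  get src() { return this._src; }
  set src(value) {
    this._src = value;
    this.pending += 1; // every set queues another load event
  }
  addEventListener(type, fn) {
    if (type === "load") this.listeners.add(fn);
  }
}

// Dispatch queued load events; stop after maxEvents so the demo terminates.
function drain(img, maxEvents) {
  while (img.pending > 0 && img.loads < maxEvents) {
    img.pending -= 1;
    img.loads += 1;
    for (const fn of [...img.listeners]) fn.call(img);
  }
  return img.loads;
}

const SAME_URL = "all_company.png"; // placeholder value
const img = new FakeImage();

// The problematic pattern: the load handler re-sets src on the same image.
img.addEventListener("load", function () { this.src = SAME_URL; });
img.src = SAME_URL;

const dispatched = drain(img, 1000);
console.log(dispatched, img.pending); // prints: 1000 1
```

The handler never stops re-queuing work: the cap is reached with one load event still pending, which in a real browser would mean the event loop stays busy indefinitely.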
Comment 28•10 years ago
|
||
I'm still working with Yammer support on this issue. I was able to replicate the issue myself and have given them the exact same instructions to replicate it on their end. The customer rep that's helping me was NOT able to replicate and has escalated to their tier 2 support. Should expect an answer back in a day.
Comment 29•10 years ago
|
||
So I did a bit of poking around in Yammer's code. The site is doing tons of src sets, all on the same HTMLImageElement. The src being set is https://mug0.assets-yammer.com/mugshot/images/150x150/all_company.png and the function immediately doing the set is:
  function i(e, t) { e.height = e.height, e.width = e.width, e.src = t }
via this stack trace:
  0 i(e = [object HTMLImageElement], t = "https://mug0.assets-yammer.com/mugshot/images/150x150/all_company.png") ["https://c64.assets-yammer.com/assets/yam-requirejs-home-39f6e8696fcdb68131e547a660e49b05.js":42] this = [object Window]
  1 anonymous([object Object]) ["https://c64.assets-yammer.com/assets/yam-requirejs-home-39f6e8696fcdb68131e547a660e49b05.js":42] this = [object HTMLImageElement]
  2 anonymous(e = [object Object]) ["https://c64.assets-yammer.com/assets/vendor-4debc085c4ec407eb4852954143be359.js":25] this = [object HTMLImageElement]
  3 anonymous(e = [object Event]) ["https://c64.assets-yammer.com/assets/vendor-4debc085c4ec407eb4852954143be359.js":24] this = [object HTMLImageElement]
Sadly, the source is minified, so finding who exactly calls this function is nontrivial, though I suspect it's jQuery, due to this call:
  n.complete ? i(n, c) : e(n).on("load", function () { i(n, c) })
I did check and there is exactly one src set for every time we fire the load event, so it's not like the site is adding more and more listeners...
Comment 30•10 years ago
|
||
After working with Yammer support, they are not able to replicate the issue on their end. I was able to replicate on Beta and Nightly but not GA. I gave them the exact steps that I used but they said that it still works normally for them. They tried on FF GA, Beta, and Nightly. I asked them if they had a yammer development network and they said yes we can go that route to help get this resolved. I know everyone is busy but is there anyone who would like to volunteer to join their developer network to help and try to get this resolved? Thanks
Comment 31•10 years ago
|
||
Vien, I'm happy to walk them through what things look like from our end...
Comment 32•10 years ago
|
||
Boris - I'll reach out to you directly. Thanks!
Comment 33•10 years ago
|
||
I'm seeing markh's symptoms on both Aurora (30.0a2 2014-04-22) and Nightly (31.0a1) running on Mac OS X 10.8.3
Comment 34•10 years ago
|
||
So I signed up for their development network thing (eventually; it took them a week to actually let me in) and asked about this two days ago. No response so far. I don't expect to get any at this point.
Flags: needinfo?(vdoan)
Comment 35•10 years ago
|
||
A former yammer employee, Oscar Godson, ran into this. I'm asking him on twitter if he knows anyone who could help us.
Comment 36•10 years ago
|
||
We have an srcset polyfill to render higher resolution images for browsers with a devicePixelRatio greater than 1. The polyfill was listening to the 'load' event of the img objects before replacing them. The problem was we weren't unbinding from that event and hence continuously triggering that bit of code. This has been fixed and will go out in our next deploy.
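A rough sketch of the fix described in this comment, runnable under Node. It is not Yammer's actual polyfill: FakeImage, drain, upgradeOnLoad, run, and the file names are hypothetical, and the model assumes every src assignment fires a load event. It compares a load handler that stays bound with one that unbinds itself before swapping the src, and shows why a devicePixelRatio of 1 never hits the problem.

```javascript
// Minimal stand-in for an <img> element; NOT the real DOM.
// Assumption: every src assignment queues a "load" event.
class FakeImage {
  constructor() {
    this.listeners = new Set();
    this.pending = 0; // load events queued but not yet dispatched
    this.loads = 0;   // load events dispatched so far
    this._src = "";
  }
  get src() { return this._src; }
  set src(value) { this._src = value; this.pending += 1; }
  addEventListener(type, fn) {
    if (type === "load") this.listeners.add(fn);
  }
  removeEventListener(type, fn) {
    if (type === "load") this.listeners.delete(fn);
  }
}

// Dispatch queued load events; stop after maxEvents so the demo terminates.
function drain(img, maxEvents) {
  while (img.pending > 0 && img.loads < maxEvents) {
    img.pending -= 1;
    img.loads += 1;
    for (const fn of [...img.listeners]) fn.call(img);
  }
  return img.loads;
}

// Hypothetical polyfill step: once the image has loaded, swap in a
// higher-resolution URL, but only when devicePixelRatio > 1.
function upgradeOnLoad(img, hiResUrl, devicePixelRatio, unbind) {
  if (devicePixelRatio <= 1) return; // low-DPI: nothing to do
  function onLoad() {
    if (unbind) img.removeEventListener("load", onLoad); // the fix
    img.src = hiResUrl;
  }
  img.addEventListener("load", onLoad);
}

function run(devicePixelRatio, unbind) {
  const img = new FakeImage();
  upgradeOnLoad(img, "icon@2x.png", devicePixelRatio, unbind);
  img.src = "icon.png";
  return drain(img, 1000); // number of load events dispatched
}

const results = {
  lowDpi: run(1, false),     // handler never installed
  hiDpiLeaky: run(2, false), // handler stays bound: hits the cap
  hiDpiFixed: run(2, true),  // handler unbinds itself first
};
console.log(results); // lowDpi: 1, hiDpiLeaky: 1000 (capped), hiDpiFixed: 2
```

In a real page the same effect can be had with a one-shot listener, for example jQuery's `.one("load", ...)` or `addEventListener("load", fn, { once: true })` in browsers that support the `once` option.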
Comment 37•10 years ago
|
||
> for browsers with a devicePixelRation greater than 1
Aha! That explains why some people couldn't reproduce: you needed a high-dpi display!
Updated•10 years ago
|
Flags: needinfo?(vdoan)
Comment 38•10 years ago
|
||
This seems to be fixed on the deployed yammer.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME