984838 - 100% CPU on yammer

Reporter

Description

•

11 years ago

STR for me on Win 7, on 2 different machines:
* Open yammer; select the "all company" group; shift-reload.
* Alternatively: open yammer; select the "all company" group; restart and restore session.

Firefox starts showing 100% of a core.

Also:
* Closing the tab fixes it; re-open and it comes back.
* Selecting the home tab on yammer stops it. Re-selecting "all company" does *not* restart the problem - the page load has to happen on the "all company" group to repro.

Possibly all groups - didn't try that. Possibly happens on the community yammer etc., but also didn't try that - so sadly the above requires access to the staff yammer :(

Ludovic Hirlimann [:Usul]

Updated

•

11 years ago

Keywords: perf

Ludovic Hirlimann [:Usul]

Comment 1

•

11 years ago

Mark, can you link to a Mozilla profiler profile captured when this happens? We already have Axel's in the yammer thread.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 2

•

11 years ago

Profile at http://people.mozilla.org/~mhammond/yammer_high_cpu.json

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 3

•

11 years ago

Any change for a gecko profiler link, not to .json. Something like http://people.mozilla.org/~bgirard/cleopatra/#report=xxxxxx

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 4

•

11 years ago

chance, not change, but anyhow, link to a profile which is easy to read. I don't know how to import that .json to anywhere.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 5

•

11 years ago

Ah, that profile is from devtools. Please use the Gecko profiler or some other profiler which gives better information about what is happening inside Gecko.

:Irving Reid (No longer working on Firefox)

Comment 6

•

11 years ago

I saw similar behaviour on http://telemetry.mozilla.org/#release/27/SIMPLE_MEASURES_AMI_STARTUP_END/saved_session/Fennec (not sure it's completely required to reproduce, but I had the "Show evolution over Calendar Dates" button selected).

In my case, dismissing the slow script dialog a few times eventually led to crashes:
bp-0160e74b-eea5-41ae-a478-7fbb92140317
bp-b856d64a-2b53-4a8f-9304-b08ef2140317
bp-93e62d47-59fc-4acc-b3fe-90e0c2140317

One note: running Nightly from the command line, I see the following warning:

JavaScript warning: http://cdnjs.cloudflare.com/ajax/libs/d3/3.3.9/d3.min.js, line 3: mutating the [[Prototype]] of an object will cause your code to run very slowly; instead create the object with the correct initial [[Prototype]] value using Object.create

Here's a profile: http://people.mozilla.org/~bgirard/cleopatra/#report=8e78388aa31ba35b58dec7ecba48ca70af22d59e
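For readers unfamiliar with the warning quoted above, here is a minimal sketch of the two patterns it contrasts. The names `chartProto`, `slow` and `fast` are made up for illustration; this is not d3's actual code.

```javascript
// Hypothetical prototype object; not taken from d3.
const chartProto = {
  describe() {
    return `chart with ${this.points} points`;
  },
};

// Pattern the warning is about: create the object first, then swap its prototype.
const slow = { points: 3 };
Object.setPrototypeOf(slow, chartProto);

// Pattern the warning recommends: choose the prototype at creation time.
const fast = Object.create(chartProto);
fast.points = 3;

console.log(slow.describe());
console.log(fast.describe());
```

Both objects behave the same; the difference is only that the second one never has its prototype changed after creation.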

Boris Zbarsky [:bzbarsky]

Comment 7

•

11 years ago

That profile shows a bunch of JS running off timers, which then goes and creates tons of arrays, does something with Voronoi polygons, creates a ton of non-Array objects, etc. NVD3 is some sort of charting package. Is it possible that yammer is using this for something and continuously updating the charts?

:Irving Reid (No longer working on Firefox)

Updated

•

11 years ago

Blocks: 984928

:Irving Reid (No longer working on Firefox)

Comment 8

•

11 years ago

bz: Comment 6 was referring to a similar symptom seen at a different URL; at smaug's suggestion I split it off to bug 984928.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #7)
> That profile shows a bunch of JS running off timers, which then goes and
> creates tons of arrays, does something with Voronoi polygons, creates a ton
> of non-Array objects, etc.

...which leads to GCing very often, since we certainly spend quite a bit of time in GC.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 10

•

11 years ago

Gecko profiler link: http://people.mozilla.org/~bgirard/cleopatra/#report=517f7807e93091b80d5155fb0e88c11d0956dce1

Boris Zbarsky [:bzbarsky]

Comment 11

•

11 years ago

That's showing mostly a bunch of painting.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 12

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #11)
> That's showing mostly a bunch of painting.

Yeah. Looking at it with devtools, if you enable "Highlight painted area" you can see the group icon flashing repeatedly, as fast as the flashing can be rendered. It *looked* to me like the completion of repeated XHR requests caused the image's src attribute to be set. However, no XHR requests could be seen in the network tab - I'm not sure if they were cached, or my attempts at reading the profile were simply wrong.

So yeah, I think repeatedly painting the group icon is the symptom. Can't repro on Chrome (FWIW, which really isn't much).

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 13

•

11 years ago

Do we not support stackwalking on Win7? That profile seems to be from the pseudostack, which isn't too useful, since it for example doesn't tell which events are being dispatched. (Stuff under nsEventDispatcher::Dispatch takes quite a bit of time. srcset() is suspicious.)

Hmm, https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Availability hints that native stack might be disabled on release builds. Can you reproduce the issue on trunk?

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 14

•

11 years ago

Nightly generated profile: http://people.mozilla.org/~bgirard/cleopatra/#report=8c05146cb089d814d4fc65f62500285f56d572d5 Looks to me like a timer.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 15

•

11 years ago

Hmm, *lots* of async events dispatched, and the listener for them calls event.add/y.handle() which ends up calling some srcset. Do we end up changing some img all the time, fire load event for it, and then change the src again, and fire load...? Or could be error event too. That is my guess from the profile. Lots of async DOM events, and HTMLImageElement::PreHandleEvent is also in the profile; that certainly shouldn't be there at all unless it is called often.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 16

•

11 years ago

Could be a Yammer bug. Have you reported the issue to them?

Vien Doan [:vdoan]

Comment 17

•

11 years ago

We've opened a support request with yammer about this. Let's see what they come back with. Will let everyone know when I hear back.

Vien Doan [:vdoan]

Comment 18

•

11 years ago

I think on Yammer it was mentioned that this was happening on Nightly. Can someone give me a version that it is happening on? Is it just the latest? Are there any other versions on which people have been experiencing this issue?

Deb Richardson [:dria] (plz NEEDINFO)

Comment 19

•

11 years ago

Vien: for me it's happening in Nightly on OSX Mavericks, consistently. If you open a Yammer tab and click on "All Company" then leave it open for a while, Nightly proceeds to eat 100% of your CPU.

Mike Conley (:mconley) (:⚙️)

Comment 20

•

11 years ago

I think the latest is still affected by this.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 21

•

11 years ago

mozregression tells me:

Got as far as we can go bisecting nightlies...
Ensuring we have enough metadata to get a pushlog...
Last good revision: 41d962d23e81 (2014-03-11)
First bad revision: 44ae8462d6ab (2014-03-12)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=41d962d23e81&tochange=44ae8462d6ab

Sadly though, mozregression hangs trying to use inbound builds to bisect further. That pushlog is fairly large and nothing springs out (except maybe ed30fc4d3e17, "Bug 982146 - Clean up SyncRunnable wait code, r=bsmedberg.", but reversing that simple patch locally didn't have any effect).
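As a rough illustration of what bisecting nightlies means, here is a generic bisection sketch. It is not mozregression's implementation: the `bisect` helper, the `isBad` predicate and every date other than the last-good and first-bad ones reported above are assumptions made for the example.

```javascript
// Generic bisection over an ordered list of builds; a sketch only.
// Assumes builds[0] is good, the last build is bad, and the regression
// stays present in every build after it is introduced.
function bisect(builds, isBad) {
  let good = 0;
  let bad = builds.length - 1;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(builds[mid])) {
      bad = mid;
    } else {
      good = mid;
    }
  }
  return { lastGood: builds[good], firstBad: builds[bad] };
}

// Illustrative nightly dates; only 03-11 and 03-12 come from the thread.
const nightlies = [
  "2014-03-08", "2014-03-09", "2014-03-10", "2014-03-11",
  "2014-03-12", "2014-03-13", "2014-03-14",
];

// Stand-in for "launch this build and check CPU usage by hand".
const result = bisect(nightlies, (date) => date >= "2014-03-12");
console.log(result);
```

Each step halves the remaining range, so the search ends once a good build and a bad build are adjacent.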

Ben Kelly [:bkelly, not reviewing]

Comment 22

•

11 years ago

Attached file Mac ActivityManager SampleProcess after closing yammer tab — Details

Here is the output from the Mac Activity Monitor "Sample Process". Some interesting things to search for in there:
- 50 nsJSContext::GarbageCollectNow
- 43 imgRequestNotifyRunnable::Run()

Boris Zbarsky [:bzbarsky]

Comment 23

•

11 years ago

Possibly bug 980243? Did we use to have the same 100% CPU behavior back in Firefox 26?

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 24

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #23)
> Possibly bug 980243?

As usual, bz hits the nail on the head! Reverting the patch from that bug locally solves the problem for me.

Blocks: 980243

Boris Zbarsky [:bzbarsky]

Comment 25

•

11 years ago

OK, but then the important question is what did Firefox 26 do on this site? The behavior prior to bug 980243 was to not do image loads we really should have been doing, and it got broken in the Firefox 27 cycle. Note that WebKit does NOT handle those image loads quite the same way we do, so it's possible that the code was written to rely on their handling and effectively ends up in an infinite loop where it has a load handler on <img> which then sets src on that same image.

Mark Hammond [:markh] [:mhammond]

Reporter

Comment 26

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #25)
> OK, but then the important question is what did Firefox 26 do on this site?

I figured I'd test this while waiting for hg to update. On 26 I see 100% CPU *and* Firefox is unresponsive (i.e., I can't interact with it and the "spinners" are frozen; strangely though, Windows doesn't report it as "unresponsive" and doesn't offer to kill it when trying to close it, as I'd expect - it must be hard-killed via the task manager).

Boris Zbarsky [:bzbarsky]

Comment 27

•

11 years ago

Probably starving OS events; that's something we've changed around some since 26. Sounds like this is a bug in the yammer script, probably as described in comment 25: a load handler on an <img> setting that <img>'s src to the same value it already has, which triggers a forced load and a new load event for the same image, etc.
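The loop described here can be sketched with a toy model. `FakeImage` below is a made-up stand-in, not the real DOM, and the `forceReloadOnSameSrc` flag is an assumption used to contrast an engine that reloads on a same-value src set with one that does not; the thread only says that WebKit handles these loads differently.

```javascript
// Pending "load" events sit in a queue so the loop can be stepped a
// bounded number of times instead of spinning forever.
const queue = [];

class FakeImage {
  constructor(forceReloadOnSameSrc) {
    this.forceReloadOnSameSrc = forceReloadOnSameSrc;
    this.currentSrc = null;
    this.onload = null;
    this.loads = 0;
  }
  get src() {
    return this.currentSrc;
  }
  set src(value) {
    const same = value === this.currentSrc;
    this.currentSrc = value;
    // Assumption: one engine skips the load (and its event) when the value
    // is unchanged, the other always starts a new load.
    if (same && !this.forceReloadOnSameSrc) return;
    queue.push(() => {
      this.loads += 1;
      if (this.onload) this.onload();
    });
  }
}

function run(img, maxSteps) {
  img.onload = () => { img.src = img.src; }; // handler re-sets the same src
  img.src = "all_company.png";
  let steps = 0;
  while (queue.length > 0 && steps < maxSteps) {
    queue.shift()();
    steps += 1;
  }
  queue.length = 0; // drop anything still pending
  return img.loads;
}

console.log(run(new FakeImage(true), 1000));  // stops only at the step cap
console.log(run(new FakeImage(false), 1000)); // settles after the first load
```

With forced reloads every load event schedules another one, so only the step cap ends the run; without them the handler's src set is a no-op and the page settles.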

Vien Doan [:vdoan]

Comment 28

•

11 years ago

I'm still working with Yammer support on this issue. I was able to replicate the issue myself and have given them the exact same instructions to replicate it on their end. The customer rep that's helping me was NOT able to replicate and has escalated to their tier 2 support. Should expect an answer back in a day.

Boris Zbarsky [:bzbarsky]

Comment 29

•

11 years ago

So I did a bit of poking around in Yammer's code. The site is doing tons of src sets, all on the same HTMLImageElement. The src being set is https://mug0.assets-yammer.com/mugshot/images/150x150/all_company.png and the function immediately doing the set is:

  function i(e, t) { e.height = e.height, e.width = e.width, e.src = t }

via this stack trace:

0 i(e = [object HTMLImageElement], t = "https://mug0.assets-yammer.com/mugshot/images/150x150/all_company.png") ["https://c64.assets-yammer.com/assets/yam-requirejs-home-39f6e8696fcdb68131e547a660e49b05.js":42]
    this = [object Window]
1 anonymous([object Object]) ["https://c64.assets-yammer.com/assets/yam-requirejs-home-39f6e8696fcdb68131e547a660e49b05.js":42]
    this = [object HTMLImageElement]
2 anonymous(e = [object Object]) ["https://c64.assets-yammer.com/assets/vendor-4debc085c4ec407eb4852954143be359.js":25]
    this = [object HTMLImageElement]
3 anonymous(e = [object Event]) ["https://c64.assets-yammer.com/assets/vendor-4debc085c4ec407eb4852954143be359.js":24]
    this = [object HTMLImageElement]

Sadly, the source is minified, so finding who exactly calls this function is nontrivial, though I suspect it's jQuery, due to this call:

  n.complete ? i(n, c) : e(n).on("load", function () { i(n, c) })

I did check and there is exactly one src set for every time we fire the load event, so it's not like the site is adding more and more listeners...

Vien Doan [:vdoan]

Comment 30

•

11 years ago

After working with Yammer support, they are not able to replicate the issue on their end. I was able to replicate on Beta and Nightly but not GA. I gave them the exact steps that I used but they said that it still works normally for them. They tried on FF GA, Beta, and Nightly. I asked them if they had a yammer development network and they said yes we can go that route to help get this resolved. I know everyone is busy but is there anyone who would like to volunteer to join their developer network to help and try to get this resolved? Thanks

Boris Zbarsky [:bzbarsky]

Comment 31

•

11 years ago

Vien, I'm happy to walk them through what things look like from our end...

Vien Doan [:vdoan]

Comment 32

•

11 years ago

Boris - I'll reach out to you directly. Thanks!

Bill Walker [:bwalker] [@wfwalker]

Comment 33

•

11 years ago

I'm seeing markh's symptoms on both Aurora (30.0a2 2014-04-22) and Nightly (31.0a1) running on Mac OS X 10.8.3

Boris Zbarsky [:bzbarsky]

Comment 34

•

11 years ago

So I signed up for their development network thing (eventually; it took them a week to actually let me in) and asked about this two days ago. No response so far. I don't expect to get any at this point.

Flags: needinfo?(vdoan)

Ben Kelly [:bkelly, not reviewing]

Comment 35

•

11 years ago

A former yammer employee, Oscar Godson, ran into this. I'm asking him on twitter if he knows anyone who could help us.

sugendran

Comment 36

•

11 years ago

We have an srcset polyfill to render higher resolution images for browsers with a devicePixelRatio greater than 1. The polyfill was listening to the 'load' event of the img objects before replacing them. The problem was we weren't unbinding from that event and hence continuously triggering that bit of code. This has been fixed and will go out in our next deploy.
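A minimal sketch of the failure mode and of one possible fix, under stated assumptions: `FakeImage`, `upgrade` and `simulate` are hypothetical names, the fake image is not the real DOM, and the thread does not show how Yammer actually changed its polyfill. The model queues a load event for every src assignment, which is the behaviour described earlier in this bug for Firefox.

```javascript
// Minimal stand-in for an <img>; NOT the real DOM.
class FakeImage {
  constructor() {
    this.listeners = [];
    this.srcSets = 0;
    this.pending = 0;
  }
  addEventListener(type, fn, options = {}) {
    if (type === "load") this.listeners.push({ fn, once: Boolean(options.once) });
  }
  set src(_value) {
    this.srcSets += 1;
    this.pending += 1; // every src set queues one load event
  }
  // Deliver queued load events, giving up after maxEvents.
  flush(maxEvents) {
    let delivered = 0;
    while (this.pending > 0 && delivered < maxEvents) {
      this.pending -= 1;
      delivered += 1;
      const current = this.listeners;
      this.listeners = current.filter((l) => !l.once); // drop one-shot listeners
      for (const l of current) l.fn();
    }
    return delivered;
  }
}

// Hypothetical polyfill step: on high-DPI displays, swap in a sharper image
// after the first load.
function upgrade(img, hiResUrl, pixelRatio, once) {
  if (pixelRatio <= 1) return; // low-DPI displays never register the listener
  img.addEventListener("load", () => { img.src = hiResUrl; }, { once });
}

function simulate(pixelRatio, once) {
  const img = new FakeImage();
  upgrade(img, "icon@2x.png", pixelRatio, once);
  img.src = "icon.png";
  img.flush(100);
  return img.srcSets;
}

console.log(simulate(2, false)); // listener never removed: src is set until the cap
console.log(simulate(2, true));  // one-shot listener: a single upgrade
console.log(simulate(1, false)); // low-DPI: only the initial src set
```

In a real page the same effect could come from passing `{ once: true }` to `addEventListener` or from removing the handler inside itself; which approach the site took is not stated here.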

Boris Zbarsky [:bzbarsky]

Comment 37

•

11 years ago

> for browsers with a devicePixelRatio greater than 1

Aha! That explains why some people couldn't reproduce: you needed a high-dpi display!

Vien Doan [:vdoan]

Updated

•

11 years ago

Flags: needinfo?(vdoan)

Ben Kelly [:bkelly, not reviewing]

Comment 38

•

11 years ago

This seems to be fixed on the deployed yammer.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WORKSFORME