Closed Bug 1510280 Opened 6 years ago Closed 5 years ago

TreeHerder producing multi-second UI stalls after loading more data

Categories

(Tree Management :: Treeherder: Frontend, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kats, Assigned: camd)

References

(Blocks 1 open bug)

Details

Attachments

(5 files)

Since sometime yesterday I'm seeing my browser freeze up for a few seconds every so often. For example, I'll click on a link and it won't open for a few seconds. I grabbed a profile of one occurrence of this [1], which seems to indicate that TreeHerder is running some apply-new-jobs code that takes multiple seconds on the content process main thread. I have three Treeherder tabs open at the moment, one of which has a lot of pushes (autoland for ~16 hours), so I guess taking some time to update the display is reasonable, but I'm filing this anyway in case the profile points to some low-hanging fruit for optimization.

[1] https://perfht.ml/2AvleOn
See Also: → 1499551
Blocks: 1067846
Priority: -- → P1

I had the same situation right now with IRCCloud which is running in the same content process as a Treeherder instance in a different tab. At the time when this happened I had a Vidyo call with a couple of people, which means the CPU load was already high. But entering text in IRCCloud caused the tab to freeze.

I spoke with mconley on IRC, and as he mentioned, the problem is likely caused not by IRCCloud but by the long-running GC/CC cycle for the Treeherder tab. Here is the profile:

https://perfht.ml/2HukPB7

Here is another one from a fresh profile, which seems to be related to the cycle collector: https://perfht.ml/2HsH1vg

Steps to reproduce:

  1. Open Treeherder: http://treeherder.mozilla.org/
  2. Scroll down and click for the next 50 results
  3. Scroll up and down a lot

With step 3 you will notice that it takes a couple of seconds for Treeherder to repaint the UI. At some point it even stays completely white for me. Nothing gets rendered at all, even after 20s (see the screenshots in the profile).

Andrew, could you have a look at the profiles and at why GC/CC takes that long and blocks everything for the script(s) on Treeherder?

Flags: needinfo?(continuation)

I'm not sure what you mean. I only see about a second of GC/CC time in there. It looks like most of the time is being spent in JS. If JS is running and allocating a lot of objects, there's going to be some time spent in the GC.

Flags: needinfo?(continuation)

I see. Thanks for clarifying that.

Cameron, is there something we can do here? This Treeherder behavior is very annoying, and it already shows up when occasionally working with just 50-100 changesets. Not sure how much this actually affects the sheriffs.

Flags: needinfo?(cdawson)

Henrik-- It is likely that we can do some perf improvements here. I suspect there is some re-rendering during load-time happening that doesn't need to. I hope to have time to do some performance profiling and improvements later in the year. But for now my time is monopolized by working on the Push-Health project.

Assignee: nobody → cdawson
Status: NEW → ASSIGNED
Flags: needinfo?(cdawson)

I'm going to take this down to P2 as I will not get to it right away, but DO hope to get to it as soon as I'm freed up.

Priority: P1 → P2

Sure. I also just tested in Safari and it behaves the same way. It freezes the whole browser when loading more data.

Summary: TreeHerder producing multi-second UI stalls in firefox → TreeHerder producing multi-second UI stalls after loading more data

I've been running into this for a while, but finally got a chance to profile. https://perfht.ml/2HNhttg where the steps were to load https://treeherder.mozilla.org/#/jobs?repo=autoland and then click the '50' button to load more data.

Some observations:

  1. Almost all the time is either JS or GC.
  2. Of the 30s in that profile (during all of which the browser was pretty unresponsive), about 5s is doing GC.
  3. A ton of time is spent doing "various JS things". For example 1s+ just adding properties to objects, on what looks like a slow-path call out of the JIT.
  4. Pretty much all the JS is under mapPushJobs. About half under updateJobMap, half under recalculateUnclassifiedCounts. Both then end up under enqueueSetState and then jump off into minified/obfuscated code.

My hypothesis is that we're redoing the same work over and over again in some sort of O(N^k), k > 1 algorithm where N is the number of jobs shown, because that's the only way to get the performance characteristics I'm seeing....
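To illustrate the kind of pattern that hypothesis describes, here is a minimal sketch. It is not Treeherder's actual code: the `Job` shape, `countUnclassified`, `addJobsOneByOne`, and `addJobsBatched` are hypothetical names made up for this example. It only shows how recounting every known job after each single insertion costs roughly N^2/2 visits, while inserting the whole batch first and recounting once costs N.

```typescript
// Hypothetical job shape for illustration; not Treeherder's real data model.
interface Job {
  id: number;
  classified: boolean;
}

// Global visit counter so the cost of each strategy is observable.
let visits = 0;

// Walks every job it is given and counts the unclassified ones.
function countUnclassified(jobs: Job[]): number {
  let count = 0;
  for (const job of jobs) {
    visits += 1;
    if (!job.classified) count += 1;
  }
  return count;
}

// Quadratic pattern: recount all known jobs after every single insertion.
function addJobsOneByOne(incoming: Job[]): number {
  const known: Job[] = [];
  let unclassified = 0;
  for (const job of incoming) {
    known.push(job);
    unclassified = countUnclassified(known);
  }
  return unclassified;
}

// Linear pattern: insert the whole batch, then recount exactly once.
function addJobsBatched(incoming: Job[]): number {
  const known: Job[] = [];
  for (const job of incoming) {
    known.push(job);
  }
  return countUnclassified(known);
}

// Builds n fake jobs; every second one is marked as classified.
function makeJobs(n: number): Job[] {
  const jobs: Job[] = [];
  for (let i = 0; i < n; i += 1) {
    jobs.push({ id: i, classified: i % 2 === 0 });
  }
  return jobs;
}

const jobs = makeJobs(1000);

visits = 0;
const slowResult = addJobsOneByOne(jobs);
const slowVisits = visits; // 1 + 2 + ... + 1000 = 500500

visits = 0;
const fastResult = addJobsBatched(jobs);
const fastVisits = visits; // 1000

console.log({ slowResult, fastResult, slowVisits, fastVisits });
```

Both strategies produce the same count; only the amount of work differs, and the gap widens as more pushes are loaded.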

Thanks for the analysis, Boris. Yeah, I think I'm re-rendering too much each time we get some data back. I hope to address this some time in Q2 as I know it's very frustrating to folks.
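For readers following along, one common way to avoid re-rendering on every piece of incoming data is to coalesce change notifications so that a whole batch of updates triggers a single render. The sketch below is an assumption-laden illustration, not the change that landed here: `JobStore`, `addJob`, and `batch` are hypothetical, and a plain subscriber callback stands in for a UI render.

```typescript
type Listener = () => void;

// Hypothetical minimal store; stands in for whatever state drives the UI.
class JobStore {
  private jobs: number[] = [];
  private listeners: Listener[] = [];
  private batchDepth = 0;
  private dirty = false;

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  addJob(id: number): void {
    this.jobs.push(id);
    this.notify();
  }

  // Runs `fn` and delivers at most one notification after it finishes.
  batch(fn: () => void): void {
    this.batchDepth += 1;
    try {
      fn();
    } finally {
      this.batchDepth -= 1;
      if (this.batchDepth === 0 && this.dirty) {
        this.dirty = false;
        this.notify();
      }
    }
  }

  size(): number {
    return this.jobs.length;
  }

  private notify(): void {
    // Inside a batch, only remember that something changed.
    if (this.batchDepth > 0) {
      this.dirty = true;
      return;
    }
    for (const listener of this.listeners) listener();
  }
}

let renders = 0;
const store = new JobStore();
store.subscribe(() => {
  renders += 1; // stand-in for an expensive re-render
});

// Unbatched: every added job triggers its own "render".
for (let i = 0; i < 50; i += 1) store.addJob(i);
const unbatchedRenders = renders;

// Batched: the same number of jobs triggers a single "render".
renders = 0;
store.batch(() => {
  for (let i = 50; i < 100; i += 1) store.addJob(i);
});
const batchedRenders = renders;

console.log({ unbatchedRenders, batchedRenders, total: store.size() });
```

The design choice is to keep `addJob` unchanged for callers and move the decision about when to notify into the store, so loading 50 more pushes can be wrapped in one `batch` call.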

Priority: P2 → P1
Attached file GitHub Pull Request
Regressions: 1554136

Did something already land? Treeherder production feels way faster already when loading more data.

Priority: P1 → P2

I believe we are in a good place with this issue. I'm going to close as fixed. Please reopen if you see more errors related to this.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED