Closed Bug 1112352 Opened 10 years ago Closed 7 years ago

[Meta] Platform fixes for Treeherder performance

Categories

(Tree Management :: Treeherder, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: RyanVM, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, perf)

Attachments

(1 file)

While there are many perf issues that can and should be fixed with better Treeherder code, Treeherder has also proved to be a good stress test and has already found multiple inefficiencies that can be fixed in the browser code itself.

Rather than having one bug to track both fixes in the platform and fixes in the Treeherder code, we decided it was best to spin off platform fixes to their own bug for better tracking.
Keywords: meta
Keywords: perf
Depends on: 1115915
Priority: -- → P3
Priority: P3 → P4
* Profiled loading 50 new results, TONS of jank
  * (just scrolling while everything is already loaded seems to stay at 60fps, even with many jobs loaded)
  * Use the devtools profiler in Nightly to load + view this profile (after un- gzip'ing).

* 1000ms pause w/ 36% of time in a synchronous reflow
  * caused by jquery.event.fix
  * called by jquery.event.add
  * not clear to me what treeherder code is triggering this

* async reflow that took 462 ms
  * Much longer than would be ideal
  * Seems like something the layout team can better debug

* a series of DOM xhr load events
  * Each takes about 40ms, collectively produce about a full second of jank
  * 31% of time in extend, under ThJobModel, under ThJobModel.get_list, under processQueue

* Then a series of setTimeouts fire
  * Each timeout handler takes about 45ms, collectively contribute about 1.5 seconds of jank
  * Spending most of the time parsing HTML and building DOM elements
  * Under addJobBtnEls, under renderJobTableRow, under generateJobElements

* Then a 160 ms reflow
  * Seems like something the layout team can better debug
(In reply to Nick Fitzgerald [:fitzgen] from comment #1)
> * 1000ms pause w/ 36% of time in a synchronous reflow
>   * caused by jquery.event.fix
>   * called by jquery.event.add
>   * not clear to me what treeherder code is triggering this

Suggest figuring out how to avoid doing jquery.event.add, but would need to know more about treeherder works, and have a non-minified build.

> * a series of DOM xhr load events
>   * Each takes about 40ms, collectively produce about a full second of jank
>   * 31% of time in extend, under ThJobModel, under ThJobModel.get_list,
> under processQueue

Suggest not doing any work in these xhr load callbacks and only queuing up work to be done in a requestAnimationFrame loop.

It seems like there is a little bit of infrastructure for this already, but

(a) there is still work happening in these callbacks (the whole extending and get_list stuff) when it should probably only happen inside the rAF, and

(b) the process queue seems to be using setTimeout instead of requestAnimationFrame, and I think this is leading to more blocking of the event loop than it should be. Gotta give the engine a chance to render those frames, even if it means that the results show up slightly slower, at least the browser isn't beachballed while they show up.

> * Then a series of setTimeouts fire
>   * Each timeout handler takes about 45ms, collectively contribute about 1.5
> seconds of jank
>   * Spending most of the time parsing HTML and building DOM elements
>   * Under addJobBtnEls, under renderJobTableRow, under generateJobElements

Yeah, I think this should be using rAF, not some kind of timer.

I wonder if you could avoid using jquery to parse HTML snippets and instead just create dom nodes yourself. Will likely be much faster.

--------------------------------

Both reflows are taking way too long, especially the first one, but I don't really know how to debug these. Unfortunately, the devtools profiler is lacking in this department. I suggest getting ahold of someone from the layout team to figure out why they are taking so long.
(In reply to Nick Fitzgerald [:fitzgen] from comment #2)
> (In reply to Nick Fitzgerald [:fitzgen] from comment #1)
> > * 1000ms pause w/ 36% of time in a synchronous reflow
> >   * caused by jquery.event.fix
> >   * called by jquery.event.add
> >   * not clear to me what treeherder code is triggering this
> 
> Suggest figuring out how to avoid doing jquery.event.add, but would need to
> know more about treeherder works, and have a non-minified build.
> ...

Thanks for looking at this Nick! FWIW, it's pretty trivially easy to set up a local non-minified ui-only server which just pulls data from the production/staging. See: https://treeherder-ui.readthedocs.org/en/latest/installation.html#installation
Depends on: 1145831
Depends on: 1159459
Depends on: 1159470
Depends on: 1159819
Depends on: 1067846
Blocks: 1067846
No longer depends on: 1067846
Depends on: 1223445
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
If this is a meta with open dependencies, why is it resolved incomplete? And why is it a P4?
I marked it as incomplete since:
* Treeherder performance seems much less of an issue than it was previously, presumably due to the heroic efforts (and actual data driven approach) for Firefox 57 et al
* there's not really much the Treeherder team can do about the dep bugs (some got fixed long ago, others have just sat)
* we're trying to get on top of our 600+ open bugs and this one is pretty open ended/doesn't really add any value by being open (unlike say bug 1067846 which is about things we can change on our side to improve performance in all browsers)

Since then I found a few other already-open bugs that seemed worth at least linking to this in case anyone stumbled this way in the future, however they don't really change the above, so it didn't seem worth reopening.

As for P4, that's been set for ages rather than a value we'd choose now (under the new triaging system this would be a P3 since it's a meta and P4 is not used, but we're not fixing historical closed bug P4s as part of our recent triage efforts).
Priority: P4 → P3
Perhaps a better way of tracking these in the future is some keyword or whiteboard annotation, since really this bug was just used for the purpose of collecting together bugs on a similar topic, rather than being a meta bug for a particular project/goal.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: