Closed Bug 1032437 Opened 10 years ago Closed 10 years ago

Treeherder initial pageload is slower than TBPL

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: jeads)

References

(Depends on 1 open bug)

Details

(Keywords: perf)

1) With a warm browser cache, open https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-central
2) Time how long before pushes + results appear.
3) Repeat for https://tbpl.mozilla.org/

Expected:
Equal (or faster) time compared to TBPL.

Actual:
Treeherder takes ~8 seconds until any (and all) results appear.
TBPL displays all pushes + the first set of results after ~2.5 seconds, and all results by the time ~4.5 seconds have passed.

The TBPL pageload being incremental makes the perceived speed even faster, but even ignoring that, 8s vs 4.5s is still quite noticeable.
Priority: -- → P1
Just to add some context - it's quite a common workflow for sheriffs to have to frequently open new tabs (eg from IRC links or checking out a new issue), and this makes that quite slow.
Assignee: nobody → jeads
Blocks: 1042621
So I did some profiling of the pageload.

Something like 10% of the time is spent getting clientWidth and the resulting relayouts.

3-4% is used to get computed display and the resulting style flushes.

5% is innerHTML sets.

These are all happening via slow paths called from the baseline JIT, by the way; looking at the callees we're not running enough code to make it into ion, at first glance.

Anyway, I'd start with finding those clientWidth getters and see whether not doing that helps.
(In reply to Boris Zbarsky [:bz] from comment #2)
> So I did some profiling of the pageload.
> 
> Something like 10% of the time is spent getting clientWidth and the
> resulting relayouts.
> 
> 3-4% is used to get computed display and the resulting style flushes.
> 
> 5% is innerHTML sets.
> 
> These are all happening via slow paths called from the baseline JIT, by the
> way; looking at the callees we're not running enough code to make it into
> ion, at first glance.
> 
> Anyway, I'd start with finding those clientWidth getters and see whether not
> doing that helps.

Thanks for the profiling information! We will look into removing the clientWidth getters to increase the page rendering speed.

The majority of the page load and paging time seems to be waiting for the HTTP GET request to finish. The database query times are typically less than a second, so it looks like we're just returning too much data in one request. Looking at the JSON structure we're returning:

https://treeherder.mozilla.org/api/project/mozilla-central/resultset?count=10&format=json&full=true&with_jobs=true

the bulk of the data returned is actually the property names repeated in each job's associative array, which is rather wasteful.

{
    "submit_timestamp": 1406146869,
    "machine_name": "tst-linux32-spot-596",
    "job_group_symbol": "M",
    "job_group_name": "Mochitest",
    "platform_option": "opt",
    "job_type_description": "fill me",
    "result_set_id": 231,
    "result": "success",
    "id": 156459,
    "machine_platform_architecture": "x86",
    "end_timestamp": 1406149254,
    "build_platform": "linux32",
    "job_guid": "6394723014e72ce5592bd27084df5bb0c3b66fac",
    "job_type_name": "Mochitest DevTools Browser Chrome",
    "platform": "linux32",
    "state": "completed",
    "running_eta": 2403,
    "pending_eta": 25,
    "build_os": "linux",
    "who": "tests-mozilla-central-ubuntu32_vm-opt-unittest",
    "failure_classification_id": 1,
    "job_type_symbol": "dt",
    "reason": "scheduler",
    "job_group_description": "fill me",
    "job_coalesced_to_guid": null,
    "machine_platform_os": "linux",
    "start_timestamp": 1406147023,
    "build_architecture": "x86",
    "build_platform_id": 7,
    "resource_uri": "/api/project/mozilla-central/jobs/156459/"
}

A simple solution would be to return the job data as arrays without the property names, and to provide a single top-level key structure that maps the array indexes to the associated properties. This looks like it could reduce the size of the returned data structure by >50%, which should be a huge improvement. Working on implementing this now.
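As a rough illustration of the idea (not the code that landed), the packing and unpacking could look like the sketch below. The names compact_jobs, expand_jobs and the "job_property_names" key are assumptions made for this example, and it assumes every job object carries the same set of keys, as in the sample above.

```python
import json


def compact_jobs(jobs):
    """Pack a list of job dicts into one shared key list plus value rows."""
    # Assumption: all job dicts share the same keys, so one key list is enough.
    names = sorted(jobs[0]) if jobs else []
    rows = [[job[name] for name in names] for job in jobs]
    return {"job_property_names": names, "jobs": rows}


def expand_jobs(payload):
    """Rebuild the original list of job dicts on the client side."""
    names = payload["job_property_names"]
    return [dict(zip(names, row)) for row in payload["jobs"]]


# Small synthetic sample using a few of the property names shown above.
sample = [
    {
        "id": 156459 + i,
        "result": "success",
        "state": "completed",
        "job_type_symbol": "dt",
        "platform": "linux32",
    }
    for i in range(100)
]

verbose_size = len(json.dumps(sample))
compact_size = len(json.dumps(compact_jobs(sample)))
print(verbose_size, compact_size)
```

The saving grows with the number of jobs, because the key list is sent once instead of once per job.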
Status: NEW → ASSIGNED
I completed the work described in comment 3 and landed 2 pull requests on treeherder-service and treeherder-ui.

https://github.com/mozilla/treeherder-service/commit/52a653911e450fb5c5e9359438fd90a87c85b82a
https://github.com/mozilla/treeherder-ui/commit/f3f1c00bb16ceebd2ed52a51b048dd08a5d4aa1a

I've pushed the changes to dev, http://treeherder-dev.allizom.org, and I'm using stage, https://treeherder.allizom.org, which does not have the changes, to compare load times with.

The times/sizes reported below were taken from the timeline in the developer tools:

http://treeherder-dev.allizom.org/api/project/mozilla-central/resultset/?count=10&format=json&full=true&with_jobs=true

treeherder-dev, without property names

Response Time (seconds), Response size (MB)
1.) 4.63s, 2.2 MB
2.) 3.23s, 2.2 MB
3.) 3.20s, 2.2 MB
4.) 4.93s, 2.2 MB
5.) 3.21s, 2.2 MB
6.) 3.71s, 2.3 MB
7.) 4.91s, 2.4 MB
8.) 3.53s, 2.4 MB
9.) 5.38s, 2.4 MB
10.) 3.54s, 2.4 MB

avg 4.03s, 2.3 MB

https://treeherder.allizom.org/api/project/mozilla-central/resultset/?count=10&format=json&full=true&with_jobs=true
treeherder-stage, with property names

Response Time (seconds), Response size (MB)
1.) 4.33s, 4.6 MB
2.) 6.57s, 4.6 MB
3.) 4.47s, 4.6 MB
4.) 4.04s, 4.7 MB
5.) 5.94s, 4.7 MB
6.) 6.23s, 4.7 MB
7.) 7.06s, 4.9 MB
8.) 4.95s, 4.9 MB
9.) 4.82s, 4.9 MB
10.) 11.60s, 5.0 MB

avg 5.97s, 4.8 MB

According to this comparison, removal of the property names reduced the total data returned by ~52%, with a corresponding ~33% reduction in total HTTP GET response time. Yay!
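For anyone who wants to re-check the percentages, the sketch below recomputes them from the ten per-run figures listed above. The variable and function names are only illustrative.

```python
# (response time in seconds, response size in MB) for each of the ten runs above.
dev_runs = [
    (4.63, 2.2), (3.23, 2.2), (3.20, 2.2), (4.93, 2.2), (3.21, 2.2),
    (3.71, 2.3), (4.91, 2.4), (3.53, 2.4), (5.38, 2.4), (3.54, 2.4),
]
stage_runs = [
    (4.33, 4.6), (6.57, 4.6), (4.47, 4.6), (4.04, 4.7), (5.94, 4.7),
    (6.23, 4.7), (7.06, 4.9), (4.95, 4.9), (4.82, 4.9), (11.60, 5.0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def reduction_pct(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


time_reduction = reduction_pct(
    mean(t for t, _ in stage_runs), mean(t for t, _ in dev_runs)
)
size_reduction = reduction_pct(
    mean(s for _, s in stage_runs), mean(s for _, s in dev_runs)
)
print(f"time: {time_reduction:.1f}% lower, size: {size_reduction:.1f}% lower")
```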

I did a similar benchmark on tbpl. It's a bit more challenging because there are multiple ajax calls that need to be profiled, but here are the results:

Page load 1
--------------------------------------------
Revision Round Trips: 1266ms/241KB + 1268ms/246KB + 1212ms/243KB + 1319ms/305KB + 1266ms/252KB + 1359ms/467KB + 2215ms/248KB 
builds-running.js 635ms/892KB
builds-pending.js 401ms/184KB
json-pushes: 353ms/261KB
--------------------------------------------
Total time 11.3 sec, 3.3 MB


tbpl does the revision HTTP GET requests in parallel, so the end-to-end time for data retrieval is ~2.7-3.0 sec, with a render time of ~2.5-3.0 seconds at the time I was loading the page. This makes a big difference in the overall page load time and the perceived page load time.
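The gap between the 11.3 sec total and the much shorter end-to-end time can be seen with a little arithmetic on the figures above: the total is the sum of every round trip, while requests issued in parallel can finish no sooner than the slowest one. This is only a sketch; the variable names are made up, and the real end-to-end time also depends on how the browser schedules the requests.

```python
# Round-trip times in milliseconds, copied from "Page load 1" above.
revision_requests_ms = [1266, 1268, 1212, 1319, 1266, 1359, 2215]
other_requests_ms = {
    "builds-running.js": 635,
    "builds-pending.js": 401,
    "json-pushes": 353,
}

# Sum of every request, as if they had run strictly one after another.
sequential_ms = sum(revision_requests_ms) + sum(other_requests_ms.values())

# Lower bound for the revision requests when they are issued in parallel:
# they cannot complete before the slowest single request does.
parallel_floor_ms = max(revision_requests_ms)

print(f"sum of all requests: {sequential_ms / 1000:.1f} s")
print(f"slowest revision request: {parallel_floor_ms / 1000:.1f} s")
```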

NOTE: tbpl retrieves 7 pushes on page load while treeherder retrieves 10, so if we run out of things to try to speed this up we can always switch back to 7.

Profiling the resultset web service method server side with New Relic, we see that 42% of the total duration is spent in the WSGIHandler returning data, while only 17% is spent retrieving the data from the database.

So the next thing to try is chunking the data retrieval into a set of parallel AJAX calls, which will allow us to start rendering sooner. With the removal of the property names we're now retrieving less total data but more pushes, so this might do it; we shall see.

Working on implementing this now.
Thank you for digging into this :-)
The parallel revision retrieval is implemented and in production. This resolves the page load performance issues when I profile dev, stage, and production. Exact comparisons right now are a bit challenging because tbpl seems to be returning a total of 2 pushes for mozilla-central and treeherder is returning 10. Regardless, they still seem comparable in load times, with tbpl coming in at 2-3 seconds and treeherder hitting 3-4 seconds. To summarize:

We addressed these performance issues by doing the following things:

1.) Reducing the size of data returned from the web service by removing redundant job object property names.

https://github.com/mozilla/treeherder-service/pull/180
https://github.com/mozilla/treeherder-ui/pull/106

2.) Re-structuring the web service requests from the UI so that they are performed in parallel using the $q service in AngularJS. I also separated out the retrieval of the resultsets/pushes into its own web service call. I think this will remove the occasional JavaScript non-responsive message, because the application regains control to render after each request returns. The individual HTTP request round trips are much faster than one single request, and by doing them in parallel the end-to-end load time seems very comparable to tbpl.

https://github.com/mozilla/treeherder-service/pull/182
https://github.com/mozilla/treeherder-ui/pull/115
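The pattern in item 2 can be sketched outside the browser as well. The example below is a Python asyncio analogue, not the AngularJS $q code from the pull requests: fetch_jobs is a hypothetical stand-in that sleeps instead of issuing an HTTP GET, and the push ids and delays are invented. It shows the two properties described above: the requests run concurrently, and each result can be handed to the renderer as soon as it arrives.

```python
import asyncio


async def fetch_jobs(push_id, delay_s):
    """Hypothetical stand-in for one per-push HTTP GET."""
    await asyncio.sleep(delay_s)  # simulated network round trip
    return {"push_id": push_id, "jobs": []}


async def load_pushes(push_delays):
    """Start all requests at once and handle each one as it completes."""
    rendered = []
    tasks = [
        asyncio.create_task(fetch_jobs(push_id, delay))
        for push_id, delay in push_delays.items()
    ]
    for finished in asyncio.as_completed(tasks):
        result = await finished
        # A real UI would render this push here, before the others return.
        rendered.append(result["push_id"])
    return rendered


order = asyncio.run(
    load_pushes({"push-a": 0.15, "push-b": 0.05, "push-c": 0.10})
)
print(order)
```

Total wall-clock time is close to the slowest simulated request rather than the sum of all three.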

If this is still an issue for anyone please re-open this bug and needinfo jeads.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
(In reply to Jonathan Eads ( :jeads ) from comment #6)
> The parallel revision retrieval is implemented and in production. This
> resolves the page load performance issues when I profile dev, stage, and
> production. Exact comparisons right now are a bit challenging because tbpl
> seems to be returning a total of 2 pushes for mozilla-central and treeherder
> is returning 10. 

TBPL returns fewer than 10 pushes for inactive repos [1]. On mozilla-central the number of pushes per day is low, so it's probably better to compare TBPL vs treeherder using another repo, eg mozilla-inbound.

Using inbound, the performance is much more equal now - thank you :-)

[1] I've filed bug 1049002 for deciding whether we want equivalent behaviour for treeherder.
Keywords: perf
Depends on: 1074539
Depends on: 1076840
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/8b798b785c40a16ef1461f5adea8716cb832aa70
Remove job property names from resultset web service data to improve page load performance (Bugs 1032437 1032216 1032448 1042621)