Loading diff with changes to many files is slow because the file changes are loaded serially, not in parallel

RESOLVED FIXED

Status

MozReview
Review Board: DiffViewer
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bz, Assigned: glob)

Tracking

(Blocks: 1 bug)

Details

(URL)

MozReview Requests

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

Looks like we do a separate XHR for each file, but we serialize the requests.  We should make the requests in parallel, to reduce network latency.

This is NOT the same as bug 1213122, which is about CPU-intensive restyling between the parallelized requests, though I expect fixing this would help that bug too, because we would coalesce the restyles better if multiple requests come back with data at about the same time.
(Assignee)

Comment 1

2 years ago
this is because RB is loading each diff in a queue: https://github.com/reviewboard/reviewboard/blob/master/reviewboard/static/rb/js/pages/views/diffViewerPageView.js#L309

i discussed this with the reviewboard team; it's more complicated than you think, and is dramatically impacted by gecko's rendering issues.  the end-to-end time for viewing https://reviewboard.mozilla.org/r/20953/diff/1 in chrome vs firefox is night/day, even with the serialisation of requests.

the queue exists to help reduce load on the server, and to ensure that the diffs are always rendered top-down.  christian said he'll have a quick look for low-hanging fruit here (like moving rendering out of the serialisation path), but it's likely that the real fix has two parts:  (1) fix gecko :)  (2) kick off several overlapping requests and delay rendering if they come back out of order.
(Assignee)

Comment 2

2 years ago
numbers for clarity:
   chrome:  31s
  firefox: 135s

Comment 3

2 years ago
I've spent the past several hours digging into our Review Board code and checking every code path in the diff load pipeline, sprinkling console.time/timeEnd around every significant part of the code. Here's a braindump.

Our rendering, from what console.time reports, is all fast enough. 30ms here, 80ms there, maybe 120ms, depending on the diff.

The loading of diff content through an AJAX request was the slow part. This content was cached, and Firefox's Network tab confirmed this, so one would expect it to be speedy. However, I was seeing load times of 875.25ms, 810.53ms, etc. consistently. I compared this to Chrome, which reported 100.975ms and 81.409ms, respectively.

I then tried taking the AJAX requests out of the equation by storing the data in sessionStorage and loading from there. I got similarly timings (800+ms).

So then I tried just serving up the content from a local variable (populating a JavaScript file with the content of each diff). This dropped a fair amount, but was still higher than expected.

Next, I set the containers for the injected diffs to have `display: none`. Numbers dropped quite a bit, and the diffs fetched and "loaded" much faster. AJAX requests dropped to around the 170-250ms range.

Then I turned off injecting the diffs into the DOM altogether. AJAX requests dropped to 30-100ms.

Worth noting that with these tests, Chrome didn't exhibit the kinds of performance issues we're seeing here.

So it does seem like a rendering issue, maybe one that's delaying the AJAX requests as well? (Maybe they're just waiting until the DOM rendering settles before responding?)

I did try the suggestion of overlapping the requests. While this appears to work okayish for very small diffs, it basically just freezes Firefox for diffs of any real size. The numbers compound and eventually it beachballs on my Mac.

From our point of view, this is a Firefox performance issue at this point, but I'm happy to help with any testing we can do to further narrow down the causes or verify any potential fixes.
The Gecko issues are tracked in bug 1213122.  This bug is just about the silly serialization of network traffic.

For what it's worth, 31s (the Chrome time cited in comment 2, which I assume is for the diff from the "URL" field of this bug) is completely abysmal as well.  Rendering time for a bugzilla diff attachment (plaintext) is _nothing_ like that, even for quite large diffs.  It's about 1-2s max for this particular diff.

> This content was cached

Er... not when you load the diff the first time!  Measuring cached times is completely pointless from the point of view of what an actual reviewer trying to do the review experiences.

> AJAX requests dropped to 30-100ms.

These are completely impossible times for actual real-life network requests that anyone cares about.  My ping time to reviewboard.mozilla.org is about 100ms...

> Worth noting that with these tests, Chrome didn't exhibit the kinds of performance issues we're seeing here.

The "30s to load a not too big diff" is a performance issue!  Please don't pretend it's not.

> I did try the suggestion of overlapping the requests.

In which exact way?  Just making the requests in parallel but inserting them all into the DOM all at once, in one operation with no intermediate layout flushes, should be the optimal thing to do from the point of view of rendering performance.

If you have a public URL for the setup with overlapped requests I would be happy to profile it in Firefox and see what's going on there.  But again, to be clear, this bug is about unacceptable slow performance with Chrome as well.
> Our rendering, from what console.time reports, is all fast enough. 30ms here, 80ms there, maybe 120ms, depending on the diff.

Oh, one more question just so I understand the numbers: What was this actually measuring?
(Assignee)

Comment 6

2 years ago
(In reply to Boris Zbarsky [:bz] from comment #4)
> For what it's worth, 31s (the Chrome time cited in comment 2, which I assume
> is for the diff from the "URL" field of this bug) is completely abysmal as
> well.

yes, that's my test url; and i agree that 31s is too slow.

another way to fix this may be to create an api call that accepts multiple files, and return the whole lot at once (ie. perform a single large request instead of multiple smaller ones).
> another way to fix this may be to create an api call that accepts multiple files, and return the whole lot at once

If this is not too bad to do server-side, this is, of course, a better solution than doing lots of requests, whether in parallel or in series.

Comment 8

2 years ago
(In reply to Boris Zbarsky [:bz] from comment #4)
> The Gecko issues are tracked in bug 1213122.  This bug is just about the
> silly serialization of network traffic.

We have solid reasons for doing it the way we do. It's actually not "silly" at all. I'll try to explain.

It's suggested further below that we should create an endpoint that batches everything. We used to do it this way, actually. We'd load everything all at once and then dump it onto the page. What actually happens is that, for a first-time, non-cached diff, this can be very very slow, and create a perception that the page is stuck.

Server-side, when generating a diff (and assuming nothing is yet cached), we talk to the repository and grab an original copy of each file. We then apply the diff, generating a patched copy, and from there, we generate all the information needed to render a side-by-side diff: diff opcodes, converted to chunk metadata, with analysis for move detection, etc., followed by syntax highlighting and rendering to HTML. Every bit of this is cached server-side, and the endpoints to request it take advantage of both server-side and client-side caching, to make further requests faster.

This can be computationally expensive (though we have a lot of optimizations in place to get this down to reasonable CPU times). The really expensive part is when talking to repositories. Many Review Board users make use of hosting services that will rate limit requests if too many are happening at once, or overloaded repositories hosted on underpowered hardware, or are accessing repositories hosted in another location through a slow server-to-server connection.

The original way this all worked is that we'd request a rendered diff for the whole page, and it would build it all server-side like we do now. If nothing was yet cached, this would really appear to take quite a while, and if the repositories were acting up, it could lead to timeouts, or users just reloading their page to try again, restarting the entire process (well, minus the server-side caches that were able to be primed during part of that).

By serializing the requests, we were able to work around this, creating a perception that this was significantly faster. And it was faster, too, when dealing with these sort of scenarios. If a repository suddenly stalled, if a hosting service began rate limiting, it didn't block the review process. You'd be looking at the diffs that have loaded and hopefully not even notice the delays on the others.

> 
> For what it's worth, 31s (the Chrome time cited in comment 2, which I assume
> is for the diff from the "URL" field of this bug) is completely abysmal as
> well.  Rendering time for a bugzilla diff attachment (plaintext) is
> _nothing_ like that, even for quite large diffs.  It's about 1-2s max for
> this particular diff.

A Bugzilla diff attachment is pretty bare-bones. As mentioned above, we do a *lot* more than Bugzilla does here.

Also, I would agree, 31 is a lot. Part of this is because we're loading 80 files (default installs load 20 per page). On my 4 year old Mac, it's taking 22 seconds, and I'm able to review changes as they load without slowdown. On Firefox, it's taking 2 minutes 13 seconds, and the page is unusable the majority of the time. That's over twice the time it takes Chrome when repeating the same test without *any* caching.

When I test the same diff on a server I'm managing, it drops to 15 seconds under Chrome for 80 files, but stays at about the 2 minutes part for Firefox.


> > This content was cached
> 
> Er... not when you load the diff the first time!  Measuring cached times is
> completely pointless from the point of view of what an actual reviewer
> trying to do the review experiences.

Far from it. It's a good test of what's going on without involving the server-side diff generation or repository communication machinery, which can definitely impact the numbers wildly (depending on how long the repository's taking to respond). By testing only cached results, we can isolate it to the browser with a minimal round-trip time between browser and server to serve up the content. Helps to highlight which operations are really slowing things down here.

Client-side browser caching is a component, yes, but since we aggressively cache server-side, we should be seeing really good performance even on a first review (assuming anyone at all has loaded that diff at least once already). Measuring against a cached copy is actually going to be more reflective of the average reviewer's experience.


> > AJAX requests dropped to 30-100ms.
> 
> These are completely impossible times for actual real-life network requests
> that anyone cares about.  My ping time to reviewboard.mozilla.org is about
> 100ms...

Again, these are against the cached copies. I'm just trying to see how long it's taking for a bare-minimum (cached) request to be fulfilled. On Chrome, this is very very fast. On Firefox, this is almost a second by default, until I turn off rendering of the diffs. I'm trying to show where there might be a problem, to help any diagnosis on Mozilla's end.


> > I did try the suggestion of overlapping the requests.
> 
> In which exact way?  Just making the requests in parallel but inserting them
> all into the DOM all at once, in one operation with no intermediate layout
> flushes, should be the optimal thing to do from the point of view of
> rendering performance.

That negates the "review as it loads" experience that we moved to, away from the "insert all at once" that created such a large perception of performance problems that had users complaining.

We've been there before, and we moved away from it for pretty solid reasons. Provided the browser's rendering doesn't slow to a crawl, the method we're using works well.


> 
> If you have a public URL for the setup with overlapped requests I would be
> happy to profile it in Firefox and see what's going on there.  But again, to
> be clear, this bug is about unacceptable slow performance with Chrome as
> well.

Performance in Chrome isn't as bad as it's being made out to be in this bug. This is total load time, not the time before you can review code or otherwise use Review Board. The page is completely responsive the whole time it's loading on Chrome, just not on Firefox. Especially on a default setup, the performance is fine, and we don't have complaints from users otherwise.

It takes the time it does here because reviewboard.mozilla.org has quadrupled the number of files requested per page (we default to 20, it's at 80), and we have to request each one. If the rendering performance was the same here as it is in Chrome, Firefox, and Safari, the diffs would just continue to load as you're busy reviewing, and you wouldn't notice a problem. It's because of the massive slowdown (which is impacting both the renders and the requests, and other logic we've got in there) that there's even a noticeable issue here.

Now, are there things we could do to further speed up requests? Yes. I've played around in the past with a variation on our diff loading that passes all the IDs of the diffs to load and, if each is available in cache, just returns them in one request. We have something very similar for the reviews page, for the comment diff fragments. We may be able to develop this for a release at some point. However, even doing so, it doesn't solve the rendering problem, which is the real cause of the performance issues here.
> this can be very very slow, and create a perception that the page is stuck

As opposed to the perception of watching little colored dots fill in one after another?

> You'd be looking at the diffs that have loaded and hopefully not even notice the delays on the others.

How can you not notice it when it's staring you in the face: the UI keeps jiggling, the page keeps moving around...

> As mentioned above, we do a *lot* more than Bugzilla does here.

Yes, I know that.  But for _my_ purposes that doesn't matter.  What I'm dealing with is I'm being asked to do a review system that's a lot slower than the one I'm used to.  Yes, it adds various functionality, but I don't _want_ most of that functionality...

> default installs load 20 per page

Note that the pagination is terrible to start with because it completely breaks searching within the diff.  :(

> It's a good test of what's going on without involving the server-side diff generation or repository communication machinery

Yes, but why does that matter?

The whole problem I'm running into is that we're not starting the server-side work for one chunk until we've gotten a response for a previous chunk.  Yes, this does assume that server load is not an issue, but if that's not the case we should fix that problem on our server (reviewboard.mozilla.org), not rate-throttle everything...

> Helps to highlight which operations are really slowing things down here

Again, the firefox-specific rendering slowness is NOT THIS BUG.  I said that in comment 0, with a link to the bug covering that issue.  This bug report is purely about the fact that on anything resembling a high-latency network connection the review experience for a patch which touches many files is ... frustrating.

> (assuming anyone at all has loaded that diff at least once already)

This assumption may or may not hold, but anyway doesn't affect the network latency situation.

> Measuring against a cached copy is actually going to be more reflective of the average reviewer's experience.

I vehemently disagree, as a reviewer...

> I'm trying to show where there might be a problem, to help any diagnosis on Mozilla's end.

You keep trying to talk about the problem bug 1213122 covers, which is not what this bug report is about.  I know there's a layout performance problem there that we should fix.  Even when we do, the experience here will still be terrible.  :(

> That negates the "review as it loads" experience that we moved to

I should note that I filed this bug as a result of loading a review request (in Chrome, to work around bug 1213122) and getting blocked on the content coming in.  As in, I was reviewing faster than it was coming in.  It's possible that if the requests were parallelized it still would have taken a while to get anything at all to show up, of course.  But my point is that the "review as it loads" experience was not happening for me, and it was pretty jarring.  Having everything pop in at once would probably be less jarring...

> This is total load time, not the time before you can review code or otherwise use Review Board. 

Yes, except that if you start searching in the diff, or skipping to the file you care about, suddenly you sit there and wonder what happened.... and then it pops out of your viewport and you start cursing and looking for it again.

> It takes the time it does here because reviewboard.mozilla.org has quadrupled the number of files requested per page

Yes, because people complained about paginated requests making it really hard to do basic things like search in the diff...

> the diffs would just continue to load as you're busy reviewing, and you wouldn't notice a problem.

My experience actually trying to review in Chrome begs to differ.

> it doesn't solve the rendering problem

Which is tracked in bug 1213122, for the Nth time.
Or to put my opinion on the pagination more clearly: it seems like a workaround for the slowness of requesting a large number of chunks, and it introduces its own (insurmountable, unless you just download the raw diff) annoyances into the review process.  In an ideal world, from my point of view, there would be no pagination at all.  Not at 20 files, nor 80 files, nor any other number.
From what I gather reading all this, and this might be (probably is?) a gross simplification, it seems highly unlikely that we'd get reasonable performance out of a large diff, on any browser, even if we combined things on the server (though I admit it's this last part I'm less certain about).  It sounds like the main problem, from a load-time perspective, is that Review Board's rich diff view just requires quite a lot of computation.

Thus I'd rather profile the operations that Review Board does to a diff and do cost-benefit analyses of all these little features from the perspective of a Mozilla reviewer (for instance, I don't know that anyone here finds the indicators beside the file names to be particularly useful).  Of course this will vary across reviewers, but there may be low-value-high-cost features that we could disable in a customized diff view.  On top of that, we could, perhaps, add a simple control that disables most of this markup for those who really want a high-speed-but-barebones view.

We're working on inline comments now so we'll soon have the experience to understand this problem better.
Another alternative, and one that would probably garner results immediately, is to encourage patch writers to split large changes up across multiple commits.

https://reviewboard.mozilla.org/r/20953/diff/1/ is extraordinarily large. As a reviewer, I'd have probably asked for a breaking-up of this patch if it was posted to Bugzilla. It'd feel unwieldy there too, despite loading sooner.

I would put good money on the following immediate results occurring:

1) The pages would load more quickly (obviously)
2) The reviews would happen more quickly (not just because of 1, but also because I assert that smaller commits are easier to review).
> is to encourage patch writers to split large changes up across multiple commits

That depends on whether the commits are intended to compile and pass tests standalone or not.  If I change the signature of a commonly used function, I'd want a single commit changing all the callsites at least by the time I'm pushing it to inbound...

Updated

2 years ago
Blocks: 1288133
(Assignee)

Updated

2 years ago
Assignee: nobody → glob
Component: General → Review Board: DiffViewer
Comment hidden (mozreview-request)
(Assignee)

Comment 15

2 years ago
this change plus the changes in bug 1265964 dropped the load time of a test review in my development environment in firefox from 40s to 4s.

Comment 16

2 years ago
Worth mentioning that if the order isn't guaranteed, then the order in which comments show up in a review will be different from the order in which comments are left. Comments are in patch display order, primarily, and then sorted by line number.

In our experience, it's common for people to reference comments made in earlier files. We have had bugs related to comment ordering in the past and it definitely impacted the user experience. Many bug reports and complaints pre-fix. You'll likely encounter the same frustration from users that some of ours experienced in the past with out-of-order bugs.
(Assignee)

Updated

2 years ago
Attachment #8813018 - Flags: review?(smacleod)
(Assignee)

Comment 17

2 years ago
(In reply to Christian Hammond from comment #16)
> Worth mentioning that if the order isn't guaranteed, then the order in which
> comments show up in a review will be different from the order in which
> comments are left. Comments are in patch display order, primarily, and then
> sorted by line number.
> 
> In our experience, it's common for people to reference comments made in
> earlier files. We have had bugs related to comment ordering in the past and
> it definitely impacted the user experience. Many bug reports and complaints
> pre-fix. You'll likely encounter the same frustration from users that some
> of ours experienced in the past with out-of-order bugs.

thanks for the heads-up christian.
i'll investigate if that's still an issue with our inline comment rendering.
(Assignee)

Comment 18

2 years ago
note the order of the files on the page doesn't change with this patch because the #file_container_ divs are created in order and then later populated with the xhr response.  what can potentially change is the order in which the files contents appear.

Comment 19

2 years ago
Ah, that makes sense. I read the patch description as saying the files would appear on the page in random order.

Just FYI, I've also been investigating other approaches to speeding up rendering of the diff viewer. If all goes well, the change you currently have up that switches from setting a `<style>` to setting individual `style="..."` for the widths won't be necessary anymore. Not sure on an ETA yet, though.. Still investigating some things.
Comment hidden (mozreview-request)

Comment 21

2 years ago
mozreview-review
Comment on attachment 8813018 [details]
MozReview: load diffs in parallel instead of via the funcQueue (bug 1251286);

https://reviewboard.mozilla.org/r/94536/#review94898

sweet, LGTM

::: reviewboard/reviewboard/static/rb/js/pages/views/diffViewerPageView_mozreview.js:297
(Diff revision 2)
>       *
>       * When the diff is loaded, it will be placed into the appropriate location
>       * in the diff viewer. The anchors on the page will be rebuilt. This will
>       * then trigger the loading of the next file.
>       */
>      queueLoadDiff: function(fileDiffID, fileDiffRevision,

This is a sucky name for this function now, but I don't really think it's worth spending the time changing that...
Attachment #8813018 - Flags: review?(smacleod) → review+

Comment 22

2 years ago
Pushed by bjones@mozilla.com:
https://hg.mozilla.org/webtools/reviewboard/rev/0775a2e88a44
MozReview: load diffs in parallel instead of via the funcQueue ; r=smacleod
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.