Closed Bug 1377253 Opened 5 years ago Closed 3 years ago

Reflow is slow on twitter direct messages (and other sites), with lots of reflow time spent in nested flex reflow

Categories

(Core :: Layout, defect, P2)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
platform-rel --- ?
firefox57 --- fix-optional
firefox58 --- affected

People

(Reporter: dholbert, Assigned: dholbert)

References

(Blocks 7 open bugs)

Details

(Keywords: perf, Whiteboard: [qf:p1:f64][platform-rel-Twitter])

This is a helper bug for bug 1373023. When you type characters on Twitter, we spend some time in synchronous reflows (triggered by each character that you type).

Profile, from bug 1373023 comment 5: http://bit.ly/2sBRXNT

It looks like each PresShell::DoReflow call there is ~28ms long, with most of the time spent in nested flexbox reflow.

The textbox does have four layers of flex container ancestors (with other non-flex-container ancestors in between the layers, which I'll disregard for the moment):
>  <div id="dm_dialog" class="DMDialog modal-container">
>    <div class="DMActivity DMConversation js-ariaDocument u-chromeOverflowFix DMActivity--open">
>     <div class="DMActivity-container">
>       <form class="DMComposer tweet-form">

So that's why we're nested several recursive levels into nsFlexContainerFrame::Reflow stuff.

As I recall, we're not particularly careful about tracking dirtiness for specific flex items.  If a flex container has one dirty descendant & receives a call to Reflow as a result, it just reflows all of its descendants, because the new layout information from the dirty descendant can influence the results of the flexbox layout algorithm & the sizes of all the other flex items.

We probably can & should cache some more state between reflows, so that we can reflow dirty flex-items-and-their-subtrees more selectively & see if their flex-relevant measurements have changed in a way that'd influence the flex layout algorithm.  (And if they haven't changed, then we can skip redoing flex layout & skip reflowing all the other flex items.)
Summary: Reflow is slow on twitter direct messages → Reflow is slow on twitter direct messages (with reflow time mostly spent in nested flex reflow)
(In reply to Daniel Holbert [:dholbert] from comment #0)
> If a flex container has one dirty descendant &
> receives a call to Reflow as a result, it just reflows all of its
> descendants,

(It's actually not quite this bad -- the whole subtree won't necessarily get reflowed.  For example, if one of the descendants is a block and that block isn't marked as dirty, then I'm pretty sure that block won't get reflowed or will have a trivial reflow.  But, each flex item will get an opportunity to reflow itself, which may or may not be fast depending on what sort of container it is & what dirty bits are set.)
Flags: needinfo?(dholbert)
Blocks: FastReflows
Flags: needinfo?(dholbert)
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
Whiteboard: [qf]
Here's a profile I took locally, using latest Nightly on my Linux64 laptop, when typing (1 character per second) into the text field for a DM conversation between me & jwatt (with a large paragraph already present in the textbox before I started profiling here):
https://perf-html.io/public/9aedaaa3b6a5cb8ef6c738966c47a136b8d09bed/calltree/?hiddenThreads=&search=reflow&thread=3&threadOrder=0-2-3-1

(Note that this profile doesn't have the fix for bug 1375491, which will reduce the frequency of layout-flushes here -- that's on inbound and will be in tomorrow's Nightly, I assume.)

The layout flushes in my profile are a bit shorter than the ones in overholt's profile from comment 0 -- around 10ms per character press. (This is probably just due to differing amounts of content & machine capability.)
Flags: needinfo?(dholbert)
Blocks: 1376536
Whiteboard: [qf] → [qf:p1]
Priority: -- → P1
Here's another profile from today's Nightly, now with 0.1ms samples (for better resolution). I typed 9 characters into the textbox for the same twitter conversation with jwatt, with a long "blah blah" paragraph pre-entered into the textbox before my typed characters just in case it matters.

https://perfht.ml/2t832cV  (already filtered for "reflow")

The layout flushes for each of my 9 keypresses are the "double humps" of reflow activity (and there are 4 other brief reflows from JS calls to offsetWidth).  Each keypress seems to trigger 25-30ms of reflow activity, in this profile.
Summary: Reflow is slow on twitter direct messages (with reflow time mostly spent in nested flex reflow) → Reflow is slow on twitter direct messages (and other sites), with lots of reflow time spent in nested flex reflow
Calling this qf:p2 since I haven't gotten to it yet & risk tolerance is ramping down, and I don't think I'll have this ready in time for 57 given that.

p2 -> to do right after 57.
Whiteboard: [qf:p1] → [qf:p2]
Keywords: stale-bug
Blocks: 1393120
Blocks: 1367004
Priority: P1 → P2
See Also: → 1375032
Keywords: perf
Whiteboard: [qf:p2] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
platform-rel: --- → ?
Whiteboard: [qf:f60][qf:p1] → [qf:f60][qf:p1] [platform-rel-Twitter]
Blocks: 1422367
Whiteboard: [qf:f60][qf:p1] [platform-rel-Twitter] → [qf:f61][qf:p1] [platform-rel-Twitter]
Blocks: 1438768
Depends on: 1282821
Depends on: 1374540
No longer depends on: 1282821
Hi Daniel,

I was wondering if this is still on your radar to fix?  thanks!
Flags: needinfo?(dholbert)
Whiteboard: [qf:f61][qf:p1] [platform-rel-Twitter] → [qf:f64][qf:p1] [platform-rel-Twitter]
Blocks: 1425106
Blocks: 1465205
Whiteboard: [qf:f64][qf:p1] [platform-rel-Twitter] → [qf:p1:f64][platform-rel-Twitter]
Blocks: 1473110
Blocks: 1474813
No longer blocks: 1474813
(In reply to Daniel Holbert [:dholbert] from comment #0)

> We probably can & should cache some more state between reflows, so that we
> can reflow dirty flex-items-and-their-subtrees more selectively & see if
> their flex-relevant measurements have changed in a way that'd influence the
> flex layout algorithm.  (And if they haven't changed, then we can skip
> redoing flex layout & skip reflowing all the other flex items.)

Since this plan was outlined, we may have incidentally made a start on this solution. In Bug 1409083, we added computed-on-demand cached flex item and flex line sizing information for use by devtools. It seems like this mechanism could be altered to solve this problem. Here's a sketch of how it could be done, with some questions:

1) Pare down the cached information to the minimum amount necessary to solve this problem and change the devtools cached information to be separate/additive.
  a) Is mainSize the only information we need? Seems like crossSize might not affect the space available to siblings.
  b) Is it still appropriate to hang the cached data off of a property? Or is there a more memory/performance friendly way to store the cached data, given that we'll need it for every flex item for every future reflow?

2) In DoFlexLayout(), where we check item->HadMeasuringReflow(), we would do something similar to do a running check of all items on the line. As long as each one of them is the same size as the cached value, we don't need to reflow it. Once we do, then we have to reflow the following children.
  a) Do we actually have sizes for the flex items here if they haven't had a measuring reflow?
  b) The iteration order here is the order the items appear in the flex line; not the order that they were frozen by the flex algorithm. If that matters, how do we account for that?

I'm sure there are other tricky issues to consider. If this is outline is deemed plausible, I would be happy to take the bug and attempt it.
Slow flex reflows show up in https://www.huffingtonpost.com/ too.
Depends on: 1490890
Depends on: 1492538
I think this particular case with Twitter is better now.

At least:
 - I verified that keypress-triggered reflows aren't reflowing frames that are part of the conversation backscroll.  So: that's a sign that we're not reflowing unboundedly-expensive arbitrary amounts of content.  (That's what was unboundedly-slow over in bug 1376536.)  

 - I captured a profile of me typing "123456" (with a pause between each character) into a twitter DM with ~medium amount of backscroll (25 messages), and I got this profile: https://perfht.ml/2T0qv9p
We're spending 16ms of the time there with DoReflow in the backtrace (which averages to 16ms/6-keypresses = 2.6ms per keypress, though that's actually broken down into several different reflows per keypress since Twitter is doing a lot at once, and none of the individual reflows is longer than 2ms as you can see in the "Marker Chart" view.)

Meanwhile, if I perform the same experiment in Chrome, they report spending over twice as much time in layout (34ms and 36ms in the two tests that I did).

Compare this (particularly <2ms reflow times) to the ~28ms reflow times in comment 0, and it seems likely this is fixed.

Overholt: if you happen to still have the expensive-to-type-in twitter-DM-conversation available to test with (The one that you used in bug 1373023), it'd be great if you could reprofile just to double check my measurements! But for now, I'm going to call this WORKSFORME.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dholbert) → needinfo?(overholt)
Resolution: --- → WORKSFORME
(In reply to Olli Pettay [:smaug] (high review load) from comment #10)
> Slow flex reflows show up in https://www.huffingtonpost.com/ too.

BTW, I just profiled a pageload of huffingtonpost.com: https://perfht.ml/2zGiayB

The "marker table" UI there shows that there are lots of reflows, though most are quite short. The longest one is right at the beginning (which makes sense), and it's 51ms long.  The second-longest one is the 10th reflow, and it's 12.2ms long.  So, the 51ms-long-reflow is a bit concerning, though it's an entirely unreasonable time for reflow-to-completion of a full complex page.  (And interruptible reflow means we're probably still somewhat responsive during that 51ms, if the user were to click or anything, since it's the initial reflow of the page.)

Also: for comparison, I profiled the same page in Chrome dev, and in that profile, I'm seeing the first layout being 83.38ms long, and then there's another layout ~100ms later that is 42.7ms long, and then there's another one ~50ms later that is 91.81ms long, and there are several more in the 20-30ms range later on.  All in all, they report 425.7ms of time spent on Layout during the load of this page, whereas we report 194.8ms of layout (time with DoReflow in the backtrace).

So at the moment, we seem to be doing a great deal better than Chrome is doing, in terms of layout on huffingtonpost.com loading (fewer long layouts, shorter duration for longest layout, and < 50% of the time spent in layout overall).

So I don't think there's likely to be much that's actionable there at the moment.
(BTW: if anyone digs in further on huffingtonpost and disagrees with my assessment in comment 12, let's track it in a new bug.)
(In reply to Daniel Holbert [:dholbert] from comment #12)
> BTW, I just profiled a pageload of huffingtonpost.com:
> https://perfht.ml/2zGiayB
> [...]
> So, the 51ms-long-reflow is a bit concerning, though it's
> an entirely unreasonable time for reflow-to-completion of a full complex
> page.

Sorry, I left out a word -- I meant to say it's *not* an entirely unreasonable time. :)
(In reply to Daniel Holbert [:dholbert] from comment #11)
> 
> Overholt: if you happen to still have the expensive-to-type-in
> twitter-DM-conversation available to test with (The one that you used in bug
> 1373023), it'd be great if you could reprofile just to double check my
> measurements! But for now, I'm going to call this WORKSFORME.

I tried it and it feels no laggier than Notepad. I didn't even bother profiling because it seems perfectly fine. Thanks!
Flags: needinfo?(overholt)
You need to log in before you can comment on or make changes to this bug.