Investigate a slow reflow case for Google Docs

NEW
Assigned to

Status

()

Core
Layout
11 months ago
3 months ago

People

(Reporter: Away for a while, Assigned: jet, NeedInfo)

Tracking

(Blocks: 2 bugs, {perf})

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(platform-rel -)

Details

(Whiteboard: [qf:meta][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs])

(Reporter)

Description

11 months ago
While investigating bug 1269695, we came across a profile which shows a behavior where we can optimize things better.

See the profile here: <https://clptr.io/2jtuBZb>.  The interesting part of the JS code here is:

    > function jhb(a, b, c, d, e) {
    >     b != a.H && (a.H = b, b = AF(BF.Ma(), b, d), a.F.style.cssText = b);
    >     a.F.innerHTML = c;
    >     e.firstChild != a.C && (mp(e), e.appendChild(a.C));
    >     return e.getBoundingClientRect().width
    > };
     
    https://docs.google.com/static/document/client/js/919068751-kix_main_i18n_kix_core.js

In Chrome, the reflow triggered by the call to getBoundingClientRect() takes about 1s to finish, but in Firefox it takes 1.8s.  It seems that out of that, 1.5s is reflowing an iframe and around 340ms reflowing its parent document here: <http://searchfox.org/mozilla-central/rev/0f254a30d684796bcc8b6e2a102a0095d25842bb/dom/base/nsDocument.cpp#7792>

Is there any special cases that we can detect here which would allow us to avoid reflowing the parent document?
Flags: needinfo?(dholbert)
Flags: needinfo?(dbaron)
(In reply to :Ehsan Akhgari from comment #0)
> Is there any special cases that we can detect here which would allow us to
> avoid reflowing the parent document?

Yes... and I thought there was a bug in which bz and I discussed this... but I can't find it right now.

Hopefully I'll have a chance to look more tomorrow.
platform-rel: --- → ?
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
I can't find a bug -- I think I had a conversation with bz, started writing this patch:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/f6b8b174b0e0/condition-promote-flush-type
and then never finished it.  (Added in https://hg.mozilla.org/users/dbaron_mozilla.com/patches/rev/b97ab326e18c .)

The key ideas were:

 - we only need to flush in the parent if the child document has media queries that depend on its container size ('height', 'width', etc.)

 - I don't really remember the details about the telemetry idea, but I think we thought it was worth knowing (a) how frequent different levels of flush are in the wild and (b) maybe how often the flush promotion happens in the wild to see how important it was to reduce the frequency of.
Flags: needinfo?(dbaron)
Flags: needinfo?(dholbert)
(Reporter)

Comment 3

11 months ago
Sounds great, thanks David!  Is this something that the layout team can look into?  It may help with the Google Suite performance improvement project.
(Reporter)

Updated

11 months ago
Flags: needinfo?(bugs)
> we only need to flush in the parent if the child document has
> media queries that depend on its container size

Not quite.  We need to flush in the parent if:

1)  We're doing a layout flush in the child (because layout depends
    on the viewport size).
2)  We're doing a style flush in the child and there are media queries
    that depend on the viewport size (because in that case style depends
    on the viewport size).

In this case, we're doing getBoundingClientRect(), which is a layout flush, so to get the "right" answers we have to flush the parent.  Here's a simple testcase that I expect would fail if we did not flush the parent in this situation:

  <!DOCTYPE html>
  <iframe style="width: 100px"
          srcdoc="<input type='button' value='click me' onclick=&quot;frameElement.style.width = '200px'; alert(document.documentElement.getBoundingClientRect().width)&quot;>">
  </iframe>

What we would have to detect to avoid the flush is that either the size of our element does not depend on the viewport size or the size of the iframe does not depend on whatever is dirty in the parent document...
Depends on: 1329877
So one interesting thing I'm seeing in this case for the layout of the child frame is that there's a bunch of time spent in bidi code.  Specifically, at least 163ms in nsBidiPresUtils::Resolve, plus 56ms in nsBidiPresUtils::RepositionInlineFrames plus some other miscellany that adds up to 10-20ms.  Is that expected in this case?
Right.  I was thinking about just the "when do we need to promote a style flush to a layout flush in the parent".

I guess the thing we wanted telemetry for was figuring out if we actually hit that case with any frequency.  But given that it's a getBoundingClientRect, it's not really relevant to this bug...

Sorry for the confusion.
(Reporter)

Comment 7

11 months ago
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #5)
> So one interesting thing I'm seeing in this case for the layout of the child
> frame is that there's a bunch of time spent in bidi code.  Specifically, at
> least 163ms in nsBidiPresUtils::Resolve, plus 56ms in
> nsBidiPresUtils::RepositionInlineFrames plus some other miscellany that adds
> up to 10-20ms.  Is that expected in this case?

Yeah I think so, since there is some Arabic text in the test case on page 95...
(Reporter)

Comment 8

11 months ago
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #4)
> In this case, we're doing getBoundingClientRect(), which is a layout flush,
> so to get the "right" answers we have to flush the parent.  Here's a simple
> testcase that I expect would fail if we did not flush the parent in this
> situation:
> 
>   <!DOCTYPE html>
>   <iframe style="width: 100px"
>           srcdoc="<input type='button' value='click me'
> onclick=&quot;frameElement.style.width = '200px';
> alert(document.documentElement.getBoundingClientRect().width)&quot;>">
>   </iframe>

It does indeed break, yes.

> What we would have to detect to avoid the flush is that either the size of
> our element does not depend on the viewport size

Which element do you mean?  This sounds complicated based on my very limited understanding of layout...

> or the size of the iframe
> does not depend on whatever is dirty in the parent document...

This sounds easier to manage.  Would we need to check the dirtiness of things beside the iframe dimensions?
> Which element do you mean?

The one who's getBoundingClientRect is being called.

> This sounds complicated based on my very limited understanding of layout...

It _is_ complicated.  :(

> Would we need to check the dirtiness of things beside the iframe dimensions?

Unfortunately, yes.  Consider this testcase:

  <!DOCTYPE html>
  <div style="float: left; width: 100px; height: 100px"></div>
  <div style="overflow: scroll">
    <iframe style="width:100%"></iframe>
  </div>

Now try changing the width of the float and seeing what happens to the width of the iframe...  This is a fairly hard problem too, unfortunately.
Flags: needinfo?(aschen)
Flags: needinfo?(dbaron)
(In reply to :Ehsan Akhgari from comment #0)
> See the profile here: <https://clptr.io/2jtuBZb>.  The interesting part of

One other comment about this profile is that nsBidiPresUtils::Resolve is 19% of ViewportFrame::Reflow.
platform-rel: ? → +
Keywords: perf
(Reporter)

Updated

10 months ago
Blocks: 1341750
(Assignee)

Comment 11

9 months ago
How does this look now that bug 1329877 is fixed?
Flags: needinfo?(ehsan)
(Reporter)

Comment 12

9 months ago
(In reply to Jet Villegas (:jet) from comment #11)
> How does this look now that bug 1329877 is fixed?

Yes, that bug has made a *huge* difference!  Here is a new profile from the latest Nightly: <https://perfht.ml/2lEA0xR>

Here, getBoundingClientRect() is taking about 590ms, less than half the time it used to take before.  Even visually the page seems to load a lot faster than what I remember when we were testing in Taipei.

Shako, I'm really curious if you have been able to notice this improvement in the Hasal test results?
Blocks: 1334524
Flags: needinfo?(ehsan) → needinfo?(sho)
(Reporter)

Comment 13

9 months ago
Oh, I'm sorry for the false alarm, I forgot that the test case also include pressing End to go to the bottom of the page.  :(

After I did that, the results are less good...  https://perfht.ml/2lEzekm  1.5 seconds in getBoundingClientRect.  A bit better than before but still quite awful.

Updated

9 months ago
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]

Comment 14

9 months ago
Hi Ehsan,

  According to my observation of Hasal uploading number in perfherder, I didn't see any significant number changes(>1000ms) in the past 7 days.

  Do you need me to run particular test case on certain patch for you?
Flags: needinfo?(sho)
(Assignee)

Comment 15

9 months ago
(In reply to Shako Ho from comment #14)
> Hi Ehsan,
> 
>   According to my observation of Hasal uploading number in perfherder, I
> didn't see any significant number changes(>1000ms) in the past 7 days.
> 

I don't think we'll see improvement in 1000ms chunks with the fix for bug 1329877. Are you able to share the Hasal run immediately after that landed? The before/after profiles are different enough that I'd expect some change in the metrics.

Comment 16

9 months ago
yes, we can have a quick comparison about landing before and after, but I might need the revision number on try or date of nightly.

Ehsan, could you provide the build revision number for this patch or the date landed in nightly?
Flags: needinfo?(ehsan)
(Reporter)

Comment 17

9 months ago
See bug 1329877 comment 22: https://hg.mozilla.org/mozilla-central/rev/ae0efe63cb9b
Flags: needinfo?(ehsan)

Comment 18

9 months ago
Hi Jet and Ehsan,

  I've tested this patch with test case "gdoc_load_file", which is the test case for 200 gdoc page load.

  The result as below:

                                       MEDIAN          AVG             STD             SI              PSI
  Build with this patch:               8920            8885            258.5           2167            1788

  Nightly build without this patch:    8740.0          8740.0          332.5           2341            1709
(Reporter)

Updated

8 months ago
Depends on: 1352527
(Reporter)

Comment 19

8 months ago
(In reply to Shako Ho from comment #18)
> Hi Jet and Ehsan,
> 
>   I've tested this patch with test case "gdoc_load_file", which is the test
> case for 200 gdoc page load.

Does that exclude the time for pressing the end key?  The expensive reflow that the patch here is expected to improve mostly happens in response to the end key.
(Reporter)

Updated

8 months ago
Flags: needinfo?(sho)

Comment 20

8 months ago
Hi Ehsan,

  Sorry for late response.

  This test case didn't press the end key in steps.
Flags: needinfo?(sho)
Depends on: 1357694
Depends on: 1358275
(Assignee)

Comment 21

8 months ago
This one's a META bug for a bunch of Layout fixes.
Assignee: nobody → bugs
Flags: needinfo?(bugs)

Updated

7 months ago
platform-rel: + → -
(Reporter)

Comment 22

7 months ago
I don't understand what it means for a meta bug to be p1!  I think this is mistagged.
Whiteboard: [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:meta][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Flags: needinfo?(aschen)
You need to log in before you can comment on or make changes to this bug.