Closed Bug 1329601 Opened 7 years ago Closed 2 years ago

[meta] Investigate a slow reflow case for Google Docs

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE
Performance Impact ?
Tracking Status
platform-rel --- -

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: meta, perf, Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs])

While investigating bug 1269695, we came across a profile which shows a behavior where we can optimize things better.

See the profile here: <https://clptr.io/2jtuBZb>.  The interesting part of the JS code here is:

    > function jhb(a, b, c, d, e) {
    >     b != a.H && (a.H = b, b = AF(BF.Ma(), b, d), a.F.style.cssText = b);
    >     a.F.innerHTML = c;
    >     e.firstChild != a.C && (mp(e), e.appendChild(a.C));
    >     return e.getBoundingClientRect().width
    > };
     
    https://docs.google.com/static/document/client/js/919068751-kix_main_i18n_kix_core.js

In Chrome, the reflow triggered by the call to getBoundingClientRect() takes about 1s to finish, but in Firefox it takes 1.8s.  It seems that out of that, 1.5s is reflowing an iframe and around 340ms reflowing its parent document here: <http://searchfox.org/mozilla-central/rev/0f254a30d684796bcc8b6e2a102a0095d25842bb/dom/base/nsDocument.cpp#7792>

Is there any special cases that we can detect here which would allow us to avoid reflowing the parent document?
Flags: needinfo?(dholbert)
Flags: needinfo?(dbaron)
(In reply to :Ehsan Akhgari from comment #0)
> Is there any special cases that we can detect here which would allow us to
> avoid reflowing the parent document?

Yes... and I thought there was a bug in which bz and I discussed this... but I can't find it right now.

Hopefully I'll have a chance to look more tomorrow.
platform-rel: --- → ?
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
I can't find a bug -- I think I had a conversation with bz, started writing this patch:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/f6b8b174b0e0/condition-promote-flush-type
and then never finished it.  (Added in https://hg.mozilla.org/users/dbaron_mozilla.com/patches/rev/b97ab326e18c .)

The key ideas were:

 - we only need to flush in the parent if the child document has media queries that depend on its container size ('height', 'width', etc.)

 - I don't really remember the details about the telemetry idea, but I think we thought it was worth knowing (a) how frequent different levels of flush are in the wild and (b) maybe how often the flush promotion happens in the wild to see how important it was to reduce the frequency of.
Flags: needinfo?(dbaron)
Flags: needinfo?(dholbert)
Sounds great, thanks David!  Is this something that the layout team can look into?  It may help with the Google Suite performance improvement project.
Flags: needinfo?(bugs)
> we only need to flush in the parent if the child document has
> media queries that depend on its container size

Not quite.  We need to flush in the parent if:

1)  We're doing a layout flush in the child (because layout depends
    on the viewport size).
2)  We're doing a style flush in the child and there are media queries
    that depend on the viewport size (because in that case style depends
    on the viewport size).

In this case, we're doing getBoundingClientRect(), which is a layout flush, so to get the "right" answers we have to flush the parent.  Here's a simple testcase that I expect would fail if we did not flush the parent in this situation:

  <!DOCTYPE html>
  <iframe style="width: 100px"
          srcdoc="<input type='button' value='click me' onclick=&quot;frameElement.style.width = '200px'; alert(document.documentElement.getBoundingClientRect().width)&quot;>">
  </iframe>

What we would have to detect to avoid the flush is that either the size of our element does not depend on the viewport size or the size of the iframe does not depend on whatever is dirty in the parent document...
Depends on: 1329877
So one interesting thing I'm seeing in this case for the layout of the child frame is that there's a bunch of time spent in bidi code.  Specifically, at least 163ms in nsBidiPresUtils::Resolve, plus 56ms in nsBidiPresUtils::RepositionInlineFrames plus some other miscellany that adds up to 10-20ms.  Is that expected in this case?
Right.  I was thinking about just the "when do we need to promote a style flush to a layout flush in the parent".

I guess the thing we wanted telemetry for was figuring out if we actually hit that case with any frequency.  But given that it's a getBoundingClientRect, it's not really relevant to this bug...

Sorry for the confusion.
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #5)
> So one interesting thing I'm seeing in this case for the layout of the child
> frame is that there's a bunch of time spent in bidi code.  Specifically, at
> least 163ms in nsBidiPresUtils::Resolve, plus 56ms in
> nsBidiPresUtils::RepositionInlineFrames plus some other miscellany that adds
> up to 10-20ms.  Is that expected in this case?

Yeah I think so, since there is some Arabic text in the test case on page 95...
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #4)
> In this case, we're doing getBoundingClientRect(), which is a layout flush,
> so to get the "right" answers we have to flush the parent.  Here's a simple
> testcase that I expect would fail if we did not flush the parent in this
> situation:
> 
>   <!DOCTYPE html>
>   <iframe style="width: 100px"
>           srcdoc="<input type='button' value='click me'
> onclick=&quot;frameElement.style.width = '200px';
> alert(document.documentElement.getBoundingClientRect().width)&quot;>">
>   </iframe>

It does indeed break, yes.

> What we would have to detect to avoid the flush is that either the size of
> our element does not depend on the viewport size

Which element do you mean?  This sounds complicated based on my very limited understanding of layout...

> or the size of the iframe
> does not depend on whatever is dirty in the parent document...

This sounds easier to manage.  Would we need to check the dirtiness of things beside the iframe dimensions?
> Which element do you mean?

The one who's getBoundingClientRect is being called.

> This sounds complicated based on my very limited understanding of layout...

It _is_ complicated.  :(

> Would we need to check the dirtiness of things beside the iframe dimensions?

Unfortunately, yes.  Consider this testcase:

  <!DOCTYPE html>
  <div style="float: left; width: 100px; height: 100px"></div>
  <div style="overflow: scroll">
    <iframe style="width:100%"></iframe>
  </div>

Now try changing the width of the float and seeing what happens to the width of the iframe...  This is a fairly hard problem too, unfortunately.
Flags: needinfo?(aschen)
(In reply to :Ehsan Akhgari from comment #0)
> See the profile here: <https://clptr.io/2jtuBZb>.  The interesting part of

One other comment about this profile is that nsBidiPresUtils::Resolve is 19% of ViewportFrame::Reflow.
platform-rel: ? → +
Keywords: perf
Blocks: FastReflows
How does this look now that bug 1329877 is fixed?
Flags: needinfo?(ehsan)
(In reply to Jet Villegas (:jet) from comment #11)
> How does this look now that bug 1329877 is fixed?

Yes, that bug has made a *huge* difference!  Here is a new profile from the latest Nightly: <https://perfht.ml/2lEA0xR>

Here, getBoundingClientRect() is taking about 590ms, less than half the time it used to take before.  Even visually the page seems to load a lot faster than what I remember when we were testing in Taipei.

Shako, I'm really curious if you have been able to notice this improvement in the Hasal test results?
Flags: needinfo?(ehsan) → needinfo?(sho)
Oh, I'm sorry for the false alarm, I forgot that the test case also include pressing End to go to the bottom of the page.  :(

After I did that, the results are less good...  https://perfht.ml/2lEzekm  1.5 seconds in getBoundingClientRect.  A bit better than before but still quite awful.
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Hi Ehsan,

  According to my observation of Hasal uploading number in perfherder, I didn't see any significant number changes(>1000ms) in the past 7 days.

  Do you need me to run particular test case on certain patch for you?
Flags: needinfo?(sho)
(In reply to Shako Ho from comment #14)
> Hi Ehsan,
> 
>   According to my observation of Hasal uploading number in perfherder, I
> didn't see any significant number changes(>1000ms) in the past 7 days.
> 

I don't think we'll see improvement in 1000ms chunks with the fix for bug 1329877. Are you able to share the Hasal run immediately after that landed? The before/after profiles are different enough that I'd expect some change in the metrics.
yes, we can have a quick comparison about landing before and after, but I might need the revision number on try or date of nightly.

Ehsan, could you provide the build revision number for this patch or the date landed in nightly?
Flags: needinfo?(ehsan)
Hi Jet and Ehsan,

  I've tested this patch with test case "gdoc_load_file", which is the test case for 200 gdoc page load.

  The result as below:

                                       MEDIAN          AVG             STD             SI              PSI
  Build with this patch:               8920            8885            258.5           2167            1788

  Nightly build without this patch:    8740.0          8740.0          332.5           2341            1709
Depends on: 1352527
(In reply to Shako Ho from comment #18)
> Hi Jet and Ehsan,
> 
>   I've tested this patch with test case "gdoc_load_file", which is the test
> case for 200 gdoc page load.

Does that exclude the time for pressing the end key?  The expensive reflow that the patch here is expected to improve mostly happens in response to the end key.
Flags: needinfo?(sho)
Hi Ehsan,

  Sorry for late response.

  This test case didn't press the end key in steps.
Flags: needinfo?(sho)
Depends on: 1357694
Depends on: 1358275
This one's a META bug for a bunch of Layout fixes.
Assignee: nobody → bugs
Flags: needinfo?(bugs)
platform-rel: + → -
I don't understand what it means for a meta bug to be p1!  I think this is mistagged.
Whiteboard: [qf:p1][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [qf:meta][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Flags: needinfo?(aschen)
Assignee: bugs → svoisen
Performance Impact: --- → ?
Keywords: meta
Whiteboard: [qf:meta][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs]
Summary: Investigate a slow reflow case for Google Docs → [meta] Investigate a slow reflow case for Google Docs

The bug assignee didn't login in Bugzilla in the last 7 months.
:dholbert, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: sean → nobody
Flags: needinfo?(dholbert)

I think we can close this; this was filed for a slow-reflow in a specific performance-profile that was captured >5 years ago of a specific Google Docs page, and I can't find a link to the gdoc anywhere at this point. (Comment 0 references bug 1269695 which just says "the google doc url with 100 page content ( 33 image, 33 table, others are txt)" but doesn't have a link.)

It's not clear from this bug what the steps-to-reproduce would be at this point to see if this is still an issue; and more-than-likely Firefox and Google Docs have both changed substantially such that our Google Docs perf cliffs are now elsewhere. (e.g. I would bet https://workspaceupdates.googleblog.com/2021/05/Google-Docs-Canvas-Based-Rendering-Update.html changed the performance characteristics of Google Docs quite a bit)

Calling this INCOMPLETE since we don't have clear steps-to-reproduce at this point.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(dholbert)
Flags: needinfo?(dbaron)
Resolution: --- → INCOMPLETE

(We can file new bugs on new Google Docs perf issues that we discover going forward. Feel free to reopen this bug here if we can rediscover the appropriate STR and determine that we're still comparatively slower, too.)

You need to log in before you can comment on or make changes to this bug.