Open Bug 1307024 Opened 8 years ago Updated 2 years ago

[Perf] [google docs] Firefox will get 3~9% faster when running gdoc test cases with chrome user agent

Categories

(Core :: General, defect)

45 Branch
x86_64
Linux
defect

Tracking

()

Performance Impact low
Tracking Status
platform-rel --- -

People

(Reporter: sho, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat])

# Test Case
STR
1. Open the browser
2. Go to the specific page below, each page represent 1 test case
* read utf8: https://goo.gl/2SGn1l
* load page: https://goo.gl/298B5M
* read table: https://goo.gl/FTI0Vj

# Hardware
OS: Ubuntu 14.04
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram
Hard Drive: 1TB SATA HDD
Graphics: GK107 [GeForce GT 640]/ GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 45.02
Chrome version: 50.0.2661.86

# Result
Test Case                  | Firefox default user agent| use Chrome user agent 
gdoc_read_basic_table_1    | 22,567 ms                 | 20,856 ms
gdoc_load                  | 10,200 ms                 | 9,300 ms
gdoc_read_utf8_txt_1       | 10,856 ms                 | 10439 ms

# Detail report:
https://goo.gl/X6zdOP

# Har file
* Default agent har file 	https://drive.google.com/file/d/0BwkEhia_D6l_d21mWDBnR2FZZHc/view?usp=sharing				
* Chrome agent har file	https://drive.google.com/file/d/0BwkEhia_D6l_SVE3WWhDQUZ5NDg/view?usp=sharing
Keywords: perf
Flags: needinfo?(overholt)
platform-rel: --- → +
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs]
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs] → [platform-rel-Google][platform-rel-GoogleDocs] [webcompat]
Thanks for filing this. Can you please re-do the results with a more recent version of Firefox? Also, it would be really great if we had super, super, super simple steps to reproduce for those who have never heard of Hasal before :) I guess the vagrant setup is the simplest thing we have?
platform-rel: + → ---
Flags: needinfo?(overholt) → needinfo?(sho)
Hi Andrew,

  Sure, I can re-do the results with Firefox 49.01. Will update the result after test finish.

  Yes, I think vagrant should be the most simple and not mess up his environment. But remember to git pull the latest code and re-setup the environment after the instance downloaded.
Flags: needinfo?(sho)
For those playing along at home, here are some rough steps to reproduce:

- clone https://github.com/Mozilla-TWQA/Hasal
- run the setup steps listed there: https://github.com/Mozilla-TWQA/Hasal#installation (https://github.com/Mozilla-TWQA/Hasal#vm-template may be easiest)
- run the tests listed in comment 0 like python runtest.py re suite.txt --max-run=1 --max-retry=1|

Shako, does that seem correct? How does one run the individual tests you've got in comment 0?
Flags: needinfo?(sho)
Looking at the network request diff, I noticed that there's a jserror request to Docs with the Chrome user agent with payload "error=a.setBaseAndExtent%20is%20not%20a%20function". This would result in an error dialog being shown on the page. I reprod this behavior on Firefox with a spoofed user agent. The dialog pops up not during initial load, but as soon as I click on the document.

Is it possible that this error dialog (and subsequent missing response from the page) is responsible for the timing difference here? What exactly are those tests measuring?
Hi Andrew,

  I will suggest use https://github.com/Mozilla-TWQA/Hasal/blob/master/bootstrap-linux.sh to setup your environment if you want to setup your environment from scratch.

  The execute command is correct, but remember to specify the test case in your suite.txt, the content as below:

tests.regression.gdoc.test_firefox_gdoc_read_utf8_txt_1
tests.regression.gdoc.test_firefox_gdoc_load
tests.regression.gdoc.test_firefox_gdoc_read_basic_txt_1

the test execution sequence would be the same as comment 0
Flags: needinfo?(sho)
Hi Philipp,

  When the test begin, we will have a background video recording to record everything on desktop, and have two snapshot before we simulate user action (for this case, only one action: paste the url link) and after. Then we will use these two snapshots to matching the video to get the running time of user action.

  And I've seen the error dialog you mentioned here, but we didn't not see it between two snapshots. 
We are not quite sure about how does it impact the running time. Do you have idea about where the error dialog come from? Google doc or Firefox?
Test result for Firefox 49.01

# Hardware
OS: Ubuntu 14.04
CPU: i7-3770 3.4GMhz
Memory: 16GB Ram
Hard Drive: 1TB SATA HDD
Graphics: GK107 [GeForce GT 640]/ GF108 [GeForce GT 440/630]

# Browsers
Firefox version: 49.01

# Result
Test Case                  | Firefox default user agent       | use Chrome user agent 
gdoc_read_basic_table_1    | 17988.8888889 ms                 | 16261.1111111 ms
gdoc_load                  | 5255.55555556 ms                 | 4894.44444444 ms
gdoc_read_utf8_txt_1       | 6155.55555556 ms                 | 5338.88888889 ms
platform-rel: --- → ?
Whiteboard: [platform-rel-Google][platform-rel-GoogleDocs] [webcompat] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat]
It would be interesting to split out the networking parts of the tests for analysis, especially bytes per domain.

Why you ask? I did run webpagetest on a google doc and found that there are some random artifacts from account.google.com popping up only on some runs. That might also be the spot where Chrome gets the fast path (not on Google Docs itself), but that is just an hypothesis.

For reference, 2 webpagetest runs using Chrome on DSL, started a few seconds apart on the same instance:

[1]: https://www.webpagetest.org/result/161021_SJ_1FP6/1/domains/ - accounts.google.com 48kb
[2]: https://www.webpagetest.org/result/161021_W9_1FQ8/1/domains/ - accounts.google.com 11kb

Same for Firefox:

[3]: https://www.webpagetest.org/result/161020_R9_4THJ/1/domains/ - accounts.google.com 47kb
[4]: https://www.webpagetest.org/result/161020_28_54PR/1/domains/ - accounts.google.com	28kb

If accounts.g.c is included for Firefox does not depend on UA as far as I can tell. What might happen though is that accounts.google.com has account dependent code.

Not sure if that leads us anywhere, more investigation needs to be done.
(In reply to Philipp Weis from comment #4)
> Looking at the network request diff, I noticed that there's a jserror
> request to Docs with the Chrome user agent with payload
> "error=a.setBaseAndExtent%20is%20not%20a%20function". This would result in
> an error dialog being shown on the page. I reprod this behavior on Firefox
> with a spoofed user agent. The dialog pops up not during initial load, but
> as soon as I click on the document.
> 
> Is it possible that this error dialog (and subsequent missing response from
> the page) is responsible for the timing difference here? What exactly are
> those tests measuring?

Shako, WDYT?
Flags: needinfo?(sho)
Hi Andrew,

Please see the comment 6.

I'm not quite sure about where the error come from, but I double check our snapshot image, there is no such error dialog in it. As Philipp mentioned here, it could affect the running time, but I'm not sure where the dialog come from? gdoc? or Firefox?
Flags: needinfo?(sho)
(In reply to Shako Ho from comment #10)
> Hi Andrew,
> 
> Please see the comment 6.
> 
> I'm not quite sure about where the error come from, but I double check our
> snapshot image, there is no such error dialog in it. As Philipp mentioned
> here, it could affect the running time, but I'm not sure where the dialog
> come from? gdoc? or Firefox?

Hmm. Philipp, any ideas? Should Shako be able to see it in the snapshot images he's taking?
Flags: needinfo?(pweis)
This dialog is shown by Docs, and I would expect it to be up in the snapshot image unless there's some further user interaction that makes it disappear again. I'm out of office for the rest of the week, but will take a closer look at what causes the dialog to show next week. Also will try to do a comparison with a modified code to avoid the dialog issue.
platform-rel: ? → -
Did you get a chance to try anything here, Philipp?
Apologies for the delay here. I'll try to have an update by the end of this week.
The error is caused by Docs using the #setBaseAndExtent method from the Selection API for Chrome, but a different implementation for Firefox since that method isn't available there. See https://w3c.github.io/selection-api/#widl-Selection-setBaseAndExtent-void-Node-anchorNode-unsigned-long-anchorOffset-Node-focusNode-unsigned-long-focusOffset

You can easily test this by setting general.useragent.override to a Chrome user agent, e.g. "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.86 Safari/537.36", and then clicking or typing in any Google doc.

Shako, any ideas why the dialog wouldn't be shown in the snapshot image? If there's further interaction, in particular an ESC keypress, the dialog might close, and any further keystrokes or clicks would for the most part be ignored (and not cause any latency). Can I see a video of the full test?
Flags: needinfo?(pweis) → needinfo?(sho)
Hi Philipp,

  I think the main reason why the dialog didn't show up then is because after we paste the link on browser url bar, we didn't send any key or click event on browser. 

  The video of this full test that time is deleted, and we recently have some changes after the url pasted, so the new test video will cover the error dialog.
Flags: needinfo?(sho)
Are you going to share a link here to the new test video when you've got it, Shako?
Flags: needinfo?(sho)
Ya, the new test video as below:

https://drive.google.com/open?id=0BwkEhia_D6l_OGlpV3otNkc0eDg
Flags: needinfo?(sho)
Thanks for the video! This shows the error dialog pop up at 0:50, and then staying up until the end. So I don't think this is really a meaningful comparison.

The missing selection API method can be hacked around by executing the following line after page load.

document.getElementsByClassName('docs-texteventtarget-iframe')[0].contentWindow.getSelection().setBaseAndExtent = function() {};

No more crash with that, but the typed characters don't appear on screen. Not really sure why. I can dig more if needed, but honestly not sure if it's worthwhile continuing down that path.
Depends on: 1321623
Philipp, over in bug 1321623 we've implemented setBaseAndExtent. Any chance you can feature-detect this? Or I guess check for Firefox >= 53 if you're using UA detection exclusively?
Flags: needinfo?(pweis)
Philipp told me via email that he's going to try to get to this soon.
Ok, took another look at this. Two separate things here now I believe:

1. Investigate switching over Docs to use setBaseAndExtent for Firefox. Can you clarify what the benefits of this would be? This is actually controlled in the Closure library. Can someone on your end with the right expertise comment on whether it should be safe to switch over GeckoRange#selectInternal [1] to do the same as WebkitRange#selectInternal [2]? If you think that's worthwhile and would be beneficial, I can try to get that prioritized on our end.

[1] https://github.com/google/closure-library/blob/master/closure/goog/dom/browserrange/geckorange.js#L72
[2] https://github.com/google/closure-library/blob/master/closure/goog/dom/browserrange/webkitrange.js#L100

2. Understand performance test behavior. I just tried with 53 nightly, and there's no crash anymore. :) However, I still see the same behavior as in #19 above, where typing doesn't work. Unclear why, but probably something where we have different behavior on Chrome and that just doesn't work in FF.
Flags: needinfo?(pweis)
(In reply to Philipp Weis from comment #22)
> Ok, took another look at this. Two separate things here now I believe:

Thanks!

> 1. Investigate switching over Docs to use setBaseAndExtent for Firefox. Can
> you clarify what the benefits of this would be? This is actually controlled
> in the Closure library. Can someone on your end with the right expertise
> comment on whether it should be safe to switch over
> GeckoRange#selectInternal [1] to do the same as WebkitRange#selectInternal
> [2]? If you think that's worthwhile and would be beneficial, I can try to
> get that prioritized on our end.
> 
> [1]
> https://github.com/google/closure-library/blob/master/closure/goog/dom/
> browserrange/geckorange.js#L72
> [2]
> https://github.com/google/closure-library/blob/master/closure/goog/dom/
> browserrange/webkitrange.js#L100

Mats or Olli are probably the best people to respond.
Flags: needinfo?(mats)
Flags: needinfo?(bugs)
I don't know what the benefits to switch to setBaseAndExtent are, but I was in the impression that not having setBaseAndExtent made GDocs to switch to some slower js code paths.
Flags: needinfo?(bugs)
(In reply to Olli Pettay [:smaug] (review queue closed until backlog cleared) from comment #24)
> I don't know what the benefits to switch to setBaseAndExtent are, but I was
> in the impression that not having setBaseAndExtent made GDocs to switch to
> some slower js code paths.

It is not GDocs, it's the closure compiler generated code that can return a different object from browserrange.createRange() depending on the UA string <https://github.com/google/closure-library/blob/d3933d6f8577ddda3bf5acc91ccef71ab17a2427/closure/goog/dom/browserrange/browserrange.js#L58>.  WebKitRange uses setBaseAndExtent, and GeckoRange does not.

As far as I can tell, this is the code that we end up running in this case <https://github.com/google/closure-library/blob/d3933d6f8577ddda3bf5acc91ccef71ab17a2427/closure/goog/dom/browserrange/geckorange.js#L85>.  It's hard for me to imagine setBaseAndExtent being meaningfully better or worse than this code...
I suspect that "!reversed" is true in most cases, so I think Gecko ends up here:
https://github.com/google/closure-library/blob/d3933d6f8577ddda3bf5acc91ccef71ab17a2427/closure/goog/dom/browserrange/w3crange.js#L256

In any case, it's hard to see how WebKitRange.prototype.selectInternal could be
much faster than that.  Unless the dispatch itself to the base method puts us
on a pessimistic JS path or something.

The WebKitRange.prototype.selectInternal code is better for a Gecko that supports
setBaseAndExtent though, because it handles "reversed" correctly.

I think we need more detailed data on what it is that makes the "Chrome UA string"
run faster.
Flags: needinfo?(mats)
Thanks all for chiming in here.

I don't believe we really have good evidence here that using a Chrome UA string makes anything faster. All we had was a test that measured Chrome UA as faster because it showed an error dialog.

With Firefox 53 and setBaseAndExtent, we wouldn't see the error dialog anymore, but typing still doesn't work because of a separate unknown issue. I believe this would still result in faster test times because characters aren't rendered etc.

I can dig a bit more, but don't really think we want to fully support running in Firefox with a fake user agent.
Whiteboard: [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat] → [qf:p3][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat]

Migrating Webcompat whiteboard priorities to project flags. See bug 1547409.

Webcompat Priority: --- → ?

See bug 1547409. Migrating whiteboard priority tags to program flags.

Webcompat Priority: ? → ---
Performance Impact: --- → P3
Whiteboard: [qf:p3][platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat] → [platform-rel-Google][platform-rel-GoogleSuite][platform-rel-GoogleDocs] [webcompat]
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.