Open Bug 1581560 Opened 5 years ago Updated 2 years ago

Investigate Marionette overhead during Browsertime pageloads

Categories

(Testing :: Performance, enhancement, P3)

Version 3
enhancement

Tracking

(Not tracked)

People

(Reporter: bas.schouten, Unassigned)

References

(Blocks 1 open bug)

Details

In profiles such as https://perfht.ml/2QabzIg we seem to be experiencing a significant amount of overhead caused by marionette chatter during the pageload process. This may be harmful for pageload performance, and can also make it harder to derive signal from noise when analyzing pageload profiles.

Is this profile generated by an automated test? I ask because Marionette should be disabled by default, and is only enabled to enable automation of the browser. :whimboo is there a way to reduce the verbosity of Marionette? Perhaps we could try adjusting the polling when checking if the page has loaded?

Flags: needinfo?(hskupin)
Flags: needinfo?(bas)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #1)

Is this profile generated by an automated test? I ask because Marionette should be disabled by default, and is only enabled to enable automation of the browser.

Yes, these profiles are captured using Browsertime, so Marionette presence is required and some overhead is expected. We just want to understand and limit the overhead.

:whimboo is there a way to reduce the verbosity of Marionette? Perhaps we could try adjusting the polling when checking if the page has loaded?

Yes, there are ways to reduce the verbosity; if we were to see a huge amount of logging activity we can drop the loglevel.

I dug in just a tiny bit and definitely see a significant issue: Browsertime uses selenium-webdriver's Condition implementation, and under the hood that basically does setTimeout(check, 0) in a loop to figure out if pageload is complete. (This is all working around Web Driver's unidirectional HTTP architecture, but that's for another day.)

It appears impossible to slow down the setTimeout busy loop using selenium-webdriver flags and configuration, but we should be able to use alternate asynchronous evaluation methods to reduce traffic significantly. That's not really a Marionette issue, per se; but I expect we will discover some Marionette issues so we can keep this ticket to collect them.

Flags: needinfo?(bas)

Just using this ticket as a collection point, but, with

modified   lib/core/seleniumRunner.js
@@ -145,7 +145,7 @@ class SeleniumRunner {
     const driver = this.driver,
       pageCompleteCheckTimeout = this.options.timeouts.pageCompleteCheck;
     try {
-      const pageCompleteCheckCondition = new Condition(
+      let pageCompleteCheckCondition = new Condition(
         'for page complete check script to return true',
         function(d) {
           return d.executeScript(pageCompleteCheck, waitTime).then(function(t) {
@@ -158,6 +158,23 @@ class SeleniumRunner {
       );
       log.verbose(`Waiting for script ${pageCompleteCheck}`);
 
+      pageCompleteCheckCondition = driver.executeAsyncScript(`
+          var callback = arguments[arguments.length - 1];
+          return new Promise(resolve => {
+              console.log("document.readyState = " + document.readyState);
+              if (document.readyState == "complete") {
+                  return resolve(true);
+              }
+              console.log("awaiting load");
+              window.addEventListener('load', () => {
+                  console.log("resolving load");
+                  return resolve(true);
+              });
+          })
+              .then((r) => callback({'result': r}))
+              .catch((e) => callback({'error': e}));
+      `, pageCompleteCheckTimeout-1000);
+
       await timeout(
         driver.wait(pageCompleteCheckCondition, pageCompleteCheckTimeout),
         pageCompleteCheckTimeout,

I appear to be able to drop the busy waiting for the load event down. This will still require some busy-waiting for the loadEventEnd Performance API marker, since there's no corresponding DOM event; you just have to poll, AFAICT.

I assume there is nothing else for me to look at here. Thanks Nick for digging into this issue with browsertime.

Flags: needinfo?(hskupin)

In the event that I am not around to pick up this work, I strongly suggest that we start using the load event before polling for completion at the Browsertime level. There may be additional things we want to tweak, but this will at least let most of the pageload avoid Marionette (really, selenium-webdriver) polling as fast as it can.

Btw. for which event(s) for navigation you actually have to wait for?

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #6)

Btw. for which event(s) for navigation you actually have to wait for?

This is a little hard to answer precisely. Browsertime does a lot of loadURI-and-wait equivalents, with layers of its own on top of the layers of selenium-webdriver. It has a generic pageCompletedCheckCondition script that it uses selenium-webdriver's condition mechanism to wait for, and it's that condition mechanism that busy waits. (We could slow the busy wait down ourselves with an async condition and a timeout of our own, I reckon).

But fundamentally we're waiting for the "main" page-under-test to load, and then waiting a little more.

So waiting for get() to return and then running an async execution script for the remaining events doesn't work? Or making it all async? I don't understand why it needs to poll the individual states instead of reacting on events.

Assignee: nobody → fstrugariu
Priority: -- → P1
Status: NEW → ASSIGNED

how was the profile created?
adding --firefox.geckoProfiler to raptor browsertime results to an error:

 [2019-10-18 16:12:40] VERBOSE: [browsertime] Executing privileged async script Collect GeckoProfiler
13:12:40     INFO -  /home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/promise.js:2626
13:12:40     INFO -          throw error;
13:12:40     INFO -          ^
13:12:40     INFO -  JavascriptError: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProfiler.dumpProfileToFileAsync]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: [System Principal] :: <TOP_LEVEL> :: line 4"  data: no]
13:12:40     INFO -      at Object.throwDecodedError (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/error.js:514:15)
13:12:40     INFO -      at parseHttpResponse (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:519:13)
13:12:40     INFO -      at doSend.then.response (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/http.js:441:30)
13:12:40     INFO -      at <anonymous>
13:12:40     INFO -      at process._tickCallback (internal/process/next_tick.js:188:7)
13:12:40     INFO -  From: Task: WebDriver.executeScript()
13:12:40     INFO -      at Driver.schedule (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/webdriver.js:807:17)
13:12:40     INFO -      at Driver.executeAsyncScript (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/node_modules/selenium-webdriver/lib/webdriver.js:891:17)
13:12:40     INFO -      at SeleniumRunner.runPrivilegedAsyncScript (/home/florinstrugariu/mozilla/mozilla-central/tools/browsertime/node_modules/browsertime/lib/core/seleniumRunner.js:403:30)
13:12:40     INFO -      at <anonymous>
13:12:40     INFO -      at process._tickCallback (internal/process/next_tick.js:188:7)
13:14:40    ERROR -  Traceback (most recent call last):
Flags: needinfo?(bas)

Gecko profiling while running browsertime via the raptor harness is not yet implemented (Bug 1585013). I'm guessing the profile above was created outside of raptor via ./mach browsertime.

(In reply to Robert Wood [:rwood] from comment #10)

Gecko profiling while running browsertime via the raptor harness is not yet implemented (Bug 1585013). I'm guessing the profile above was created outside of raptor via ./mach browsertime.

Correct! On Geckoview Example I believe.

Flags: needinfo?(bas)
Depends on: 1585013
Assignee: fstrugariu → nobody
Status: ASSIGNED → NEW
Blocks: browsertime

I ran a try to generate gecko profiles on all OS versions:

gecko profiles for raptor browser-time jobs:

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=e18ac0190fc661278ed3c4d25cbef64add5c94c0&searchStr=tp6-1-p

:bas.schouten do you think you can re-generate that profile again. just to make sure we will compare apple to apples.
And also is it possible to generate it on a desktop browser. GeckoProfile for Browsertime raptor android is not ready yet.

Flags: needinfo?(bas)

You can find the profiles in the links above

Flags: needinfo?(bas) → needinfo?(tarek)

thanks, I'll take it from here

Flags: needinfo?(tarek)
Assignee: nobody → tarek
Assignee: tarek → nalexander
Assignee: nalexander → nobody

How important is that? Should it be a P2?

Priority: P1 → P3
Whiteboard: [perftest:triage]

In our perftriage meeting Greg mentioned that this problem is mainly related to desktop but not mobile. Given that mobile is our primary focus right now, we will have to come back to this bug later this year.

Whiteboard: [perftest:triage]

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Severity: normal → S3
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.