Closed Bug 1532491 Opened 9 months ago Closed 8 months ago

Add a version of AWSY that uses tp6

Categories

(Testing :: AWSY, enhancement)

Version 3
enhancement
Not set

Tracking

(Fission Milestone:M2, firefox68 fixed)

RESOLVED FIXED
mozilla68
Fission Milestone M2
Tracking Status
firefox68 --- fixed

People

(Reporter: erahm, Assigned: erahm)

References

Details

Attachments

(5 files)

We'd like add more exhaustive testing of memory usage beyond our static tp5 page set. This will let us test newer web features and should also support the fission effort as we begin to move remote iframes out of process. Additionally we think this would be useful as we make modifications that we believe will improve memory usage on "real" sites but don't show up during standard AWSY testing (bug 1523843 is one example).

A quicker solution might be to just add memory perf data to the existing raptor tests.

I've added a WIP for supporting tp6. To test locally run:

./mach awsy-test --preferences testing/awsy/conf/prefs-tp6.json --quick

This mostly works, but there are some details to iron out:

  1. Raptor has some extra scripts it runs, I'm not sure how important those are. I'm hesitant to just copy them over but I'm not sure the best way to use them otherwise
  2. We want to support both tp5 and tp6, this is somewhat hardcoded to tp6. I think we could make that a test flag easily enough.
  3. We need to annotate the perf data so that we can differentiate tp6 runs
  4. We need to support running in automation, that might be as simple as adding a flag, or there might be some weirdness around mozproxy

Rob, would you mind taking a look at the WIP to sanity check it and giving some input for point #1 above?

Flags: needinfo?(rwood)

The concept seems okay, but AFAICT sometimes the pages never finish loading, so marionette's navigate method waits 5 minutes and then throws an exception. It doesn't look like there's a reasonably way to adjust the timeout. Henrik, do you know of way to set something like a 10s timeout for loading?

Flags: needinfo?(hskupin)

I would be interested why navigate never returns and finally times out after the default 300s load timeout value. Do you have a TRACE log of Marionette which shows such a load? I would prefer to see that fixed as lowering the timeout value, which could be done via self.marionette.timeout.page_load:

https://searchfox.org/mozilla-central/rev/f1c7ba91fad60bfea184006f3728dd6ac48c8e56/testing/marionette/client/marionette_driver/timeout.py#60

Flags: needinfo?(hskupin) → needinfo?(erahm)
Attached file gecko.log
Flags: needinfo?(erahm)

I was unable to repro if I redirected the gecko log to stdout, attachment 9051409 [details] is the log from running:

./mach awsy-test --preferences testing/awsy/conf/prefs-tp6.json --per-tab-pause 1 --settle-wait-time 1 -vv --entities 6 --iterations 1
Flags: needinfo?(hskupin)

The last page load even Marionette receives is DOMContentLoaded for:

1552686403246 Marionette TRACE [6442450945] Received DOM event DOMContentLoaded for https://docs.google.com/document/d/1US-07msg12slQtI_xchzYxcKlTs6Fp7WqIc6W5GK5M8/edit

But there is no pageshow event. Maybe check with the devtools open while Marionette navigates if such an event comes in. If not there might be a resource which never ends loading for that document.

If you don't want to wait for all resources but only the DOM to be loaded, you could modify the page load strategy to eager. There are some Marionette unit tests which do that.

Flags: needinfo?(hskupin)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #7)

The last page load even Marionette receives is DOMContentLoaded for:

1552686403246 Marionette TRACE [6442450945] Received DOM event DOMContentLoaded for https://docs.google.com/document/d/1US-07msg12slQtI_xchzYxcKlTs6Fp7WqIc6W5GK5M8/edit

But there is no pageshow event. Maybe check with the devtools open while Marionette navigates if such an event comes in. If not there might be a resource which never ends loading for that document.

If you don't want to wait for all resources but only the DOM to be loaded, you could modify the page load strategy to eager. There are some Marionette unit tests which do that.

It sounds like eager is what I want (although it is odd that I can't repro if I write output to stdout, possibly a marionette timing issue). Is there a specific way to specify capabilities to the MarionetteTestRunner? I see that I should do something along the lines of pageLoadStrategy=eager, it's just not clear where. The only place I see it used is Mn tests like this, so should I delete my session and then create a new one (that seems like it would clear whatever other caps we want)?

Flags: needinfo?(hskupin)

Yes, that is exactly what you want to have to do. This capability can only be changed when creating a new session.

Flags: needinfo?(hskupin)

:erahm are you using the alternate replay script with mitmproxy? by default, any requests that are not in the recordings will time out. mitmproxy offers a way to kill these unknown requests, but that can also result in timeouts. For Raptor, we replace this behaviour with returning a 404 for any unexpected requests. One possibility for unfamiliar requests is non-deterministic JavaScript such as requests based on the time/random numbers. In recent recordings we inject JavaScript to make the requests more deterministic, but we haven't yet re-recorded the desktop sites since introducing this script.

mitmproxy alternate replay script: https://searchfox.org/mozilla-central/rev/4763b8d576ce52625d245d1ab6d9404ea025b026/testing/raptor/raptor/playback/alternate-server-replay.py#180

Flags: needinfo?(erahm)
Flags: needinfo?(rwood)

(In reply to Dave Hunt [:davehunt] [he/him] ⌚️UTC from comment #10)

:erahm are you using the alternate replay script with mitmproxy? by default, any requests that are not in the recordings will time out. mitmproxy offers a way to kill these unknown requests, but that can also result in timeouts. For Raptor, we replace this behaviour with returning a 404 for any unexpected requests. One possibility for unfamiliar requests is non-deterministic JavaScript such as requests based on the time/random numbers. In recent recordings we inject JavaScript to make the requests more deterministic, but we haven't yet re-recorded the desktop sites since introducing this script.

mitmproxy alternate replay script: https://searchfox.org/mozilla-central/rev/4763b8d576ce52625d245d1ab6d9404ea025b026/testing/raptor/raptor/playback/alternate-server-replay.py#180

I was using the alternate-server-replay.py script locally, I've updated my WIP patch to include that here.

I'm not totally convinced mitmproxy/mozproxy is working (or I don't fully understand how it works), AFAICT it's just acting as a gateway to the actual live sites rather than replaying a recorded session. It seems like the raptor tests are doing the same thing for me locally.

Flags: needinfo?(erahm)

:whimboo, in my latest try pushes I'm getting an odd marionette error when trying to send a keypress event: this.tabModal is null. Do you have any idea what's going on there?

[task 2019-03-23T00:37:25.496Z] 00:37:25     INFO - loading https://www.ebay.com/
[task 2019-03-23T00:37:25.496Z] 00:37:25     INFO - 
[task 2019-03-23T00:37:27.381Z] 00:37:27     INFO - loaded!
[task 2019-03-23T00:37:37.497Z] 00:37:37  WARNING - Failed to gather test failure debug: TypeError: this.tabModal is null
[task 2019-03-23T00:37:37.499Z] 00:37:37  WARNING - stacktrace:
[task 2019-03-23T00:37:37.500Z] 00:37:37  WARNING - 	get ui@chrome://marionette/content/modal.js:159:5
[task 2019-03-23T00:37:37.502Z] 00:37:37  WARNING - 	GeckoDriver.prototype._handleUserPrompts@chrome://marionette/content/driver.js:3256:23
[task 2019-03-23T00:37:37.502Z] 00:37:37  WARNING - 	GeckoDriver.prototype.getPageSource@chrome://marionette/content/driver.js:1159:14
[task 2019-03-23T00:37:37.502Z] 00:37:37  WARNING - 	despatch@chrome://marionette/content/server.js:289:40
[task 2019-03-23T00:37:37.503Z] 00:37:37  WARNING - 	execute@chrome://marionette/content/server.js:262:16
[task 2019-03-23T00:37:37.503Z] 00:37:37  WARNING - 	onPacket/<@chrome://marionette/content/server.js:235:20
[task 2019-03-23T00:37:37.503Z] 00:37:37  WARNING - 	onPacket@chrome://marionette/content/server.js:236:9
[task 2019-03-23T00:37:37.504Z] 00:37:37  WARNING - 	_onJSONObjectReady/<@chrome://marionette/content/transport.js:492:20
[task 2019-03-23T00:37:37.504Z] 00:37:37  WARNING - 
[task 2019-03-23T00:37:37.519Z] 00:37:37     INFO - TEST-UNEXPECTED-ERROR | awsy/test_memory_usage.py TestMemoryUsage.test_open_tabs | UnknownException: TypeError: this.tabModal is null
[task 2019-03-23T00:37:37.519Z] 00:37:37     INFO - stacktrace:
[task 2019-03-23T00:37:37.519Z] 00:37:37     INFO - 	get ui@chrome://marionette/content/modal.js:159:5
[task 2019-03-23T00:37:37.519Z] 00:37:37     INFO - 	GeckoDriver.prototype._handleUserPrompts@chrome://marionette/content/driver.js:3256:23
[task 2019-03-23T00:37:37.519Z] 00:37:37     INFO - 	GeckoDriver.prototype.findElement@chrome://marionette/content/driver.js:1989:14
[task 2019-03-23T00:37:37.520Z] 00:37:37     INFO - 	despatch@chrome://marionette/content/server.js:289:40
[task 2019-03-23T00:37:37.520Z] 00:37:37     INFO - 	execute@chrome://marionette/content/server.js:262:16
[task 2019-03-23T00:37:37.520Z] 00:37:37     INFO - 	onPacket/<@chrome://marionette/content/server.js:235:20
[task 2019-03-23T00:37:37.520Z] 00:37:37     INFO - 	onPacket@chrome://marionette/content/server.js:236:9
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO - 	_onJSONObjectReady/<@chrome://marionette/content/transport.js:492:20
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO - Traceback (most recent call last):
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_harness/marionette_test/testcases.py", line 159, in run
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO -     testMethod()
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/tests/awsy/awsy/test_memory_usage.py", line 240, in test_open_tabs
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO -     self.open_pages()
[task 2019-03-23T00:37:37.521Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/awsy/awsy_test_case.py", line 398, in open_pages
[task 2019-03-23T00:37:37.522Z] 00:37:37     INFO -     self.signal_user_active()
[task 2019-03-23T00:37:37.522Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/awsy/awsy_test_case.py", line 389, in signal_user_active
[task 2019-03-23T00:37:37.522Z] 00:37:37     INFO -     root = self.marionette.find_element(By.CSS_SELECTOR, ':root')
[task 2019-03-23T00:37:37.522Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 1694, in find_element
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -     body, key="value")
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/decorators.py", line 26, in _
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -     return func(*args, **kwargs)
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 598, in _send_message
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -     self._handle_error(err)
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -   File "/builds/worker/workspace/build/venv/lib/python2.7/site-packages/marionette_driver/marionette.py", line 618, in _handle_error
[task 2019-03-23T00:37:37.523Z] 00:37:37     INFO -     raise errors.lookup(error)(message, stacktrace=stacktrace)
Flags: needinfo?(hskupin)
Attachment #9051153 - Attachment description: Bug 1532491 - WIP: Add a tp6 version of AWSY → Bug 1532491 - Part 1: Add a tp6 version of AWSY

An sy-tp6 variant is added to the AWSY test suite that runs against the tp6 pageset.

Hard to say why it exactly fails. Mind filing it as a new bug under Marionette? If you have a chance it would be good to also have an isolated/minimized testcase for it.

Flags: needinfo?(hskupin)
Depends on: 1538782

Tarek, it looks like mozproxy is logging an error and turning my jobs orange during shutdown of mitmproxy. I wasn't seeing this until updating to the latest m-c today (my code was probably a week or two out of date). I'm not really sure what it's upset about, mitmdump just says "mitmdump: errors occurred during run" at the end of it's log.

Do you have any idea what's going on?

Flags: needinfo?(tarek)

So, looks like the playback script fails in several places with

<< Error in HTTP connection: HttpException('Cannot assemble flow with missing content',)

And we're now bubbling up the exit code of the proxy script correctly, see

https://searchfox.org/mozilla-central/source/testing/mozbase/mozproxy/mozproxy/backends/mitm.py#214

And whimboo fix : https://bugzilla.mozilla.org/show_bug.cgi?id=1538677#c1

I guess we should look at the recorded scenario and see whether we're hitting URLs we're not supposed to hit.

Flags: needinfo?(tarek)

(In reply to Tarek Ziadé (:tarek) from comment #16)

So, looks like the playback script fails in several places with

<< Error in HTTP connection: HttpException('Cannot assemble flow with missing content',)

And we're now bubbling up the exit code of the proxy script correctly, see

https://searchfox.org/mozilla-central/source/testing/mozbase/mozproxy/mozproxy/backends/mitm.py#214

And whimboo fix : https://bugzilla.mozilla.org/show_bug.cgi?id=1538677#c1

I guess we should look at the recorded scenario and see whether we're hitting URLs we're not supposed to hit.

It looks like if I remove --no-upstream-cert I no longer get a failure on shutdown even with that error listed in the log (among others). I'm going to file a blocker to change that log level as I don't think it provides any value turning jobs orange for a non-issue.

Depends on: 1539317

(In reply to Eric Rahm [:erahm] from comment #17)

It looks like if I remove --no-upstream-cert I no longer get a failure on shutdown even with that error listed in the log (among others). I'm going to file a blocker to change that log level as I don't think it provides any value turning jobs orange for a non-issue.

As noted on the other bug this isn't a non-issue but mitmproxy crashes multiple/each time. Maybe that is an old bug which will be fixed once version 4.0.4 is active. For more details see bug 1457274.

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #18)

(In reply to Eric Rahm [:erahm] from comment #17)

It looks like if I remove --no-upstream-cert I no longer get a failure on shutdown even with that error listed in the log (among others). I'm going to file a blocker to change that log level as I don't think it provides any value turning jobs orange for a non-issue.

As noted on the other bug this isn't a non-issue but mitmproxy crashes multiple/each time. Maybe that is an old bug which will be fixed once version 4.0.4 is active. For more details see bug 1457274.

It's a non-issue in the "replay works for me" sense. Upgrading to 4.0.4 just results with an error code of -9. Are you okay accepting my patch if I file a follow up that we should revert it once someone from the testing teams has time to look into it? I'd like to get the AWSY tp6 tests landed so we can start building up a reasonably large set of data prior to attempting to enable tests for fission (targeting May 7).

Flags: needinfo?(hskupin)

(In reply to Eric Rahm [:erahm] from comment #19)

As noted on the other bug this isn't a non-issue but mitmproxy crashes multiple/each time. Maybe that is an old bug which will be fixed once version 4.0.4 is active. For more details see bug 1457274.

It's a non-issue in the "replay works for me" sense. Upgrading to 4.0.4 just results with an error code of -9. Are you okay accepting my patch if I file a follow up that we should revert it once someone from the testing teams has time to look into it? I'd like to get the AWSY tp6 tests landed so we can start building up a reasonably large set of data prior to attempting to enable tests for fission (targeting May 7).

Rob is actually the one here you will have to speak to. Your change largely impacts the stability/reporting of Raptor jobs. Maybe he or Bebe can give an update when the upgrade to 4.0.4 will finally happen.

Flags: needinfo?(rwood)
Flags: needinfo?(fstrugariu)

(In reply to Eric Rahm [:erahm] from comment #17)

It looks like if I remove --no-upstream-cert I no longer get a failure on shutdown even with that error listed in the log (among others). I'm going to file a blocker to change that log level as I don't think it provides any value turning jobs orange for a non-issue.

I believe stopping the upstream cert connection causes mitmproxy to drop HTTP/2 support. See https://discourse.mitmproxy.org/t/possible-to-playback-http-2-with-upstream-cert-sniffing-disabled-upstream-cert-false/1394 or spreak to :acreskey for more details.

As mitmproxy 4.0.4 is still in the experimental area of development. I would advise against adopting it in new testsuites.

:erahm why did you used --no-upstream-cert none of the tests that we run in raptor or other projects use that setting. As noted by :davehunt and also see Bug 1517323 it causes HTTP/2 issues.

Flags: needinfo?(fstrugariu)

(In reply to Florin Strugariu [:Bebe] from comment #22)

As mitmproxy 4.0.4 is still in the experimental area of development. I would advise against adopting it in new testsuites.

:whimboo suggested it as a fix for mitmdump returning an error code (it didn't help, I'm okay not using it).

:erahm why did you used --no-upstream-cert none of the tests that we run in raptor or other projects use that setting. As noted by :davehunt and also see Bug 1517323 it causes HTTP/2 issues.

As this is a memory test, I care more about stable timing and less about perf. The HTTP/2 downgrade doesn't concern me. Generally speaking, no test should be relying on external servers and in the past we've made efforts in AWSY to prevent outbound connections.

Depends on: 1539970

It's possible for the first tab handle to change as well. This removes the special case for when we only add a new tab and just checks all the time.

Attachment #9051153 - Attachment description: Bug 1532491 - Part 1: Add a tp6 version of AWSY → Bug 1532491 - Part 2: Add a tp6 version of AWSY. r=rwood,bc
Attachment #9053019 - Attachment description: Bug 1532491 - Part 2: Enable running AWSY tp6 in automation → Bug 1532491 - Part 3: Enable running AWSY tp6 in automation. r=bc
Flags: needinfo?(rwood)
Pushed by erahm@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d4b81226b56e
Part 1: Always check for a new tab handle. r=bc
https://hg.mozilla.org/integration/autoland/rev/b7a77be421d1
Part 2: Add a tp6 version of AWSY. r=bc,rwood
https://hg.mozilla.org/integration/autoland/rev/63888044e37b
Part 3: Enable running AWSY tp6 in automation. r=bc
Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8cfba39a9c9b
Follow up: Fix f8 warning in awsy_script. CLOSED TREE
Pushed by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c75b53bc557a
Follow up: Fix f8 warning in awsy_script. CLOSED TREE r=noemi_erli
Backout by nerli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5dc1b2326d3a
Backed out changeset c75b53bc557a for unnecessary fix CLOSED TREE
Flags: needinfo?(hskupin)
Fission Milestone: --- → M2
Regressions: 1540886
You need to log in before you can comment on or make changes to this bug.