Closed Bug 1123506 Opened 9 years ago Closed 8 years ago

Evaluate scripts in content with lasting side-effects

Categories

(Remote Protocol :: Marionette, defect, P1)

defect

Tracking

(firefox48 fixed, firefox49 fixed)

RESOLVED FIXED
mozilla49
Tracking Status
firefox48 --- fixed
firefox49 --- fixed

People

(Reporter: automatedtester, Assigned: ato)

References

(Blocks 1 open bug)

Details

(Keywords: pi-marionette-goal, pi-marionette-server, pi-marionette-spec, Whiteboard: [marionette=1.0])

Attachments

(2 files, 3 obsolete files)

for us to have webdriver compat we need to be able to call JS that is on the page from executeScript
/r/4963 - Bug 1123506  - Fix wpt log fetching script for treeherder changes.

Pull down this commit:

hg pull review -r e69b4a557b05a232f91926ef41072683e2fbfe4d
Comment on attachment 8573861 [details]
MozReview Request: bz://1123506/jgraham

/r/4963 - Bug 1123506  - Fix wpt log fetching script for treeherder changes.

Pull down this commit:

hg pull review -r e69b4a557b05a232f91926ef41072683e2fbfe4d
Attachment #8573861 - Flags: review?(emorley)
Comment on attachment 8573861 [details]
MozReview Request: bz://1123506/jgraham

https://reviewboard.mozilla.org/r/4961/#review3929

r=me since looks roughly right, but I didn't check against the API (it will be pretty obvious if it isn't working hehe).
Much of this implementation can be simplified once bug 1113873 lands (soon).
Attachment #8573861 - Flags: review?(emorley) → review+
Needless to say I copied the wrong bug id into the commit message. That patch was for bug 1140143
Attachment #8573861 - Attachment is obsolete: true
Attachment #8619178 - Flags: review+
Hey, Is there someone we should chat with to prioritize this fix?
Flags: needinfo?(james)
Why, yes there is

/me punts upward.
Flags: needinfo?(james) → needinfo?(dburns)
Andreas,

Please can you look into this. I suspect the issue is related to how we setup the sandbox in the content scope which is why we are looking through the wrappedJSObject in our code but could be very wrong.
Flags: needinfo?(dburns) → needinfo?(ato)
By disabling Xray vision for the content principal’s sandbox we can have access to properties and globals defined in the document.  The sandbox by default enforces a security policy that “looks through” any modifications the user has made to the global state.  Disabling this would for example allow Execute Script to return a property or call a function that is predefined.

However when it is disabled, the window equality test we use to determine if we have already created a sandbox for the current window, will fail.  This is because the window objects will have diverged and are no longer equal.  This would effectively break the contract we with code that wishes to re-use the old sandbox (newSandbox is false).

This again would mean we would be able to _store_ new global state in the sandbox, but the next time you call to retrieve that state the sandbox would be discarded because it couldn’t find its own sandbox any longer.

A digression to this discussion is that even if we fixed these problems, setting global state would only have effect within the boundaries of the sandbox.  They would not take effect on the document proper.  The current behaviour will clone the window (sandbox it) and apply a security policy, before evaluating an arbitrary string on that runtime, which is not the runtime of the actual document.

So to fix the problem described in this bug we need to:

  1. Evaluate the string inside a sandbox with Xray vision disabled
  2. Fix lookup of sandboxes by finding some alternate way to link them to the current document

On a related note WebDriver expects to be able to modify the global state with the consequences taking effect.  We can do this by evaluating the script in browser.contentWindow.eval(…) instead, which would also fix this bug, but would fundamentally break with the Marionette approach of executing everything in sandboxes.
Flags: needinfo?(ato)
So what stops us from implementing the eval("") model for webdriver in content scope? None of this sandboxes stuff has any analouge in the standard, and seems to be fundamentally incompatible with it.
(In reply to James Graham [:jgraham] from comment #12)
> So what stops us from implementing the eval("") model for webdriver in
> content scope? None of this sandboxes stuff has any analouge in the
> standard, and seems to be fundamentally incompatible with it.

Backwards compatibility and complexity mostly.  We can get around concerns about code that expects to be run in sandboxes by introducing the non-sandboxed eval as a separate command and expose this to WebDriver conforming local ends.

Because the script execution infrastructure is so tightly coupled I’m working on making emulator callbacks less intrusive.  Once this is done it should be easier to refactor.
Assignee: nobody → ato
Attachment #8619178 - Attachment is obsolete: true
Summary: Can not use execute_script to call functions defined on the document → Evaluate scripts in content with lasting side-effects
Blocks: 1199002
Status: NEW → ASSIGNED
This bug is blocked by bug 1211489 because it contains a refactor of the emulator callbacks needed to expose `runEmulatorCmd` and `runEmulatorShell` in the sandbox.
Depends on: 1211489
All Marionette unit tests are passing, but we appear to have gaps in our test coverage since all Gij jobs and certain WPT jobs are failing consistently.

I believe the Gij failures are due to scripts being run with the wrong sandbox.  The way this patch (for the time being) determines if something evaluated should have lasting side-effects or be exposed to the slightly non-standardised setup that Gecko tests depend on, is by looking at the sandboxName parameter.  If this is set, the script is evaluated inside a special sandbox that is non-conforming to the WebDriver specification.

As for the failing WPT tests, they appear to be CSP related failures.  I’m not quite sure why this is yet.  It _may_ have something to do with the way we instrument Cu.evalInSandbox.
(In reply to Andreas Tolfsen (:ato) from comment #17)
> If this is set, the script is evaluated inside a special sandbox that is
> non-conforming to the WebDriver specification.

And just to be clear on what I mean here: I suspect we can fix this rather easily by changing the Marionette Node.js client to send the sandboxName parameter also.

Since it isn’t used outside of Mozilla by anyone expecting WebDriver compatibility, I think this would be a reasonable thing to do.  One, it would mean the script execution environments for both Python- and Node.js are the same.  Two, it would save me the trouble of introducing a separate command end-point for scripts that are meant to have lasting side effects, as I hinted would be an option earlier in this bug thread.
Depends on: 1234063
To achieve WebDriver conformance, Marionette needs the ability to evaluate
scripts in content space with lasting side-effects.  This means that
global state at the time of evaluation should reflect the current state
of the active browsing context, and that modifications to globals should
affect the behaviour and state of the browsing context and live on after
the script evaluation sandbox is nuked.

With this significant refactor, the script evaluation code in Marionette
is shared between chrome- and content space.  This reduces the number
of unique script evaluation implementations in Marionette from six to one.

Script execution is centralised in the new testing/marionette/evaluate.js
module.  evaluate.sandbox provides the main entry-point for execution,
and takes a sandbox and a script at the bare minimum.  Arguments and other
options can also be provided, which in sum provides the same functionality
as before.  It is not expected that Mozilla clients should have to change
as a consequence of this change.

Previously a range of interfaces were always available in the script's
scope.  This is no longer the case as interfaces are now provided on
a need-only basis.  This means, for example, that Simple Test harness
functionality is only available if you have a sandbox augmented with
the Simple Test harness.

Review commit: https://reviewboard.mozilla.org/r/30233/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/30233/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/1-2/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/2-3/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/3-4/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/4-5/
Attachment #8706039 - Flags: review?(jgriffin)
Attachment #8706039 - Flags: review?(dburns)
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=89ebd5224410

The try run for this patch is mostly fine, but is showing a single problem in WPT tests that I think is a Python issue rather than a test issue.  I will investigate this closer, but it shouldn’t be caused by the changes in this patch as much as the change triggering some undocumented behaviour in Python’s socket library.
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/5-6/
The above linked try run contained a change that made test_log.py fail for silly reasons.  I’ve pushed another one: https://treeherder.mozilla.org/#/jobs?repo=try&revision=86fccd0c1f8d
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/6-7/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

https://reviewboard.mozilla.org/r/30233/#review27857

Mostly nits in there. The overall logic looks sound to me.

::: testing/marionette/dispatcher.js:32
(Diff revision 7)
> - *     A factory function that takes an Emulator as argument and produces
> + *     A factory function that takes an EmulatorService and and produces

`and and`

::: testing/marionette/driver.js:725
(Diff revision 7)
> +      cmd.parameters.level || undefined);

if `cmd.parameters.level` isnt there it will return `undefined`. Do we need the `||`? Are there situations where it might be `null`?

::: testing/marionette/driver.js:810
(Diff revision 7)
> +    newSandbox: !!(typeof cmd.parameters.newSandbox == "undefined") ||

`===`

::: testing/marionette/driver.js:883
(Diff revision 7)
> +    newSandbox: !!(typeof cmd.parameters.newSandbox == "undefined") ||

`===`

::: testing/marionette/proxy.js:202
(Diff revision 7)
> +        if (msg.json.id != uuid) {

`!==`

::: testing/marionette/simpletest.js:58
(Diff revision 7)
> +        typeof passString == "undefined" ? this.TEST_PASS : passString,

`===`

::: testing/marionette/simpletest.js:140
(Diff revision 7)
> -  logToFile: function Marionette__logToFile(file) {
> +  logToFile(file) {

is this method used anywhere?

::: testing/marionette/simpletest.js:153
(Diff revision 7)
> -      if (typeof(o) == "undefined") {
> +    if (typeof o == "undefined") {

Could this be 

`if (o === undefinied) {`

?

::: testing/marionette/simpletest.js:160
(Diff revision 7)
> -          if (typeof(o.__repr__) == 'function') {
> +        if (typeof(o.__repr__) == "function") {

`===`

::: testing/marionette/simpletest.js:162
(Diff revision 7)
> -          } else if (typeof(o.repr) == 'function' && o.repr != arguments.callee) {
> +        } else if (typeof(o.repr) == "function" && o.repr != arguments.callee) {

`===`

::: testing/marionette/simpletest.js:169
(Diff revision 7)
> -                  o.toString == Function.prototype.toString ||
> +          (o.toString == Function.prototype.toString || o.toString == Object.prototype.toString)) {

`===`

::: testing/marionette/simpletest.js:199
(Diff revision 7)
> +        new Date(now.valueOf() + (typeof(timeout) == "undefined" ? this.timeout : timeout));

`===`
Attachment #8706039 - Flags: review?(dburns) → review+
If the thing that you think is bad socket behaviour is the "SSLError: [Errno 1] _ssl.c:1242: error:1409F07F:SSL routines:SSL3_WRITE_PENDING:bad write retry", then that also happened before this patch. Although that doesn't mean it's not a bug of course.
Attachment #8706039 - Flags: review?(jgriffin)
Attachment #8706039 - Attachment description: MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r?AutomatedTester r?jgriffin → MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r?automatedtester
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/7-8/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/8-9/
https://reviewboard.mozilla.org/r/30233/#review27857

> if `cmd.parameters.level` isnt there it will return `undefined`. Do we need the `||`? Are there situations where it might be `null`?

This might look a bit odd, but if `cmd.parameters.level` is `null`, we need to explicitly say “this argument isn’t defined” so that `logging.ContentLogger#log(message, level = "INFO")` doesn’t think the level is `null`.  Only `undefined` uses the default.

Do you have any other ideas for how to coerce `null` into `undefined`s?  I’m not actually sure we _want_ to do that.  I think the long-term solution here is to change the Python client so that it does not send the `level` parameter at all if it just wants the default.

I’ll add a comment to clarify and resolve this issue.

> `===`

Not necessary when using `typeof`.

> `!==`

With the `AsyncMessageChannel` patch this is no longer there.

> is this method used anywhere?

The answer is “probably not”, but I don’t think it’s this patch’s job to remove it.

> Could this be 
> 
> `if (o === undefinied) {`
> 
> ?

Could probably, but it’s safer to rely on `typeof` because it always produces reliable results, whereas JS type comparisons does not.
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/9-10/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/10-11/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/11-12/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/12-13/
No longer blocks: 1241986
Any updates on this?
(In reply to Vlad Filippov :vladikoff from comment #39)
> Any updates on this?

There are CSP test failures in WPT: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39103002e6b9&group_state=expanded
If this is still an issue can you n-i? ckerschb and cc kjozwiak for them have a look at those tests.
Flags: needinfo?(ato)
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/13-14/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/14-15/
ckerschb: I’m reaching out to you in hope that you can shed some light on the sanity of some CSP tests in WPT.

Are you able to tell if the following tests make sense?

https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/http-csp/cross-origin-http/picture-tag/top-level/no-redirect/opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/http-csp/same-host-http/picture-tag/top-level/no-redirect/opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/meta-csp/same-host-http/picture-tag/top-level/no-redirect/opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/no-opt-in/cross-origin-http/picture-tag/top-level/no-redirect/no-opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/no-opt-in/same-host-http/picture-tag/top-level/no-redirect/no-opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/meta-csp/cross-origin-http/picture-tag/top-level/no-redirect/opt-in-blocks.https.html
https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/mixed-content/blockable/meta-csp/same-host-http/picture-tag/top-level/no-redirect/opt-in-blocks.https.html

This patch makes changes to the JS evaluation runtime the WPT harness uses to run the above tests, causing them to fail in unexpected ways on try:

    https://treeherder.mozilla.org/#/jobs?repo=try&revision=2daa75a6267c&group_state=expanded

Note that a few of the /mixed-content/blockable tests in WPT have expected timeouts and in particular failures on Firefox debug builds.  This doesn’t build my confidence when attempting to gauge if the tests that fail on my try run are correct.

Hopefully you have a more trained eye for CSP, or know someone else I could ask.
Flags: needinfo?(ato) → needinfo?(mozilla)
I will add some of my findings so far in attempting to reproduce
the test failures:

  (1) All the five failing tests in W(3), and probably W-e10s(3),
      on Linux x64 opt passes both locally and on a TC loaner.

  (2) The two unexpected fails in W(5) and W-e10s(5) on Linux 64
      debug are not reproducible on the TC loaner, but appear in
      the Treeherder report.

This leads me to draw a few different conclusions:

  (a) The failures could be Buildbot specific.

  (b) There could be test runtime environment pollution, where
      state from one test bleeds over into others, causing
      subsequent tests to fail.

  (c) Timing issue.  Could the changes have made the tests slower,
      causing them to fail in unexpected ways?  Other race
      conditions?
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/15-16/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/16-17/
(In reply to Andreas Tolfsen ‹:ato› from comment #44)
> ckerschb: I’m reaching out to you in hope that you can shed some light on
> the sanity of some CSP tests in WPT.
> 
> Are you able to tell if the following tests make sense?
> 
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/http-csp/cross-origin-http/picture-tag/top-level/no-
> redirect/opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/http-csp/same-host-http/picture-tag/top-level/no-
> redirect/opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/meta-csp/same-host-http/picture-tag/top-level/no-
> redirect/opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/no-opt-in/cross-origin-http/picture-tag/top-level/no-
> redirect/no-opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/no-opt-in/same-host-http/picture-tag/top-level/no-
> redirect/no-opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/meta-csp/cross-origin-http/picture-tag/top-level/no-
> redirect/opt-in-blocks.https.html
> https://github.com/mozilla/gecko-dev/blob/master/testing/web-platform/tests/
> mixed-content/blockable/meta-csp/same-host-http/picture-tag/top-level/no-
> redirect/opt-in-blocks.https.html
> 
> This patch makes changes to the JS evaluation runtime the WPT harness uses
> to run the above tests, causing them to fail in unexpected ways on try:
> 
>    
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=2daa75a6267c&group_state=expanded
> 
> Note that a few of the /mixed-content/blockable tests in WPT have expected
> timeouts and in particular failures on Firefox debug builds.  This doesn’t
> build my confidence when attempting to gauge if the tests that fail on my
> try run are correct.
> 
> Hopefully you have a more trained eye for CSP, or know someone else I could
> ask.

I only had a quick look, but those tests are mixed-content-tests that use the CSP directive 'block-all-mixed-content'. I haven't looked, but potentially I have to update those WPT tests before landing [1]. I'll run my patches through TRY now and see if I actually break those tests. Potentially [1] is going to block this bug (but it's already in review, so shouldn't take that long anymore).

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1122236
Flags: needinfo?(mozilla)
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/17-18/
Depends on: 1122236
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/18-19/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/19-20/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/20-21/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/21-22/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/22-23/
We assume the CSP tests were incorrectly passing before and that they
are now failing as expected.

Review commit: https://reviewboard.mozilla.org/r/45473/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/45473/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/23-24/
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/1-2/
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/2-3/
Attachment #8739978 - Attachment description: MozReview Request: Bug 1123506 - Disable certain CSP tests → MozReview Request: Bug 1123506 - Update expectations for certain CSP tests
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/3-4/
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/4-5/
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/24-25/
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/5-6/
Comment on attachment 8739978 [details]
MozReview Request: Bug 1123506 - Update expectations for certain CSP tests

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/45473/diff/6-7/
Andrew, 

We have a navigation-timing failure in WPT and were wondering who would be the best person to ask for guidance here.
Flags: needinfo?(overholt)
bz can perhaps help
Flags: needinfo?(overholt) → needinfo?(bzbarsky)
After carefully changing the expected outcomes of the failing CSP tests, I discovered there’s a further failure in the WPT test testing/web-platform/tests/navigation-timing/test_document_open.html.

If I disable that test, another test in the same run starts to fail instead (testing/web-platform/tests/navigation-timing/test_navigate_within_document.html).  I suspect state bleeds over from some previous test to cause this, as both test pass fine locally; both individually and when running the whole directory.

However, because I know next to nothing about neither the tests or `window.performance`, it’s quite hard for me to tell if the tests are sane.

The latest try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d6c4ca16e03f10800e7546f607c6a7621e1d341f
I know something about window.performance, and testing/web-platform/tests/navigation-timing/test_document_open.html looks reasonable to me.  In what way does it fail for you when it fails, exactly?

Also, I don't understand why the changes in this bug affect WPT at all.  Which aspects of the WPT harness does this bug touch, and in what way does it change them?
Flags: needinfo?(bzbarsky)
The intersection is that the wpt harness uses marionette to control the harness, which it does by injecting script, and this patch alters the environment in which marionette-injected scripts run. For most tests this seems to be fine, but for a small number, the test result is changed.
OK, but I'm looking for specifics.  What's different about the environment?  Is it just that it used to run against Xrays but now is running against Xray waivers?  Which scripts are run in this changed environment, and against which global?  For the specific case of testing/web-platform/tests/navigation-timing/test_document_open.html I would expect we're not talking about any of the scripts run via <script> tags in that file, obviously, but are there any scripts at all running against that window in the changed environment?  Or are we only talking about scripts run against the harness global itself (in the other tab)?
Fair enough, sorry.

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/harness/wptrunner/executors/testharness_marionette.js is the script that's run from the wpt side to load the test and collect the results (modulo some constants that are substituted in at runtime). AIUI the changes in this patch are expected to be opt-in. ato will have to explain in more detail.
Flags: needinfo?(ato)
wptrunner injects https://github.com/w3c/wptrunner/blob/master/wptrunner/executors/testharness_marionette.js which uses window.open to run the test file in a new child window.  The testharness.js executes each test iteratively and posts results back to the parent window.  When the test results have come in, it calls the marionetteScriptFinished callback with the test results.

The test itself is as far as I can tell run against the regular content global environment, i.e. outside of a sandbox.  The script that is injected is run in a sandbox environment that _should_ be the same environment as before this patch (https://hg.mozilla.org/try/rev/02e9675c2ca9702d44417e9718d4a1a8b0659648#l7.212), as the “lasting side-effects” feature this patch implements is gated behind a Marionette command parameter in order to be activated.

Interestingly the test failure is a TIMEOUT, which means the marionetteScriptFinished callback is never reached, i.e. the test never finishes.  This happens consistently on try (both on Buildbot and Taskcluster), but I’ve so far been unable to reproduce it by running the test file alone, or by running the whole directory.
Flags: needinfo?(ato)
The current code to create a sandbox is https://github.com/mozilla/gecko-dev/blob/master/testing/marionette/listener.js#L506, but of course I’m not sure at this point if that’s to blame.  Any advice you can give for debugging this would be most appreciated!

Other possible theories include the possibility that this patch slows down or makes Marionette faster, so that the usual timing expectations are off, but it doesn’t answer why it doesn’t happen when you run the test locally.  Another is some form of environment state bleeding over from previous tests.  Are we letting sandboxes stay in scope somehow and not nuking them?  Can sandboxes somehow affect the test?
Here is a recent try run with all the WPT tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=112ed933f074dac3573f1b2145d1d776b6bb5600

And another with just the testing/web-platform/tests/navigation-timing directory:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=44699d1978bd35db7fc06b964212531c79a3dd2f

At least this matches what I see when the tests are run locally: They pass when run individually or even on a by-directory granularity, but fail in the context of a try batch.
Some progress, finally!  If you compare the try runs https://treeherder.mozilla.org/#/jobs?repo=try&revision=44699d1978bd35db7fc06b964212531c79a3dd2f and https://treeherder.mozilla.org/#/jobs?repo=try&revision=6157729a9dccc278782f5fc3ea07c28778be88f7, you will find that the preceding CSP tests makes test_document_open.html time out.  Somehow one or more CSP tests put the browser into a state causing a navigation timing test to have an unexpected timeout.

The next step is, I think, to see if any of the updated expectations for the CSP tests (https://hg.mozilla.org/try/rev/37625e16f5e61e999e257f46984d67a524d3592a) is the root cause of this.
It does not appear that the tests that we changed expectation statuses for are the culprits causing test_document_open.html to time out: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2bee927086d01f5455fc5f0a56b4df37474f1e08
I'm still trying to figure out what the deal is with test_document_open here, but I think I've found a race in the mixed-content tests that https://reviewboard.mozilla.org/r/45473/diff/7#index_header is changing.  I'll try to address that in bug 1184351.  So it is in fact possible that something got changed about timing by these patches.

Back to test_document_open... Can we instrument it with some logging to see which things happen and which ones don't before the test times out?  Specifically, is the onload_test function called?  If so, is the testTimingWithDocumentOpen function called?  And if so, is the anonymous function that makes that done() call in there called?
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/25-26/
Attachment #8739978 - Attachment is obsolete: true
Andreas, it's possible that the issue described in bug 1270181 is what was causing your timeouts in test_document_open.html.  Looking at the log where that test times out, we have:

08:55:39     INFO - TEST-START | /navigation-timing/test_document_open.html
08:55:49     INFO - PROCESS | 1861 | MARIONETTE LOG: INFO: Timeout fired
08:55:49     INFO - TEST-PASS | /navigation-timing/test_document_open.html | window.performance is defined 
08:55:49     INFO - TEST-UNEXPECTED-TIMEOUT | /navigation-timing/test_document_open.html | expected OK

which is a timeout in 0.2 seconds, which is clearly daft.  But about 10 seconds before that (the wpt test timeout in opt builds), there are mixed-content form-tag tests running, which are precisely the ones that trigger the buggy timeout thing I mention in bug 1270181.

So I think the test_document_open bits should be fixed now, along with the other issues you were seeing in the mixed-content tests.
__marionetteParams was renamed to __webDriverArguments.

Review commit: https://reviewboard.mozilla.org/r/51345/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51345/
Attachment #8706039 - Attachment description: MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r?automatedtester → MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester
Attachment #8750283 - Flags: review?(dburns)
Comment on attachment 8706039 [details]
MozReview Request: Bug 1123506 - Evaluate scripts in content with lasting side-effects; r=automatedtester

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/30233/diff/26-27/
Comment on attachment 8750283 [details]
MozReview Request: Bug 1123506 - Update references to __webDriverArguments; r?automatedtester

https://reviewboard.mozilla.org/r/51345/#review48087
Attachment #8750283 - Flags: review?(dburns) → review+
(In reply to Boris Zbarsky [:bz] from comment #79)
> Andreas, it's possible that the issue described in bug 1270181 is what was
> causing your timeouts in test_document_open.html.

The try run I did, without modifying CSP test expected outcomes, confirmed that this was the cause.

Thank you so much for looking into this!  It has been of great help.
Flags: needinfo?(ato)
The landing of this patch on inbound broke a couple of our firefox-ui tests in non-e10s mode:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=Firefox%20UI&filter-tier=2&filter-tier=3&fromchange=0c26faffd7e6ad64df8b4fba8b90cee0133be268&selectedJob=27509837

I assume we should get this fixed, or not? Otherwise we will have to start marking tests as skipped in non-e10s mode.
Flags: needinfo?(ato)
Andreas is not the right person to know if we need to support non-e10s mode
Flags: needinfo?(ato)
issue is 


16:34:02     INFO -  1462811642727	Marionette	TRACE	conn171 <- [1,324,null,{}]
16:34:02     INFO -  1462811642741	Marionette	TRACE	conn171 -> [0,325,"executeScript",{"scriptTimeout":null,"newSandbox":true,"args":["app.support.baseURL","https://support.mozilla.org/1/firefox/%VERSION%/%OS%/%LOCALE%/"],"filename":"prefs.py","script":"\n              Components.utils.import(\"resource://gre/modules/Services.jsm\");\n              let prefBranch = Services.prefs;\n\n              let pref_name = arguments[0];\n              let value = arguments[1];\n\n              let type = prefBranch.getPrefType(pref_name);\n\n              // If the pref does not exist yet, get the type from the value\n              if (type == prefBranch.PREF_INVALID) {\n                switch (typeof value) {\n                  case \"boolean\":\n                    type = prefBranch.PREF_BOOL;\n                    break;\n                  case \"number\":\n                    type = prefBranch.PREF_INT;\n                    break;\n                  case \"string\":\n                    type = prefBranch.PREF_STRING;\n                    break;\n                  default:\n                    type = prefBranch.PREF_INVALID;\n                }\n              }\n\n              switch (type) {\n                case prefBranch.PREF_BOOL:\n                  prefBranch.setBoolPref(pref_name, value);\n                  break;\n                case prefBranch.PREF_STRING:\n                  prefBranch.setCharPref(pref_name, value);\n                  break;\n                case prefBranch.PREF_INT:\n                  prefBranch.setIntPref(pref_name, value);\n                  break;\n                default:\n                  return false;\n              }\n\n              return true;\n            ","sandbox":"default","line":202}]
16:34:02     INFO -  1462811642766	Marionette	TRACE	conn171 <- [1,325,{"error":"webdriver error","message":null,"stacktrace":null},null]
16:34:02     INFO -  Marionette threw an error: [Exception... "Component not initialized"  nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)"  location: "JS frame :: chrome://marionette/content/evaluate.js :: evaluate.sandbox/promise< :: line 136"  data: no]
16:34:02     INFO -  evaluate.sandbox/promise<@chrome://marionette/content/evaluate.js:136:17
16:34:02     INFO -  evaluate.sandbox@chrome://marionette/content/evaluate.js:103:17
16:34:02     INFO -  GeckoDriver.prototype.execute_@chrome://marionette/content/driver.js:883:29
16:34:02     INFO -  GeckoDriver.prototype.executeScript@chrome://marionette/content/driver.js:783:27
16:34:02     INFO -  TaskImpl_run@resource://gre/modules/Task.jsm:319:40
16:34:02     INFO -  TaskImpl@resource://gre/modules/Task.jsm:280:3
16:34:02     INFO -  createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:254:14
16:34:02     INFO -  Task_spawn@resource://gre/modules/Task.jsm:168:12
16:34:02     INFO -  TaskImpl_handleResultValue@resource://gre/modules/Task.jsm:388:16
16:34:02     INFO -  TaskImpl_run@resource://gre/modules/Task.jsm:327:13
16:34:02     INFO -  TaskImpl@resource://gre/modules/Task.jsm:280:3
16:34:02     INFO -  createAsyncFunction/asyncFunction@resource://gre/modules/Task.jsm:254:14
16:34:02     INFO -  Task_spawn@resource://gre/modules/Task.jsm:168:12
16:34:02     INFO -  Dispatcher.prototype.execute@chrome://marionette/content/dispatcher.js:120:13
16:34:02     INFO -  Dispatcher.prototype.onPacket@chrome://marionette/content/dispatcher.js:91:5
16:34:02     INFO -  DebuggerTransport.prototype._onJSONObjectReady/<@chrome://marionette/content/server.js -> resource://devtools/shared/transport/transport.js:479:9
16:34:02     INFO -  exports.makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:101:14
16:34:02     INFO -  exports.makeInfallible/<@resource://gre/modules/commonjs/toolkit/loader.js -> resource://devtools/shared/ThreadSafeDevToolsUtils.js:101:14
16:34:02     INFO -  1462811642842	Marionette	TRACE	conn171 -> [0,326,"setContext",{"value":"chrome"}]
Ryan,

What is the policy for non-e10s bustage these days?
Flags: needinfo?(ryanvm)
David, this is also happening in e10s mode. So it affects both modes, and need to be definitely fixed.
non-e10s is something we're going to be supporting for awhile still (we're a long ways out still from shipping e10s enabled across the board for all users). It shouldn't be a factor in any decision about whether to fix a test issue or not.
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/7138a254ad70
https://hg.mozilla.org/mozilla-central/rev/3e58e4842984
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
(In reply to Henrik Skupin (:whimboo) from comment #87)
> The landing of this patch on inbound broke a couple of our firefox-ui tests
> in non-e10s mode:
> 
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-
> searchStr=Firefox%20UI&filter-tier=2&filter-
> tier=3&fromchange=0c26faffd7e6ad64df8b4fba8b90cee0133be268&selectedJob=275098
> 37
> 
> I assume we should get this fixed, or not? Otherwise we will have to start
> marking tests as skipped in non-e10s mode.

I have a suspicion that this comes from trying to call setTimeout on the window object in chrome.  Can you file a separate follow-up bug on this?
FYI the unit test as written for this patch in test_execute_script.py is not getting executed on desktop due to bug 1272109. If it's enabled it fails!
Product: Testing → Remote Protocol
You need to log in before you can comment on or make changes to this bug.