Closed Bug 1224294 Opened 9 years ago Closed 8 years ago

Remove all XUL overlays from mochitest

Categories

(Testing :: Mochitest, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1231784

People

(Reporter: ahal, Assigned: ahal)

References

Details

Attachments

(3 files)

Because of the new addon signing requirement, we will no longer be able to use any XUL overlays in mochitest. Any unsigned addons will only be able to get installed at runtime, which means they must be restartless, which means they can't have any overlays.

Removing the auto-start feature of the overlays is also required to use the new restartless specialpowers addon.
Blocks: 1219442
oh, we have browser-harness.xul, browser-test-overlay.xul, jetpack-addon-overlay.xul, jetpack-package-overlay.xul, and harness.xul, in fact dxr shows where these overlays are used:
https://dxr.mozilla.org/mozilla-central/search?q=path%3Atesting%2Fmochitest+overlay&redirect=true&case=true

Talos has overlays as well- fun times.
Lol, was just about to paste that link, but mid-aired. Yeah, the chrome.manifest is written and installed programmatically in the addChromeToProfile method.. as if this wasn't complicated enough. It eventually gets copied over to the profile here:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#881

The good news is that browser-harness.xul and harness.xul aren't overlays themselves, so we should be able to keep them as is. It's just browser-test-overlay.xul and the jetpack ones that we need to replace.
I don't think they're serving any real useful purpose, TBH. We could probably rip that all out and replace it with Marionette if we were super motivated, but just moving it into the base .xul file should be fine.
I've done some digging, and decided to post my findings here before I completely forget everything again. Here is how mochitest browser-chrome and chrome tests get kicked off. Up to a point, chrome and browser-chrome share the same steps, then they diverge. Bear with me..

Common Steps:
1. A xul overlay called 'browser-test-overlay.xul' is defined for browser.xul as part of a string
2. The 'chrome' string is saved as a chrome.manifest in the mochikit extension in the profile dir
3. 'browser-test-overlay.xul' imports some stuff, most notably 'browser-test.js'
4. 'browser-test.js' adds a MozAfterPaint event listener on import, and when fired, a 'testInit' function gets called.

Browser chrome:
5. 'testInit' opens a new window pointing at 'browser-harness.xul'
6. 'browser-harness.xul' defines an inline script and calls 'TestStart' on load
7. A series of function calls later, Tester.start() is called (which is defined back in browser-test.js)
8. Eventually a function called execTest is called.. we're off to the races

Chrome (back to step 4):
9. 'testInit' defines a function that listens for a "chromeEvent", pulls out a magic url and loads it
10. 'testInit' injects some JS from a string that listens for a "contentEvent" and fires a "chromeEvent"
.. meanwhile back in python ..
11. runtests.py sets the testUrl to 'redirect.html' (see buildTestUrl())
12. testUrl is appended to the firefox binary arguments (so redirect.html gets opened to start)
13. redirect.html adds some listeners, but eventually fires a "chromeEvent" with harness.xul as the url
.. aha, this is the magic url that gets loaded by the listener in step 9! On to harness.xul ..
14. harness.xul loads the SimpleTest scripts, and eventually calls 'hookup()'
15. 'hookup()' is defined in setup.js, and in turn calls 'RunSet.runall()'
16. Which eventually calls TestRunner.runTests(), and we're off to the races

And that's all there is to it, easy as pie :p. This is probably way more information than we'll need to fix this bug, but for the first time ever.. I actually understand how browser-chrome/chrome are getting launched, and I wanted to write that down somewhere.

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#944
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#881
[3] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test-overlay.xul#10
[4] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#41
[5] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#110
[6] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-harness.xul#9
[7] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-harness.xul#255
[8] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#677
[9] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#122
[10] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#125
[11] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#513
[12] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1720
[13] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/redirect.html#10
[14] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/harness.xul#59
[15] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/setup.js#260
[16] https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/setup.js#188
Bug 1224294 - Remove xul overlays from mochitest
The above patch mostly works. It removes browser-test-overlay.xul and browser chrome tests run and pass. But there's a marionette error that happens in between every test:
JavaScript error: chrome://marionette/content/driver.js, line 3215: TypeError: this.browserForTab is null

It doesn't seem to impact the results of the test, but it is very prevalent in the logs and is misleading enough that it probably blocks landing this. This error doesn't happen with 'plain' mochitests, so far just browser-chrome (haven't got chrome working yet).
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
This could be a Marionette bug; Marionette was probably attached to a window that vanished, or something like that. Adding David who might know more.
Here's the failing line for convenience:
https://dxr.mozilla.org/mozilla-central/source/testing/marionette/driver.js#3215

And just to clarify, the tests aren't actually using marionette for anything. The only difference is that browser-chrome is now being started with the '-marionette' flag (and using marionette for to execute a setup script). So it definitely seems like the browser-chrome harness could be breaking an assumption in marionette, or exposing a bug that wasn't previously exposed before.
(In reply to Andrew Halberstadt [:ahal] from comment #8)
> Here's the failing line for convenience:
> https://dxr.mozilla.org/mozilla-central/source/testing/marionette/driver.
> js#3215
> 
> And just to clarify, the tests aren't actually using marionette for
> anything. The only difference is that browser-chrome is now being started
> with the '-marionette' flag (and using marionette for to execute a setup
> script). So it definitely seems like the browser-chrome harness could be
> breaking an assumption in marionette, or exposing a bug that wasn't
> previously exposed before.

Andreas, any idea what's going on here? See comment #6
Flags: needinfo?(ato)
I believe it's because browser-chrome is removing tabs here:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#284

But marionette's internal 'this.tab' value isn't getting updated.
Depends on: 1227252
I think I have a "handle" on the problem. Let's move it to bug 1227252.
Flags: needinfo?(ato)
Comment on attachment 8690881 [details]
MozReview Request: Bug 1224294 - Remove xul overlays from mochitest

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25985/diff/1-2/
(In reply to Andrew Halberstadt [:ahal] from comment #11)
> I think I have a "handle" on the problem. Let's move it to bug 1227252.

Yes, this was just a case of Marionette caching a reference to the tab of the browser, which as was pointed out, disappears.
Comment on attachment 8690881 [details]
MozReview Request: Bug 1224294 - Remove xul overlays from mochitest

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25985/diff/2-3/
There are two test failures in comment 13 that are blocking this from landing:

1. browser/fuel/test/browser_Browser.js (see log [1])

This test passes, but leaks global windows after the fact. This issue actually has nothing to do with the overlay being removed, but is happening simply because marionette is initialized and running. Just adding a marionette.start_session() call to the python harness is enough to reproduce this failure.

2. devtools/client/commandline/test/browser_cmd_commands.js (see log [2])

This seems to be related to the overlay being removed. Or at least it isn't related to marionette being initialized.

I'll keep poking, but I think I'm about as far as I'm going to get on these. I'll file new bugs for each test and try to get someone who might know what's going on to take a look.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=14115279&repo=try
[2] https://treeherder.mozilla.org/logviewer.html#?job_id=14115272&repo=try
Depends on: 1229465
Depends on: 1220011
A fix for the devtools test just landed on fx-team. And I have permission to disable the fuel test as fuel is about to be removed from the tree altogether anyway. Here's a full (and hopefully last) try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bfcd0544c4c
Yikes, that didn't go so well. I guess my try runs were a little to restrictive before, and I have some more work to do..
Blocks: 1231784
Blocks: 1233200
There are two problems here:

1) Android mochitest-chrome still needs the overlay because there is no marionette support for android.
2) The browser-chrome timeouts and leaks on linux desktop (the windows/macosx failures look like unrelated infra bustage)

The first problem will need to be ignored for now. Apparently android won't require signatures until something like 47, so we have some time to figure it out. We'll likely need to at least implement 'execute_script' marionette support for android. In the meantime, I'll keep the overlays around for android.. but won't use them for desktop/b2g.

The second problem seems to happen simply by having a marionette session enabled during a mochitest-browser-chrome run. I need to do more digging here.
could we launch the browser with marionette to install our addons, then close it down and restart it the current way, this time without marionette involved?
No, SpecialPowers will automatically be uninstalled as soon as the browser shuts down. It's why we're allowed to install it even though it isn't signed.

But you raise an interesting idea.. maybe we can use marionette to bootstrap, and then kill the session and it would be enough. It's worth a shot. Though I'd see this as a last resort. If there is an actual problem, it would be nice to fix it.
Been awhile since the last update..

Deleting the marionette session fixed the bad interaction between browser-test.js and the marionette session. It'll still need to be fixed eventually, but it's not a blocker anymore. Unfortunately, there are still a whole ton of problems with the overlay patch. I fixed a bunch of debug memory leaks recently, but all the e10s browser-chrome tests have timeouts, and there a smattering of non-e10s test specific failures as well. Here's the latest state on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e33d752f81dd&group_state=expanded
I have a better idea of what is happening, but still no idea how to fix it. The e10s timeouts arise because the `content` global variable is null in test scope. At the start, `content` is an [object Window], but it becomes null as soon as browser-test.js replaces the current tab with a fresh one here:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#313

Interestingly, without my patch applied (Log A), content is changed from an [object Window] to an [object Object]. Whereas with my patch (Log B) it simply becomes null. My first thought was that the dummy object in Log A was an automatic CPOW, but if that were the case, I believe it would have way more than 4 properties and would say 'CPOW' somewhere in the toString. I decided to try manually copying the `content` variable to chrome scope via framescript and message manager anyway, but that resulted in a lot of errors and eventually "intentional crash due to CPOW racing with synchronous message".

TL;DR - the fact that browser-test.js is now loaded via mozIJSSubScriptLoader as opposed to included via overlay, is somehow affecting the content variable when a new tab is created. My best lead is to figure out what that dummy object from Log A is, and where it's getting generated.
In case it wasn't clear, the attached logs are simply putting a `console.log(content);` before and after the if block here:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#313
Talked to mconley on irc. The dummy content window object is coming from here:
https://dxr.mozilla.org/mozilla-central/source/toolkit/components/addoncompat/RemoteAddonsParent.jsm#801

It's likely being generated because we call gBrowser.stop() right after opening the new tab, so the document never finished loading and a proper CPOW is never created. I also learned that the automatic CPOW shims are only enabled for addon compartments. The fact that I'm using marionette.execute_script, likely means automatic e10s shims are not enabled anymore.
Assuming my understanding of the situation is correct (a big assumption), I can think of four options forward:

1) Re-write the addon shim (called an interposition service) to work with non-addon code, and provide an option to turn it on for marionette executeScript compartments. According to mconley, this will be hard and borderline WONTFIX. They want to remove the automatic shims eventually anyway, so lets move on.

2) Write a new ghetto "shim" that uses bits and pieces of the addon shim code where possible. This is a hacky version of option 1, basically try and fake it until we make it. I successfully faked the content variable with the createDummyContentWindow function linked from comment 27, but promptly ran into another error immediately after. This could be very do-able, or it could be very painful. If I had to guess, I'd say the latter is more likely.

3) Make everything e10s aware. This is the ideal case, where all the executed code uses message managers properly. Unfortunatley, "all the executed code" means every browser-chrome and devtools test. This could take awhile and is almost certainly no do-able in the timeline that we want addons-signing enabled in.

4) Drop the marionette bootstrap idea, and instead make mochitest restartless and install it on the fly via loadTemporaryAddon just like specialpowers is. This way, all test execution would happen inside the addon compartment and automatic CPOW generation would be enabled again. This has the side benefit of making it unnecessary to sign the mochitest extension itself, which makes it much easier to work on it.

#4 is by far the most attractive option imo. I'll see if I can get it converted without too much effort.
Good news, I got mochitest converted to restartless and am successfully running plain, chrome and browser-chrome while installing specialpowers and the mochikit extension at runtime. Still yet to see try results, so not keeping my hopes too high.

The work and patches in this bug are largely obsolete (though my new patch incorporates pieces of them). The original intention was to try to do this work incrementally (e.g, remove overlays first, then convert to restartless, then install at runtime), but that has become way more trouble than it's worth.

I'm going to continue work in bug 1231784, which should include removing overlays. So marking this as a dupe.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: