Closed Bug 1134238 Opened 10 years ago Closed 10 years ago

all talos on e10s fails to run since Feb 10th.


(Testing :: Talos, defect)

Not set



Tracking Status
e10s + ---


(Reporter: jmaher, Assigned: jmaher)




(1 file, 1 obsolete file)

with this changeset we fail all linux/windows talos on e10s:

here is the treeherder view: (note, the jobs are hidden by default)

I assume we can backout a bunch of these patches and figure out the root cause of the failure.
odd, I ran locally on a build from inbound (build today) and it works as expected.  This could be pgo related or something on linux64 which works
Flags: needinfo?(jmathies)
Looking into getting a more precise regression window here.
in trying to debug this, I put some dump statements in the pageloader code and on a build from feb 9th, I can run e10s mode and see my dump statements from inside the contentLoadHandler function:

using a recent build, I don't see my dump statement printed out, although I do see a dump statement printed out indicating I added an eventListener for 'load'.  So the dump statements work, but the load event is failing to either be sent or call the function.
Using mozregression I bisected builds until I got this pushlog:

To reproduce, run talos with e10s enabled against your firefox executable:

talos -a tsvgx --e10s --develop -e /path/to/firefox

If it only loads one page from that suite and then dies, you've got a build with the bug. From the pushlog, it appears that it was changes introduced in bug 1047603 that caused this problem.
Depends on: 1047603
Flags: needinfo?(jmathies)
So my theory here is that the pageloader extension that talos relies on needs to be updated to deal with an assumption that bug 1047603 changed.

Bug 1047603 changes the following assumption:

For newly opened e10s windows, the initial browser is remote.

Bug 1047603 makes this no longer true. The initial browser is only ever made remote once it starts to load a page that it deems should become remote.

Marionette had to deal with this problem as well, and there is some platform code that does require us to make that initial browser remote ASAP to work properly, so there's a function "forceInitialBrowserRemote" that is exposed that perhaps pageloader could use.

I'm not 100% familiar with how pageloader starts up, but I assume it opens a browser window, and starts interacting with it. What I suggest is that the first thing that pageloader does as soon as the browser window opens, is to force the initial tab to become remote by calling that function.

For example, here's what we did for marionette:

jmaher - do you think this solution could be adapted for pageloader?
Flags: needinfo?(jmaher)
Assignee: nobody → jmaher
Flags: needinfo?(jmaher)
Attachment #8566702 - Flags: review?(mconley)
Comment on attachment 8566702 [details] [diff] [review]
force new window to be remote (1.0)

Review of attachment 8566702 [details] [diff] [review]:

::: talos/pageloader/chrome/pageloader.js
@@ +175,5 @@
>        window.resizeTo(10,10);
>        var browserLoadFunc = function (ev) {
>          browserWindow.removeEventListener('load', browserLoadFunc, true);
> +        browserWindow.XULBrowserWindow.forceInitialBrowserRemote();

This should only ever occur if gMultiProcessBrowser is true, so we should wrap that in a conditional. Otherwise, I suspect this will break the non-e10s case.

We should also add some documentation to explain what the hell is going on. Something like:

"For e10s windows, the initial browser is not remote until it attempts to browse to some URI that should be remote. We bypass this restriction by forcing the initial browser to be remote before it attempts to go anywhere."
Attachment #8566702 - Flags: review?(mconley) → review-
updated to check if we are in a remote setting and added a comment.
Attachment #8566702 - Attachment is obsolete: true
Blocks: 1134824
Closed: 10 years ago
Resolution: --- → FIXED
So I've been speaking with avih and jmaher in #perf.

I suspect that this patch broke TART for e10s. TART has a chrome:// page for managing the tests, and I believe it is into the content of this page that the add-on injects a function for dumping out the test results. When run with bug 1134238, it sounds like TART is never calling that function.

I believe this is because Talos is forcing the initial browser to be remote, then TART injects the function into the content of the initial browser, and then the chrome URI loads. The chrome URI is not one that we would normally load in the content process, so the browser is updated to be non-remote, which invalidates anything set on its "content". I suspect that this is why the function is not being called. (Note that this is all speculation on my part, and I haven't actually tested this to see if this is the case).

Reverting the patch seems to allow TART to run and complete, though I do not recommend backing this patch out, as this would break Talos entirely. I think it's prudent to find a  workaround instead for the TART test.

I suggest waiting until the chrome:// page loads before injecting any functions into the content.
Depends on: 1138749
You need to log in before you can comment on or make changes to this bug.