Closed
Bug 1134238
Opened 9 years ago
Closed 9 years ago
all talos on e10s fails to run since Feb 10th.
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(e10s+)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
e10s | + | --- |
People
(Reporter: jmaher, Assigned: jmaher)
References
Details
Attachments
(1 file, 1 obsolete file)
1.37 KB,
patch
|
mconley
:
review+
|
Details | Diff | Splinter Review |
with this changeset we fail all linux/windows talos on e10s: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=3d3f1b07ef0f here is the treeherder view: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=3d3f1b07ef0f (note, the jobs are hidden by default) I assume we can backout a bunch of these patches and figure out the root cause of the failure.
Updated•9 years ago
|
tracking-e10s:
--- → ?
Assignee | ||
Comment 1•9 years ago
|
||
odd, I ran locally on a build from inbound (build today) and it works as expected. This could be pgo related or something on linux64 which works
Updated•9 years ago
|
![]() |
||
Updated•9 years ago
|
Flags: needinfo?(jmathies)
Comment 2•9 years ago
|
||
Looking into getting a more precise regression window here.
Assignee | ||
Comment 3•9 years ago
|
||
in trying to debug this, I put some dump statements in the pageloader code and on a build from feb 9th, I can run e10s mode and see my dump statements from inside the contentLoadHandler function: http://hg.mozilla.org/build/talos/file/tip/talos/pageloader/chrome/pageloader.js#l188 using a recent build, I don't see my dump statement printed out, although I do see a dump statement printed out indicating I added an eventListener for 'load'. So the dump statements work, but the load event is failing to either be sent or call the function.
Comment 4•9 years ago
|
||
Using mozregression I bisected builds until I got this pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=a793dd926e90&tochange=ed3e996b3805 To reproduce, run talos with e10s enabled against your firefox executable: talos -a tsvgx --e10s --develop -e /path/to/firefox If it only loads one page from that suite and then dies, you've got a build with the bug. From the pushlog, it appears that it was changes introduced in bug 1047603 that caused this problem.
Depends on: 1047603
![]() |
||
Updated•9 years ago
|
Flags: needinfo?(jmathies)
Comment 5•9 years ago
|
||
So my theory here is that the pageloader extension that talos relies on needs to be updated to deal with an assumption that bug 1047603 changed. Bug 1047603 changes the following assumption: For newly opened e10s windows, the initial browser is remote. Bug 1047603 makes this no longer true. The initial browser is only ever made remote once it starts to load a page that it deems should become remote. Marionette had to deal with this problem as well, and there is some platform code that does require us to make that initial browser remote ASAP to work properly, so there's a function "forceInitialBrowserRemote" that is exposed that perhaps pageloader could use. I'm not 100% familiar with how pageloader starts up, but I assume it opens a browser window, and starts interacting with it. What I suggest is that the first thing that pageloader does as soon as the browser window opens, is to force the initial tab to become remote by calling that function. For example, here's what we did for marionette: https://reviewboard.mozilla.org/r/3459/diff/#0 jmaher - do you think this solution could be adapted for pageloader?
Flags: needinfo?(jmaher)
Assignee | ||
Comment 6•9 years ago
|
||
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Flags: needinfo?(jmaher)
Attachment #8566702 -
Flags: review?(mconley)
Comment 7•9 years ago
|
||
Comment on attachment 8566702 [details] [diff] [review] force new window to be remote (1.0) Review of attachment 8566702 [details] [diff] [review]: ----------------------------------------------------------------- ::: talos/pageloader/chrome/pageloader.js @@ +175,5 @@ > window.resizeTo(10,10); > > var browserLoadFunc = function (ev) { > browserWindow.removeEventListener('load', browserLoadFunc, true); > + browserWindow.XULBrowserWindow.forceInitialBrowserRemote(); This should only ever occur if gMultiProcessBrowser is true, so we should wrap that in a conditional. Otherwise, I suspect this will break the non-e10s case. We should also add some documentation to explain what the hell is going on. Something like: "For e10s windows, the initial browser is not remote until it attempts to browse to some URI that should be remote. We bypass this restriction by forcing the initial browser to be remote before it attempts to go anywhere."
Attachment #8566702 -
Flags: review?(mconley) → review-
Assignee | ||
Comment 8•9 years ago
|
||
updated to check if we are in a remote setting and added a comment.
Attachment #8566702 -
Attachment is obsolete: true
Updated•9 years ago
|
Attachment #8566713 -
Flags: review+
Assignee | ||
Comment 9•9 years ago
|
||
https://hg.mozilla.org/build/talos/rev/0f48d2fb3fe3
Assignee | ||
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 10•9 years ago
|
||
So I've been speaking with avih and jmaher in #perf. I suspect that this patch broke TART for e10s. TART has a chrome:// page for managing the tests, and I believe it is into the content of this page that the add-on injects a function for dumping out the test results. When run with bug 1134238, it sounds like TART is never calling that function. I believe this is because Talos is forcing the initial browser to be remote, then TART injects the function into the content of the initial browser, and then the chrome URI loads. The chrome URI is not one that we would normally load in the content process, so the browser is updated to be non-remote, which invalidates anything set on its "content". I suspect that this is why the function is not being called. (Note that this is all speculation on my part, and I haven't actually tested this to see if this is the case). Reverting the patch seems to allow TART to run and complete, though I do not recommend backing this patch out, as this would break Talos entirely. I think it's prudent to find a workaround instead for the TART test. I suggest waiting until the chrome:// page loads before injecting any functions into the content.
You need to log in
before you can comment on or make changes to this bug.
Description
•