Closed Bug 852847 Opened 11 years ago Closed 11 years ago

Fix a lot of browser-element orange

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla22
blocking-b2g tef+
Tracking Status
firefox20 --- wontfix
firefox21 --- wontfix
firefox22 --- fixed
firefox23 --- fixed
b2g18 --- fixed
b2g18-v1.0.0 --- wontfix
b2g18-v1.0.1 --- fixed

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

Attachments

(5 files, 4 obsolete files)

3.44 KB, patch
kk1fff
: review+
Details | Diff | Splinter Review
90.58 KB, patch
kk1fff
: review+
Details | Diff | Splinter Review
2.21 KB, patch
kk1fff
: review+
Details | Diff | Splinter Review
3.21 KB, patch
smaug
: review+
Details | Diff | Splinter Review
1.09 KB, patch
Details | Diff | Splinter Review
I was trying to run the browser-element tests locally today.  They were timing out so frequently that I couldn't get through a single run of all of them!  Who wrote this stuff, anyway?

I think I've managed to fix most of the problems.  I definitely haven't fixed all of them -- there's a random timeout in LoadEvents (definitely OOP, maybe also in-process) that I haven't been able to figure out.  But with my changes I can now run the test suite to completion, which is a marked improvement...

For most of the other failures, what seemed to be happening is that the test was trying to do stuff before the browser-element machinery is loaded.

Arguably, the fact that it's possible to do things "too early" is a bug in the browser-element code.  But I'm not going to fix that in the immediate future.
In fact this test was already disabled; the OOP version was the same as
the in-process version because the test explicitly disabled OOP tabs.
I suppose we could put this in a separate bug.  But I made a few changes in the
previous patches that allow some of the simplifications here.
I don't know if these oranges are solved, but perhaps they are.

https://tbpl.mozilla.org/?tree=Try&rev=edd7446be270
Assignee: nobody → justin.lebar+bug
Thank you for looking at this :-)
And the tryserver results are a sea of orange.  I guess when I thought I was fixing timing issues, I was actually just modifying the timing so it happened to work better on my machine and worse on the testers?  That's a bummer.
Depends on: 854880
Depends on: 854924
This blocks bug 844323 because in that bug I'm adding new browser-element tests, and I rely on changes I'm making here.
Blocks: 844323
In fact this test was already disabled; the OOP version was the same as
the in-process version because the test explicitly disabled OOP tabs.

Patrick, do you feel comfortable reviewing these patches?  Please feel free to say no.
Attachment #730142 - Flags: review?(pwang)
Also set network.disable.ipc.security to true and leave it that way. This prevents security errors in the tests which happen when we pop the pref.
Attachment #730143 - Flags: review?(pwang)
Blocking a blocker.
blocking-b2g: --- → tef?
Attachment #727069 - Attachment is obsolete: true
Attachment #727070 - Attachment is obsolete: true
Attachment #727071 - Attachment is obsolete: true
Attachment #727081 - Attachment is obsolete: true
Comment on attachment 730143 [details] [diff] [review]
Part 2: Do things later in our browser-element tests, thus avoiding doing things before the browser-element machinery has loaded.

Review of attachment 730143 [details] [diff] [review]:
-----------------------------------------------------------------

The tests look more clear, thanks!

In some files, browserElementTestHelpers.setEnabledPref(true) and browserElementTestHelpers.addPermission() aren't moved out of runTest():
browserElement_SetVisible.js
browserElement_Iconchange.js
browserElement_GetScreenshot.js
browserElement_Titlechange.js

r=me with these file fixed.
Attachment #730143 - Flags: review?(pwang) → review+
Attachment #730144 - Flags: review?(pwang) → review+
Attachment #730142 - Flags: review?(pwang) → review+
> In some files, browserElementTestHelpers.setEnabledPref(true) and 
> browserElementTestHelpers.addPermission() aren't moved out of runTest():

Thanks for catching these!
blocking-b2g: tef? → tef+
https://hg.mozilla.org/mozilla-central/rev/3c08796dead2
https://hg.mozilla.org/mozilla-central/rev/a654498f21a9
https://hg.mozilla.org/mozilla-central/rev/9a5f2fcb7363
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla22
Backed out for causing leaks and hangs. Please post b2g18-specific patches that are green on Try.
https://hg.mozilla.org/releases/mozilla-b2g18/rev/d467369d1b0c
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/96f559f949bb

https://tbpl.mozilla.org/php/getParsedLog.php?id=21368738&tree=Mozilla-B2g18

TEST-UNEXPECTED-FAIL | leakcheck | tab process: 379187 bytes leaked (AtomImpl, BackstagePass, BodyRule, DR_State, DocumentRule, ...)
/tests/dom/browser-element/mochitest/test_browserElement_oop_SetVisibleFrames.html leaked 2 DOMWINDOW(s)

https://tbpl.mozilla.org/php/getParsedLog.php?id=21368795&tree=Mozilla-B2g18_v1_0_1

567 INFO TEST-START | /tests/dom/browser-element/mochitest/test_browserElement_inproc_RemoveBrowserElement.html
TEST-UNEXPECTED-FAIL | /tests/dom/browser-element/mochitest/test_browserElement_inproc_RemoveBrowserElement.html | application timed out after 330 seconds with no output
Depends on: 857412
Component: DOM: Mozilla Extensions → DOM
Dumping https://tbpl.mozilla.org/?tree=Try&rev=9a1df749fbd1 here so I don't forget about it.
For comparison, without the rebased patches (and updated slightly to tip): https://tbpl.mozilla.org/?tree=Try&rev=1c1199c51bdf
Flagging myself so I don't lose track of this.
Flags: needinfo?(josh)
remote:   https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/2c6aa7ef0d31
remote:   https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/ee089a0572d5

remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/c5d796505942
remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/2449b8d0f7e6

The actual mochitest-2 suites were green on the try push, and the leaks shown in 1 and 3 were present in the try push without the patches applied.
Flags: needinfo?(josh)
(In reply to Josh Matthews [:jdm] from comment #29)
> The actual mochitest-2 suites were green on the try push, and the leaks
> shown in 1 and 3 were present in the try push without the patches applied.

FWIW, those leaks are known and expected on b2g18 (and a few other older branches).
Blocks: 863706
(In reply to Ryan VanderMeulen [:RyanVM] from comment #30)
> FWIW, those leaks are known and expected on b2g18 (and a few other older
> branches).

Backed out for leaks that were not expected, however.
https://hg.mozilla.org/releases/mozilla-b2g18/rev/a41f9308a042
https://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/24d45f264146

https://tbpl.mozilla.org/php/getParsedLog.php?id=22015560&full=1&branch=mozilla-b2g18_v1_0_1

Leaked URLs:
  resource://gre-resources/ua.css
  resource://gre-resources/html.css
  chrome://global/content/xul.css
  resource://gre-resources/quirk.css
  resource://gre-resources/full-screen-override.css
  chrome://global/skin/scrollbars.css
  resource://gre-resources/forms.css
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Outer.html
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Outer.html
  chrome://global/content/bindings/scrollbar.xml#scrollbar
  chrome://global/skin/scrollbar/slider.gif
  chrome://global/skin/arrow/arrow-lft.gif
  chrome://global/content/bindings/scrollbar.xml#thumb
  chrome://global/skin/arrow/arrow-rit.gif
  chrome://global/skin/arrow/arrow-up.gif
  chrome://global/skin/arrow/arrow-dn.gif
  chrome://global/content/bindings/scrollbar.xml#scrollbar-base
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Inner.html?child1
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Outer.html
  chrome://global/skin/arrow/arrow-rit-dis.gif
  chrome://global/skin/arrow/arrow-lft-dis.gif
  chrome://global/skin/arrow/arrow-dn-dis.gif
  chrome://global/skin/arrow/arrow-up-dis.gif
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Inner.html?child1
  http://mochi.test:8888/tests/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames_Inner.html?child1
Since Fedora64 builds fail on try, I'm pushing a version without my potential fix to see if I can see the leaks on Ubuntu, where my last push had many greens: https://tbpl.mozilla.org/?tree=Try&rev=745ad385b2bf
Trying pushing to b2g18, since the leak does not seem to be reproducible on try:

remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/92107f1df1bc
remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/6a187d801193
remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/cd6ccbc2f9c1
remote:   https://hg.mozilla.org/releases/mozilla-b2g18/rev/8aaf6420a5ef

Once more unto the breach, cleaning up as many things in SetVisibleFrames as possible.
Additional changes present in the final patch:

>diff --git a/dom/browser-element/mochitest/browserElement_SetVisibleFrames.js b/dom/browser-element/mochitest/browserElement_SetVisibleFrames.js
>--- a/dom/browser-element/mochitest/browserElement_SetVisibleFrames.js
>+++ b/dom/browser-element/mochitest/browserElement_SetVisibleFrames.js
>@@ -69,16 +68,18 @@ function getVisibleTest2() {
> 
> function finish() {
>   // We need to remove this listener because when this test finishes and the
>   // iframe containing this document is navigated, we'll fire a
>   // visibilitychange(false) event on all child iframes.  That's OK and
>   // expected, but if we don't remove our listener, then we'll end up causing
>   // the /next/ test to fail!
>   iframe.removeEventListener('mozbrowsershowmodalprompt', checkMessage);
>+  iframe.setVisible(false);
>+  document.body.removeChild(iframe);
> 
>   var principal = SpecialPowers.wrap(SpecialPowers.getNodePrincipal(document));
>   SpecialPowers.removePermission("browser", { url: SpecialPowers.wrap(principal.URI).spec,
>                                               appId: principal.appId,
>                                               isInBrowserElement: true });
> 
>   SimpleTest.finish();
> }
>@@ -90,13 +91,15 @@ function expectMessage(msg, next) {
>   expectedMsgCallback = next;
> }
> 
> function checkMessage(e) {
>   var msg = e.detail.message;
>   is(msg, expectedMsg);
>   if (msg == expectedMsg) {
>     expectedMsg = null;
>-    SimpleTest.executeSoon(function() { expectedMsgCallback() });
>+    var callback = expectedMsgCallback;
>+    expectedMsgCallback = null;
>+    SimpleTest.executeSoon(function() { callback() });
>   }
> }
>diff --git a/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames2_Outer.html b/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames2_Outer.html
>--- a/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames2_Outer.html
>+++ b/dom/browser-element/mochitest/file_browserElement_SetVisibleFrames2_Outer.html
>@@ -6,16 +6,17 @@ var iframe = document.createElement('ifr
> iframe.mozbrowser = true;
> 
> iframe.addEventListener('mozbrowsershowmodalprompt', function(e) {
>   if (e.detail.message == 'child:ready') {
>     setTimeout(function() {
>       iframe.setVisible(false);
>       iframe.setVisible(true);
>       setTimeout(function() {
>+        iframe.setVisible(false);
>         alert('parent:finish');
>       }, 0);
>     }, 0);
>   }
>   else {
>     // Pass the message up to our parent.
>     alert(e.detail.message);
>   }
The setVisible call in SetVisibleFrames.js definitely made a difference for me locally; without it, the test always leaked when I ran it. I doubt the one in SetVisibleFrames2_Outer made a difference, and I'm unsure whether the removeChild or callback change got rid of the final intermittent leak on tbpl.
Leaking on OSX 10.6/10.7 now. Backed out :(
https://hg.mozilla.org/releases/mozilla-b2g18/rev/d74b5adee143

/tests/dom/browser-element/mochitest/test_browserElement_oop_SetVisibleFrames.html leaked 2 DOMWINDOW(s)
Wait, this only leaks on OSX and not Linux anymore?

Let's just disable the tests on OSX; that's what we were doing before, and it's totally OK with me.
Looking through the code here, I discovered bug 866242.  I don't /think/ that's the problem here, though; if we hit that case, I think the test would hang.
Ooh, unless we're running getVisibleTest1() more than once, in which case it really could be bug 866242.

I'll write a patch to fix the test.
It might be that we're seeing this problem on b2g18 and not on trunk just because of how the chunking algorithm works out.  I'd expect we need to be near the end of the set of tests run in order to hit this probelm.
Okay, the leak in bug 866242 isn't the problem, because this is a leak in the child process.
It is very strange that this try push is green,

https://tbpl.mozilla.org/?tree=Try&rev=1e4fb0d7d86e

while this one is not,

https://tbpl.mozilla.org/?tree=Try&rev=83c616f3badd

The only difference is that the one that's orange does not call alert() from the child, while the one that's green does.

I have a bad feeling that this is actually a leak in Gecko somehow.  This behavior is too strange, and the commented-out testcase is too simple.
One of the permutations I pushed to try leaks locally (linux64).

The problem is that we're entering a nested event loop for alert() in the child process but never exiting it.  Still investigating why that's happening.
I fixed a shutdown leak.  Whether it's /the/ leak remains to be seen.
r?smaug because I want to check that the only time when we get into this case is during shutdown.

What's happening here is that the parent process never sees the messages that the child sends up.  So naturally the child never gets a response, and it hangs out in its nested event loop until shutdown.  This forces us to hold the BrowserElementChild alive, causing us to leak.

My question is, can we get into this situation where the child's mm never delivers messages up to the parent at any time other than child-process shutdown?  If so, is there some condition we should be checking to detect this?

Similarly, is it possible for us to get into a situation where messages we deliver up to the parent get there, but where messages from the parent down to the child aren't delivered, other than during child-process shutdown?  If so, can we detect this?
Attachment #742858 - Flags: review?(bugs)
(In reply to Justin Lebar [:jlebar] from comment #51)
> My question is, can we get into this situation where the child's mm never
> delivers messages up to the parent at any time other than child-process
> shutdown?
When the parent is closed. Basically remove <xul:browser remote="true">[1] from document.
But there is unload event from mm.

> Similarly, is it possible for us to get into a situation where messages we
> deliver up to the parent get there, but where messages from the parent down
> to the child aren't delivered, other than during child-process shutdown?
Well, if parent then starts shutdown process or removes <browser>




[1] or remote iframe
Comment on attachment 742858 [details] [diff] [review]
Part 4: Fix a benign child-process shutdown leak around nested event loops.

This should be ok, but perhaps there should be a check for unload event too.
(The unload which target is tabchildglobal)
Attachment #742858 - Flags: review?(bugs) → review+
Does it matter if I listen to unload using a system event listener or a regular event listener?  I guess a regular event would suffice since the target of the event is the tabchildglobal?
Yeah, regular should be fine.
Thanks, Olli.

https://hg.mozilla.org/projects/birch/rev/5e2db92c9675

I'm re-opening this so we can re-resolve it when this lands on m-c.

I'm going to push to b2g18, since birch/m-c won't be educational about whether we fixed the oranges.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'll land this once the tree re-opens.
We're all set for uplift to 1.0.1 here, at the sheriffs' convenience.
https://hg.mozilla.org/mozilla-central/rev/5e2db92c9675
https://hg.mozilla.org/mozilla-central/rev/1402432b52a2
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.