Closed Bug 880101 Opened 7 years ago Closed 7 years ago

Intermittent browser_locationBarCommand.js | New URL is loaded in new window - Got , expected example.com

Categories

(Firefox :: General, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
Firefox 26
Tracking Status
firefox24 --- fixed
firefox25 --- fixed
firefox26 --- fixed

People

(Reporter: philor, Assigned: ttaubert)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 3 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=23839025&tree=Birch
WINNT 6.2 birch pgo test mochitest-browser-chrome on 2013-06-05 18:17:32 PDT for push 17be89a912b5
slave: t-w864-ix-130

18:21:44     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js | Content window should be focused
18:21:44  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js | New URL is loaded in new window - Got , expected example.com
18:21:44     INFO -  Stack trace:
18:21:44     INFO -      JS frame :: chrome://mochikit/content/browser-test.js :: test_is :: line 536
18:21:44     INFO -      JS frame :: chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js :: listener</< :: line 42
18:21:44     INFO -      JS frame :: chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js :: loadListener :: line 169
18:21:44     INFO -      native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0

https://tbpl.mozilla.org/php/getParsedLog.php?id=23839297&tree=Mozilla-Inbound
Windows 7 32-bit mozilla-inbound opt test mochitest-browser-chrome on 2013-06-05 18:24:00 PDT for push b1da96acb0dc
slave: t-w732-ix-052

18:28:06     INFO -  TEST-PASS | chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js | Content window should be focused
18:28:06  WARNING -  TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js | New URL is loaded in new window - Got , expected example.com
18:28:06     INFO -  Stack trace:
18:28:06     INFO -      JS frame :: chrome://mochikit/content/browser-test.js :: test_is :: line 536
18:28:06     INFO -      JS frame :: chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js :: listener</< :: line 42
18:28:06     INFO -      JS frame :: chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js :: loadListener :: line 169
18:28:06     INFO -      native frame :: <unknown filename> :: <TOP_LEVEL> :: line 0
Margaret, as test author I don't suppose you could take a look at this top orange? :-)
Flags: needinfo?(margaret.leibovic)
(In reply to Ed Morley [:edmorley UTC+1] from comment #30)
> Margaret, as test author I don't suppose you could take a look at this top
> orange? :-)

Geez, I haven't touched this code in forever. Maybe gavin can find someone to help with this...

Also, it's odd that this intermittent issue started happening only recently, since this test has been in the tree for over a year and a half.
Flags: needinfo?(margaret.leibovic) → needinfo?(gavin.sharp)
Attached patch potential patchSplinter Review
This is kind of a stab in the dark. The failure case is that within a pageshow handler in a newly opened window, the URL bar hasn't been updated to the URL that we opened yet, and is blank. My theory is that this is because pageshow sometimes has a chance to fire against the initial about:blank load (which then gets replaced by the actual load after _delayedStartup). Other tests expect addPageShowListener's callback to fire for about:blank pages, so I made its expectedURL optional.

Not sure why this would have only started happening recently, but maybe we're just faster now or something?

(I also cleaned up the addPageShowListener helper.)
Assignee: nobody → gavin.sharp
Status: NEW → ASSIGNED
Attachment #762004 - Flags: review?(margaret.leibovic)
Flags: needinfo?(gavin.sharp)
Attachment #762004 - Flags: review?(margaret.leibovic) → review+
Flags: in-testsuite-
https://hg.mozilla.org/mozilla-central/rev/61a8216f8991
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Gavin, any other ideas? This is still a top-10 intermittent orange.
Flags: needinfo?(gavin.sharp)
Target Milestone: Firefox 24 → ---
I haven't had time to look in depth, so I took another stab in the dark. Perhaps we can't rely on the location bar being updated immediately after pageshow for some reason, so let's wait for another spin of the event loop:

https://hg.mozilla.org/integration/mozilla-inbound/rev/56a63eee73b4
Flags: needinfo?(gavin.sharp)
https://hg.mozilla.org/mozilla-central/rev/56a63eee73b4
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 25
(In reply to TinderboxPushlog Robot from comment #111)
> ehsan
> https://tbpl.mozilla.org/php/getParsedLog.php?id=24939048&tree=Mozilla-
> Inbound
> Windows 7 32-bit mozilla-inbound opt test mochitest-browser-chrome on
> 2013-07-04 08:33:57
> revision: b257784cdfdb

Suggests this didn't work...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm optimistic about this being fixed by the patch in bug 897943.
Depends on: 897943
Adding some debug output shows the onLocationChange somehow isn't fired so that URLBarSetURI isn't called and gURLBar.value doesn't have the right value. Not sure what can lead to a correct 'pageshow' event for a page that doesn't fire onLocationChange.

https://tbpl.mozilla.org/php/getParsedLog.php?id=25858251&tree=Try&full=1
Log output of a green run:

TEST-INFO | checking window state
TEST-START | chrome://mochitests/content/browser/browser/base/content/test/browser_locationBarCommand.js
TEST-INFO | browser_locationBarCommand.js | Running test: Alt left click
TEST-PASS | browser_locationBarCommand.js | page proxy state must be invalid for go button to be visible
TEST-PASS | browser_locationBarCommand.js | SaveURL was called
TEST-PASS | browser_locationBarCommand.js | Urlbar reverted to original value
TEST-INFO | browser_locationBarCommand.js | Running test: Shift left click
TEST-PASS | browser_locationBarCommand.js | page proxy state must be invalid for go button to be visible
 ++ onLocationChange URLBarSetURI http://example.com/
TEST-INFO | browser_locationBarCommand.js | pageshow: http://example.com/
TEST-INFO | browser_locationBarCommand.js | URL should be loaded in a new window
TEST-PASS | browser_locationBarCommand.js | Urlbar reverted to original value
TEST-PASS | browser_locationBarCommand.js | There should be no focused element
TEST-PASS | browser_locationBarCommand.js | Content window should be focused
 ++ checking gURLBar.value
TEST-PASS | browser_locationBarCommand.js | New URL is loaded in new window

Log output of a failure:

TEST-START | browser_locationBarCommand.js
TEST-INFO | browser_locationBarCommand.js | Running test: Alt left click
TEST-PASS | browser_locationBarCommand.js | page proxy state must be invalid for go button to be visible
TEST-PASS | browser_locationBarCommand.js | SaveURL was called
TEST-PASS | browser_locationBarCommand.js | Urlbar reverted to original value
TEST-INFO | browser_locationBarCommand.js | Running test: Shift left click
TEST-PASS | browser_locationBarCommand.js | page proxy state must be invalid for go button to be visible
TEST-INFO | browser_locationBarCommand.js | pageshow: http://example.com/
TEST-INFO | browser_locationBarCommand.js | URL should be loaded in a new window
TEST-PASS | browser_locationBarCommand.js | Urlbar reverted to original value
TEST-PASS | browser_locationBarCommand.js | There should be no focused element
TEST-PASS | browser_locationBarCommand.js | Content window should be focused
 ++ checking gURLBar.value
TEST-UNEXPECTED-FAIL | browser_locationBarCommand.js | New URL is loaded in new window - Got , expected example.com
Tim/Gavin, any more luck with this since comment 215? :-)
Flags: needinfo?(ttaubert)
I didn't have time to investigate it more. Any help is definitely welcome. Gavin and I talked about it shortly but we were both puzzled as to why pageshow for a page can be dispatched but onLocationChange isn't.
Flags: needinfo?(ttaubert)
Latest try results are just in. onLocationChange() is not being called due to these lines in the mTabProgressListener:

> if (!this.mBlank) {
>   this._callProgressListeners("onLocationChange",
>                               [aWebProgress, aRequest, aLocation,
>                                aFlags]);
> }

http://hg.mozilla.org/mozilla-central/file/0a63cd911b4f/browser/base/content/tabbrowser.xml#l722

It seems like there is a race condition in that sometimes mBlank isn't set to false in the onStateChange handler. I might have an idea on how to fix this. Let's see what try says.
The culprit here seems to be that the initial about:blank load is canceled so early that there aren't any STATE_START or STATE_STOP state changes sent for it.

The first thing the mTabProgressListener receives is a START_START notification for http://example.com/. It always expects notifications for about:blank though, that's why mTabProgressListener.mBlank exists after all (to ignore those).
(In reply to Tim Taubert [:ttaubert] from comment #268)
> The culprit here seems to be that the initial about:blank load is canceled
> so early that there aren't any STATE_START or STATE_STOP state changes sent
> for it.

Hmm, what is cancelling it? AFAIK for this case we should be following the http://hg.mozilla.org/mozilla-central/annotate/ad0ae007aa9e/browser/base/content/browser.js#l1012 path to load example.com in this test, and given that that runs off delayedStartup I would have assumed it happened way to late to stop the about:blank STATE_START from being fired.

> The first thing the mTabProgressListener receives is a START_START
> notification for http://example.com/. It always expects notifications for
> about:blank though, that's why mTabProgressListener.mBlank exists after all
> (to ignore those).

I guess maybe we should have the progress listener also check the URI in addition to just using mBlank?
Ok, so the code that's making this fail was introduced by bug 463384.

http://hg.mozilla.org/mozilla-central/file/b743d65e81d8/browser/base/content/tabbrowser.xml#l2960

gBrowser.init() is called a few lines above and the webProgress.addProgressListener() call may fail sometimes. When it fails we don't have the mTabProgressListener around to catch the first state changes for about:blank and to handle mBlank correctly.

I don't quite understand bug 463384 comment #11 and why this fails sometimes but I think that Dave or Dão could maybe shed some light onto this?

Just FTR, I can reproduce this very unreliably on my Windows machine but it was enough to just let it run in the background until it fails. The failing test entered the catch clause of the try statement linked to above. All successful tests didn't enter it.
Blocks: 463384
gBrowser.init is called in the tabbrowser constructor and in gBrowserInit.onLoad. Are you saying that when this fails, it's the call from gBrowserInit.onLoad that ends up adding the progress listener, whereas in other cases it's the call in the constructor?
Exactly, just needed to reproduce to verify that.

Succeeding runs add the progress listener in the tabbrowser's constructor. If the constructor fails to do so, we'll add it in gBrowserInit.onLoad and end up failing because we didn't see state notifications for about:blank.
I think we can work around this by starting the first tab's initial about:blank load *after* we successfully added the progress listener. Maybe there's a better way to fix the core issue? I have no clue though why the docShell could not be ready when we run the tabbrowser constructor.
Attachment #785936 - Flags: review?(gavin.sharp)
My suggestion from the second part of comment 271 seems better, as a workaround for the underlying issue.

It would be really nice to dig a bit deeper and get to the bottom of the inconsistent behavior. How many test runs does it usually take you on average to reproduce this locally?
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #279)
> My suggestion from the second part of comment 271 seems better, as a
> workaround for the underlying issue.

Ok, I don't mind taking that.

> It would be really nice to dig a bit deeper and get to the bottom of the
> inconsistent behavior. How many test runs does it usually take you on
> average to reproduce this locally?

Yeah, I agree. Fixing symptoms doesn't sound that nice. Sometimes it's failing really quick. Sometimes it's not failing on a thousand runs and I have to re-run. My feeling is that it seems to fail after 10-15 min of running only that test in a loop (with only the two first subtests).
So adding the progress listener fails because nsIContainerBoxObject.docShell is null. 

http://mxr.mozilla.org/mozilla-central/source/layout/xul/base/src/nsContainerBoxObject.cpp#47

It is null due to the fact, that nsBoxObject::GetFrame() returns null.

http://mxr.mozilla.org/mozilla-central/source/layout/xul/base/src/nsBoxObject.cpp#111

nsBoxObject::GetFrame() returns null because mContent->GetPrimaryFrame() does, mPrimaryFrame of nsIContent is simply set to null.

http://mxr.mozilla.org/mozilla-central/source/content/base/public/nsIContent.h#820

nsIContent::IsInDoc() however returns true.



How can we end up here? In succeeding test runs, nsBoxObject::GetFrame() returns the frame and nsSubDocumentFrame::GetDocShell() fills the pointer.
So in the failing cases, nsSubDocument::Init() is called *after* the tabbrowser's constructor. Looks like it should be called before as it calls SetPrimaryFrame() on mContent.
Assignee: gavin.sharp → ttaubert
Ok, great. This seems to be caused by bug 791670. It exactly started when it landed for the first time (around 2013-06-05). The problem here is that the BrowserNewTabPreloader collects window sizes from all open windows off a timer by calling nsIDOMWindowUtils.getBoundsWithoutFlushing(win.gBrowser).

http://hg.mozilla.org/mozilla-central/file/ff6bb97c0c74/browser/modules/BrowserNewTabPreloader.jsm#l272

If it happens to access gBrowser while the the browser window is still loading, it forces the tabbrowser constructor to run before the first tab's docShell is available.

We can certainly fix this by ignoring windows that haven't been fully loaded yet but I wonder what we can do to harden this against mistakes like this. Bug 463384 indicates that there are add-ons out there causing the exact same problem.
Blocks: 791670
Attachment #785936 - Flags: review?(gavin.sharp)
Aha! Yeah, this is a common add-on problem, so we should try to handle it, but I'm not sure how offhand. Will think more about it later!

(You should get some sleep! :)
Pushed the test to try to make sure it doesn't only fail locally.

https://tbpl.mozilla.org/?tree=Try&rev=2882de6b8fa8
Option 1: Defer loading "about:blank" until after we added the progress listener for the initial tab.

https://tbpl.mozilla.org/?tree=Try&rev=22b6b74c1304
Attachment #785936 - Attachment is obsolete: true
(In reply to Tim Taubert [:ttaubert] from comment #294)
> Created attachment 787360 [details] [diff] [review]
> part 1 - Add test that forces the gBrowser constructor a little too early

Ok, so that fails but for the wrong reasons. Part 2a also fails, which it shouldn't. Pushed another test to see how that goes.
Ok, this test times out as intended because we never receive an onLocationChange notification.

https://tbpl.mozilla.org/?tree=Try&rev=72d3efdddf4f
Attachment #787360 - Attachment is obsolete: true
Attachment #787455 - Flags: review?(gavin.sharp)
Comment on attachment 787361 [details] [diff] [review]
part 2a - Start loading about:blank in the initial tab after we have added the progress listener

This patch makes another test fail which probably indicates this isn't the right solution.

https://tbpl.mozilla.org/?tree=Try&rev=176cbf2eac16
This approach seems to be fine. It passes the test from part 1 and doesn't introduce other failures. This uses an observer to wait until the initial tab has been initialized.

https://tbpl.mozilla.org/?tree=Try&rev=aecb67bf28de
Attachment #787456 - Flags: review?(gavin.sharp)
Comment on attachment 787456 [details] [diff] [review]
part 2b - Wait until the initial tab's content document has been created before adding the progress listener

Does observing content-document-global-created work across process boundaries?
Not sure. AFAIK content processes cannot (or shouldn't) send notification via the observer service. I think, in an e10s world the whole problem we're fixing here would actually not exist because it wouldn't hurt to add a progress listener (that would probably be implemented using messages) to the browser even if the content process hasn't loaded, yet.
In that case, can you sprinkle this code with some e10s ifdefs?
Comment on attachment 787455 [details] [diff] [review]
part 1 - Add test that forces the gBrowser constructor a little too early, v2

Is this test prone to failure in the opposite direction, if the window load/tabbrowser constructor ends up occurring before the first successful tryAddProgressListener? I guess not - in that case it should just succeed but not end up testing the scenario we care about, right? That seems fine.
Attachment #787455 - Flags: review?(gavin.sharp) → review+
Comment on attachment 787456 [details] [diff] [review]
part 2b - Wait until the initial tab's content document has been created before adding the progress listener

How this should work with e10s does merit some thought, but perhaps we can just file a followup.

It's probably worth adding a comment next to the call to _addProgressListenerForInitialTab saying that we're assuming it can only fail because the docShell/content-document-global hasn't yet been created, and maybe another comment in _waitForInitialContentDocument saying that we're using "content-document-global-created" as an approximation for "docShell is initialized".
Attachment #787456 - Flags: review?(gavin.sharp) → review+
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #304)
> Is this test prone to failure in the opposite direction, if the window
> load/tabbrowser constructor ends up occurring before the first successful
> tryAddProgressListener? I guess not - in that case it should just succeed
> but not end up testing the scenario we care about, right? That seems fine.

It may intermittently not test what we care about but so far it seems very reliable. The try push with only this test but without the fix fails on all platforms with the expected failure.
Attachment #787361 - Attachment is obsolete: true
Blocks: 895865
https://hg.mozilla.org/mozilla-central/rev/2f24ae3aaa4e
https://hg.mozilla.org/mozilla-central/rev/0b8a1aa8d98d
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED