Closed Bug 872237 Opened 11 years ago Closed 11 years ago

ChromeWindows cannot be initialized correctly because onWindowLoaded is called sometimes even with window not finished loading yet

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: xabolcs, Assigned: whimboo)

References

Details

(Whiteboard: [ateamtrack: p=mozmill q=2013q2 m=3][mozmill-2.0+])

Attachments

(2 files, 6 obsolete files)

From Bug 760720 comment #18.

FYI without this patch most mutt js tests pass with Mozilla/5.0 (X11; Linux i686; rv:23.0) Gecko/20130509 Firefox/23.0 ID:20130509031047 CSet: ea059733677c

RESULTS | Passed: 47
RESULTS | Failed: 3
RESULTS | Skipped: 0

But when applied, all test containing |mozmill.getBrowserController()| in setupModule got skipped:

RESULTS | Passed: 7
RESULTS | Failed: 0
RESULTS | Skipped: 44

The constructor of MozMillController is unable to detect the loaded window. [1]

> TEST-START | js/utils/pageload_bfcache.js | setupModule
> ERROR | Test Failure | {
>   "exception": {
>     "stack": [
>       "TimeoutError@resource://mozmill/stdlib/utils.js:243",
>       "waitFor@resource://mozmill/stdlib/utils.js:293",
>       "MozMillController@resource://mozmill/driver/controller.js:249",
>       "getBrowserController@resource://mozmill/driver/mozmill.js:182",
>       "setupModule@resource://mozmill/modules/frame.js -> file:///path/to/mozmill/mutt/mutt/tests/js/utils/pageload_bfcache.js:9",
>       "Runner.prototype.wrapper@resource://mozmill/modules/frame.js:633",
>       "Runner.prototype.runTestModule@resource://mozmill/modules/frame.js:678",
>       "Runner.prototype.runTestFile@resource://mozmill/modules/frame.js:603",
>       "runTestFile@resource://mozmill/modules/frame.js:750",
>       "Bridge.prototype._execFunction@resource://jsbridge/modules/Bridge.jsm:140",
>       "Bridge.prototype.execFunction@resource://jsbridge/modules/Bridge.jsm:147",
>       "@resource://jsbridge/modules/Server.jsm:33",
>       ""
>     ],
>     "message": "controller(): Window could not be initialized.",
>     "fileName": "resource://mozmill/stdlib/utils.js",
>     "name": "TimeoutError",
>     "lineNumber": 243
>   }
> }


The patch is compatible with recent Aurora 22.0a2 versions.
If this is not a top priority then I could bisect Nightly 23.0a1.

mozregression showed me:
Last good nightly: 2013-04-30
First bad nightly: 2013-05-01

Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=dd0c611a0a27&tochange=02aa81c59df6

Unfortunately that day was a merge day, with lot of changes. :-/
See also MozillaZine's Nightly build thread [2] for the landed bugs!



[1]: https://github.com/mozilla/mozmill/blob/fd4739f85fd7994e8e1274092d5b8d73434036c5/mozmill/mozmill/extension/resource/driver/controller.js#L241

[2]: http://forums.mozillazine.org/viewtopic.php?f=23&t=2698967
Hmmm. This is weird.

I uncommented the dump() near update and remove, and the test passes (with bad Nightly!!).
After reverting the changes the test failed again.

Trying other version of bad Nightly it also failed again.
Moving back to the previous Nightly it passes.
Moving back to the other version of bad Nightly it also passes!

What happens here?!
Bisecting wider regression ranges I found that the minimized range is moving.
For example build 20130419 was affected also in my test.

Based on this weirdness (see also comment #1) I think this is a performance regression of Mozmill, instead a change in Firefox codebase.

Keeping as UNCONFIRMED.
In the waitForPageLoad() patch I exchanged the lines for adding the listener and setting the window status inside of the init() method. What happens when you revert those as given in bug 786985? Keep in mind that this code lives in windows.js now.
Also, how long does it take on your machine until the first top-level window has been loaded?
Running mutt testjs against current mozmill (2.0rc3 + Bug 870398) with Aurora 23.0a2 on a more powered machine:

RESULTS | Passed: 49
RESULTS | Failed: 2
RESULTS | Skipped: 0

Same with Nightly 24.0a1:
RESULTS | Passed: 48
RESULTS | Failed: 3
RESULTS | Skipped: 0


So it is surely a performance issue.
I would love if you could give detailed feedback on my last two comments.
(In reply to Henrik Skupin (:whimboo) from comment #3)
> In the waitForPageLoad() patch I exchanged the lines for adding the listener
> and setting the window status inside of the init() method. What happens when
> you revert those as given in bug 786985? Keep in mind that this code lives
> in windows.js now.

After reverting those changes as given in bug 786985 seems like it solves the problem:

RESULTS | Passed: 50
RESULTS | Failed: 1
RESULTS | Skipped: 0


(In reply to Henrik Skupin (:whimboo) from comment #4)
> Also, how long does it take on your machine until the first top-level window
> has been loaded?

Not measured, but starting mutt with "--restart" command line solves the problem too.
After that run it's hard to reproduce the bug.
Here's a .zip of a crudely modified mozmill/mozmill/extension/resource/modules/windows.js which dumps some diagnostic, plus a sample output edited for length, from Nightly 20130525, mozmill2.0rc2 on Windows XP32. If that's helpful.
(In reply to Jonathan French (:jfrench) from comment #8)
> mozmill2.0rc2

Sorry, rc3.
Szabolcs, do you also see that win.document.readyState is an empty string in the init() method? If that is the case we most likely have to stop updating the window state in such a case.
Oh, and this should block our release. Beside that it would be nice to know why the readyState can be an empty string. Probably wise asking in the platform newsgroup.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]
(In reply to Jonathan French (:jfrench) from comment #8)
> Created attachment 754618 [details]
> sample dump output and modified win.js

Jonathan, in the future please simply attach two files instead of a zip which can barely be opened directly for investigation. Thanks.
I can see the same now with a debug build of Firefox. I will further investigate the problem.
(In reply to Jonathan French (:jfrench) from comment #8)
> Created attachment 754618 [details]
> sample dump output and modified win.js
> 
> Here's a .zip of a crudely modified
> mozmill/mozmill/extension/resource/modules/windows.js which dumps some
> diagnostic, plus a sample output edited for length, from Nightly 20130525,
> mozmill2.0rc2 on Windows XP32. If that's helpful.

Thank you for the modified windows.js! It's very helpful.
And also for reproducing this bug!


(In reply to Henrik Skupin (:whimboo) from comment #10)
> Szabolcs, do you also see that win.document.readyState is an empty string in
> the init() method? If that is the case we most likely have to stop updating
> the window state in such a case.

I got "ready state is uninitialized" with the modified windows.js.

If I change |if (aWindow.content)| to |if (aWindow.document.readyState == "complete")| near L262, then I'm facing with bug 865641.
In that case, playing with the place of |attachEventListeners()| call (see comment #3) doesn't make sense.

whimboo: can I take this bug? Of course your investigation is welcome. :)
Do we have a clear understand of what's happening here? It's not extremely clear what is wrong and what needs to happen from the above comments.
Flags: needinfo?(xabolcs)
Flags: needinfo?(hskupin)
(In reply to Clint Talbert ( :ctalbert ) from comment #15)
> Do we have a clear understand of what's happening here? It's not extremely
> clear what is wrong and what needs to happen from the above comments.

The title of this bug was not very helpful. Changed.

After bug 760720 landed, I saw a lot of tests skip with "controller(): Window could not be initialized." message.

I had difficulties to provide an STR (see comment #2, #3) so I kept this bug as UNCONFIRMED,
but thankfully jfrench was able to reproduce.
Flags: needinfo?(xabolcs)
Summary: Starting with Nightly 20130501 the fix for bug 760720 breaks Mozmill tests → Bug 760720 causes tests to skip with message "controller(): Window could not be initialized." in some cases
As Szabolcs mentioned above is that in some cases especially for debug builds the readyState of the document is uninitialized. That even happens for the code in onWindowLoaded() which should never have such a readyState. I will have to check what's going on here in detail.

Szabolcs, it depends on how for I come with my investigation. Most likely a patch is the result. Given the pressure of this failure I might want to take it. If that's ok for you.
Flags: needinfo?(hskupin)
(In reply to Henrik Skupin (:whimboo) from comment #17)
> Given the pressure of this failure I might want to take it. If that's ok for you.

Sure. It's OK to me.
(In reply to Szabolcs Hubai (:xabolcs) from comment #16)
> ...
> I had difficulties to provide an STR (see comment #2, #3) ...

I mean comment #1 and comment #2.
Are you both really able to reproduce this issue with an opt build? I only see it with debug builds.
(In reply to Henrik Skupin (:whimboo) from comment #20)
> Are you both really able to reproduce this issue with an opt build? I only
> see it with debug builds.

With varying success, but basically yes.
I'm able to test on Win7 64bit and Linux (32bit).
(In reply to Henrik Skupin (:whimboo) from comment #20)
> Are you both really able to reproduce this issue with an opt build? I only
> see it with debug builds.

It reproduces consistently for me, using an opt Nightly build. I am on WinXP32 SP3.
Attached patch dirty hack around readyState (obsolete) — Splinter Review
jfrench! Could you give a try for this one-liner patch?

The patch contains more changes:
- the core is the if statement.
- removeEventListener is just coding style.
- for the commented-out attachEventListeners() please see comment #3! (You could even answer Henrik's question!)
Attachment #755359 - Flags: feedback?(tojonmz)
Yup, will try it now and report back.
(In reply to Szabolcs Hubai (:xabolcs) from comment #23)
> Created attachment 755359 [details] [diff] [review]
> dirty hack around readyState
> 
> jfrench! Could you give a try for this one-liner patch?

Which particular mozmill test are you running?
Comment on attachment 755359 [details] [diff] [review]
dirty hack around readyState

Review of attachment 755359 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Szabolcs! I did something similar and will attach a patch shortly. It will simply add some more dump lines for a better investigation, which we should combine into your patch. Would you mind to do that? Other stuff you can find inline.

::: mozmill/mozmill/extension/resource/modules/windows.js
@@ +247,5 @@
>    };
>  
>    // If the window has already been finished loading, call the load handler
>    // directly. Otherwise attach it to the current window.
> +  if (aWindow.document.readyState == "complete") {

That sounds to be the right solution. It stopped the problem in nightly builds for me. Even I'm facing some other problems (different kinds of assertions) now but not related to any window loaded state.

@@ +252,5 @@
>      onWindowLoaded();
>    } else {
> +    aWindow.addEventListener("load", function runOnce() {
> +      aWindow.removeEventListener("load", runOnce, false);
> +      onWindowLoaded();

Why do you need this change? We would run it once in any case.

@@ +277,5 @@
>      // builds) it will be overwritten again by the onload listener later.
>      var loaded = (win.document.readyState === 'complete');
>      map.update(utils.getWindowId(win), "loaded", loaded);
> +
> +    //attachEventListeners(win);

Yes, we actually don't need this as I figured out now. We already call it through attachEventListeners() anyway.
Attached patch Proposed fix (obsolete) — Splinter Review
Here my proposed fix which will make it work for me in opt and debug builds. Can you both please try it out?
Attachment #755491 - Flags: feedback?(xabolcs)
Attachment #755491 - Flags: feedback?(tojonmz)
Updating the summary to the real underlying issue we have. It's not enough to check for window.content, which is unreliable. We should only go that route if its readyState is complete.
Summary: Bug 760720 causes tests to skip with message "controller(): Window could not be initialized." in some cases → ChromeWindows cannot be initialized correctly because onWindowLoaded is called sometimes even with window not finished loading yet
Blocks: 876399
(In reply to Henrik Skupin (:whimboo) from comment #27)
> Created attachment 755491 [details] [diff] [review]
> Proposed fix
> 
> Here my proposed fix which will make it work for me in opt and debug builds.
> Can you both please try it out?

Will do.
Here's the test result I am seeing, attached. It appears it still gets stuck on waitForPageLoad(). Perhaps my test configuration is incorrect, I've provided it here for reference.

Test configuration:
- latest hg default branch for mozmill-tests
- testTabbedBrowsing/testNewTab.js with its 2nd waitForPageLoad() temporarily removed (my understanding 2.0 will fail regardless if >1 is present) - this was the test which lead to the initial investigation on IRC
- latest mozmill2.0rc3 master + Proposed patch applied (confirmed Proposed patch in resource/modules/windows.js is present before running the test)
- running correct mozmill2.0rc3 confirmed in the venv shell before running
eg. 
$ which mozmill
/c/tmp/git/mozmill/master-bug-872237-whimbooPatch/venv/Scripts/mozmill.exe
(venv)

$ mozmill --version
mozmill-script.py 2.0rc3
(venv)

- using Firefox Nightly 20130528030942

Observed:
- the test file layout/mozilla.html loads, as it does without the patch applied
- the test just sits until it times out, as it does without the patch applied
- new windows.js dump information 'outer-window-destroyed' is present

If someone can let me know which mozmill test they were trying where master(failed) and master+Patch(passed), let me know and I can try that test also.
Jonathan, this bug is not about any of the waitForPageLoad issue you might see. It's about the window not being initialized issue we talked about on Monday. If you don't have this problem anymore we have a way better situation as before.
Ok, thanks for the clarity. If the output looks fine to you in the test run, then that's good.
(In reply to Jonathan French (:jfrench) from comment #32)
> Ok, thanks for the clarity. If the output looks fine to you in the test run,
> then that's good.

As spoken on IRC can you please also test with a debug build with and without the patch applied? I would love to get feedback from a test on Windows.
Which particular test did you run when you did your patch? I'd like to use the exact same test.
No particular test is necessary here. Just run mozmill with the --manual option.
Szabolcs, given that this is blocking my further work, I would like to finalize the patch. I hope you don't have problems with it, but we have to get this in ASAP to continue the work on other dependent failures.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
Attached patch Patch v1 (obsolete) — Splinter Review
Ok, so this patch should fix it for opt and debug builds. To test it please apply it and run mozmill via 'mozmill -b %path% --manual'. If you see the SUCCESS=true dump output in the console everything will be fine. Please test the patch deeply with opt and debug tinderbox builds on different branches.
Attachment #755359 - Attachment is obsolete: true
Attachment #755491 - Attachment is obsolete: true
Attachment #755543 - Attachment is obsolete: true
Attachment #755359 - Flags: feedback?(tojonmz)
Attachment #755491 - Flags: feedback?(xabolcs)
Attachment #755491 - Flags: feedback?(tojonmz)
Attachment #755849 - Flags: feedback?(xabolcs)
Attachment #755849 - Flags: feedback?(andreea.matei)
On a WindowsXP SP3 system without Visual Studio purchased, Firefox Nightly Debug wants to link against the debug version of MSVCR100.dll... MSVCR100D.dll. That debug version of the .dll is not available in the Microsoft Visual C++ 2010 x86 Redistributable.

I tried creating a copy of that release dll as 100D.dll, but predictably Firefox Debug kacked trying to find an entry point. Failure occurs both running FF through mozmill, or by running the .exe directly as an end user by double clicking on it without that .dll.

I am digging around online to see if I can find one.

If you've any additional suggestions I will try those.
I spoke with Henrik in the AAE mtg, I will just try opt for now.
I'm seeing success=true with my WindowsXP SP3 system, with released versions of Nightly, Aurora, Beta, Release, and Esr17, as the interim patch intends.

And transcribed from IRC
Andrei has also seen the same success=true in a Win8 vm with regular Nightly also (not debug Nightly, he has the same missing .dll issue reported in comment 38).

Andreea sees the same success on Linux 12.04 64bit with Aurora, Beta Debug builds.
Attached patch Patch v1.1 (obsolete) — Splinter Review
This is the final patch with the debug statement removed. It now also works perfectly on my macbook with a very slow debug build. There is no timeout anymore.
Attachment #756084 - Flags: review?(ctalbert)
Comment on attachment 755849 [details] [diff] [review]
Patch v1

Review of attachment 755849 [details] [diff] [review]:
-----------------------------------------------------------------

I got
> ************ SUCCESS=true
with patch applied, on:
- Windows 7 (64bit) + Nightly, Aurora, ESR17
- Debian GNU/Linux 7.0 (32 bit) + Nightly, Aurora, Beta, ESR17
Attachment #755849 - Flags: feedback?(xabolcs) → feedback+
Comment on attachment 756084 [details] [diff] [review]
Patch v1.1

Review of attachment 756084 [details] [diff] [review]:
-----------------------------------------------------------------

Henrik, regarding to my comment in bug 865641 comment #3,
you should nuke the sleep statement out from
|DOMContentLoadedHandler| near L180 [1], IMHO.




[1]: https://github.com/mozilla/mozmill/blob/300232fcf34d/mozmill/mozmill/extension/resource/modules/windows.js#L180
Attachment #756084 - Flags: feedback-
Comment on attachment 756084 [details] [diff] [review]
Patch v1.1

Review of attachment 756084 [details] [diff] [review]:
-----------------------------------------------------------------

This patch as written looks good. And yes, let's also see if we can remove that sleep in the file too while we're at it since doing this properly might mean we no longer need it. :-)
Attachment #756084 - Flags: review?(ctalbert) → review+
Andreea also mentioned to me that some tests are failing on OS X. Especially one of the new ones like controller/window_focus.js. I did a sneak peak on it and found one problem in our current code. As it looks like we not always get a reference to a ChromeWindow when opening new windows via window.open(). That returns a normal Window object which is not able to handle the load event! That means that we will always fail in executing onWindowLoaded and timeout in the constructor of the MozMillController class.

I modified my latest patch and it looks like that it is working now.
(In reply to Szabolcs Hubai (:xabolcs) from comment #43)
> Henrik, regarding to my comment in bug 865641 comment #3,
> you should nuke the sleep statement out from
> |DOMContentLoadedHandler| near L180 [1], IMHO.

Wrong referenced comment here? That doesn't match. And no, we cannot remove this sleep comment given that error pages will not be loaded correctly anymore. While the internal mutt tests are passing our safebrowsing tests in mozmill-tests will fail. Something might be wrong with the mutt tests, which we have to look at.

Also I don't see any of the dead object failures. Which tests are affected and which platforms? Please try again with the next patch I'm going to upload here in a moment.
Attached patch Patch v2Splinter Review
Lets get feedback if that fixes any remaining window not initialized failures. Also I would be interested in if the dead object issue is still present for Szabolcs.
Attachment #755849 - Attachment is obsolete: true
Attachment #756084 - Attachment is obsolete: true
Attachment #755849 - Flags: feedback?(andreea.matei)
Attachment #756940 - Flags: feedback?(xabolcs)
Attachment #756940 - Flags: feedback?(andreea.matei)
No changes with dead object, but the patches eliminates the window initialization problem.
Please note that Application restarts after httpd.js error.
After that all tests passes.
Comment on attachment 756940 [details] [diff] [review]
Patch v2

With Fx 22.0 beta 3 on Linux 7.0 32 bit I still see the dead object exception.
Attachment #756940 - Flags: feedback?(xabolcs) → feedback-
Comment on attachment 757252 [details]
Sample output with patch v2 + Fx 22.0 beta 3

As mentioned earlier on this bug I need the full debug output. Therefore also comment in all the dump statements within windows.js. Only with that I might be able to figure out what's failing for you. Please try to get me the information soon so we can really get this bug fixed. Thanks.
Attachment #757252 - Attachment is obsolete: true
Blocks: 865641
Blocks: 872963
No longer blocks: 865641
Comment on attachment 756940 [details] [diff] [review]
Patch v2

Review of attachment 756940 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see the issue anymore on linux. On windows I can't run the debug build due to the dll file still "missing", but with a normal build the failure is gone.
Attachment #756940 - Flags: feedback?(andreea.matei) → feedback+
(In reply to Szabolcs Hubai (:xabolcs) from comment #48)
> No changes with dead object, but the patches eliminates the window
> initialization problem.
> Please note that Application restarts after httpd.js error.
> After that all tests passes.

Given that we all don't see this failure on each of our supported platforms I would suggest we get a new bug filed for this issue. I don't want to wait longer to get it landed, because it majorly blocks us from getting in additional patches. So please file a new bug.
Comment on attachment 756940 [details] [diff] [review]
Patch v2

Review of attachment 756940 [details] [diff] [review]:
-----------------------------------------------------------------

Clint, this version of the patch adds one more line to ensure that we always work on a ChromeWindow inside the observer callback for a new top level window. That will fix all the problems we have seen. The issue from Xabolcs we will cover in another bug.
Attachment #756940 - Flags: review?(ctalbert)
Confirming it fixes the problem on OSX
Latest v2 patch appears to be working ok for me on my slow windows machine with release(non-debug) Nightly and Aurora.
Depends on: 879491
Comment on attachment 756940 [details] [diff] [review]
Patch v2

Not sure why we need to have the chromewindow, but it seems a safe enough change.
Attachment #756940 - Flags: review?(ctalbert) → review+
Landed as:
https://github.com/mozilla/mozmill/commit/1704e674648270a786111a377647d782b590fd50
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [mozmill-2.0+] → [ateamtrack: p=mozmill q=2013q2 m=3][mozmill-2.0+]
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: