Closed Bug 790191 Opened 12 years ago Closed 12 years ago

Mozmill tests are failing with timeouts in waitForPageLoad() due to 'about:newtab'

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect, P1)

Tracking

(firefox18 fixed, firefox19 fixed)

RESOLVED FIXED
Tracking Status
firefox18 --- fixed
firefox19 --- fixed

People

(Reporter: vladmaniac, Assigned: whimboo)

Details

(Whiteboard: [mozmill-test-failure][blocked by bug 799433] s=121001 u=failure c=awesomebar p=1)

Attachments

(2 files, 5 obsolete files)

This happened today on Firefox 18. 
Mozmill version: 1.5.18 
Platform: Mac OS X 10.8 (x86_64)
Report link: http://mozmill-ci.blargon7.com/#/functional/report/671677a5d9d5ca25f3cf5ae1c4697387

Even if this seems to be just a hiccup, we should at least monitor it.
Whiteboard: [mozmill-test-failure]
This next failure report shows the exact same error, but in multiple tests:
http://mozmill-ci.blargon7.com/#/functional/report/671677a5d9d5ca25f3cf5ae1c46a1fa8
So this was not a one-time failure as I had suspected, but it is still intermittent:
http://mozmill-ci.blargon7.com/#/functional/report/671677a5d9d5ca25f3cf5ae1c46b0b2b
Could be that a single testrun was affected by an issue with httpd.js.
OS: Mac OS X → All
Priority: -- → P2
Hardware: x86_64 → All
(In reply to Henrik Skupin (:whimboo) from comment #3)
> Could be that a single testrun was affected by an issue with httpd.js.

I'm afraid we have to look into this at some point. Builds are still affected, though not that often.
This appears to be affecting every Nightly on Mac OS X 10.8 (French locale) since 2012-09-11, and only en-US from 2012-09-11 to 2012-09-12. The timeout is after loading the URL defined by window.BROWSER_NEW_TAB_URL.
Whiteboard: [mozmill-test-failure] → [mozmill-test-failure] s=q3 u=failure c=awesomebar p=1
I will take this. I first need a testing environment on Mac OS X 10.8, but this will be my P1 for today, unless you guys have other ideas of course.
Assignee: nobody → vlad.mozbugs
Status: NEW → ASSIGNED
This is very strange.

As a preliminary investigation, I can reproduce the error, but only if I run the whole functional testrun. If I run the test on its own from the command line, it works as expected.
The setup module fails with a timeout error in tabs.closeAllTabs(), which closes all tabs and opens about:blank. The test times out there, when opening about:blank, as Dave pointed out in comment 5.
If I change the URL defined by window.BROWSER_NEW_TAB_URL from 'about:newtab' to 'about:blank', or in fact to any webpage, the test works as expected and we have a pass. Therefore, the preliminary conclusion points to a problem with about:newtab in this particular case.
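To make the symptom concrete, here is a minimal sketch of a polling wait that times out when a page never signals that it has loaded. It is only a model of the behaviour described above, not Mozmill's implementation: `wait_for`, `FakeTab` and `open_and_wait` are hypothetical names, and treating 'about:newtab' as a URL that never finishes loading is an assumption made purely for illustration.

```python
import time


def wait_for(condition, timeout=5.0, interval=0.1):
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # Same wording as the failure message quoted in this bug's old summary.
    raise TimeoutError("Timeout waiting for page loaded.")


class FakeTab:
    """Hypothetical stand-in for a browser tab; not a Mozmill class."""

    def __init__(self, stuck_urls):
        self.stuck_urls = set(stuck_urls)
        self.loaded = False

    def open(self, url):
        # Assumption for illustration: URLs in `stuck_urls` never report
        # that they finished loading.
        self.loaded = url not in self.stuck_urls


def open_and_wait(tab, url, timeout=0.3):
    """Open `url` in `tab` and wait until the tab reports it as loaded."""
    tab.open(url)
    return wait_for(lambda: tab.loaded, timeout=timeout, interval=0.05)


if __name__ == "__main__":
    tab = FakeTab(stuck_urls={"about:newtab"})
    print(open_and_wait(tab, "about:blank"))
    try:
        open_and_wait(tab, "about:newtab")
    except TimeoutError as error:
        print(error)
```

In this model, swapping the URL is enough to turn a timeout into a pass, which mirrors the observation above without saying anything about the real cause.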
As we agreed in the 'ask an expert' session today, we are disabling this test for Mac. Here is the patch, tested and internally reviewed by Alex.
Attachment #662994 - Flags: review?(hskupin)
Attachment #662994 - Flags: review?(dave.hunt)
The patch was fine for disabling the test for just Mac, however this issue is not just occurring on Mac... Why was it decided to do this? I'll hold off backing out the patch until we have an answer here as I joined the 'ask an expert' session part-way through this conversation.
Vlad pointed that out when I asked. But you are right, Dave. The link given in the URL field is wrong. Vlad, next time please make sure to select all platforms and all versions.

We should create a new patch which disables all platforms which should not be based on the last one but on the original content. Once that patch is ready we will backout the formerly one and directly land the new skip patch.

I would kinda appreciate that we check for a regression range, because it could be a bug in Firefox.
Dave, when landing skip patches please take care of the flags next time.
Whiteboard: [mozmill-test-failure] s=q3 u=failure c=awesomebar p=1 → [mozmill-test-failure][mozmill-test-skipped] s=q3 u=failure c=awesomebar p=1
We fail if we remove all browser history and then access about:newtab.
The failure in testGoButton.js is just a coincidence; it could have been any test at all.
I cannot reproduce this manually, but if I run the simple test within the functional testrun, it fails all the time, at least on Mac OS X 10.8.
Interesting, so what are your next steps here to get the full details why it fails? There are still some different factors in this testcase which could cause the problem.
What I have done so far is try to reproduce this by manually clearing all the history, but I had no luck. Yesterday I was looking into the Firefox code for clearing history.

It could be related to about:newtab: a bug in this feature in its interaction with clearing history.
Or it could be something in our testing framework, but I have nothing conclusive at the moment.
What about reducing the testcase even further or finally starting a hg bisect? Those two things would be the most valuable actions on this bug.
(In reply to Henrik Skupin (:whimboo) from comment #18)
> What about reducing the testcase even further or finally starting a hg
> bisect? Those two things would be the most valuable actions on this bug.

Thanks for the tips. I will start on those
(In reply to Henrik Skupin (:whimboo) from comment #12)
> We should create a new patch which disables all platforms which should not
> be based on the last one but on the original content. Once that patch is
> ready we will backout the formerly one and directly land the new skip patch.

Any progress on this? I think some of our other tests are failing for the same reason. For example, see bug 794400.
(In reply to Dave Hunt (:davehunt) from comment #20)
> (In reply to Henrik Skupin (:whimboo) from comment #12)
> > We should create a new patch which disables all platforms which should not
> > be based on the last one but on the original content. Once that patch is
> > ready we will backout the formerly one and directly land the new skip patch.
> 
> Any progress on this? I think some of our other tests are failing for the
> same reason. For example, see bug 794400.

Well, there is some progress, but I have not yet found the reason why this is happening.
I should really try the bisect today
Blocks: 794400
This is now a P1. Thanks Dave for making the bridge to other tests.

(In reply to Maniac Vlad Florin (:vladmaniac) from comment #21)
> I should really try the bisect today

Please do not try to do it. It has to be done today! Make it your top priority please.
Priority: P2 → P1
Blocks: 794392
Are you sure that the failures of the other tests are connected to this one?
I do not think that bug 794392 is related, but bug 794400 fails in the closeAllTabs method, so yes, I think this is related.
I'm 99% sure that it's related too, because we make use of closeAllTabs() for each iteration of the endurance test. I don't know of any other failure we currently have which is connected to waitForPageLoad().
Fair enough, so it's really important that we get this fixed!
As I said earlier on IRC, this is happening only within the functional testrun; it does not happen when running the test manually via the command line, so I am tempted to assume we have something wrong there. I was bisecting mozilla-central without any luck in finding a bad Firefox changeset.
I really don't understand your latest statement Vlad. You have attached a simplified testcase which also showed the problem without having to run the whole tests. So why are you stepping back again? If it's clearly reproducible with the test why don't you use it for the regression test?
Comment on attachment 663371 [details]
simple testcase to demonstrate when we fail

So the problem here is that we are cloning the repository into a temporary location like:

"/var/folders/wd/zmy4z7xn7wd7sjq90z1y52f80000gn/T/tmpQ6YQZ2.mozmill-tests".

So something is not working when the tests are located there. Not sure yet if it is a failure in httpd.js or Firefox itself. Will check tomorrow morning.
Attachment #663371 - Attachment is obsolete: true
So here the notes what I did:

1. I have checked how we call Mozmill from our functional testrun:

http://hg.mozilla.org/qa/mozmill-automation/file/416592141962/libs/testrun.py#l340

2. The only item which made me wonder was the 'self._mozmill.tests' property. All others shouldn't be involved. So I have updated it to not use the cloned repository in the tmp folder but my default one under /data/code/mozmill-tests/nightly -> that was working

3. Execute a 'mkdir /private/var/folders/wd/zmy4z7xn7wd7sjq90z1y52f80000gn/T/tmpQ6YQZ2.mozmill-tests/' and copy all the mozmill-tests file in that folder.

4. Run 'mozmill -b %path% -t /private/var/folders/wd/zmy4z7xn7wd7sjq90z1y52f80000gn/T/tmpQ6YQZ2.mozmill-tests/tests/functional/ -> same failure happens.

5. It does not happen when you only run the awesomebar tests. So one of the addon tests is causing this problem.
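Steps 3 and 4 above can be sketched in Python as follows. This is a rough sketch under assumptions: `stage_tests_in_tmp` and `build_mozmill_command` are hypothetical helpers, not part of mozmill-automation, and the script only builds the command line (using the `-b` and `-t` options quoted in step 4) rather than launching Mozmill. It needs Python 3.8 or later for `dirs_exist_ok`.

```python
import os
import shutil
import tempfile


def stage_tests_in_tmp(source_repo):
    """Copy a mozmill-tests checkout into a fresh temporary folder.

    The '.mozmill-tests' suffix imitates the folder name seen in the
    failing testrun; the random part of the name will differ.
    """
    staged = tempfile.mkdtemp(suffix=".mozmill-tests")
    # mkdtemp already created the folder, so allow copying into it.
    shutil.copytree(source_repo, staged, dirs_exist_ok=True)
    return staged


def build_mozmill_command(binary, tests_root):
    """Build (but do not run) the command line from step 4."""
    functional = os.path.join(tests_root, "tests", "functional")
    return ["mozmill", "-b", binary, "-t", functional]


if __name__ == "__main__":
    # Placeholder paths; replace them with a real checkout and binary.
    staged_root = stage_tests_in_tmp("/data/code/mozmill-tests/nightly")
    print(" ".join(build_mozmill_command("/path/to/firefox", staged_root)))
```

Running the printed command against the staged copy and against the original checkout should show whether the location of the tests alone makes a difference.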

Vlad, please continue the investigation today with the information from above.
Blocks: 794750
(In reply to Henrik Skupin (:whimboo) from comment #30)
> 5. It does not happen when you only run the awesomebar tests. So one of the
> addon tests is causing this problem.
> 
> Vlad, please continue the investigation today with the information from
> above.
I ran with manifest files today, but can't reproduce that failure when running with
>[include:testAddons/manifest.ini]
>[include:testAwesomeBar/manifest.ini]
(In reply to Henrik Skupin (:whimboo) from comment #30)
> So here the notes what I did:
> 
> 1. I have checked how we call Mozmill from our functional testrun:
> 
> http://hg.mozilla.org/qa/mozmill-automation/file/416592141962/libs/testrun.
> py#l340
> 
> 2. The only item which made me wonder was the 'self._mozmill.tests'
> property. All others shouldn't be involved. So I have updated it to not use
> the cloned repository in the tmp folder but my default one under
> /data/code/mozmill-tests/nightly -> that was working
> 
> 3. Execute a 'mkdir
> /private/var/folders/wd/zmy4z7xn7wd7sjq90z1y52f80000gn/T/tmpQ6YQZ2.mozmill-
> tests/' and copy all the mozmill-tests file in that folder.
> 
> 4. Run 'mozmill -b %path% -t
> /private/var/folders/wd/zmy4z7xn7wd7sjq90z1y52f80000gn/T/tmpQ6YQZ2.mozmill-
> tests/tests/functional/ -> same failure happens.
> 
> 5. It does not happen when you only run the awesomebar tests. So one of the
> addon tests is causing this problem.
> 

There is no need to do all that. Just have a test which uses closeAllTabs before the simple testcase, and it will reproduce in the command line environment.
I was not able to build a testcase in a single file; we need two files to be run for this to reproduce.
The final scenario will be:
1. have the simple test file and, for example, testManagerKeyboardShortcut in the same folder
2. run the folder with mozmill -t on the command line

It will fail, at least it does for me. It's very strange that if we put another test there instead of testManagerKeyboardShortcut, it will pass locally. If we use the testrun script, for example, it's most likely to fail frequently; you just need more than two tests in the folder, and both need to make use of tabs.closeAllTabs() in setupModule.

Both Alex and I were looking at this today, but sadly we need more time as nothing has shown up.

We also tried to do an hg bisect, but no Firefox changeset turned out to be bad.
Whiteboard: [mozmill-test-failure][mozmill-test-skipped] s=q3 u=failure c=awesomebar p=1 → [mozmill-test-failure][mozmill-test-skipped] s=121001 u=failure c=awesomebar p=1
Attached file simple test 1 (obsolete) —
This is the first simple test file. 
Sorry to say it would contain a dependency, 'prefs' because we need to set a pref when closing all tabs. I decided to leave it there as an exception to our simple testcases rule because otherwise it will complicate the code in the test and setting a pref does not have anything to do with our error.

I've also tested it on other machines and it does not reproduce on all of them.
I wanted to create a screencast, and the strange thing is that it does not reproduce while recording the screencast, only outside of it. This is new information, but I can't explain why at the moment.
Attached file simple test 2 (obsolete) —
* this is the 2nd simple test

Please run both tests at the same time if you want to try to reproduce the failure.
They should go in a folder under the tests/functional folder.
(In reply to Maniac Vlad Florin (:vladmaniac) from comment #33)
> Sorry to say it would contain a dependency, 'prefs' because we need to set a
> pref when closing all tabs. I decided to leave it there as an exception to

Please use the Services.jsm module to handle setting/getting prefs.

Also please combine both in a patch which makes it easier for us to run. Thanks!
(In reply to Henrik Skupin (:whimboo) from comment #35)
> (In reply to Maniac Vlad Florin (:vladmaniac) from comment #33)
> > Sorry to say it would contain a dependency, 'prefs' because we need to set a
> > pref when closing all tabs. I decided to leave it there as an exception to
> 
> Please use the Services.jsm module to handle setting/getting prefs.
> 
> Also please combine both in a patch which makes it easier for us to run.
> Thanks!

Oki doki, on it!
Blocks: 760411
No longer blocks: 760411
Reproduced accidentally also on Windows 7
http://mozmill-crowd.blargon7.com/#/functional/report/d11b1de413a0179d904e737230cf6ca5

but this is intermittent so I do not think this is mac dependent at all.
So what about the minimized testcase? We are still waiting for it here. Also what are the results of my proposal I gave you after the Ask an Expert session? I haven't seen an update since then.
(In reply to Henrik Skupin (:whimboo) from comment #38)
> So what about the minimized testcase? We are still waiting for it here. Also
> what are the results of my proposal I gave you after the Ask an Expert
> session? I haven't seen an update since then.

On Friday I was trying to set up the prerequisites for building Firefox on Mac and had some issues with Xcode. I was building Firefox using hg bisect, a divide-and-conquer (divide et impera) approach, to reduce the regression range to a single changeset. During the weekend there was probably a blackout and the PC restarted, so the minimized testcase is useless now because I can no longer reproduce the failure on my Mac box. I was trying to, then I investigated another issue today and found out that it happens again on Win 7. A Firefox build takes 5-6 hours for me, so I could not possibly be fast with this one...
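For reference, the divide-and-conquer idea behind a bisect can be sketched as a binary search over an ordered list of changesets. This is an illustration only, not how `hg bisect` is implemented: `find_first_bad` and the `is_bad` callback are hypothetical, and the sketch assumes the first changeset is good, the last one is bad, and every changeset after the first bad one is bad as well.

```python
def find_first_bad(changesets, is_bad):
    """Return the first bad changeset in an ordered list.

    Assumes changesets[0] is known good, changesets[-1] is known bad, and
    that every changeset after the first bad one is also bad.
    """
    good, bad = 0, len(changesets) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        # One call to is_bad stands for one build-and-test cycle.
        if is_bad(changesets[mid]):
            bad = mid
        else:
            good = mid
    return changesets[bad]


if __name__ == "__main__":
    # Made-up changeset names; "c37" plays the role of the regression.
    history = ["c%d" % number for number in range(100)]
    print(find_first_bad(history, lambda cset: int(cset[1:]) >= 37))
```

The number of build-and-test cycles grows only logarithmically with the size of the range, which matters when a single build takes hours. The search does rely on each verdict being trustworthy, so an intermittent failure may need several test runs per changeset before marking it good.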
You forgot the details from the meeting. As I have said it happens each day on the Linux VM we are running for Mozmill CI. So while this box is not utilized we can make sure to use it for testing.
Attached patch simple test patch v 1.0 (obsolete) — Splinter Review
I have updated the simplified testcases
No dependencies now
No manifests in the patch because we will not check this in
Attachment #667875 - Attachment is obsolete: true
Attachment #667876 - Attachment is obsolete: true
Attached patch simple test patch v1.1 (obsolete) — Splinter Review
I just realized that test1 has no blank line at the end of the file; not sure why, because locally it had one.
I just fixed that in version 1.1.
Attachment #669445 - Attachment is obsolete: true
It seems the issue is still there, but I cannot see it locally. Strange. I hope it will be OK, given that it won't be checked in.
Attached file minimized testcase
Attachment #669447 - Attachment is obsolete: true
Thankfully tinderbox builds are still available. I will use those to nail down the regression range even further.
QA Contact: hskupin
Regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=12dad118c02f&tochange=8b46964e55c9

Let's see if builds from fxteam are around. If not, I highly suspect bug 762094 of being the cause here.
fxteam builds were still around. So it's indeed a regression by bug 762094. I will file a new Firefox bug for it so we can hopefully get this addressed asap.
Depends on: 799433
Whiteboard: [mozmill-test-failure][mozmill-test-skipped] s=121001 u=failure c=awesomebar p=1 → [mozmill-test-failure][mozmill-test-skipped][blocked by bug 799433] s=121001 u=failure c=awesomebar p=1
Assignee: vlad.mozbugs → nobody
Summary: Mozmill test failure /testAwesomeBar/testGoButton.js | controller.waitForPageLoad(): Timeout waiting for page loaded. → Mozmill tests are failing with timeouts in waitForPageLoad() due to 'about:newtab'
Depends on: 764782
No longer blocks: 794750
Not sure why testGoButton.js has been disabled on OS X for all the branches. Only 18.0 and 19.0 were affected.

Backed out the patch across branches:
http://hg.mozilla.org/qa/mozmill-tests/rev/26a730907ac6 (default)
http://hg.mozilla.org/qa/mozmill-tests/rev/fbb13d16bbc4 (aurora)
http://hg.mozilla.org/qa/mozmill-tests/rev/0f71274296f8 (beta)
http://hg.mozilla.org/qa/mozmill-tests/rev/9377b609fe95 (release)
http://hg.mozilla.org/qa/mozmill-tests/rev/005fe5bc4930 (esr10)

If there are still waitForPageLoad() failures in the next few days, please feel free to reopen. Vlad, would you mind re-enabling the Litmus test for Firefox 10? Thanks.
Assignee: nobody → hskupin
Blocks: 794750
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Flags: in-litmus?(vlad.mozbugs)
Resolution: --- → FIXED
Whiteboard: [mozmill-test-failure][mozmill-test-skipped][blocked by bug 799433] s=121001 u=failure c=awesomebar p=1 → [mozmill-test-failure][blocked by bug 799433] s=121001 u=failure c=awesomebar p=1
No longer blocks: 794400
Litmus no longer available – page gets redirected to MozTrap; no existing test cases in MozTrap yet (nothing to enable)
Flags: in-litmus?(vlad.mozbugs)
Product: Mozilla QA → Mozilla QA Graveyard