Closed Bug 1069897 Opened 10 years ago Closed 10 years ago

Test failure "Window has been found" in testSecurity/testSSLStatusAfterRestart.js

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect, P1)

x86_64
Linux
defect

Tracking

(firefox34 fixed, firefox35 fixed, firefox36 fixed)

RESOLVED FIXED
Tracking Status
firefox34 --- fixed
firefox35 --- fixed
firefox36 --- fixed

People

(Reporter: AndreeaMatei, Assigned: cosmin-malutan)

References

()

Details

(Whiteboard: [mozmill-test-failure][sprint])

Attachments

(3 files, 2 obsolete files)

Please backout the causing test, there shouldn't be a need for this bug. A fix has to take place on the implementation bug.
We will skip the test, not backout because of a dependency tree.
Only linux seems to be affected.
Attached patch skip.patchSplinter Review
Attachment #8492140 - Flags: review?(andreea.matei)
For safety disabled on all platforms.
Comment on attachment 8492140 [details] [diff] [review]
skip.patch

Review of attachment 8492140 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, please land it.
Attachment #8492140 - Flags: review?(andreea.matei) → review+
So this is reproducible only on CI and it's failing on the OV certificate check, I'll investigate further.
Assignee: nobody → daniel.gherasim
Status: NEW → ASSIGNED
Priority: -- → P1
If I remove the check for 
>  27   url: "http://ssl-ov.mozqa.com/",
>  28   identity: "",
>  29   type: "unknownIdentity",
>  30   callback: checkSecurityTab_OV
>  31 }


Test passes, I will continue the investigation on Monday.
Blocks: 1025849
This only reproduces on CI and not on our local machines, that mostly because of the less memory we have on the CI which could cause race conditions.

If I put a controller.sleep(500) after we change the tab we are (changing value of the tabBrowser.selectedIndex) the test passed 200/200 times.

Here is where I added the code:
> // Check that the correct status is shown for each certificate type
> TEST_DATA.forEach(aPage => {
>   controller.waitForPageLoad();
> 
>   cert = security.getCertificate(tabBrowser.securityUI);
>   verifyCertificateStatus(aPage);
>   tabBrowser.selectedIndex++;
>   controller.sleep(500); 
> });
So along with "Window has been found" we also get a :
> controller.waitForPageLoad(URI=http://ssl-ov.mozqa.com/, readyState=complete)

It's likely that this is the first line that fails, and then we fail in waiting for the window.
Does that mean we're not waiting for the right observers in the new Window classes?
Does tabbrowser wait for the tab to load when we change selectedIndex?
(In reply to Andrei Eftimie from comment #12)
> Does that mean we're not waiting for the right observers in the new Window
> classes?

Nope. We don't use here the new window class and the specific observers we added.
We are only using the handleWindow method which just got moved from one file to another.
I don't think it's something related to how we handle windows, the failure most likely it's above this, and the window doesn't get open.

> Does tabbrowser wait for the tab to load when we change selectedIndex?

It waits for the this._tabs.getNode().selectedIndex to be equal with our expected tab.
> assert.waitFor(function () {
>      return this.selectedIndex === aIndex;
>    }, "The tab with index '" + aIndex + "' has been selected", undefined,
>     undefined, this);

We have to investigate if we can wait for a tab to be loaded (once we click on it).
(In reply to daniel.gherasim from comment #13)
> It waits for the this._tabs.getNode().selectedIndex to be equal with our
> expected tab.
> > assert.waitFor(function () {
> >      return this.selectedIndex === aIndex;
> >    }, "The tab with index '" + aIndex + "' has been selected", undefined,
> >     undefined, this);
> 
> We have to investigate if we can wait for a tab to be loaded (once we click
> on it).

This. We might need to wait for some animations to finish.
AFAIK we implement this in relation to tab loading (new tab, close tab). Maybe we need to apply the same logic to changing tabs.
Attached file testcase.js
Testcase with the failure: "Notification popup has been opened" 7/50 times.
That's similar failure with what we do in the test, and it's same problem, that doing something imediately after we change the .selectedIndex using our tabBrowser may fail sometime.

I would suggest to do a workaround here so we can have the test up & running, and fix .selectedIndex in another bug.
I think what we need here is the TabShown event. Can you please test this Daniel?

https://developer.mozilla.org/en-US/docs/Web/Events/TabShow
(In reply to Henrik Skupin (:whimboo) from comment #16)
> I think what we need here is the TabShown event. Can you please test this
> Daniel?
> 
> https://developer.mozilla.org/en-US/docs/Web/Events/TabShow

TabShow doesn't work, I think it's only triggered when it's loaded & shown.
But "TabSelect" event works locally, https://developer.mozilla.org/en-US/docs/Web/Events/TabSelect .
I'll test on the CI machine once I find one free to see if it fixes this problem.
I see. That indeed should be the correct event. Thanks for letting us know. Btw how many of our tests are using selectedIndex? I would assume this problem could be the base of many other failures we are seeing!
 (In reply to Henrik Skupin (:whimboo) from comment #18)
> I see. That indeed should be the correct event. Thanks for letting us know.
> Btw how many of our tests are using selectedIndex? I would assume this
> problem could be the base of many other failures we are seeing!

There are 11 tests which get used of this method.

So I tested with the "TabSelect" event and it's still failing. (Maybe would be good to add this event anyway in another bug for cases we select a tab).

After digging more into this, I found out what's causing this failure.
So we are clicking the tab > then call waitForPageLoad() > we click on the 'identity box' to open the notificationPopup but somehow untill the popup is open, we focus on the url bar and notification get's automatically closed before we click the moreInfoButton on it. So it's a race condition that causes this.
I'm not sure about the focus on url bar, but something it's clearely closing it really fast.
Printing the notification state after we click on the box it shows: showing, closed, closed.
(In reply to daniel.gherasim from comment #19)
> So I tested with the "TabSelect" event and it's still failing. (Maybe would
> be good to add this event anyway in another bug for cases we select a tab).

Absolutely. Lets get this fixed sooner than later. Please file a bug for it.
While for the other pages is: showing, open, open.

So maybe ssl-ov.mozqa.com it's the fastest page and that's why we fail on it.
That's one reason we should assert in the waitForNotificationPanel that we got the expected state on the panel, and not just check for the event.
The cause of this failure is that we click on "identity-popup-more-info-button" when the popup is closed for some reason, I couldn't see this failure but if I interfere with the testrun, I get exactly this issue.
Also I triggered an 'escape' event prior of clicking and I saw the issue, and if I call button.doCommand() the windows opens even if the notification has been closed.

So the issue here is not that the window hasn't been open, is that we didn't had the notification in the correct state, we have to wait for notification to close.

This line:
>http://hg.mozilla.org/qa/mozmill-tests/file/3c0b8bae9dff/firefox/tests/remote/testSecurity/testSSLStatusAfterRestart.js#l118
Should be:
>  locationBar.waitForNotificationPanel(() => {
>    moreInfoButton.click();
>  }, {type: "identity", open: false});


Also we trigger escape event on notification even if it has already closed in, this if it hits a race condition it can close something else, the try/finaly from the line above has to be removed because handling the window has nothing to do with notification, the assertion I suggest above should be sufficient.
http://hg.mozilla.org/qa/mozmill-tests/file/3c0b8bae9dff/firefox/tests/remote/testSecurity/testSSLStatusAfterRestart.js#l123
Attached patch patch v1.0 (obsolete) — Splinter Review
This patch should fix this issue.
http://mozmill-crowd.blargon7.com/#/remote/report/ad8dad905f3281cb521a23ddf310fd8b
Assignee: daniel.gherasim → cosmin.malutan
Attachment #8506021 - Flags: review?(andrei.eftimie)
Attachment #8506021 - Flags: review?(andreea.matei)
Attached patch patch v1.0 (obsolete) — Splinter Review
Sorry, wrong patch
Attachment #8506021 - Attachment is obsolete: true
Attachment #8506021 - Flags: review?(andrei.eftimie)
Attachment #8506021 - Flags: review?(andreea.matei)
Attachment #8506022 - Flags: review?(andrei.eftimie)
Attachment #8506022 - Flags: review?(andreea.matei)
Comment on attachment 8506022 [details] [diff] [review]
patch v1.0

Review of attachment 8506022 [details] [diff] [review]:
-----------------------------------------------------------------

I've got a failure with this patch applied, but at least I know that my assumption was right, the notification has been closed prior of clicking moreInfoButton button.

::: firefox/tests/remote/testSecurity/testSSLStatusAfterRestart.js
@@ +118,5 @@
> +  locationBar.waitForNotificationPanel(() => {
> +    moreInfoButton.click();
> +  }, {type: "identity", open: false});
> +
> +  windows.handleWindow("type", "Browser:page-info", aPage.callback);

We know for sure that the notification has been opend because it didn't failed to open, and that by the time we click the more info button it's closed.
http://hg.mozilla.org/qa/mozmill-tests/file/3c0b8bae9dff/firefox/tests/remote/testSecurity/testSSLStatusAfterRestart.js#l104
http://mozmill-crowd.blargon7.com/#/remote/report/ad8dad905f3281cb521a23ddf3118d25
Attachment #8506022 - Flags: review?(andrei.eftimie)
Attachment #8506022 - Flags: review?(andreea.matei)
Whiteboard: [mozmill-test-failure] → [mozmill-test-failure][sprint]
I used setTimeout to delay the closing of page-info window, and the issue was 100% reproducible. It might be the case that if the window closes after the notification has been opened, it will blur the notification which will cause it to close. That would be a reace condition, that's why we fail intermittently and we never fail on the first certificate. What I will try now to see if it fixes the issue is to retrieve the controller when handling the window, and close it explicitly with waitForWindowState.
Attached patch patch v2.0Splinter Review
So the we hit this problem because tabBrowser.selectedIndex++ doesn't correctly waits for the tab to change, and some event it's closing the notification after it opens.
Whatsoever if we open the tab, navigate to the page and check the certificate in the same loop, we don't hit that race-condition, which closes our notification so it's clearly not a bug but a test implementation weakness.

I ran over 130 rans and it didn't failed, so this fixes the issue.
http://mozmill-sandbox.blargon7.com/#/remote/reports?app=All&branch=All&platform=All&from=2014-10-17&to=2014-10-17
Attachment #8506022 - Attachment is obsolete: true
Attachment #8506875 - Flags: review?(andrei.eftimie)
Attachment #8506875 - Flags: review?(andreea.matei)
Comment on attachment 8506875 [details] [diff] [review]
patch v2.0

Review of attachment 8506875 [details] [diff] [review]:
-----------------------------------------------------------------

This looks way better. I'll wait until Monday morning to land this, it's a bit late now, with the weekend coming.
Attachment #8506875 - Flags: review?(andrei.eftimie)
Attachment #8506875 - Flags: review?(andreea.matei)
Attachment #8506875 - Flags: review+
Attachment #8506875 - Flags: checkin?(andreea.matei)
So this patch will delay the test a bit given that we now are waiting for the SSL pages to be loaded before continuing to the next tab. Why don't we try go get bug 1071566 fixed, so we wait for the other tab to be selected?
We wait anyway for the page load, we just wait in the same loop we open the tab, by this way we are sure the test works. Anyway per comment 22 it looks to be clear that bug 1071566 won't fix this issue.
Ah, missed that part. So lets get this in even we have a slightly longer overhead. At least the test is passing. Thanks Cosmin.
Comment on attachment 8506875 [details] [diff] [review]
patch v2.0

Review of attachment 8506875 [details] [diff] [review]:
-----------------------------------------------------------------

http://hg.mozilla.org/qa/mozmill-tests/rev/22fd445ad105 (default)
Attachment #8506875 - Flags: checkin?(andreea.matei) → checkin+
Transplanted:
http://hg.mozilla.org/qa/mozmill-tests/rev/805fe0b2dd29 (aurora)
http://hg.mozilla.org/qa/mozmill-tests/rev/a023470ed0c9 (beta)

Thanks!
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Has this really been tested for Beta? All of the remote testruns on Linux are failing with this specific test! It's a different problem but it should have not been landed on that branch.

http://mozmill-release.blargon7.com/#/remote/top?app=All&branch=34.0&platform=All&from=2014-10-23&to=2014-10-24
Yes I've tested this even now is not failing only if I interferer with run, I'll check it on a CI node.
Maybe it's best to checkout the re-enabling patch, or to check-in the skip again as it might be something else.
Please file a new bug for this case, and yes lets skip it if possible.
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: