Closed Bug 572503 Opened 14 years ago Closed 14 years ago

[mozmill] Timeout failure in testSafeBrowsingNotificationBar.js

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: u279076, Assigned: whimboo)

References

()

Details

(Whiteboard: [mozmill-test-failure])

Attachments

(11 obsolete files)

MODULE: firefox\testSecurity\testSafeBrowsingNotificationBar.js     
TEST:   testNotificationBar	
FAIL:   timeout exceeded for waitForEval('subject.targetURL.value == subject.currentURL')
BRANCH: default, 1.9.2
Assignee: nobody → aaron.train
Whiteboard: [mozmill-test-failure]
Status: NEW → ASSIGNED
Here's what I know from running this test and it's associated test-suite (Security) at 25x across default (trunk) and 25x across 1.9.2: 

- I am getting mixed results between passes and fails on both branches, i.e., sometimes 5 passes in a row, sometimes 4 fails in a row

- On failed runs, I am meeting the timeout exceeded for waitForEval('subject.targetURL.value == subject.currentURL') line. This fail traces down to line #158 which has an assertLoadedUrlEqual which is timing out. While debugging, I am not indicating a problem with it's arguments.

- Commenting out this check will pass the test

So this is really just a racing problem. As there is a waitForPageLoad prior to this check, I am not sure why the assertLoadedUrlEqual is having issues

Anthony, any idea here to help me out?
(In reply to comment #1)
> Here's what I know from running this test and it's associated test-suite
> (Security) at 25x across default (trunk) and 25x across 1.9.2: 
> 
> - I am getting mixed results between passes and fails on both branches, i.e.,
> sometimes 5 passes in a row, sometimes 4 fails in a row
> 
> - On failed runs, I am meeting the timeout exceeded for
> waitForEval('subject.targetURL.value == subject.currentURL') line. This fail
> traces down to line #158 which has an assertLoadedUrlEqual which is timing out.
> While debugging, I am not indicating a problem with it's arguments.
> 
> - Commenting out this check will pass the test
> 
> So this is really just a racing problem. As there is a waitForPageLoad prior to
> this check, I am not sure why the assertLoadedUrlEqual is having issues
> 
> Anthony, any idea here to help me out?

You can use Venkman (add-on) to debug the values of targetURL and currentURL in the assertLoadedUrlEqual method.  Alternatively, you can throw a couple of alerts into that method to show you the values during test run:

controller.window.alert(targetURL.value + " | " + currentURL);

This will give you an idea of what values exist and maybe this will give you a clue based on what you see happening in the browser during this test.
(In reply to comment #2)

> controller.window.alert(targetURL.value + " | " + currentURL);

In situations where we have a loop like in this case I prefer to use Cu.reportError. It will log lines to the error console without popping up a dozen of modal alert dialogs.
Attached image timeout issue (obsolete) —
As you can see in the screenshot attached, there is a url mismatch when the test clicks the 'Get me out of here' button in the checkGetMeOutOfHereButton function. In some runs, the waitForPageLoad seems to be ignored on the button press and the check is run immediately. As you can see, even the notification bar hasn't rolled up yet. So what can we do to force the test to wait for the actual page (default homepage) to be loaded completely here?
And by "wait for the actual page to be fully loaded" that would include changing the location bar's current address. It appears as though the current test in some runs  flies right over the controller.waitForPageLoad(); which doesnt give it enough time to make the changes.
Check the controller API for alternative wait methods you could be using.  Otherwise, you might have to just separate the two tasks with a sleep.  Henrik, please advise.
We could wait that the notification bar has been removed before calling assertLoadedUrlEqual or even waitForPageLoad.
Version: Trunk → 3.5 Branch
(In reply to comment #7)
> We could wait that the notification bar has been removed before calling
> assertLoadedUrlEqual or even waitForPageLoad.

Can we get an id to the notification bar or would I just use the available button? How would wait till node not exist?
(In reply to comment #6)
> Check the controller API for alternative wait methods you could be using. 
> Otherwise, you might have to just separate the two tasks with a sleep.  Henrik,
> please advise.

Anthony - No difference in testing with a controller.sleep(1000) between the two waits, still getting fails.
Henrik - I think that could be a solution. We really just want to give it a second to close the notification bar, load the default homepage completely and then do the check. To try this out, I'm a little stumped thinking about how would I go about waiting for the element (I'll use the button) to removed? I know we can check with assertNodeNotExist, but how would one wait in this case? Example?
(In reply to comment #9)
> would I go about waiting for the element (I'll use the button) to removed? I
> know we can check with assertNodeNotExist, but how would one wait in this case?
> Example?

Explained yesterday on IRC. Use the waitForEval function and check for ".getNode() == undefine" in the eval string.
(In reply to comment #10)
> (In reply to comment #9)
> > would I go about waiting for the element (I'll use the button) to removed? I
> > know we can check with assertNodeNotExist, but how would one wait in this case?
> > Example?
> 
> Explained yesterday on IRC. Use the waitForEval function and check for
> ".getNode() == undefine" in the eval string.

Apparently, it's not enough as the test run is timing out during some of my test runs (mixed passes and fails when using controller.waitForEval("subject.getNode == undefined", gTimeout, 100, button.getNode());

I've even padded in between the waits with sleeps but it's not making a difference. Any other ideas here on this one?
Means you can still access the button, even the notification bar has been closed? In such a case please use DOMi to check properties and attributes of the button or the notification bar. Normally the lookup string should be invalid. What about "== null"? Not sure what getNode() returns in such a situation.
These nodes should not exist when the notification bar closes, correct?. When we call waitThenClick on the button, the notification bar should close normally. From my testing, we still have access to the button, which seems to me like we need some other way to fully wait
Please check with DOMi if the notification bar is still be present after we clicked the button. If you have problems with it, please follow-up on IRC. Thanks.
So as we might have hinted at on Friday, there may be a Mozmill bug out of this about the controller retaining previous values as Henrik suggested in IRC [1]. Henrik, is there anything you can check out in mozmill's code that might suggest this? 

[1] In the checkGetMeOutOfHereButton function, after the call to waitThenClick, the notification bar closes after the button click. I insert a sleep(1000) and a controller.assertNodeNotExist(button); and get mixed pass and fails which shouldn't be the case. There should be a 100% pass on that as the button and notification bar is removed.
Secondly, I think another test failure is related. Please see this test failure

http://brasstacks.mozilla.com/couchdb/mozmill/_design/reports/_show/report/7702250e7c7d11dfa3c8005056a533eb

Unexpectedly found element Lookup: /id("main-window")/id("browser")/id("appcontent")/id("content")/anon({"anonid":"tabbox"})/anon({"anonid":"panelcontainer"})/{"id":"panel12770462957451"}/{"value":"blocked-badware-page"}/anon({"type":"critical"})/{"class":"messageCloseButton tabbable"}

This is another notification bar, the button is closed and yet it is unexpectedly found in a lookup after it closes
Can you please create a very minimalistic Mozmill test which demonstrates that behavior and can help us to narrow down the problem?
Attached patch v1 patch (obsolete) — Splinter Review
Good news. This issue is much simpler than thought previously. The issue is not giving enough time for the notification bar to popup before clicking on its button, so we need to wait after we click the ignore warning for a second with a sleep. After multiple passes across 1.9.2, and default this is good. I am not receiving both test fails anymore [1], [2].

[1] - See bug url
[2] - See comment #16
Attachment #452733 - Flags: review?(hskupin)
Comment on attachment 452733 [details] [diff] [review]
v1 patch

Please always ask Anthony for review first. Thanks.
Attachment #452733 - Flags: review?(hskupin) → review?(anthony.s.hughes)
Comment on attachment 452733 [details] [diff] [review]
v1 patch

I cannot commit this patch due to an empty commit message.  Make sure you add your export section to the patch:

hg export tip > path/to/patch/patchName
Attachment #452733 - Flags: review?(anthony.s.hughes) → review-
Attached patch v1.1 (obsolete) — Splinter Review
Try this
Attachment #452733 - Attachment is obsolete: true
Attachment #452816 - Flags: review?(anthony.s.hughes)
Comment on attachment 452816 [details] [diff] [review]
v1.1

I'm going to deny this patch for the time being.  We should find a better way than using sleep().  Aaron, there may be an element/attribute we can assert to make sure the notification bar is visible and in a clickable state.  I noticed that the xul:notificationbox element has a currentNotification attribute which is null when the notification bar is hidden.  Please investigate further.
Attachment #452816 - Flags: review?(anthony.s.hughes) → review-
Attached file testcase (obsolete) —
I stripped away code from the actual test and have a testcase attached that is giving me the test fail on every run. I don't know for certain if the issue is with notification bars or something else, it's over my head here.
(In reply to comment #23)
> Created an attachment (id=453096) [details]
> testcase
> 
> I stripped away code from the actual test and have a testcase attached that is
> giving me the test fail on every run. I don't know for certain if the issue is
> with notification bars or something else, it's over my head here.

If we comment out the browsing to checkNoPhishing, this will not produce the test fail. Therefore it's something involved with going forward to a page, closing it and going back.
> If we comment out the browsing to checkNoPhishing, this will not produce the
> test fail. Therefore it's something involved with going forward to a page,
> closing it and going back.

I'm seeing some interesting behaviour if you run through the steps manually with DOMi open.

When we first load the page with the notification bar, there is a notification element under the panel############ element.  However, if we click refresh then Ignore this Warning again, the notification element appears under a child of the panel element called xul:browser.

Try this manually to see if you can reproduce that behaviour.  It might be a key to this issue.
Tested the example test under os x and windows with Namoroka. I can't reproduce the failure. :/
(In reply to comment #26)
> Tested the example test under os x and windows with Namoroka. I can't reproduce
> the failure. :/

FWIW, I was only able to reproduce this failure reliably under Linux on 1.9.2 and default.  Please refer to my spreadsheet.
(In reply to comment #25)
> > If we comment out the browsing to checkNoPhishing, this will not produce the
> > test fail. Therefore it's something involved with going forward to a page,
> > closing it and going back.
> 
> I'm seeing some interesting behaviour if you run through the steps manually
> with DOMi open.
> 
> When we first load the page with the notification bar, there is a notification
> element under the panel############ element.  However, if we click refresh then
> Ignore this Warning again, the notification element appears under a child of
> the panel element called xul:browser.
> 
> Try this manually to see if you can reproduce that behaviour.  It might be a
> key to this issue.

I can reproduce this in 1.9.2 and default on OSX.
Anthony, in the test, after we return to the reported web forgery page, if we If we stick a controller.refresh() this seems to do the trick as I am unable to reproduce a fail. I suspect that there are two notification bars open.  Would you accept a patch for this in the interim?
I'm more in favour an actual fix for the problem than a workaround for the problem.  Can you please attach a WIP patch to show me the code for what you propose?
Attached patch [wip] v1.2 patch (obsolete) — Splinter Review
Attachment #452816 - Attachment is obsolete: true
In the patch above, the second refresh() is there to fix a second test failure I was randomly receiving in this test that would fail on the  assertNodeNotExist in the checkXButton test.
(In reply to comment #32)
> In the patch above, the second refresh() is there to fix a second test failure
> I was randomly receiving in this test that would fail on the 
> assertNodeNotExist in the checkXButton test.

I'm thinking the correction here needs to happen in the checking code.  In the manual test, the tester does not refresh the page.  Clearly we are checking against two notification bars (not the same one twice).  I think the solution here is to make sure you are doing the assertions on the right notification bar, not refreshing the page thereby ensuring only one notification bar to assert.  Our code should be robust enough to handle any number of concurrent notification bars in a single session.

Henrik, what are your thoughts on this?
Well, if we really have two notification bars open we should make sure that the one from the first test gets closed before we run the next test. I'm happy with any solution here. refresh is one way. I would prefer to simply call controller.open again.
(In reply to comment #34)
> Well, if we really have two notification bars open we should make sure that the
> one from the first test gets closed before we run the next test. I'm happy with
> any solution here. refresh is one way. I would prefer to simply call
> controller.open again.

My concern with this is the difference in UX...

Manual:
1. Load page
2. Ignore Warning
3. Get Me Out Of Here
4. Click BACK
5. Ignore Warning
6. Click X

Mozmill:
1. Load page
2. Ignore Warning
3. Get Me Out Of Here
4. Click BACK
5. Refresh or Load Page again
6. Ignore Warning
7. Click X

I'd say this points to a problem in controller.back() if we have to invoke step 5 in Mozmill.
Am able to reproduce the same test failure after multiple runs by invoking controller.back() in replace of controller.goBack()
Woops that should read controller.open()
Attached patch [wip] v1.3 patch (obsolete) — Splinter Review
If we rearrange our test checks alongside invoking open() rather than goBack(), to conduct as follows, the failure is removed:

1. Open http://www.mozilla.com/firefox/its-a-trap.html
2. Click the 'Ignore This Warning Button'
3. Notification is [visible] - Click the 'Get me out of here!' button
4. Notification is [removed] - Open http://www.mozilla.com/firefox/its-a-trap.html
5. Click the 'Ignore This Warning Button'
6. Notification is [visible] - Click the 'This isn't a web forgery' button, tap is closed after checks
7. Open http://www.mozilla.com/firefox/its-a-trap.html
8. Click the 'Ignore This Warning Button'
9. Notification is [visible] - Click the X (close) button
10. Repeat 1-9 with http://www.mozilla.com/firefox/its-an-attack.html
Attachment #453158 - Attachment is obsolete: true
Aaron, can you please check bug 557801 and the patch from Anthony. I would assume that the way he is using will also help us here. Shuffling things around isn't a real fix for our problem.
Attached patch v1.2 patch (obsolete) — Splinter Review
Checking for currentNotification is the actual solution here. As per investing Anthony's patch in bug 557801 and taking Henrik's advice above. The currentNotification attribute value of the panel is set to null as soon as a button press occurs and we need to wait for the change by checking for null on an waitForEval
Attachment #453478 - Attachment is obsolete: true
Attachment #453730 - Flags: review?(anthony.s.hughes)
Comment on attachment 453730 [details] [diff] [review]
v1.2 patch

>-const gDelay = 0;
>+const gDelay = 100;
> const gTimeout = 5000;
> 
All constants should be in ALL_CAPS.

> var setupModule = function(module)
>@@ -149,8 +150,14 @@
>                                    "safebrowsing.getMeOutOfHereButton.label");
>   var button = tabBrowser.getTabPanelElement(tabBrowser.selectedIndex,
>                                              '/{"value":"blocked-badware-page"}/{"label":"' + label + '"}');
You should move getting the button until we know a notification exists to grab from.

>+  var notificationPanel = tabBrowser.getTabPanelElement(tabBrowser.selectedIndex, "");
>+  
You can simply call this "panel".  Also, simply calling tabBrowser.getTabPanelElement() should yield the same result since both parameters are optional.

>+  // Check that the notification bar is not visible before loading the page
>+  controller.waitForEval("subject.notification.currentNotification == null", gTimeout, gDelay,
>+                        {notification: notificationPanel.getNode()});
I believe you can do "subject.currentNotification" and just use panel.getNode() as the subject (no {}).
Attachment #453730 - Flags: review?(anthony.s.hughes) → review-
Scratch checking and pending on the status on the currentNotification, as this is not the primary issue. Using the patch above eventually results in the same failure, (26 runs for me this afternoon ). After more testing today on trying to rid of the random test failure, I discovered that using the notification panel and calling allNotifications.length, after the button press in checkGetMeOutOfHereButton will present a value of 1, meaning that despite clicking the button and visually seeing the notification bar remove itself,  there still is a notification object in the node list. Doing this manually through DOMi you can see the currentNotification become null, yet in the test they retain a value - thus a timeout. So, after the button click, the notification panel has a nodeList length value of 1, where other checks in the similar fashion produce 0. Calling removeAllNotifications() before each call to the ignore warning page (which opens a new notification bar) does not seem to do the trick. There's something I'm not seeing about this particular button press in this particular function. Perhaps it's this code here: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2472

 I hate for this to drag on, can we get a Tabbed Browser expert here?
(In reply to comment #42)
>  I hate for this to drag on, can we get a Tabbed Browser expert here?

I'm sure Dao will be able to help us here.
(In reply to comment #43)
> I'm sure Dao will be able to help us here.

Dao, could you please have a look at comment 42? This failure happens nearly in all test-runs and we want to have it fixed soon. Thanks.
(In reply to comment #42)
> After more testing today on trying
> to rid of the random test failure, I discovered that using the notification
> panel and calling allNotifications.length, after the button press in
> checkGetMeOutOfHereButton will present a value of 1, meaning that despite
> clicking the button and visually seeing the notification bar remove itself, 
> there still is a notification object in the node list.

Maybe the problem is that the notification bar doesn't go away immediately but slides up...
Yes, that addresses the second of two reoccurring issues with this test. Although this is with sliding/animating in. See this test failure 

http://brasstacks.mozilla.com/couchdb/mozmill/_design/reports/_show/report/4893774c8f5911dfb2f8005056a533eb

For this particular issue, I believe we need a way to wait for the slide transition. During the test run we are clicking the 'List all tabs' button, when we meant to click the close X button on the notification bar.

Henrik, how do you feel about a sleep() before the waitThenClick?

As for the first issue, 
ERROR - Test Failure: {u'exception': {u'message': u"timeout exceeded for waitForEval('subject.targetURL.value == subject.currentURL')",

I have not identified the real issue. We are clicking the Get Me Out of Here button, the website changes but it does an assert from the previous location URL.
For default branch we can see what happens with this test once bug 578785 lands

For the other branches, we need a way to wait for the slide transition to complete on loading notifications and I don't know of any such event.
Attachment #453096 - Attachment is obsolete: true
Attachment #453730 - Attachment is obsolete: true
Attached file reduced-testcase (obsolete) —
Attachment #460254 - Attachment is obsolete: true
Attached file reduced-testcase (obsolete) —
Attached file reduced-testcase-debugmsgs (obsolete) —
Attached file testcase-errormsgs (obsolete) —
Looks like this is a problem with our current waitForPageLoad implementation. My patch on bug 503277 fixes the problem for me. Aaron can you please also try it?
Your latest patch doesn't apply cleanly for me on a clone of mozautomation just performed.
Assignee: aaron.train → nobody
Component: Security → Mozmill Tests
Product: Firefox → Testing
QA Contact: firefox → mozmilltests
Version: 3.5 Branch → unspecified
Lets work this out on IRC. Hopefully I can checkin that patch today.
Depends on: 582618
Depends on: 602040
Assignee: nobody → hskupin
Depends on: 604832
No longer depends on: 582618, 602040
Whiteboard: [mozmill-test-failure] → [mozmill-test-failure][will be fixed by bug 604832]
Bug 604832 has only fixed that for the default branch. Looks like we eventually miss something on 1.9.2
Assignee: hskupin → nobody
Status: ASSIGNED → NEW
No longer depends on: 604832
Whiteboard: [mozmill-test-failure][will be fixed by bug 604832] → [mozmill-test-failure]
Depends on: 604832
Attachment #460255 - Attachment is obsolete: true
Attachment #460544 - Attachment is obsolete: true
Attachment #452016 - Attachment is obsolete: true
Attachment #460543 - Attachment is obsolete: true
Fixed with mozmill 1.5.1
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
This should have been fixed by my patch on bug 602040.
Assignee: nobody → hskupin
Depends on: 602040
No longer depends on: 604832
Move of Mozmill Test related project bugs to newly created components. You can
filter out those emails by using "Mozmill-Tests-to-MozillaQA" as criteria.
Product: Testing → Mozilla QA
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: