Closed Bug 640877 Opened 9 years ago Closed 9 years ago

Random orange: TEST-UNEXPECTED-FAIL | test-selection.js | test_selection_extension

Categories

(Thunderbird :: Folder and Message Lists, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 5.0b1

People

(Reporter: asuth, Assigned: asuth)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

The following random orange is being pretty reliably seen on the newly unstarred linux builders:

TEST-UNEXPECTED-FAIL | test-selection.js | test_selection_extension
  EXCEPTION: Timeout waiting for events: [undefined undefined]
    at: test-folder-display-helpers.js line 143
            test-folder-display-helpers.js 143
       mark_failure([object Array]) logHelper.js 588
       FolderListener_waitForEvents() test-folder-display-helpers.js 1555
       wait_for_folder_events() test-folder-display-helpers.js 1587
       press_delete() test-folder-display-helpers.js 1318
       test_selection_extension() test-selection.js 67
            frame.js 474
            frame.js 530
            frame.js 573
            frame.js 417
            frame.js 579
            server.js 164
            server.js 168

Per arbpl logging, this is one of those exciting situations where our (delete) keypress event doesn't seem to actually be happening.  At the time of event generation, there is no focused element in the window, but I don't think that should actually affect anything (and the window proper will just catch it; this needs more investigation/remembering).

A possibility for the greater problem is that at the time we generate the failure notification, the Thunderbird window is not the active window, suggesting we may just be refusing to generate the event because the window is not focused.


This is one of those analyses where it would be nice to be able to easily see what the logging shows for a healthy run of the test.  Unfortunately, it's prohibitively expensive to log that data for successful tests for every run.

One easier way to address this without getting crazy would be to let us annotate specific runs as deserving of having their output logged every time.  We would do this by either leveraging the try server or doing a short-lived test-only type patch.

Another way would be to have runs where we do dump everything, but have that be a special buildbot job that only runs once a week or something.  The arbpl server would need to special-case these runs, but could then offer a "compare with baseline" option.
I have pushed a logging enhancement patch to try that provides additional detail in press_delete/archive_selected_messages/press_enter about what is focused when we do these things.  It's implemented as a helper in test-window-helpers so we can use it elsewhere.  Specifically, it might make sense to wrap the keypress event on the controller so we always are doing something like this.

The logging looks like so:
"in window: mail:3pane (1) focused kid: DomNode: browser#multimessage: null focused kid: DomNode: button#trash: null"

It is created by having getFocusedElementForWindow take baby steps through the window hierarchy rather than use the flag that pierces nested windows in one step.

http://hg.mozilla.org/try-comm-central/rev/19ba84db6dc3

From brief local testing, it appears like the problem likely is our focus is somehow ending up on the multimessage pane and that is breaking us, though it's not immediately obvious under what situations the multimessage pane would grab focus.  I'll see whether we can sniff focus/blur events to assist in having more insight into focus stuff.
Whiteboard: [orange] → [tb-orange]
It passed the try server (with an expected intermittent orange on the account setup thing that was green in the other mac build), and I have pushed the logging patch to trunk:
http://hg.mozilla.org/comm-central/rev/76d270866f0a
Hooray for fancy logging!:
http://arbpl.visophyte.org/?tree=Thunderbird&pushid=5657&log=19228%3A1300589912.1300591281.8389.gz

selected messages: mailbox://nobody@Local%20Folders/SelectionA#280mailbox://nobody@Local%20Folders/SelectionA#569 in window: mail:3pane (1) focused kid: browser#messagepane focused kid: null

The bloody message pane has the focus, and it's not even visible.  This makes the focus/blur sniffing sound even more useful, especially for comparative analysis purposes against a good run; I'll pursue that next with the aim of seeing one bad run with that logging and then landing an actual fix.
blur/focus logic pushed to try:
http://hg.mozilla.org/try-comm-central/rev/f730dc679775

produces exciting logging like so (this is logsploder; arbpl will be slightly different):
* winhelp blur DomNode: tree#folderTree: null in DomWindow: mail:3pane: Big Meeting Today - TabsSimpleA - Shredder
* winhelp focus in DomWindow:mailbox://.../Local%20Folders/TabsSimpleA?number=0 frame: DomNode: browser#messagepane: null in DomWindow: mail:3pane: Big Meeting Today - TabsSimpleA - Shredder
try passed with windows box abstention => blur/focus logic pushed to trunk:
http://hg.mozilla.org/comm-central/rev/e7c212dc0f4e

It appears the reason this orange happens is:

- test_selection.js never causes anything to be explicitly focused, so it just leaves whatever was previously focused focused.

- our mozmill tests appear to be run in an order that is a function of the order the OS returns them in, with linux winning the big prize for inconsistent ordering.


The right steps to take would then seem to be:

- Change the select_click_row stuff and friends to actually cause the thread-pane to be focused like it would be if we were actually clicking.  Longstanding bad on my part.  I may be able to remove a lot of explicit focusing calls that were a band-aid over this.

I think it makes sense that all semantic operations should end up having the same side-effects as the operations the user would use to trigger them.  Since we already have a _row_click_helper function, I'll try and just use that (adding ctrl/shift args) to perform the click since it's more realistic and I'm not aware of any likely failures that would occur unless we are screwing up and sizing the thread pane to have zero height, in which case we'd probably want to fail.

- Make sure that we run our mozmill tests in a consistent order, probably alphabetical.  I would not be surprised if upstream has already dealt with this...
(In reply to comment #5)
> - Make sure that we run our mozmill tests in a consistent order, probably
> alphabetical.  I would not be surprised if upstream has already dealt with
> this...

I know 1.5.2 has a manifest system, it looks like in that you need to specify every file, so presumably that would determine our order of running. Looks like bug 620471 may be better to pick up sooner rather than later.
(In reply to comment #5)
> The right steps to take would then seem to be:
> 
> - Change the select_click_row stuff and friends to actually cause the
> thread-pane to be focused like it would be if we were actually clicking. 
> Longstanding bad on my part.  I may be able to remove a lot of explicit
> focusing calls that were a band-aid over this.

I pushed a fix that attempts to do this; it passed all mozmill tests locally:
http://hg.mozilla.org/comm-central/rev/2ee7b4e1dc7b

Ideally that will fix this bug and we can spin off the ordering thing as dependent on the new version of mozmill.
OS: Linux → Windows Server 2003
I backed it out; there was a lot of breakage I was not expecting.
http://hg.mozilla.org/comm-central/rev/f1f50ea4e00c
This patch is a fixed version of my misguided previous push which has us actually generate synthetic click events (which cause focus changes as a byproduct) rather than issuing direct selection change calls.  This allows us to remove some explicit calls; in other cases where we were explicitly tracking focus across tab changes, it required additional explicit focusing because we were assuming that we could manipulate the thread tree without causing focus changes.

Additional changes:
- Some logging enhancements in test-folder-display-helpers.js that were useful in fixing the problems encountered while developing the patch.

- _row_click_helper does not accidentally assume "mc" in all cases, and tries to click on the subject column so we don't get into a situation where we are accidentally clicking reactive columns such as the read/junk indicator columns.  It also knows to stay out of the twisty regions for the thread pane (which are always leftmost.)  I think the folder pane is a more complicated situation twisty-wise where we continue to fake things, which may be for the best.  It also took me a while to correctly implement the "control" clicking for macs, ugh.

- I added various explicit plan_for_message_display/wait_for_message_display_completion calls that were required to avoid introducing new oranges or seemed like suitable fixes to make in the regions I was making changes.

- I makde folderEventLogHelper note folderAdded events; it had not been doing so previously because it was originally gloda test code and gloda does not care about folder additions directly.

This passed the try server completely green; additionally various local linux and OS X mozmill test runs ran happily.  I would not be surprised if this shakes out some additional low-frequency intermittent oranges, but I think it may also kill a lot more low-frequency ones than it creates, not to mention high-frequency ones like this bug.
Attachment #523741 - Flags: review?(sid.bugzilla)
sid, review ping.  'lemme know if you are busy and I can find another reviewer.  (Thanks!)
I've reviewed part of the patch, and I'll try to finish it off today.
Comment on attachment 523741 [details] [diff] [review]
v1 generate clicks instead of approximating the effect of clicks

(sid0 has indicated he is busy through at least the weekend, so redirecting since I think it would be ideal to get this landed sooner; I will respond to comments on the patch even after receiving review, with the caveat I may not be immediately responsive due to upcoming vacation.)

Standard8, feel free to land this if it looks good to you.
Attachment #523741 - Flags: review?(sid.bugzilla) → review?(bugzilla)
Comment on attachment 523741 [details] [diff] [review]
v1 generate clicks instead of approximating the effect of clicks

+  // how many messages to we expect to remain after the archival?

nit: "do" instead of the first "to".

Ok, the general idea here looks good, let's try it out and see what we get. I'll land this in a minute or two.
Attachment #523741 - Flags: review?(bugzilla) → review+
So far this is looking good. There are one or two possibly new random failures, but generally much better.

Trunk seems to have picked up a nasty habit of exiting early but that was before this change and I don't think its changed it either way. Something for a different bug ;-)
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
OS: Windows Server 2003 → All
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.3a4
I have filed bug 653770 on the dominant fallout from this bug, namely that there seem to be some cases where summarization is initiated and appears to completely but the testing code doesn't notice.
Version: unspecified → Trunk
Whiteboard: [tb-orange]
You need to log in before you can comment on or make changes to this bug.