Closed Bug 468050 Opened 17 years ago Closed 17 years ago

Linux gloda tests are frequently failing on tinderbox

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Thunderbird 3.0b2

People

(Reporter: standard8, Assigned: asuth)

References

Details

Attachments

(5 files)

[checked in] v1 get info on the exception that killed us 17 years ago Andrew Sutherland [:asuth] (he/him) 1.07 KB, patch	standard8 : review+	Details \| Diff \| Splinter Review
[checked in: comment #4] v1 make the gloda tests timeout after 3 minutes and die in a fashion that gets us the log 17 years ago Andrew Sutherland [:asuth] (he/him) 1.34 KB, patch	Bienvenu : review+	Details \| Diff \| Splinter Review
v1 have gloda handle another return code, make helper test code more resilient 17 years ago Andrew Sutherland [:asuth] (he/him) 5.39 KB, patch	Bienvenu : review+ standard8 : review+	Details \| Diff \| Splinter Review
[checked in: comment #12] v1 don't use non-existent error constants, invent our own! 17 years ago Andrew Sutherland [:asuth] (he/him) 3.06 KB, patch	standard8 : review+	Details \| Diff \| Splinter Review
[checked in] v1 gloda pending folder logic was bitrotted 17 years ago Andrew Sutherland [:asuth] (he/him) 4.94 KB, patch	standard8 : review+	Details \| Diff \| Splinter Review

Mark Banner (:standard8)

Reporter

Description

•

17 years ago

Since we checked in bug 466800 we're seeing frequent failures on Linux in test_query_messages.js e.g. http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1228438842.1228439080.7206.gz&fulltext=1 2008-12-04 17:19:47 gloda.indexer INFO Queue-ing folder for indexing: injecty1 2008-12-04 17:19:47 gloda.datastore DEBUG !! mapped 2 from mailbox://nobody@Local%20Folders/injecty1 2008-12-04 17:19:47 gloda.indexer DEBUG Entering folder: mailbox://nobody@Local%20Folders/injecty1 2008-12-04 17:19:47 gloda.indexer ERROR Problem entering folder: injecty1, skipping. *** exiting NEXT ERROR *** TEST-UNEXPECTED-FAIL | file:///buildbot/trunk-unix/build/objdir-tb/mozilla/dist/bin/modules/gloda/log4moz.js | 2008-12-04 17:19:47

Flags: blocking-thunderbird3+

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

17 years ago

Priority: -- → P1

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 1

•

17 years ago

Attached patch [checked in] v1 get info on the exception that killed us — Details — Splinter Review

This patch folds the error logging code on the next line into the prior error logging line. Because of my logger magic that kills us the first time we hit a logged error, the second error logging never gets a chance to live, let alone to escape to the console. This does not fix the problem, but will help me understand why things are unhappy.

Attachment #351670 - Flags: review?(bugzilla)

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

17 years ago

Whiteboard: [has diagnostic patch][needs review standard8]

Mark Banner (:standard8)

Reporter

Comment 2

•

17 years ago

Comment on attachment 351670 [details] [diff] [review] [checked in] v1 get info on the exception that killed us This looks good, more debug is always useful. r=me. I've pushed this as well: http://hg.mozilla.org/comm-central/rev/33793220167f

Attachment #351670 - Attachment description: v1 get info on the exception that killed us → [checked in] v1 get info on the exception that killed us

Attachment #351670 - Flags: review?(bugzilla) → review+

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 3

•

17 years ago

Attached patch [checked in: comment #4] v1 make the gloda tests timeout after 3 minutes and die in a fashion that gets us the log — Details — Splinter Review

So, that logic actually made things worse, as now the tests time out. This patch causes gloda tests to abort themselves after a ridiculously long delay in a manner that gets us a log. Ridiculously long is currently 3 minutes for each of the gloda tests. This is, in fact, ridiculously long; it currently takes 6 seconds to run all the gloda tests on my box. Even on a slow tinderbox with a head cold and a hangover, 3 minutes should be much more time than we need. (I am leaving a lot of margin because we are trying to fix intermittent failures here, not make them more likely.)

Attachment #351722 - Flags: review?

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

17 years ago

Attachment #351722 - Flags: review? → review?(bienvenu)

David :Bienvenu

Updated

•

17 years ago

Attachment #351722 - Flags: review?(bienvenu) → review+

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 4

•

17 years ago

Timeout patch pushed: http://hg.mozilla.org/comm-central/rev/aaf812024908

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

17 years ago

Attachment #351722 - Attachment description: v1 make the gloda tests timeout after 3 minutes and die in a fashion that gets us the log → [checked in: comment #4] v1 make the gloda tests timeout after 3 minutes and die in a fashion that gets us the log

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 5

•

17 years ago

Attached patch v1 have gloda handle another return code, make helper test code more resilient — Details — Splinter Review

We got a good failure! Hooray! http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1228622442.1228622844.24107.gz The error code returned 0x80550005 = NS_MSG_ERROR_FOLDER_SUMMARY_OUT_OF_DATE indicates that nsMsgLocalMailFolder::GetDatabaseWithReparse was called when m_parsingFolder was true. Gloda's indexer.js was not expecting this return value and was exploding. This was a bug in gloda (although it could be argued that GetDatabaseWithReparse should be returning NS_ERROR_NOT_INITIALIZED for consistency). I also noticed that we were passing the folder as the aReparseUrlListener, which is not required (and filtered out by ParseFolder but not GetDatabaseWithReparse), so we just pass null for that now. The curious thing is that m_parsingFolder should not have been true, since the point of the updateFolderAndNotify helper was only to call the callback when things are completely loaded. I am having trouble mentally synthesizing a chain of events that would result in this happening. The best explanation was that nsMsgLocalMailFolder::OnStopRunningUrl notified mReparseListener and that listener generated a FolderLoaded notification (at which time m_parsingFolder would still be true) so updateFolderAndNotify's immediate invocation would explain things. Except that even when mReparseListener is the folder (which is half-guarded against, as previously noted), this still can't happen unless it were to somehow short-circuit to the super-class's implementation. (m_parsingFolder is set to false immediately before calling into nsMsgDBFolder::OnStopRunningUrl who does the FolderLoaded notification.) I've made changes to to mailTestUtils to have updateFolderAndNotify also handle the additional error code that GetDatabaseWithReparse can propagate back through updateFolder. I've also introduced a 0ms-delay into the equation for the callback to attempt to address at least one class of problems, but I fear that there's another actor in the scenario that I just don't know about. (Or the bowels of the parsing logic are just doing something unexpected in the potential situation where the timestamp of the mbox file is equal to the msf file, because they would assume the msf file should be more recent due to use of > rather than >=, or something.)

Attachment #351738 - Flags: review?(bienvenu)

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

17 years ago

Attachment #351738 - Flags: review?(bugzilla)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 6

•

17 years ago

(Requesting both testing and database-related reviews given my general confusion.)

Status: NEW → ASSIGNED

Whiteboard: [has diagnostic patch][needs review standard8] → [has patch][needs review bienvenu][needs review standard8]

Mark Banner (:standard8)

Reporter

Comment 7

•

17 years ago

Comment on attachment 351738 [details] [diff] [review] v1 have gloda handle another return code, make helper test code more resilient I agree it seems a bit strange, but I have seen problems before where we've needed the timeout to allow us to let things finish properly before we go onto doing the next item. Lets give it a try with this.

Attachment #351738 - Flags: review?(bugzilla) → review+

David :Bienvenu

Comment 8

•

17 years ago

Comment on attachment 351738 [details] [diff] [review] v1 have gloda handle another return code, make helper test code more resilient notification... + // is going to send us a noitification when the reparse has completed.

Attachment #351738 - Flags: review?(bienvenu) → review+

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 9

•

17 years ago

(In reply to comment #8) > (From update of attachment 351738 [details] [diff] [review]) > notification... > > + // is going to send us a noitification when the reparse has completed. Oh, a woise guy, eh? Commited, with nit addressed: http://hg.mozilla.org/comm-central/rev/69df76ff98d3 I'll mark this fixed if the tests stop going orange after a while...

Whiteboard: [has patch][needs review bienvenu][needs review standard8] → [waiting for linux comm-central check to be clear]

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 10

•

17 years ago

Attached patch [checked in: comment #12] v1 don't use non-existent error constants, invent our own! — Details — Splinter Review

It turns out that mailnews' error codes don't show up in Components.results. Standard8 is logging a bug.

Attachment #352080 - Flags: review?(bugzilla)

Mark Banner (:standard8)

Reporter

Comment 11

•

17 years ago

Comment on attachment 352080 [details] [diff] [review] [checked in: comment #12] v1 don't use non-existent error constants, invent our own! Looks good. Lets see how this affects things.

Attachment #352080 - Flags: review?(bugzilla) → review+

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 12

•

17 years ago

Comment on attachment 352080 [details] [diff] [review] [checked in: comment #12] v1 don't use non-existent error constants, invent our own! http://hg.mozilla.org/comm-central/rev/b0c1cc7f3e97

Attachment #352080 - Attachment description: v1 don't use non-existent error constants, invent our own! → [checked in: comment #12] v1 don't use non-existent error constants, invent our own!

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 13

•

17 years ago

Attached patch [checked in] v1 gloda pending folder logic was bitrotted — Details — Splinter Review

With the error code now working, we began using the pending folder logic, which apparently had rotted, presumably around the time when we changed to our current style of async processing and probably also around the time we stopped calling updateFolder like it was going out of fashion.

Attachment #352085 - Flags: review?(bugzilla)

Mark Banner (:standard8)

Reporter

Comment 14

•

17 years ago

Comment on attachment 352085 [details] [diff] [review] [checked in] v1 gloda pending folder logic was bitrotted r=me. Checked in: http://hg.mozilla.org/comm-central/rev/b0c1cc7f3e97

Attachment #352085 - Attachment description: v1 gloda pending folder logic was bitrotted → [checked in] v1 gloda pending folder logic was bitrotted

Attachment #352085 - Flags: review?(bugzilla) → review+

Mark Banner (:standard8)

Reporter

Comment 15

•

17 years ago

This seems to be fixed now :-)

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Whiteboard: [waiting for linux comm-central check to be clear]

You need to log in before you can comment on or make changes to this bug.