Closed Bug 723081 Opened 12 years ago Closed 12 years ago

Near-perma-orange TEST-UNEXPECTED-FAIL | test_index_messages_imap_offline.js | test_index_messages_imap_online_to_offline.js | Header [msgHdr imap://user@localhost/gabba12#3] in folder [folder gabba12] should have been indexed

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(thunderbird11 fixed, thunderbird12+ fixed)

RESOLVED FIXED
Thunderbird 13.0
Tracking Status
thunderbird11 --- fixed
thunderbird12 + fixed

People

(Reporter: standard8, Assigned: Bienvenu)

References

Details

(Keywords: regression)

Attachments

(3 files, 4 obsolete files)

This appears to be a regression from one of these bugs: bug 721543, bug 721548, bug 721515.

Since they landed, we've been seeing near-permanent oranges on our opt builds:

TEST-UNEXPECTED-FAIL | /buildbot/comm-central-linux64-opt-unittest-xpcshell/build/xpcshell/tests/mailnews/db/gloda/test/unit/test_index_messages_imap_offline.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
TEST-UNEXPECTED-FAIL | resources/glodaTestHelper.js | Header [msgHdr imap://user@localhost/gabba12#3] in folder [folder gabba12] should have been indexed. - See following stack:
JS frame :: /buildbot/comm-central-linux64-opt-unittest-xpcshell/build/xpcshell/head.js :: do_throw :: line 462
JS frame :: ../../../../resources/logHelper.js :: mark_failure :: line 620
JS frame :: resources/glodaTestHelper.js :: _indexMessageState_assertExpectedMessagesIndexed :: line 493
JS frame :: resources/glodaTestHelper.js :: wait_for_gloda_indexer :: line 352
JS frame :: base_index_messages.js :: test_moved_message_attributes :: line 560
JS frame :: ../../../../resources/asyncTestUtils.js :: _async_driver :: line 155
JS frame :: /buildbot/comm-central-linux64-opt-unittest-xpcshell/build/xpcshell/head.js :: <TOP_LEVEL> :: line 429

TEST-INFO | (xpcshell/head.js) | exiting test
*******************************************
Generator explosion!
Unhappiness at: undefined:undefined
Because: 2147500036

I couldn't reproduce in my debug mode build.
Ok, this is near-permanent on our Linux & Windows builders, and frequent on the Mac ones. That probably means its a timing issue as Linux & Windows are VM for us.
I'll see if I can reproduce this
I was able to reproduce this on a release mac build. I'll look into it.
As part of the offline move, gloda is getting a MoveCopy notification

2012-02-08 08:46:05	gloda.index_msg	DEBUG	MoveCopy notification.  Move: true

in the successful case, glodaTestHelper.js's _indexMessageState gets an onItemsAdded notification with the moved message immediately after the MoveCopy notification. In the failure case, it doesn't get the notification at all, or at least, it doesn't get it before the test fails. I think the onItemsAdded notification is supposed to be synchronous, so I'll have to poke around to see why that notification isn't getting sent.
In the failure case, datastore.js's updateMessageLocations fails on the lookup of the msgKey of the existing message:

    let inMemoryItems = {}, modifiedItems = [];
    GlodaCollectionManager.cacheLookupMany(GlodaMessage.prototype.NOUN_ID,
                                           cacheLookupMap,
                                           inMemoryItems,
                                           /* do not cache */ false);
    for each (let [glodaId, glodaMsg] in Iterator(inMemoryItems)) {
      glodaMsg._folderID = destFolderID;
      glodaMsg._messageKey = cacheLookupMap[glodaId];
      modifiedItems.push(glodaMsg);
    }

so modifiedItems is empty and we don't call GlodaCollectionManager.itemsModified.
I only just got the two tests that we think are constantly failing :-( (thanks spam filters).

I've checked in a couple of skip-if for these tests:

http://hg.mozilla.org/comm-central/rev/cd5d85131466
Summary: Near-perma-orange TEST-UNEXPECTED-FAIL | test_index_messages_imap_offline.js | Header [msgHdr imap://user@localhost/gabba12#3] in folder [folder gabba12] should have been indexed → Near-perma-orange TEST-UNEXPECTED-FAIL | test_index_messages_imap_offline.js | test_index_messages_imap_online_to_offline.js | Header [msgHdr imap://user@localhost/gabba12#3] in folder [folder gabba12] should have been indexed
The problem is that test_moved_message_attributes is ordered correctly in the file but not in the list of tests.  It comes before test_attributes_fundamental_from_disk in the file, but after it in the list of tests.  This is a problem because test_attributes_fundamental_from_disk calls nukeGlodaCachesAndCollections, losing our invariant that the messages in 'fundamentalMsgSet' will be accessible via a strong reference.

The test works sometimes because test_attributes_fundamental_from_disk uses queryExpect which creates a query that will contain the messages.  However, its resulting collection is vulnerable to garbage collection once the test completes, hence explosions.
Attached patch reorder the test, why not (obsolete) — Splinter Review
Since you claim to be able to reproduce this...

I guess beware of it getting cancelled out by Mark's patch?
Attachment #595832 - Flags: review?(dbienvenu)
Attachment #595832 - Attachment is patch: true
Bless you, I'll buy you a beer, or a drink with an umbrella next time you're in the Bay Area - oops!

I'll try it out right away.
w/ the patch, we fail a little bit earlier, I think:

TEST-UNEXPECTED-FAIL | base_index_messages.js | imap://user@localhost/gabba11 == imap://user@localhost/gabba10 - See following stack:
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: do_throw :: line 462
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: _do_check_eq :: line 556
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: do_check_eq :: line 577
JS frame :: base_index_messages.js :: verify_attributes_fundamental :: line 465
JS frame :: base_index_messages.js :: verify_attributes_fundamental_from_disk :: line 590
JS frame :: resources/glodaTestHelper.js :: query_expectation_onItemsAdded :: line 846
JS frame :: resource:///modules/gloda/collection.js :: gloda_coll_onItemsAdded :: line 612
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 166
JS frame :: resource:///modules/gloda/collection.js :: gloda_coll_onItemsAdded :: line 612
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 166
JS frame :: resource:///modules/gloda/collection.js :: gloda_coll_onItemsAdded :: line 612
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 166
JS frame :: resource:///modules/gloda/collection.js :: gloda_coll_onItemsAdded :: line 612
JS frame :: resource:///modules/gloda/datastore.js :: <TOP_LEVEL> :: line 166
JS frame :: resource:///modules/gloda/datastore.js :: gloda_ds_qfq_handleCompletion :: line 417

TEST-INFO | (xpcshell/head.js) | exiting test

I'll poke around a bit at that failure. It happens with both release and debug builds, so it doesn't seem to be as timing dependent as the previous failure.
whoops.  I shouldn't have been so eager to go to lunch.  Unfortunately, test_attributes_fundamental_from_disk then goes and checks that the messages are in their original location.  If we change fundamentalFolderHandle in test_moved_message_attributes to be the new target folder, things should be fine...
that gets test_attributes_fundamental_from_disk to pass, but now test_attributes_explicit fails. I set fundamentalFolderHandle to destFolder immediately after making destFolder.

TEST-INFO | (xpcshell/head.js) | test 2 pending

TEST-UNEXPECTED-FAIL | base_index_messages.js | -1 != -1 - See following stack:
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: do_throw :: line 462
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: _do_check_neq :: line 518
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: do_check_neq :: line 539
JS frame :: base_index_messages.js :: test_attributes_explicit :: line 637
JS frame :: ../../../../resources/asyncTestUtils.js :: _async_driver :: line 155
JS frame :: /Users/davidbienvenu/tbirdhg/mozilla/testing/xpcshell/head.js :: <TOP_LEVEL> :: line 429

TEST-INFO | (xpcshell/head.js) | exiting test
*******************************************
 I'll poke around some more...
in particular, gmsg.tags seems to be empty
from what I can tell, we're not calling Gloda_explattr_process after setting the tag on the msgSet, which means gmsg.tags will be null. But maybe I'm confused about the various objects involved.
even after commenting out the tag tests, we fail in test_attributes_explicit because test_move_messages moves a message that test_attributes_explicit looks for a message that test_moved_message_attribute moves.
I am deeply confused by the tag failure.  I'm running with logsploder and I had logHelper.js also log the keywords on a nsIMsgDBHdr.  So we perform the mutation and I see the OnItemPropertyFlagChanged event for "Keywords" and indeed the header has "$label1" in it.  But when gloda gets around to indexing it during the call to Gloda_explattr_process (which does happen), the call to:
  let keywords = aMsgHdr.getStringProperty("keywords");
nets us an empty string!

messageModifier.js uses the folder's addKeywordsToMessages and in the IMAP case this should trigger a commit of the database, so I don't really understand where that value could be disappearing.
And if I comment out test_attributes_explicit in the list of tests, the test does pass for me.  The only tests that should be inter-dependent are the fundamental tests and the moved messages tests which piggybacked on the fundamental tests.

bienvenu, any ideas on how we could end up with a disappearing keywords string property, possibly as a result of recent backend changes?  I can do a deeper dig with gdb to figure out what's going on, but if there's anything that springs to mind, that would be great.
> 
> bienvenu, any ideas on how we could end up with a disappearing keywords
> string property, possibly as a result of recent backend changes?

No, there haven't been any recent backend changes in this area that I know of. Pluggable stores landed at the end of last year and that touched a lot of stuff but it's hard to see how it could affect imap headers.

As you say, the commit should happen; and the msg hdr shouldn't go away unless you're doing something special to make it get released (like close the db) since the db caches a certain number of headers. My first guess would be that it's a different message header, though. This isn't the imap header that went from being a pseudo header to a real header during an imap move, is it? It's just a normal imap msg hdr...I can poke at this a little more - maybe you could attach that patch you have so far, including the logging and I could see if the output triggers any thoughts.
Comment on attachment 595832 [details] [diff] [review]
reorder the test, why not

this is fine, but it requires a couple more changes to work. I'll attach a cumulative patch in a minute.
Attachment #595832 - Flags: review?(dbienvenu) → review+
Attached patch proposed fix (obsolete) — Splinter Review
Thx very much to asuth for figuring out what was going on here.

Besides the gc vulnerability fixed by re-ordering the tests, this patch does a few other things:

re-enable the tests
set fundamentalFolderHandle to the dest folder
fix our handling of IMAP unsolicited keyword change notifications. Long story short, doing the move before the other tests caused a race condition where offline playback of the move happens while the other tests are running, which leads to a situation where we have to select the folder when adding the tags, which makes us clear the keywords on the msg hdr which we set locally before running the add tag url. This fix makes it so that we set the keyword again when we get the unsolicited response that the keyword was added. 

This will also help in situations where tags are applied from other connections, and we get told about the keyword change when doing some non-folder update operation. In general, we would realize this next time we did a full flag sync, but with condstore servers, that might be a while.

asking for r/sr since it's technically an interface change. I'll request a try server build in a minute.
Assignee: nobody → dbienvenu
Attachment #596362 - Flags: superreview?(mbanner)
Attachment #596362 - Flags: review?(bugmail)
Comment on attachment 596362 [details] [diff] [review]
proposed fix

clearing review requests - a couple tests are failing
Attachment #596362 - Flags: superreview?(mbanner)
Attachment #596362 - Flags: superreview?
Attachment #596362 - Flags: review?(bugmail)
Attachment #596362 - Flags: review?
one of the test failures is because the fakeserver lowercases keywords. I have a fix for that. The other failure in test_index_messages_imap_online_to_offline is a bit harder:

TEST-INFO | (xpcshell/head.js) | test 2 finished
2012-02-11 16:28:37     gloda.index_msg DEBUG   msgsClassified notification

TEST-INFO | (xpcshell/head.js) | test 2 pending

TEST-PASS | base_index_messages.js | [null : 548] 2 == 2
2012-02-11 16:28:37     gloda.datamodel INFO    Message with message key does no
t match expected header! (37@made.up expected, got )
synMsg: msg:37@made.up
glodaMsg: Message:67

TEST-UNEXPECTED-FAIL | resources/glodaTestHelper.js | Verification failure: [und
efined undefined] is not close enough to [undefined undefined] ; basing this on
exception: [undefined undefined] - See following stack:
JS frame :: c:\builds\tbirdrel\mozilla\testing\xpcshell\head.js :: do_throw :: l
ine 462
JS frame :: ../../../../resources/logHelper.js :: mark_failure :: line 620
JS frame :: resources/glodaTestHelper.js :: _indexMessageState_assertExpectedMes
sagesIndexed :: line 513
JS frame :: resources/glodaTestHelper.js :: wait_for_gloda_indexer :: line 352
JS frame :: base_index_messages.js :: test_moved_message_attributes :: line 561
JS frame :: ../../../../resources/asyncTestUtils.js :: _async_driver :: line 155

JS frame :: c:\builds\tbirdrel\mozilla\testing\xpcshell\head.js :: <TOP_LEVEL> :
: line 429

TEST-INFO | (xpcshell/head.js) | exiting test
*******************************************

http://tinderbox.mozilla.org/showlog.cgi?log=ThunderbirdTry/1329003139.1329004824.18710.gz
I believe this has to do with moving the move test earlier in the test order for the index_imap_online_to_offline version of the tests, but beyond that, I don't know...
Attached patch fix tag failure (obsolete) — Splinter Review
Attachment #595832 - Attachment is obsolete: true
Attachment #596362 - Attachment is obsolete: true
Attachment #596362 - Flags: superreview?
Attachment #596362 - Flags: review?
It occurred to me that we don't have to wait for both tests to pass to fix the first test, and it's better to have a little more coverage. So this turns the test that we have a fix for back on, and leaves the other one that still fails off.
Attachment #596402 - Attachment is obsolete: true
Attachment #596520 - Flags: superreview?(mbanner)
Attachment #596520 - Flags: review?(bugmail)
The remaining test failure turns out to be an actual bug!

The problem is that GlodaAttachment instances save off a reference to their owning GlodaMessage instance when they are instantiated.  This works fine during initial indexing and when loading from disk, but does not work okay during reindexing (which happens when we bring the message offline).

Because 1) indexing is asynchronous and 2) we want to make sure that gloda representations change atomically as a result of indexing, gloda does something clever when re-indexing.  We create a new instance of the object, and let indexers mutate that representation.  When indexing is completed, we atomically (from the perspective of other JS code given our execution model) apply the changed attributes from the new instance to the old instance.

Unfortunately, the attachments end up referencing that dummy instance!  When the move happens, the dummy does not get updated and everything breaks.

The options on the table are:

No attachment user-visible API change:
a) Perform a full JSON round-trip for all attributes during the atomic transference stage to reduce permutations and adding some overhead.
b) Allow noun types to define a custom fix-up function for the transference step to avoid overhead but increase permutations.
c) Eradicate all attributes on the dummy instance and re-parent its prototype to point at the actual gloda instance so that out-of-date references become quasi-valid read-only references.
d) Do some black magic with a getter so that when you try and get the list of attachments off of the message, we update the attachment instances to make sure they are referencing the right instance.  (Alternately, they can be encased in a currying wrapper/membrane or something else silly.)

Yes API change:
e) Stop storing a reference to the gloda message on the attachments and instead require the use of some API.

protz, do you have thoughts on this?
Attachment #596520 - Flags: review?(bugmail) → review+
Thanks for the detective work Andrew, I was completely unaware of that subtlety in the reindexing logic. Unfortunately, I don't much time on my hands right now, so if you feel like tackling this, I would more than grateful. Otherwise, I can probably manage to take a few hours off my schedule this weekend.

I'm not opposed to an API change as long as we can backport the change all the way down to beta. I believe Thunderbird Conversations is only customer right now, and I broke backwards compatibility for the next version: Conversations 2.3 will only be compatible with Thunderbird 11 onwards, so if we happen to change the API, I'm ok to fixup my code before the next release of the Thunderbird 11 + Thunderbird Conversations 2.3 combo. I just don't want to have separate codepaths :).

The easiest route is by far to make sure the url getter from GlodaAttachment becomes a real function generateUrl(aGlodaMessage). That may be low-tech compared to the standard in Gloda code, but it might just be the easiest way for us to fix this in a sane manner.

If we don't want to change the API (which may be, indeed, much better as a solution), c) sounds like the right thing to do. It may even turn out to be useful later on, so that sounds like the good fix in the long-term. I may not even be that hard :)

Thanks for figuring this out!
There are missing words in my previous comment. I'm going to go get some coffee :).
Thanks for the quick feedback, protz!

I think I came up with an even lazier option that is still correct-enough.  Since we pass in the aRawReps dictionary that can hold all kinds of stuff, we can just pass in the 'true' representation as well.  fundattr.js can then fish that reference out and create the objects referencing the true representation from the get-go.

This way we can keep your nice API and also avoid introducing any new meaningful magic.  The only issue is if someone else makes the exact same mistake later-on, but hopefully they will just crib from your attachment work! :)
Some simple additions to the gloda debugging logic.  The exception message update makes it obvious that we were definitely trying to access the wrong message header, while the additional logging makes it clear that cache unification succeeded, and that the verification function was getting the right (cache unified) message.
Attachment #597121 - Flags: review?(dbienvenu)
This fixes the gloda attachment problem.  It is meant to layer on top of bienvenu's fix, ideally the one that does not disable the problematic test (which was marked obsolete).

protz, please chime in if you think this is a bad idea, but I think it's solid so I'm just going to have bienvenu verify it makes the test happy for him.

I pushed the stack to try as wel:
http://build.mozillamessaging.com/tinderboxpushlog/?tree=ThunderbirdTry&rev=6b0286016788
Attachment #597123 - Flags: review?(dbienvenu)
No, that sounds very straightforward, and right. I'm sorry I didn't think about this :)
Attachment #597121 - Flags: review?(dbienvenu) → review+
Comment on attachment 597123 [details] [diff] [review]
pass the true gloda representation in so the attachments can hang off that v1 [Landed in 11]

yay, tests pass with this. Thx, Andrew!
Attachment #597123 - Flags: review?(dbienvenu) → review+
when asuth's patches land, this one should result in all gloda tests running passing.
Attachment #596520 - Attachment is obsolete: true
Attachment #596520 - Flags: superreview?(mbanner)
Attachment #597223 - Flags: superreview?(mbanner)
Attachment #597223 - Flags: review+
I just hit this bug while using Conversations, so unless someone vetoes that, I think I'll ask for approval for "pass the true gloda representation in so the attachments can hang off that v1" :)
(In reply to Jonathan Protzenko [:protz] from comment #35)
> I just hit this bug while using Conversations, so unless someone vetoes
> that, I think I'll ask for approval for "pass the true gloda representation
> in so the attachments can hang off that v1" :)

To the adjudicators: the risk on this one is zero.  It is a runtime-only change and does not affect or depend on the state of the database.  The code path changes are localized to the attachment objects that I think only Thunderbird conversations uses.  (Gloda search's attachments search is based on a separate attribute.)
Comment on attachment 597123 [details] [diff] [review]
pass the true gloda representation in so the attachments can hang off that v1 [Landed in 11]

[Approval Request Comment]
this is a low risk patch that will help Conversations add-on
Attachment #597123 - Flags: approval-comm-beta?
Attachment #597123 - Flags: approval-comm-aurora?
Comment on attachment 597123 [details] [diff] [review]
pass the true gloda representation in so the attachments can hang off that v1 [Landed in 11]

I'm fine with this for Aurora, and probably beta as well, though it should land on the trunk first.
Attachment #597123 - Flags: approval-comm-aurora? → approval-comm-aurora+
Comment on attachment 597223 [details] [diff] [review]
turn all tests back on [Landed in 12]

Review of attachment 597223 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/imap/public/nsIImapMessageSink.idl
@@ +75,5 @@
>     *  @param   aMessageKey    - The UID of the message that changed
>     *  @param   aHighestModSeq - The highest mod seq the parser has seen
>     *                            for this folder
>    **/
> +  void notifyMessageFlags(in unsigned long aFlags, in ACString aKeywords,

nit: this file needs a uuid bump I believe.
Attachment #597223 - Flags: superreview?(mbanner) → superreview+
Comment on attachment 597223 [details] [diff] [review]
turn all tests back on [Landed in 12]

[Triage Comment]
We should take this on aurora as well as the other patch, so we can fix the test failures there.
Attachment #597223 - Flags: approval-comm-aurora+
Comment on attachment 597123 [details] [diff] [review]
pass the true gloda representation in so the attachments can hang off that v1 [Landed in 11]

Given the risk scope, I think we can take this on beta.
Attachment #597123 - Flags: approval-comm-beta? → approval-comm-beta+
(In reply to Mark Banner (:standard8) from comment #6)
> I only just got the two tests that we think are constantly failing :-(
> (thanks spam filters).
> 
> I've checked in a couple of skip-if for these tests:
> 
> http://hg.mozilla.org/comm-central/rev/cd5d85131466

To fix the test failures on aurora for now and to reduce bitrot, I landed this disabling patch there as well:

http://hg.mozilla.org/releases/comm-aurora/rev/16b9be046aaf

(although we should also land the patches here when we're ready).
fixed on trunk http://hg.mozilla.org/comm-central/rev/19442a39d042
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 13.0
Attachment #597121 - Attachment description: some debug helpers that helped me diagnose the problem v1 → some debug helpers that helped me diagnose the problem v1 [Landed in TB 13]
Attachment #597123 - Attachment description: pass the true gloda representation in so the attachments can hang off that v1 → pass the true gloda representation in so the attachments can hang off that v1 [Landed in 11]
Attachment #597223 - Attachment description: turn all tests back on → turn all tests back on [Landed in 12]
Flags: in-testsuite+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: