Closed Bug 465618 - Opened 16 years ago - Closed 15 years ago

gloda deleted message processing logic is not purging messages

Categories

(MailNews Core :: Database, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: asuth, Assigned: asuth)

References

Details

(Whiteboard: [no l10n impact][gloda key][still need to look into mozmill regressions])

Attachments

(1 file, 1 obsolete file)

When gloda is informed that a message has been deleted, it does not purge it from the database immediately, but instead marks it as deleted. This is done for several reasons: deletion processing is expensive, it allows us to quit without processing the deleted messages, and it allows reuse of the message in cases where the message wasn't really deleted, just moved by someone who is not us. The problem is that we are clearly not actually purging the messages. I have bunches of them, and thanks to bug 465616 I can see them in my exptoolbar queries.
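For orientation, a minimal sketch of the two-phase design described above (the helper names and table/column names here are illustrative assumptions, not the actual gloda source):

  // Cheap path, safe to run directly from the deletion notification:
  // just flip a flag on the rows.
  function noteMessagesDeleted(datastore, glodaIds) {
    datastore.executeSimpleSQL(
      "UPDATE messages SET deleted = 1 WHERE id IN (" +
      glodaIds.join(", ") + ")");
  }

  // Expensive path, intended to run later (e.g. on idle): actually remove
  // the rows and their dependent attribute rows.
  function purgeDeletedMessages(datastore) {
    datastore.executeSimpleSQL(
      "DELETE FROM messageAttributes WHERE messageID IN " +
      "(SELECT id FROM messages WHERE deleted = 1)");
    datastore.executeSimpleSQL("DELETE FROM messages WHERE deleted = 1");
  }

The bug is that the second, expensive half never runs.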
blocking b1
Flags: blocking-thunderbird3+
Priority: -- → P3
moving to b2
Target Milestone: Thunderbird 3.0b1 → Thunderbird 3.0b2
The problem was made obvious while investigating another bug. The following obvious issues exist, suggesting I got distracted halfway through the rewrite/refactoring:

- Code in _worker_indexingSweep knows to create the worker, but it checks the wrong variable; it should check pendingDeletions but checks pendingDeletion.
- _worker_processDeletes exists, but its datastore helper getDeletedMessageBlock does not exist. We should just use the query mechanism with special support to get at deleted messages.
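A paraphrased sketch of what the fixed sweep check might look like (the worker body is simplified and assumed; only the variable names come from the comment above, and the IndexingJob/indexJob usage mirrors other call sites quoted later in this bug):

  // Simplified sketch, not the real index_msg.js.
  function worker_indexingSweep(aJob) {
    // BUG: the code checked `GlodaIndexer.pendingDeletion` (never set, so
    // always falsy), meaning the deletion worker was never scheduled.
    // FIX: check the variable that is actually maintained:
    if (GlodaIndexer.pendingDeletions)
      GlodaIndexer.indexJob(GlodaIndexer,
                            new IndexingJob("delete", null),
                            false);
    // ... continue sweeping folders for messages to index ...
  }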
When the unit test is checked in, please mark bug 470170 in-testsuite+. From bug 470170 comment 1:

> We have an outstanding bug about the failure of deletion to work, which is
> where/when I'll add the unit tests.
Flags: in-testsuite?
Whiteboard: [no l10n impact]
Whiteboard: [no l10n impact] → [no l10n impact][target frozen]
Target Milestone: Thunderbird 3.0b2 → Thunderbird 3.0b3
Status: ASSIGNED → NEW
Whiteboard: [no l10n impact][target frozen] → [no l10n impact][target frozen][gloda key]
Whiteboard: [no l10n impact][target frozen][gloda key] → [gloda key]
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
Whiteboard: [gloda key] → [gloda key][no l10n impact]
Whiteboard: [gloda key][no l10n impact] → [no l10n impact][gloda key]
I will not have a fix up for this or its related bug (bug 498493) today, but can have one this weekend if we don't freeze solid today. I don't think there's a huge problem freezing without this, though obviously we want it for rc1.
(In reply to comment #5)
> I will not have a fix up for this or its related bug (bug 498493) today, but
> can have one this weekend if we don't freeze solid today. I don't think
> there's a huge problem freezing without this, though obviously we want it for
> rc1.

We can still get this in over the weekend.
Target Milestone: Thunderbird 3.0b4 → Thunderbird 3.0rc1
Right now, if you delete an IMAP folder (e.g. by deleting a GMail label), any Gloda search that hits a message that used to be in that folder (or tagged w/ that label) will fail horribly. As such, I think it might be a b4 blocker.
This is not a final patch. This is a heads-up, skimmable patch.

This patch:

- Refactors the code from viewWrapperTestUtils.js and glodaTestHelper.js that was involved with injecting messages into a new messageInjection.js helper file. messageInjection.js is based on asyncTestUtils.js and provides for local folder and IMAP injection via the fake-server. Because we are using asyncTestUtils, we are able to avoid much of the event loop spinning that some fake server stuff likes to do.
- Refactors glodaTestHelper.js to build on asyncTestUtils.js/messageModifier.js/messageInjection.js, doing away with much of the custom support code, which can now be expressed cleanly just using generators and the support code. (A sketch of the generator style follows this comment.) The testing code is now more strict in requiring that you predict all gloda indexing that will occur and that no double indexing occurs (unless intentionally handled). I have some nice block comments to integrate from an accidental branch yet to boost the documentation some.
- Removes a lot of dead code from gloda's datastore.js that was used when we did things synchronously.
- Makes the indexer clever enough to avoid double-indexing when an indexing sweep initiated by event-driven overflow sees messages already handled by event-driven indexing. This is not yet unit tested but will be.
- Moves tests that have local and IMAP (online, offline, online-to-offline) variants to be named as "base_*" with the testing variants being "test_*_local/imap_offline/imap_online/imap_online_to_offline.js".

Limitations of the patch:

- The patch addresses deletion a little bit, but does not unit test it yet.
- Only the base_index_messages.js and base_query_messages.js tests/variants are fully believed to pass right now. I think I probably need to do some minor cleanup on the other gloda tests.
- This breaks the mozmill tests, although it's close to working. I will probably stop trying to reuse mozmill's subscript loader wrapper code and just explicitly create my own subscript loader. This will allow us to just load the various helper files in a single namespace with pre-initialized globals. (mozmill's loader loads each file in its own module, and we can't pre-prepare the environment with globals, which requires the modules to be fixed up after the fact.)

Feel free to r- once/if you've had a look. I should have a patch up in the next day that unit tests deletion and sweep indexing and leaves all the other tests in working order.
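To give a flavor of the generator style, here is a sketch of what such a test might look like (the helper names — make_folder_with_sets, wait_for_message_injection, wait_for_gloda_indexer, async_delete_messages — reflect the refactored helpers as described; treat this as illustrative, not a verbatim test):

  function test_deletes_are_noticed() {
    // Inject four synthetic messages and predict they all get indexed.
    let [folder, msgSet] = make_folder_with_sets([{count: 4}]);
    yield wait_for_message_injection();
    yield wait_for_gloda_indexer(msgSet);

    // Delete one message; the framework fails the test if gloda indexes
    // anything we did not predict, or double-indexes something.
    yield async_delete_messages(msgSet.slice(0, 1));
    yield wait_for_gloda_indexer([], {deleted: msgSet.slice(0, 1)});
  }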
Attachment #402142 - Flags: review?(bienvenu)
Attachment #402142 - Flags: review?(bienvenu) → review-
Comment on attachment 402142 [details] [diff] [review]
gloda deletion testing work and indexer cleanup looky-loo v1

OK, I've looked through this, FWIW. It looks like some good cleanup, but it's hard to separate out the part that actually fixes this bug. I'll run with your final patch when it's up.
I am expecting the patch-set that addresses this bug and most of the other "gloda key" issues to show up with tests on Tuesday. (Deletion semantics are entangled with not indexing junk, which is entangled with avoiding pathological performance on moves, etc.)
Status: NEW → ASSIGNED
(In reply to comment #10)
> I am expecting the patch-set that addresses this bug and most of the other
> "gloda key" issues to show up with tests on Tuesday. (Deletion semantics are

Not happening today. Notification problems and complications involving pending commit semantics were a surprise. On the upside, the unit tests are doing their job. I will try to break out the notification patch on its own tomorrow, once the unit tests for the gloda functionality that requires them have borne out that they are doing what we need sufficiently.
Whiteboard: [no l10n impact][gloda key] → [no l10n impact][gloda key][will be fixed by gloda correctness patch]
I've made a quick pass through the correctness patch (skimming the tests, TBH). It looks fairly reasonable. There were some frankly libelous comments about our lack of ability to propagate properties for messages copied into imap folders, but asuth is going to look at nsImapMailFolder::SetPendingAttributes (it looks like gloda-id is preserved on move already, but not copy). I wasn't able to run TB with the patch because of some js errors that asuth will fix once all the tests work.
(In reply to comment #12)
> but asuth is going to look at nsImapMailFolder::SetPendingAttributes (it looks
> like gloda-id is preserved on move already, but not copy)

We never want to propagate gloda-id on "copy" where copy is defined as something that duplicates a message. However, for a "move" that manifests itself as a copy operation followed by a deletion, we definitely do want gloda-id transferred. Hopefully you are using "copy" as I just defined?
yes, in short, I believe we're already propagating gloda-id correctly.
Most recent push of the *incomplete* patch to my mq repo (http://hg.mozilla.org/users/bugmail_asutherland.org/comm-central-patches/) should run happily in Thunderbird and pass all unit and mozmill tests[1]. It is incomplete because, despite this being the pseudo-canonical bug for the patch (and the patch in my queue being called 'gloda-delete'), I haven't completed the deletion logic (and so the tests are commented out/incomplete). The good news is that all kinds of other things are completely awesome and well tested.

I have fixed the ContainsKey stuff for msgsClassified, but the libel issue has not been addressed. I'm a bit concerned about the IMAP move case being too clever in the case where the IMAP message being moved is in a folder that is marked filthy but has not started being indexed. In that case, headers can have valid gloda-id's with an absent or '0' (for clean) gloda-dirty property. It is very important that in the move case we are able to observe that the origin header is in a folder that is filthy and either zero the gloda-id property before it gets copied to the pending header stuff or kill the pending header stuff. (Otherwise I think I have the corruption bug licked in the patch.)

Progress should be good if somewhat less than 100% of theoretical for the next few days because I am trying not to get the rest of my co-workers sick and will be laptop-bound (and hopefully monotonically decreasingly less sick).

[1] At least one of the mozmill tests flakes for me, but I rather suspect it would be broken for me without the patch too, based on previous troubles I had trying to review Magnus' window deletion test. We need a linux mozmill buildbot reporting to ThunderbirdTest...
ugh, sorry you're ill. I'll run with the patch and let you know what I see. The first thing I notice is that folderPane.js registers nsIMsgFolderListener.all but does not implement msgsClassified which causes console errors.
Some miscellaneous review comments:

It would be nice to keep these in order for readability:

    /// nsIMsgFolderListener::msgAdded notification
    const msgFolderListenerFlag msgAdded = 0x1;
  + /// nsIMsgFolderListener::msgsClassified notification
  + const msgFolderListenerFlag msgsClassified = 0x8;
  + /// nsIMsgFolderListener::msgsDeleted notification
    const msgFolderListenerFlag msgsDeleted = 0x2;

Whitespace on blank line:

  +                nsnull);
    if (m_size > 0)
    {
  +   ShowCompactingStatusMsg();

Lose spaces after and before ( ):

  + nsCOMPtr <nsIMutableArray> newMsgHdrs =
  +   do_CreateInstance(NS_ARRAY_CONTRACTID, &rv);
  + NS_ENSURE_SUCCESS(rv, rv);
  +
  + PRUint32 numNewMessages = newMessageKeys.Length();
  + for ( PRUint32 i=0 ; i < numNewMessages ; ++i )
  + {

Folder rename and delete are a couple other reasons ForceClosed gets called:

    _databaseAnnouncerListener: {
      indexer: null,
  +   /**
  +    * XXX We really should define the operations under which we expect this to
  +    *  occur. While we know this must be happening as the result of a
  +    *  ForceClosed call, we don't have a comprehensive list of when this is
  +    *  expected to occur. Off the top of my head: compaction (although we

A bunch of the mailnewsdb tests are failing on windows - I'll try to figure out why.
They're failing because of this:

  resources/glodaTestHelper.js:73: TypeError: redeclaration of var msgGen

I'll dig a bit more.
my bad, I didn't clobber, so I had some of the removed tests in my objdir. All the mailnews xpcshell tests pass for me. I'll try the mail/base and xpcshell ones next.
mail/base tests are fine - I had a problem with one of the mozmill content tests, but I doubt it has to do with this patch.
I've been running with the patch from 10/26 for a few days. Perhaps it's just that gloda is indexing everything, but I've noticed that performance is pretty sluggish while indexing is going on. (This is with a debug build.) Also, it seems to be trying to index the same 1873 messages in one of my folders, over and over again, i.e., I shut down when it's in the middle of indexing those 1873 messages, and then when I restart, it seems to start over. Perhaps we're missing a commit of the .msf file?
the .msf file is getting committed fine. I see a bunch of messages with a gloda id of 1 (most of them are gloda dirty as well). Gloda claims indexing is successful, but the next time I run, it indexes the same set of messages again.
(In reply to comment #21)
> I've been running with the patch from 10/26 for a few days. Perhaps it's just
> that gloda is indexing everything, but I've noticed that performance is pretty
> sluggish while indexing is going on. (This is with a debug build.) Also, it

(I am assuming you have the patch that gets rid of the nsAutoLock issues that 1.9.1 will forever have. If not, please apply that patch or run a non-debug build or whatnot.)

Apart from gloda simply doing more indexing, the only culprit that springs to mind would be GlodaUtils.considerHeaderBasedGC(). It is new and forces a GC every 4096 headers that are exposed to JS. Without this change, we force a GC only just after we perform a database commit. We try and do that only on idle and are willing to wait up to _20_ seconds for that to happen. (Note that the 4096 count is reset to zero if we initiate a GC after a commit.) This was done to try and eliminate gloda's indexing as a potential source of VM bloat.

With any luck my mastery of the slip-pery critical path will find you with some free time to see what a profiler says? :)
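A minimal sketch of the heuristic being described (the real GlodaUtils code may differ; the constant and counter structure are inferred from this comment):

  const HEADER_CHECK_GC_BLOCK_SIZE = 4096;
  let _headersSeen = 0;

  function considerHeaderBasedGC() {
    // Force a GC once every 4096 headers exposed to JS; the counter is
    // reset whenever a commit-triggered GC happens.
    if (++_headersSeen >= HEADER_CHECK_GC_BLOCK_SIZE) {
      _headersSeen = 0;
      Components.utils.forceGC();
    }
  }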
(In reply to comment #22)
> the .msf file is getting committed fine. I see a bunch of messages with a gloda
> id of 1 (most of them are gloda dirty as well). Gloda claims indexing is
> successful, but the next time I run, it indexes the same set of messages again.

Excellent bug trapping. Gloda id "1" means bad message. We fail to clear the dirty bit when we mark a message as bad, which means next time through we try and reindex it again (and fail again). I've fixed the bug and added "test_reindex_on_dirty_clear_dirty_on_fail" to test_index_bad_messages.js (fails without the patch, succeeds with).

You may want to be concerned about why gloda is angry about your messages. Basically any failure in indexing a message (be it a streaming problem, core gloda bug, attribute provider bug, or legitimate attribute provider intentional explosion) will get handled by the recovery function and nothing will be logged. (I used to intentionally log things as warnings because glodaTestHelper creates a Log4Moz appender that causes a unit test to fail when a warning/error gets logged. But then people see those warnings and file bugs. So the new solution for this expected phenomenon is that we have a _unitTestHookRecover hook so that glodaTestHelper knows whenever a recovery hook gets invoked. Tests have to explicitly expect the invocation of the recovery hook or we cause the unit test to fail. Because I love logging, the test hook also gets a peek at the exception that caused things so it can be well logged.) That might make for a good 'gloda developer mode' extension/mode that would log or categorize the exceptions for those users who might have caused the horrible problem. Of course, if the unit tests are sufficient, you don't need that. :)
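A sketch of the recovery-hook arrangement described above (the structure is assumed from this comment; indexMessage and markMessageBad are hypothetical stand-ins, only _unitTestHookRecover is named in the comment):

  // In the indexer's per-message path:
  function indexMessageSafely(aMsgHdr) {
    try {
      indexMessage(aMsgHdr); // the real work (hypothetical name)
    }
    catch (ex) {
      // glodaTestHelper installs this hook; tests must declare that they
      // expect a recovery or the framework fails the test, and the
      // exception is passed along so it can be logged well.
      if (GlodaIndexer._unitTestHookRecover)
        GlodaIndexer._unitTestHookRecover(ex, aMsgHdr);
      // Mark the message bad (gloda id 1) and, per the fix above, also
      // clear the dirty bit so we do not retry it on every startup.
      markMessageBad(aMsgHdr);
    }
  }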
I should probably note I got the first deletion unit test passing earlier today. I want to finish increasing the coverage to cover the various deletion cases before I declare some degree of victory and put the patch up. (Note: as previously mentioned, orphaned identity/contact culling is not currently part of this.)
> (I am assuming you have the patch that gets rid of the nsAutoLock issues that
> 1.9.1 will forever have. If not, please apply that patch or run a non-debug
> build or whatnot.)

That was a big part of it, along with the continual re-indexing of a large folder. I couldn't remember if that had been fixed in 1.9.1, and I couldn't find the patch because my 1.9.1 moz-central repo did not seem to have been backed up as promised :-( But I found the patch on my mac and things are better now. They'll be even better once gloda stops trying to index the same 2000 messages over and over again :-)
Deletion and the deletion unit tests are believed to be in good shape, but the test case I added to test folder deletion is failing because I'm obviously not correctly deleting a folder. (The trashing case actually marks messages deleted on the move into trash, because we don't index trash, so emptying the trash is not a sufficient means of testing this... we have to actually delete the folder. I did add a case for moving to trash, and that works, though.)

(I pushed the revised patch to my queue even though that test is failing.)
I'm running with this patch now. It does not seem to keep trying to index the gloda bad messages over and over again, so if you put a fix in for that, it seems to be working. I believe the reason for most of my bad messages was the bugzilla snippet generator extension. I noticed that almost all the bad messages were from bugzilla, and I noticed I still had the bugzilla snippet generator extension enabled. I assume it was out of date and messing up the core gloda. It does seem that bad snippeters shouldn't be able to cause that much grief.
(In reply to comment #28)
> I believe the reason for most of my bad messages was the bugzilla snippet
> generator extension. I noticed that almost all the bad messages were from
> bugzilla, and I noticed I still had the bugzilla snippet generator extension
> enabled. I assume it was out of date and messing up the core gloda. It does
> seem that bad snippeters shouldn't be able to cause that much grief.

gp-bugzilla does more than just whittle content, but you make a good point. Some gloda plugins are more important than others; a failure of fundattr.js or explattr.js should result in a failure of the message, whereas gp-bugzilla or the phone number guy should only fail them. I can specialize the recovery hook accordingly.
Patch pushed with deletion and junk-deletion ramifications well unit tested:
http://hg.mozilla.org/users/bugmail_asutherland.org/comm-central-patches/raw-file/d8b5695f6340/gloda-delete

As per discussion today on IRC, this does not include orphan processing but should otherwise cover all of the other deletion things I was thinking of.

I have not addressed the comments in comment #17 yet. Libelous IMAP-ish comments and the potential performance enhancements that could accompany such redaction also have not been addressed. (Except for the comments being wildly out of whack with reality, the fact that we don't fast-path that is only a performance problem and not a correctness problem.)

If you are already running with a previous version of this patch, blowing away your global-messages-db.sqlite is a good idea for testing.
One significant and notable change is that nsMsgDBView generates an nsIMsgFolderListener itemEvent notification on "JunkStatusChanged" whenever it junks/unjunks messages. This was done primarily for efficiency of batching and not for guaranteed comprehensive notifications. The message classifier might have been a better place for comprehensive notification, except that it only hears about messages one by one.

Since we have defined the semantics that the itemEvent "JunkStatusChanged" notification is only generated _after_ the msgsClassified notification announces the existence of a message header, nsMsgDBView might be better than the classifier because the classifier might not readily be able to distinguish the cases in which it is being used. (Note: nsMsgDBView can of course race the msgsClassified notification for junk marking, but it's sufficiently harmless for the gloda use case that I don't care. The point with the defined semantics is really that we don't generate the notification for the automatic classification.)
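For reference, gloda's consumer side of that contract might look roughly like this (a sketch; the exact itemEvent listener signature and the _reindexChangedMessages call shape are assumptions, though _reindexChangedMessages is named elsewhere in this bug):

  // In gloda's nsIMsgFolderListener implementation:
  itemEvent: function gloda_indexer_itemEvent(aItem, aEvent, aData) {
    // Only meaningful once msgsClassified has announced the headers; the
    // view may race classification, which is harmless for gloda's use.
    if (aEvent == "JunkStatusChanged")
      GlodaMsgIndexer._reindexChangedMessages(aData); // array of nsIMsgDBHdr
  },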
I'm running with this patch now.
Bug 521632 looks more like another instance of bug 465616. I suggest reopening bug 465616 and duplicating bug 521632 against that one, as explained below.

Looking at comment #0, I doubt that the fix for this bug will fix bug 521632: I understand that we are currently marking messages as deleted in gloda as soon as they are deleted. Messages that are marked deleted should never show in search results (that's the problem of the phantom draft mails in bug 521632, and the same problem as we claim was fixed by bug 465616, not this one).

A sufficient fix for this bug 465618 might be:

- at some later time after actual deletion of msgs (which only /marks/ messages deleted in gloda), physically /purge/ messages from gloda

If that's true, then fixing this bug will NOT fix the problem of bug 521632, as we'll still show the deleted drafts in search results until they are actually purged from gloda.
As mentioned in bug 465616 comment #4, please consider purging messages from gloda immediately upon deletion (instead of marking them deleted and purging them at some later point), for reasons of privacy:

> I have no idea of the technical aspects of this, but aren't we compromising
> privacy by keeping <shift>-deleted (expunged) messages in gloda and "allowing
> them to be returned" through some query at all? Will data of expunged messages
> still live in gloda after user has compacted a folder to wipe out the physical
> trace of the expunged message?
>
> (This objection does NOT apply to deleted messages that are still in trash,
> we'll need to index those of course)
(In reply to comment #34)

I am unable to duplicate the multiple draft problem running with the current gloda-delete patch (as I linked in comment #30). If I create a draft with a unique search term (entered immediately), and either manually save the changes or wait for auto-save to save them, gloda search only returns the single, expected result. The gloda-id changes, as is expected given the change in message-id.

As such, this bug addresses the problem, and the dupes to this bug (and any consequent dupings for the draft situation) are correct.
(In reply to comment #16)
> The first thing I notice is that folderPane.js registers
> nsIMsgFolderListener.all but does not implement msgsClassified which causes
> console errors.

folderPane registers for nsIFolderListener.all, although it does have a comment that inaccurately identifies its listener as an nsIMsgFolderListener instead of the nsIFolderListener it actually is. I suspect what you were seeing was the vista/spotlight search stuff.

I have resolved this by completely removing all/allMsgNotifications/allFolderNotifications from nsIMsgFolderNotificationService and updating searchCommon.js (and the unit tests) to stop using them. I assert that having these helpers, although consistent with nsIFolderListener, is a bad thing and removing them is right because:

1) Notifications are not free, especially when the listeners are implemented in JS. We do not want code signing up for notifications it does not need (see the registration sketch after this comment). For example, folderPane.js uses nsIFolderListener.all and then provides no-op implementations for 4 of the 8 methods.

2) New notifications may get added in the future, as I have done with this patch. Obviously, old code is not going to automatically be able to do anything useful when new notifications are added, so why should it be signed up for them? C++ code will need to be made to at least provide no-op stubs, but in the JS case we don't need to implement the stubs, which is nice.

3) The whole nsIMsgFolderListener/nsIMsgFolderNotificationService setup is new for 3.0, so it's better to do the cleanup now.
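The upshot for listener authors is to sign up for exactly the notifications you handle, something like the following (a sketch; the flag names come from nsIMsgFolderNotificationService, and the contract ID is my recollection rather than quoted from the patch):

  let MFNS = Ci.nsIMsgFolderNotificationService;
  let notificationService =
    Cc["@mozilla.org/messenger/msgnotificationservice;1"].getService(MFNS);

  // Register only the bits this listener actually implements.
  notificationService.addListener(myListener,
    MFNS.msgsClassified | MFNS.msgsDeleted |
    MFNS.msgsMoveCopyCompleted | MFNS.folderDeleted);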
That's possible - I do have the vista search enabled - I could very well have gotten confused between nsIFolderListener and nsIMsgFolderListener. Yeah, removing ::all is probably the right thing to do.
(In reply to comment #15)
> not been addressed. I'm a bit concerned about the IMAP move case being too
> clever in the case where the IMAP message being moved is in a folder that is
> marked filthy but has not started being indexed. In that case, headers can
> have valid gloda-id's with an absent or '0' (for clean) gloda-dirty property.
> It is very important that in the move case we are able to observe that the
> origin header is in a folder that is filthy and either zero the gloda-id
> property before it gets copied to the pending header stuff or kill the pending
> header stuff. (Otherwise I think I have the corruption bug licked in the
> patch.)

It would appear that the pending header propagation happens before the move is dispatched to the IMAP service, which means that the propagation must strictly happen before the move completion notification can happen. So in all cases there is no point trying to modify the source headers; we just need to clobber the pending headers.

In addition to the filthy folder case that I call out above, the situation where a message is being moved from a folder that is not gloda-indexed needs to be treated similarly. I have added a unit test to verify both the local and IMAP cases (both filthy and unindexed-to-indexed). The test is "test_filthy_moves_slash_move_from_unindexed_to_indexed" and lives in "base_index_messages.js".
(In reply to comment #12)
> It looks fairly reasonable. There were some frankly libelous comments about our
> lack of ability to propagate properties for messages copied into imap folders,

The comments have been updated to reflect reality. I have not had time to implement fast-paths. I have created bug 526416 to track the potential optimization.
(In reply to comment #29)
> gp-bugzilla does more than just whittle content, but you make a good point.
> Some gloda plugins are more important than others; a failure of fundattr.js or
> explattr.js should result in a failure of the message, whereas gp-bugzilla or
> the phone number guy should only fail them. I can specialize the recovery hook
> accordingly.

I have logged bug 526418 to track this.
(In reply to comment #37)
> I have resolved this by completely removing
> all/allMsgNotifications/allFolderNotifications from
> nsIMsgFolderNotificationService and updating searchCommon.js (and the unit
> tests) to stop using them.

I've had to further modify the unit test support code to do my clearNewMessages workaround that I have in messageInjection, because the IMAP test was experiencing intermittent failures for me when it would get a duplicate msgsClassified event.
I'm claiming this is it. This patch corresponds to:
http://hg.mozilla.org/users/bugmail_asutherland.org/comm-central-patches/raw-file/c26e5ddbaabe/gloda-delete

I would feel better if I had unit tests for folder renaming/moving (I don't think there are any, at least). I don't think that code has actually changed, but that's not really proof of anything.
Attachment #402142 - Attachment is obsolete: true
Attachment #410165 - Flags: superreview?(bienvenu)
Attachment #410165 - Flags: review?(bienvenu)
Playing with builds from comment #44, I get the following statement when I try to use the people filter. Adding the first one as "must be part of the conversation" does nothing on screen and generates:

  Warning: assignment to undeclared variable label
  Source File: chrome://messenger/content/glodaFacetBindings.xml
  Line: 620

in the Error console.
(In reply to comment #45)
> Playing with builds from comment #44, I get the following statement when I try
> to use the people filter. Adding the first one as "must be part of the
> conversation" does nothing on screen and generates:
>
>   Warning: assignment to undeclared variable label
>   Source File: chrome://messenger/content/glodaFacetBindings.xml
>   Line: 620
>
> in the Error console.

The patch does not touch the gloda faceted search UI. The strict warning is definitely not new. A regression in the search behavior is conceivable, but not exceedingly likely once the messages have actually been displayed. Please retry the scenario without the new patch and verify it does not occur for you then.
(In reply to comment #46)
> A regression in the search behavior is conceivable, but not exceedingly likely
> once the messages have actually been displayed. Please retry the scenario
> without the new patch and verify it does not occur for you then.

Just did, using the latest nightly available: I can filter using people, while I couldn't with the try builds. (I could select, but it had no effect on the results.)
I'm running with this now. The xpcshell tests all pass on windows, but there are some leaks in the mailnewsglobaldb tests that I hadn't seen before. They're probably tests not cleaning up after themselves, but I'll try to figure that out.

Re Ludo's issue - could the fix for bug 516680 be involved? It would be in the try server builds but not yesterday's nightly.
(In reply to comment #48)
> Re Ludo's issue - could the fix for bug 516680 be involved? It would be in the
> try server builds but not yesterday's nightly.

It is. I filed it elsewhere just for that.
And "elsewhere" would be in bug 526473, which I'm working on now. A temporary workaround would be to use the keyboard nav, if you wanted to move forward on this bug.
This doesn't seem to be used:

  +function blarg(t) {
  +  dump("!!!\n " + t + "\n\n");
  +}
  +

Is this because the folder is empty but there are deleted messages in the local folder/offline store? Definitely seems like the right thing to do the notification and blow away the db, etc.

  +  // Notify that compaction is beginning. We do this even if there are no
  +  // messages to be copied because the summary database still gets blown away
  +  // which is still pretty interesting. (And we like consistency.)

removeListener should be on its own line:

  +  msgsClassified: function (aMsgs, aJunkProcessed, aTraitProcessed)
  +  {
  +    do_check_eq(this.mReceived & nsIMFNService.msgsClassified, 0);
  +    this.mReceived |= nsIMFNService.msgsClassified;
  +    if (this.mRemoveSelf) gMFNService.removeListener(this);

"because":

  + * For the
  + * classification, Because

Better style to get the rv back from do_GetService with the ", &rv)" form, and use NS_ENSURE_SUCCESS(rv, rv):

  +  nsCOMPtr<nsIMsgFolderNotificationService>
  +    notifier(do_GetService(NS_MSGNOTIFICATIONSERVICE_CONTRACTID));
  +  if (!notifier)
  +    return rv;

No space between nsCOMPtr and < (though I personally love the space :-)

  +  nsCOMPtr <nsIMutableArray> msgHdrsNotBeingClassified; // create on demand

"i = 0;":

     PRUint32 numNewMessages = newMessageKeys.Length();
  -  for ( PRUint32 i=0 ; i < numNewMessages ; ++i )
  +  for (PRUint32 i=0 ; i < numNewMessages ; ++i)

Check for OOM here:

  +  if (!msgHdrsNotBeingClassified)
  +    msgHdrsNotBeingClassified = do_CreateInstance(NS_ARRAY_CONTRACTID);
  +  // Accumulate the message header for immediate classified notification
  +  // that we are not classifying it.
  +  msgHdrsNotBeingClassified->AppendElement(msgHdr, PR_FALSE);

You don't really need the temp var "job" here, but I leave it up to you whether the code is more readable with it:

  -  let job = new IndexingJob("ab-card", 1, card);
  -  GlodaIndexer.indexJob(job);
  +  let job = new IndexingJob("ab-card", card);
  +  GlodaIndexer.indexJob(this, job, false);

Same thing for curGlodaId and isDirty:

  +  let curGlodaId = msgHdr.getUint32Property(GLODA_MESSAGE_ID_PROPERTY);
  +  if (curGlodaId != glodaId)
  +    msgHdr.setUint32Property(GLODA_MESSAGE_ID_PROPERTY, glodaId);
  +  let isDirty = msgHdr.getUint32Property(GLODA_DIRTY_PROPERTY);
  +  if (isDirty)

Thoughtus-interruptus here:

  + * this event to two ends:
  + *
  + * - Know when a folder we were trying to open to index is actually ready to
  + *   be indexed. (The summary may have not existed, may have been out of
  + *   date, or otherwise.)
  + * - Know when

"us.":

  +  // (there is also no need to check for gloda dirty since the enumerator
  +  //  filtered that for u.s)

"We" and "headerIter" in comment:

  +  // we shouldn't be in the loop anymore if headeriter is dead now.
  +  if (!headerIter)

"there":

  +  // In no circumstances should their be data bouncing around from previous
  +  // calls if we are here. |killActiveJob| depends on this.

Snapshotting at log4moz.js - I'll look at the second half of the patch in a few minutes.
Not quite grammatical:

  + * Currently, we just test that quoting removal and that the content turns out
  + *  right. We do not actually verify that the quoted blocks are correct. (We

Missing blank line at the end of gloda/test/unit/base_gloda_content.js:

  +];
  \ No newline at end of file

Commentus interruptus:

  + * @param [aJustResumeExecution=false] Should be just poke the callback driver
  + *     for the indexer rather than continuing the call. You would likely want
  + *     to do this when you expect
  + */

No license in this test file and others (unclear to me how much we care about that):

  diff -r 1533f0738a3a mailnews/db/gloda/test/unit/test_index_compaction.js
  --- /dev/null   Thu Jan 01 00:00:00 1970 +0000
  +++ b/mailnews/db/gloda/test/unit/test_index_compaction.js   Wed Nov 04 02:55:20 2009 -0500
  @@ -0,0 +1,165 @@

"and no one":

  + * distinguish between succesful termination and abort-style termination where
  + * the process just keeled over and on one told us.

provides us with ?

  +  // tell the IMAP service to create the folder, passing a listener that
  +  // wraps the folder as a listener and provides us with
  +  mis.imapService.createFolder(

That's it for now. I'd mark this r/sr=me modulo the nits except for the test memory leaks, which I'd like to make sure are really just test cleanup leaks. I'm going to look into those right now.
While trying to figure out the leak, I saw this - bad indentation here:

  +    if (!aSomeoneElseWillTriggerTheUpdate) aFolder.updateFolder(null);

We're leaving a couple of dbs open - gabba2.msf and Junk.msf, for the test_index_junk_imap tests - but I'm not sure if we're leaking at a higher level. There's so much sharing going on between the tests now that it's a bit hard to figure out where the cleanup is happening. There are a lot fewer global variables than in the stand-alone tests, from what I can tell, so it doesn't seem like a simple case of having to clear out global vars before shutting down.
test_message_moving_to_junk_folder_is_deletion seems to be the cause of the leaks; or, at least, commenting out that test makes the leaks go away.
In messageInjection.js, we're passing in allowUndo = true, which will make imap move/copies first happen offline and then get played back online. Perhaps we never use this to move between folders, but it's not especially good for copying from local folders to imap folders, at least for the purposes of populating imap folders, because undo isn't really hooked up for xpcshell tests by default (you need a msgWindow and an nsIMessenger to hang the transactions off of).

If I change this to false, test_message_moving_to_junk_folder_is_deletion fails, which makes me think it or some of the core test code is expecting the behavior we get when allowUndo is true. In theory, the caller shouldn't ever care, which implies something odd is going on in the move/copy backend. In any case, passing in false here seems like the right thing to do to me:

  copyService.CopyMessages(folder, xpcomHdrArray, realDestFolder,
                           /* move */ true, asyncCopyListener, null,
                           /* undo... what do we do? */ false);
OK, I've figured out the memory leak issue and the test_message_moving_to_junk_folder_is_deletion failure when I tell CopyMessages to not allow undo. I need to unwind all the dump/printfs I've added to figure this out. I'll attach a patch that fixes it, and try to explain why it does. I'll probably have to leave it up to asuth to figure out if this is a test issue or a core gloda issue.
Well, I was way too optimistic earlier. I fixed the junk test, but other tests were broken by the change to CopyMessages. I think turning off allowUndo changes the timing quite a bit, besides introducing leaks.

What I saw going on with test_message_moving_to_junk_folder_is_deletion was that the gloda id did not get set on the msg hdrs because we waited for indexing to complete, but not for the commit. When we get the msgsDeleted notification, we get a bunch of headers that don't have a gloda-id set, so this code doesn't do the right thing:

  let glodaMessageIds = [];
  let deleteJob = new IndexingJob("message", -1, null);
  for (let iMsgHdr = 0; iMsgHdr < aMsgHdrs.length; iMsgHdr++) {
    let msgHdr = aMsgHdrs.queryElementAt(iMsgHdr, Ci.nsIMsgDBHdr);
    try {
      glodaMessageIds.push(msgHdr.getUint32Property(
        GLODA_MESSAGE_ID_PROPERTY));
    }
    catch (ex) {}
  }

because the gloda id is 0. See also the comment in msgsMoveCopyCompleted:

  // no gloda id means it hasn't been indexed, so the move isn't
  // required.

but I think this might also happen because we haven't committed the db. I realize that the PendingCommitTracker is supposed to deal with this, but it's not used everywhere.
So I'm willing to say r/sr=me despite the memory leaks as long as Andrew isn't concerned about the gloda-id/pending commit issue in the face of delete/move notifications. I worry that it might introduce its own correctness issues, if the user manages to delete a newly added message before the sqlite commit happens.
(In reply to comment #57)
> indexing to complete, but not for the commit. When we get the msgsDeleted
> notification, we get a bunch of headers that don't have a gloda-id set, so this
> code doesn't do the right thing:
> ...
> because the gloda id is 0. See also the comment in msgsMoveCopyCompleted:
>
>   // no gloda id means it hasn't been indexed, so the move isn't
>   // required.
>
> but I think this might also happen because we haven't committed the db. I
> realize that the PendingCommitTracker is supposed to deal with this, but it's
> not used everywhere.

I think you were looking at the un-patched code or something is wrong with the patch application. The patched msgsDeleted code definitely uses PendingCommitTracker, and the comment you cite only exists in the un-patched code as well.

I definitely believe that we could fail once the offline code gets taken out of the equation, and I was seeing some leaks, but only in the IMAP cases, so I was assuming the problems were related to the fake server and lingering IMAP activity.
(In reply to comment #51)
> Is this because the folder is empty but there are deleted messages in the local
> folder/offline store? Definitely seems like the right thing to do the
> notification and blow away the db, etc.
>
> +  // Notify that compaction is beginning. We do this even if there are no
> +  // messages to be copied because the summary database still gets blown away
> +  // which is still pretty interesting. (And we like consistency.)

Yes. I wrote a unit test that was triggering compaction on a (local) folder with a deleted message and no live messages but didn't get a compaction event. So I fixed that right up.
(In reply to comment #59)
> I think you were looking at the un-patched code

I was looking at just (:-() that particular file in the wrong tree, but I was running correctly patched code. Which means that I did not find the root cause of the test failure when allowUndo is false. But I'm glad that the code is doing the right thing with pendingCommit, and I'm OK with landing it as is. I'll keep poking around to see why the test is failing when I turn allowUndo off.

> I definitely believe that we could fail once the offline code gets taken out of
> the equation, and I was seeing some leaks, but only in the IMAP cases, so I was
> assuming the problems were related to the fake server and lingering IMAP
> activity.

I don't think so, though it's really difficult to pinpoint memory leaks in the xpcshell tests.
Attachment #410165 - Flags: superreview?(bienvenu)
Attachment #410165 - Flags: superreview+
Attachment #410165 - Flags: review?(bienvenu)
Attachment #410165 - Flags: review+
I dug a little deeper into the test_index_junk_imap_offline test failure with allowUndo=false. The reason the pendingCommit table doesn't seem to know about the deleted headers is that noteBlindMove calls noteDirtyHeader, which removes them from the pendingCommit table. This happens before the msgsDeleted notification, which can't find the gloda ids because the msg hdr doesn't have them, and the messages aren't in the pendingCommit table.
(In reply to comment #63)
> I dug a little deeper into the test_index_junk_imap_offline test failure with
> allowUndo=false. The reason the pendingCommit table doesn't seem to know about
> the deleted headers is that noteBlindMove calls noteDirtyHeader, which removes
> them from the pendingCommit table. This happens before the msgsDeleted
> notification, which can't find the gloda ids because the msg hdr doesn't have
> them, and the messages aren't in the pendingCommit table.

The msgsDeleted logging is probably misleading; the move case is deferring to the deletion case when it sees that the target folder is not supposed to be indexed.

It appears, based on the augmented debugging trace below (wonder what all that extra code was for? :), that the problem is that for some reason the messages are getting marked as Read and IMAPDeleted before the move event gets to us. So it's not noteBlindMove that calls noteDirtyHeader, but rather we are explicitly dirtying the message in _reindexChangedMessages called by the OnItemPropertyFlagChanged case. Thoughts on how to deal with this?

# messageInjection moving messages MsgHdr: imap://user@localhost/gabba2#1 MsgHdr: imap://user@localhost/gabba2#2
# msgEvent OnItemEvent Folder: Junk FolderCreateCompleted
# messageInjection moving messages from Folder: gabba2 to Folder: Junk
# msgEvent OnItemPropertyFlagChanged Header MsgHdr: imap://user@localhost/gabba2#1 had property Status have the following bits change: Read
# Queue-ing job for indexing: message
# +++ Indexing Queue Processing Commencing
# msgEvent OnItemPropertyFlagChanged Header MsgHdr: imap://user@localhost/gabba2#1 had property Status have the following bits change: IMAPDeleted
# msgEvent OnItemPropertyFlagChanged Header MsgHdr: imap://user@localhost/gabba2#2 had property Status have the following bits change: Read
# msgEvent OnItemPropertyFlagChanged Header MsgHdr: imap://user@localhost/gabba2#2 had property Status have the following bits change: IMAPDeleted
# msgEvent msgsMoveCopyCompleted moved MsgHdr: imap://user@localhost/gabba2#2 MsgHdr: imap://user@localhost/gabba2#1 to Folder: Junk
# MoveCopy notification. Move: true
# !! mapped 8 from imap://user@localhost/Junk
# msgsDeleted notification
I'd suggest removing the update of the source folder after the move (which is not required to get the move to happen, I swear!), but it doesn't help. When we store the imap deleted + seen flags, we get an unsolicited response about the new flags and we immediately notify the backend code about the new flags, which generates the notifications. This happens before the headers are deleted. These are needed for the imap delete model case, and in general, I wouldn't want to suppress these notifications. So I'm not sure where that leaves us.
Yes, changing fundamental and long-standing events does not sound like a great idea. While some form of batching indicator or a notification at the start of the move (before we start doing move related things) could help mitigate this problem and wouldn't be a bad thing in general, the fundamental problem is obviously in PendingCommitTracker. It needs to be able to express that a message is dirty (so noteDirtyHeader should not discard); a sketch of this follows below. This should be tractable and a minor change.

I'll try removing the updateFolder now that we aren't using undo. It was required before. (As a means of hanging a deterministic notification that would guarantee that all the events related to the folders had occurred so that we could assert about indexing states. There may have been other better ways to accomplish this.)
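A sketch of the shape of that change (the PendingCommitTracker internals here - the key scheme and member names - are assumptions, not quoted from the patch):

  noteDirtyHeader: function PendingCommitTracker_noteDirtyHeader(aMsgHdr) {
    let key = aMsgHdr.folder.URI + "#" + aMsgHdr.messageKey;
    if (!(key in this._indexedMessagesPendingCommitByKey))
      return;
    // Previously we deleted the entry here, so a msgsDeleted notification
    // arriving before the commit could no longer find the gloda id.
    // Instead, keep the pending gloda id and just remember the dirty state.
    this._indexedMessagesPendingCommitByKey[key].dirty = true;
  },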
Whiteboard: [no l10n impact][gloda key][will be fixed by gloda correctness patch] → [no l10n impact][gloda key][has r,sr; needs new patch asuth]
Requested changes by bienvenu made, PendingCommitTracker problem resolved and unit tests pass, stopped using undo in test code per bienvenu and the memory leaks went away. pushed to trunk: http://hg.mozilla.org/comm-central/rev/413b2018349c push to branch happening shortly
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: in-testsuite? → in-testsuite+
Resolution: --- → FIXED
Whiteboard: [no l10n impact][gloda key][has r,sr; needs new patch asuth] → [no l10n impact][gloda key]
Could you summarize, in a few bullet points perhaps, what changes were made since the patch used in the 11-04 tryserver builds? I'm thinking primarily in terms of the end effect to the user, rather than the underlying technical changes, modulo leaks and unit tests.

To add to the record, on a laptop using the tryserver build I just discovered messages that were not indexed:

Case 1: from a gloda search on a bug#, clicked the bugmail thread from facet results, moved all to an archive folder, followed by a "search messages" which uncovered 2 messages in two different folders that hadn't got moved. In those 2 folders, a small percentage of messages are not indexed, but those msgs are offline. No dirty bit.

Case 2: my credits folder, about 20 unindexed (1/3 of the folder) up to about 1 week in age, none are bugmail, most but not all have the dirty bit set.

In both cases:
a) the unindexed messages tend to be in a pocket of unindexed msgs, and are near the end of the folder chronologically. But there are also messages after/newer than them that are indexed.
b) nothing extraordinary about these folders or the messages - they are not targets of filters, the messages existed in those folders when initial indexing started (i.e. when I first used the new tryserver build), and they have no attachments.

I can't give the exact activity history of the messages of case 1. But in case 2 I am fairly certain I didn't touch any messages in that folder since I started using the build.

(For completeness...) Activity manager has about 300 "The current command did not succeed. The mail server responded: Mailbox is empty." - TB started about 12 hours ago iirc.
(In reply to comment #69)
> To add to the record, on a laptop using the tryserver build I just discovered
> messages that were not indexed:

One strange thing I noticed is that indexing for newly downloaded messages seems to be triggered only (or again?) when I download new messages the next time. Say there are 3 new messages on the pop3 server; I click "get messages" and then the status bar jumps to "indexing 19 messages", which were those 19 I got the last time from "get messages". Maybe those didn't get indexed before and are thus missing for some time? I think that's what happens, no guarantees.
There was a serious glitch in PendingCommitTracker...

trunk pushed:
http://hg.mozilla.org/comm-central/rev/63edde324849
comm-1.9.1 pushed:
http://hg.mozilla.org/releases/comm-1.9.1/rev/b30435119c01

The revised comment for the commit logic says it pretty well:

  /**
   * The function gets called when the commit actually happens to flush our
   *  message id's.
   *
   * It is very possible that by the time this call happens we have left the
   *  folder and nulled out msgDatabase on the folder. Since nulling it out
   *  is what causes the commit, if we set the headers here without somehow
   *  forcing a commit, we will lose. Badly.
   * Accordingly, we make a list of all the folders that the headers belong to
   *  as we iterate, make sure to re-attach their msgDatabase before forgetting
   *  the headers, then make sure to zero the msgDatabase again, triggering a
   *  commit. If there were a way to directly get the nsIMsgDatabase from the
   *  header we could do that and call commit directly. We don't track
   *  databases along with the headers since the headers can change because of
   *  moves and that would increase the number of moving parts.
   */
  _commitCallback: function PendingCommitTracker_commitCallback() {

It is my hope that this resolves wsmwk's observed problems from comment 69 (and my own).
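For the curious, a body matching that comment might look roughly like this (the member names and pending-table shape are assumptions; only the quoted comment above is from the actual patch):

  _commitCallback: function PendingCommitTracker_commitCallback() {
    let foldersByURI = {};
    for each (let [, [msgHdr, glodaId]] in
              Iterator(this._indexedMessagesPendingCommitByKey)) {
      // Touching the folder keeps its msgDatabase attached so the
      // property write cannot be lost to an already-triggered commit.
      let folder = msgHdr.folder;
      foldersByURI[folder.URI] = folder;
      msgHdr.setUint32Property(GLODA_MESSAGE_ID_PROPERTY, glodaId);
    }
    this._indexedMessagesPendingCommitByKey = {};
    // Zero the databases again; nulling msgDatabase triggers the commit.
    for each (let folder in foldersByURI)
      folder.msgDatabase = null;
  }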
There are also some interesting mozmill regressions. messageInjection.js includes some work-arounds intended exclusively to benefit gloda. We clear the new message list to stop double msgsClassified notifications. Since none of the messages we insert are ever new anymore, the code path in FolderDisplayWidget._activeAllMessagesLoaded no longer stops at the "new messages" case. Instead, it continues on to the "last selected message" case. (Which is somewhat newly reintroduced by bug 505044.) So all kinds of cases where we used to select nothing by default, we now select something.

Although I fixed various things to handle the last selected message case before I got wise to the real problem, I'm just going to disable the gloda-specific work-arounds for the mozmill case for now. Bustage fix coming up.
(In reply to comment #72)
> Although I fixed various things to handle the last selected message case before
> I got wise to the real problem, I'm just going to disable the gloda-specific
> work-arounds for the mozmill case for now. Bustage fix coming up.

This does not seem to be sufficient. I'll have to look into this more tomorrow.
(In reply to comment #71)
> It is my hope that this resolves wsmwk's observed problems from comment 69 (and
> my own).

It looks like there is something powerful bad happening in terms of enumerators either giving up early (terminating enumeration prior to when they should) or the filter part betraying us. I am observing this happening in a post-nuked-db filthy traversal. This is regrettably the worst possible phase of operation for such a problem to occur in, since subsequent filtered enumerators will always think the messages are cool, so indexing sweeps will never resolve the problem; only event-driven indexing can resolve it.

I also discovered a lesser problem where old gloda-id's in the bad message id range were not getting marked filthy because of the filter enumerator in use. I have rectified that problem in a local patch.
(In reply to comment #74)
> It looks like there is something powerful bad happening in terms of enumerators
> either giving up early (terminating enumeration prior to when they should) or
> the filter part betraying us. I am observing this happening in a

Whoops. There I go libeling things again. The problem is event-driven indexing down-grading the folders from filthy to dirty when it thinks it is upgrading them to dirty (or idempotently having no effect).

I'm going to:
- make event-driven indexing ignore folders that are filthy
- make the GlodaFolder dirtyStatus setter only allow the clean-to-dirty transition, requiring a separate explicit method call to downgrade from filthy
- (of course) add a unit test for this case
(In reply to comment #75)
> I'm going to:
> - make event-driven indexing ignore folders that are filthy
> - make the GlodaFolder dirtyStatus setter only allow the clean-to-dirty
>   transition, requiring a separate explicit method call to downgrade from filthy
> - (of course) add a unit test for this case

I did this, although I instead made dirtyStatus read-only and added explicit setter logic (a sketch follows). Regrettably this was a serious problem and I had to bump the schema version again. On the upside, everyone will get extremely high quality testing of the gloda indexing process.

While waiting for a dogfood pass of my own accounts to complete, I also cleaned up two loose ends related to compaction. The compacted state of a folder was previously not persisted, so if a user quit before gloda processed for compaction, we would never actually perform a compaction pass. Unit test added for that. While auditing index_msg.js for compaction, I also figured out how one of the edge cases that we previously would just warn about could actually occur. I implemented handling for that edge case, improved the documentation, and added a unit test.

One minor fix that I did not add a unit test for but did manually verify was that we were not marking messages that had gloda-id's in the 'bad' range as filthy.

pushed to trunk: the patch and a harmless typo fix:
http://hg.mozilla.org/comm-central/rev/f0c1f7b4f5c2
http://hg.mozilla.org/comm-central/rev/3e12668e5de4
pushed to comm-1.9.1: the patch and harmless typo fix:
http://hg.mozilla.org/releases/comm-1.9.1/rev/75b1d6fa0b67
http://hg.mozilla.org/releases/comm-1.9.1/rev/40f9c4388c48
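A sketch of the dirtyStatus change (the GlodaFolder internals, method names, and constants here are illustrative; the real datamodel.js code may differ):

  // GlodaFolder fragment:
  get dirtyStatus() {
    return this._dirtyStatus;
  },
  /** Upgrade only: clean -> dirty. Never touches a filthy folder. */
  _ensureFolderDirty: function gloda_folder_ensureFolderDirty() {
    if (this._dirtyStatus == this.kFolderClean)
      this._dirtyStatus = this.kFolderDirty;
  },
  /** Explicit downgrade used by the indexing sweep once a filthy folder
   *  has been fully (re-)indexed. */
  _downgradeDirtyStatus: function gloda_folder_downgradeDirtyStatus(aStatus) {
    this._dirtyStatus = aStatus;
  },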
Whiteboard: [no l10n impact][gloda key] → [no l10n impact][gloda key][still need to look into mozmill regressions]
(In reply to comment #73)
> (In reply to comment #72)
> > Although I fixed various things to handle the last selected message case
> > before I got wise to the real problem, I'm just going to disable the
> > gloda-specific work-arounds for the mozmill case for now. Bustage fix
> > coming up.
>
> This does not seem to be sufficient. I'll have to look into this more
> tomorrow.

The fix which does not sufficiently fix is here, in case anyone is being blocked by my specific mozmill regressions and is willing to try and figure out why this still is not sufficient:
http://hg.mozilla.org/users/bugmail_asutherland.org/comm-central-patches/file/8e68e2a72a00/mozmill-no-use-clear-new-messages

There are some other mozmill fixes and logging enhancements in my queue as well, all near the top.
Blocks: 520409
Mozmill tests should be just as working as they were before I broke everything. My fix for bug 527687 obviated the need for the ugly hack where we cleared the new messages to stop msgsClassified from firing multiple times for a single message. There was one other error I introduced which has been fixed.

on trunk:
http://hg.mozilla.org/comm-central/rev/3abdf96b226d
on comm-1.9.1:
http://hg.mozilla.org/releases/comm-1.9.1/rev/6a1330ac7857
(r+a=Standard8 via IRC)
Depends on: 533513
No longer blocks: 541357
Depends on: 559107
Depends on: 1209174
See Also: → 1575214