Closed Bug 505307 Opened 11 years ago Closed 11 years ago

gloda should use nsIMsgDatabase::getFilterEnumerator to avoid creating a lot of js garbage nsMsgHdrs

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b4

People

(Reporter: Bienvenu, Assigned: rkent)

References

Details

(Whiteboard: [no l10n impact])

Attachments

(1 file, 2 obsolete files)

indexer.js creates a hdr enumerator and enumerates over the headers of a db looking for non-indexed message. For a folder that's mostly indexed, this generates a lot of garbage nsIMsgDBHdrs that aren't needed, and causes the cycle collector to block for long periods of time. Using the filter enumerator to find hdrs to index should produce a lot less garbage.
Flags: blocking-thunderbird3+
OS: Windows Vista → All
Hardware: x86 → All
Whiteboard: [no l10n impact]
Kent, is this something you could look at? Bug 499806 shows how to construct the search terms for the filtered enumerator...
Sure, I'll give it a shot.
Status: NEW → ASSIGNED
Assignee: nobody → kent
The new hdrProperty search attribute that was added to support this uses a string value, while most cases of interest to gloda (GLODA_MESSAGE_ID_PROPERTY, GLODA_DIRTY_PROPERTY) are Uint32 values. I'm pretty sure that both Mork and SQLite don't really care, and could be made to work OK with reading a string property which was really an integer property. But doing so is rather tacky, and I would hate to be the first and/or only one to do that. Were you (bienvenu) expecting that we would search using the string values on the integer gloda properties? Would it be better to also add a search attribute for a Uint32?
sorry, I'm still in Windows upgrade/install hell.  I felt a string property was more generally useful, and would be OK for what gloda needs, but I'm not opposed to adding a search attribute for a uint32. In fact, you would be my hero, because we need that for bug 505309
Depends on: 513102
Depends on: 513277
I've got a first cut at this. You need to first land patches for both blocking bugs for this to work.

It runs and indexes, but I really don't even know what is the real problem I am trying to solve here. What could I do to demonstrate that this is solving the underlying issue?

Also, do existing gloda unit tests cover whether basic indexing is working?

Please give suggestions on tests I can do.
(In reply to comment #5)
> It runs and indexes, but I really don't even know what is the real problem I am
> trying to solve here. What could I do to demonstrate that this is solving the
> underlying issue?

The problem being solved is avoiding XPConnect crossings and (more importantly) the garbage collection burden of having large numbers of headers exposed to XPConnect.

Since trying to verify what is or is not in memory when it should be garbage collected is crazy, we can't do that.  We could verify that the iterator doesn't tell us about things it shouldn't tell us about (and does tell us about things that it should), but that's just a re-tread of the filter enumerator test and of the search subsystem.
 
> Also, do existing gloda unit tests cover whether basic indexing is working?

Yes.  Both event-driven and bulk indexing is tested.  Gloda tests that set themselves up using "injectMessagesUsing(INJECT_MBOX);" use GlodaIndexer.indexFolder to index the added messages which uses your enumerator in the case where it should return true in all cases.

There are, however, no unit tests that tests that the enumerator ignores messages that gloda does not want to index.

> Please give suggestions on tests I can do.

First, make two changes...

1) Add the lines "else" and "this.indexer.indexingSweepNeeded = true;" to GlodaIndexer._folderListener._reindexChangedMessage as part of the last if construct circa line 2352. (This was an oversight on my part... it should be there.  Feel free to fold it into your patch.)

2) Alter the code that counts the number of messages we are going to want to index to use nextMatchingHdrs on nsIMsgDatabase.  This can be used to count the number of messages the filtered enumerator matches without exposing the headers via XPConnect.  (Pass null for the parameters that would normally output things.)  We can use it to still only traverse a finite number at a time.

Then...

A litmus test would be to select a folder in which all the messages have already been indexed by gloda.  Open the activity manager.  Select 60 messages and toggle their read status.  This should overflow the event-queue and cause an indexing sweep to be required.  The indexing sweep should encounter your folder and count how many messages it thinks it will need to index, and then it will index them.  Check that the estimate count is less than or equal to 40.  (The event driven indexing mechanism may be able to claim up to 40 of them...)  If the estimate is something ridiculous like the total number of messages in the folder, then your enumerator is broke.

This could also be implemented as a Gloda unit test... We would index a folder, then explicitly mark some of the messages dirty, then index the folder again and verify that the job's estimate was correct and that the number of messages indexed also matched that.

I'm fine if you just run the litmus test.  Although change #1 doesn't need to be part of this patch, change #2 should be.
>Since trying to verify what is or is not in memory when it should be garbage
>collected is crazy

Call me crazy :-) if we wanted, we could expose an attribute on the db which is the count of its msg hdrs in memory (i.e., the use cache hash table num entries), and verify that an index sweep of a fully indexed folder doesn't make that count grow like topsy. But that would require having a fairly large folder so we could get around the fact that the db does cache a certain number of headers (or set the cache to a small number via nsIMsgDatabase.msgHdrCacheSize).

I'm not advocating that; just saying it's possible :-)

But once this bug and the desktop search integration bug are fixed, I'll be able to tell you in a day or so if it helps, because this issue bites me multiple times a day. Which is why I'm going to try to go land the search grouping fix now.
> 
> First, make two changes...
> 
> 1) Add the lines "else" and "this.indexer.indexingSweepNeeded = true;" to
> GlodaIndexer._folderListener._reindexChangedMessage as part of the last if
> construct circa line 2352. (This was an oversight on my part... it should be
> there.  Feel free to fold it into your patch.)

Is this what you wanted? I wasn't sure about where the this.indexer.indexing = true; should go in particular.

      // only queue the message if we haven't overflowed our event-driven budget
      if (this.indexer._pendingAddJob.items.length <
          this.indexer._indexMaxEventQueueMessages)
        this.indexer._pendingAddJob.items.push(
          [GlodaDatastore._mapFolder(msgFolder).id,
           aMsgHdr.messageKey]);
      else
        this.indexer.indexingSweepNeeded = true;
      this.indexer.indexing = true;
    },
Comment on attachment 397705 [details] [diff] [review]
First cut, runs but does it work? Who knows?

when defining the search term, you need to set the hdr property, e.g.,

searchTerm.hdrProperty = "gloda-id";
Attached patch Rev 2, might even work! (obsolete) — Splinter Review
I fixed lots of issues, and incorporated the comments (I think). This passes all existing unit tests. But I have not a review of the code myself, which I need to do, particularly of the parts where I removed the _pendingFolderWantsIterator stuff.
Attachment #397705 - Attachment is obsolete: true
I've now done my own review, but only ended up changing some comments. I'm now moving this into my personal build that I use for regular email for additional testing. I've left a few extra debug statements so that I can see it working. I still have not tested to see if I can see any impact of this, plus I'm not sure what the performance issues are going to be yet. But you are welcome to try this patch.
Attachment #397812 - Attachment is obsolete: true
Attachment #397891 - Flags: superreview?(bienvenu)
Attachment #397891 - Flags: review?(bugmail)
Comment on attachment 397891 [details] [diff] [review]
Slight comment tweaks

Let's review this version. We'll need to remove the extra dumps before release, but it's nice to have them in testing, as it is good feedback of the filtered enumerator results.

I did have one issue when I first tried this, that a spam folder (that was not an official spam folder) seemed to keep repeating its indexing. Spam folders often have lots of MIME and UTF errors. But it went away, and I'm not convinced was caused by this patch. But think about that.
Comment on attachment 397891 [details] [diff] [review]
Slight comment tweaks

Search does, but this isn't search per se - it will block the UI going through all the headers of the db, if it doesn't find a match, which is why nextMatchingHdrs provides a mechanism to limit the number of headers to look at. I think you should be setting that number to a number around 100.

+        // XXX not clear that this is really needed, since search has its own
+        // method to yield to UI periodically.
I'm particularly worried about the case where we have a 10K messages folder, all up to date. We don't want to block the UI for that the time it takes to run the search criteria on 10K msg hdrs.
Attachment #397891 - Flags: superreview?(bienvenu)
Attachment #397891 - Flags: review?(bugmail)
Attachment #397891 - Flags: review+
Comment on attachment 397891 [details] [diff] [review]
Slight comment tweaks

pushed with minor changes approved by rkent: http://hg.mozilla.org/comm-central/rev/75644ade5a9a

changes were:
- Status should have been MsgStatus
- Take medium-sized bites (200 messages) instead of doing it all at once.

sr not required for this given that the architectural decision to do this came from bienvenu and the many small bites desire is now met.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
blocking1.9.1: needed → ---
You need to log in before you can comment on or make changes to this bug.