Closed Bug 466227 Opened 16 years ago Closed 15 years ago

gloda should index messages that are not offline too (headers only)

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b4

People

(Reporter: asuth, Assigned: rain1)

References

Details

(Whiteboard: [b3ux][m6][no l10n impact][gloda key])

Attachments

(2 files, 8 obsolete files)

Right now, gloda only indexes messages that are offline.  This means IMAP messages whose folders are stored offline and the messages have been retrieved, as well as local mail folders (and anything that uses local mail folders, like RSS.)

Although there is some loss of functionality for indexing messages that are only available online, it is desirable to index them as best we can.

Although a hybrid mode of operation would be available where we stream every message as we index it to get the same results, it is unlikely this is what a user who doesn't want their messages stored offline would want.  Disk usage would still be within an order of magnitude of offline storage (ignoring attachments), and network usage could be horrendous.  (And we might assume that a device you don't want things stored offline, like a netbook (due to limited storage), might also have network limitations as well.)

The functionality that is lost is:
1) full-text search
2) indexing based on possession of the full body of the message and all message headers.  This would make it would be harder to detect mailing lists, impossible to detect bug references, etc. etc.

This will not / must not make beta 1, but should be part of our pref-on solution.  The reason it cannot go in beta 1 is that we are overloading the offline control to serve as a control for what to index / not index.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: blocking-thunderbird3?
Target Milestone: --- → Thunderbird 3.0b2
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Whiteboard: [no l10n impact]
Whiteboard: [no l10n impact] → [no l10n impact][target frozen]
Target Milestone: Thunderbird 3.0b2 → Thunderbird 3.0b3
Whiteboard: [no l10n impact][target frozen] → [no l10n impact][target frozen][b3ux]
Whiteboard: [no l10n impact][target frozen][b3ux] → [no l10n impact][target frozen][b3ux][m3]
Whiteboard: [no l10n impact][target frozen][b3ux][m3] → [b3ux][m3][no l10n impact][target frozen]
Whiteboard: [b3ux][m3][no l10n impact][target frozen] → [b3ux][m6][no l10n impact]
Status: ASSIGNED → NEW
Whiteboard: [b3ux][m6][no l10n impact] → [b3ux][m6][no l10n impact][gloda key]
I'm looking into this bug.
Attached patch work in progress (obsolete) — Splinter Review
Right now my biggest trouble seems to be trying to get unit tests to work. The IMAP fakeserver supports only one connection in its lifetime, and something's causing the IMAP connection to drop in the middle of the test. I haven't figured out what yet -- guess I'll spend some time in the debugger today.
Adding bienvenu in case he has some ideas.
This is looking good!  A few comments on the current patch:

- I think gloda should skip indexing messages that are in a folder tagged for offline retrieval but that have not yet been retrieved.  Otherwise we get into the ugly situation where we need to re-index the message when it does get stored offline.  This would require us to update the full-text row for the message, which the code currently does't know how to do, and which does have some disk bloat and performance ramifications.

- You don't need to yield kWorkSync if you're skipping retrieving the body.  The intent is that we only do that when we've done something potentially costly, which we have not done in that case.

- You are a braver man than I to try and actually test with real IMAP accounts! :)  But the realistic testing will be very helpful, so thank you.

- If you aren't going to modify the tests in test_index_imap_messages.js, just load() the test you're copying the tests from and override tests and run_test.  If you have to modify the source test file to not load bad things, that's fine.

If bienvenu doesn't have any immediate ideas on what is exploding, I'd advise using wireshark/tcpdump to examine the (loopback) network traffic to see what's happening.  If that doesn't make it obvious, I can run your unit test under chronicle-recorder/chroniquery and attempt to generate a trace of what the IMAP code is up to.  That may not be entirely realistic, though, as I understand that chronicle-recorder is limited in its multi-threaded simulation and may flatten things so they end up working.
Assignee: bugmail → sid.bugzilla
Status: NEW → ASSIGNED
(In reply to comment #4)
> This is looking good!  A few comments on the current patch:
> 
> - I think gloda should skip indexing messages that are in a folder tagged for
> offline retrieval but that have not yet been retrieved.  Otherwise we get into
> the ugly situation where we need to re-index the message when it does get
> stored offline.  This would require us to update the full-text row for the
> message, which the code currently does't know how to do, and which does have
> some disk bloat and performance ramifications.

Yep, I've already implemented that. :)

> 
> - You don't need to yield kWorkSync if you're skipping retrieving the body. 
> The intent is that we only do that when we've done something potentially
> costly, which we have not done in that case.

Oh, that was just a random attempt to try fixing the IMAP issues.

> 
> - If you aren't going to modify the tests in test_index_imap_messages.js, just
> load() the test you're copying the tests from and override tests and run_test. 
> If you have to modify the source test file to not load bad things, that's fine.
> 

That sounds like a good way to do it.
Includes fix for the GC issue -- still need to track down the reason test_index_messages.js is failing + write a few more tests, as these are far from enough...
Attachment #376671 - Attachment is obsolete: true
erm, a few extra dump messages are there -- some indicative of my frustration ;)
Attached patch tests pass now (obsolete) — Splinter Review
OK, so we really don't want to add ghost messages to the messagesText table. Still need to figure out the update case, though.
Attachment #377926 - Attachment is obsolete: true
Attached patch even better (obsolete) — Splinter Review
Updates are now handled -- I'm not convinced the shouldIndexMessage logic is correct right now. Plus, need more tests! I'd like to adapt some of the tests from bug 474701 to IMAP, so I guess I'll wait on that (and face the bitrot ;) )
Attachment #377930 - Attachment is obsolete: true
Attached patch latest patch (obsolete) — Splinter Review
Attachment #378134 - Attachment is obsolete: true
Blocks: 495341
We've already shipped a couple of betas without this functionality; I don't think we'd actually hold b3 for it, so I'm moving the target milestone out.
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
Attached patch Unbitrotted (obsolete) — Splinter Review
still passes all the tests :)
Attachment #378348 - Attachment is obsolete: true
Attached patch even better (obsolete) — Splinter Review
Attachment #397679 - Attachment is obsolete: true
Attached patch patch, v1 (obsolete) — Splinter Review
Fixes several bugs in the existing code base, and adds a bunch of tests. I think the IMAP code is pretty well exercised in the tests overall.
Attachment #397896 - Attachment is obsolete: true
Comment on attachment 397987 [details] [diff] [review]
patch, v1

I know asuth's really busy, so whoever gets to it first :)
Attachment #397987 - Attachment description: v1? → patch, v1
Attachment #397987 - Flags: superreview?(bienvenu)
Attachment #397987 - Flags: review?(bugmail)
Attachment #397987 - Flags: review?(bienvenu)
I think updateMessage always calling _updateMessageText which always updates the text may be a little too aggressive.  Because of the way the sqlite FTS3 is implemented, updating a message with the exact same values is not a no-op.

I do think it's a good idea to be able to update the message text given that we have (extensible) content whittlers.  However, I think we should check in-memory if the old text matches the new text and bail if that is the case, avoiding the potentially redundant update.

(I realize that part of the problem is just that we re-index things more than they need to be indexed, but there are still a lot of cases where we do want to re-index but the text would not have changed.  However, the amount of text in a message can be non-trivial, and the segment merging logic has a much higher impact when it happens than the b-tree jiggle that happens from our nuking/re-insertion of the message attributes.)
Also, if you could de-bitrot this in terms of the bug 505307 changes, that would be ideal.  I promise not to bitrot/enable bitrotting of this patch after that :)
I'm going to wait for the de-bitrotted patch before applying and testing...
(In reply to comment #16)
> I think updateMessage always calling _updateMessageText which always updates
> the text may be a little too aggressive.  Because of the way the sqlite FTS3 is
> implemented, updating a message with the exact same values is not a no-op.

Ouch -- I did assume it was a no-op.
Attached patch patch, v2Splinter Review
One question -- right now I'm assuming that a change in the indexed body text is the only thing that should determine whether to update the messagesText row -- so if the attachment list changes but the indexed body text doesn't, we won't update the row. Is this assumption valid?
Attachment #397987 - Attachment is obsolete: true
Attachment #398421 - Flags: superreview?(bienvenu)
Attachment #398421 - Flags: review?(bugmail)
Attachment #398421 - Flags: review?(bienvenu)
Attachment #397987 - Flags: superreview?(bienvenu)
Attachment #397987 - Flags: review?(bugmail)
Attachment #397987 - Flags: review?(bienvenu)
(In reply to comment #20)
> One question -- right now I'm assuming that a change in the indexed body text
> is the only thing that should determine whether to update the messagesText row
> -- so if the attachment list changes but the indexed body text doesn't, we
> won't update the row. Is this assumption valid?

An interesting question.  At some point it's conceivable we or an extension might deal with the MS TNEF problem.  If it's not an extension, we can just blow away the database.  If it is an extension, they would have to trigger blowing away the database themselves, but they would probably want to do that anyways.

I think the assumption is valid enough for now.  When gloda grows a way to do the right thing in terms of re-indexing as extensions show up we can look into doing that.
The change to test_query_core.js to not call imsInit results in the adaptive indexer not getting sufficiently "adjusted".  As a result, on linux with no DISPLAY, when the indexer goes to check the idleTime, it throws an exception and the test fails.

This change makes sure we always do what is necessary to make the adaptive indexer not be adaptive and not explode.

This needs to be layered on sid's patch.  Then everything passes for me.  Woo!
Comment on attachment 398421 [details] [diff] [review]
patch, v2

r=asuth with my patch to make test_query_core work on linux or something similar.

I'm very psyched to have IMAP tests in our repetoire; I know it wasn't a cakewalk.  Thank you!
Attachment #398421 - Flags: review?(bugmail) → review+
Whiteboard: [b3ux][m6][no l10n impact][gloda key] → [needs review bienvenu][b3ux][m6][no l10n impact][gloda key]
Attachment #398421 - Flags: superreview?(bienvenu)
Attachment #398421 - Flags: superreview+
Attachment #398421 - Flags: review?(bienvenu)
Attachment #398421 - Flags: review+
Comment on attachment 398421 [details] [diff] [review]
patch, v2

this looks right to me - I'm having a bit of trouble with gloda not indexing some of my messages, but that long predates this patch.
Whiteboard: [needs review bienvenu][b3ux][m6][no l10n impact][gloda key] → [b3ux][m6][no l10n impact][gloda key]
Pushed along with asuth's test_query_core fix: http://hg.mozilla.org/comm-central/rev/7d5450206f5b

I also made one trivial change to the LEFT JOIN SQL -- the "on" should have been "ON".
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.