Closed
Bug 466227
Opened 16 years ago
Closed 15 years ago
gloda should index messages that are not offline too (headers only)
Categories
(MailNews Core :: Database, defect)
MailNews Core
Database
Tracking
(Not tracked)
RESOLVED
FIXED
Thunderbird 3.0b4
People
(Reporter: asuth, Assigned: rain1)
References
Details
(Whiteboard: [b3ux][m6][no l10n impact][gloda key])
Attachments
(2 files, 8 obsolete files)
102.71 KB,
patch
|
asuth
:
review+
Bienvenu
:
review+
Bienvenu
:
superreview+
|
Details | Diff | Splinter Review |
4.10 KB,
patch
|
Details | Diff | Splinter Review |
Right now, gloda only indexes messages that are offline. This means IMAP messages whose folders are stored offline and the messages have been retrieved, as well as local mail folders (and anything that uses local mail folders, like RSS.) Although there is some loss of functionality for indexing messages that are only available online, it is desirable to index them as best we can. Although a hybrid mode of operation would be available where we stream every message as we index it to get the same results, it is unlikely this is what a user who doesn't want their messages stored offline would want. Disk usage would still be within an order of magnitude of offline storage (ignoring attachments), and network usage could be horrendous. (And we might assume that a device you don't want things stored offline, like a netbook (due to limited storage), might also have network limitations as well.) The functionality that is lost is: 1) full-text search 2) indexing based on possession of the full body of the message and all message headers. This would make it would be harder to detect mailing lists, impossible to detect bug references, etc. etc. This will not / must not make beta 1, but should be part of our pref-on solution. The reason it cannot go in beta 1 is that we are overloading the offline control to serve as a control for what to index / not index.
Reporter | ||
Updated•15 years ago
|
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: blocking-thunderbird3?
Target Milestone: --- → Thunderbird 3.0b2
Updated•15 years ago
|
Flags: blocking-thunderbird3? → blocking-thunderbird3+
Updated•15 years ago
|
Whiteboard: [no l10n impact]
Updated•15 years ago
|
Whiteboard: [no l10n impact] → [no l10n impact][target frozen]
Reporter | ||
Updated•15 years ago
|
Target Milestone: Thunderbird 3.0b2 → Thunderbird 3.0b3
Updated•15 years ago
|
Whiteboard: [no l10n impact][target frozen] → [no l10n impact][target frozen][b3ux]
Reporter | ||
Updated•15 years ago
|
Whiteboard: [no l10n impact][target frozen][b3ux] → [no l10n impact][target frozen][b3ux][m3]
Updated•15 years ago
|
Whiteboard: [no l10n impact][target frozen][b3ux][m3] → [b3ux][m3][no l10n impact][target frozen]
Reporter | ||
Updated•15 years ago
|
Whiteboard: [b3ux][m3][no l10n impact][target frozen] → [b3ux][m6][no l10n impact]
Reporter | ||
Updated•15 years ago
|
Status: ASSIGNED → NEW
Whiteboard: [b3ux][m6][no l10n impact] → [b3ux][m6][no l10n impact][gloda key]
Assignee | ||
Comment 1•15 years ago
|
||
I'm looking into this bug.
Assignee | ||
Comment 2•15 years ago
|
||
Right now my biggest trouble seems to be trying to get unit tests to work. The IMAP fakeserver supports only one connection in its lifetime, and something's causing the IMAP connection to drop in the middle of the test. I haven't figured out what yet -- guess I'll spend some time in the debugger today.
Assignee | ||
Comment 3•15 years ago
|
||
Adding bienvenu in case he has some ideas.
Reporter | ||
Comment 4•15 years ago
|
||
This is looking good! A few comments on the current patch: - I think gloda should skip indexing messages that are in a folder tagged for offline retrieval but that have not yet been retrieved. Otherwise we get into the ugly situation where we need to re-index the message when it does get stored offline. This would require us to update the full-text row for the message, which the code currently does't know how to do, and which does have some disk bloat and performance ramifications. - You don't need to yield kWorkSync if you're skipping retrieving the body. The intent is that we only do that when we've done something potentially costly, which we have not done in that case. - You are a braver man than I to try and actually test with real IMAP accounts! :) But the realistic testing will be very helpful, so thank you. - If you aren't going to modify the tests in test_index_imap_messages.js, just load() the test you're copying the tests from and override tests and run_test. If you have to modify the source test file to not load bad things, that's fine. If bienvenu doesn't have any immediate ideas on what is exploding, I'd advise using wireshark/tcpdump to examine the (loopback) network traffic to see what's happening. If that doesn't make it obvious, I can run your unit test under chronicle-recorder/chroniquery and attempt to generate a trace of what the IMAP code is up to. That may not be entirely realistic, though, as I understand that chronicle-recorder is limited in its multi-threaded simulation and may flatten things so they end up working.
Assignee: bugmail → sid.bugzilla
Status: NEW → ASSIGNED
Assignee | ||
Comment 5•15 years ago
|
||
(In reply to comment #4) > This is looking good! A few comments on the current patch: > > - I think gloda should skip indexing messages that are in a folder tagged for > offline retrieval but that have not yet been retrieved. Otherwise we get into > the ugly situation where we need to re-index the message when it does get > stored offline. This would require us to update the full-text row for the > message, which the code currently does't know how to do, and which does have > some disk bloat and performance ramifications. Yep, I've already implemented that. :) > > - You don't need to yield kWorkSync if you're skipping retrieving the body. > The intent is that we only do that when we've done something potentially > costly, which we have not done in that case. Oh, that was just a random attempt to try fixing the IMAP issues. > > - If you aren't going to modify the tests in test_index_imap_messages.js, just > load() the test you're copying the tests from and override tests and run_test. > If you have to modify the source test file to not load bad things, that's fine. > That sounds like a good way to do it.
Assignee | ||
Comment 6•15 years ago
|
||
Includes fix for the GC issue -- still need to track down the reason test_index_messages.js is failing + write a few more tests, as these are far from enough...
Attachment #376671 -
Attachment is obsolete: true
Assignee | ||
Comment 7•15 years ago
|
||
erm, a few extra dump messages are there -- some indicative of my frustration ;)
Assignee | ||
Comment 8•15 years ago
|
||
OK, so we really don't want to add ghost messages to the messagesText table. Still need to figure out the update case, though.
Attachment #377926 -
Attachment is obsolete: true
Assignee | ||
Comment 9•15 years ago
|
||
Updates are now handled -- I'm not convinced the shouldIndexMessage logic is correct right now. Plus, need more tests! I'd like to adapt some of the tests from bug 474701 to IMAP, so I guess I'll wait on that (and face the bitrot ;) )
Attachment #377930 -
Attachment is obsolete: true
Assignee | ||
Comment 10•15 years ago
|
||
Attachment #378134 -
Attachment is obsolete: true
Comment 11•15 years ago
|
||
We've already shipped a couple of betas without this functionality; I don't think we'd actually hold b3 for it, so I'm moving the target milestone out.
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
Assignee | ||
Comment 12•15 years ago
|
||
still passes all the tests :)
Attachment #378348 -
Attachment is obsolete: true
Assignee | ||
Comment 13•15 years ago
|
||
Attachment #397679 -
Attachment is obsolete: true
Assignee | ||
Comment 14•15 years ago
|
||
Fixes several bugs in the existing code base, and adds a bunch of tests. I think the IMAP code is pretty well exercised in the tests overall.
Attachment #397896 -
Attachment is obsolete: true
Assignee | ||
Comment 15•15 years ago
|
||
Comment on attachment 397987 [details] [diff] [review] patch, v1 I know asuth's really busy, so whoever gets to it first :)
Attachment #397987 -
Attachment description: v1? → patch, v1
Attachment #397987 -
Flags: superreview?(bienvenu)
Attachment #397987 -
Flags: review?(bugmail)
Attachment #397987 -
Flags: review?(bienvenu)
Reporter | ||
Comment 16•15 years ago
|
||
I think updateMessage always calling _updateMessageText which always updates the text may be a little too aggressive. Because of the way the sqlite FTS3 is implemented, updating a message with the exact same values is not a no-op. I do think it's a good idea to be able to update the message text given that we have (extensible) content whittlers. However, I think we should check in-memory if the old text matches the new text and bail if that is the case, avoiding the potentially redundant update. (I realize that part of the problem is just that we re-index things more than they need to be indexed, but there are still a lot of cases where we do want to re-index but the text would not have changed. However, the amount of text in a message can be non-trivial, and the segment merging logic has a much higher impact when it happens than the b-tree jiggle that happens from our nuking/re-insertion of the message attributes.)
Reporter | ||
Comment 17•15 years ago
|
||
Also, if you could de-bitrot this in terms of the bug 505307 changes, that would be ideal. I promise not to bitrot/enable bitrotting of this patch after that :)
Comment 18•15 years ago
|
||
I'm going to wait for the de-bitrotted patch before applying and testing...
Assignee | ||
Comment 19•15 years ago
|
||
(In reply to comment #16) > I think updateMessage always calling _updateMessageText which always updates > the text may be a little too aggressive. Because of the way the sqlite FTS3 is > implemented, updating a message with the exact same values is not a no-op. Ouch -- I did assume it was a no-op.
Assignee | ||
Comment 20•15 years ago
|
||
One question -- right now I'm assuming that a change in the indexed body text is the only thing that should determine whether to update the messagesText row -- so if the attachment list changes but the indexed body text doesn't, we won't update the row. Is this assumption valid?
Attachment #397987 -
Attachment is obsolete: true
Attachment #398421 -
Flags: superreview?(bienvenu)
Attachment #398421 -
Flags: review?(bugmail)
Attachment #398421 -
Flags: review?(bienvenu)
Attachment #397987 -
Flags: superreview?(bienvenu)
Attachment #397987 -
Flags: review?(bugmail)
Attachment #397987 -
Flags: review?(bienvenu)
Reporter | ||
Comment 21•15 years ago
|
||
(In reply to comment #20) > One question -- right now I'm assuming that a change in the indexed body text > is the only thing that should determine whether to update the messagesText row > -- so if the attachment list changes but the indexed body text doesn't, we > won't update the row. Is this assumption valid? An interesting question. At some point it's conceivable we or an extension might deal with the MS TNEF problem. If it's not an extension, we can just blow away the database. If it is an extension, they would have to trigger blowing away the database themselves, but they would probably want to do that anyways. I think the assumption is valid enough for now. When gloda grows a way to do the right thing in terms of re-indexing as extensions show up we can look into doing that.
Reporter | ||
Comment 22•15 years ago
|
||
The change to test_query_core.js to not call imsInit results in the adaptive indexer not getting sufficiently "adjusted". As a result, on linux with no DISPLAY, when the indexer goes to check the idleTime, it throws an exception and the test fails. This change makes sure we always do what is necessary to make the adaptive indexer not be adaptive and not explode. This needs to be layered on sid's patch. Then everything passes for me. Woo!
Reporter | ||
Comment 23•15 years ago
|
||
Comment on attachment 398421 [details] [diff] [review] patch, v2 r=asuth with my patch to make test_query_core work on linux or something similar. I'm very psyched to have IMAP tests in our repetoire; I know it wasn't a cakewalk. Thank you!
Attachment #398421 -
Flags: review?(bugmail) → review+
Assignee | ||
Updated•15 years ago
|
Whiteboard: [b3ux][m6][no l10n impact][gloda key] → [needs review bienvenu][b3ux][m6][no l10n impact][gloda key]
Updated•15 years ago
|
Attachment #398421 -
Flags: superreview?(bienvenu)
Attachment #398421 -
Flags: superreview+
Attachment #398421 -
Flags: review?(bienvenu)
Attachment #398421 -
Flags: review+
Comment 24•15 years ago
|
||
Comment on attachment 398421 [details] [diff] [review] patch, v2 this looks right to me - I'm having a bit of trouble with gloda not indexing some of my messages, but that long predates this patch.
Updated•15 years ago
|
Whiteboard: [needs review bienvenu][b3ux][m6][no l10n impact][gloda key] → [b3ux][m6][no l10n impact][gloda key]
Assignee | ||
Comment 25•15 years ago
|
||
Pushed along with asuth's test_query_core fix: http://hg.mozilla.org/comm-central/rev/7d5450206f5b I also made one trivial change to the LEFT JOIN SQL -- the "on" should have been "ON".
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•