Closed Bug 466066 Opened 13 years ago Closed 12 years ago

gloda could load message subject/text body for messages from the full-textdatabase

Categories

(MailNews Core :: Database, enhancement, P2)

enhancement

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: asuth, Assigned: asuth)

Details

Attachments

(2 files, 1 obsolete file)

Any message that is full-text indexed has the indexed body (and subject) available via the database.  We do not currently expose this information.  We could load this data when we load the message and expose it.

In many cases, this would eliminate the need to stream the message for display purposes.  Namely, cases where we only want the relevant textual body of the message.  The likely issue is that what we index may not be what we want to display.  For example, the gloda content analysis attempts to recognize quoted blocks.  These are not put in the fulltext index.  I imagine we would like to display the quoting in a collapsible form but still not index it.

From an efficiency perspective, a naive implementation would result in a primary key to primary key join.  Depending on whether the FTS3 match retrieves the record or not, we may be able to optimize that, but assuming we don't blow the cache it might not make a difference.
Priority: -- → P3
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
So, the previous patch combined with this exptoolbar patch should result in exptoolbar not streaming the search result messages but instead using the indexed body text values.  The patch assumes that said indexed body text exists.  This will currently be true, but in a world where we index non-offline messages, this will not be true.  We have various other issues along these lines that will need to be sorted out in the bindings, combined with some normalization, so I'm not worrying about it now.
Attachment #351336 - Flags: review?(bienvenu)
Comment on attachment 351336 [details] [diff] [review]
v1 expose message subject, body, attachment names.  add unit test check on subject.

This functionality does what we want and satisfies our medium-term needs.  Requesting gloda patch review.
Priority: P3 → P2
Whiteboard: [has patch][needs review bienvenu]
Attachment #351336 - Flags: review?(bienvenu) → review+
Whiteboard: [has patch][needs review bienvenu] → [has patch]
Updated patch as checked-in: http://hg.mozilla.org/comm-central/rev/cba1fda2f84d
r=bienvenu carried forward.

I had to make a tiny change to the unit testing code because my unit test for bug 467685 that nukes the cache had used getMessagesByMessageID to reload the message from the database to make sure we were consistent.  Except this patch went and made the returned messages from that method differ from those retrieved via queryFromQuery, requiring that we change our verification logic to use a query.

Unfortunately, as I write this, I realize that getMessagesByMessageID does not do a sufficiently good job of ensuring that the messages it loads (that are not cache-unified) that lack the full-text retrieved values do not escape into the general consciousness.  Namely, its collection and quasi-messages will be reachable for a brief period of time when it is not garbage and then for a random amount of time until it is garbage collected.

I am not going to back this patch out because the messages without the additional state actually represent a valid message state, even if it's not the right state.  This will allow exptoolbar work to more forward while I fashion a new patch...

I will create a new patch that further unifies the getMessagesByMessageID and queryFromQuery code paths.  (They use the same back-end mechanism, but each has their own method of SQL query generation.)  The alternative would be to just make sure the half-formed messages can never escape their indexing prison.  But I just now realized that by having the indexed message text available, we can potentially do neat advanced quote detection sorts of things.  (Although not the one where we thread by that, as the method in question is using message-id headers.)  I will restore the unit test to the previous set of logic and then add a cross-check against a query result as well.
Attachment #351336 - Attachment is obsolete: true
Attachment #351759 - Flags: review+
Attachment #351759 - Attachment description: v2 expose message subject, body, attachment names. add unit test check on subject; minor test change. → [checked in: comment #4] v2 expose message subject, body, attachment names. add unit test check on subject; minor test change.
Whiteboard: [has patch] → [needs successor patch]
This has been dealt with.  Sid's latest work cleaned up getMessagesByMessageID to use queries, and the code paths had been largely fused before that in another patch anyways.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
Whiteboard: [needs successor patch]
You need to log in before you can comment on or make changes to this bug.