
Full Text Search not finding all email, due to concept/design/implementation/characteristics of "search by SQLite FTS3 Extension"

NEW
Unassigned

Status

MailNews Core
Search
--
major
7 years ago
a year ago

People

(Reporter: Stephen Ryan, Unassigned)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

7 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv:1.9.2.9) Gecko/20100824 Firefox/3.6.9
Build Identifier: 20100915 Thunderbird/3.1.4

Search all messages is not doing a proper wildcard full text search. It seems to be doing a 'text' style search. Other modifiers aren't working as expected either.

Reproducible: Always

Steps to Reproduce:
4 emails in inbox as follows.
Subject: Test1. Body: 2010blahahahaha
Subject: Test2. Body: 2010-blahahahaha
Subject: Test3. Body: 2010-blahahahaha2
Subject: Test4. Body: 2010blahahahaha2

Search all messages has the following results.
blahahahaha - Test2
*blahahahaha* - Test2, Test3

The folder quick search for blahahahaha will find all four messages.
I can search for other details related to the messages such as subject in the "Search all messages" and it will find the mails, so it is not an indexing issue.
Actual Results:  
As above

Expected Results:  
Search all messages should respond the same as filter in the folder. That is searching for blah should be enough to find the four emails.

This behavior is basically by design of the SQLite FTS3 extension, which Tb uses for its Global Indexer and Search (Gloda).

> https://developer.mozilla.org/en/Thunderbird/About_gloda#Full-text_search
Tb uses a customized tokenizer, mozporter, for its FTS3 virtual table. It is similar to the tokenizers pre-packaged with the SQLite FTS3 extension.
The following applies to your case:
> Tokens are broken on whitespace and punctuation. 
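The quoted rule can be sketched in a few lines of Python. This is a minimal approximation of a "simple"-style FTS3 tokenizer, assuming ASCII text; it is not mozporter itself (Porter stemming and any CJK handling are left out), and the `tokenize` name is hypothetical. Applied to the four message bodies from the report, it produces the same tokens listed further down in this comment.

```python
import re

# Hypothetical helper: approximates an FTS3 "simple"-style tokenizer for ASCII text.
# Runs of letters/digits become tokens; whitespace and punctuation (e.g. '-')
# only separate tokens. Porter stemming and CJK handling are deliberately omitted.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens, breaking on whitespace and punctuation."""
    return [match.lower() for match in TOKEN_RE.findall(text)]


if __name__ == "__main__":
    bodies = {
        "Test1": "2010blahahahaha",
        "Test2": "2010-blahahahaha",
        "Test3": "2010-blahahahaha2",
        "Test4": "2010blahahahaha2",
    }
    for subject, body in bodies.items():
        print(subject, tokenize(body))
    # Test1 ['2010blahahahaha']
    # Test2 ['2010', 'blahahahaha']
    # Test3 ['2010', 'blahahahaha2']
    # Test4 ['2010blahahahaha2']
```

Note that "2010blahahahaha" stays a single token because nothing separates the digits from the letters, so a search for "blahahahaha" has no token to hit in Test1 or Test4.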

As the following document explains, search by the SQLite FTS3 extension is not an "absolutely accurate" search.
> http://cvs.hwaci.com/fts3.html
> SQLite FTS3 Extension
> Overview
> 
> FTS3 is an SQLite virtual table module that allows users to perform
> full-text searches on a set of documents. The most common (and
> effective) way to describe full-text searches is "what Google, Yahoo and
> Altavista do with documents placed on the World Wide Web". Users input a
> term, or series of terms, perhaps connected by a binary operator or
> grouped together into a phrase, and the full-text query system finds the
> set of documents that best matches those terms considering the operators
> and groupings the user has specified. This document describes the
> deployment and usage of FTS3.

As the following sections explain, full-text search with the SQLite FTS3 extension supports only simple queries. That is the trade-off for very fast full-text search across a very large number of documents.
> http://cvs.hwaci.com/fts3.html#section_1_3
> http://cvs.hwaci.com/fts3.html#section_3

> Steps to Reproduce:
> 4 emails in inbox as follows.
> Subject: Test1. Body: 2010blahahahaha
> Subject: Test2. Body: 2010-blahahahaha
> Subject: Test3. Body: 2010-blahahahaha2
> Subject: Test4. Body: 2010blahahahaha2
> 
> Search all messages has the following results.
> blahahahaha - Test2
> *blahahahaha* - Test2, Test3

Tokens in your case:
  2010
  blahahahaha
  blahahahaha2
  2010blahahahaha
  2010blahahahaha2
Your search results probably correspond to "MATCH blahahahaha" and "MATCH blahahahaha*" in the SQLite FTS3 extension.
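The two queries can be simulated with a small Python sketch. It assumes a simple-style ASCII tokenizer (no stemming), treats `term` as a whole-token match and `term*` as a token-prefix match, and assumes a leading `*` is simply dropped as punctuation, since FTS3 offers prefix queries but no suffix or infix wildcard. The function names are hypothetical; this illustrates FTS3-style matching and is not Gloda's actual query code.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    # Assumed simple-style tokenizer: lowercase runs of ASCII letters/digits.
    return [t.lower() for t in TOKEN_RE.findall(text)]


def fts_match(query: str, docs: dict[str, str]) -> list[str]:
    """Return the names of docs matching a one-term, FTS3-style query.

    "term"  matches only a token equal to term.
    "term*" matches any token that starts with term (prefix query).
    A leading "*" is tokenized away, so it does not act as a wildcard.
    """
    prefix = query.endswith("*")
    terms = tokenize(query)
    if not terms:
        return []
    term = terms[0]
    hits = []
    for name, body in docs.items():
        tokens = tokenize(body)
        if prefix:
            found = any(t.startswith(term) for t in tokens)
        else:
            found = term in tokens
        if found:
            hits.append(name)
    return hits


if __name__ == "__main__":
    docs = {
        "Test1": "2010blahahahaha",
        "Test2": "2010-blahahahaha",
        "Test3": "2010-blahahahaha2",
        "Test4": "2010blahahahaha2",
    }
    print(fts_match("blahahahaha", docs))    # ['Test2']
    print(fts_match("*blahahahaha*", docs))  # ['Test2', 'Test3']
```

Under the same assumptions, a plain term "gadour" would not match a token "gadoury"; only a prefix query such as "gadour*" would.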

> Expected Results:  
> Search all messages should respond the same as filter in the folder.
> That is searching for blah should be enough to find the four emails.

IIRC, Quick Search uses the IMAP SEARCH command when the folder is an IMAP folder with offline use turned off. Search on the IMAP server is probably an exact (substring) search.
IIRC, if Gloda is disabled, Tb performs a traditional string search on the body text, which is also the type of search you expected.
As the SQLite FTS3 extension document says, its full-text search is not meant to obtain all documents that contain a specified string. It is meant for very fast, effective search, like a Google web search.
Because the two kinds of search behave differently, users need an easy way to choose between them, but unfortunately Tb does not support that well yet.
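The difference can be seen by running both kinds of search over the four bodies from the report. In this Python sketch, `substring_search` stands in for a traditional case-insensitive string search over the body and `token_prefix_search` for an FTS3-style prefix query such as "blah*"; both names are hypothetical, and the tokenizer is the same simple ASCII approximation without stemming.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text: str) -> list[str]:
    # Assumed simple-style tokenizer: lowercase runs of ASCII letters/digits.
    return [t.lower() for t in TOKEN_RE.findall(text)]


def substring_search(needle: str, docs: dict[str, str]) -> list[str]:
    """Traditional search: does the raw body contain the string anywhere?"""
    needle = needle.lower()
    return [name for name, body in docs.items() if needle in body.lower()]


def token_prefix_search(prefix: str, docs: dict[str, str]) -> list[str]:
    """FTS3-style prefix query: does any token start with the prefix?"""
    prefix = prefix.lower()
    return [name for name, body in docs.items()
            if any(t.startswith(prefix) for t in tokenize(body))]


if __name__ == "__main__":
    docs = {
        "Test1": "2010blahahahaha",
        "Test2": "2010-blahahahaha",
        "Test3": "2010-blahahahaha2",
        "Test4": "2010blahahahaha2",
    }
    print(substring_search("blah", docs))     # ['Test1', 'Test2', 'Test3', 'Test4']
    print(token_prefix_search("blah", docs))  # ['Test2', 'Test3']
```

The substring scan has to read every body on every query, which is why it finds all four messages but does not scale well; a token index can answer from a lookup, at the cost of missing matches that begin in the middle of a token.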
FYI, there are other cases that can produce "not found" results for you:
(i)  Indexing failure by Gloda : bug 559022
(ii) Bug in SEARCH command of Gmail IMAP : bug 500272
Note:
If the snippet auxiliary function is used, "start match" text, "end match" text, and "ellipses" text can be supported.
> http://cvs.hwaci.com/fts3.html#section_4_2
I don't know whether Tb already uses it, or whether there is a plan to use it.
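To illustrate what the "start match", "end match", and "ellipses" texts do, here is a rough Python imitation of snippet-style output. It is not SQLite's snippet() implementation: the `make_snippet` name, the word-window `context` parameter, the default markers, and the sample text are all assumptions chosen for the example.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def make_snippet(text: str, term: str, start: str = "[", end: str = "]",
                 ellipses: str = "...", context: int = 2) -> str:
    """Mark tokens equal to `term` and keep `context` words around the first hit."""
    term = term.lower()
    words = text.split()

    def has_hit(word: str) -> bool:
        return any(t.lower() == term for t in TOKEN_RE.findall(word))

    hit = next((i for i, w in enumerate(words) if has_hit(w)), None)
    if hit is None:
        return ""
    lo = max(0, hit - context)
    hi = min(len(words), hit + context + 1)

    def mark(word: str) -> str:
        # Wrap only the matching token, leaving punctuation such as '-' outside.
        return TOKEN_RE.sub(
            lambda m: start + m.group() + end if m.group().lower() == term else m.group(),
            word)

    body = " ".join(mark(w) for w in words[lo:hi])
    # Ellipses mark text cut off before and/or after the window.
    return (ellipses if lo > 0 else "") + body + (ellipses if hi < len(words) else "")


if __name__ == "__main__":
    text = "Meeting notes for 2010-blahahahaha project review with the team on Monday"
    print(make_snippet(text, "blahahahaha"))
    # ...notes for 2010-[blahahahaha] project review...
```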
Blocks: 519202
Blocks: 541349
No longer blocks: 519202
Severity: normal → major
Component: Search → Search
Product: Thunderbird → MailNews Core
QA Contact: search → search
Sorry for later confirmation, even though I could ses problem.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Version: unspecified → Trunk
I pressed Enter too quickly...
Sorry for the late confirmation, even though I could see the problem, and sorry for the spam.
Summary: Full Text Search not finding all email → Full Text Search not finding all email, due to concept/design/implementation/characteristics of "search by SQLite FTS3 Extension"
OS: Windows 7 → All
Hardware: x86 → All

Updated

6 years ago
Duplicate of this bug: 628075

Comment 7

a year ago
I get the same with Thunderbird 45.1.1. A mail containing "Gadoury" does not match search term "gadour".