several mails I received are not found by Gloda search (index rebuild didn't help. =2C, ",", in RFC2047 encoded word of From: header)

RESOLVED FIXED in Thunderbird 3.1rc1

Status

defect
RESOLVED FIXED
10 years ago
9 years ago

People

(Reporter: copkiller_cop, Assigned: asuth)

Tracking

Thunderbird 3.1rc1
x86
Windows XP

Thunderbird Tracking Flags

(thunderbird3.1 rc1-fixed)


User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.1.9) Gecko/20100317 Lightning/1.0b1 Thunderbird/3.0.4

I have several mails from the same sender. None of the mails he sent me show up in the search when I search for his name or email address or any other content of the mails.

I tried to rebuild the index several times but the outcome doesn't change.

This only affects mails I received; the ones I sent out to him are indexed fine.

Reproducible: Couldn't Reproduce




Is it possible that this happens because of special German letters in the name of the sender (ß in this case)? I also have mails from a French guy whose name contains an é, and those mails get indexed. German umlauts work fine as well.
Version: unspecified → 3.0
What search are you using? The Quick Search or the new global search?
Umm..., the one in the Mail toolbar that says "search all messages"
OS: Windows Vista → Windows XP
OK i just reproduced the same issue on my Thunderbird on Vista. It shows the exact same behavior.
Rebuilding the index won't help for that search. Can you install the GlodaQuilla extension (https://addons.mozilla.org/en-US/thunderbird/addon/9873) and tell me whether the emails that you cannot find have been indexed? They should have an index ID; it's a column that you can add once the add-on is installed.
OK, I checked. They just show ID #1, and I found several other mails with the same problem. They are in fact from the same company and contain ß, ä, ö or ü in the sender's name.
Karsten, can you try a build from http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-1.9.2/ ? We've fixed a bunch of things with regard to ß, ä, ö while indexing, so it might already be fixed. When you install this build, the indexer will restart the complete indexing process, so that might take a while. Could you try that, please?
sure I'll try on my Vista-Box
No change with these emails. They still have the ID #1 (the rest of the mails changed their ID) and cannot be found via the search.
(In reply to comment #8)
> No change with this emails. They still have the ID #1 (the rest of the mails
> changed their ID) and can not be found via the search.

So we have debugging instructions at https://developer.mozilla.org/en/Thunderbird/Gloda_debugging - can you follow them and attach the debug output here? (And if you need help, please ask.)
Well, the guide is not really helpful. Which of these options do you want me to activate, and what task should I do after doing so (in order for you to get the logs you need)?
Start with the first one so we can see if errors are generated when you index.
(In reply to comment #10)
> what task should I do after doing so

Check the SQLite table content in global-messages-db.sqlite using Firefox and the SQLite Manager add-on, please.
(1) Install SQLite Manager in Firefox.
(2) Create a new Tb profile, put some problematic mails in a folder under "Local Folders", confirm that searching for ß, ä, ö or ü in the sender's name fails, and terminate Tb.
(Dragging and dropping an .eml file onto the thread pane of Tb is the easiest way to import.)
(3) Copy Tb's global-messages-db.sqlite to Backup-global-messages-db.sqlite.
(4) Start Fx, start SQLite Manager, and connect to the database file Backup-global-messages-db.sqlite.
(5) Select the "messagesText_content" table, click the "message Text" button, and check the "c3author" column values.
Are ß, ä, ö or ü in the sender's name saved correctly in the SQLite table?
(The SQLite DB global-messages-db.sqlite uses UTF-8 as its character encoding.)

If they are displayed correctly, in which form is the data saved? Decomposed (ä == a + combining umlaut) or composed (ä == a single Unicode code point)?
What charset is used in the original mail header?
If UTF-8, is it in decomposed or composed form?

Note:
Don't click the Browse&Search tab for the "messagesText" table.
It triggers the following problem.
> http://code.google.com/p/sqlite-manager/issues/detail?id=444
OK i will try both tomorrow .. just to be clear:

@Ludovic: i am supposed to rebuild the index then, right?

@WADA: (2) I create a new profile in my TB and do that there? What does the drag & drop part mean?

(3) Where is this file located?
(In reply to comment #13)
> @WADA: (2) I create i new profile in my TB and do that there?

Yes. Purpose of new profile is testing with clean, very small SQLite DB.

> what does the drag & drop part mean?

A simple way to import test mails into the new profile:
1. Save some mails with ß, ä, ö or ü in the sender's name as .eml files. 2. Drag and drop the .eml files onto the thread pane of a mail folder in the new profile.

> (3) Where is this file located?

In profile directory.
FYI.

Gloda doesn't appear to use the "messagesText_content" table etc. directly. It appears to use them indirectly via a virtual table named "messagesText".
(Table named messagesText)
>  Create statements: in Structure Tab.
>    CREATE VIRTUAL TABLE messagesText USING fts3(tokenize mozporter,
>      subject TEXT, body TEXT, attachmentNames TEXT, author TEXT,
>      recipients TEXT)
> See next page for fts3 or fts4.
>   http://www.mail-archive.com/sqlite-users@sqlite.org/msg33439.html
(SELECT statement of Gloda for accessing "messagesText")
> http://mxr.mozilla.org/comm-central/source/mailnews/db/gloda/modules/msg_search.js#108
> 108 const NUEVO_FULLTEXT_SQL =
> 109   "SELECT messages.*, messagesText.*, offsets(messagesText) AS osets " +
> 110   "FROM messagesText, messages " +
> 111   "WHERE" +
> 112     " messagesText MATCH ?1 " +
> 113     " AND messagesText.docid IN (" +
> 114        "SELECT docid " +
> 115        "FROM messagesText JOIN messages ON messagesText.docid = messages.id " +
> 116        "WHERE messagesText MATCH ?1 " +
> 117        "ORDER BY " + DASCORE + " DESC " +
> 118        "LIMIT ?2" +
> 119     " )" +
> 120     " AND messages.id = messagesText.docid " +
> 121     " AND +messages.deleted = 0" +
> 122     " AND +messages.folderID IS NOT NULL" +
> 123     " AND +messages.messageKey IS NOT NULL";
> DASCORE appears to be a defined constant or a compile-time variable.
 
Since SQLite Manager has problems accessing the virtual table "messagesText", direct access to the non-virtual tables via a SELECT statement using JOIN may be required.
I found the following documents on the fts3 extension and fts3 virtual tables.
> http://www.sqlite.org/fts3.html
> http://dotnetperls.com/sqlite-fts3
(In reply to comment #5)
> OK, I checked, they just show ID #1 and I found several other mails with the
> same problem, they are in fact from the same company and contain ß, ä, ö or ü
> in the senders name

Karsten Knuth, can you paste the From: header of such mails?
(The From: header alone is sufficient; the <a@x.y.z> part can be faked.)
I can do the checks from my comment #12 with small data myself, because I can write simple SQL statements.

By the way, the following is part of the fts3 extension documentation. "Search via an fts3 (fts4) virtual table" sounds to me like fuzzy search, such as Google search.
> FTS3 is an SQLite virtual table module that allows users to perform full-text searches on a set of documents.
> The most common (and effective) way to describe full-text searches is
> "what Google, Yahoo and Altavista do with documents placed on the World Wide Web".
> Users input a term, or series of terms, perhaps connected by a binary operator or grouped together into a phrase,
> and the full-text query system finds the set of documents that best matches those terms considering the operators and groupings the user has specified.
>(snip)
> CREATE VIRTUAL TABLE enrondata1 USING fts3(content TEXT); /* FTS3 table */
> CREATE TABLE enrondata2(content TEXT);                     /* Ordinary table */
> SELECT count(*) FROM enrondata1 WHERE content MATCH 'linux';  /* 0.03 seconds */
> SELECT count(*) FROM enrondata2 WHERE content LIKE '%linux%'; /* 22.5 seconds */
> Of course, the two queries above are not entirely equivalent.
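
A minimal sketch of why the two quoted queries are "not entirely equivalent", assuming Python's sqlite3 module is linked against an SQLite build with FTS3 enabled (common, but not guaranteed). The table names and rows are invented for illustration, and the default "simple" tokenizer is used here, not gloda's mozporter tokenizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One FTS3 virtual table and one ordinary table holding the same rows.
conn.execute("CREATE VIRTUAL TABLE docs_fts USING fts3(content TEXT)")
conn.execute("CREATE TABLE docs_plain(content TEXT)")

rows = [("I run linux daily",), ("my linuxbox is old",), ("windows only",)]
conn.executemany("INSERT INTO docs_fts(content) VALUES (?)", rows)
conn.executemany("INSERT INTO docs_plain(content) VALUES (?)", rows)

# MATCH compares whole tokens; LIKE '%...%' compares substrings.
match_count = conn.execute(
    "SELECT count(*) FROM docs_fts WHERE content MATCH 'linux'"
).fetchone()[0]
like_count = conn.execute(
    "SELECT count(*) FROM docs_plain WHERE content LIKE '%linux%'"
).fetchone()[0]

print("MATCH hits:", match_count)
print("LIKE hits:", like_count)
```

The row containing "linuxbox" is a substring hit for LIKE but is a different token for MATCH, so the two counts differ. What gets indexed as a token therefore matters for any field searched through the virtual table.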

If so, the fts3 virtual table would be better used only for body text search. Is gloda/modules/msg_search.js used for anything other than body search? I haven't checked the callers of gloda/modules/msg_search.js yet, so I can't say. But if it is used for more than body search, then the fts3 virtual table "messagesText" is what is used, because the string "messagesText_content" (the name of the non-virtual table) was not found by a search on MXR.

Ludovic, you referred to Gloda_debugging in comment #9, so I assumed Gloda is really relevant to this bug's problem. Are Gloda and the fts3 virtual table really used for anything other than body search, such as searching name data in the From: header?
OK pasting some From headers first:

some that don't work: (they are from the same domain)
From: =?iso-8859-1?Q?M=FCller=2C_Bernd-Ingo?=
From: =?iso-8859-1?Q?O=DFwald=2C_Uwe?=

one that does work:
From: =?iso-8859-1?Q?=22Susanne_Tsch=F6pe=22?=
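
For readers who do not decode RFC 2047 by eye, here is a small sketch using Python's standard email package (not Thunderbird code) to decode the three name parts pasted above. The helper name decode_words is made up for this example.

```python
from email.header import decode_header, make_header


def decode_words(raw: str) -> str:
    """Decode RFC 2047 encoded words in a header value to a Unicode string."""
    return str(make_header(decode_header(raw)))


headers = [
    "=?iso-8859-1?Q?M=FCller=2C_Bernd-Ingo?=",   # not found by the search
    "=?iso-8859-1?Q?O=DFwald=2C_Uwe?=",          # not found by the search
    "=?iso-8859-1?Q?=22Susanne_Tsch=F6pe=22?=",  # found by the search
]
decoded = [decode_words(h) for h in headers]
for name in decoded:
    print(name)
```

The two failing names decode to text with a bare comma (=2C), while the working one decodes to text wrapped in double quotes (=22) and has no comma.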
(In reply to comment #13)
> OK i will try both tomorrow .. just to be clear:
> 
> @Ludovic: i am supposed to rebuild the index then, right?
> 

Yes.


> Ludovic, you referred to Gloda_debugging in comment #9, so I thought Gloda is
> really relevant to this bug's problem. Is Gloda and the fts3 virtual table
> really used for other than body search such as search of name data in From:
> header?

I also think gloda does something with address book entries. As you can search based on email addresses and names, I think it also indexes the address book entries. asuth, can you confirm?
OK, I did the gloda debugging thing. I activated all those options and let it rebuild the whole index. There was no output on the error console whatsoever.
(In reply to comment #12)

OK, I tested WADA's procedure as well. The emails in question don't show up in the database. I added some mails that work for comparison; those show up in the database. For one of them that contained an umlaut, it was saved in composed form.

So to make it clear: the mails I was talking about don't show up in the database.
A message id of 1 means that gloda experienced a failure during indexing the messages and has marked them bad.  It will not attempt to index those messages again until the next time the folder is marked (filthy), on upgrade.

Please try a Thunderbird 3.1 nightly and verify the problem still happens.  Assuming it does, please provide us with a copy of one of the messages that is experiencing the problem.
OK i just tested again with the nightly build. It still doesn't work with the search in the "mail" menu pane. It does however work with the quick filter, that I just noticed.

I will attach one of the mails. I have deleted most of the content but left the header intact; it still has the same problem.
(In reply to comment #24)
> modified mail that still causes the problem (header is intact)

Quick check result with Tb 3.0.4(en-US) on Japanese MS Win-XP SP3(system charset=Shift_JIS).

1. Create new profile, dummy POP3 account, create a folder, import the .eml
   which has "Oßwald," in decoded name part of From:.
(mail-1)
> Subject: AW: Ihre Bewerbung bei FERCHAU
> From: =?iso-8859-1?Q?O=DFwald=2C_Uwe?= <Uwe.Osswald@ferchau.de>
> (Display: of address book card = Oßwald, Uwe)
> In-Reply-To: <4BA0C46E.20507@gmx.net>
> References: <FC959DC9E120B846AB46F97BA4B94079019594@hdc002.ferchau.local> <4BA0C46E.20507@gmx.net>
> (glodaid column value = 1)

2. Add the mail sender to the Address book
   (It is hard to type Oßwald in my environment, so I set the search string by copy&paste.)
3. Search for Oßwald
3-1. Quick Search, From filter, Oßwald
3-2. Edit/Find/Search Messages, From contains Oßwald
3-3. Saved Search folder, From contains Oßwald
In every case, the mail was found as expected.
4. Add the next 2 mails. ([CRLF] == 0x0D0A; test case for another bug)
(mail-2)
> Subject: =?UTF-8?B?0JTQltCY0JnQm9Ck0KnQrdCu0K8=?=[CRLF]
>  meta:charset=GB2312[CRLF]
> (Decoded subject == ДЖИЙЛФЩЭЮЯ meta:charset=GB2312)
> From: x x <x@x.x.x>[CRLF]
> To: z@z.z.z[CRLF]
> Message-ID: <4A6AB957.4030107@x.x.x>
> (glodaid column value = 22)
(mail-3)
> Subject: =?UTF-8?B?0JTQltCY0JnQm9Ck0KnQrdCu0K8=?=[CRLF]
>  meta:charset=windows-1252[CRLF]
> (Decoded subject == ДЖИЙЛФЩЭЮЯ meta:charset=windows-1252)
> From: x x <x@x.x.x>[CRLF]
> To: z@z.z.z[CRLF]
> Message-ID: <4A6AB957.4030107@x.x.x>
> (glodaid column value = 23)
Note: As mails crafted for testing another bug, these two mails have
      the same Message-ID, and no In-Reply-To: or References:.
5. Terminate Tb, check SQLite DB of global-messages-db.sqlite.
5-1. Table=conversations
     id=1, subject=AW: Ihre Bewerbung bei FERCHAU
     id=2, subject=ДЖИЙЛФЩЭЮЯ meta:charset=GB2312
   Why no entry for mail-3? Is the first Subject: header line used for
   subject-based threading?
   Is "AW:" a localized "Re:"?
5-2. Table=messages
     id=32,folderid=Null,messageKey=Null,conversationid=1,date=null,
           headerMessageID=
             FC959DC9E120B846AB46F97BA4B94079019594@hdc002.ferchau.local
     id=33,folderid=Null,messageKey=Null,conversationid=1,date=null
           headerMessageID=4BA0C46E.20507@gmx.net
     id=34,folderid=5,messageKey=3036,conversationid=2,date=1248508247000000
           headerMessageID=4A6AB957.4030107@x.x.x
     id=35,folderid=5,messageKey=6107,conversationid=2,date=1248508247000000
           headerMessageID=4A6AB957.4030107@x.x.x
   id=32/33 : As two Message-IDs appear in In-Reply-To: and References:,
              two ids appear to be generated for root messages of the thread.
   id=34/35 : As there is no In-Reply-To: or References:, one id per mail
              appears to be generated as root message of a thread.
5-3. Table=messagesText_content
     docid=34,c0subject=ДЖИЙЛФЩЭЮЯ meta:charset=GB2312
     docid=35,c0subject=ДЖИЙЛФЩЭЮЯ meta:charset=windows-1252
   Why no entry for docid=32 and docid=33?
   Is messagesText_content only for root messages of a thread?
5-4. Table=conversationsText and Table=messagesText
   At the Browse&Search tab, an exception occurs because these are fts3 virtual tables.
   So it was impossible to see the virtual table content with SQLite Manager.
   I'll try to check what the fts3 wrapper that Tb uses actually does.
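
The gap noted above (rows in messages with no counterpart in messagesText_content) can be listed with a LEFT JOIN on the ordinary tables, without touching the fts3 virtual table. This is only a sketch against a reduced, assumed schema: the column set is simplified, and the rows are the ids reported in this comment, loaded into an in-memory database, not a real global-messages-db.sqlite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced stand-ins for the two ordinary tables (assumed, simplified columns).
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY, folderID INTEGER,
    messageKey INTEGER, conversationID INTEGER);
CREATE TABLE messagesText_content (
    docid INTEGER PRIMARY KEY, c0subject TEXT);
""")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [(32, None, None, 1), (33, None, None, 1),
     (34, 5, 3036, 2), (35, 5, 6107, 2)],
)
conn.executemany(
    "INSERT INTO messagesText_content VALUES (?, ?)",
    [(34, "subject of mail-2"), (35, "subject of mail-3")],
)

# Messages that have no full-text row at all.
missing = [row[0] for row in conn.execute("""
    SELECT m.id
    FROM messages AS m
    LEFT JOIN messagesText_content AS t ON t.docid = m.id
    WHERE t.docid IS NULL
    ORDER BY m.id
""")]
print(missing)
```

Against a copy of a real profile database, the same SELECT could be pasted into SQLite Manager's Execute SQL tab, assuming the real tables carry these column names.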

As Shift_JIS has no code point for ß, Oßwald is changed to Oswald if converted to Shift_JIS. In such a case, Unicode is probably used in searches by Tb on Japanese MS Win.
Karsten Knuth, what is the system charset of your MS Windows? windows-1252?

The cause may be the following.
Generation of the SELECT statement for Oßwald is executed in windows-1252 instead of Unicode for Edit/Find/Search Messages, if Tb runs on MS Win with system charset=windows-1252.
If so, it may be an issue with localized Tb only. Karsten Knuth, does your problem occur with the en-US version/build of Tb?
(In reply to comment #22)
> A message id of 1 means that gloda experienced a failure during indexing the
> messages and has marked them bad.

glodaid column value by GlodaQuilla in my environment.
  mail-1 :  1
  mail-2 : 22 (Hex of id=34 of table=messages in decimal by SQLite Manager?) 
  mail-3 : 23 (Hex of id=35 of table=messages in decimal by SQLite Manager?)
mail-1 has glodaid=1 in GlodaQuilla in my environment too, but search was successful in my environment.
If "glodaid of 1 == marked as bad" always produces "not found", why was search successful in my environment?
Is there any setting that affects search?
Is "AW:" a localized "Re:", and is search affected by the following settings?
>(All of the following are defaults of Tb 3.0, as I never touched them after profile creation)
> mail.correct_threading;true
> mail.strict_threading;true
> mail.thread_without_re;false

Andrew Sutherland, is there any way to hook SELECT statement generated by Tb upon search?
(In reply to comment #26)
> Andrew Sutherland, is there any way to hook SELECT statement generated by Tb
> upon search?

gloda debug logging reports the SQL queries generated, as enabled by the "mailnews.database.global.logging.dump" preference as documented at:
https://wiki.mozilla.org/Thunderbird:Debugging_Gloda

It does not report the results.
(In reply to comment #26)
> (In reply to comment #22)
> If "glodaid of 1 == marked as bad" always produces "not found", why was search
> successuful in my environment?

Where was it successful in your environment?  Point number 3 in your list did not use gloda search at all; those are all implemented using mailnews/base/search logic.
(In reply to comment #28)
> Where was it successful in your environment?  Point number 3 in your list did
> not use gloda search at all;
> those are all implemented using mailnews/base/search logic.

(1) I initially thought so, as I thought Gloda is for full text search of body.

(2) Problem report of comment #0 is as follows.
> None of the mails he sent me show up in the search
> when I search for his name or email address or any other content of the mails.
(3) I can't see an "Any standard mail headers or Body" filter in the selection list
    of Quick Search, which is now labeled "Global Search" in the Customize panel.
    The only searches I know for "Oßwald is contained in From or Body" are:
    - Edit/Find/Search Messages, or a Saved Search folder, with
      "From contains Oßwald" OR "Body contains Oßwald".
    I don't know how to force a Gloda search for the whole of "Oßwald is contained in
    From: header OR Body".
(4) The instruction to "get debug data for Gloda" was posted for the problem report in comment #0.
(5) The search box called "Quick Search" is now labeled "Global Search" in the Customize panel.

(6) So I assumed that Gloda is used for the "Oßwald is contained in From header" part of the search '"From contains Oßwald" OR "Body contains Oßwald"'.
(7) But I wasn't sure Gloda is used for the "From contains Oßwald" part, so I posted comment #17.
(8) And, finally, I've got an answer from you to my question in comment #17.

Andrew Sutherland, is there any way to force Gloda search for whole of "Oßwald is contained in From header or Body"?
A funny phenomenon was observed additionally.
- Tb 3.0.4, Quick Search, Body Filter, Oßwald => false positive for mail-3.
- Tb 3.0.4, Quick Search, Search All Messages, Oßwald => Nothing is found.
- Tb 3.2pre, after the re-index message due to schema change,
  Quick Search, Entire Messages, Oßwald => Nothing is found.
The only way to search gloda is when the search box is set to the "Search all messages" mode.  It is not surprising that gloda does not find messages it does not index.  It is likewise not surprising that any non-gloda search does find them.

Please do not use Thunderbird 3.2a1pre builds.  The JavaScript engine is buggy and not reliable.  Only use Thunderbird 3.1-series nightlies at this point.
Checked with next build on Japanese MS WinXP.
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.5pre) Gecko/20100415 Lanikai/3.1b2pre
Note: As this is Japanese MS Windows, "O゚wald" was produced by copying from the console window and pasting into an editor window. If UTF-8 bytes had been written to the console, I guess garbage would have been displayed. I guess windows-1252 bytes were written to the console.
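
The guess in the note can be checked outside Thunderbird. The sketch below uses Python's codecs as a stand-in for the Windows console, which is an assumption: it shows what the windows-1252 byte for ß looks like when read as Shift_JIS, and confirms that the strict Shift_JIS codec has no ß at all.

```python
name = "Oßwald"

cp1252_bytes = name.encode("cp1252")  # ß becomes the single byte 0xDF
utf8_bytes = name.encode("utf-8")     # ß becomes the two bytes 0xC3 0x9F

# Read the windows-1252 bytes the way a Shift_JIS console would.
as_seen_on_sjis_console = cp1252_bytes.decode("shift_jis")

# A strict Shift_JIS encoder cannot represent ß.
try:
    name.encode("shift_jis")
    sjis_has_eszett = True
except UnicodeEncodeError:
    sjis_has_eszett = False

print(cp1252_bytes, utf8_bytes)
print(as_seen_on_sjis_console, sjis_has_eszett)
```

In Shift_JIS the byte 0xDF is a halfwidth katakana sound mark, which fits the mark seen between "O" and "wald" in the log and supports the guess that windows-1252 bytes reached the console. Python's strict codec raises an error for ß instead of best-fitting it to "s" as described for Windows.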

Gloda Debug log for "Quick Search, Search All Messages, Oßwald". 
> 2010-04-23 13:32:57 gloda.datastore DEBUG QUERY FROM QUERY: SELECT * FROM contacts
>   WHERE (id IN (SELECT id FROM contacts WHERE name LIKE ? ESCAPE '/')) ARGS: %O゚wald,%
> 2010-04-23 13:32:57 gloda.datastore DEBUG QUERY FROM QUERY: SELECT * FROM identities
>   WHERE (id IN (SELECT id FROM identities WHERE (kind IN ('email')) AND value LIKE ? ESCAPE '/')) ARGS: %O゚wald,%
> 2010-04-23 13:32:58 gloda.datastore DEBUG QUERY FROM QUERY: SELECT * FROM contacts
>   WHERE (id IN (SELECT id FROM contacts WHERE name LIKE ? ESCAPE '/')) ARGS: %O゚wald%
> 2010-04-23 13:32:58 gloda.datastore DEBUG QUERY FROM QUERY: SELECT * FROM identities
>   WHERE (id IN (SELECT id FROM identities WHERE (kind IN ('email')) AND value LIKE ? ESCAPE '/')) ARGS: %O゚wald%
> 2010-04-23 13:32:58 gloda.datastore DEBUG QUERY FROM QUERY: SELECT * FROM identities
>   WHERE (id IN (SELECT id FROM identities WHERE (contactID IN (2)))) ARGS:
> 2010-04-23 13:33:00 gloda.datastore DEBUG QUERY FROM QUERY: SELECT messages.*, messagesText.*, offsets(messagesText) AS osets FROM messagesText, messages
>   WHERE messagesText MATCH ?1
>     AND messagesText.docid
>       IN (SELECT docid FROM messagesText JOIN messages ON messagesText.docid = messages.id
>             WHERE messagesText MATCH ?1
>             ORDER BY (((glodaRank(matchinfo(messagesText), 2.0, 1.0, 2.0, 1.5, 1.5) + messages.notability) * 604800000000) + messages.date) DESC LIMIT ?2 )
>     AND messages.id = messagesText.docid
>     AND +messages.deleted = 0
>     AND +messages.folderID IS NOT NULL
>     AND +messages.messageKey IS NOT NULL
>   ARGS: "O゚wald",400
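
The contacts and identities queries in this log use LIKE ? ESCAPE '/' with the search text wrapped in % wildcards. How gloda builds that argument is not shown here, so the following is only a sketch of the general technique with a hypothetical helper (like_contains) and an invented contacts table.

```python
import sqlite3


def like_contains(term: str, esc: str = "/") -> str:
    """Hypothetical helper: build a 'contains' LIKE pattern.

    The escape character itself is escaped first, then the LIKE
    wildcards, so user text is always matched literally.
    """
    for ch in (esc, "%", "_"):
        term = term.replace(ch, esc + ch)
    return "%" + term + "%"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO contacts(name) VALUES (?)",
    [("Oßwald, Uwe",), ("100% Real",), ("100 Percent",)],
)


def find_contacts(term: str) -> list:
    sql = "SELECT name FROM contacts WHERE name LIKE ? ESCAPE '/' ORDER BY id"
    return [row[0] for row in conn.execute(sql, (like_contains(term),))]


print(find_contacts("Oßwald,"))  # same shape as the logged ARGS %...,%
print(find_contacts("100%"))     # the literal % must not act as a wildcard
```

Because the pattern is passed as a bound parameter, it reaches SQLite as Unicode text; console rendering of the argument does not necessarily reflect the bytes SQLite sees.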

If it's not a problem of character encoding in the SELECT, it seems to be the following.
(1) The row for the root mail of mail-1/mail-4 (name contains Oßwald) is not created as expected in table=messages (id=33), because the replied-to mail (root of thread, mail-4) and the reply mail (mail-1) were imported in reversed order, so glodaid=1 is assigned (marked as bad).
(2) No row in table=messagesText (fts3 virtual table) has a pointer to the root mail of id=33 in table=messages, or the row of id=33 in table=messages doesn't point to the mail data correctly, or glodaid=1 (marked as bad) applies, so nothing is found by the SELECT.

I'll check the normal import order case (root of thread/replied-to mail first, reply mail second).

As the Oßwald part was displayed as O゚wald at the console, it's hard to know the real byte data. Andrew Sutherland, can Gloda debugging write data to the console in escaped format?
My guess was mostly wrong, although the corrupted row in table=messages was relevant.
The culprit was =2C (",") in the RFC 2047 encoded word for the name field in From: (decoded text = Oßwald, Uwe).
I thought the =2C problem in interpretation/handling of the encoded name part in From:/To:/CC: was already resolved.

If =2C is removed from the original test mail, glodaid=22 is assigned, and "Quick Search, Search All Messages, Oßwald" is successful.
(original, =2C in encoded word)
> From: =?iso-8859-1?Q?O=DFwald=2C_Uwe?= <Uwe.Osswald@ferchau.de>
(=2C in encoded word is removed => no problem)
> From: =?iso-8859-1?Q?O=DFwald_Uwe?= <Uwe.Osswald@ferchau.de>

If =22 is added to the original encoded word (i.e. "Oßwald, Uwe" is encoded including the " characters), the problem probably doesn't occur.
> From: =?iso-8859-1?Q?=22O=DFwald=2C_Uwe=22?= <Uwe.Osswald@ferchau.de>
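
The three header variants above can be compared with a small sketch that decodes the RFC 2047 words first and only then parses the address list. It uses Python's standard email package as a stand-in, not gloda's parser, and decode_then_parse is a name made up for this example.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_then_parse(raw_from: str) -> list:
    """Decode encoded words FIRST, then split the result into addresses."""
    decoded = str(make_header(decode_header(raw_from)))
    return getaddresses([decoded])


original = "=?iso-8859-1?Q?O=DFwald=2C_Uwe?= <Uwe.Osswald@ferchau.de>"
no_comma = "=?iso-8859-1?Q?O=DFwald_Uwe?= <Uwe.Osswald@ferchau.de>"
quoted = "=?iso-8859-1?Q?=22O=DFwald=2C_Uwe=22?= <Uwe.Osswald@ferchau.de>"

for label, raw in [("original", original), ("no =2C", no_comma), ("=22 quoted", quoted)]:
    print(label, decode_then_parse(raw))
```

With this order of operations the decoded comma in the original header is taken as an address separator, so one sender turns into two entries. The other two variants stay a single address.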
Status: UNCONFIRMED → NEW
Ever confirmed: true
mail-00 : root mail of thread, no =2C in encoded word
          docid in messagesText_content = 32, glodaid = 20
mail-01 : reply mail, with =2C, this bug and bug 491832 occur
          entry is not defined in messagesText_content, glodaid = 1
mail-02 : reply mail, with =2C, escaped by =5C in encoded word
          docid in messagesText_content = 34, glodaid = 22
mail-03 : reply mail, with =2C, quoted by =22 in encoded word
          docid in messagesText_content = 35, glodaid = 23
Interestingly enough, other messages I have from the same domain that do not contain umlauts have the "" included, so they work fine.
Bug 254519 (problem when 0x2C exists in an RFC 2047 encoded word) is already fixed, and the only known remaining issue after the fix of Bug 254519 is bug 491832 (a UI issue, because the UI directly uses the decoded name with a native ",").
Is this bug's Gloda problem like Bug 254519 (wrong order of "RFC 2047 decoding" and "RFC 2822 application")?
Or is it a Gloda problem like bug 491832 (direct use of the name after RFC 2047 decoding, which has a native "," in the name)?

Note:
If this bug is a problem like bug 254519, Gloda should also apply RFC 2822 parsing before applying RFC 2047 decoding to the mail header.
If this bug is a problem like bug 491832, it can be avoided by quoting a name containing a native "," with double quotes, or by escaping the native "," in the name with "\", after RFC 2047 decoding of the encoded word, as is done upon mail composition.
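
The first option in the note, RFC 2822 parsing before RFC 2047 decoding, can be sketched as follows. Again this uses Python's standard email package and not Thunderbird's address parser. The helper names are made up, and the To: header is an invented two-recipient example, not one of the attached test mails.

```python
from email.header import decode_header, make_header
from email.utils import getaddresses


def decode_words(value: str) -> str:
    """Decode RFC 2047 encoded words in one display name."""
    return str(make_header(decode_header(value)))


def parse_then_decode(raw_header: str) -> list:
    """Split the raw header into addresses FIRST, then decode each name."""
    return [(decode_words(name), addr) for name, addr in getaddresses([raw_header])]


from_raw = "=?iso-8859-1?Q?O=DFwald=2C_Uwe?= <Uwe.Osswald@ferchau.de>"
to_raw = ("=?iso-8859-1?Q?FN1=2C_LN1?= <to1@a.b.c>, "
          "=?iso-8859-1?Q?FN2=2C_LN2?= <to2@a.b.c>")

print(parse_then_decode(from_raw))
print(parse_then_decode(to_raw))
```

While the header is still raw, the only literal commas are real address separators, because a comma inside a name is still spelled =2C. Splitting at that stage and decoding afterwards keeps "Oßwald, Uwe" as one display name.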
A different search result between Gloda and mailnews/base/search is observed for the following test mail.
> mail-02 : reply mail, with =2C, escaped by =5C in encoded word
            docid in messagesText_content = 34, glodaid = 22

Search for 'FirstName,' (no quote).
(a) Gloda : Finds mail-02. Gloda appears to remove the escaping \ correctly.
(b) mailnews/base/search (Quick Search, From filter) : Unable to find mail-02.
    As FirstName\, is shown in the thread pane, and as "FirstName\\, LastName"
    <local-part@a.b.c> is generated by "Compose Message To",
    the data for mail&news appears to be 'FirstName\,' (no quote).
Note: The following confusing strings were changed from the attached test case.
 Message-ID:<mail.0x.@a.b.c> => Message-ID:<mail.0x@a.b.c> (remove . before @)
 @@a.b.c of mail address in To: header => @a.b.c (remove excess "@")

For mail-03 (with =2C, quoted by =22). Normal.

> 2010-04-30 15:37:49     gloda.index_msg DEBUG   *** Indexing message: 1545 : mail-03 = reply mail, with =2C, with =22, no =5C
> 2010-04-30 15:37:49     gloda.index_msg DEBUG     * Got message, subject mail-03 = reply mail, with =2C, with =22, no =5C
>(snip)
> 2010-04-30 15:37:49     gloda.NS        DEBUG    creating contact for 'FirstName, LastName' (local-part@a.b.c)
> 2010-04-30 15:37:49     gloda.NS        DEBUG    creating contact for 'FN1, LN1' (to1@a.b.c)
> 2010-04-30 15:37:49     gloda.NS        DEBUG    creating contact for 'FN2, LN2' (to2@a.b.c)

For mail-01 (with =2C, without escaping by =5C, without quoting by =22):

> 2010-04-30 15:37:50     gloda.index_msg DEBUG   *** Indexing message: 454 : mail-01 = reply mail, with =2C, no =22, no =5C
> 2010-04-30 15:37:50     gloda.index_msg DEBUG     * Got message, subject mail-01 = reply mail, with =2C, no =22, no =5C
>(snip)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    found identity for 'LastName' (local-part@a.b.c)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    found identity for 'LN1' (to1@a.b.c)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    found identity for 'LN2' (to2@a.b.c)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    creating contact for '' (firstname)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    creating contact for '' (fn1)
> 2010-04-30 15:37:50     gloda.NS        DEBUG    creating contact for '' (fn2)

Gloda splits the header into two mail addresses at "," (=2C), and treats the first-name part before the "," as a mail address consisting of a local part only.
Even so, why is no messagesText entry created for mail-01?
Is the misinterpretation as "two mail addresses in the From: header" the cause of the problem?
Summary: several mails I received are not found by the search (index rebuild didn't help) → several mails I received are not found by the search (index rebuild didn't help. =2C, ",", in RFC2047 encoded word of From: header)
Summary: several mails I received are not found by the search (index rebuild didn't help. =2C, ",", in RFC2047 encoded word of From: header) → several mails I received are not found by Gloda search (index rebuild didn't help. =2C, ",", in RFC2047 encoded word of From: header)
FYI.
When the number of mails increased, gloda ids of 2a, 2b, 2c, 2d were displayed in the "gloda id" column by GlodaQuilla. GlodaQuilla appears to show the docid value of messagesText in hexadecimal format.
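
The hexadecimal reading can be checked against the glodaid/docid pairs reported earlier for mail-00, mail-02 and mail-03. A quick sketch follows; the pairs are taken from this bug, and the variable names are made up.

```python
# "gloda id" as displayed by GlodaQuilla -> docid seen in messagesText_content
observed = {"20": 32, "22": 34, "23": 35}

# Interpret each displayed value as a base-16 number.
decoded = {shown: int(shown, 16) for shown in observed}
print(decoded)

# The newly seen values 2a..2d under the same reading.
later = [int(shown, 16) for shown in ("2a", "2b", "2c", "2d")]
print(later)
```

All three reported pairs agree with the base-16 reading, and 2a to 2d then correspond to four consecutive docids.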
(In reply to comment #36)
> Is this bug's problem of Gloda one like Bug 254519? (wrong order of "RFC 2047
> decoding" and "RFC 2822 application")

Yes, this appears to be the case.  Thanks for figuring out it was the comma thing and linking me to the relevant preceding bug.  The abstraction is such that gloda can easily repeat mistakes other code has previously made.  (The good news is that code built on gloda then gets to avoid these mistakes!)
I suspect I verified that the address parser would decode encoded things for cc/bcc, but left the mime2Decoded* things around for those that had it because the default folder character set is used for them (while it is not with the address parser route) and did not think to consider that the order of operations might be significant.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Attachment #442986 - Flags: review?(bienvenu)
Attachment #442986 - Flags: review?(bienvenu) → review+
Comment on attachment 442986 [details] [diff] [review]
v1 fix and test; do not use the mime2Decoded sender/recipients and instead let the email address parser do the decoding

I'm gonna authorize myself to land this for 3.1 given the high confidence of the fix, the thoroughness of the test, and the fact that the problem is non-trivial to QA.
Attachment #442986 - Flags: approval-thunderbird3.1+
pushed to comm-1.9.2:
http://hg.mozilla.org/releases/comm-1.9.2/rev/0abdb41cbf34

pushed to comm-central:
http://hg.mozilla.org/comm-central/rev/aa187bdfe978

Thanks again for the analysis assist, WADA!
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 3.1rc1