Open Bug 667854 Opened 13 years ago Updated 5 months ago

Mails with a body containing quoted-printable-like strings ("=" followed by 2 hexadecimal digits) not matched (false negatives/positives) in some local body searches

Categories

(Thunderbird :: Search, defect)

x86
All
defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: yuki, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: dupeme, reproducible, testcase)

Attachments

(4 files, 1 obsolete file)

Tested on Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 (Thunderbird 3.1 on Ubuntu).

This is similar to bug 132340. That bug says that quick search now works for base64-encoded bodies, but it still doesn't work for this testcase.

Steps to reproduce:
1. Import the attached eml file into a local mail folder. It includes a Base64-encoded body like:
------------------------------------------------------
「王様は、人を殺します。」
「なぜ殺すのだ。」
「悪心を抱いている、というのですが、誰もそんな、悪心を持っては居りませぬ。」
「たくさんの人を殺したのか。」
「はい、はじめは王様の妹婿さまを。それから、御自身のお世嗣(よつぎ)を。
それから、妹さまを。それから、妹さまの御子さまを。それから、皇后さまを。
それから、賢臣のアレキス様を。」
「おどろいた。国王は乱心か。」
------------------------------------------------------
("Run, Melos!" by Osamu Dazai)
The base64-encoded version is:
------------------------------------------------------
44CM546L5qeY44Gv44CB5Lq644KS5q6644GX44G+44GZ44CC44CNCuOAjOOBquOBnOauuuOBmeOB
ruOBoOOAguOAjQrjgIzmgqrlv4PjgpLmirHjgYTjgabjgYTjgovjgIHjgajjgYTjgYbjga7jgafj
gZnjgYzjgIHoqrDjgoLjgZ3jgpPjgarjgIHmgqrlv4PjgpLmjIHjgaPjgabjga/lsYXjgorjgb7j
gZvjgazjgILjgI0K44CM44Gf44GP44GV44KT44Gu5Lq644KS5q6644GX44Gf44Gu44GL44CC44CN
CuOAjOOBr+OBhOOAgeOBr+OBmOOCgeOBr+eOi+anmOOBruWmueWpv+OBleOBvuOCkuOAguOBneOC
jOOBi+OCieOAgeW+oeiHqui6q+OBruOBiuS4luWXo++8iOOCiOOBpOOBju+8ieOCkuOAguOBneOC
jOOBi+OCieOAgeWmueOBleOBvuOCkuOAguOBneOCjOOBi+OCieOAgeWmueOBleOBvuOBruW+oeWt
kOOBleOBvuOCkuOAguOBneOCjOOBi+OCieOAgeeah+WQjuOBleOBvuOCkuOAguOBneOCjOOBi+OC
ieOAgeizouiHo+OBruOCouODrOOCreOCueanmOOCkuOAguOAjQrjgIzjgYrjganjgo3jgYTjgZ/j
gILlm73njovjga/kubHlv4PjgYvjgILjgI0K
------------------------------------------------------
2. After the mail has been completely imported, run a quick search for "アレキス" in message bodies. The actual source of the message has the name. However, no message appears in the search results.

Actual result: No message is found.
Expected result: The imported message appears in the search results.
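The difference between searching the stored (encoded) source and searching the decoded body can be illustrated with a minimal Python sketch. This is not Thunderbird code; it only assumes a UTF-8 body sent with a base64 transfer encoding, and it uses one sentence from the testcase above as sample data.

```python
import base64

# One sentence from the testcase body; the quick search term is "アレキス".
body_text = "それから、賢臣のアレキス様を。\n"
term = "アレキス"

# What ends up in the message source with Content-Transfer-Encoding: base64.
encoded = base64.b64encode(body_text.encode("utf-8"))

# A search over the raw, still-encoded bytes cannot see the term ...
found_in_raw = term.encode("utf-8") in encoded

# ... while a search over the decoded body can.
decoded = base64.b64decode(encoded).decode("utf-8")
found_in_decoded = term in decoded

print(found_in_raw, found_in_decoded)
```

A body search therefore has to run on the decoded text, which is what the expected result above relies on.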
Oops,
> 2. After the mail was completely imported, do quick search about "アレキス".
> for message bodies. The actual source of the message has the name.
> However, no message is appear in the search results.
Please ignore the last line.
I couldn't reproduce this problem on Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20110620 Thunderbird/5.0b2. Sorry, it seems to be fixed in recent versions...
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
With another testcase, this problem still happens on Thunderbird 5.0. However, for privacy reasons I cannot upload the actual testcase...
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Version: 3.1 → 5.0
Attached file minimum testcase
Minimum testcase for this problem. This message includes the following text as a base64-encoded string:
----------------------------------------------------------------
https://www.example.com/?randomparam1=2d2b0e756d50b91d96e90c0c1bb1cf51&randomparam2=f84bd4b600
[SEARCHTERM]
----------------------------------------------------------------

Steps to reproduce:
1. Import the attached message into Thunderbird. I used the addon "ImportExportTools".
2. Go to the folder into which the message was imported.
3. Start a search with Ctrl-Shift-F.
4. Search for a message with the condition: "Body", "contains", and "SEARCH" (type it into the textbox).

Actual result: No message found.
Expected result: The imported message is found.
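For anyone who wants to regenerate a comparable message instead of importing the attachment, here is a rough Python sketch using the standard `email` package. It is not the attached file: the header values are placeholders, and only the body text is taken from the description above.

```python
from email.message import EmailMessage

# Body text as described in the testcase above.
BODY = (
    "https://www.example.com/?randomparam1=2d2b0e756d50b91d96e90c0c1bb1cf51"
    "&randomparam2=f84bd4b600\n"
    "[SEARCHTERM]\n"
)


def build_testcase() -> EmailMessage:
    """Build a text/plain message whose body is sent base64 encoded."""
    msg = EmailMessage()
    # Placeholder headers (assumptions, not copied from the attachment).
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "base64 body search testcase"
    # cte="base64" forces Content-Transfer-Encoding: base64 for the text part.
    msg.set_content(BODY, charset="utf-8", cte="base64")
    return msg


if __name__ == "__main__":
    # Print the raw message; save the output as an .eml file to import it.
    print(build_testcase().as_string())
```

The raw output contains only base64 text in the body, so the search term is visible only after decoding.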
Attachment #542404 - Attachment is obsolete: true
Summary: Local body search does not work if the body is encoded as Base64 and some lines are "broken" → Local body search does not work if the body is encoded as Base64 and includes long URL
Whiteboard: dupme
Keywords: testcase
I have a similar problem: the message body filter doesn't work for messages with Base64 and a long URL. As a result, I can't filter mails from Bugzilla...
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0) Gecko/20110905 Thunderbird/7.0
...
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Any plans to fix it?
Confirmed with Tb 7.0.1 on Win-XP. With the attached mail, a body search for aHR0 (string in the base64-encoded data), for http (text in the decoded data), or for SEARCHSTRING (text in the decoded data) returns nothing.
If the mail is attached as a message/rfc822 part by Tb, the message/rfc822 part is sent in base64 (in a corrupted format, which is a known bug),
> Content-Transfer-Encoding: base64
and a body search for YUhS (string in the base64-encoded data) returns the mail. This is the already known and long-lived bug 37031, and is a different problem from this bug.
Status: REOPENED → NEW
OS: Linux → All
Summary: Local body search does not work if the body is encoded as Base64 and includes long URL → Local body search does not work if the body is encoded as Base64
Version: 5.0 → 7
Additional quick observations.

(A-1) "Search All Messages" in Toolbar, when base64 encoded
  http, https, https:, https:/, https://, https://w, ... => match
  htt => no match
  2d2b0e756d50b91d96e90c0c1bb1cf51 => match
  Substring of "2d2b0e756d50b91d96e90c0c1bb1cf51" => no match
  f84bd4b600 => match
  Substring of f84bd4b600 => no match
  SEARCHTERM => match
  Substring of "SEARCHTERM" => no match
(A-2) "Search All Messages" in Toolbar, when 8bit (not encoded)
  Same as (A-1), but may be slightly different.

Gloda correctly indexes using the base64-decoded message body. The above looks like a characteristic of simple Gloda Search. A "term" in Gloda (similar to a word) is roughly a run of continuous non-space/non-special chars, and Tb's Gloda Search probably doesn't do "term starts with"/"term ends with"/"term contains"-like searches. IIRC, Gloda doesn't consider a string of less than 4 chars a "term". This is probably the reason why "htt" doesn't match. "https" may be a plural form of "http" for Gloda. This may be the reason why "http" matches.

(B-1) "Filter These Messages" in Quick Filter Bar, Body, when base64 encoded
  (i) SEARCHTERM, any substring of "SEARCHTERM" => no match (this bug)
  (ii) URL string, any substring of URL string => no match
(B-2) "Filter These Messages" in Quick Filter Bar, Body, when 8bit (not encoded)
  (i) SEARCHTERM, any substring of "SEARCHTERM" => match
  (ii) URL string, any substring of URL string => no match

Note: "Filter These Messages" in Quick Filter Bar is the same as Edit/Find/Search Messages and Search in the folder context menu, and also the same as a Saved Search Folder (Virtual Folder) with a "single search target folder" and "not online search if IMAP", as long as the conditions are set similarly to the search at "Filter These Messages" and the filter by "Quick Filter".
With a URL string in a base64-encoded message body, both (B-1)/(i) "problem in Body Search on base64-encoded message body" and (B-2)/(ii) "phenomenon in URL string search on message body of text/plain mail (always linkified by Tb)" appear to occur at the same time.
All test mails have a message body of the following format, and are base64 encoded.
  {FINDSTRING} http://x.y.z<depends_on_test_case> [SEARCHTEXT]
  (No CRLF/LF/CR, to force Tb to send in base64 when the text file is attached)

Difference among case-1/2/3:
  case-1 : URL ends with /?p=...
  case-2 : "/" before "?" is removed from case-1, and one byte is added
  case-3 : "=" in case-2 is replaced by "X"
Difference between case A and B:
  URL in case B is one byte longer than in case A. Length of URL is 18 bytes or 19 bytes.
The URL in the message body is also placed in Subject: for ease of observation.

[Test Result] Body Search result for each search string (O = found, X = not found):
  Subject                                   FINDSTRING  SEARCHTEXT  http  x.y.z
  test-1A base64, URL=http://x.y.z/?p=a     O           O           O     O
  test-1B base64, URL=http://x.y.z/?p=ab    X           X           X     X
  test-2A base64, URL=http://x.y.z?p=ab     X           X           X     X
  test-2B base64, URL=http://x.y.z?p=abc    X           X           X     X
  test-3A base64, URL=http://x.y.z?pXab     O           O           O     O
  test-3B base64, URL=http://x.y.z?pXabc    O           O           O     O

Note: If URL=http://x.y.z#abcde, this bug's problem couldn't be observed. Because the mails are crafted and there is no CRLF/LF/CR in the base64-encoded message body data, a different problem from this bug may be involved in the above test results.
Severity: normal → major
Summary: Local body search does not work if the body is encoded as Base64 → Local body search does not work if the body is encoded as Base64 and includes URL with search keyword
Version: 7 → Trunk
"Nothing is found" occurs even when the message body is plain text and is not base64 encoded. The test mails are "Content-Transfer-Encoding: 8bit, with plain text data" versions of the "not found with base64 data" cases in the previous test.

[Test Result] Body Search result for each search string (O = found, X = not found):
  Subject                                      FINDSTRING  SEARCHTEXT  http  x.y.z
  test-1Btext 8bits, URL=http://x.y.z/?p=ab    X           X           X     X
  test-2Atext 8bits, URL=http://x.y.z?p=ab     X           X           X     X
  test-2Btext 8bits, URL=http://x.y.z?p=abc    X           X           X     X

The following part of my comment #8 described the above.
> (B-2) "Filter These Messages" in Quick Filter Bar, Body, When 8bits(not encoded)
> (ii) URL string, any substring of URL string => no match
Removing base64 from the bug summary.
FYI. Bug 132340 for "problem in search of base64 encoded message body" was resolved in 2009.
This bug may be the cause of many hard-to-analyze reports of "false negative in local body search".
Summary: Local body search does not work if the body is encoded as Base64 and includes URL with search keyword → Local body search does not work if the body contains URL with search keyword
(1) Words/terms placed in the same line as a "=.." term. Base64 or plain text is irrelevant.
  Quoted-printable-like string                      Problem occurs or not
  ("=" followed by two hexadecimal digits)
  =ab, =89                                          Problem occurs
  =79, =ag, =ax, =az, =8x                           Problem doesn't occur
  It looks like the problem occurs only when the value is greater than or equal to =80.
(2) Words/terms placed in a different line.
  (2-1) Plain text mail (8bit) : Words/terms in a different line are not affected. Body search can find them. No problem.
  (2-2) base64 encoded : Terms in different lines are affected. Body search cannot find them. (original case of this bug)
  (2-3) quoted-printable : The problem doesn't occur even in the same line, although bug 481616 always occurs.

So, if the URL of the original case is changed to "...randomparam1=2g...randomparam2=fg...", the problem disappears.

The problem looks relevant to quoted-printable-like strings only. Whether it is a URL or not, or a search keyword or not, was irrelevant. It's merely that a URL can contain a Keyword=Value string, and the Value part can frequently start with two hexadecimal digits in URLs like those of Google's search.

Whether the mail is base64 encoded or not was relevant to the problem. When the mail is not base64 encoded, the problem occurs only for words/terms placed in the same line. The problem doesn't occur for words/terms in different lines. If the mail is base64 encoded, the problem occurs even when words/terms are placed in a different line from the quoted-printable-like string. Body Search cannot find them. If plain text, each line is read from the folder, so it is split into multiple lines by the read/write operation. However, if base64, the text is obtained from the buffer of decoded data. Line-break handling of search in that case is probably not appropriate.
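The pattern in (1) can be written down as a small check. The following Python sketch is only a model of the observation, not of Thunderbird's code: the `>= 0x80` threshold is the empirical finding from this comment, and the helper name `high_bit_triplets` is made up for illustration.

```python
import re

# "=" followed by two hexadecimal digits, as a quoted-printable decoder reads it.
QP_TRIPLET = re.compile(r"=([0-9A-Fa-f]{2})")


def high_bit_triplets(text: str) -> list[str]:
    """Return QP-like triplets in text that would decode to a byte >= 0x80."""
    return [
        m.group(0)
        for m in QP_TRIPLET.finditer(text)
        if int(m.group(1), 16) >= 0x80
    ]


samples = [
    "=ab", "=89",                       # reported: problem occurs
    "=79", "=ag", "=ax", "=az", "=8x",  # reported: problem doesn't occur
    "http://x.y.z/?p=a [SEARCHTEXT]",   # test-1A: found
    "http://x.y.z/?p=ab [SEARCHTEXT]",  # test-1B: not found
    "http://x.y.z?pXab [SEARCHTEXT]",   # test-3A: found
]
for s in samples:
    verdict = "problem expected" if high_bit_triplets(s) else "no problem expected"
    print(f"{s!r}: {verdict}")
```

Applied to the URL of the minimum testcase, the same check flags "=f8" in randomparam2 but not "=2d" in randomparam1, which agrees with the problem disappearing when the values are changed to "2g" and "fg".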
Summary: Local body search does not work if the body contains URL with search keyword → Local body search does not work if the body contains quoted-printable like string("=" followed by two hexa decimal digits)
Problem summary. When the message body contains a string that looks quoted-printable encoded ("=" followed by two hexadecimal digits, for a special character or characters higher than 0x80):
(i) when plain text, any string in the same line is not found by Body Search.
(ii) when base64 encoded, any string in any line is not found by Body Search.
Does this fail in both quick filter and search messages?
I can confirm that this problem exists for both quick filter and search messages in TB 24.2.0
This bug still exists in Thunderbird 24.5.0. To reproduce it, create a new message with the following body:

Field=ABCDEF

Send the message and open the Quick Filter (Ctrl+Shift+K) in the Sent folder. Filter by the word "ABCDEF" (without quotes): no results appear. When you search for "=ABCDEF" or "CDE" (without quotes), the message appears.

Here is a complete message that reproduces the problem:
-------------------
From - Mon Jun 02 14:51:25 2014
Return-Path: <xxx@xxx.xxx>
Delivered-To: xxx@xxx.xxx
Message-ID: <538C651E.5060108@xxx.xxx>
Date: Mon, 02 Jun 2014 14:50:54 +0300
From: <xxx@xxx.xxx>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: xxx@xxx.xxx
Subject: Probe
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Field=ABCDEF
FYI. Body Search.
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgLocalSearch.cpp#491
> http://mxr.mozilla.org/comm-central/source/mailnews/db/msgdb/src/nsMsgHdr.cpp#539
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#913
> nsMsgSearchTerm::MatchBody (nsIMsgSearchScopeTerm *scope, uint64_t offset, uint32_t length /*in lines*/, const char *folderCharset,
>                             nsIMsgDBHdr *msg, nsIMsgDatabase* db, bool *pResult)
>
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#942
> // If there's a '=' in the search term, then we're not going to do
> // quoted printable decoding. Otherwise we assume everything is
> // quoted printable. Obviously everything isn't quoted printable, but
> // since we don't have a MIME parser handy, and we want to err on the
> // side of too many hits rather than not enough, we'll assume in that
> // general direction. Blech. ### FIX ME
> // bug fix #314637: for stateful charsets like ISO-2022-JP, we don't
> // want to decode quoted printable since it contains '='.
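The heuristic in that source comment seems enough to explain the "Field=ABCDEF" results reported earlier in this bug. Below is a rough Python model of it, not the actual MatchBody implementation: the function names are invented, matching is simplified to a case-sensitive substring test, and the branch that drops a line when charset conversion fails is an assumption about why a whole line becomes unsearchable.

```python
import re

# "=" followed by two hexadecimal digits.
QP_TRIPLET = re.compile(rb"=([0-9A-Fa-f]{2})")


def naive_qp_decode(line: bytes) -> bytes:
    """Decode every =XX triplet, whether or not the line is really QP."""
    return QP_TRIPLET.sub(lambda m: bytes([int(m.group(1), 16)]), line)


def body_line_matches(line: bytes, term: str, charset: str) -> bool:
    # Heuristic from the quoted comment: skip quoted-printable decoding
    # only when the search term itself contains "=".
    raw = line if "=" in term else naive_qp_decode(line)
    try:
        text = raw.decode(charset)
    except UnicodeDecodeError:
        # ASSUMPTION: a line that no longer converts cleanly gives no match.
        return False
    return term in text


line = b"Field=ABCDEF"
for term in ("ABCDEF", "=ABCDEF", "CDE"):
    print(term, body_line_matches(line, term, "iso-8859-1"))

utf8_line = b"FINDSTRING http://x.y.z?p=ab SEARCHTEXT"
print("SEARCHTEXT", body_line_matches(utf8_line, "SEARCHTEXT", "utf-8"))
```

In this model "=AB" is turned into the single byte 0xAB, so "ABCDEF" no longer occurs in the line while "CDE" and "=ABCDEF" still match. With a UTF-8 body the stray byte makes the whole line undecodable, which would fit the observation that other words on the same line are not found either.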
Ticket 1101474 is probably a duplicate of this one. Please retitle to something like "Some mails with a body containing an equal sign ("=") not matched (false negatives/positives) in some local body searches". Mentioning "quick" somewhere may also help.
Changing the summary, as suggested in IRC.
Summary: Local body search does not work if the body contains quoted-printable like string("=" followed by two hexa decimal digits) → Mails with a body containing quoted-printable-like strings ("=" followed by 2 hexadecimal digits) not matched (false negatives/positives) in some local body searches
FYI: There is a workaround addon for this bug. Search Body in Quoted Printable https://addons.thunderbird.net/thunderbird/addon/search-body-in-quoted-printabl/

reverting to earliest affected version

Keywords: dupeme
Whiteboard: dupme
Version: Trunk → 3.1

Bug has been reproduced above.
I cannot reproduce comment #15 in TB 115.6.1 @ win10-64bit myself.

Keywords: reproducible
See Also: → 1101474
See Also: → 1259534
See Also: 1259534