Open
Bug 667854
Opened 13 years ago
Updated 5 months ago
Mails with a body containing quoted-printable-like strings ("=" followed by 2 hexadecimal digits) not matched (false negatives/positives) in some local body searches
Categories
(Thunderbird :: Search, defect)
Tracking
(Not tracked)
NEW
People
(Reporter: yuki, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: dupeme, reproducible, testcase)
Attachments
(4 files, 1 obsolete file)
Tested on Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10 (Thunderbird 3.1 on Ubuntu)
This is similar to bug 132340. That bug says that quick search now works for base64-encoded bodies, but it still doesn't work for this testcase.
Steps to reproduce:
1. Import the attached eml file into a local mail folder. Its body is
Base64 encoded; the decoded text is:
------------------------------------------------------
「王様は、人を殺します。」
「なぜ殺すのだ。」
「悪心を抱いている、というのですが、誰もそんな、悪心を持っては居りませぬ。」
「たくさんの人を殺したのか。」
「はい、はじめは王様の妹婿さまを。それから、御自身のお世嗣(よつぎ)を。
それから、妹さまを。それから、妹さまの御子さまを。それから、皇后さまを。
それから、賢臣のアレキス様を。」
「おどろいた。国王は乱心か。」
------------------------------------------------------
("Run,Melos!" by Osamu Dazai)
The base64 encoded version is:
------------------------------------------------------
44CM546L5qeY44Gv44CB5Lq644KS5q6644GX44G+44GZ44CC44CNCuOAjOOBquOBnOauuuOBmeOB
ruOBoOOAguOAjQrjgIzmgqrlv4PjgpLmirHjgYTjgabjgYTjgovjgIHjgajjgYTjgYbjga7jgafj
gZnjgYzjgIHoqrDjgoLjgZ3jgpPjgarjgIHmgqrlv4PjgpLmjIHjgaPjgabjga/lsYXjgorjgb7j
gZvjgazjgILjgI0K44CM44Gf44GP44GV44KT44Gu5Lq644KS5q6644GX44Gf44Gu44GL44CC44CN
CuOAjOOBr+OBhOOAgeOBr+OBmOOCgeOBr+eOi+anmOOBruWmueWpv+OBleOBvuOCkuOAguOBneOC
jOOBi+OCieOAgeW+oeiHqui6q+OBruOBiuS4luWXo++8iOOCiOOBpOOBju+8ieOCkuOAguOBneOC
jOOBi+OCieOAgeWmueOBleOBvuOCkuOAguOBneOCjOOBi+OCieOAgeWmueOBleOBvuOBruW+oeWt
kOOBleOBvuOCkuOAguOBneOCjOOBi+OCieOAgeeah+WQjuOBleOBvuOCkuOAguOBneOCjOOBi+OC
ieOAgeizouiHo+OBruOCouODrOOCreOCueanmOOCkuOAguOAjQrjgIzjgYrjganjgo3jgYTjgZ/j
gILlm73njovjga/kubHlv4PjgYvjgILjgI0K
------------------------------------------------------
2. After the mail has been completely imported, run a quick search for "アレキス"
in message bodies. The actual source of the message contains the name.
However, no message appears in the search results.
Actual result: no message is found.
Expected result: the imported message appears in the search results.
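As a rough illustration of why a body search has to undo the transfer encoding before matching, the following Python sketch encodes one line of the testcase text and searches both forms. It is not Thunderbird's implementation, and the variable names are illustrative only.

```python
import base64

# One line of the testcase body, and the term searched for in step 2.
body_line = "それから、賢臣のアレキス様を。"
term = "アレキス"

# What is stored in the message source when the body is base64 encoded.
encoded = base64.b64encode(body_line.encode("utf-8"))

# Searching the raw (still encoded) bytes cannot find the term.
found_in_raw = term.encode("utf-8") in encoded

# Searching the decoded text does find it.
decoded = base64.b64decode(encoded).decode("utf-8")
found_in_decoded = term in decoded

print(found_in_raw, found_in_decoded)
```

A search that matches only against the stored source would therefore report no hit for this message even though the readable body contains the term.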
Comment 1 (Reporter) • 13 years ago
Oops,
> 2. After the mail has been completely imported, run a quick search for "アレキス"
> in message bodies. The actual source of the message contains the name.
> However, no message appears in the search results.
Please ignore the last line.
Comment 2 (Reporter) • 13 years ago
I couldn't reproduce this problem on
Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20110620 Thunderbird/5.0b2
Sorry, it seems to be fixed in recent versions...
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
Comment 3 (Reporter) • 13 years ago
With another testcase, this problem happens on Thunderbird 5.0.
However, for privacy reasons I cannot upload the actual testcase...
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Updated (Reporter) • 13 years ago
Version: 3.1 → 5.0
Comment 4 (Reporter) • 13 years ago
Minimum testcase for this problem.
This message includes the following text as a base64-encoded string:
----------------------------------------------------------------
https://www.example.com/?randomparam1=2d2b0e756d50b91d96e90c0c1bb1cf51&randomparam2=f84bd4b600
[SEARCHTERM]
----------------------------------------------------------------
Steps to reproduce:
1. Import the attached message to Thunderbird.
I used the addon "ImportExportTools".
2. Go to the folder into which the message was imported.
3. Start a search with Ctrl-Shift-F.
4. Search for the message with the condition:
"Body", "contains", and "SEARCH" (type it into the textbox)
Actual result:
No message found.
Expected result:
The imported message is found.
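For anyone who wants to regenerate a comparable test message rather than use the attachment, here is a minimal sketch using Python's standard `email` package. The header values are placeholders and are not taken from the attached file; only the body text comes from the description above.

```python
from email.message import EmailMessage

# Body text taken from the testcase description above.
body = (
    "https://www.example.com/?randomparam1=2d2b0e756d50b91d96e90c0c1bb1cf51"
    "&randomparam2=f84bd4b600\n"
    "[SEARCHTERM]\n"
)

msg = EmailMessage()
# Placeholder headers (assumptions, not from the attachment).
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "base64 body testcase"
# cte="base64" forces Content-Transfer-Encoding: base64 for the text part.
msg.set_content(body, subtype="plain", charset="utf-8", cte="base64")

# Serialized message, suitable for writing to an .eml file and importing.
raw = msg.as_bytes()
print(raw.decode("ascii"))
```

The serialized bytes contain only the base64 form of the body, so the search term is visible to a search only after decoding.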
Attachment #542404 -
Attachment is obsolete: true
Updated (Reporter) • 13 years ago
Summary: Local body search does not work if the body is encoded as Base64 and some lines are "broken" → Local body search does not work if the body is encoded as Base64 and includes long URL
Updated • 13 years ago
Whiteboard: dupme
I have a similar problem: the message body filter doesn't work for messages with Base64 and a long URL.
As a result, I can't filter mails from bugzilla...
user-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:7.0) Gecko/20110905 Thunderbird/7.0
...
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Any plans to fix it?
Comment 6 • 13 years ago
Confirmed with Tb 7.0.1 on Win-XP.
With the attached mail, every body search returns nothing: for aHR0 (a string in the base64-encoded data), for http (text in the decoded data), and for SEARCHSTRING (text in the decoded data).
If the mail is attached as a message/rfc822 part by Tb, the message/rfc822 part is sent in base64 (in a corrupted format, though; known bug),
> Content-Transfer-Encoding: base64
and a body search for YUhS (a string in the base64-encoded data) returns the mail.
This is the already known and long-lived bug 37031, and is a different problem from this bug.
Status: REOPENED → NEW
OS: Linux → All
Summary: Local body search does not work if the body is encoded as Base64 and includes long URL → Local body search does not work if the body is encoded as Base64
Version: 5.0 → 7
Comment 7 • 13 years ago
Additional quick observations.
(A-1) "Search All Messages" in Toolbar, When base64 encoded
http, https, https:, https:/, https://, https://w, ... => match
htt => no match
2d2b0e756d50b91d96e90c0c1bb1cf51 => match
Substring of "2d2b0e756d50b91d96e90c0c1bb1cf51" => no match
f84bd4b600 => match
Substring of f84bd4b600 => no match
SEARCHTERM => match
Substring of "SEARCHTERM" => no match
(A-2) "Search All Messages" in Toolbar, When 8bits(not encoded)
Same as (A-1), but may be slightly different.
Gloda correctly indexes the base64-decoded message body.
The above looks like a characteristic of simple Gloda search. A "term" in Gloda (similar to a word) is roughly a run of continuous non-space/non-special characters, and Tb's Gloda search probably doesn't support "term starts with"/"term ends with"/"term contains" style matching.
IIRC, Gloda doesn't treat a string shorter than 4 characters as a term. This is probably why "htt" doesn't match.
Gloda may treat "https" as the plural form of "http". This may be why "http" matches.
(B-1) "Filter These Messages" in Quick Filter Bar, Body, When base64 encoded
(i) SEARCHTERM, any substring of "SEARCHTERM" => no match (this bug)
(ii) URL string, any substring of URL string => no match
(B-2) "Filter These Messages" in Quick Filter Bar, Body, When 8bits(not encoded)
(i) SEARCHTERM, any substring of "SEARCHTERM" => match
(ii) URL string, any substring of URL string => no match
Note: "Filter These Messages" in the Quick Filter Bar behaves the same as Edit/Find/Search Messages and Search in the folder context menu. It also behaves the same as a Saved Search Folder (Virtual Folder) with a single search target folder and, for IMAP, without online search, provided the search conditions are set the same way as in "Filter These Messages".
When the message body is base64 encoded and the search target is a URL string, both (B-1)/(i), the problem in Body Search on a base64-encoded message body, and (B-2)/(ii), the phenomenon in URL string search on the message body of a text/plain mail (always linkified by Tb), appear to occur at the same time.
Comment 8 • 13 years ago
All mails have a message body in the following format, and are base64 encoded.
{FINDSTRING} http://x.y.z<depends_on_test_case> [SEARCHTEXT]
(No CRLF/LF/CR, to force Tb to send in base64 when the text file is attached)
Differences among cases 1/2/3:
case-1 : URL ends with /?p=...
case-2 : "/" before "?" is removed from case-1, and one byte is added
case-3 : "=" in case-2 is replaced by "X"
Difference between cases A and B: the URL in case B is one byte longer than in case A.
Length of URL is 18 bytes or 19 bytes.
The URL in the message body is also placed in Subject: for ease of observation.
[Test Result]
                                        Body Search result for:
Subject                                 FINDSTRING  SEARCHTEXT  http  x.y.z
test-1A base64, URL=http://x.y.z/?p=a   O           O           O     O
test-1B base64, URL=http://x.y.z/?p=ab  X           X           X     X
test-2A base64, URL=http://x.y.z?p=ab   X           X           X     X
test-2B base64, URL=http://x.y.z?p=abc  X           X           X     X
test-3A base64, URL=http://x.y.z?pXab   O           O           O     O
test-3B base64, URL=http://x.y.z?pXabc  O           O           O     O
(O = found, X = not found)
Note: If URL=http://x.y.z#abcde, this bug's problem couldn't be observed.
Because the mails are crafted and there is no CRLF/LF/CR in the base64-encoded message body data, a different problem from this bug may be involved in the above test result.
Updated • 13 years ago
Severity: normal → major
Summary: Local body search does not work if the body is encoded as Base64 → Local body search does not work if the body is encoded as Base64 and includes URL with search keyword
Version: 7 → Trunk
Comment 9 • 13 years ago
"Nothing is found" occurs even when the message body is plain text and not base64 encoded.
The test mails are "Content-Transfer-Encoding: 8bit, with plain text data" versions of the "not found with base64 data" cases in the previous test.
[Test Result]
                                           Body Search result for:
Subject                                    FINDSTRING  SEARCHTEXT  http  x.y.z
test-1Btext 8bits, URL=http://x.y.z/?p=ab  X           X           X     X
test-2Atext 8bits, URL=http://x.y.z?p=ab   X           X           X     X
test-2Btext 8bits, URL=http://x.y.z?p=abc  X           X           X     X
(X = not found)
The following observation from my comment 7 was this same phenomenon.
> (B-2) "Filter These Messages" in Quick Filter Bar, Body, When 8bits(not encoded)
> (ii) URL string, any substring of URL string => no match
Removing base64 from bug summary.
FYI.
Bug 132340, for "problem in search of base64 encoded message body", was resolved in 2009.
Comment 10 • 13 years ago
This bug may be the cause of many hard-to-analyze reports of "false negative in local body search".
Summary: Local body search does not work if the body is encoded as Base64 and includes URL with search keyword → Local body search does not work if the body contains URL with search keyword
Updated • 13 years ago
Blocks: qfasfailtracker
Comment 11 • 13 years ago
(1) Words/terms placed in the same line as an "=.." term
Whether the body is base64 or plain text is irrelevant.
Quoted-printable-like string                  Problem occurs or not
("=" followed by two hexadecimal digits)
=ab, =89                                      Problem occurs
=79, =ag, =ax, =az, =8x                       Problem doesn't occur
It looks like the problem occurs only when the string is =80 or higher.
(2) Words/terms placed in a different line
(2-1) Plain text mail (8bits): words/terms in a different line are not affected.
Body search can find them. No problem.
(2-2) base64 encoded: terms in different lines are affected.
Body search cannot find them.
(original case of this bug)
(2-3) quoted-printable: the problem won't occur even in the same line,
although bug 481616 always occurs.
So, if the URL of the original case is changed to "...randomparam1=2g...randomparam2=fg...", the problem disappears.
The problem appears to be related only to quoted-printable-like strings.
Whether the string is a URL or not, or a search keyword or not, was irrelevant. It is merely that a URL can contain a Keyword=Value string, and the Value part frequently starts with two hexadecimal digits in URLs like Google's search URLs.
And whether the body is base64 or not was relevant to the problem.
When the mail is not base64 encoded, the problem occurs only for words/terms placed in the same line. It doesn't occur for words/terms in different lines.
If the mail is base64 encoded, the problem occurs even when words/terms are placed in a different line from the quoted-printable-like string. Body Search cannot find them.
If plain text, each line is read from the folder, so the body is split into multiple lines by the read/write operation. However, if base64, the text is obtained from the buffer of decoded data. Line-break handling of search in that case is probably not appropriate.
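The results in (1) line up with what a quoted-printable decoder does to each sample string. The sketch below uses Python's standard `quopri` module as a stand-in decoder; that Thunderbird's own decoder treats these strings the same way is an assumption, and `classify` is a hypothetical helper written for this illustration.

```python
import quopri

def classify(token: str) -> str:
    """Describe what quoted-printable decoding does to a short token."""
    raw = token.encode("ascii")
    decoded = quopri.decodestring(raw)
    if decoded == raw:
        return "unchanged"   # not a valid "=XX" escape, left as-is
    if decoded[0] >= 0x80:
        return "high byte"   # decodes to a single byte >= 0x80
    return "ascii byte"      # decodes to a 7-bit character

samples = ["=ab", "=89", "=79", "=ag", "=ax", "=az", "=8x"]
results = {token: classify(token) for token in samples}
for token, kind in results.items():
    print(token, "->", kind)
```

Under this stand-in, only "=ab" and "=89" turn into bytes at or above 0x80, which are the two samples reported as triggering the problem.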
Updated • 13 years ago
Summary: Local body search does not work if the body contains URL with search keyword → Local body search does not work if the body contains quoted-printable like string("=" followed by two hexa decimal digits)
Comment 12 • 13 years ago
Problem summary.
When the message body contains a quoted-printable-like string ("=" followed by two hexadecimal digits, as used for special characters or characters higher than 0x80),
(i) when plain text, any string in the same line is not found by Body Search.
(ii) when base64 encoded, any string in any line is not found by Body Search.
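A quick way to check whether a given body line falls under this summary is a regular expression for "=" followed by two hexadecimal digits. The sketch below restricts the first digit to 8 through F, following the observation in comment 11 that only values of =80 or higher trigger the problem; that restriction is an assumption drawn from the thread, and `looks_affected` is a hypothetical helper.

```python
import re

# "=" followed by two hex digits whose value is 0x80 or higher
# (assumption based on the observation in comment 11).
QP_LIKE_HIGH = re.compile(r"=[89A-Fa-f][0-9A-Fa-f]")

def looks_affected(body_line: str) -> bool:
    """Return True if the line contains a quoted-printable-like high byte."""
    return QP_LIKE_HIGH.search(body_line) is not None

# Body lines built from the URLs tested in comment 8.
cases = {
    "1A": "{FINDSTRING} http://x.y.z/?p=a [SEARCHTEXT]",
    "1B": "{FINDSTRING} http://x.y.z/?p=ab [SEARCHTEXT]",
    "2A": "{FINDSTRING} http://x.y.z?p=ab [SEARCHTEXT]",
    "2B": "{FINDSTRING} http://x.y.z?p=abc [SEARCHTEXT]",
    "3A": "{FINDSTRING} http://x.y.z?pXab [SEARCHTEXT]",
    "3B": "{FINDSTRING} http://x.y.z?pXabc [SEARCHTEXT]",
}
predicted = {name: looks_affected(line) for name, line in cases.items()}
print(predicted)
```

With these inputs the prediction flags 1B, 2A and 2B and leaves 1A, 3A and 3B unflagged, the same split as the "not found" and "found" rows in comment 8.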
Comment 13 • 13 years ago
Does this fail in both quick filter and search messages?
Comment 14 • 11 years ago
I can confirm that this problem exists for both quick filter and search messages in TB 24.2.0
Comment 15 • 10 years ago
This bug still exists in Thunderbird 24.5.0. To reproduce it you can create a new message with the following body:
Field=ABCDEF
Send the message and select Quick filter (Ctrl+Shift+K) of the Sent folder. Filter by the word "ABCDEF" (without quotes), no results appear. When you search for "=ABCDEF" or "CDE" (without quotes) the message appears.
Here is a complete message that reproduces the problem:
-------------------
From - Mon Jun 02 14:51:25 2014
Return-Path: <xxx@xxx.xxx>
Delivered-To: xxx@xxx.xxx
Message-ID: <538C651E.5060108@xxx.xxx>
Date: Mon, 02 Jun 2014 14:50:54 +0300
From: <xxx@xxx.xxx>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: xxx@xxx.xxx
Subject: Probe
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Field=ABCDEF
Comment 16 • 10 years ago
FYI.
Body Search.
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgLocalSearch.cpp#491
> http://mxr.mozilla.org/comm-central/source/mailnews/db/msgdb/src/nsMsgHdr.cpp#539
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#913
> nsMsgSearchTerm::MatchBody (nsIMsgSearchScopeTerm *scope, uint64_t offset, uint32_t length /*in lines*/, const char *folderCharset,
> nsIMsgDBHdr *msg, nsIMsgDatabase* db, bool *pResult)
>
> http://mxr.mozilla.org/comm-central/source/mailnews/base/search/src/nsMsgSearchTerm.cpp#942
> 942 // If there's a '=' in the search term, then we're not going to do
> 943 // quoted printable decoding. Otherwise we assume everything is
> 944 // quoted printable. Obviously everything isn't quoted printable, but
> 945 // since we don't have a MIME parser handy, and we want to err on the
> 946 // side of too many hits rather than not enough, we'll assume in that
> 947 // general direction. Blech. ### FIX ME
> 948 // bug fix #314637: for stateful charsets like ISO-2022-JP, we don't
> 949 // want to decode quoted printable since it contains '='.
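The heuristic described in the quoted source comment can be modelled in a few lines of Python to see how it would produce the results reported in comment 15. This is only a model under stated assumptions: it uses Python's `quopri` module in place of Thunderbird's decoder, ignores the stateful-charset exception, and `body_line_matches` is a hypothetical name, not a function in the Thunderbird source.

```python
import quopri

def body_line_matches(line: str, term: str, charset: str = "iso-8859-1") -> bool:
    """Model of the heuristic: assume quoted-printable unless the term has '='."""
    raw = line.encode(charset)
    # Skip quoted-printable decoding only when the search term contains "=".
    if "=" not in term:
        raw = quopri.decodestring(raw)
    text = raw.decode(charset, errors="replace")
    # Case-insensitive substring match on the (possibly decoded) line.
    return term.lower() in text.lower()

line = "Field=ABCDEF"  # body from comment 15
for term in ["ABCDEF", "=ABCDEF", "CDE"]:
    print(term, body_line_matches(line, term))
```

In this model "=AB" is decoded to a single byte before matching, so "ABCDEF" no longer occurs in the line, while "CDE" survives and "=ABCDEF" bypasses decoding altogether.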
Comment 17 • 8 years ago
Ticket 1101474 is probably a duplicate of this one.
Please retitle to something like "Some mails with a body containing an equal sign ("=") not matched (false negatives/positives) in some local body searches". Mentioning "quick" somewhere may also help.
Comment 18 • 8 years ago
Changing the summary, as suggested in IRC.
Summary: Local body search does not work if the body contains quoted-printable like string("=" followed by two hexa decimal digits) → Mails with a body containing quoted-printable-like strings ("=" followed by 2 hexadecimal digits) not matched (false negatives/positives) in some local body searches
Comment 19 (Reporter) • 6 years ago
addon workaround
FYI: There is a workaround addon for this bug.
Search Body in Quoted Printable
https://addons.thunderbird.net/thunderbird/addon/search-body-in-quoted-printabl/
Comment 20 • 5 months ago
reverting to earliest affected version
Comment 21 • 5 months ago
Bug has been reproduced above.
I cannot reproduce comment #15 in TB 115.6.1 @ win10-64bit myself.
Keywords: reproducible