Bug 268459 - Search / filters for Message Body broken for quoted-printable bodies
Status: VERIFIED FIXED
Keywords: fixed1.8.1.2
Product: MailNews Core
Classification: Components
Component: Search
Version: Trunk
Platform: All All
Importance: -- normal with 2 votes
Target Milestone: ---
Assigned To: David :Bienvenu
QA Contact:
Mentors:
Duplicates: 293191
Depends on:
Blocks: 370090
 
Reported: 2004-11-08 13:19 PST by Marius Scurtescu
Modified: 2008-07-31 04:30 PDT


Attachments
replace soft line break with space (1.57 KB, patch)
2007-02-15 08:16 PST, David :Bienvenu
mscott: superreview+
eat soft line breaks, don't insert a space (1.47 KB, patch)
2007-02-15 11:10 PST, David :Bienvenu
mozilla: superreview+
fix search case. (2.22 KB, patch)
2007-02-20 13:52 PST, David :Bienvenu
mscott: superreview+

Description Marius Scurtescu 2004-11-08 13:19:03 PST
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040616
Build Identifier: version 0.9 (20041103)

Searching message bodies of plain-text messages encoded using quoted-printable
does not return correct results; many messages are missed.

The cause seems to be that a literal search is performed on the message
body without first decoding the content. In quoted-printable, lines are wrapped
and a trailing = sign marks a continuation. If the string you are searching
for happens to be split across two lines, it will not be found.

Reproducible: Always
Steps to Reproduce:
1. Identify some text message encoded using quoted-printable
2. Look for some string that is wrapped using = as a continuation marker
3. Search for that string

Actual Results:  
The message is not found

Expected Results:  
It should be found

Here is an example:

==========================================================================
Subject: Production Errors
Mime-Version: 1.0
Content-Type: multipart/mixed; 
	boundary="----=_Part_330_3090970.1099946048164"

------=_Part_330_3090970.1099946048164
Content-Type: text/plain; charset=Cp1252
Content-Transfer-Encoding: quoted-printable

2004-11-08 12:34:04,773 DEBUG [LanguageCache.getLanguage(Line 140)] Could n=
ot find language for locale: en_US. Using language code en to find the lang=
uage.
==========================================================================

Searching for "Could not find language" or "find the language." will not produce
this message.
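
The failure can be reproduced outside the mail client. The following is a minimal sketch using Python's standard `quopri` module, not the MailNews search code; the variable names are illustrative. It runs a literal search over the raw body lines quoted above, then repeats the search after decoding.

```python
import quopri

# Raw quoted-printable body lines from the example message above.
raw_body = (
    b"2004-11-08 12:34:04,773 DEBUG [LanguageCache.getLanguage(Line 140)] Could n=\n"
    b"ot find language for locale: en_US. Using language code en to find the lang=\n"
    b"uage.\n"
)

needles = [b"Could not find language", b"find the language."]

# A literal search over the undecoded text misses both strings, because
# each one is split by a soft line break ("=" followed by a newline).
raw_hits = [needle in raw_body for needle in needles]

# Decoding first removes the soft line breaks and rejoins the words.
decoded_body = quopri.decodestring(raw_body)
decoded_hits = [needle in decoded_body for needle in needles]

print(raw_hits)      # [False, False]
print(decoded_hits)  # [True, True]
```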
Comment 1 Mike Cowperthwaite 2004-11-15 13:46:33 PST
Reproduced with TB version 0.9+ (20041112), and also Moz 1.8a5.
Comment 2 reb 2004-12-17 12:58:52 PST
Message filters suffer the same problem for me (Mozilla 1.5 on Solaris 2.8): a
filter such as (Body contains "=>") fails because of the wrapping issue, but
also because in quoted-printable an embedded "=" is encoded as "=3D".
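
The "=3D" case is independent of line wrapping. As a small illustration (again Python's standard `quopri` module, with a made-up body line), a filter string containing a literal "=" cannot match the stored text until the escape is decoded:

```python
import quopri

# A body line as stored: the literal "=" is escaped as "=3D".
# The line content is a made-up example.
raw_line = b"key =3D> value\n"
needle = b"=>"

decoded_line = quopri.decodestring(raw_line)

print(needle in raw_line)      # False
print(needle in decoded_line)  # True
```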

Sorry for a slightly off-topic digression here: I was hoping to find in
Bugzilla an answer to this: are there UNIX mail filters that can simply cleanse
all incoming email, converting quoted-printable to 8-bit absolutely everywhere
(even handling attachments)? I've heard of a few (sendmail), but they only
handle simple single-part email bodies, not multipart or attachments, so they
are of no use to me.

It is tempting to request a mozilla-mail config option that would undo
quoted-printable in all incoming emails. I realize this won't fly due to the
desire to preserve the incoming emails as-is as much as possible. Likewise, any
notion of building such conversion into movemail (which I use, built-in) would
not fly. That still leaves the option of a third-party "external" movemail doing so.
However, I also use procmail-based filters, which are broken by
quoted-printable in the same way as Mozilla, so I'd need an undo-quoted-printable
filter that stands alone from Mozilla.
Comment 3 (not reading, please use seth@sspitzer.org instead) 2005-05-07 13:44:31 PDT
*** Bug 293191 has been marked as a duplicate of this bug. ***
Comment 4 (not reading, please use seth@sspitzer.org instead) 2005-05-07 13:46:01 PDT
tbird has the same problem.

cc'ing david / mscott.
Comment 5 Mark 2006-04-23 08:03:09 PDT
Similar to this bug, I found that if the Content-Transfer-Encoding is base64 then I also cannot search for plain text in the body of the message. The same problem applies to message filters. It seems likely that spammers are using this to get around junk mail filtering. (It prevents me from filtering out email that contains certain undesirable websites.)

I am using Tbird 1.5

I am filing here so that a generic solution for different Content-Transfer-Encodings can be implemented, rather than one just for quoted-printable.

FYI, an additional complication on the email I am looking at is that charset="iso-2022-jp", but I tried searching and filtering on plain ASCII text, so I doubt that this is the reason it didn't work.

Just speculating here, but I wonder if spam filtering is able to see into these encoded body parts.  Various reports of problems in spam identification, like bug #280716 and bug #284308  both have examples that contain quoted-printable body parts....
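
For comparison, here is a rough sketch of the generic approach requested in this comment, written with Python's standard `email` package rather than Mozilla's MIME code. The helper names `decoded_text_parts` and `body_contains` and the sample message are hypothetical. The idea is to undo whatever Content-Transfer-Encoding each text part declares before searching.

```python
import base64
from email import message_from_bytes

def decoded_text_parts(raw_message: bytes):
    """Yield the decoded text of each text/* part of a message."""
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        # decode=True undoes the part's Content-Transfer-Encoding
        # (quoted-printable or base64); other encodings pass through.
        payload = part.get_payload(decode=True)
        charset = part.get_content_charset() or "ascii"
        try:
            yield payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset label: fall back to a lossless byte mapping.
            yield payload.decode("latin-1")

def body_contains(raw_message: bytes, needle: str) -> bool:
    """Case-insensitive search over the decoded text parts."""
    needle = needle.lower()
    return any(needle in text.lower() for text in decoded_text_parts(raw_message))

# Made-up sample: a single-part message with a base64-encoded body.
sample = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=us-ascii\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(b"Visit example.invalid now")
    + b"\r\n"
)

print(b"example.invalid" in sample)              # literal search on raw bytes
print(body_contains(sample, "example.invalid"))  # search after decoding
```

The same function handles quoted-printable parts without a separate code path, since the transfer encoding is read from each part's headers.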

Comment 6 Mike Cowperthwaite 2006-12-14 09:11:35 PST
Bug 132340 is this same problem, but for base64 instead of q-p.
Comment 7 Magnus Melin 2007-02-13 10:20:07 PST
Linux too, latest branch build.
Comment 8 David :Bienvenu 2007-02-15 08:16:56 PST
Created attachment 255226
replace soft line break with space

I believe search, at least, already goes through quoted-printable decoding, but we weren't dealing with soft line breaks correctly.
Comment 9 Mike Cowperthwaite 2007-02-15 09:17:32 PST
You're converting an '=' at the end of the line to a space?  Is that correct?  I thought that was an explicit indicator to concatenate without whitespace; if a space is desired, the line ends with '=20' or else has a space immediately before the '='.
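
The three cases described in this comment can be checked with any conforming quoted-printable decoder. The sketch below uses Python's standard `quopri` module as a stand-in; the sample strings are illustrative.

```python
import quopri

# "=" directly before the line ending: the lines are joined with nothing between.
joined = quopri.decodestring(b"Pre=\nPay")

# A space before the "=" is ordinary text and survives the join.
space_before = quopri.decodestring(b"find the =\nlanguage")

# "=20" is an encoded space; the soft line break after it adds nothing.
encoded_space = quopri.decodestring(b"find the=20=\nlanguage")

print(joined)         # b'PrePay'
print(space_before)   # b'find the language'
print(encoded_space)  # b'find the language'
```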
Comment 10 David :Bienvenu 2007-02-15 10:18:51 PST
Mike, you could be right - I'll need to find a test message...
Comment 11 David :Bienvenu 2007-02-15 11:10:40 PST
Created attachment 255241
eat soft line breaks, don't insert a space

thx, Mike - you were right as usual. This patch just eats the soft line break, leaving the space. I've tested this by doing a quick search on message bodies in a folder, looking for a string that spans a quoted-printable soft line break, and it seems to work fine.
Comment 12 David :Bienvenu 2007-02-15 15:31:06 PST
fixed on trunk and branch
Comment 13 Mike Cowperthwaite 2007-02-17 10:55:16 PST
TB 2b2-0217, this doesn't seem to be working for me.  David, I sent you a test message which included these source lines:
===========
Uw advertentie staat dan weer bovenaan in de rubriek! Bellers met een Pre=
Pay telefoon of een pulse telefoon kunnen geen gebruik maken van deze die=
nst. Belgische adverteerders bellen naar: 0903 - 42040.
===========
Searching the folder for Body, contains, "prepay" (or "dienst") fails to find the message.
Comment 14 David :Bienvenu 2007-02-17 17:48:43 PST
Mike, I know you sent me that message, but I can't find it. Can you resend it? Thx!
Comment 15 David :Bienvenu 2007-02-17 19:01:13 PST
Mike, never mind, it was in my junk folder.
Comment 16 David :Bienvenu 2007-02-20 10:56:33 PST
Ugh, in the find case, we do decode the QP one line at a time, whereas in the preview text case (the other caller of this code, which now works), we pass in a block of text, including the line endings. 

So to fix this, I need to change nsMsgSearchTerm::MatchBody to coalesce quoted printable lines that end with '='.
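
A rough illustration of the coalescing step, in Python rather than the C++ of nsMsgSearchTerm::MatchBody. This is not the landed patch; `coalesce_soft_breaks` and `body_matches` are hypothetical names, and the test lines are the ones quoted in comment 13. Lines ending in "=" are joined to the following line before the line-at-a-time decode and search.

```python
import quopri

def coalesce_soft_breaks(lines):
    """Join each line ending in a QP soft line break ("=") with the next one."""
    pending = ""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.endswith("="):
            pending += line[:-1]  # drop the "=", insert nothing in its place
        else:
            yield pending + line
            pending = ""
    if pending:  # input ended on a soft line break
        yield pending

def body_matches(lines, needle):
    """Case-insensitive search over coalesced, decoded logical lines."""
    needle = needle.lower()
    for logical in coalesce_soft_breaks(lines):
        decoded = quopri.decodestring(logical.encode("latin-1")).decode("latin-1")
        if needle in decoded.lower():
            return True
    return False

source_lines = [
    "Uw advertentie staat dan weer bovenaan in de rubriek! Bellers met een Pre=\n",
    "Pay telefoon of een pulse telefoon kunnen geen gebruik maken van deze die=\n",
    "nst. Belgische adverteerders bellen naar: 0903 - 42040.\n",
]

# Searching physical lines one at a time misses the split word.
line_by_line = any("prepay" in line.lower() for line in source_lines)

print(line_by_line)                            # False
print(body_matches(source_lines, "prepay"))    # True
print(body_matches(source_lines, "dienst"))    # True
```

A literal "=" in the text is always stored as "=3D", so a line that ends in a bare "=" can only be a soft line break; that is what makes the `endswith("=")` check safe in this sketch.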
Comment 17 Mike Cowperthwaite 2007-02-20 11:08:16 PST
Which again points to the MIME architecture.  The q-p text should not be what's getting parsed -- C-T-E should be decoded before search ever gets it.
Comment 18 David :Bienvenu 2007-02-20 11:14:14 PST
rewriting body search to go through the mime converter would certainly fix some issues - easy, it isn't.
Comment 19 David :Bienvenu 2007-02-20 13:52:22 PST
Created attachment 255828
fix search case.

concatenate lines with QP soft-linebreak before doing search.
Comment 20 David :Bienvenu 2007-02-20 15:33:11 PST
last patch landed on trunk and branch.
Comment 21 Mike Cowperthwaite 2007-02-21 16:12:08 PST
V with 2b2-0221, Win2K.  Works in QuickSearch | Entire Message, too.
Thanks, David.
