Closed
Bug 1478828
Opened 7 years ago
Closed 7 years ago
Quick Filter and Search Messages by Body fail to find messages if message is encoded in an invalid charset
Categories
(Thunderbird :: Search, defect)
Tracking
(thunderbird_esr60 fixed, thunderbird62 affected, thunderbird63 fixed)
RESOLVED
FIXED
Thunderbird 63.0
People
(Reporter: jmdyck, Assigned: jorgk-bmo)
References
Details
Attachments
(5 files, 2 obsolete files)
3.52 KB,
text/plain
|
Details | |
4.11 KB,
text/plain
|
Details | |
4.10 KB,
patch
|
aceman
:
review+
jorgk-bmo
:
approval-comm-esr60+
|
Details | Diff | Splinter Review |
11.18 KB,
patch
|
Details | Diff | Splinter Review | |
3.28 KB,
patch
|
jorgk-bmo
:
review+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0
Build ID: 20180704192850
Steps to reproduce:
1. Went to a folder containing lots of messages (spam) that begin "We considered your resume to be very attractive".
2. Do a 'Quick Filter" (filtering messages by Body) on "resume" (or "attractive").
3. Do a 'Search Messages' on the folder for "Body contains resume" (or ".. attractive").
Actual results:
Either way, I got only 5 hits, all cases where I was getting a bounce-report that included the text of the original spam.
Expected results:
I should have got hundreds of hits.
(I got the bounce report because the spammer used my address as the From on the original spam.)
Attachment #8995355 -
Attachment mime type: text/plain → message/rfc822
Assignee | ||
Updated•7 years ago
|
Attachment #8995349 -
Attachment mime type: message/rfc822 → text/plain
Assignee | ||
Updated•7 years ago
|
Attachment #8995355 -
Attachment mime type: message/rfc822 → text/plain
Assignee | ||
Comment 3•7 years ago
|
||
Interesting ;-)
The "false negative" message is encoded in a charset cp-850 I've never heard of. If you export the message, change that to, say, windows-1252, your message is found when you import it again.
The rejection notice doesn't specify a charset and quotes the original message verbatim. So the string is and should be found there.
I can correct the charset handling, so if an unknown charset is detected, we default to something else.
The cause is bad error handling here:
https://dxr.mozilla.org/comm-central/rev/e5e1510b8d914bfa8439b21ba3f73e4f2e83e957/mailnews/base/search/src/nsMsgSearchTerm.cpp#1121
nsMsgI18NConvertToUnicode() will fail on a bad charset and we pretend everything went fine.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Quick Filter and Search Messages by Body fail to find messages → Quick Filter and Search Messages by Body fail to find messages if message is encoded in an invalid charset
Assignee | ||
Comment 4•7 years ago
|
||
This does the trick.
According to https://www.iana.org/assignments/character-sets/character-sets.xhtml, "cp850" is an alias for "IBM850", which is a valid character set, registered with IANA. (Not that I care about my spam being correctly decoded, but there might be people legitimately using the charset.) Of course, if Thunderbird doesn't grok IBM850, there's not much else you can do.
Assignee | ||
Comment 6•7 years ago
|
||
Sorry, we use Mozilla core decoding software, if that doesn't recognise the charset, there's little I can do. That said, the e-mail doesn't contain "cp850" but "cp-850".
Assignee | ||
Comment 7•7 years ago
|
||
Before I hear "please add a test", here's one.
Attachment #8995708 -
Attachment is obsolete: true
Attachment #8995708 -
Flags: review?(acelists)
Attachment #8995739 -
Flags: review?(acelists)
Comment on attachment 8995739 [details] [diff] [review]
1478828-bad-charset.patch - with test
Review of attachment 8995739 [details] [diff] [review]:
-----------------------------------------------------------------
So what is the idea in this patch? In case of unknown charset you try again with UTF8 so that maybe something useful comes out?
Attachment #8995739 -
Flags: review?(acelists) → review+
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to :aceman from comment #8)
> In case of unknown charset you try again with UTF8 so that maybe something useful comes out?
Yes. The existing code tried UTF-8 if no charset was given. So I'm extending this to the case where the charset is bad. With luck, the message as ASCII, like in the bug here, and that will do something useful. Worst case, we don't find anything.
Assignee | ||
Comment 10•7 years ago
|
||
There's an awful lot of badly formatted and ugly code in that file. Here I'm fixed the worst bits I saw.
Comment 11•7 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/dc35d06acdd4
try matching string with UTF-8 if conversion fails. r=aceman
https://hg.mozilla.org/comm-central/rev/95eb75ce038f
some white-space clean-up in nsMsgSearchTerm.cpp. rs=white-space-only
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 12•7 years ago
|
||
My absolute favourite was this code which I removed:
-#ifdef DO_I18N
- if(conv)
- INTL_DestroyCharCodeConverter(conv);
-#endif
'DO_I18N' which is thankfully never defined anywhere calling 'INTL_DestroyCharCodeConverter()' which doesn't exist :-(
Target Milestone: --- → Thunderbird 63.0
Assignee | ||
Comment 13•7 years ago
|
||
I've been thinking about this.
nsMsgI18NConvertToUnicode() treats an empty charset as ASCII/windows-1252, so basically a single-byte encoding.
Here failure to convert uses ASCII/windows-1252 as well:
https://dxr.mozilla.org/comm-central/rev/e5e1510b8d914bfa8439b21ba3f73e4f2e83e957/mailnews/mime/src/mimedrft.cpp#1553
So trying UTF-8 unconditionally is maybe a bit ambitious. This one will let me sleep better ;-)
Note that since the arrival of encoding_rs, ISO-8859-1 is just a label for windows-1252.
Attachment #8995806 -
Flags: review?(acelists)
Attachment #8995806 -
Flags: review?(acelists) → review+
Assignee | ||
Comment 14•7 years ago
|
||
Tweaked the test so it exercises the UTF-8 and the windows-1252 code paths.
Attachment #8995806 -
Attachment is obsolete: true
Attachment #8995823 -
Flags: review+
Comment 15•7 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/9dc3759e20aa
If no charset or conversion failed, check for UTF-8 and fallback to ASCII. r=aceman
Assignee | ||
Comment 16•7 years ago
|
||
Comment on attachment 8995739 [details] [diff] [review]
1478828-bad-charset.patch - with test
Applies to all three patches here.
Attachment #8995739 -
Flags: approval-comm-esr60+
Attachment #8995739 -
Flags: approval-comm-beta+
Assignee | ||
Comment 17•7 years ago
|
||
TB 60 ESR:
https://hg.mozilla.org/releases/comm-esr60/rev/1cdbedbbfde5
https://hg.mozilla.org/releases/comm-esr60/rev/19785e4db13d
https://hg.mozilla.org/releases/comm-esr60/rev/31f40346359d
status-thunderbird62:
--- → affected
status-thunderbird63:
--- → fixed
status-thunderbird_esr60:
--- → fixed
Assignee | ||
Updated•7 years ago
|
Attachment #8995739 -
Flags: approval-comm-beta+
You need to log in
before you can comment on or make changes to this bug.
Description
•