Closed Bug 220884 Opened 22 years ago Closed 20 years ago

Filter on 8-bit char not working if untagged Subject is ISO-8859-1 but Content-Type specifies UTF-8

Categories

(MailNews Core :: Filters, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: strathaus, Assigned: sspitzer)

Details

(Keywords: intl)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925

Create a filter that should run if Subject CONTAINS Medium.

If the Subject looks like this:

Cach� Medium

neither the filter will run nor will a search by "Subject or Sender contains:" work. If Cach� is removed from the Subject, it works.

MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Reproducible: Always

Steps to Reproduce:
1. Create a mail with the same subject and Content-Type
2. Try to find the mail
3. Create a new filter and try to run it

Actual Results:
Nothing

Expected Results:
- Run the filter as defined (in my case: move the file)
- Show the mail within the results of the search
*** This bug has been marked as a duplicate of 200938 ***
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Summary: Filter not working if Subject start with Cach� → Filter not working if Subject start with Cach�
I don't think you can compare this issue with #200938 etc. Cach� is visible in the message source (CTRL+U) only. The subject line is saying Caché which is correct. Regards, Stefan
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
How do you send the message that has that � entity included in the subject line? If I compose a message with Mozilla and enter "Caché medium" as the subject, the sent message (source) contains this:

Subject: =?ISO-8859-1?Q?Cach=E9_medium?=

And with this encoding, the "medium" part is being caught by the filter I've got set up.

And why 65533, anyway? "é" is character #233 in ISO-8859-1 and in Unicode. Maybe you could attach a message with this odd subject to this bug.
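For readers unfamiliar with the tagged form quoted above, here is a minimal Python sketch of an encoded-word round trip using the standard `email.header` module. It is not Mozilla's code; the helper names `encode_subject` and `decode_subject` are made up for illustration, and the exact casing of the output may differ from what Mozilla emits.

```python
from email.header import Header, decode_header


def encode_subject(text, charset="iso-8859-1"):
    """Return a 7-bit, charset-tagged encoded-word form of a subject (hypothetical helper)."""
    return Header(text, charset=charset).encode()


def decode_subject(raw):
    """Decode a tagged Subject value back to text (hypothetical helper)."""
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, bytes):
            # Each encoded word carries its own charset tag, so no guessing is needed.
            parts.append(chunk.decode(charset or "ascii"))
        else:
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    wire = encode_subject("Caché medium")
    print(wire)                  # pure ASCII on the wire
    print(decode_subject(wire))  # the original text, recovered via the tag
```

Because the charset travels with the header itself, a match on "medium" or on "é" does not depend on the body's Content-Type.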
See also bug 188847.
Stefan Schulte-Strathaus, are you still monitoring this bug? If you have a sample email with the bogus Subject header, please attach to this bug; otherwise, it is quite difficult to investigate.
Still in 1.7a. Having "Cach�" in the subject line causes the search option "Subject or Recipient contains" not to work.
OK, I've investigated the sample message provided by reporter. Note that the HTML entity '�' is a red herring -- the message contains the actual character é (ISO-8859-1 character 233, hex E9) on the subject line. The Subject line of the message is, technically, illegal -- it should be encoded as

Subject: =?ISO-8859-1?Q?Cach=E9 ...?=

While investigating, I did some more testing; see bug 152888 comment 10. In my "example (1)" there, the message's subject is supposed to be in Asian characters but is untagged, so the mail appears to be Western style gibberish, with characters like ¶ and ù in the subject -- and yet, I can filter on those characters and locate that message.

The difference appears to be in this header from reporter's message:

Content-Type: text/html; charset="UTF-8"

The character é, when encoded for UTF-8, is not a single byte [E9] but two bytes, [C3 A9]. I duplicated the message in my mail folder, then edited it to change the Content-Type to be iso-8859-1 -- no other changes -- and I was able to select it via filter, search and MailView, triggering on é.

Apparently, filtering, viewing and searching try to decode the message headers as well as can be done based on the message's supplied Content-Type. This particular message's subject is encoded one way, the content-type specifies another way, so the subject line is converted to an "illegal" character (maybe 65533 [FF FD]?). That fails to match the [C3 A9] used by the filter processing.
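The byte-level mismatch described in this comment can be sketched in Python. This only illustrates the charset arithmetic, not Mozilla's actual conversion code; the function name `decode_raw_header` is hypothetical, and Python's decoder recovers after the bad byte, which the report suggests Mozilla's did not always do.

```python
def decode_raw_header(raw, declared_charset):
    """Decode untagged header bytes with the charset declared for the body.

    Hypothetical helper: undecodable bytes become U+FFFD (decimal 65533).
    """
    return raw.decode(declared_charset, errors="replace")


# Untagged ISO-8859-1 subject bytes, as in the reporter's message.
RAW_SUBJECT = b"Cach\xe9 Medium"

if __name__ == "__main__":
    print("é".encode("iso-8859-1"))  # one byte: E9
    print("é".encode("utf-8"))       # two bytes: C3 A9

    as_latin1 = decode_raw_header(RAW_SUBJECT, "iso-8859-1")
    as_utf8 = decode_raw_header(RAW_SUBJECT, "utf-8")

    print("é" in as_latin1)  # the search term is present
    print("é" in as_utf8)    # the lone E9 byte is not valid UTF-8
    print(ord(as_utf8[4]))   # code point of the replacement character
```

A search term typed as é is compared against the decoded subject, so once the lone E9 byte has been replaced there is nothing left for it to match.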
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Filter not working if Subject start with Cach� → Filter not working if Subject start with Caché
The message I'm attaching is the one described in bug 152888 comment 10, example (1). Including it here because there is a slight inconsistency between Mozilla's handling of this message vs. reporter's message:

If this message is placed in a folder which is configured to default to Big5, it will be displayed in the thread pane with Chinese characters in the name -- and then I can filter, view, or search triggered off one of those characters in order to find it. This message does not have a Content-Type header, although the text/plain portion *does* have one:

Content-Type: text/plain; charset="Windows-1251"

However, if reporter's message is in a folder that defaults to ISO-8859-1, it will display the é correctly, but still all the matching tools fail. I think this inconsistency is the bug here.

If reporter's message is in a folder that defaults to UTF-8, the subject appears as:

Cach????????????????????????
Updating summary for precision. The more I think about this, tho, the more I think it should be WONTFIX'd; it's not reasonable to tell Mozilla that a message has a particular charset and then use a different charset for a header, particularly since the header should be in 7-bit anyway (or else tagged).
Keywords: intl
Summary: Filter not working if Subject start with Caché → Filter on 8-bit char not working if untagged Subject is ISO-8859-1 but Content-Type specifies UTF-8
Product: MailNews → Core
WONTFIX is definitely the right choice. Mozilla mail is a bit incoherent, in that in some situations it uses the charset of the content to interpret the header, and sometimes it relies on the charset of the folder. But in any case, there's no way a message whose header contains raw 8-bit characters in a charset different from the one used in the body can be handled correctly in the general case, so there's no reason to try.
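The "general case" argument can be illustrated with a short Python sketch: the same untagged bytes decode without error under several single-byte charsets, so nothing in the header itself says which reading was intended. The function name `candidate_decodings` and the choice of charsets are assumptions made for this example.

```python
def candidate_decodings(raw, charsets=("iso-8859-1", "windows-1251")):
    """Decode the same raw bytes under each candidate charset (hypothetical helper)."""
    return {charset: raw.decode(charset) for charset in charsets}


if __name__ == "__main__":
    # Byte E9 is a valid character in both charsets, but a different one in each.
    for charset, text in candidate_decodings(b"Cach\xe9").items():
        print(charset, text)
```

Both decodings succeed, so a mail client has no reliable signal for choosing between them unless the header is tagged.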
Status: NEW → RESOLVED
Closed: 22 years ago → 20 years ago
Resolution: --- → WONTFIX
Product: Core → MailNews Core