Closed
Bug 220884
Opened 22 years ago
Closed 20 years ago
Filter on 8-bit char not working if untagged Subject is ISO-8859-1 but Content-Type specifies UTF-8
Categories
(MailNews Core :: Filters, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: strathaus, Assigned: sspitzer)
Details
(Keywords: intl)
Attachments
(2 files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925
Create a filter that should run if Subject CONTAINS Medium.
If the Subject looks like this: "Cach&#65533; Medium", neither the filter will run
nor will a search by "Subject or Sender contains:" work.
If "Cach&#65533;" is removed from the Subject, it works.
MIME-Version: 1.0
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Reproducible: Always
Steps to Reproduce:
1. Create a mail with same subject and Content-Type
2. Try to find the mail
3. Create a new filter and try to run it
Actual Results:
Nothing
Expected Results:
- Run the filter as defined (in my case: move the file)
- Show the mail within the results of the search
Comment 1•22 years ago
*** This bug has been marked as a duplicate of 200938 ***
Status: UNCONFIRMED → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Summary: Filter not working if Subject start with Cach� → Filter not working if Subject start with Cach�
| Reporter | ||
Comment 2•22 years ago
I don't think you can compare this issue with #200938 etc.
Cach&#65533; is visible in the message source (CTRL+U) only. The subject
line is saying Caché which is correct.
Regards,
Stefan
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Comment 3•22 years ago
How do you send the message that has that &#65533; entity included in the
subject line? If I compose a message with Mozilla and enter "Caché medium" as
the subject, the sent message (source) contains this:
Subject: =?ISO-8859-1?Q?Cach=E9_medium?=
And with this encoding, the "medium" part is being caught by the filter I've got
set up.
And why 65533, anyway? "é" is character #233 in ISO-8859-1 and in Unicode.
Maybe you could attach a message with this odd subject to this bug.
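For reference, the tagged form quoted above can be decoded with Python's standard `email.header` module. This is only an illustrative sketch of RFC 2047 decoding, not Mozilla's filter code; the helper name `decode_subject` is made up for the example.

```python
from email.header import decode_header, make_header


def decode_subject(raw_value):
    """Decode an RFC 2047 encoded-word header value into a Unicode string."""
    # decode_header() splits the value into (bytes, charset) chunks;
    # make_header() reassembles them into a Header that str() can render.
    return str(make_header(decode_header(raw_value)))


decoded = decode_subject("=?ISO-8859-1?Q?Cach=E9_medium?=")
print(decoded)
# Once the header is decoded, a plain substring match on "medium" succeeds.
print("medium" in decoded)
```

Because the encoded word carries its own charset tag, the decoder never has to guess from the body's Content-Type or the folder default.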
Comment 4•22 years ago
See also bug 188847.
Comment 5•21 years ago
Stefan Schulte-Strathaus, are you still monitoring this bug? If you have a
sample email with the bogus Subject header, please attach to this bug;
otherwise, it is quite difficult to investigate.
| Reporter | ||
Comment 6•21 years ago
Still in 1.7a.
Having "Cach&#65533;" in the subject line causes the search option "Subject or
Recipient contains" not to work.
| Reporter | ||
Comment 7•21 years ago
Comment 8•21 years ago
OK, I've investigated the sample message provided by reporter. Note that the
HTML entity '&#65533;' is a red herring -- the message contains the actual
character é (ISO-8859-1 character 233, hex E9) on the subject line.
The Subject line of the message is, technically, illegal -- it should be encoded
as Subject: =?ISO-8859-1?Q?Cach=E9 ...?=
While investigating, I did some more testing; see bug 152888 comment 10. In my
"example (1)" there, the message's subject is supposed to be in Asian characters
but is untagged, so the mail appears to be Western style gibberish, with
characters like ¶ and ù in the subject -- and yet, I can filter on those
characters and locate that message.
The difference appears to be in this header from reporter's message:
Content-Type: text/html; charset="UTF-8"
The character é, when encoded for UTF-8, is not a single byte [E9] but two
bytes, [C3 A9]. I duplicated the message in my mail folder, then edited it to
change the Content-Type to be iso-8859-1 -- no other changes -- and I was able
to select it via filter, search and MailView, triggering on é.
Apparently, filtering, viewing and searching try to decode the message headers
as well as can be done based on the message's supplied Content-Type. This
particular message's subject is encoded one way, the content-type specifies
another way, so the subject line is converted to an "illegal" character (maybe
65533 [FF FD]?). That fails to match the [C3 A9] used by the filter processing.
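The byte-level mismatch described above can be reproduced outside Mozilla with a few lines of Python. This is a sketch under assumptions: the subject bytes are reconstructed from the report rather than taken from the attachment, and Python's decoder only stands in for whatever conversion Mozilla performs internally.

```python
# Raw, untagged subject bytes: é stored as the single ISO-8859-1 byte E9.
raw_subject = b"Cach\xe9 Medium"

# Decoding with the charset the bytes were really written in.
as_latin1 = raw_subject.decode("iso-8859-1")

# Decoding with the charset claimed by Content-Type; the stray E9 byte is
# not valid UTF-8 here, so it becomes U+FFFD (decimal 65533).
as_utf8 = raw_subject.decode("utf-8", errors="replace")

needle = "é"
print(needle.encode("utf-8"))   # the two-byte UTF-8 form of the search term
print(needle in as_latin1)      # match when decoded as ISO-8859-1
print(needle in as_utf8)        # no match when decoded as UTF-8
print(ord(as_utf8[4]))          # code point of the replacement character
```

The search term encodes to the two bytes C3 A9, which never occur in the raw subject, and after the lossy UTF-8 decode the character has turned into U+FFFD, so neither a byte comparison nor a character comparison on é can succeed.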
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Filter not working if Subject start with Cach� → Filter not working if Subject start with Caché
Comment 9•21 years ago
The message I'm attaching is the one described in bug 152888 comment 10,
example (1). Including it here because there is a slight inconsistency between
Mozilla's handling of this message vs. reporter's message:
If this message is placed in a folder which is configured to default to Big5,
it will be displayed in the thread pane with Chinese characters in the name --
and then I can filter, view, or search triggered off one of those characters in
order to find it. This message does not have a Content-Type header, altho the
text/plain portion *does* have one:
Content-Type: text/plain; charset="Windows-1251"
However, if reporter's message is in a folder that defaults to ISO-8859-1, it
will display the é correctly, but still all the matching tools fail. I think
this inconsistency is the bug here.
If reporter's message is in a folder that defaults to UTF-8, the subject
appears as:
Cach????????????????????????
Comment 10•21 years ago
Updating summary for precision. The more I think about this, tho, the more I
think it should be WONTFIX'd; it's not reasonable to tell Mozilla that a message
has a particular charset and then use a different charset for a header,
particularly since the header should be in 7-bit anyway (or else tagged).
Keywords: intl
Summary: Filter not working if Subject start with Caché → Filter on 8-bit char not working if untagged Subject is ISO-8859-1 but Content-Type specifies UTF-8
Updated•21 years ago
Product: MailNews → Core
Comment 11•20 years ago
WONTFIX is definitely the right choice.
Mozilla mail is a bit incoherent, in that in some situations it uses the charset
of the content to interpret the header, and sometimes it relies on the charset
of the folder. But anyway, there's no way a message with a header containing raw
8-bit characters in a charset different from the charset used in the body can be
correctly handled in the general case, so there's no reason to try.
Status: NEW → RESOLVED
Closed: 22 years ago → 20 years ago
Resolution: --- → WONTFIX
Updated•17 years ago
Product: Core → MailNews Core