Closed Bug 152888 Opened 22 years ago Closed 19 years ago

search / filter with special characters

Categories

(SeaMonkey :: MailNews: Message Display, defect)

x86
Windows 98
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED EXPIRED

People

(Reporter: odi, Unassigned)

References

Details

(Keywords: intl)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.0.0) Gecko/20020530
BuildID:    2002053012

Many SPAM mails from Asia have a subject line starting with (±¤°í) or similar.
When I enter this string into the filter bar (top of messages pane) no mails are
found although there are some that match.
Obviously the string comparison is not perfect yet.

Note: I do not have Asian fonts installed

Reproducible: Always
Steps to Reproduce:
1. Go to a mail folder that contains Asian spam. Check visually that there are
emails whose subjects start with (±¤°í)
2. Enter (±¤°í) into the filter bar and press enter


Actual Results:  Search result does not match the expected result.

Expected Results:  I expect the emails in question to appear in the filtered list.

The same seems to apply to Message Filters. Create a single filter "move email
whose subject contains ±¤°í to a special folder" and see that it does not work
correctly.
QA Contact: olgam → marina
If I enter "┌░гдFь" in the filter dialog box as subject "contains", that message
is neither filtered nor found, and it still appears in the Inbox. But if
you change the encoding to Big5 so that the headers display correctly and enter
a Chinese string into the filter window, the message gets filtered and moved into
the defined folder, or, in the case of search, is found. Naoki, if the headers
have no MIME info, then we cannot catch them?
Keywords: intl
Hmm... this is really weird. I just tried it again with POP and IMAP by creating
two more filters on both accounts and entering just "¤" as the search criterion;
the messages with this char got filtered and were found when doing the search
for "Subject or sender contains". It works both ways: entering the Chinese
char or "¤". So it looks like I cannot confirm the bug; it works for me.
Jeesun, could you please try this? Thanks
It's not clear to me whether the report's test string is supposed to be the 
actual Asian characters, or the string of ANSI bytes representing the Asian 
characters.
I know at one point I tried setting up a filter that would detect messages with 
"ISO-2022-JP" in the subject, but that never worked; that symptom would appear 
to match this bug.
However, I just tried copying and pasting the actual Chinese characters from a 
message subject into the QuickSearch field, and it properly matched the message. 
This was with 1.4 RC, Windows 2000.  I've seen very few Asian mails in recent 
times anyway due to filtering at my ISP, but those that do come thru have been 
well formed and display as the Asian characters, in subject and in body.

odi@odi.ch: is this bug still an issue for you?  Have you updated Mozilla to a 
current version (1.3 or 1.4)?
See also possibly duplicate or related bug 145398; the filters in that bug had 
been imported from a 4.x installation.
*** Bug 220880 has been marked as a duplicate of this bug. ***
If every email message (even spam) complied with RFC 2047, it'd be relatively
easy to fix this, but most spam mails use 'raw' octets with the MSB set
(violating RFC 2047) in the mail headers (Subject, From, etc.), which makes
it more difficult to fix this and bug 152888. It's also why you
saw non-Asian characters in the Subject of Asian spam mails. If you change your
Character Coding to Korean (EUC-KR), GB2312, Big5, or Shift_JIS[1] for Korean,
Simplified Chinese, Traditional Chinese and Japanese, respectively, you'll see
Korean, SC, TC and Japanese characters instead. Filters still wouldn't work if you
set up your filter rules with those characters. Why? Because (as I wrote
above) those characters are not properly labeled per RFC 2047 and Mozilla-mail
has no way to recognize them as such. Mozilla-mail has a per-folder 'Character
Coding' to assume for 'untagged' message headers and message bodies. However,
it's helpless if you receive *untagged* spam mails in multiple character codings
(EUC-KR, gb2312, big5, shift_jis, iso-2022-jp, etc).


[1] The majority of Japanese spam mails I get are in Shift_JIS (I've seen very
few spam mails in EUC-JP), but they're _tagged_ as in ISO-2022-JP. 
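
To illustrate why untagged headers are ambiguous, here is a minimal Python
sketch (not Mozilla code). It assumes the "(±¤°í)" string from the original
report is a Latin-1 rendering of the raw octets, recovers those octets, and
decodes them under several candidate charsets. The function name and the
charset list are made up for the illustration.

```python
# Hypothetical illustration: one untagged octet string, several possible readings.
RAW_SUBJECT = "(±¤°í)".encode("latin-1")  # assumed octets as they arrived on the wire

CANDIDATE_CHARSETS = ["latin-1", "euc-kr", "big5", "gb2312", "shift_jis"]

def readings(raw: bytes) -> dict:
    """Decode the same octets under each candidate charset."""
    # errors="replace" keeps going when the octets are invalid in a charset.
    return {cs: raw.decode(cs, errors="replace") for cs in CANDIDATE_CHARSETS}

for charset, text in readings(RAW_SUBJECT).items():
    print(f"{charset:10} {text!r}")
```

Read as EUC-KR, these octets come out as the Korean "(광고)", but nothing in
an untagged header tells the mail client that EUC-KR is the right guess.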

re comment #4: the reason filtering with the literal string 'iso-2022-jp'
doesn't work is probably because Mozilla-mail filter is applied _after_ RFC 2047
decoding and conversion to Unicode. One possible solution may be to apply the
Mozilla filter to the literal octet stream of message headers before
RFC 2047 decoding and conversion to Unicode as well. This may or may not
have to be a separate bug.
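
The suggestion above can be sketched in Python as follows. This is only an
illustration of the idea (search the decoded text first, then fall back to the
literal header), not how Mozilla's filters are implemented; both function names
are hypothetical, and an unknown charset label would raise an error in this
simplified version.

```python
from email.header import decode_header

def decoded_subject(raw_header: str) -> str:
    """RFC 2047-decode a Subject value into a Unicode string."""
    parts = []
    for chunk, charset in decode_header(raw_header):
        if isinstance(chunk, bytes):
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            parts.append(chunk)  # header contained no encoded words
    return "".join(parts)

def subject_contains(raw_header: str, needle: str) -> bool:
    """Match the needle against the decoded text or the literal header text."""
    if needle.lower() in decoded_subject(raw_header).lower():
        return True
    # Fallback: search the undecoded header, so charset labels become matchable.
    return needle.lower() in raw_header.lower()

RAW = "=?ISO-2022-JP?B?GyRCPVAycSQkJTUlJCVITDVOQSViJUslPyE8Smc9OBsoQg==?="
print(subject_contains(RAW, "iso-2022-jp"))
```

With only the decoded text available, the label "iso-2022-jp" is gone by the
time the filter runs, which matches the symptom described in comment #4.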
*** Bug 206684 has been marked as a duplicate of this bug. ***
*** Bug 145398 has been marked as a duplicate of this bug. ***
I've done a little testing on this bug, using Moz 1.6.

What I'm seeing is that if the Subject is correctly tagged, e.g.:
  Subject: =?ISO-2022-JP?B?GyRCPVAycSQkJTUlJCVITDVOQSViJUslPyE8Smc9OBsoQg==?=
then I can filter directly on the character -- just copy the character from the 
subject and plug it into a filter (or MailView), and the filter triggers.

If it's incorrectly encoded, such that the subject doesn't appear in Asian 
characters in the thread pane, filtering on the Asian characters won't work.  
However, if I filter on the Western characters that are displayed, that *will* 
work: 
  1)  Subject: EMBA¶}½Ò¤F,¤@±i³Ì¦³»ù­ÈªººÓ¤hÃÒ·
No tagging on this, and the jumble of western characters is how it shows up in 
my thread pane.  If I filter on one of those Western characters (e.g. ¶ or ù), 
it will trigger.  The View Source window somehow figures out the encoding (even 
tho the only encoding specified within is Win-1251!) and shows the subject in 
Chinese characters, but a filter on one of those won't trigger.

If I put this message into a folder which has been configured to display 
messages in Big5 (Folder Properties), then it is shown with the Chinese 
characters in the thread pane.  In this case, configuring a MailView or Message 
Search to trigger on an Asian character *does* work, but configuring a Message 
Filter to trigger on the same character still does not work.
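
A rough Python model of the behaviour described for example 1, assuming the
search simply decodes untagged octets with the folder's fallback charset before
comparing. The function name is hypothetical, and only the first few characters
of the subject are used.

```python
def untagged_subject_matches(raw: bytes, needle: str, folder_charset: str) -> bool:
    """Decode untagged octets with the folder's fallback charset, then search."""
    text = raw.decode(folder_charset, errors="replace")
    return needle in text

# Leading octets of the untagged subject from example 1, assuming the
# Western characters shown in the thread pane are a Latin-1 rendering.
RAW = "EMBA¶}½Ò¤F".encode("latin-1")
CHINESE = RAW.decode("big5")   # what a folder configured for Big5 would show
HANZI = CHINESE[4]             # first Chinese character after "EMBA"

for charset in ("latin-1", "big5"):
    print(charset,
          untagged_subject_matches(RAW, "¶", charset),
          untagged_subject_matches(RAW, HANZI, charset))
```

Under this model the Western character matches only with the Latin-1 fallback
and the Chinese character matches only with the Big5 fallback, which is
consistent with what the thread pane, MailView and Message Search showed.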


  2) Subject: =?ISO-8859-1?B?p9qrqC4uLi4uLi4uvPa89qq6s0q7ZaXE?=
The character set is actually Big5, but it's been tagged as ISO-8859-1.  So, 
Mozilla displays that string of bytes as "§Ú«¨........¼ö¼öªº³J»e¥Ä".  (In this 
case, View Source doesn't change the display of the subject, nor does the 
subject appear in Chinese if moved into the Big5-default folder.)  If I 
specifically select Big5 encoding in the message display, that apparently 
overrides the tag and the subject appears in Chinese characters; but I can't 
filter/search/view on one of those characters, even in a folder configured for 
Big5.  I can, however, filter on '§' or 'Ú'.

My interpretation of the original report is that reporter was filtering on the 
western characters, rather than the Asian ones.  Perhaps in Moz 1.0, that didn't 
work, but it appears to be working for me now.

Note that the two bugs I duped over earlier both stated that the filters that 
weren't working were ported from a Netscape 4 installation.

Reporter (odi@odi.ch) -- please comment as to whether this is working for you.
Bob Winners, Eric Nelson -- please state whether the Netscape 4 filters were 
originally defined with Asian characters or with the Western gibberish that 
happened to be how the Asian characters were rendered.
I wrote:
> If I put this message into a folder which has been configured to display 
> messages in Big5 (Folder Properties), then it is shown with the Chinese 
> characters in the thread pane.  In this case, configuring a MailView or
> Message Search to trigger on an Asian character *does* work, but configuring
> a Message Filter to trigger on the same character still does not work.

I was partially incorrect; running a Message Filter based on a Chinese character 
in the subject *does* work, if the message resides in a folder configured for 
Big5.  
(This message, incidentally, can be found in attachment 141633 from bug 220884.)
Product: Browser → Seamonkey
Assignee: sspitzer → mail
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
This bug has been automatically resolved after a period of inactivity (see above
comment). If anyone thinks this is incorrect, they should feel free to reopen it.
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → EXPIRED
works for me in TB 1.0.6