Closed Bug 21267 Opened 20 years ago Closed 18 years ago

When "Re:" is encoded, the sort order is not correct

Categories

(MailNews Core :: MIME, defect, P4)

defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: momoi, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(4 files, 6 obsolete files)

** Observed with 12/8/99 Win32 build **

This bug was originally filed for 4.x recently but it also applies to
Mozilla. There is also an issue as to whether or not this is the
receiver problem as opposed to the sender's, but it is worth considering.

The problem in nutshell is this:

"If you have an encoded header which includes "Re: .." and "Re: .."
itself is encoded as sometimes may happen in Japanese B64-encoded
headers, then when we sort this, we put it along with the
other alphabetic headers rather than with the headers which
contain the nearest Japanese string match, i.e. the nearest
match to the Japanese portion after "Re:".

To see this problem, use the attached zip file which
contain the Japanese sort data. I added two messages
for this prupose, 2 headers when decoded are identical.

Subject: =?iso-2022-jp?B?UmU6IBskQiRKJHMkQCQrJEokISFKNmw+UCFLGyhC?=
Subject: Re: =?iso-2022-jp?B?GyRCJEokcyRAJCskSiQhIUo2bD5QIUsbKEI=?=

The difference is that in the first one, the whole string is
MIME-encoded and this causes the sort order to
be less than optimal. Apparently, we deal with
"Re:" only in one place -- before MIME-decoding.

In the test file, you see 2 headers which begins with "Re:".
The Japanese strings which follows "Re:" are identical and
therefore these 2 should be side by side but they are not.
The 2nd one is where both of them should be.

The 4.x bug number is: 379726.
Please save the attached file as: jpnsort.zip
Assignee: rhp → putterman
I believe you are talking about the headers being displayed in the thread pane,
which is Scott's area. Let me know if I'm missing something here.

- rhp
cc'ing bienvenu.

I attach "Re:" if the reply flag is set.  I don't think that happens just
because "Re:" is in the subject.  I might be wrong about that though.
Assignee: putterman → bienvenu
If this bug is saying we should parse for Re: after mime decoding, I guess that
would be mine. No promises, though, since I don't know how hard it would be to
do.
To clarify, this is a display bug in the thread pane.
The end results we want is that 2 msgs with identical
strings in JPN (when decoded) should be displayed side by
side rather than apart as we do now.
Status: NEW → ASSIGNED
Target Milestone: M20
QA Contact: lchiang → momoi
Keywords: intl
Severity: major → normal
QA contact to marina.
QA Contact: momoi → marina
Retested with 05/31 build.
The sorting order is correct now, the reply msg shows up following the original 
msg.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
** 6/1/2001 Win32 Trunk & Branch builds **

The problem is not fixed with the above builds and as far as I know, 
no work has been done in this area to fix the problem. Please re-test 
by clicking on the "subject" column if the test mail folder.
Re-opening ... 
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch proposed fix (obsolete) — Splinter Review
This patch also fixes bug 79848.
add dependency
Depends on: 73403
someone like Naoki should r this and I can sr it. thx.
Taka, could you put a description of the change?
The cause of both bug 21267 and bug 79848 is in NS_MsgStripRE() in
base/util/nsMsgUtils.cpp.  It strips "Re:" from given string blindly,
it doesn't care if the given string is a MIME encoded-word.

The proposed fix calls MIME decoder if the string looks like MIME
encoded-word, strips "Re:", and encode the result back to MIME
encoded-word.  The last step might be unnecessary.  Let me try
without encoding back and see how it works.
Tried without applying MIME encode after striping "Re:", and it looks working
fine.  As a result, Subject line in mailbox DB will be stored in straight UTF-8
string instead of MIME encoded-word.  This might cause compatibility issue if
existing user upgrades to a binary with this patch.  So far, sorting and other
things are working fine, though.
Attachment #67365 - Attachment is obsolete: true
I would rather store the header in the db as close to possible as it occurs in
the actual message, and I'd rather not deal with compatibility issues, so I'd
prefer that you leave it MIME encoded, if possible. Is there any reason to
change that?
By removing "Re:", subjects in mailbox DB are already different from the
original message.  I believe this decision was made for performance. Since
subject in mailbox DB doesn't have annoying "Re:" for sorting, sorting function
is able to perform simple comparison against subjects
in DB.

For the same reason, you don't want to decode MIME encoded-word everytime user
changes sorting order, or cool subject search.  Probably, we should do the same
for sender's address for the same reason, but that's a different story.

Also, applying MIME encode is relatively expensive thing since it has to deal
with folding.  Unless there is a piece of code which doesn't like UTF-8 subject,
this effort is just waste of time because it eventually gets decoded.

As for compatibility issue, this is what pre 1.0 releases are for, I believe.
I'm very confident that my change won't break any because I haven't touched any
other code which deals with subject from mailbox DB.  In theory, everything
should work since existing code automatically decodes it if it's MIME
encoded-word, does nothing if it's not MIME encoded-word (e.g. ASCII or UTF-8).
There are already millions of users of this code. And this statement confuses me:
"I haven't touched any other code which deals with subject from mailbox DB" If
you're changing the format of what we store in the db, I would be more reassured
if you had touched some of the other code that deals with the subject from the
mailbox db. If you're sure that all possible users of the subject field in the
db don't care if it's MIME encoded or not, then I guess you could do this, but
if any user does care, then we'd have to upgrade all the db's out there, and I
don't want to do that.

We already have subject in two different format, ASCII and MIME encoded-word,
and there is a piece of code which knows how to deal with them.  You don't see
MIME encoded-word in UI because the code is smart enough to recognize these two
different formats and do whatever needs to be done.  Through out Mozilla code
base, UTF-8 is natural extension of ASCII.  That's why I didn't have to touch
any other code to deal with UTF-8 subject.  The code has already been here, why
don't we use it?

If, only if, there are compatibility issues, they are in our code, not in DB.
Thus, no DB upgrade needed for existing users.  We must make sure that there is
no problem in our code (not in DB) in 0.9.9 release, though.  That's what I
meant by saying "this is what pre 1.0 releases are for". (Assuming we don't
recommend Netscape 6.x users to install Mozilla 0.9.9 on top of it :-)


I understand that the db is agnostic about the encoding - it will just
faithfully report back whatever's put in it. If I understand correctly, all the
current clients of the subject field expect that data to be mime2 encoded, or
ascii, and you're saying that mime2 decoding a string that's already been mime2
decoded is a noop. That's what would have to be true for callers not to care. If
that's true, then it should be OK - it still makes me nervous, however, because
if there's a chance it's not true, all hell will break loose, if you'll pardon
my french :-)

a couple other comments - 
1. you might as well remove the #if 0 code. 
2. Linking in libmime to msgbase is a bit evil and not very com-friendly (JF,
what do you think?). It will make msgbaseutl bigger just by including the link
defs of libmime...
Attached patch revised. no more #if 0 (obsolete) — Splinter Review
I'm not good at COM stuff.  Could someone step in and get rid of libmime
dependency?
Attachment #67476 - Attachment is obsolete: true
You can use nsIMimeConverter to avoid linking libmime.

>mime2 decoding a string that's already been mime2 decoded is a noop.
The decoder still checks if the string is valid UTF-8 but I assume the check is
very fast.


reassign to myself
Assignee: bienvenu → taka
Status: REOPENED → NEW
merged the patch with the patch for 73403. see bug 73403.
Status: NEW → ASSIGNED
OS/Platform independent.  Set TM to 0.9.9.
OS: Windows NT → All
Hardware: PC → All
Target Milestone: --- → mozilla0.9.9
Blocks: 79848
Attached patch proposed patch (obsolete) — Splinter Review
Per nhotta@netscape.com 's request, split a patch into two, one for bug 73403,
one for bug 21267.
Attachment #67505 - Attachment is obsolete: true
remove dependency. to me, this is at least P2 bug since OE5.x encodes whole
subject including "Re:".  raise priority from P3 to P2.
No longer depends on: 73403
Priority: P3 → P2
There is charset override feature which is for the messages with incorrect MIME
charset. If we store MIME decoded UTF-8 in db then the override would not be
possible. That has to be taken care. Is that possilbe to MIME encode after
removing "Re" (is that what the first patch id=67365 does)?
Is there any test case for that scenario?
IQA, could you attach the test case for incorrect MIME?
I think we are talking about scenario *with* MIME encoding, but wrong charset.
FYI.  Regarding the test case posted as id=68174, charset override may not
work as expected if user is using IMAP server due to the fact that latest IMAP 
server implementations create body structure upon message delivery.  This cannot
be done correctly if non ASCII header exists w/o MIME encoding.
Do you mean that the charset override for messages w/o MIME header will be
broken if connected to the new version of Netscape Messenger server?
Doesn't have to be Netscape server.  It depends on how IMAP server stores and
represents body structure information.
The test case posted as id=68196, it's a bit unrealistic to me.
I've never seen such MUA sending ISO-2022-JP with Shift_JIS charset name.
Sending Q-encoded Shift_JIS with ISO-8859-1 charset name seems
more realistic.

Anyways, that's true that charset overriding no longer works
once we switch to UTF-8 in mailbox db, thus we need to re-encode
subject in NS_MsgStripeRE().  This implies that some performance
decrease may occur when message thread pane gets updated.
Can you re-encode only if the decoded header contains "Re:"?
No prob. 

For those who lives in ideal world (i.e. everybody sends RFC-2047 compliant
message), I will introduce new pref variable (most likely
"mailnews.summary.utf8") so that we can completely avoid re-encoding subject. 
When it's set, sender name should also be decoded.
I'm getting the impression that doing the decoding and sticking the decoded
string in the database is data loss, which is exactly what I wanted to avoid by
leaving the string "as is" from the mail message. Just to make sure, we're not
going to do that, right? And, the only additional performance hit we're taking
is when parsing message headers that are mime-encoded? We're not taking any
additional hit when displaying headers in the thread pane that we weren't taking
before?
> Just to make sure, we're not going to do that, right?

By default, yes.

As option, there will be a new pref which prevents decoded subject from 
re-encoding.  For thoese who uses IMAP from work, home, and on the road, this
new pref will be a big help as far as there is virtually no message with busted
subject.
nhotta is working on related bugs and we both agree to reassign this to nhotta.
Assignee: taka → nhotta
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Keywords: nsbeta1
we have the same problem in 4.x 
we should find out what OutlookExpress do.
keep it as nsbeta1 (not + nor -) for now.
OE 5 recognizes the encoded 'Re:'.
I tested using the attached data.
How about Eudora? I don't have that.
In Comment #44 Frank Tang wrote:

> we have the same problem in 4.x 

AFAIK, the correct statement is "we used to have", not "we have".
Eudora can also recognize the encoded 'Re:'. Tested with Eudora 4.3-J.
Target Milestone: mozilla0.9.9 → mozilla1.0
nsbeta1+ per triage meeting 
Keywords: nsbeta1nsbeta1+
I think we only need to do this if the header is MIME encoded including "Re:".
If "Re:" is not encoded then no need to do the decode/encode. I need to learn
the code before doing the change.
You cannot decide not to decode MIME encoded portion by looking at "Re:" at the
eginning of subject because the current Mozilla code blindly adds "Re:" when
replying regardless of the rest of subject.  That's causing multiple "Re:".
But NS_MsgStripRE doesn't seem to be adding "Re:". 
We don't blindly add Re: to every reply - if I reply to a reply to a reply, it
doesn't end up with Re: Re: Re:. I think what Takayuki is saying is that if the
Re:   encoded, we don't notice that. So, obviously (I think), if we want this to
work correctly, we would store the encoded subject exactly as is but w/o the Re:
and set the Re: flag on the header, like we do for headers where Re: is not encoded.
>and set the Re: flag on the header, like we do for headers where Re: is not 
>encoded.
Does that done by the callers of NS_MsgStripRE? And they are replying on the
result of NS_MsgStripRE?
Naoki, it's not "done" by anyone; it just works naturally. In the db, we store
the subject w/o the "Re:" (I guess we stip off the re:with NS_MsgStripRe, if
that's what you mean, so yes). Then, the reply code sticks a "Re:" in front of
the subject to do the reply.
Thanks. One more question. Does the function strips off more than one "Re:" or
just the first one? I don't see a reason for stripping off more than one.
Naoki, I can't claim to be an expert in that code, but it looks to me like it
attempts to strip off all Re:'s and a few other things. The reason it does this,
I assume, is that other mail and news clients might end up attaching multiple
"Re:'s" so we strip off all we see. This code is run on all messages, not just
messages sent by Netscape clients :-)
Thanks, then I need to look for "Re:" for encoded subjects even "Re:" is already
found in the non encoded part (cannot do as my comment #49).

I applied the initial patch (which encodes again after strip off "Re:", id=67365).
Then all of MIME encoded headers are displayed "Re:" prepended in thread pane.
That is thread pane only, they are shown without "Re:" in evelop view.
The problem is that it sets the return result to true after encoding the string.
The result is already set in the previous section, so no need to reset. Also, if
the result is not true (i.e. no "Re:" stripped) then no need to encode the header.
Attachment #68141 - Attachment is obsolete: true
David, could you 'r' (or 'sr')?
+        p1 += sizeof("=?")-1;

is this really what you want?  Don't you really just want p1++? or p1+= strlen("=?"
)-1; ?
It needs to proceed by 2 bytes for "=?" after strstr search and point to "=?",
then it has to look for another "?" to retrieve the charset name (e.g.,
=?ISO-8859-1?). 
are you expecting sizeof("?=") to return 4, 3, or 2?
 3
sizeof is not as elegant than strlen but way more efficient...
Comment on attachment 71964 [details] [diff] [review]
Decode MIME encoded header before parsing for "Re:".

r=bienvenu - you might want to add a comment to make it more readable.
Attachment #71964 - Flags: review+
Attachment #71964 - Attachment is obsolete: true
Comment on attachment 72706 [details] [diff] [review]
Added comments for the charset name parsing code.

1) instead of 

if (s - decodedString.get())

why not

if (s != decodedString.get())

which seems clearer to me.

2)

instead of sizeof("Subject: ")-1

why not

sizeof("Subject:")

3)

good job on protecting the buffer overrun of charset[].

curious, what happens if we pass in "" as the charset?	what will
EncodeMimePartIIStr_UTF8() do?	(hopefully not crash)

address those issues, and sr=sspitzer
Attachment #72706 - Flags: superreview+
Comment on attachment 73043 [details] [diff] [review]
Address the super-reviewer's comment. When the charset is empty the encoder returns an error.

carry 'r' 'sr'
Attachment #73043 - Flags: superreview+
Attachment #73043 - Flags: review+
Comment on attachment 73043 [details] [diff] [review]
Address the super-reviewer's comment. When the charset is empty the encoder returns an error.

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73043 - Flags: approval+
Priority: P2 → P4
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 19 years ago18 years ago
Resolution: --- → FIXED
verified
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.