Closed
Bug 21267
Opened 25 years ago
Closed 23 years ago
When "Re:" is encoded, the sort order is not correct
Categories
(MailNews Core :: MIME, defect, P4)
MailNews Core
MIME
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla1.0
People
(Reporter: momoi, Assigned: nhottanscp)
References
Details
(Keywords: intl)
Attachments
(4 files, 6 obsolete files)
21.67 KB,
application/x-zip-compressed
|
Details | |
17.46 KB,
application/octet-stream
|
Details | |
603 bytes,
application/octet-stream
|
Details | |
2.66 KB,
patch
|
nhottanscp
:
review+
nhottanscp
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
** Observed with 12/8/99 Win32 build **
This bug was originally filed for 4.x recently but it also applies to
Mozilla. There is also an issue as to whether or not this is the
receiver problem as opposed to the sender's, but it is worth considering.
The problem in nutshell is this:
"If you have an encoded header which includes "Re: .." and "Re: .."
itself is encoded as sometimes may happen in Japanese B64-encoded
headers, then when we sort this, we put it along with the
other alphabetic headers rather than with the headers which
contain the nearest Japanese string match, i.e. the nearest
match to the Japanese portion after "Re:".
To see this problem, use the attached zip file which
contain the Japanese sort data. I added two messages
for this prupose, 2 headers when decoded are identical.
Subject: =?iso-2022-jp?B?UmU6IBskQiRKJHMkQCQrJEokISFKNmw+UCFLGyhC?=
Subject: Re: =?iso-2022-jp?B?GyRCJEokcyRAJCskSiQhIUo2bD5QIUsbKEI=?=
The difference is that in the first one, the whole string is
MIME-encoded and this causes the sort order to
be less than optimal. Apparently, we deal with
"Re:" only in one place -- before MIME-decoding.
In the test file, you see 2 headers which begins with "Re:".
The Japanese strings which follows "Re:" are identical and
therefore these 2 should be side by side but they are not.
The 2nd one is where both of them should be.
The 4.x bug number is: 379726.
Reporter | ||
Comment 1•25 years ago
|
||
Reporter | ||
Comment 2•25 years ago
|
||
Please save the attached file as: jpnsort.zip
Updated•25 years ago
|
Assignee: rhp → putterman
Comment 3•25 years ago
|
||
I believe you are talking about the headers being displayed in the thread pane,
which is Scott's area. Let me know if I'm missing something here.
- rhp
Comment 4•25 years ago
|
||
cc'ing bienvenu.
I attach "Re:" if the reply flag is set. I don't think that happens just
because "Re:" is in the subject. I might be wrong about that though.
Updated•25 years ago
|
Assignee: putterman → bienvenu
Comment 5•25 years ago
|
||
If this bug is saying we should parse for Re: after mime decoding, I guess that
would be mine. No promises, though, since I don't know how hard it would be to
do.
Reporter | ||
Comment 6•25 years ago
|
||
To clarify, this is a display bug in the thread pane.
The end results we want is that 2 msgs with identical
strings in JPN (when decoded) should be displayed side by
side rather than apart as we do now.
Updated•25 years ago
|
Status: NEW → ASSIGNED
Target Milestone: M20
Reporter | ||
Updated•24 years ago
|
Severity: major → normal
Retested with 05/31 build.
The sorting order is correct now, the reply msg shows up following the original
msg.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 9•24 years ago
|
||
** 6/1/2001 Win32 Trunk & Branch builds **
The problem is not fixed with the above builds and as far as I know,
no work has been done in this area to fix the problem. Please re-test
by clicking on the "subject" column if the test mail folder.
Re-opening ...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•23 years ago
|
||
This patch also fixes bug 79848.
Comment 12•23 years ago
|
||
someone like Naoki should r this and I can sr it. thx.
Assignee | ||
Comment 13•23 years ago
|
||
Taka, could you put a description of the change?
Comment 14•23 years ago
|
||
The cause of both bug 21267 and bug 79848 is in NS_MsgStripRE() in
base/util/nsMsgUtils.cpp. It strips "Re:" from given string blindly,
it doesn't care if the given string is a MIME encoded-word.
The proposed fix calls MIME decoder if the string looks like MIME
encoded-word, strips "Re:", and encode the result back to MIME
encoded-word. The last step might be unnecessary. Let me try
without encoding back and see how it works.
Comment 15•23 years ago
|
||
Tried without applying MIME encode after striping "Re:", and it looks working
fine. As a result, Subject line in mailbox DB will be stored in straight UTF-8
string instead of MIME encoded-word. This might cause compatibility issue if
existing user upgrades to a binary with this patch. So far, sorting and other
things are working fine, though.
Comment 16•23 years ago
|
||
Updated•23 years ago
|
Attachment #67365 -
Attachment is obsolete: true
Comment 17•23 years ago
|
||
I would rather store the header in the db as close to possible as it occurs in
the actual message, and I'd rather not deal with compatibility issues, so I'd
prefer that you leave it MIME encoded, if possible. Is there any reason to
change that?
Comment 18•23 years ago
|
||
By removing "Re:", subjects in mailbox DB are already different from the
original message. I believe this decision was made for performance. Since
subject in mailbox DB doesn't have annoying "Re:" for sorting, sorting function
is able to perform simple comparison against subjects
in DB.
For the same reason, you don't want to decode MIME encoded-word everytime user
changes sorting order, or cool subject search. Probably, we should do the same
for sender's address for the same reason, but that's a different story.
Also, applying MIME encode is relatively expensive thing since it has to deal
with folding. Unless there is a piece of code which doesn't like UTF-8 subject,
this effort is just waste of time because it eventually gets decoded.
As for compatibility issue, this is what pre 1.0 releases are for, I believe.
I'm very confident that my change won't break any because I haven't touched any
other code which deals with subject from mailbox DB. In theory, everything
should work since existing code automatically decodes it if it's MIME
encoded-word, does nothing if it's not MIME encoded-word (e.g. ASCII or UTF-8).
Comment 19•23 years ago
|
||
There are already millions of users of this code. And this statement confuses me:
"I haven't touched any other code which deals with subject from mailbox DB" If
you're changing the format of what we store in the db, I would be more reassured
if you had touched some of the other code that deals with the subject from the
mailbox db. If you're sure that all possible users of the subject field in the
db don't care if it's MIME encoded or not, then I guess you could do this, but
if any user does care, then we'd have to upgrade all the db's out there, and I
don't want to do that.
Comment 20•23 years ago
|
||
We already have subject in two different format, ASCII and MIME encoded-word,
and there is a piece of code which knows how to deal with them. You don't see
MIME encoded-word in UI because the code is smart enough to recognize these two
different formats and do whatever needs to be done. Through out Mozilla code
base, UTF-8 is natural extension of ASCII. That's why I didn't have to touch
any other code to deal with UTF-8 subject. The code has already been here, why
don't we use it?
If, only if, there are compatibility issues, they are in our code, not in DB.
Thus, no DB upgrade needed for existing users. We must make sure that there is
no problem in our code (not in DB) in 0.9.9 release, though. That's what I
meant by saying "this is what pre 1.0 releases are for". (Assuming we don't
recommend Netscape 6.x users to install Mozilla 0.9.9 on top of it :-)
Comment 21•23 years ago
|
||
I understand that the db is agnostic about the encoding - it will just
faithfully report back whatever's put in it. If I understand correctly, all the
current clients of the subject field expect that data to be mime2 encoded, or
ascii, and you're saying that mime2 decoding a string that's already been mime2
decoded is a noop. That's what would have to be true for callers not to care. If
that's true, then it should be OK - it still makes me nervous, however, because
if there's a chance it's not true, all hell will break loose, if you'll pardon
my french :-)
a couple other comments -
1. you might as well remove the #if 0 code.
2. Linking in libmime to msgbase is a bit evil and not very com-friendly (JF,
what do you think?). It will make msgbaseutl bigger just by including the link
defs of libmime...
Comment 22•23 years ago
|
||
I'm not good at COM stuff. Could someone step in and get rid of libmime
dependency?
Attachment #67476 -
Attachment is obsolete: true
Assignee | ||
Comment 23•23 years ago
|
||
You can use nsIMimeConverter to avoid linking libmime.
>mime2 decoding a string that's already been mime2 decoded is a noop.
The decoder still checks if the string is valid UTF-8 but I assume the check is
very fast.
Comment 25•23 years ago
|
||
merged the patch with the patch for 73403. see bug 73403.
Status: NEW → ASSIGNED
Comment 26•23 years ago
|
||
OS/Platform independent. Set TM to 0.9.9.
OS: Windows NT → All
Hardware: PC → All
Target Milestone: --- → mozilla0.9.9
Comment 27•23 years ago
|
||
Attachment #67505 -
Attachment is obsolete: true
Comment 28•23 years ago
|
||
remove dependency. to me, this is at least P2 bug since OE5.x encodes whole
subject including "Re:". raise priority from P3 to P2.
No longer depends on: 73403
Priority: P3 → P2
Assignee | ||
Comment 29•23 years ago
|
||
There is charset override feature which is for the messages with incorrect MIME
charset. If we store MIME decoded UTF-8 in db then the override would not be
possible. That has to be taken care. Is that possilbe to MIME encode after
removing "Re" (is that what the first patch id=67365 does)?
Comment 30•23 years ago
|
||
Is there any test case for that scenario?
Assignee | ||
Comment 31•23 years ago
|
||
IQA, could you attach the test case for incorrect MIME?
Comment 32•23 years ago
|
||
Comment 33•23 years ago
|
||
I think we are talking about scenario *with* MIME encoding, but wrong charset.
Comment 34•23 years ago
|
||
Comment 35•23 years ago
|
||
FYI. Regarding the test case posted as id=68174, charset override may not
work as expected if user is using IMAP server due to the fact that latest IMAP
server implementations create body structure upon message delivery. This cannot
be done correctly if non ASCII header exists w/o MIME encoding.
Comment 36•23 years ago
|
||
Do you mean that the charset override for messages w/o MIME header will be
broken if connected to the new version of Netscape Messenger server?
Comment 37•23 years ago
|
||
Doesn't have to be Netscape server. It depends on how IMAP server stores and
represents body structure information.
Comment 38•23 years ago
|
||
The test case posted as id=68196, it's a bit unrealistic to me.
I've never seen such MUA sending ISO-2022-JP with Shift_JIS charset name.
Sending Q-encoded Shift_JIS with ISO-8859-1 charset name seems
more realistic.
Anyways, that's true that charset overriding no longer works
once we switch to UTF-8 in mailbox db, thus we need to re-encode
subject in NS_MsgStripeRE(). This implies that some performance
decrease may occur when message thread pane gets updated.
Assignee | ||
Comment 39•23 years ago
|
||
Can you re-encode only if the decoded header contains "Re:"?
Comment 40•23 years ago
|
||
No prob.
For those who lives in ideal world (i.e. everybody sends RFC-2047 compliant
message), I will introduce new pref variable (most likely
"mailnews.summary.utf8") so that we can completely avoid re-encoding subject.
When it's set, sender name should also be decoded.
Comment 41•23 years ago
|
||
I'm getting the impression that doing the decoding and sticking the decoded
string in the database is data loss, which is exactly what I wanted to avoid by
leaving the string "as is" from the mail message. Just to make sure, we're not
going to do that, right? And, the only additional performance hit we're taking
is when parsing message headers that are mime-encoded? We're not taking any
additional hit when displaying headers in the thread pane that we weren't taking
before?
Comment 42•23 years ago
|
||
> Just to make sure, we're not going to do that, right?
By default, yes.
As option, there will be a new pref which prevents decoded subject from
re-encoding. For thoese who uses IMAP from work, home, and on the road, this
new pref will be a big help as far as there is virtually no message with busted
subject.
Comment 43•23 years ago
|
||
nhotta is working on related bugs and we both agree to reassign this to nhotta.
Assignee: taka → nhotta
Status: ASSIGNED → NEW
Comment 44•23 years ago
|
||
we have the same problem in 4.x
we should find out what OutlookExpress do.
keep it as nsbeta1 (not + nor -) for now.
Assignee | ||
Comment 45•23 years ago
|
||
OE 5 recognizes the encoded 'Re:'.
I tested using the attached data.
How about Eudora? I don't have that.
Comment 46•23 years ago
|
||
In Comment #44 Frank Tang wrote:
> we have the same problem in 4.x
AFAIK, the correct statement is "we used to have", not "we have".
Comment 47•23 years ago
|
||
Eudora can also recognize the encoded 'Re:'. Tested with Eudora 4.3-J.
Assignee | ||
Updated•23 years ago
|
Target Milestone: mozilla0.9.9 → mozilla1.0
Assignee | ||
Comment 49•23 years ago
|
||
I think we only need to do this if the header is MIME encoded including "Re:".
If "Re:" is not encoded then no need to do the decode/encode. I need to learn
the code before doing the change.
Comment 50•23 years ago
|
||
You cannot decide not to decode MIME encoded portion by looking at "Re:" at the
eginning of subject because the current Mozilla code blindly adds "Re:" when
replying regardless of the rest of subject. That's causing multiple "Re:".
Assignee | ||
Comment 51•23 years ago
|
||
But NS_MsgStripRE doesn't seem to be adding "Re:".
Comment 52•23 years ago
|
||
We don't blindly add Re: to every reply - if I reply to a reply to a reply, it
doesn't end up with Re: Re: Re:. I think what Takayuki is saying is that if the
Re: encoded, we don't notice that. So, obviously (I think), if we want this to
work correctly, we would store the encoded subject exactly as is but w/o the Re:
and set the Re: flag on the header, like we do for headers where Re: is not encoded.
Assignee | ||
Comment 53•23 years ago
|
||
>and set the Re: flag on the header, like we do for headers where Re: is not
>encoded.
Does that done by the callers of NS_MsgStripRE? And they are replying on the
result of NS_MsgStripRE?
Comment 54•23 years ago
|
||
Naoki, it's not "done" by anyone; it just works naturally. In the db, we store
the subject w/o the "Re:" (I guess we stip off the re:with NS_MsgStripRe, if
that's what you mean, so yes). Then, the reply code sticks a "Re:" in front of
the subject to do the reply.
Assignee | ||
Comment 55•23 years ago
|
||
Thanks. One more question. Does the function strips off more than one "Re:" or
just the first one? I don't see a reason for stripping off more than one.
Comment 56•23 years ago
|
||
Naoki, I can't claim to be an expert in that code, but it looks to me like it
attempts to strip off all Re:'s and a few other things. The reason it does this,
I assume, is that other mail and news clients might end up attaching multiple
"Re:'s" so we strip off all we see. This code is run on all messages, not just
messages sent by Netscape clients :-)
Assignee | ||
Comment 57•23 years ago
|
||
Thanks, then I need to look for "Re:" for encoded subjects even "Re:" is already
found in the non encoded part (cannot do as my comment #49).
Assignee | ||
Comment 58•23 years ago
|
||
I applied the initial patch (which encodes again after strip off "Re:", id=67365).
Then all of MIME encoded headers are displayed "Re:" prepended in thread pane.
That is thread pane only, they are shown without "Re:" in evelop view.
Assignee | ||
Comment 59•23 years ago
|
||
The problem is that it sets the return result to true after encoding the string.
The result is already set in the previous section, so no need to reset. Also, if
the result is not true (i.e. no "Re:" stripped) then no need to encode the header.
Assignee | ||
Comment 60•23 years ago
|
||
Attachment #68141 -
Attachment is obsolete: true
Assignee | ||
Comment 61•23 years ago
|
||
David, could you 'r' (or 'sr')?
Comment 62•23 years ago
|
||
+ p1 += sizeof("=?")-1;
is this really what you want? Don't you really just want p1++? or p1+= strlen("=?"
)-1; ?
Assignee | ||
Comment 63•23 years ago
|
||
It needs to proceed by 2 bytes for "=?" after strstr search and point to "=?",
then it has to look for another "?" to retrieve the charset name (e.g.,
=?ISO-8859-1?).
Comment 64•23 years ago
|
||
are you expecting sizeof("?=") to return 4, 3, or 2?
Assignee | ||
Comment 65•23 years ago
|
||
3
Comment 66•23 years ago
|
||
sizeof is not as elegant than strlen but way more efficient...
Comment 67•23 years ago
|
||
Comment on attachment 71964 [details] [diff] [review]
Decode MIME encoded header before parsing for "Re:".
r=bienvenu - you might want to add a comment to make it more readable.
Attachment #71964 -
Flags: review+
Assignee | ||
Comment 68•23 years ago
|
||
Attachment #71964 -
Attachment is obsolete: true
Comment 69•23 years ago
|
||
Comment on attachment 72706 [details] [diff] [review]
Added comments for the charset name parsing code.
1) instead of
if (s - decodedString.get())
why not
if (s != decodedString.get())
which seems clearer to me.
2)
instead of sizeof("Subject: ")-1
why not
sizeof("Subject:")
3)
good job on protecting the buffer overrun of charset[].
curious, what happens if we pass in "" as the charset? what will
EncodeMimePartIIStr_UTF8() do? (hopefully not crash)
address those issues, and sr=sspitzer
Attachment #72706 -
Flags: superreview+
Assignee | ||
Comment 70•23 years ago
|
||
Attachment #72706 -
Attachment is obsolete: true
Assignee | ||
Comment 71•23 years ago
|
||
Comment on attachment 73043 [details] [diff] [review]
Address the super-reviewer's comment. When the charset is empty the encoder returns an error.
carry 'r' 'sr'
Attachment #73043 -
Flags: superreview+
Attachment #73043 -
Flags: review+
Comment 72•23 years ago
|
||
Comment on attachment 73043 [details] [diff] [review]
Address the super-reviewer's comment. When the charset is empty the encoder returns an error.
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73043 -
Flags: approval+
Assignee | ||
Updated•23 years ago
|
Priority: P2 → P4
Assignee | ||
Comment 73•23 years ago
|
||
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: MailNews → Core
Updated•17 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•