Closed Bug 21267 Opened 20 years ago Closed 18 years ago
When "Re:" is encoded, the sort order is not correct
21.67 KB, application/x-zip-compressed
17.46 KB, application/octet-stream
603 bytes, application/octet-stream
2.66 KB, patch
|Details | Diff | Splinter Review|
** Observed with 12/8/99 Win32 build ** This bug was originally filed for 4.x recently but it also applies to Mozilla. There is also an issue as to whether or not this is the receiver problem as opposed to the sender's, but it is worth considering. The problem in nutshell is this: "If you have an encoded header which includes "Re: .." and "Re: .." itself is encoded as sometimes may happen in Japanese B64-encoded headers, then when we sort this, we put it along with the other alphabetic headers rather than with the headers which contain the nearest Japanese string match, i.e. the nearest match to the Japanese portion after "Re:". To see this problem, use the attached zip file which contain the Japanese sort data. I added two messages for this prupose, 2 headers when decoded are identical. Subject: =?iso-2022-jp?B?UmU6IBskQiRKJHMkQCQrJEokISFKNmw+UCFLGyhC?= Subject: Re: =?iso-2022-jp?B?GyRCJEokcyRAJCskSiQhIUo2bD5QIUsbKEI=?= The difference is that in the first one, the whole string is MIME-encoded and this causes the sort order to be less than optimal. Apparently, we deal with "Re:" only in one place -- before MIME-decoding. In the test file, you see 2 headers which begins with "Re:". The Japanese strings which follows "Re:" are identical and therefore these 2 should be side by side but they are not. The 2nd one is where both of them should be. The 4.x bug number is: 379726.
Please save the attached file as: jpnsort.zip
I believe you are talking about the headers being displayed in the thread pane, which is Scott's area. Let me know if I'm missing something here. - rhp
cc'ing bienvenu. I attach "Re:" if the reply flag is set. I don't think that happens just because "Re:" is in the subject. I might be wrong about that though.
If this bug is saying we should parse for Re: after mime decoding, I guess that would be mine. No promises, though, since I don't know how hard it would be to do.
To clarify, this is a display bug in the thread pane. The end results we want is that 2 msgs with identical strings in JPN (when decoded) should be displayed side by side rather than apart as we do now.
QA contact to marina.
QA Contact: momoi → marina
Retested with 05/31 build. The sorting order is correct now, the reply msg shows up following the original msg.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
** 6/1/2001 Win32 Trunk & Branch builds ** The problem is not fixed with the above builds and as far as I know, no work has been done in this area to fix the problem. Please re-test by clicking on the "subject" column if the test mail folder. Re-opening ...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This patch also fixes bug 79848.
Depends on: 73403
someone like Naoki should r this and I can sr it. thx.
Taka, could you put a description of the change?
The cause of both bug 21267 and bug 79848 is in NS_MsgStripRE() in base/util/nsMsgUtils.cpp. It strips "Re:" from given string blindly, it doesn't care if the given string is a MIME encoded-word. The proposed fix calls MIME decoder if the string looks like MIME encoded-word, strips "Re:", and encode the result back to MIME encoded-word. The last step might be unnecessary. Let me try without encoding back and see how it works.
Tried without applying MIME encode after striping "Re:", and it looks working fine. As a result, Subject line in mailbox DB will be stored in straight UTF-8 string instead of MIME encoded-word. This might cause compatibility issue if existing user upgrades to a binary with this patch. So far, sorting and other things are working fine, though.
I would rather store the header in the db as close to possible as it occurs in the actual message, and I'd rather not deal with compatibility issues, so I'd prefer that you leave it MIME encoded, if possible. Is there any reason to change that?
By removing "Re:", subjects in mailbox DB are already different from the original message. I believe this decision was made for performance. Since subject in mailbox DB doesn't have annoying "Re:" for sorting, sorting function is able to perform simple comparison against subjects in DB. For the same reason, you don't want to decode MIME encoded-word everytime user changes sorting order, or cool subject search. Probably, we should do the same for sender's address for the same reason, but that's a different story. Also, applying MIME encode is relatively expensive thing since it has to deal with folding. Unless there is a piece of code which doesn't like UTF-8 subject, this effort is just waste of time because it eventually gets decoded. As for compatibility issue, this is what pre 1.0 releases are for, I believe. I'm very confident that my change won't break any because I haven't touched any other code which deals with subject from mailbox DB. In theory, everything should work since existing code automatically decodes it if it's MIME encoded-word, does nothing if it's not MIME encoded-word (e.g. ASCII or UTF-8).
There are already millions of users of this code. And this statement confuses me: "I haven't touched any other code which deals with subject from mailbox DB" If you're changing the format of what we store in the db, I would be more reassured if you had touched some of the other code that deals with the subject from the mailbox db. If you're sure that all possible users of the subject field in the db don't care if it's MIME encoded or not, then I guess you could do this, but if any user does care, then we'd have to upgrade all the db's out there, and I don't want to do that.
We already have subject in two different format, ASCII and MIME encoded-word, and there is a piece of code which knows how to deal with them. You don't see MIME encoded-word in UI because the code is smart enough to recognize these two different formats and do whatever needs to be done. Through out Mozilla code base, UTF-8 is natural extension of ASCII. That's why I didn't have to touch any other code to deal with UTF-8 subject. The code has already been here, why don't we use it? If, only if, there are compatibility issues, they are in our code, not in DB. Thus, no DB upgrade needed for existing users. We must make sure that there is no problem in our code (not in DB) in 0.9.9 release, though. That's what I meant by saying "this is what pre 1.0 releases are for". (Assuming we don't recommend Netscape 6.x users to install Mozilla 0.9.9 on top of it :-)
I understand that the db is agnostic about the encoding - it will just faithfully report back whatever's put in it. If I understand correctly, all the current clients of the subject field expect that data to be mime2 encoded, or ascii, and you're saying that mime2 decoding a string that's already been mime2 decoded is a noop. That's what would have to be true for callers not to care. If that's true, then it should be OK - it still makes me nervous, however, because if there's a chance it's not true, all hell will break loose, if you'll pardon my french :-) a couple other comments - 1. you might as well remove the #if 0 code. 2. Linking in libmime to msgbase is a bit evil and not very com-friendly (JF, what do you think?). It will make msgbaseutl bigger just by including the link defs of libmime...
I'm not good at COM stuff. Could someone step in and get rid of libmime dependency?
Attachment #67476 - Attachment is obsolete: true
You can use nsIMimeConverter to avoid linking libmime. >mime2 decoding a string that's already been mime2 decoded is a noop. The decoder still checks if the string is valid UTF-8 but I assume the check is very fast.
reassign to myself
Assignee: bienvenu → taka
Status: REOPENED → NEW
merged the patch with the patch for 73403. see bug 73403.
Status: NEW → ASSIGNED
OS/Platform independent. Set TM to 0.9.9.
OS: Windows NT → All
Hardware: PC → All
Target Milestone: --- → mozilla0.9.9
Per email@example.com 's request, split a patch into two, one for bug 73403, one for bug 21267.
Attachment #67505 - Attachment is obsolete: true
remove dependency. to me, this is at least P2 bug since OE5.x encodes whole subject including "Re:". raise priority from P3 to P2.
No longer depends on: 73403
Priority: P3 → P2
There is charset override feature which is for the messages with incorrect MIME charset. If we store MIME decoded UTF-8 in db then the override would not be possible. That has to be taken care. Is that possilbe to MIME encode after removing "Re" (is that what the first patch id=67365 does)?
Is there any test case for that scenario?
IQA, could you attach the test case for incorrect MIME?
I think we are talking about scenario *with* MIME encoding, but wrong charset.
FYI. Regarding the test case posted as id=68174, charset override may not work as expected if user is using IMAP server due to the fact that latest IMAP server implementations create body structure upon message delivery. This cannot be done correctly if non ASCII header exists w/o MIME encoding.
Do you mean that the charset override for messages w/o MIME header will be broken if connected to the new version of Netscape Messenger server?
Doesn't have to be Netscape server. It depends on how IMAP server stores and represents body structure information.
The test case posted as id=68196, it's a bit unrealistic to me. I've never seen such MUA sending ISO-2022-JP with Shift_JIS charset name. Sending Q-encoded Shift_JIS with ISO-8859-1 charset name seems more realistic. Anyways, that's true that charset overriding no longer works once we switch to UTF-8 in mailbox db, thus we need to re-encode subject in NS_MsgStripeRE(). This implies that some performance decrease may occur when message thread pane gets updated.
Can you re-encode only if the decoded header contains "Re:"?
No prob. For those who lives in ideal world (i.e. everybody sends RFC-2047 compliant message), I will introduce new pref variable (most likely "mailnews.summary.utf8") so that we can completely avoid re-encoding subject. When it's set, sender name should also be decoded.
I'm getting the impression that doing the decoding and sticking the decoded string in the database is data loss, which is exactly what I wanted to avoid by leaving the string "as is" from the mail message. Just to make sure, we're not going to do that, right? And, the only additional performance hit we're taking is when parsing message headers that are mime-encoded? We're not taking any additional hit when displaying headers in the thread pane that we weren't taking before?
> Just to make sure, we're not going to do that, right? By default, yes. As option, there will be a new pref which prevents decoded subject from re-encoding. For thoese who uses IMAP from work, home, and on the road, this new pref will be a big help as far as there is virtually no message with busted subject.
nhotta is working on related bugs and we both agree to reassign this to nhotta.
Assignee: taka → nhotta
Status: ASSIGNED → NEW
we have the same problem in 4.x we should find out what OutlookExpress do. keep it as nsbeta1 (not + nor -) for now.
OE 5 recognizes the encoded 'Re:'. I tested using the attached data. How about Eudora? I don't have that.
In Comment #44 Frank Tang wrote: > we have the same problem in 4.x AFAIK, the correct statement is "we used to have", not "we have".
Eudora can also recognize the encoded 'Re:'. Tested with Eudora 4.3-J.
I think we only need to do this if the header is MIME encoded including "Re:". If "Re:" is not encoded then no need to do the decode/encode. I need to learn the code before doing the change.
You cannot decide not to decode MIME encoded portion by looking at "Re:" at the eginning of subject because the current Mozilla code blindly adds "Re:" when replying regardless of the rest of subject. That's causing multiple "Re:".
But NS_MsgStripRE doesn't seem to be adding "Re:".
We don't blindly add Re: to every reply - if I reply to a reply to a reply, it doesn't end up with Re: Re: Re:. I think what Takayuki is saying is that if the Re: encoded, we don't notice that. So, obviously (I think), if we want this to work correctly, we would store the encoded subject exactly as is but w/o the Re: and set the Re: flag on the header, like we do for headers where Re: is not encoded.
>and set the Re: flag on the header, like we do for headers where Re: is not >encoded. Does that done by the callers of NS_MsgStripRE? And they are replying on the result of NS_MsgStripRE?
Naoki, it's not "done" by anyone; it just works naturally. In the db, we store the subject w/o the "Re:" (I guess we stip off the re:with NS_MsgStripRe, if that's what you mean, so yes). Then, the reply code sticks a "Re:" in front of the subject to do the reply.
Thanks. One more question. Does the function strips off more than one "Re:" or just the first one? I don't see a reason for stripping off more than one.
Naoki, I can't claim to be an expert in that code, but it looks to me like it attempts to strip off all Re:'s and a few other things. The reason it does this, I assume, is that other mail and news clients might end up attaching multiple "Re:'s" so we strip off all we see. This code is run on all messages, not just messages sent by Netscape clients :-)
Thanks, then I need to look for "Re:" for encoded subjects even "Re:" is already found in the non encoded part (cannot do as my comment #49).
I applied the initial patch (which encodes again after strip off "Re:", id=67365). Then all of MIME encoded headers are displayed "Re:" prepended in thread pane. That is thread pane only, they are shown without "Re:" in evelop view.
The problem is that it sets the return result to true after encoding the string. The result is already set in the previous section, so no need to reset. Also, if the result is not true (i.e. no "Re:" stripped) then no need to encode the header.
David, could you 'r' (or 'sr')?
+ p1 += sizeof("=?")-1; is this really what you want? Don't you really just want p1++? or p1+= strlen("=?" )-1; ?
It needs to proceed by 2 bytes for "=?" after strstr search and point to "=?", then it has to look for another "?" to retrieve the charset name (e.g., =?ISO-8859-1?).
are you expecting sizeof("?=") to return 4, 3, or 2?
sizeof is not as elegant than strlen but way more efficient...
Comment on attachment 71964 [details] [diff] [review] Decode MIME encoded header before parsing for "Re:". r=bienvenu - you might want to add a comment to make it more readable.
Attachment #71964 - Flags: review+
Comment on attachment 72706 [details] [diff] [review] Added comments for the charset name parsing code. 1) instead of if (s - decodedString.get()) why not if (s != decodedString.get()) which seems clearer to me. 2) instead of sizeof("Subject: ")-1 why not sizeof("Subject:") 3) good job on protecting the buffer overrun of charset. curious, what happens if we pass in "" as the charset? what will EncodeMimePartIIStr_UTF8() do? (hopefully not crash) address those issues, and sr=sspitzer
Attachment #72706 - Flags: superreview+
Comment on attachment 73043 [details] [diff] [review] Address the super-reviewer's comment. When the charset is empty the encoder returns an error. carry 'r' 'sr'
Comment on attachment 73043 [details] [diff] [review] Address the super-reviewer's comment. When the charset is empty the encoder returns an error. a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #73043 - Flags: approval+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 19 years ago → 18 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.