Closed Bug 21267 Opened 25 years ago Closed 23 years ago

When "Re:" is encoded, the sort order is not correct

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

mozilla1.0

People

(Reporter: momoi, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Attachments

(4 files, 6 obsolete files)

Japanese sort data containing 2 "Re:" headers 25 years ago Katsuhiko Momoi 21.67 KB, application/x-zip-compressed		Details
proposed fix 23 years ago Takayuki Tei 3.58 KB, patch		Details \| Diff \| Splinter Review
revised proposed patch without MIME encoding 23 years ago Takayuki Tei 4.19 KB, patch		Details \| Diff \| Splinter Review
revised. no more #if 0 23 years ago Takayuki Tei 6.10 KB, patch		Details \| Diff \| Splinter Review
proposed patch 23 years ago Takayuki Tei 7.03 KB, patch		Details \| Diff \| Splinter Review
A zip file of a mailbox containing 3 Ja messages w/o MIME. 23 years ago ji 17.46 KB, application/octet-stream		Details
A zip file of a mailbox containing two iso-2022-jp mails with charset label set to Shift_JIS 23 years ago ji 603 bytes, application/octet-stream		Details
Decode MIME encoded header before parsing for "Re:". 23 years ago nhottanscp 2.62 KB, patch	Bienvenu : review+	Details \| Diff \| Splinter Review
Added comments for the charset name parsing code. 23 years ago nhottanscp 2.67 KB, patch	sspitzer : superreview+	Details \| Diff \| Splinter Review
Address the super-reviewer's comment. When the charset is empty the encoder returns an error. 23 years ago nhottanscp 2.66 KB, patch	nhottanscp : review+ nhottanscp : superreview+ asa : approval+	Details \| Diff \| Splinter Review

Katsuhiko Momoi

Reporter

Description

•

25 years ago

** Observed with 12/8/99 Win32 build ** This bug was originally filed for 4.x recently but it also applies to Mozilla. There is also an issue as to whether or not this is the receiver problem as opposed to the sender's, but it is worth considering. The problem in nutshell is this: "If you have an encoded header which includes "Re: .." and "Re: .." itself is encoded as sometimes may happen in Japanese B64-encoded headers, then when we sort this, we put it along with the other alphabetic headers rather than with the headers which contain the nearest Japanese string match, i.e. the nearest match to the Japanese portion after "Re:". To see this problem, use the attached zip file which contain the Japanese sort data. I added two messages for this prupose, 2 headers when decoded are identical. Subject: =?iso-2022-jp?B?UmU6IBskQiRKJHMkQCQrJEokISFKNmw+UCFLGyhC?= Subject: Re: =?iso-2022-jp?B?GyRCJEokcyRAJCskSiQhIUo2bD5QIUsbKEI=?= The difference is that in the first one, the whole string is MIME-encoded and this causes the sort order to be less than optimal. Apparently, we deal with "Re:" only in one place -- before MIME-decoding. In the test file, you see 2 headers which begins with "Re:". The Japanese strings which follows "Re:" are identical and therefore these 2 should be side by side but they are not. The 2nd one is where both of them should be. The 4.x bug number is: 379726.

Katsuhiko Momoi

Reporter

Comment 1

•

25 years ago

Attached file Japanese sort data containing 2 "Re:" headers — Details

Katsuhiko Momoi

Reporter

Comment 2

•

25 years ago

Please save the attached file as: jpnsort.zip

rhp (gone)

Updated

•

25 years ago

Assignee: rhp → putterman

rhp (gone)

Comment 3

•

25 years ago

I believe you are talking about the headers being displayed in the thread pane, which is Scott's area. Let me know if I'm missing something here. - rhp

scottputterman

Comment 4

•

25 years ago

cc'ing bienvenu. I attach "Re:" if the reply flag is set. I don't think that happens just because "Re:" is in the subject. I might be wrong about that though.

David :Bienvenu

Updated

•

25 years ago

Assignee: putterman → bienvenu

David :Bienvenu

Comment 5

•

25 years ago

If this bug is saying we should parse for Re: after mime decoding, I guess that would be mine. No promises, though, since I don't know how hard it would be to do.

Katsuhiko Momoi

Reporter

Comment 6

•

25 years ago

To clarify, this is a display bug in the thread pane. The end results we want is that 2 msgs with identical strings in JPN (when decoded) should be displayed side by side rather than apart as we do now.

David :Bienvenu

Updated

•

25 years ago

Status: NEW → ASSIGNED

Target Milestone: M20

lchiang

Updated

•

25 years ago

QA Contact: lchiang → momoi

Katsuhiko Momoi

Reporter

Updated

•

24 years ago

Keywords: intl

Katsuhiko Momoi

Reporter

Updated

•

24 years ago

Severity: major → normal

Katsuhiko Momoi

Reporter

Comment 7

•

24 years ago

QA contact to marina.

QA Contact: momoi → marina

Comment 8

•

24 years ago

Retested with 05/31 build. The sorting order is correct now, the reply msg shows up following the original msg.

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Katsuhiko Momoi

Reporter

Comment 9

•

24 years ago

** 6/1/2001 Win32 Trunk & Branch builds ** The problem is not fixed with the above builds and as far as I know, no work has been done in this area to fix the problem. Please re-test by clicking on the "subject" column if the test mail folder. Re-opening ...

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Takayuki Tei

Comment 10

•

23 years ago

Attached patch proposed fix (obsolete) — Details — Splinter Review

This patch also fixes bug 79848.

Takayuki Tei

Comment 11

•

23 years ago

add dependency

Depends on: 73403

David :Bienvenu

Comment 12

•

23 years ago

someone like Naoki should r this and I can sr it. thx.

nhottanscp

Assignee

Comment 13

•

23 years ago

Taka, could you put a description of the change?

Takayuki Tei

Comment 14

•

23 years ago

The cause of both bug 21267 and bug 79848 is in NS_MsgStripRE() in base/util/nsMsgUtils.cpp. It strips "Re:" from given string blindly, it doesn't care if the given string is a MIME encoded-word. The proposed fix calls MIME decoder if the string looks like MIME encoded-word, strips "Re:", and encode the result back to MIME encoded-word. The last step might be unnecessary. Let me try without encoding back and see how it works.

Takayuki Tei

Comment 15

•

23 years ago

Tried without applying MIME encode after striping "Re:", and it looks working fine. As a result, Subject line in mailbox DB will be stored in straight UTF-8 string instead of MIME encoded-word. This might cause compatibility issue if existing user upgrades to a binary with this patch. So far, sorting and other things are working fine, though.

Takayuki Tei

Comment 16

•

23 years ago

Attached patch revised proposed patch without MIME encoding (obsolete) — Details — Splinter Review

Takayuki Tei

Updated

•

23 years ago

Attachment #67365 - Attachment is obsolete: true

David :Bienvenu

Comment 17

•

23 years ago

I would rather store the header in the db as close to possible as it occurs in the actual message, and I'd rather not deal with compatibility issues, so I'd prefer that you leave it MIME encoded, if possible. Is there any reason to change that?

Takayuki Tei

Comment 18

•

23 years ago

By removing "Re:", subjects in mailbox DB are already different from the original message. I believe this decision was made for performance. Since subject in mailbox DB doesn't have annoying "Re:" for sorting, sorting function is able to perform simple comparison against subjects in DB. For the same reason, you don't want to decode MIME encoded-word everytime user changes sorting order, or cool subject search. Probably, we should do the same for sender's address for the same reason, but that's a different story. Also, applying MIME encode is relatively expensive thing since it has to deal with folding. Unless there is a piece of code which doesn't like UTF-8 subject, this effort is just waste of time because it eventually gets decoded. As for compatibility issue, this is what pre 1.0 releases are for, I believe. I'm very confident that my change won't break any because I haven't touched any other code which deals with subject from mailbox DB. In theory, everything should work since existing code automatically decodes it if it's MIME encoded-word, does nothing if it's not MIME encoded-word (e.g. ASCII or UTF-8).

David :Bienvenu

Comment 19

•

23 years ago

There are already millions of users of this code. And this statement confuses me: "I haven't touched any other code which deals with subject from mailbox DB" If you're changing the format of what we store in the db, I would be more reassured if you had touched some of the other code that deals with the subject from the mailbox db. If you're sure that all possible users of the subject field in the db don't care if it's MIME encoded or not, then I guess you could do this, but if any user does care, then we'd have to upgrade all the db's out there, and I don't want to do that.

Takayuki Tei

Comment 20

•

23 years ago

We already have subject in two different format, ASCII and MIME encoded-word, and there is a piece of code which knows how to deal with them. You don't see MIME encoded-word in UI because the code is smart enough to recognize these two different formats and do whatever needs to be done. Through out Mozilla code base, UTF-8 is natural extension of ASCII. That's why I didn't have to touch any other code to deal with UTF-8 subject. The code has already been here, why don't we use it? If, only if, there are compatibility issues, they are in our code, not in DB. Thus, no DB upgrade needed for existing users. We must make sure that there is no problem in our code (not in DB) in 0.9.9 release, though. That's what I meant by saying "this is what pre 1.0 releases are for". (Assuming we don't recommend Netscape 6.x users to install Mozilla 0.9.9 on top of it :-)

David :Bienvenu

Comment 21

•

23 years ago

I understand that the db is agnostic about the encoding - it will just faithfully report back whatever's put in it. If I understand correctly, all the current clients of the subject field expect that data to be mime2 encoded, or ascii, and you're saying that mime2 decoding a string that's already been mime2 decoded is a noop. That's what would have to be true for callers not to care. If that's true, then it should be OK - it still makes me nervous, however, because if there's a chance it's not true, all hell will break loose, if you'll pardon my french :-) a couple other comments - 1. you might as well remove the #if 0 code. 2. Linking in libmime to msgbase is a bit evil and not very com-friendly (JF, what do you think?). It will make msgbaseutl bigger just by including the link defs of libmime...

Takayuki Tei

Comment 22

•

23 years ago

Attached patch revised. no more #if 0 (obsolete) — Details — Splinter Review

I'm not good at COM stuff. Could someone step in and get rid of libmime dependency?

Attachment #67476 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 23

•

23 years ago

You can use nsIMimeConverter to avoid linking libmime. >mime2 decoding a string that's already been mime2 decoded is a noop. The decoder still checks if the string is valid UTF-8 but I assume the check is very fast.

Takayuki Tei

Comment 24

•

23 years ago

reassign to myself

Assignee: bienvenu → taka

Status: REOPENED → NEW

Takayuki Tei

Comment 25

•

23 years ago

merged the patch with the patch for 73403. see bug 73403.

Status: NEW → ASSIGNED

Takayuki Tei

Comment 26

•

23 years ago

OS/Platform independent. Set TM to 0.9.9.

OS: Windows NT → All

Hardware: PC → All

Target Milestone: --- → mozilla0.9.9

Takayuki Tei

Updated

•

23 years ago

Blocks: 79848

Takayuki Tei

Comment 27

•

23 years ago

Attached patch proposed patch (obsolete) — Details — Splinter Review

Per nhotta@netscape.com 's request, split a patch into two, one for bug 73403, one for bug 21267.

Attachment #67505 - Attachment is obsolete: true

Takayuki Tei

Comment 28

•

23 years ago

remove dependency. to me, this is at least P2 bug since OE5.x encodes whole subject including "Re:". raise priority from P3 to P2.

No longer depends on: 73403

Priority: P3 → P2

nhottanscp

Assignee

Comment 29

•

23 years ago

There is charset override feature which is for the messages with incorrect MIME charset. If we store MIME decoded UTF-8 in db then the override would not be possible. That has to be taken care. Is that possilbe to MIME encode after removing "Re" (is that what the first patch id=67365 does)?

Takayuki Tei

Comment 30

•

23 years ago

Is there any test case for that scenario?

nhottanscp

Assignee

Comment 31

•

23 years ago

IQA, could you attach the test case for incorrect MIME?

Comment 32

•

23 years ago

Attached file A zip file of a mailbox containing 3 Ja messages w/o MIME. — Details

Takayuki Tei

Comment 33

•

23 years ago

I think we are talking about scenario *with* MIME encoding, but wrong charset.

Comment 34

•

23 years ago

Attached file A zip file of a mailbox containing two iso-2022-jp mails with charset label set to Shift_JIS — Details

Takayuki Tei

Comment 35

•

23 years ago

FYI. Regarding the test case posted as id=68174, charset override may not work as expected if user is using IMAP server due to the fact that latest IMAP server implementations create body structure upon message delivery. This cannot be done correctly if non ASCII header exists w/o MIME encoding.

Comment 36

•

23 years ago

Do you mean that the charset override for messages w/o MIME header will be broken if connected to the new version of Netscape Messenger server?

Takayuki Tei

Comment 37

•

23 years ago

Doesn't have to be Netscape server. It depends on how IMAP server stores and represents body structure information.

Takayuki Tei

Comment 38

•

23 years ago

The test case posted as id=68196, it's a bit unrealistic to me. I've never seen such MUA sending ISO-2022-JP with Shift_JIS charset name. Sending Q-encoded Shift_JIS with ISO-8859-1 charset name seems more realistic. Anyways, that's true that charset overriding no longer works once we switch to UTF-8 in mailbox db, thus we need to re-encode subject in NS_MsgStripeRE(). This implies that some performance decrease may occur when message thread pane gets updated.

nhottanscp

Assignee

Comment 39

•

23 years ago

Can you re-encode only if the decoded header contains "Re:"?

Takayuki Tei

Comment 40

•

23 years ago

No prob. For those who lives in ideal world (i.e. everybody sends RFC-2047 compliant message), I will introduce new pref variable (most likely "mailnews.summary.utf8") so that we can completely avoid re-encoding subject. When it's set, sender name should also be decoded.

David :Bienvenu

Comment 41

•

23 years ago

I'm getting the impression that doing the decoding and sticking the decoded string in the database is data loss, which is exactly what I wanted to avoid by leaving the string "as is" from the mail message. Just to make sure, we're not going to do that, right? And, the only additional performance hit we're taking is when parsing message headers that are mime-encoded? We're not taking any additional hit when displaying headers in the thread pane that we weren't taking before?

Takayuki Tei

Comment 42

•

23 years ago

> Just to make sure, we're not going to do that, right? By default, yes. As option, there will be a new pref which prevents decoded subject from re-encoding. For thoese who uses IMAP from work, home, and on the road, this new pref will be a big help as far as there is virtually no message with busted subject.

Takayuki Tei

Comment 43

•

23 years ago

nhotta is working on related bugs and we both agree to reassign this to nhotta.

Assignee: taka → nhotta

Status: ASSIGNED → NEW

nhottanscp

Assignee

Updated

•

23 years ago

Status: NEW → ASSIGNED

Keywords: nsbeta1

Frank Tang

Comment 44

•

23 years ago

we have the same problem in 4.x we should find out what OutlookExpress do. keep it as nsbeta1 (not + nor -) for now.

nhottanscp

Assignee

Comment 45

•

23 years ago

OE 5 recognizes the encoded 'Re:'. I tested using the attached data. How about Eudora? I don't have that.

Takayuki Tei

Comment 46

•

23 years ago

In Comment #44 Frank Tang wrote: > we have the same problem in 4.x AFAIK, the correct statement is "we used to have", not "we have".

Comment 47

•

23 years ago

Eudora can also recognize the encoded 'Re:'. Tested with Eudora 4.3-J.

nhottanscp

Assignee

Updated

•

23 years ago

Target Milestone: mozilla0.9.9 → mozilla1.0

Frank Tang

Comment 48

•

23 years ago

nsbeta1+ per triage meeting

Keywords: nsbeta1 → nsbeta1+

nhottanscp

Assignee

Comment 49

•

23 years ago

I think we only need to do this if the header is MIME encoded including "Re:". If "Re:" is not encoded then no need to do the decode/encode. I need to learn the code before doing the change.

Takayuki Tei

Comment 50

•

23 years ago

You cannot decide not to decode MIME encoded portion by looking at "Re:" at the eginning of subject because the current Mozilla code blindly adds "Re:" when replying regardless of the rest of subject. That's causing multiple "Re:".

nhottanscp

Assignee

Comment 51

•

23 years ago

But NS_MsgStripRE doesn't seem to be adding "Re:".

David :Bienvenu

Comment 52

•

23 years ago

We don't blindly add Re: to every reply - if I reply to a reply to a reply, it doesn't end up with Re: Re: Re:. I think what Takayuki is saying is that if the Re: encoded, we don't notice that. So, obviously (I think), if we want this to work correctly, we would store the encoded subject exactly as is but w/o the Re: and set the Re: flag on the header, like we do for headers where Re: is not encoded.

nhottanscp

Assignee

Comment 53

•

23 years ago

>and set the Re: flag on the header, like we do for headers where Re: is not >encoded. Does that done by the callers of NS_MsgStripRE? And they are replying on the result of NS_MsgStripRE?

David :Bienvenu

Comment 54

•

23 years ago

Naoki, it's not "done" by anyone; it just works naturally. In the db, we store the subject w/o the "Re:" (I guess we stip off the re:with NS_MsgStripRe, if that's what you mean, so yes). Then, the reply code sticks a "Re:" in front of the subject to do the reply.

nhottanscp

Assignee

Comment 55

•

23 years ago

Thanks. One more question. Does the function strips off more than one "Re:" or just the first one? I don't see a reason for stripping off more than one.

David :Bienvenu

Comment 56

•

23 years ago

Naoki, I can't claim to be an expert in that code, but it looks to me like it attempts to strip off all Re:'s and a few other things. The reason it does this, I assume, is that other mail and news clients might end up attaching multiple "Re:'s" so we strip off all we see. This code is run on all messages, not just messages sent by Netscape clients :-)

nhottanscp

Assignee

Comment 57

•

23 years ago

Thanks, then I need to look for "Re:" for encoded subjects even "Re:" is already found in the non encoded part (cannot do as my comment #49).

nhottanscp

Assignee

Comment 58

•

23 years ago

I applied the initial patch (which encodes again after strip off "Re:", id=67365). Then all of MIME encoded headers are displayed "Re:" prepended in thread pane. That is thread pane only, they are shown without "Re:" in evelop view.

nhottanscp

Assignee

Comment 59

•

23 years ago

The problem is that it sets the return result to true after encoding the string. The result is already set in the previous section, so no need to reset. Also, if the result is not true (i.e. no "Re:" stripped) then no need to encode the header.

nhottanscp

Assignee

Comment 60

•

23 years ago

Attached patch Decode MIME encoded header before parsing for "Re:". (obsolete) — Details — Splinter Review

Attachment #68141 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 61

•

23 years ago

David, could you 'r' (or 'sr')?

David :Bienvenu

Comment 62

•

23 years ago

+ p1 += sizeof("=?")-1; is this really what you want? Don't you really just want p1++? or p1+= strlen("=?" )-1; ?

nhottanscp

Assignee

Comment 63

•

23 years ago

It needs to proceed by 2 bytes for "=?" after strstr search and point to "=?", then it has to look for another "?" to retrieve the charset name (e.g., =?ISO-8859-1?).

David :Bienvenu

Comment 64

•

23 years ago

are you expecting sizeof("?=") to return 4, 3, or 2?

nhottanscp

Assignee

Comment 65

•

23 years ago

Jean-Francois Ducarroz

Comment 66

•

23 years ago

sizeof is not as elegant than strlen but way more efficient...

David :Bienvenu

Comment 67

•

23 years ago

Comment on attachment 71964 [details] [diff] [review] Decode MIME encoded header before parsing for "Re:". r=bienvenu - you might want to add a comment to make it more readable.

Attachment #71964 - Flags: review+

nhottanscp

Assignee

Comment 68

•

23 years ago

Attached patch Added comments for the charset name parsing code. (obsolete) — Details — Splinter Review

Attachment #71964 - Attachment is obsolete: true

(not reading, please use seth@sspitzer.org instead)

Comment 69

•

23 years ago

Comment on attachment 72706 [details] [diff] [review] Added comments for the charset name parsing code. 1) instead of if (s - decodedString.get()) why not if (s != decodedString.get()) which seems clearer to me. 2) instead of sizeof("Subject: ")-1 why not sizeof("Subject:") 3) good job on protecting the buffer overrun of charset[]. curious, what happens if we pass in "" as the charset? what will EncodeMimePartIIStr_UTF8() do? (hopefully not crash) address those issues, and sr=sspitzer

Attachment #72706 - Flags: superreview+

nhottanscp

Assignee

Comment 70

•

23 years ago

Attached patch Address the super-reviewer's comment. When the charset is empty the encoder returns an error. — Details — Splinter Review

Attachment #72706 - Attachment is obsolete: true

nhottanscp

Assignee

Comment 71

•

23 years ago

Comment on attachment 73043 [details] [diff] [review] Address the super-reviewer's comment. When the charset is empty the encoder returns an error. carry 'r' 'sr'

Attachment #73043 - Flags: superreview+

Attachment #73043 - Flags: review+

Asa Dotzler [:asa]

Comment 72

•

23 years ago

Comment on attachment 73043 [details] [diff] [review] Address the super-reviewer's comment. When the charset is empty the encoder returns an error. a=asa (on behalf of drivers) for checkin to the 1.0 trunk

Attachment #73043 - Flags: approval+

nhottanscp

Assignee