Closed
Bug 65277
Opened 24 years ago
Closed 24 years ago
thread pane charset conversion should use folder charset override for non MIME headers
Categories
(MailNews Core :: Internationalization, defect)
MailNews Core
Internationalization
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla0.9
People
(Reporter: nhottanscp, Assigned: jgmyers)
References
Details
(Keywords: intl)
Attachments
(14 files)
2.66 KB,
patch
|
Details | Diff | Splinter Review | |
5.54 KB,
text/plain
|
Details | |
15.61 KB,
patch
|
Details | Diff | Splinter Review | |
18.49 KB,
patch
|
Details | Diff | Splinter Review | |
19.04 KB,
patch
|
Details | Diff | Splinter Review | |
9.69 KB,
patch
|
Details | Diff | Splinter Review | |
13.57 KB,
patch
|
Details | Diff | Splinter Review | |
14.13 KB,
patch
|
Details | Diff | Splinter Review | |
14.89 KB,
patch
|
Details | Diff | Splinter Review | |
32.99 KB,
patch
|
Details | Diff | Splinter Review | |
10.20 KB,
patch
|
Details | Diff | Splinter Review | |
16.74 KB,
patch
|
Details | Diff | Splinter Review | |
39.51 KB,
patch
|
Details | Diff | Splinter Review | |
14.36 KB,
patch
|
Details | Diff | Splinter Review |
Thread pane charset conversion should use folder charset. Currently, the
followings are not supported.
* When message headers does not have MIME charset, a folder charset should be
used for the conversion.
* If the folder charset override is set then all the header should be converted
as a folder charset.
Reporter | ||
Updated•24 years ago
|
Keywords: intl
Summary: thread pane charset conversion should use folder charset → thread pane charset conversion should use folder charset for non MIME headers
Reporter | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Reporter | ||
Comment 2•24 years ago
|
||
>* When message headers does not have MIME charset, a folder charset should be
>used for the conversion.
This is actually working, only the override case has the problem.
Reporter | ||
Updated•24 years ago
|
Summary: thread pane charset conversion should use folder charset for non MIME headers → thread pane charset conversion should use folder charset override for non MIME headers
Reporter | ||
Updated•24 years ago
|
Target Milestone: --- → mozilla0.8
Reporter | ||
Comment 3•24 years ago
|
||
Reporter | ||
Comment 4•24 years ago
|
||
I attached a patch to support charset override for message headers.
The change was to ignore the charset specified in the header and use the folder
charset for the charset conversion. The added code path is only executed if the
user sets charset override either folder level or global preference.
David, could you review?
Comment 5•24 years ago
|
||
sr=bievnenu
Reporter | ||
Comment 6•24 years ago
|
||
I need one more reviewer. Scott, could you review it?
Comment 7•24 years ago
|
||
r=mscott
Reporter | ||
Comment 8•24 years ago
|
||
checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
i am reopning this bug: in 2001-02-08 mtrunk build the folder charset override
is working until you check the box " apply to all messages ...". Messages that
have MIME in headers turn out unreadable. Reloading doesn't help ( but
unchecking " apply to all messages in the folder does..). I've seen this in
russian newsgroup, messages that have koi8-r in theirs headers are garbled after
applying koi8-r to all messages in the folder
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 10•24 years ago
|
||
Reporter | ||
Comment 11•24 years ago
|
||
It has benn working about a week after I check in until yesterday (so this is
actually a regression).
The fix for bug 58114 changed a behavior of MIME decoder. It always returns
UTF-8 string. I think it's fine to return UTF-8 there, but I just did not know
about the recent change, I need to rewrite the code to adopt to the new behavior.
Status: REOPENED → ASSIGNED
Reporter | ||
Updated•24 years ago
|
Target Milestone: mozilla0.8 → mozilla0.9
Reporter | ||
Comment 12•24 years ago
|
||
Actually, this is caused by bug 65702.
Can't decode subject if it contains ASCII and Japanese
Depends on: 65702
Reporter | ||
Comment 13•24 years ago
|
||
Reassign to jgmyers@netscape.com.
Assignee: nhotta → jgmyers
Status: ASSIGNED → NEW
Reporter | ||
Comment 14•24 years ago
|
||
Here is a list for charset precedence, manual override has a highest precedence.
Let me know if you have questions.
default (pref) - use when no other charset is specified
default (folder) - folder level default
header - charset specified by the header, use when no override is applied
override (pref) - global override set by the pref
override (folder) - folder level override (by folder default charset)
override (manual) - set by UI (charset menu)
Assignee | ||
Comment 15•24 years ago
|
||
Would it be acceptable to treat UTF-8 as special? Labeled UTF-8 would have
precedence over any override. Possibly detected UTF-8 should have precedence
over any default charset.
Status: NEW → ASSIGNED
Reporter | ||
Comment 16•24 years ago
|
||
That would be okay for main body. How about a UTF-8 mail with (non UTF-8)
attachments? The user may want to apply manual override to see attachments
correctly when attachments are not charset labeled.
Assignee | ||
Comment 17•24 years ago
|
||
This would only fail if non-utf8 data were mislabeled as utf-8 or if
there were no override adn unlabeled data were syntactically legal utf-8. Note
that it is extremely unlikely for non-utf8 data to be syntactically legal utf-8.
Reporter | ||
Comment 18•24 years ago
|
||
Okay, that's fine, assuming the check for legal UTF-8 would not be time consuming.
BTW, why do you want to treat UTF-8 specially?
Assignee | ||
Comment 19•24 years ago
|
||
UTF-8 is the only possible way to ever get out of the existing charset soup.
Its use should not be precluded by the charset soup workarounds.
Assignee | ||
Updated•24 years ago
|
Whiteboard: blocked on ducarroz review of 63834
Assignee | ||
Comment 20•24 years ago
|
||
Current plan is to pass an input charset and an override flag down to the
encoded-word decoder, replacing the current output charset. A lot of redundant
or unused charset manipulation needs to be cleaned up in the process. I am
considering having the low-level decoder output a PRUnichar *, as several
callers immediately convert the UTF-8 output to UTF-16.
The charset converter recycling that used to go in some but not all callers
should probably be replaced with a recycler at or below the encoded-word
decoder.
Whiteboard: blocked on ducarroz review of 63834
Assignee | ||
Comment 21•24 years ago
|
||
Assignee | ||
Comment 22•24 years ago
|
||
ducarroz: please review
Reporter | ||
Comment 23•24 years ago
|
||
The new decoder MIME_DecodeMimeHeader, is not called.
Do you plan to provide another patch?
Assignee | ||
Comment 24•24 years ago
|
||
Yes. I will submit further patches to switch all callers of
MIME_DeocdeMimePartIIStr() to that interface.
Comment 25•24 years ago
|
||
nhotta, can you test this patch as it essentially about I18n. Thanks
Reporter | ||
Comment 26•24 years ago
|
||
The new decode function is not called yet, so would not affect i18n.
The other change is to replace UTF-8 to US-ASCII conversion by strdup(), since
US-ASCII is a subset of UTF-8, no conversion is needed.
Assignee | ||
Comment 27•24 years ago
|
||
Reviewers should consider whether they consider the new MIME_DecodeMimeHeader()
a good replacement for MIME_DecodeMimePartIIStr(). Is my proposed architecture
of passing a charset and a bool down to the encoded-word decoder acceptable? Am
I incorrectly neglecting support for charset detection?
intl_is_legal_utf8() should return a PRBool. I can fix that either in this
review cycle or in my next patch.
Reporter | ||
Comment 28•24 years ago
|
||
MIME decode and charset conversion were separated before. Override flag and
default charset are related to charset conversion but not needed for the
decoding. It also allowed optimization of charset conversion (e.g. cache unicode
decoder).
After the change of the function does both MIME decode and charset conversion,
the override flag and default charset should be passed to the function. So the
change is necessary when we keep MIME decode and charset conversion together.
>Is my proposed architecture
> of passing a charset and a bool down to the encoded-word decoder acceptable?
So this is necessary. The issue we could have discussed before was whether we
wanted to keep MIME decode and charset conversion separated or process them in
one function.
>Am I incorrectly neglecting support for charset detection?
Is this about charset auto detection? That's for body only, no auto detection is
applied to headers.
Assignee | ||
Comment 29•24 years ago
|
||
Optimization of charset conversion should be done below header decoding, not
above. It is better to implement any given function in one place rather than,
say, five.
Comment 30•24 years ago
|
||
looks good for me. R=ducarroz
Comment 31•24 years ago
|
||
waiting for nhotta to finish reviewing the patch before I sr.
Reporter | ||
Comment 32•24 years ago
|
||
* There are a couple of calls to intl_copy_uncoded_header() which replace strcpy().
Is that necessary? Before coming there, the caller checks "=?", so the header
contains MIME encoded words. I think we do not have to apply the default charset
to unencoded parts.
The case that the caller does not find "=?", I don't see charset conversion
using the default charset. If no conversion there, the default charset will only
be used for headers which contains "=?".
* I see mixture of lower and upper case of "UTF-8". Please use always "UTF-8".
* Type of length is different between intl_is_legal_utf8 and
intl_copy_uncoded_header, please unify.
* In the code below, why int is used instead of unsigned char? Same for
intl_is_legal_utf8()
+static void intl_copy_uncoded_header(char **output, const char *input,
+ int len, const char *default_charset)
+{
+ int c;
Assignee | ||
Comment 33•24 years ago
|
||
Just because a header has the 2-character sequence =? does not mean it has MIME
encoded words. That sequence can occur as an ASCII = followed by an ASCII ?.
If MIME_DecodeMimeHeader() does not find "=?" but the header has non-UTF8
unencoded data, then the call to intl_is_legal_utf8() from
MIME_DecodeMimeHeader() will return zero, causing MIME_DecodeMimeHeader() to
call intl_decode_mime_part2_str() in order to apply the default charset.
int is used instead of unsigned char because the shorter size is not necessary.
int is by definition the most natural size for the platform.
Assignee | ||
Comment 34•24 years ago
|
||
Assignee | ||
Comment 35•24 years ago
|
||
I also changed intl_is_legal_utf8 to return PRBool and included a 1-character
fix for bug 69251.
Reporter | ||
Comment 36•24 years ago
|
||
So intl_is_legal_utf8() may be called twice once from MIME_DecodeMimeHeader()
then intl_decode_mime_part2_str(). Is that true?
Assignee | ||
Comment 37•24 years ago
|
||
Correct. intl_is_legal_utf8() may be called twice. Once from
MIME_DecodeMimeHeader() and once from intl_copy_uncoded_header() inside of
intl_decode_mime_part2_str(). This will generally happen when there is
unencoded non-utf8 data in the header.
Reporter | ||
Comment 38•24 years ago
|
||
Okay, I don't think that would make significant performance degradation.
r=nhotta
Assignee | ||
Comment 39•24 years ago
|
||
ducarroz: please review updated patch.
Comment 40•24 years ago
|
||
looks good to me. R=ducarroz
Assignee | ||
Updated•24 years ago
|
Whiteboard: awaiting ducarroz review → awaiting sspitzer review
Comment 41•24 years ago
|
||
sr=sspitzer
Assignee | ||
Comment 42•24 years ago
|
||
Reporter | ||
Comment 43•24 years ago
|
||
r=nhotta
Please remove following line in nsStreamConverter.cpp.
+ // in future, the override flag to be per folder instead of a
global pref
Comment 44•24 years ago
|
||
R=ducarroz
Comment 45•24 years ago
|
||
sr=sspitzer
before you check in, can you remove that comment line?
Assignee | ||
Comment 46•24 years ago
|
||
Assignee | ||
Comment 47•24 years ago
|
||
The above patch blindly preserves the semantics of the existing code, except it
doesn't cache the converter it doesn't need to use.
Previously, MimeHeaders_convert_header_value() inexplicably handled a null
opt->rfc1522_conversion_fn by freeing its value and returning NULL. If this is
a "don't care" case (it never gets called in the draft-converter case), then the
above patch can be further simplified by removing opt->rfc1522_conversion_p,
always treating it as being PR_TRUE.
Assignee | ||
Comment 48•24 years ago
|
||
Reporter | ||
Comment 49•24 years ago
|
||
The patch replaces MimeHeaders_convert_rfc1522 by MIME_DecodeMimeHeader.
MimeHeaders_convert_rfc1522 does caching for charset converter. That should not
be replaced unless MIME_DecodeMimeHeader has equivalent caching mechanism.
Why don't you change mime_convert_rfc1522 to replace MIME_DecodeMimePartIIStr by
MIME_DecodeMimeHeader?
Assignee | ||
Comment 50•24 years ago
|
||
MimeHeaders_convert_rfc1522() caches charset decoders, but only uses those
cached decoders in the situation where there are no MIME encoded words. If
there are MIME encoded words, it just converts UTF-8 to UTF-8.
If I just change mime_convert_rfc1522() to call MIME_DecodeMimeHeader() instead
of MIME_DecodeMimePartIIStr(), then MimeHeaders_convert_rfc1522() would cache
the decoders but never use them. This would be because MIME_DecodeMimeHeader()
always returns data, including just-send-8 data, in UTF-8. The only effective
difference would be the cached but unused decoders would be available for use by
MimeInlineText_rotate_convert_and_parse_line().
The converter caching is being done at too high a level. It should instead be
done in MIME_ConvertCharset() or below. Correctness first, performance second.
Reporter | ||
Comment 51•24 years ago
|
||
>Correctness first, performance second.
I do not agree, as "correctness" for coding point of view no actual correctness
problem at user side.
Are you able to finish all your remaining changes plus performance enhancement
before mozilla0.9 closure?
Assignee | ||
Comment 52•24 years ago
|
||
Whether or not I am able to finish by 0.9 tree closure depends entirely on the
responsiveness of reviewers. If I get many more 3 day delays, such as for the
last patch, I may well not finish in time.
Comment 53•24 years ago
|
||
ok, looks good to me. have you tested it carrefully? if yes, R=ducarroz
Assignee | ||
Comment 54•24 years ago
|
||
I have tested it on an i18n message and verified the override charset is now
applied to the header display in the message pane.
Reporter | ||
Comment 55•24 years ago
|
||
My concern is performance. The patch removes the cache of charset converter. I
think MIME_DecodeMimeHeader() checkes the header string and only applies charset
conversion when the string is not 8 bit and not valid UTF-8. Is this correct?
If that's the case, it would not be so bad.
In any case, it's better to check performance with the change using Quantify for
example. I will try to review the patch today.
Reporter | ||
Comment 56•24 years ago
|
||
r=nhotta
- if ((!value) || (!*value))
+ if (!*value)
return *value;
Is that okay, not to check the pointer?
Assignee | ||
Comment 57•24 years ago
|
||
Either alternative will fault if the pointer is null.
Assignee | ||
Comment 58•24 years ago
|
||
Also, the function is static and it is easy to audit that it is always called
with a non-null pointer.
Comment 59•24 years ago
|
||
sr=sspitzer
Assignee | ||
Comment 60•24 years ago
|
||
Assignee | ||
Comment 61•24 years ago
|
||
In my latest patch, I also change nsDBFolderInfo::GetCharPtrCharacterSet() to
use the default character set like nsDBFolderInfo::GetCharacterSet() does.
Reporter | ||
Comment 62•24 years ago
|
||
nsMimeConverter, there were separate functions, one just dif MIME decode and
other to dif MIME decode plus converted to unicode. Now both functions return
unicode string, the difference is either returns char* or PRUnichar*.
I think we just need one, the caller can convert UCS2<->UTF-8 easily.
Assignee | ||
Comment 63•24 years ago
|
||
Another difference between the functions (which callers depend on) is that if no
conversion is needed the one returning char* will return nsnull whereas the one
returning PRUnichar* will copy the input.
I think there are enough callers wanting a PRUnichar* to warrant a second
function.
Reporter | ||
Comment 64•24 years ago
|
||
okay, please change the comments to indicate the new decoder behavior
r=nhotta
Can I expect the latest patch will fix this bug?
Assignee | ||
Comment 65•24 years ago
|
||
There are about two more patches coming. One for nsMimeHtmlEmitter.cpp,
nsMsgHeaderParser.cpp, and nsMsgI18N.cpp, another for the callers of
nsMsgI18NDecodeMimePartIIStr.
This patch takes care of the folder charset and folder override charset. I
don't know if my work will address the override caused by the view menu.
The call in nsMimeBaseEmitter.cpp doesn't have any charset information as there
is no apparent context to get that information from. I can't even figure out
when this code is called.
Reporter | ||
Comment 66•24 years ago
|
||
nsMimeBaseEmitter.cpp, override is currently not working, see bug 57086.
For menu override, the only change I have seen so far was you changed override
field from char* to bool.
Assignee | ||
Comment 67•24 years ago
|
||
The patch I just landed, id 26422, passes the override_charset down from
MimeHeaders_convert_header_value(). This is sufficient for menu overrides to
now affect (at least) subjects in the header section of the message pane.
Comment 68•24 years ago
|
||
*** Bug 71548 has been marked as a duplicate of this bug. ***
Comment 69•24 years ago
|
||
this code seems ok to me, but I'd like the module owners bienvenu (for
mailnews/db) and ducarroz (for mailnews/mime) to review it.
Comment 70•24 years ago
|
||
R=ducarroz for mime part.
Comment 71•24 years ago
|
||
John, can you wait until the performance branch lands to check this in? (at
least the db part). On the performance branch, I've added an convenience
function to get and cache the mime converter so your code should use that. Other
than that, your changes look OK.
Actually, you don't have much choice, because the performance branch is going to
land before the tree opens for .9 development, so you'll need to merge with my
changes. Sorry for the potential conflicts.
Assignee | ||
Comment 72•24 years ago
|
||
Comment 73•24 years ago
|
||
sr=bienvenu
Assignee | ||
Comment 74•24 years ago
|
||
Comment 75•24 years ago
|
||
you can use an nsXPIDLString for PRUnichar *decodedString;
in ReplyOnTop and remove the PR_Free call, I believe. Still looking at the rest.
Comment 76•24 years ago
|
||
question for nhotta: is it ok to use kCMimeConverterCID
as a service instead of using CreateInstance?
Assignee | ||
Comment 77•24 years ago
|
||
Reporter | ||
Comment 78•24 years ago
|
||
It caches unicode convertor, so use createinstance or change the class
implementation (this might have been done since the new decoder always returns
UTF-8 and no converter is needed).
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/nsMimeConverter.h#108
Assignee | ||
Comment 79•24 years ago
|
||
The nsIMimeConverter objects retrieved by do_GetService() are only used to
invoke the DecodeMimeHeader() methods. This doesn't touch any of the members.
Reporter | ||
Comment 80•24 years ago
|
||
Reviewing nsMsgCompose.cpp,
* @@ -986,57 +986,66 @@,
why is the extra conversion needed only for "auther"? That extra conversion is
also in the original code. I think you can just use "decodedString" like other
fields.
* @@ -1235,49 +1237,34 @@,
The new code ignores the return value (error code), is that okay?
Assignee | ||
Comment 81•24 years ago
|
||
Comment 82•24 years ago
|
||
R=ducarroz for new patch.
Reporter | ||
Comment 83•24 years ago
|
||
In nsMsgSearchTerm.h, a protected function MatchString is added. Is that
implemented? I don't see that in the diff.
Other question is what does it mean to pass null for charset when calling
MatchString?
Assignee | ||
Comment 84•24 years ago
|
||
MatchString was previously exported through the interface. This patch makes it
private. A null charset means the input is already in UTF-8.
Reporter | ||
Comment 85•24 years ago
|
||
r=nhotta
For charset override in search, bug 68706 is assigned to taka (cc to him).
Assignee | ||
Comment 86•24 years ago
|
||
I added mention of bug 68706 to the XXX comment in
nsMsgSearchOfflineMail::MatchTerms(). I will reattach the combined patch for
the convenience of further reviewers.
Assignee | ||
Comment 87•24 years ago
|
||
Comment 88•24 years ago
|
||
sr=bienvenu
Assignee | ||
Comment 89•24 years ago
|
||
Reporter | ||
Comment 90•24 years ago
|
||
r=nhotta
Comment 91•24 years ago
|
||
sr=bienvenu
Comment 92•24 years ago
|
||
R=ducarroz
Assignee | ||
Comment 93•24 years ago
|
||
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core
You need to log in
before you can comment on or make changes to this bug.
Description
•