Closed Bug 97314 Opened 23 years ago Closed 22 years ago

Use the charset sniffed by auto-detector from mail body to the non-MIME header

Categories

(MailNews Core :: Internationalization, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 90584
Future

People

(Reporter: eternal, Assigned: nhottanscp)

References

Details

(Keywords: intl)

Well, I have quite a lot of messages: some are in Latin, some in Cyrillic
KOI8-R, some in Cyrillic windows-1251. It seems that auto-detection of the charset
for the message itself works (I'm not absolutely sure, though). But for the message
list I'm able to select only one charset, so only messages whose subject is in that
charset have their subject displayed properly in the message list. I suppose the
charset should maybe be auto-detected for each message and used for display both in
the message list and when I view the message itself.
The message list (thread pane) display is implemented differently from the
message view pane. For mails without a MIME charset, using the auto-detector to
sniff out the charset would be risky: the subject is usually short, and the
auto-detector is not powerful enough in that case.
You can put mails in the same charset into one folder and specify the
appropriate folder charset via the menu View | Folder Character Coding...
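To illustrate why sniffing a short subject is risky, here is a toy Python sketch. It is NOT Mozilla's auto-detector: the `guess_charset` helper and its lowercase-ratio heuristic are hypothetical. The point it shows is that the same bytes decode cleanly as both windows-1251 and KOI8-R, so a detector has to rely on statistics, and a handful of characters gives it very little evidence.

```python
from typing import Optional, Sequence

# Hypothetical candidate list; Python codec names for windows-1251 and KOI8-R.
CANDIDATES = ("cp1251", "koi8_r")


def lowercase_ratio(text: str) -> float:
    """Fraction of alphabetic characters that are lowercase."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.islower() for ch in letters) / len(letters)


def guess_charset(raw: bytes, candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Toy heuristic: ordinary prose is mostly lowercase, so prefer the
    decoding with the highest lowercase ratio. Not a real detector."""
    best, best_score = None, -1.0
    for cs in candidates:
        try:
            text = raw.decode(cs)
        except UnicodeDecodeError:
            continue  # bytes not valid in this charset
        score = lowercase_ratio(text)
        if score > best_score:
            best, best_score = cs, score
    return best


ordinary = "Привет".encode("cp1251")
shouting = "ПРИВЕТ".encode("cp1251")
print(guess_charset(ordinary))    # cp1251 (correct)
print(guess_charset(shouting))    # koi8_r (wrong)
print(shouting.decode("koi8_r"))  # опхбер: decodes cleanly, but is garbage
```

The mixed-case word is guessed correctly, but the all-caps subject is assigned to KOI8-R because its misdecoding happens to look like lowercase text. A real detector uses richer statistics, yet it faces the same shortage of evidence on a short subject line.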

*** This bug has been marked as a duplicate of 77903 ***
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE
It doesn't seem to be a dup of bug 77903, since the reporter is referring to the
case where the mail body has no charset info (the auto-detector works on the mail
body). In this case, I think we can take the charset sniffed by the auto-detector
from the mail body and apply it to the message header. But I'm not sure how it
will affect performance. Kat, what do you think? Thanks.
But I think the reporter is referring to the case where the message has no
charset info in the body, so it cannot be sniffed. In that case we apply the global
default (folder charset).
Added shanjian to cc list.
In most cases, the auto-detector can sniff out the charset of a mail body with
no charset info when it's turned on.
There is a bug on Russian auto-detection (it's not working); let me find the
number.
This is bug 90581.
Yes, shanjian's fix for Russian auto-detector has not been checked in yet.
This bug is about whether we need to apply the charset sniffed out by the
auto-detector from the mail body to the message header in the thread pane. We
already have the folder charset feature, but this would help in case the user
wants to read multilingual mail in a single folder.
Actually, Shanjian's fix is for auto-detecting the body, not headers. This is the
case where our auto-detector cannot detect the charset because the message body is
short (not enough info for sniffing). That's why Frank's proposal is to obsolete
nsIStringCharsetDetector and use nsIStringCharsetDetector, since the data we
provide to it is too small. If I applied the folder's charset, it would work.
I didn't mean we should use the auto-detector for message headers in the thread
pane. I'm thinking that maybe we should pass the charset info to the header
display in the thread pane after the auto-detector finds the charset of the mail
body.
I guess this could be a duplicate of Bug 77903. It would depend on
how that bug is fixed. If the solution there is to apply the body charset
to non-MIME headers before falling back to the folder charset, no matter how
that body charset is obtained, then it would fix this bug as well.
I don't know how that fix will be implemented, so it might be best to
leave this bug open so that whoever works on Bug 77903 takes this
aspect of the problem into consideration. This bug should also
provide additional test cases to check in case Bug 77903 takes care
of this bug.
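The precedence proposed here for raw 8-bit (non-MIME) headers can be sketched in Python. This is a hypothetical sketch, not Mozilla code: `header_charset` and `decode_raw_header` are illustrative names, and "body charset" stands for whatever charset the message view learned for the body, whether declared or sniffed. When no body charset is known, as with IMAP messages whose bodies have not been fetched yet, it falls back to the folder charset.

```python
from typing import Optional


def header_charset(body_charset: Optional[str], folder_charset: str) -> str:
    """Hypothetical precedence for a raw 8-bit header with no RFC 2047 label:
    use the body's charset (declared or sniffed) if known, else the folder
    charset chosen under View | Folder Character Coding."""
    return body_charset or folder_charset


def decode_raw_header(raw: bytes, body_charset: Optional[str],
                      folder_charset: str) -> str:
    # errors="replace" keeps the thread pane usable if the charset is wrong.
    return raw.decode(header_charset(body_charset, folder_charset), errors="replace")


subject = "Привет".encode("koi8_r")
print(decode_raw_header(subject, "koi8_r", "cp1251"))  # Привет
print(decode_raw_header(subject, None, "cp1251"))      # рТЙЧЕФ (no body charset known)
```

The second call models the case where body data is not available when the thread pane is drawn: the folder charset wins, and a KOI8-R subject is shown as windows-1251 mojibake.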
Reopened the bug, modified summary.
Status: RESOLVED → UNCONFIRMED
Keywords: intl
Resolution: DUPLICATE → ---
Summary: As I have messages both in Koi8-R and Cp-1251 Cyrillic charsets, I'm not able to list them all simultaneously → Use the charset sniffed by auto-detector from mail body to the non-MIME header
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hardware: PC → All
Assigning to jbetak. Please have a look. Thanks.
Assignee: yokoyama → jbetak
Reassign to nhotta.

Currently, charset detection is not applied for message headers.
There are a couple of issues, performance and accuracy.
Usually, the number of characters in the headers is relatively small, so I
assume the accuracy of the detection would be low.
Assignee: jbetak → nhotta
nhotta, what we're talking about is the same as bug 77903, except that in bug
77903 the charset of the body is known, while in this bug no charset is specified
for either the headers or the body. So basically this is about doing the same
trick as for bug 77903, but auto-detecting the body charset before passing it on
to the header.
Getting the body charset for the header does not always work. For IMAP, no body
data is available when we display the message list in the thread pane.
I think the original report is requesting charset auto-detection for headers.
Status: NEW → ASSIGNED
OS: Linux → All
Priority: -- → P4
Set to Future.
Target Milestone: --- → Future
*** Bug 115631 has been marked as a duplicate of this bug. ***

*** This bug has been marked as a duplicate of 90584 ***
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → DUPLICATE
Product: MailNews → Core
Product: Core → MailNews Core