Closed
Bug 90581
Opened 24 years ago
Closed 23 years ago
universal charset detector does not work in mail/news
Categories
(MailNews Core :: Internationalization, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
mozilla0.9.4
People
(Reporter: ezh, Assigned: shanjian)
References
()
Details
(Keywords: intl, Whiteboard: need a=)
Attachments
(9 files, 1 obsolete file)
567 bytes, text/html
498 bytes, text/plain
498 bytes, text/html
1.05 KB, text/plain
1.23 KB, text/plain
1.24 KB, patch
2.31 KB, patch (tetsuroy: review+, waterson: superreview+)
57.51 KB, text/plain
1.66 KB, text/plain
1. Set character coding to Autodetect->Russian.
2. Open this news group.
3. Go through the messages in the list.
Some are displayed as
????? ? ??? -?? ?????????. ? ???? ????????? ?????? ??????? ? MSIE ???????? ?
http:// ?????????. ??? ??? ?????????? ???????? ?????????????? ?????????. ???
Now manually change the encoding to KOI8-R, and the text looks as it should.
Comment 1•24 years ago
|
||
Could you check if this is a generic Cyrillic auto detection problem?
Could you try this?
* Copy the KOI8-R text to HTML composer and save as KOI8-R
* Remove META charset tag.
* Open the HTML file in browser with Cyrillic auto detection ON.
Keywords: intl
* Copy the KOI8-R text to HTML composer and save as KOI8-R
* Remove META charset tag.
* Open the HTML file in browser with Cyrillic auto detection ON.
After all those steps, with Russian auto detection ON, the encoding points to cyrillic-koi8-r.
Reporter | ||
Comment 5•24 years ago
|
||
First, I want to say I got the same result for the test as Marina.
I tried the 20010711 build under Win98 and now see the same problem with 2001071308
under Linux (since my Windows computer died, I am using Linux RH 6.2 for the first time, on
my emergency P133 :) ).
Reporter | ||
Comment 6•24 years ago
|
||
Hmmm, maybe this helps.
While scrolling up and down the messages (I must say Mozilla is very slow on a P133,
especially mail/news), a window popped up saying "Unknown Error
804b0001".
Maybe it was some server error, but after this the message loaded (still with
this bug, though).
Comment 7•24 years ago
|
||
Netscape PR1 has auto detect for "All" which I think supports cyrillic detection
too, cc to shanjian.
Eugene, could you try Netscape 6.1 PR1 and use the "All" detector and see if you
can still reproduce the problem?
Status: NEW → ASSIGNED
Reporter | ||
Comment 8•24 years ago
|
||
Can Marina do this, please? I have 6.1 PR1 installed, but it starts every time
with an error... :(
Sorry, but I do not have time now for installing 6.1 PR1. :(
I looked into Nscp6.0 RTM (2001-11-08 build), and with autodetect "All", Russian
sites are detected with no problem.
Assignee | ||
Comment 10•24 years ago
|
||
When any of the charset detectors is used on mail/news, the results may not be
satisfactory. I believe we are still feeding the charset detector line by line and
asking for a result line by line. Some big changes need to happen before we can fix
this kind of problem. There is a bug filed for this problem. It is
assigned to somebody outside the I18n group.
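To illustrate why asking for a result line by line can give unsatisfactory answers, here is a minimal sketch. It is not Mozilla's detector: `guess_cyrillic` is a hypothetical toy heuristic that relies only on the fact that lowercase Cyrillic letters sit at 0xC0-0xDF in KOI8-R and at 0xE0-0xFF in windows-1251. The sample message is made up for the example.

```python
def guess_cyrillic(data: bytes) -> str:
    """Toy heuristic (hypothetical, not the real detector).

    Ordinary text is mostly lowercase, so the byte range that dominates
    hints at which single-byte Cyrillic encoding was used.
    """
    low = sum(1 for b in data if 0xC0 <= b <= 0xDF)   # KOI8-R lowercase
    high = sum(1 for b in data if 0xE0 <= b <= 0xFF)  # windows-1251 lowercase
    if low + high == 0:
        return ""  # no evidence at all, e.g. a pure ASCII line
    return "koi8-r" if low > high else "windows-1251"


# Made-up message: two ASCII-only lines around one Russian line.
message = "Subject: test\nПривет, это проверка автоопределения.\nhttp://example.org\n"
raw = message.encode("koi8_r")

# Line-by-line: ASCII-only lines carry no evidence and yield an empty result.
per_line = [guess_cyrillic(line) for line in raw.splitlines()]

# Whole message: the evidence from the Russian line decides for all of it.
whole = guess_cyrillic(raw)

print(per_line)  # ['', 'koi8-r', '']
print(whole)     # koi8-r
```

With per-line results, a caller has to decide what to do with the empty answers; feeding the whole message avoids that question.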
Comment 11•24 years ago
|
||
Compared to 6 RTM, this is a regression. The same newsgroup in 6.0 has no problem
detecting the Russian encoding of news articles when Autodetect "All" is on. I don't
have to reload a message and manually correct the encoding to KOI8-R, as I have
to do with 6.1 (2001-07-16 branch).
Comment 12•24 years ago
|
||
Marina, which detection did you use, "Russian" or "All"?
The libmime problem is bug 12481.
Target Milestone: --- → Future
Comment 13•24 years ago
|
||
I used "All" in both cases: 6.0 and 6.1. With Autodetect set to "All" in 6.0, I
have no problem detecting KOI8-R in this newsgroup (regsoft.com), though it
doesn't work with today's branch build.
Comment 14•24 years ago
|
||
Marina, could you create a local file (.txt or .html) with the text of the news
message? That way, we can see if the problem is mail/news specific.
Comment 15•24 years ago
|
||
Comment 16•24 years ago
|
||
Comment 17•24 years ago
|
||
Comment 18•24 years ago
|
||
I used today's branch build on Windows and both "Russian" and "All" detected the
attachments as KOI8-R. So it's mail/news specific.
Depends on: 12481
Reporter | ||
Comment 19•24 years ago
|
||
Yep, I also think so. While browsing I had no problem with detecting codepage.
Comment 20•24 years ago
|
||
Marina, could you copy the news message to local and attach it to this bug, thanks?
Comment 21•24 years ago
|
||
Comment 22•24 years ago
|
||
Comment 23•24 years ago
|
||
Libmime gets its data one line at a time. So I divided the HTML file into separate
files, each containing only one line. Those files are detected correctly by the
browser. I used the "Russian" detector for the test.
So I think the amount of input data given to the detector is not really causing the
problem.
I noticed that when I view the Russian message, the very first line of the first message
is shown correctly but the following lines and other messages are displayed as
question marks. In the debugger, the charset "koi8-r" is returned by the detector
for the first line, but an empty string is returned as the charset for the following lines.
I did another test by doubling each line in the mail. The first line was
detected as "koi8-r", but the second line, which is exactly the same string as the
first line, was not detected and got an empty string for a charset.
So I suspect some kind of internal state is messed up in nsIStringCharsetDetector.
Shanjian, please check if anything is wrong in nsIStringCharsetDetector.
Assignee: nhotta → shanjian
Status: ASSIGNED → NEW
Updated•24 years ago
|
Summary: Cyrillic is not autodetected → Cyrillic is not autodetected by nsIStringCharsetDetector
Assignee | ||
Comment 24•24 years ago
|
||
Some time between 6.01 and now, mail/news started reusing the old detector instead of
creating a new one. (That's perfectly reasonable for performance reasons.) However,
the XPCOM string detector does not work this way yet. The fix is simple, but I need to
check all detectors to make sure we fix all similar problems.
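The symptom reported earlier (first line detected as "koi8-r", an identical second line returning an empty string) fits a detector that keeps per-run state and is reused without a reset. The sketch below is a hypothetical toy, not the actual patch: `ToyStringDetector` and its `done` flag are invented for illustration, with `done` loosely playing the role of `mDone`, and the exact mechanism inside the real detector is an assumption.

```python
class ToyStringDetector:
    """Hypothetical stand-in for a string charset detector that gets reused."""

    def __init__(self, reset_on_each_call: bool):
        self.reset_on_each_call = reset_on_each_call
        self.done = False  # loosely plays the role of mDone in this thread

    def detect(self, data: bytes) -> str:
        if self.reset_on_each_call:
            self.done = False  # start every call from a clean state
        if self.done:
            return ""  # stale state: the detector believes it already finished
        self.done = True
        low = sum(1 for b in data if 0xC0 <= b <= 0xDF)   # KOI8-R lowercase
        high = sum(1 for b in data if 0xE0 <= b <= 0xFF)  # KOI8-R uppercase
        return "koi8-r" if low > high else ""


line = "Привет".encode("koi8_r")

# Reused without a reset: the second, identical line gets an empty charset.
buggy = ToyStringDetector(reset_on_each_call=False)
results_buggy = [buggy.detect(line), buggy.detect(line)]

# Reused with a reset at the start of each call: both lines are detected.
fixed = ToyStringDetector(reset_on_each_call=True)
results_fixed = [fixed.detect(line), fixed.detect(line)]

print(results_buggy)  # ['koi8-r', '']
print(results_fixed)  # ['koi8-r', 'koi8-r']
```

Resetting inside the call keeps the caller free to reuse one instance, which preserves the performance benefit mentioned above.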
Status: NEW → ASSIGNED
Assignee | ||
Comment 25•24 years ago
|
||
Assignee | ||
Comment 26•24 years ago
|
||
I checked psm detector, and it does not have this problem. So we are done with
mozilla tree. But I need to check universal detector.
Comment 28•24 years ago
|
||
So the other detectors in mozilla do not need the change?
Reporter | ||
Comment 29•24 years ago
|
||
Could it land to the NN6.1 trunk, since it's a big regression for cyrillic
languages?
Comment 30•24 years ago
|
||
cc to jenm
Comment 31•24 years ago
|
||
I think the real solution is to obsolete nsIStringCharsetDetector and force
all code to use nsICharsetDetector instead. There is no way we can get a
good result from nsIStringCharsetDetector, since the data we provide to it is too
small.
Comment 32•24 years ago
|
||
I agree it's more efficient to feed more data to the detector (see bug 12481).
But in the data (of the attachment from 07/17/01 17:45), the first line, with 12
characters, is detected correctly by the Russian detector.
I heard that some other detection module, which is used for a search server, works
with a small amount of data (e.g. the user's search query string). I think auto
detection with a small amount of data is not impossible.
Assignee | ||
Comment 33•24 years ago
|
||
*** Bug 58236 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 34•24 years ago
|
||
naoki, can you review this one? I talked with frank about similar change to
universal detector, and he gave me r= for that. So I don't think he will have
any objection for this bug. Since he is on vacation, can you give me r= instead?
Comment 35•24 years ago
|
||
r=nhotta
Why was mDone moved from private to protected?
Assignee | ||
Comment 36•24 years ago
|
||
In the beginning of my patch, you see "mDone" is initialized. So this variable
has to be declared as "protected" in order for it to be accessed there.
Assignee | ||
Comment 37•24 years ago
|
||
chris, can you sr this one?
Comment 38•24 years ago
|
||
sr=waterson
Assignee | ||
Comment 39•23 years ago
|
||
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 40•23 years ago
|
||
I still see this problem in the newsgroup. I can reproduce it with the same
news article with the 2001-08-22 build: going to the above server with
Autodetect set to All (Russian), the encoding of the article is detected as
Western, and I can correct the display manually. The view in the browser has no problem:
with Autodetect set to All (or Russian), the encoding is detected as KOI8-R in
the browser window. Eugene, do you see this happening? Reopening.
Assignee | ||
Comment 41•23 years ago
|
||
Assignee | ||
Comment 42•23 years ago
|
||
There are 2 problems in the charset detector code. 1. Mail assumes the string charset
detector uses the same name as its counterpart in the browser. 2. When resetting,
mAvailable should not be cleared.
This patch is for the universal charset detector; the same patch will be checked in to the
commercial tree to fix the "All" detector. The universal detector works far better than
"All", because the 3rd-party code always reports something wrong.
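The first problem, a lookup that assumes two components share a name, can be sketched as follows. Everything here is hypothetical: the prefixes, the detector names, and the dictionary registry are invented for illustration and do not reproduce the real contract IDs or which name was the old or the new one.

```python
# Hypothetical prefixes; the real contract ID strings are not quoted in this report.
DETECTOR_BASE = "detector;type="
STRING_DETECTOR_BASE = "string-detector;type="


def make_registry(string_detector_name: str) -> dict:
    """Register a browser (stream) detector and a string detector by ID."""
    return {
        DETECTOR_BASE + "some_detector": "stream detector",
        STRING_DETECTOR_BASE + string_detector_name: "string detector",
    }


def mail_lookup(registry: dict, browser_detector_name: str):
    # Mail builds the string detector's ID from the browser detector's name,
    # i.e. it assumes both are registered under the same name.
    return registry.get(STRING_DETECTOR_BASE + browser_detector_name)


# String detector registered under a different name: the lookup finds nothing.
mismatched = make_registry("some_string_detector")
print(mail_lookup(mismatched, "some_detector"))  # None

# String detector registered under the same name: the lookup succeeds.
matched = make_registry("some_detector")
print(mail_lookup(matched, "some_detector"))     # string detector
```

When the lookup fails silently, the caller ends up with no detector at all, which from the user's side looks the same as detection not working.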
Status: REOPENED → ASSIGNED
Assignee | ||
Comment 43•23 years ago
|
||
Nominate this one for branch.
Roy, can you review my patch?
Target Milestone: Future → mozilla0.9.4
Comment 45•23 years ago
|
||
/r=yokoyama
shanjian: thanks for correcting
NS_STRCDETECTOR_CONTRACTID_BASE "universal_string_charset_detector"
Updated•23 years ago
|
Attachment #47400 -
Flags: review+
Assignee | ||
Comment 46•23 years ago
|
||
chris, could you sr this one? thanks.
Assignee | ||
Updated•23 years ago
|
Whiteboard: need r/sr, since 7/19 → need sr/a
Comment 47•23 years ago
|
||
Comment on attachment 47400
proposed patch
Um, does anyone ever actually _read_ the value of mAvailable?
Assignee | ||
Comment 48•23 years ago
|
||
That's right. mAvailable is used to show whether the 3rd-party language module has been
initialized correctly or not. Since the 3rd-party detector has been removed from the mozilla tree,
this flag should be removed as well.
(I copied the patch from the commercial tree without careful examination. Sorry.)
New patch will come soon.
Assignee | ||
Comment 49•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Whiteboard: need sr/a → need r/sr/a
Comment 50•23 years ago
|
||
Comment on attachment 48623
update my patch (remove unused mAvailable).
sr=waterson
Attachment #48623 -
Flags: superreview+
Comment 51•23 years ago
|
||
Comment on attachment 48623
update my patch (remove unused mAvailable).
looks good
/r=yokoyama
Attachment #48623 -
Flags: review+
Comment 52•23 years ago
|
||
Comment on attachment 47400
proposed patch
marking as obsolete
Attachment #47400 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Whiteboard: need r/sr/a → need a=
Assignee | ||
Comment 53•23 years ago
|
||
fix checked in to trunk.
Assignee | ||
Comment 54•23 years ago
|
||
update summary
Summary: Cyrillic is not autodetected by nsIStringCharsetDetector → universal charset detector does not work in mail/news
a=roc+moz for the 0.9.4 branch
Assignee | ||
Comment 56•23 years ago
|
||
fix checked in to branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 57•23 years ago
|
||
I am using the 2001-09-13-03-0.9.4 build. I still see the problem in mail/news, but
if I perform the following steps with the browser it works fine:
* Copy the KOI8-R text to HTML composer and save as KOI8-R
* Remove META charset tag.
* Open the HTML file in browser with Cyrillic auto detection ON
and it points to Cyrillic-koi8-r. In mail, when there is no MIME, it is still
pointing to Western... I am reopening. Shanjian, any suggestions?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 58•23 years ago
|
||
This bug is about universal charset detector. For Cyrillic detector, please open
another bug.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 59•23 years ago
|
||
I have problems using the Universal detector to detect a Shift_JIS attachment. I'll
attach the mails to the report.
Comment 60•23 years ago
|
||
This is not a Cyrillic-only problem; I had a problem with detecting Japanese as well.
Comment 61•23 years ago
|
||
Comment 62•23 years ago
|
||
Assignee | ||
Comment 63•23 years ago
|
||
shirley/marina,
Using the original testcase, I verified that the problem is at least resolved
in my local tree. The problem you are experiencing might be a different one, so
please file a new bug for it. By the way, for email problems, if
you can cc me a copy, it will be much easier for me to reproduce the problem.
Thanks.
Comment 64•23 years ago
|
||
Shanjian, i would open a new bug but wouldn't the problem be the same:
Autodetect set to all is not working in Mail/news?
Comment 65•23 years ago
|
||
I guess the new problem is in the attachment auto-detect.
Comment 66•23 years ago
|
||
Not only that: I have Cyrillic messages that have no attachment but are still not detected
as Cyrillic.
Assignee | ||
Comment 67•23 years ago
|
||
Marina, your original testcase, a newsgroup in Russian, didn't work for me
before my patch, and now it works well. That's why I believe
the problem I wanted to address has been fixed. If that is not the behavior you
observe, we should reopen this bug. If you observe the problem using another
testcase, such as a mail attachment, it is very likely to be a different problem.
(BTW, I could not get any news in Russian using the URL provided, so I just
subscribed to a newsgroup called "fido7.www.station.ru", which can be found on almost
any news server.)
Does that sound reasonable to you?
If you still feel confused, send me a mail testcase which does not work for you,
and let me find out whether this is a new problem or not. I can file a new bug or
reopen this one after I see what's happening.
Comment 68•23 years ago
|
||
Shanjian, I think the only confusion (after clearing the cache) was that the
checkmark after setting to Universal still points to Western even though the
display of the message is correct. The same is true for Autodetect All (or
Russian or Japanese): the display is correct after reselecting, but the checkmark
doesn't show the right encoding. I am verifying this as fixed because you are
right, the original problem is gone.
Status: RESOLVED → VERIFIED
Comment 69•23 years ago
|
||
Filed bug 99630 for the problem in sjis attachment auto detect.
Updated•20 years ago
|
Product: MailNews → Core
Updated•17 years ago
|
Product: Core → MailNews Core