Closed
Bug 165373
Opened 22 years ago
Closed 4 years ago
Page with raw 8-bit characters is not shown in the correct encoding
Categories: Core :: Internationalization, defect
Status: RESOLVED WORKSFORME
People: Reporter: mlemos, Assigned: jshin1987
Details: Keywords: intl
Attachments: 2 files
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826
On every page with 8-bit characters that should be displayed as ISO-8859-1,
such as http://www.gildot.org/, those characters show up as question marks.
It also seems to happen with text submitted through forms that are pre-filled
with 8-bit characters.
Downloading the page to disk with another tool (wget) and then opening it in
Mozilla does not exhibit the problem, so it seems to be an HTTP encoding
translation problem.
My guess is that Mozilla 1.1 assumes the encoding is UTF-8. This did not
happen in Mozilla 1.0.
Reproducible: Always
Steps to Reproduce:
1. Display the page http://www.gildot.org/ and see for yourself.
2. Try also submitting a form already loaded with 8 bit characters
Actual Results:
8 bit characters appear as question marks.
Expected Results:
8-bit characters are displayed as ISO-8859-1.
Comment 1•22 years ago
As the W3C validator report at
http://validator.w3.org/check?uri=http%3A%2F%2Fwww.gildot.org%2F&charset=%28detect+automatically%29&doctype=Inline
shows, neither the page nor the server specifies an encoding, so whatever you
have set as the default in your browser (obviously not ISO-8859-1) applies.
As a matter of fact, a fresh Mozilla install does default to ISO-8859-1, but
many of these site bugs would be easier to detect if UTF-8 were the default.
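Comment 1's reasoning follows a fallback order: an encoding declared by the server takes precedence, then one declared in the page, and only when both are missing does the browser's configured default apply. Below is a minimal Python sketch of that order. `resolve_charset` is a hypothetical helper, and it deliberately ignores byte-order marks, history, inheritance from a parent page and auto-detection, all of which a real browser may also consider.

```python
from typing import Optional


def resolve_charset(http_charset: Optional[str],
                    meta_charset: Optional[str],
                    browser_default: str) -> str:
    """Hypothetical, simplified charset fallback.

    Order assumed here: HTTP Content-Type charset, then <meta> charset,
    then the user's default. Not Gecko's actual algorithm.
    """
    for candidate in (http_charset, meta_charset):
        if candidate:  # first declared encoding wins
            return candidate.lower()
    return browser_default.lower()  # nothing declared: use the default


# gildot.org case per Comment 1: neither header nor page declares a charset,
# so the outcome depends entirely on the browser default.
print(resolve_charset(None, None, "ISO-8859-1"))
```

With no declaration anywhere, a browser that wrongly substitutes UTF-8 for the configured default produces exactly the symptom reported above.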
Comment 2•22 years ago
intl
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: asa → ruixu
Comment 3•22 years ago (Reporter)
Of course my browser's default encoding is set to ISO-8859-1. The bug is in
Mozilla and was introduced between 1.0 and 1.1, which I upgraded to just yesterday.
Updated•22 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
It's a bit tricky to reproduce the bug. Try it this way:
1. Set your browser's default character coding to, say, ISO-8859-15.
2. Open the respective page via a link embedded in a UTF-8
encoded page, e.g. google.de's result pages, which are UTF-8 encoded:
http://www.google.de/search?q=bundestagswahl+stimmzettel+reutlingen&ie=UTF-8&oe=UTF-8&hl=de&btnG=Google-Suche&meta=
(Verify with page info and take a look into google's page source)
3. Now click the first link, named "Bundestagswahl 98", which is
http://www.ikg.rt.bw.schule.de/wahlen98/wahlverf.html [1]
Neither this page nor the headers sent by the server (try wget -s)
contain an explicit character encoding.
4. If you have never had a bookmark or history entry for [1], you
will see the text "Jeder W?hler hat zwei Stimmen", which should
read "Jeder Wähler hat zwei Stimmen". The reason is that Mozilla
assumes UTF-8 encoding instead of ISO-8859-15.
To reproduce this bug, you have to delete your history and
probably your cache. Possibly restart Mozilla before you try again.
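The "W?hler" symptom is what happens when ISO-8859-15 bytes are decoded as UTF-8: "ä" is the single byte 0xE4, which in UTF-8 is the lead byte of a three-byte sequence, so the following "h" makes the sequence invalid and the decoder substitutes a replacement character. A short Python illustration, with Python's codecs standing in for Mozilla's decoders:

```python
# Encode the German sentence as ISO-8859-15: "ä" becomes the single byte 0xE4.
raw = "Jeder Wähler hat zwei Stimmen".encode("iso-8859-15")

# Decoding with the correct charset round-trips the text.
correct = raw.decode("iso-8859-15")

# Decoding the same bytes as UTF-8: 0xE4 starts a 3-byte sequence, but the
# next byte ("h", 0x68) is not a continuation byte, so 0xE4 is replaced
# with U+FFFD REPLACEMENT CHARACTER.
wrong = raw.decode("utf-8", errors="replace")

print(correct)
print(wrong)
```

The second line prints "Jeder W\ufffdhler hat zwei Stimmen", which a browser renders as "?" or "�" depending on the font, matching the "Einf�hrung" output quoted in the next comment.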
Comment 6•22 years ago
WFM on the 10-14 trunk build, Linux RH7.2. When I follow the steps in comment #4
with a new profile, the page is marked with the browser's default charset, not
UTF-8.
Hard to believe. I have checked with both 1.2b and the
nightly build 2002-10-22 on W2K. *Please* do the following,
once and *exactly* in this order:
(1) Set your default character coding to ISO-8859-15.
(2) Go to www.google.de
(3) Search for "in die Nutzung elektronischer Ressourcen"
(4) Click the link to the first hit (something at the University
of Heidelberg).
(5) Tell us here what you read in lines 1-3. I have:
Einf�hrung in die Nutzung elektronischer Ressourcen
Einf�hrung in HTML
which really should read:
Einführung in die Nutzung elektronischer Ressourcen
Einführung in HTML
View | Page Info reports UTF-8 instead of ISO-8859-15.
Could somebody please confirm this behaviour?
Comment 8•22 years ago
I'll attach a screenshot showing how it displays on my machine.
I think the Google search page is always in UTF-8, so if you go to View | Character
Coding, you should see UTF-8 marked. The page in comment #4,
http://www.ikg.rt.bw.schule.de/wahlen98/wahlverf.html,
has no charset meta tag, so it will be marked with the default charset
(in your case ISO-8859-15).
Comment 9•22 years ago
The default charset is iso-8859-15, and the charset is marked as UTF-8.
What is your Google search page language preference setting?
Comment 10•22 years ago
Comment 11•22 years ago
Your Google search is perfectly OK. Google is not the problem.
The problem *is* that Mozilla now displays any linked page as UTF-8
encoded *iff* that page does not declare its own encoding.
So please click the first link in Google's hit list, named
"Einfuehrung in die...", and compare the result with my screenshot.
If you want to reproduce it, don't forget to
- clear history
- clear location bar
- clear cache
- restart Mozilla.
TIA
Comment 12•22 years ago
Sorry, I don't see your problem with either the Google search page or the pages
reached by clicking the links in the search results. Pages without a charset
meta tag are marked with the default charset, not UTF-8. I did try clearing the
cache and using a new profile.
Comment 13•22 years ago (Reporter)
Let me point out that these problems do not happen all the time, and
quitting and restarting Mozilla usually makes the problem go away.
Also, this problem does not happen only in the Web browser. I have just
experienced it with an HTML e-mail message. Quitting and restarting the
program made the problem go away for the same message.
It seems the problem is caused by some encoding data structure that gets trashed
and is not re-initialized with the correct default document encoding.
Comment 14•22 years ago
On my systems the problem shows up every time I click a
Google result. Versions: I remember that until 1.0rc3
the problem was absent. Currently I use Mozilla/5.0 (Windows; U;
Windows NT 5.0; en-US; rv:1.1) Gecko/20020826. Under a different
W2K user I have tried 1.2b and the latest build (2002-10-22).
Both exhibit the aforementioned behavior.
I will now try the Linux versions.
Comment 15•22 years ago
So. I have checked 1.0, 1.0.1, 1.1, 1.2b and 2002-10-24-22 (latest trunk).
Something I was not aware of: you must click Google's result link
*with the middle mouse button* to open a new browser window. Otherwise
the error does not show up. Here are the results with freshly installed
versions (SuSE Linux 7.3):
Version        Error shown
--------------------------
1.0            NO
1.0.1          YES
1.1            YES
1.2b           YES
latest trunk   YES
Comment 16•22 years ago
This bug was very annoying: on every page I used "View -> Character Coding ->
Western ISO-8859-1" to restore the correct encoding and see the right characters.
I have tested a few things, and afterwards I could not reproduce the problem.
What I found is that on a fresh installation the Default Character Coding is
"ISO-8859-1", but it is not listed in the file "prefs.js". I then changed it to
another encoding and exited Mozilla. After restarting Mozilla I set it back to
"ISO-8859-1". Now the entry << user_pref("intl.charset.default", "ISO-8859-1"); >>
is in the file "prefs.js" and the problem cannot be reproduced.
I hope this helps to fix the problem.
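The workaround above depends on whether the profile's prefs.js contains an explicit intl.charset.default entry. A minimal Python sketch for checking that, assuming the simple one-per-line user_pref("name", "value"); form quoted in this thread. `explicit_default_charset` is a hypothetical helper, not a Mozilla tool; it ignores numeric and boolean prefs and escaped quotes.

```python
import re
from typing import Optional

# Matches string prefs such as: user_pref("intl.charset.default", "ISO-8859-1");
# Assumption: double-quoted name and value, no escaped quotes inside them.
_PREF_RE = re.compile(r'user_pref\(\s*"([^"]+)"\s*,\s*"([^"]*)"\s*\);')


def explicit_default_charset(prefs_text: str) -> Optional[str]:
    """Return intl.charset.default if prefs.js sets it explicitly, else None.

    Keeps the last match, on the assumption that a later line overrides
    an earlier one.
    """
    result = None
    for match in _PREF_RE.finditer(prefs_text):
        name, value = match.groups()
        if name == "intl.charset.default":
            result = value
    return result


fresh_profile = 'user_pref("intl.accept_languages", "en-us, de-de, de");\n'
after_reset = fresh_profile + 'user_pref("intl.charset.default", "ISO-8859-1");\n'

print(explicit_default_charset(fresh_profile))
print(explicit_default_charset(after_reset))
```

A `None` result corresponds to the "fresh installation" state described above, where the default is in effect but not written to prefs.js.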
Comment 17•22 years ago
First: this is what my prefs.js has looked like for as long as I can remember:
user_pref("intl.accept_languages", "en-us, de-de, de");
user_pref("intl.charset.default", "ISO-8859-15");
user_pref("intl.charset.detector", "");
Second: Did you notice that it is necessary to open a new browser
window (i.e. with the middle mouse button on the Google result page)?
*Do not open it in the same window, since Mozilla remembers the current
character coding in the history.*
Third: Did you make sure that all memory of the respective page was
wiped before you tried to reproduce the bug? (clear history, clear
location bar, clear cache and restart Mozilla)
Comment 18•20 years ago
shanjian has not been working on Mozilla for 2 years and these bugs are still
here. Marking them WONTFIX. If you want to reopen one, find a good owner first.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Comment 20•20 years ago
Mass re-opening bugs Frank Tang closed on Wednesday, March 02 for no reason; all
the spam is his fault, feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Comment 21•20 years ago
Reassigning Frank's old bugs to Jungshik Shin for triage. Sorry for the spam.
Assignee: nobody → jshin1987
Status: REOPENED → NEW
Updated•15 years ago
QA Contact: amyy → i18n
Comment 22•4 years ago
The originally reported site WFM today.
Status: NEW → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → WORKSFORME