Closed Bug 165373 Opened 22 years ago Closed 4 years ago

Page with raw 8 bit characters is not showing them correct encoding

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mlemos, Assigned: jshin1987)

References

()

Details

(Keywords: intl)

Attachments

(2 files)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020826

In all pages that have 8 bit characters supposed to be displayed as ISO-8859-1 like this one http://www.gildot.org/ , they show as question marks. It seems to be happening with text submitted with forms that are loaded with 8 bit characters. Downloading to disk the page with some other tool (wget) and then displaying it with Mozilla does not exhibit the problem, so it seems to be a HTTP encoding translation problem. My guess is that Mozilla 1.1 is assuming the encoding is UTF-8 . This was not happening in Mozilla 1.0 .

Reproducible: Always

Steps to Reproduce:
1. Display the page http://www.gildot.org/ and see for yourself.
2. Try also submitting a form already loaded with 8 bit characters

Actual Results: 8 bit characters appear as question marks.

Expected Results: Display 8 bit encoded characters by ISO-8859-1
As http://validator.w3.org/check?uri=http%3A%2F%2Fwww.gildot.org%2F&charset=%28detect+automatically%29&doctype=Inline says, the page and the server give no encoding, so whatever you have specified as default in your browser (obviously not ISO-8859-1) will apply. As a matter of fact, a virgin Mozilla does default to ISO-8859-1, but many of these site bugs would be more easily detected if UTF-8 was made default.
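To make the fallback described in this comment concrete, here is a minimal sketch in Python. It is a simplified illustration only, not Mozilla's actual code: the function name `effective_charset` and its parameters are hypothetical, and real browsers consider more sources (byte-order marks, detectors, user overrides) than the three shown.

```python
from typing import Optional


def effective_charset(http_charset: Optional[str],
                      meta_charset: Optional[str],
                      browser_default: str) -> str:
    """Return the first declared charset, else the browser default.

    Hypothetical helper: the order (HTTP header, then meta tag, then
    default) is the simplified model this comment implies.
    """
    for candidate in (http_charset, meta_charset):
        if candidate:
            return candidate.strip().lower()
    return browser_default.strip().lower()


# A page like the one reported: neither server nor document names a charset,
# so the browser default decides.
print(effective_charset(None, None, "ISO-8859-1"))
# A page that declares its charset in a meta tag overrides the default.
print(effective_charset(None, "UTF-8", "ISO-8859-1"))
```

Under this model the reported page would always be read with the configured default, which is why a UTF-8 result points at something other than the page or the server.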
intl
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: asa → ruixu
Of course my browser default encoding is set to ISO-8859-1 . The bug is in Mozilla and was introduced between 1.0 and 1.1, which I upgraded to just yesterday.
Keywords: intl
QA Contact: ruixu → ylong
Status: UNCONFIRMED → NEW
Ever confirmed: true
It's a bit tricky to reproduce the bug. Try this way:
1. Set your browser's default character coding to, say, ISO-8859-15.
2. Access the respective page via a link which is embedded in a UTF-8 encoded page. E.g. google.de's result pages are UTF-8 encoded: http://www.google.de/search?q=bundestagswahl+stimmzettel+reutlingen&ie=UTF-8&oe=UTF-8&hl=de&btnG=Google-Suche&meta= (Verify with Page Info and take a look at Google's page source.)
3. Now click on the first link, named "Bundestagswahl 98", which is http://www.ikg.rt.bw.schule.de/wahlen98/wahlverf.html [1] Neither this page nor the header sent by the server (try wget -s) contains an explicit character coding.
4. If you have never had a bookmark or history entry for [1], you will see the text "Jeder W?hler hat zwei Stimmen", which should really read "Jeder Wähler hat zwei Stimmen". The reason is that Mozilla assumes UTF-8 encoding instead of ISO-8859-15.
If you try to reproduce this bug, you have to delete your history and probably your cache. You may also need to restart Mozilla before you try again.
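Step 3 above depends on the server sending no charset in its Content-Type header. As a rough sketch of how that can be checked once the header value is known, the Python standard library's `email.message.Message` can parse it. The helper name `charset_from_content_type` and the sample header values are assumptions for illustration, not output captured from the server in question.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the value and returns None
    # when the header carries no charset parameter.
    return msg.get_content_charset()


# Hypothetical header values, not taken from the real server:
print(charset_from_content_type("text/html"))
print(charset_from_content_type("text/html; charset=ISO-8859-15"))
```

A result of `None` corresponds to the situation described in this comment, where the browser has to fall back to something other than the header.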
-> shanjian
Assignee: yokoyama → shanjian
WFM on 10-14 trunk build/linux RH7.2. When I try the steps in comment #4 with a new profile, the page is marked with the browser default charset, not UTF-8.
Hard to believe. I have checked with both 1.2b and the nightly build 2002-10-22 on W2K. *Please* do the following, once and *exactly* in this order:
(1) Set your default character coding to ISO-8859-15.
(2) Go to www.google.de
(3) Search for "in die Nutzung elektronischer Ressourcen"
(4) Click on the link to the first hit (something at the University of Heidelberg)
(5) Tell us here what you read in lines 1-3.
I have: Einf�hrung in die Nutzung elektronischer Ressourcen Einf�hrung in HTML
which really should read: Einführung in die Nutzung elektronischer Ressourcen Einführung in HTML
View | Page Info gives UTF-8 instead of ISO-8859-15. Could somebody please confirm this behaviour?
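The garbled output quoted above is what one would expect when ISO-8859-15 bytes are decoded as UTF-8. A small Python sketch reproduces the effect; it only illustrates the decoding mismatch and says nothing about where in Mozilla the wrong charset is chosen.

```python
text = "Einführung in HTML"

# The bytes a server would send for an ISO-8859-15 page: "ü" is the single byte 0xFC.
raw = text.encode("iso-8859-15")

# 0xFC is not valid in UTF-8, so a UTF-8 decoder substitutes U+FFFD for it.
wrong = raw.decode("utf-8", errors="replace")

# Decoding with the intended charset restores the original text.
right = raw.decode("iso-8859-15")

print(wrong)
print(right)
```

Depending on fonts and platform, the substituted character may be drawn as a question mark, which matches the "W?hler" rendering reported earlier in this bug.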
I'll attach a screenshot to show how it displays on my machine. I think the Google search page is always in UTF-8, so if you go to View | Character Coding, you should see that UTF-8 is marked. The page in comment #4, http://www.ikg.rt.bw.schule.de/wahlen98/wahlverf.html, is a page without a charset meta tag, so it will be marked with the default charset (in your case ISO-8859-15).
The default charset is iso-8859-15, and charset is marked as UTF-8. What's your google search page language preferences setting?
Your Google search is perfectly OK. Google is not the problem. The problem *is* that Mozilla now displays any linked page as UTF-8 encoded *iff* this page does not have its own encoding. So please, click on the first link in Google's hit list, named "Einfuehrung in die...", and compare the result with my screenshot. If you want to reproduce, don't forget to:
- clear history
- clear location bar
- clear cache
- restart Mozilla.
TIA
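The behaviour claimed in this comment can be written down as a toy model in Python. This is only a restatement of the report, assuming the linking page's charset leaks into the target page; both function names are hypothetical and nothing here is taken from Mozilla's source.

```python
from typing import Optional


def expected_charset(declared: Optional[str], default: str) -> str:
    """What the reporter expects: declared charset, else the browser default."""
    return declared or default


def observed_charset(declared: Optional[str], default: str,
                     referrer_charset: Optional[str]) -> str:
    """What the reporter sees: an undeclared page takes the linking page's charset."""
    return declared or referrer_charset or default


# Target page declares nothing; it was reached from a UTF-8 result page.
print(expected_charset(None, "ISO-8859-15"))
print(observed_charset(None, "ISO-8859-15", "UTF-8"))
```

The two functions differ only for pages without a declared charset, which fits the "iff" in the description above.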
Sorry, I don't see your problem, either with the Google search page or with the pages reached by clicking the links in the search results. If those pages have no charset meta tag, they are marked with the default charset, not UTF-8. I did try clearing the cache and using a new profile.
Let me bring up the fact that these problems do not happen all the time, and quitting and restarting Mozilla usually makes the problem go away. Also, this problem does not happen only with the Web browser. I have just experienced the problem with an HTML e-mail message. Quitting and restarting the program made the problem go away for the same message. It seems the problem is caused by some encoding data structure that gets trashed and is not re-initialized with the correct default document encoding.
On my systems the problem does show up each time I click on a Google result. Versions: I remember that until 1.0rc3 the problem was absent. Currently I use Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1) Gecko/20020826. Under a different W2K user I have tried 1.2b and the latest build (2002-10-22). Both exhibit the aforementioned behavior. I will try now with the Linux versions.
So. I have checked 1.0, 1.0.1, 1.1, 1.2b and 2002-10-24-22 (latest trunk). Something I was not aware of: you must click on Google's result link *with the middle mouse button* to open a new browser window. Otherwise the error does not show up. Here are the results for freshly installed versions (SuSE Linux 7.3):

Version      Error
----------------------
1.0          NO
1.0.1        YES
1.1          YES
1.2b         YES
latest-tr    YES
This bug was very annoying: on each page I had to use "View -> Character Coding -> Western ISO-8859-1" to restore the correct coding and see the right characters. I have tested some things, and afterwards I could no longer reproduce the problem... What I found is that on first installation, the Default Character Coding is "ISO-8859-1", but it is not listed in the file "prefs.js". Then I changed it to another coding and exited Mozilla. I restarted Mozilla and reset it to "ISO-8859-1". Now the entry << user_pref("intl.charset.default", "ISO-8859-1"); >> is in the file "prefs.js" and the problem cannot be reproduced. I hope this helps you fix the problem...
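For anyone who wants to check whether the default charset is explicitly stored in their own profile, the following Python sketch scans prefs.js text for a string-valued `user_pref` entry. The helper `read_string_pref` and the regular expression are assumptions made for illustration and only cover simple double-quoted string prefs; the sample text uses entries of the kind quoted in this bug.

```python
import re
from typing import Optional

# Matches lines such as: user_pref("some.name", "some value");
PREF_RE = re.compile(r'user_pref\("([^"]+)",\s*"([^"]*)"\);')


def read_string_pref(prefs_text: str, name: str) -> Optional[str]:
    """Return the value of a string pref, or None if it is not listed."""
    for key, value in PREF_RE.findall(prefs_text):
        if key == name:
            return value
    return None


sample = '''user_pref("intl.accept_languages", "en-us, de-de, de");
user_pref("intl.charset.default", "ISO-8859-15");
user_pref("intl.charset.detector", "");
'''

print(read_string_pref(sample, "intl.charset.default"))
print(read_string_pref(sample, "intl.charset.missing"))
```

A `None` result for `intl.charset.default` would correspond to the fresh-install state described in this comment, where the default is in effect but not written to the file.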
First: this is what my prefs.js has looked like for as long as I can remember:

user_pref("intl.accept_languages", "en-us, de-de, de");
user_pref("intl.charset.default", "ISO-8859-15");
user_pref("intl.charset.detector", "");

Second: Did you notice that it is necessary to open a new browser window (i.e. with the middle mouse button on the Google result page)? *Do not open in the same window, since Mozilla remembers the actual character coding in history.*

Third: Did you make sure that all traces of the respective page were gone before you tried to reproduce the bug? (Clear history, clear location bar, clear cache and restart Mozilla.)
shanjian has not been working on Mozilla for 2 years and these bugs are still here. Marking them WONTFIX. If you want to reopen it, find a good owner first.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → WONTFIX
Mass Reassign Please excuse the spam
Assignee: shanjian → nobody
Mass re-opening bugs Frank Tang closed on Wednesday, March 02 for no reason. All the spam is his fault, feel free to tar and feather him.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Reassigning Frank's old bugs to Jungshik Shin for triage - Sorry for spam
Assignee: nobody → jshin1987
Status: REOPENED → NEW
QA Contact: amyy → i18n

The originally reported site WFM today.

Status: NEW → RESOLVED
Closed: 20 years ago → 4 years ago
Resolution: --- → WORKSFORME