From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7) Gecko/20011221
BuildID: 2001122106

I have cut and pasted the issue from the front page of arstechnica (Jan 2, 2002): "Jean-Louis Gassýe" (same icon in View Source). This is rendered properly as "Jean-Louis Gassée" by Netscape 4.78. I am using the default American English settings and the same fonts in both documents (Times New Roman, Courier New). If I follow the link to the discussion, http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=174096756&m=6500968043&r=6500968043 then Mozilla renders the character correctly: "Jean-Louis Gassée"

Reproducible: Always

Steps to Reproduce:
1. For now, go to http://www.arstechnica.com

Actual Results: Weird diamond w/question mark icon appears.

Expected Results: An accented e.
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: doronr → teruko
I saw the diamond w/question mark again today, at the NYT website w/ Moz 0.9.7 on Win2000 Pro. View Source let me change the character encoding from UTF-8 to ISO-8859-1, which then showed the character correctly in the View Source window as a long dash. The saved HTML had the entity � which was correctly shown in Netscape 4.78 as a long dash. For those interested, "A Tempest at Shakespeare Shrine: Plan to Raze Theater Is Debated" is at http://www.nytimes.com/2002/01/03/arts/theater/03ROYA.html for now.
My Edit->Preferences->Navigator->Languages had a blank "Default Character Encoding". When I changed this to Western (ISO-8859-1), the NYT article displayed a long dash as Netscape 4.78 does. The current www.salon.com page, though, has text with the hexadecimal bytes A0 and E9 as characters, which are still replaced by the diamond with the question mark. The A0 and E9 are supposed to be (from "man 7 iso_8859-1"):

Oct   Dec   Hex   Char   Description
--------------------------------------------------------------------
240   160   A0           NO-BREAK SPACE
351   233   E9    é      LATIN SMALL LETTER E WITH ACUTE

I will try restarting my browser.....
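For anyone following along, here is a minimal sketch of why those two bytes turn into the diamond with the question mark. It uses Python's built-in codecs rather than anything from Mozilla, so it only illustrates the encodings themselves, not the browser's actual decoding code.

```python
# Bytes reported on the salon.com page: 0xA0 (NO-BREAK SPACE) and 0xE9 (é).
raw = bytes([0xA0, 0xE9])

# Decoded as ISO-8859-1, each byte maps directly to one character.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as UTF-8, neither byte forms a valid sequence on its own, so each
# is replaced by U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

# In UTF-8, é is stored as two bytes rather than the single byte 0xE9.
e_acute_utf8 = "é".encode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
print(e_acute_utf8.hex())
```

U+FFFD is what browsers typically draw as the diamond with a question mark.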
I tried this with the 2001-12-21-06 build and the 2002-01-03 trunk build and could not reproduce it. The French accent on the e displayed correctly. Could you try this with a new Mozilla build and a clean profile?
I created a clean profile (called bughunter), which came up with the language preference English [en-us] (different from the English [en] that I had previously) and the default encoding Western (ISO-8859-1), which is the same as before. Same fonts. The accented characters display correctly so far. If I remove English [en-us] and add English [en], they are still correct. If I go back to my original profile, they are incorrect. If I switch it from English [en] to English [en-us], they are still incorrect. So at the moment I am stumped on the relevant differences between the profiles, but I do have a workaround: creating a fresh profile. Thanks for the help. Feel free to ask me more questions.
Created attachment 63428 [details] 'Infectious' exported bookmarks. Importing this into a new profile somehow breaks that profile's ability to render accented characters (bug 117758).
Weird weird weird..... I went into my old profile, opened Manage Bookmarks, and exported bookmarks.html. I created a new profile (with a more sensible name) and it displays www.salon.com correctly. Close Mozilla. I import my bookmarks (they show up). I press RELOAD and www.salon.com now shows the diamond with the question mark. I empty the bookmarks. It still displays incorrectly. Close Mozilla. I went back to the still-working bughunter profile. I emptied the bookmarks first. Still displaying correctly. Import bookmarks.html. Press RELOAD. Now displays incorrectly. Close Mozilla. Go back to the bughunter profile: still broken. So the bug can be propagated via my bookmarks.html file. Very screwed up. I will figure out how to attach it to this bug report.
So it seems the LAST_CHARSET in bookmarks.html for my salon.com bookmark is the culprit.

$ grep -i salon bookmarks.html
<DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799" AST_VISIT="1010098472" ICON="http://www.salon.com/favicon.ico" LAST_CHARSET="UTF-8">Salon.com</A>
<DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799" AST_VISIT="1010098472" ICON="http://www.salon.com/favicon.ico" LAST_CHARSET="UTF-8">Salon.com</A>

If I make a clean profile and a fresh bookmark and export, I get:

$ grep -i salon bh6.html
<DT><A HREF="http://www.salon.com/" LAST_VISIT="1010100830" LAST_MODIFIED="1010100820" LAST_CHARSET="ISO-8859-1">Salon.com</A>

which has the correct ISO charset. Hmm... I see in man iso_8859-1: "Note that the ISO 8859-1 characters are also the first 256 characters of ISO 10646 (Unicode)." So I would have naively guessed that UTF-8 would not have rendered it badly. Oh well. I do not know exactly why my old salon bookmark has a UTF-8 attribute value, but it would seem that if a website updates / fixes / changes its encoding, then people with legacy bookmarks can be silently screwed. So I do not have a real workaround yet.
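To check an exported bookmarks file for other entries with a stored charset, something like the following might help. It is a rough sketch that assumes the attribute layout seen in the grep output above; the function name `stored_charsets` and the regular expressions are illustrative and are not part of Mozilla.

```python
import re

# Matches an opening <A ...> tag and captures its attribute text.
ANCHOR = re.compile(r'<A\s+([^>]*)>', re.IGNORECASE)
# Matches NAME="value" pairs inside the tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def stored_charsets(html):
    """Return (href, charset) pairs for bookmarks that store a LAST_CHARSET."""
    pairs = []
    for match in ANCHOR.finditer(html):
        attrs = {name.upper(): value for name, value in ATTR.findall(match.group(1))}
        if "HREF" in attrs and "LAST_CHARSET" in attrs:
            pairs.append((attrs["HREF"], attrs["LAST_CHARSET"]))
    return pairs

# Sample line modelled on the exported bookmark shown above.
sample = '<DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799" LAST_CHARSET="UTF-8">Salon.com</A>'
print(stored_charsets(sample))
```

Running it over the whole file (read in as a string) would list every bookmark that carries a remembered charset, which makes stale UTF-8 entries easier to spot.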
Chris, could you attach the bookmarks.html file to this bug report?
Chris, ISO-8859-1 and UTF-8 have some characters in common, but every character at 0x80 and above is encoded differently. It seems the basic problem you had was the blank value for "Default Character Encoding". I think this should never happen, and it should be investigated and FIXED. As a result, you were encountering problems much more often than should be the case.

None of the pages you have problems with indicate an encoding charset, so correct display depends on the "Default Character Encoding" being correct, or on autodetect being enabled and successful, or on the user manually selecting the correct encoding. This choice is then memorised in the bookmark entry so that the user does not have to reselect it the next time. The discussion page of arstechnica does indicate utf-8 as its encoding in the web server headers, so everything works well there. The main page does not indicate anything. I think the presence of ISO-8859-1 in the main page is rather accidental.

The problem you have seems to be that, as your "Default Character Encoding" was blank, the display was very often incorrect when you viewed a new page, and bookmarks for new pages got created with an incorrect charset. What happens is that when you view several pages without an encoding indication in a row, the last encoding that was selected is reused. This is something that works very well most of the time. If you view a discussion page on arstechnica in utf-8 and have no default encoding, utf-8 might get reused for the next page on another site, which would explain why you were often in utf-8 encoding when viewing new pages.

Sites usually don't update / fix / change their encoding very often. Auto-detect is not reliable enough to give better results than memorising the page charset in the bookmark entry.
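The fallback behaviour described above can be pictured roughly as follows. This is only a sketch: the priority order (declared charset, then bookmark, then last-used, then default) is an assumption made for illustration, and `pick_charset` is a hypothetical helper, not a Mozilla function.

```python
def pick_charset(declared=None, bookmarked=None, last_used=None, default=None):
    """Return the first available charset, in an ASSUMED priority order."""
    # ASSUMPTION: the real browser may consult these sources in another order.
    for candidate in (declared, bookmarked, last_used, default):
        if candidate:  # skips None and a blank "" default
            return candidate
    return None

# A page with no declared charset, visited right after a utf-8 page,
# with a blank default: the last-used encoding wins.
print(pick_charset(last_used="utf-8", default=""))

# A stale bookmark entry overrides an otherwise correct default.
print(pick_charset(bookmarked="UTF-8", default="ISO-8859-1"))
```

Under these assumptions, a blank default gives the last-used encoding nothing to compete with, which matches the symptoms reported here.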
I've checked that if you select a page from your bookmark entries and manually change the encoding to get a correct display, the bookmark entry gets updated with the correct charset, and everything works well the next time you access the page. So for me, the bookmark problem is INVALID/WONTFIX.
In some builds, "Default Character Encoding" was blank. That has been fixed. Chris, could you repeat what you did with a recent build and, if you can reproduce the problem, log a separate bug report? The original problem works for me, so I am marking this as worksforme.
Status: UNCONFIRMED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → WORKSFORME
Verified as worksforme.
Status: RESOLVED → VERIFIED