French Accent on e in Gassee is sometimes not rendered, get diamond w/question mark icon

VERIFIED WORKSFORME

Status


Core
Internationalization
16 years ago

People

(Reporter: Chris Kuklewicz, Assigned: Roy Yokoyama)

Tracking

Trunk
x86
Windows 2000
Attachments

(1 attachment)

(Reporter)

Description

16 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.7) Gecko/20011221
BuildID:    2001122106

I have cut and pasted the issue from the front page of arstechnica
(Jan 2, 2002): "Jean-Louis Gassýe"  (Same icon in View Source)

This is rendered properly as "Jean-Louis Gassée" by Netscape 4.78.

I am using the default american english settings and same fonts
in both documents (Times New Roman, Courier New).

If I follow the link to the discussion,
http://arstechnica.infopop.net/OpenTopic/page?a=tpc&s=50009562&f=174096756&m=6500968043&r=6500968043
then Mozilla renders the character correctly: "Jean-Louis Gassée"




Reproducible: Always
Steps to Reproduce:
1. For now, go to http://www.arstechnica.com
2.
3.

Actual Results:  Weird diamond w/question mark icon appears

Expected Results:  An accented e.

Comment 1

16 years ago
->I18N
Assignee: asa → yokoyama
Component: Browser-General → Internationalization
QA Contact: doronr → teruko
(Reporter)

Comment 2

16 years ago
I saw the diamond w/question mark again today - at the NYT website w/ Moz 0.9.7
on Win2000 Pro.

The View Source let me change the character coding to ISO-8859-1 (from UTF-8)
which then showed it correctly in the view source window as a long dash.  The
saved HTML had the entity � which was correctly shown in Netscape 4.78 as
a long dash.

For those interested "A Tempest at Shakespeare Shrine: Plan to Raze Theater Is
Debated" is at http://www.nytimes.com/2002/01/03/arts/theater/03ROYA.html for now.
(Reporter)

Comment 3

16 years ago
My Edit->Preferences->Navigator->Languages had a blank
"Default Character Encoding", when I changed this to Western(ISO-8859-1)
then the NYT article would display a long dash as Netscape 4.78 does.

The current www.salon.com page has text with hexadecimal bytes A0 and E9 as
characters which are still replaced by the diamond with the question mark, though.

The A0 and E9 are supposed to be (from "man 7 iso_8859-1")

       Oct   Dec   Hex   Char   Description
       --------------------------------------------------------------------
       240   160   A0           NO-BREAK SPACE
       351   233   E9     é     LATIN SMALL LETTER E WITH ACUTE
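
As an illustration of why these two bytes end up as the diamond icon, here is a minimal Python sketch (not Mozilla's code). It assumes the page bytes are ISO-8859-1 but get decoded as UTF-8; the sample byte string `raw` is made up for the example. U+FFFD is the Unicode replacement character, which is commonly drawn as a diamond with a question mark.

```python
# Hypothetical page fragment: "Gassée" followed by a no-break space,
# stored as ISO-8859-1 bytes (E9 = é, A0 = NO-BREAK SPACE).
raw = b"Gass\xe9e\xa0"

# Decoded with the charset the bytes were written in: both bytes are valid.
as_latin1 = raw.decode("iso-8859-1")

# Decoded as UTF-8: a lone E9 or A0 is not a valid UTF-8 sequence, so each
# one is replaced by U+FFFD (the replacement character).
as_utf8 = raw.decode("utf-8", errors="replace")

print(repr(as_latin1))
print(repr(as_utf8))
```

The second result contains one U+FFFD for each of the two high bytes, which matches the symptom described above.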

I will try restarting my browser.....

Comment 4

16 years ago
I tried this with the 2001-12-21-06 and 2002-01-03 trunk builds.  I could not
reproduce this.  I could see the French accent on e correctly.

Could you try this with a new Mozilla build and a clean profile?
(Reporter)

Comment 5

16 years ago
I created a clean profile (called bughunter) which came up with
Language preferences English [en-us] (which is different from the
previous English [en] that I had) and default encoding Western (ISO-8859-1),
which is the same as before.  Same fonts.

The accented characters display correctly so far.

I remove English [en-us] and add English [en] and they are still correct.

I go back to my original profile and they are incorrect.  I switch from
English [en] to English [en-us] and they are still incorrect.

So at the moment I am stumped on the relevant differences in the profiles,
but I do have a work-around by creating a fresh profile.

Thanks for the help.  Feel free to ask me more questions.
(Reporter)

Comment 6

16 years ago
Created attachment 63428 [details]
'Infectious' exported bookmarks.

Importing this into a new profile breaks that profile's ability to render
accented characters somehow (bug 117758).
(Reporter)

Comment 7

16 years ago
Weird weird weird.....

I went into my old profile and went to manage bookmarks and export bookmarks.html

I created a new profile (with a more sensible name) and it displays
correctly (www.salon.com).  Close mozilla.

I import my bookmarks (they show up).  I press RELOAD and www.salon.com now
shows the diamond with the question mark.
I empty the bookmarks.  It still displays incorrectly.
Close mozilla.

I went back to the still working bughunter profile.
I emptied the bookmarks first.  Still displaying correctly.
Import bookmarks.html.  Press RELOAD.  Now displays incorrectly.
Close mozilla.

Go back to the bughunter profile - still broken.

So the bug can be propagated via my bookmarks.html file.  Very screwed up.
I will figure out how to attach it to this bug report.
(Reporter)

Comment 8

16 years ago
So it seems the LAST_CHARSET in bookmarks.html for my salon.com bookmark is
the culprit.

$ grep -i salon bookmarks.html
            <DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799"
LAST_VISIT="1010098472" ICON="http://www.salon.com/favicon.ico"
LAST_CHARSET="UTF-8">Salon.com</A>
    <DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799"
LAST_VISIT="1010098472" ICON="http://www.salon.com/favicon.ico"
LAST_CHARSET="UTF-8">Salon.com</A>

If I make a clean profile and a fresh bookmark and export I get:
$ grep -i salon bh6.html
    <DT><A HREF="http://www.salon.com/" LAST_VISIT="1010100830"
LAST_MODIFIED="1010100820" LAST_CHARSET="ISO-8859-1">Salon.com</A>

which has the correct ISO charset.  Hmm...I see in man iso_8859-1
"Note that the ISO 8859-1 characters are also the first 256 characters of ISO
10646 (Unicode)."  So I would have naively guessed UTF-8 would not have
rendered it badly.  Oh well.

I do not know exactly why my old salon bookmark has a UTF-8 attribute value
but it would seem that if a website updates / fixes / changes its encoding then
people with legacy bookmarks can be silently screwed.
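
To check an exported bookmarks file for stored charsets without grepping by site name, something like the following Python sketch could be used. It is only an illustration: the class and function names (`CharsetScanner`, `bookmarks_with_charset`) and the `sample` text are made up here, and the only thing taken from the output above is that bookmark anchors may carry a LAST_CHARSET attribute.

```python
from html.parser import HTMLParser


class CharsetScanner(HTMLParser):
    """Collect (href, charset) pairs for anchors that carry LAST_CHARSET."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "a":
            return
        attr = dict(attrs)
        if "last_charset" in attr:
            self.found.append((attr.get("href"), attr["last_charset"]))


def bookmarks_with_charset(html_text):
    """Return every bookmark in html_text that has a stored charset."""
    scanner = CharsetScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.found


# Made-up sample in the style of an exported bookmarks.html.
sample = """<DL><p>
    <DT><A HREF="http://www.salon.com/" ADD_DATE="1009488799" LAST_CHARSET="UTF-8">Salon.com</A>
    <DT><A HREF="http://www.arstechnica.com/">Ars Technica</A>
</DL><p>"""

for href, charset in bookmarks_with_charset(sample):
    print(href, charset)
```

Only the first sample entry is reported, because the second has no stored charset. To scan a real export, read the file's text and pass it to `bookmarks_with_charset` in the same way.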

So I do not have a real workaround yet.

Comment 9

16 years ago
Chris, could you attach the bookmarks.html file to this bug report?

Comment 10

16 years ago
Chris, ISO-8859-1 and UTF-8 encode some characters identically (the ASCII range),
but the encoding of all characters from 0x80 upward is different.
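
A small Python sketch of that difference, for illustration only: é has the same code point (0xE9) in ISO-8859-1 and in Unicode, but UTF-8 stores it as two bytes rather than the single byte E9. The variable names are arbitrary.

```python
text = "Gass\u00e9e"  # "Gassée"; U+00E9 is LATIN SMALL LETTER E WITH ACUTE

latin1_bytes = text.encode("iso-8859-1")  # é becomes the single byte E9
utf8_bytes = text.encode("utf-8")         # é becomes the two bytes C3 A9

print(latin1_bytes)
print(utf8_bytes)

# Characters below 0x80 (plain ASCII) are byte-for-byte identical in both.
ascii_same = "Gass".encode("iso-8859-1") == "Gass".encode("utf-8")
print(ascii_same)

# Reading UTF-8 bytes as ISO-8859-1 does not fail; it shows two wrong
# characters in place of the one intended.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)
```

So sharing the first 256 code points does not make the byte encodings interchangeable: only the ASCII half survives a mismatch in either direction.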

It seems the basic problem you had was the blank value for "Default Character
Encoding".
I think this should never happen, and it should be investigated and FIXED.
As a result you were encountering problems much more often than should be the
case.

None of the pages that you have problems with indicates a charset, so
correct display depends on the "Default Character Encoding" being correct, or
autodetect being enabled and successful, or the user manually selecting the
correct encoding. This choice is then memorised in the bookmark entry so that
the user does not have to reselect it the next time.

The discussion page of arstechnica does indicate utf-8 as its encoding in the web
server headers, so everything works well.
The main page does not indicate anything.
I think the presence of ISO-8859-1 in the main page is rather accidental.

The problem you have seems to be that, as your "Default Character Encoding" was
blank, the display was very often incorrect when you viewed a new page, and
bookmarks for new pages got created with an incorrect charset.

What happens is that when you view several pages without an encoding indication
in a row, the last encoding that was selected is reused.
This is something that works very well most of the time.
If you view a discussion page on arstechnica in utf-8 and have no default
encoding, utf-8 might get reused for the next page on another site, so that
would explain why you were often in utf-8 encoding when viewing new pages.
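
The fallback behaviour described above can be sketched roughly as follows. This is a hypothetical illustration in Python, not Mozilla's actual logic: the function name `pick_charset`, its parameters, and in particular the priority order (declared charset, then bookmark charset, then default, then last used) are assumptions made for the example.

```python
from typing import Optional


def pick_charset(declared: Optional[str],
                 bookmark_charset: Optional[str],
                 default: Optional[str],
                 last_used: Optional[str]) -> Optional[str]:
    """Return the first non-empty candidate, in an ASSUMED priority order."""
    for candidate in (declared, bookmark_charset, default, last_used):
        if candidate:  # None and "" (a blank preference) are both skipped
            return candidate
    return None


# Blank default, page declares nothing: the last used charset leaks through.
blank_default = pick_charset(None, None, "", "UTF-8")

# A stale charset stored in a bookmark wins over a correct default.
stale_bookmark = pick_charset(None, "UTF-8", "ISO-8859-1", None)

# A page that declares its charset is unaffected by any of the fallbacks.
declared_wins = pick_charset("utf-8", "ISO-8859-1", "ISO-8859-1", "ISO-8859-1")

print(blank_default, stale_bookmark, declared_wins)
```

Under these assumptions, the first call mirrors the blank-preference case and the second mirrors the imported-bookmark case reported earlier in this bug.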

Sites usually don't update / fix / change their encoding very often.
Auto-detect is not effective enough to give better results than memorising
the page charset in the bookmark entry.

I've checked that if you select a page from your bookmark entries, and manually
change the encoding to get a correct display, the bookmark entry gets updated
with the correct charset, and everything works well the next time you access the
page.

So for me, the bookmark problem is INVALID/WONTFIX.

Comment 11

16 years ago
In some builds, "Default Character Encoding" was blank.  That has been fixed.
Chris, could you repeat what you did in a recent build to try to reproduce the
problem, and log a separate bug report if it still occurs?  The original problem
works for me, so I am marking this as worksforme.

Status: UNCONFIRMED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → WORKSFORME

Comment 12

16 years ago
Verified as worksforme.
Status: RESOLVED → VERIFIED