Character encoding can be wrong when opening link in new window.

Status: NEW
Product: Core
Component: Internationalization
Opened: 14 years ago
Updated: 3 years ago
Reporter: Daniel Ryde
Assignee: Jungshik Shin
Keywords: intl
Version: Trunk
Whiteboard: dupeme

(Reporter)

Description

14 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 Firebird/0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20031007 Firebird/0.7

A webpage that HAS a specified character encoding (e.g. Google, which uses UTF-8)
contains links to other locations that do NOT have character encodings specified.
Opening one of these links in a NEW WINDOW will render it using the previous
webpage's character encoding instead of the default iso-8859-1.

Reproducible: Always

Steps to Reproduce:
1. Exit FireBird and save its profile: rename "Application Data\Phoenix" to Phoenix_x
2. Start FireBird (a new profile will be created) and go to
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=http%3A%2F%2Fwww.puttes.se%2Fsmorgasar%2Fraksmorgas.htm&btnG=Google+Search

3. Open the search result link in a NEW WINDOW (via the right-click context menu).

Actual Results:  
A webpage that is rendered using the wrong character encoding, utf-8. Many of the
characters are displayed as '?'.

Expected Results:  
It should have displayed a webpage that is rendered using the default character
encoding iso-8859-1.
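
The symptom is consistent with ISO-8859-1 bytes being decoded as UTF-8. A minimal Python sketch of that effect follows; the sample word is an assumption (it is not taken from the page), and a browser may show '?' where Python shows the U+FFFD replacement character.

```python
# Sample text is an assumption; any non-ASCII Latin-1 text behaves similarly.
text = "räksmörgås"

# Bytes as a server would send them for an ISO-8859-1 page.
latin1_bytes = text.encode("iso-8859-1")

# Correct decoding, using the expected default encoding.
as_latin1 = latin1_bytes.decode("iso-8859-1")

# Wrong decoding, using the UTF-8 encoding inherited from the referring page.
# Bytes that are not valid UTF-8 become U+FFFD replacement characters.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

print(as_latin1)
print(as_utf8)
```

Only the three non-ASCII letters are affected; the ASCII letters survive, which matches a page that is mostly readable but sprinkled with '?'.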

It is important to clean the profiles by renaming or removing the "Application
Data\Phoenix" dir, and let it create a new fresh profile each time, since
FireBird saves the character encoding in several files, and this can be very
confusing.

It can be noted that opening the link in a NEW TAB instead will result in a
correctly displayed webpage.

Comment 1

14 years ago
It shows ? no matter how I open the link. I didn't make a new profile, I don't
understand how that should matter.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031206
Firebird/0.7+

Comment 2

14 years ago
Works for me.
Mozilla/5.0 (Windows; U; Win95; en-US; rv:1.6b) Gecko/20031207 Firebird/0.7+

Have you set View --> Character Coding --> Autodetect to Universal? 

I've noticed when I install new builds that it defaults to Off, so, if this
solves it then maybe this needs to be changed so it defaults to Universal.
(Reporter)

Comment 3

14 years ago
Jason:
As I said, FireBird saves the character information in several files (I guess in
cache, and bookmarks), so it will reuse the last encoding for that website next
time you visit it.

Blair:
Ok, setting Autodetect to universal works, almost. The characters are rendered
correctly in this special case, but the charset is Windows-1252, wrong IMHO.
Doesn't the Autodetect feature try to guess the charset depending on the content?
Autodetect is only needed when you have a webpage that does not use the default
(HTML spec.) charset iso-8859-1 and also does not specify which one it is using,
so it will try to guess, and in this case wrongly.

I still think there is an initialization bug lurking around here, especially
considering the difference between opening a new window and opening a new tab.
The latter works correctly, the former incorrectly uses the charset from the
referring page.
(Reporter)

Comment 4

14 years ago
The autodetect feature is dangerous and definitely detects wrong sometimes. I
recently visited a site that got rendered using Chinese Simplified GB18030, but
it should have been iso-8859-1. Entire number series became '?'.
(Assignee)

Comment 5

14 years ago
What's requested here is :

When opening in a new window/new tab, the default charset (specified in the
preference) should be used instead of the 'parent' charset __when no other
source of information exists__. 

This is not always desirable. Your Google example is one of the cases where that's
desirable, but there are other cases. For example, Russian web pages are split
between KOI8-R and Windows-1251, with each taking about an equal share. Russian
users usually set their default encoding to either of them. Suppose the default
encoding is set to Windows-1251 and a link (in KOI8-R encoded page) is requested
to be opened in a new window/tab via the context menu. With what you asked for
implemented, the link would be opened in Windows-1251 even though the chance is
pretty high that it's in KOI8-R (assuming that the link is internal.) The same
can happen with Shift_JIS and EUC-JP for Japanese web pages. It can happen even
for Western European pages (ISO-8859-1 vs ISO-8859-15).
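
As a rough illustration of the KOI8-R vs Windows-1251 case, here is a small Python sketch. The sample word is chosen only for illustration, and `cp1251` is Python's codec name for Windows-1251.

```python
text = "Привет"  # sample word, an assumption chosen for illustration

# Bytes of a KOI8-R page that does not declare its charset.
koi8_bytes = text.encode("koi8_r")

# Decoding with the user's default (Windows-1251) instead of the page's real
# encoding still yields Cyrillic letters, but the wrong ones.
garbled = koi8_bytes.decode("cp1251")

# Decoding with the parent page's encoding (KOI8-R) recovers the text.
recovered = koi8_bytes.decode("koi8_r")

print(garbled)
print(recovered)
```

Unlike the UTF-8 case, nothing is flagged as invalid here, so the page is silently unreadable rather than dotted with '?'.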

Needless to say, if everybody specifies the charset in their pages/sites, there
would be no problem. 

One possible solution would be to use the default encoding instead of the parent
document encoding ONLY if the parent document encoding is UTF-8 (and other
encoding forms of Unicode such as UTF-16, UTF-32) because usually UTF-8 encoded
pages are explicitly tagged.
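
The rule proposed above could be sketched roughly as follows. This is not Mozilla code; the function name, the set of encoding labels, and the use of `None` for "no parent" are all assumptions made for illustration.

```python
# Labels treated as "encoding forms of Unicode" (an assumed, non-exhaustive list).
UNICODE_ENCODINGS = {
    "utf-8", "utf-16", "utf-16le", "utf-16be",
    "utf-32", "utf-32le", "utf-32be",
}


def fallback_charset(parent_charset, default_charset):
    """Pick the charset for a new window when the page itself declares none."""
    if parent_charset is None:
        # No parent document at all: only the user's default is left.
        return default_charset
    if parent_charset.lower() in UNICODE_ENCODINGS:
        # Unicode pages are usually tagged explicitly, so an untagged child
        # is unlikely to share the parent's encoding.
        return default_charset
    # Legacy encodings (KOI8-R, Shift_JIS, ...): keep following the parent.
    return parent_charset


print(fallback_charset("UTF-8", "iso-8859-1"))     # Google case
print(fallback_charset("KOI8-R", "windows-1251"))  # Russian case
```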

> charset is Windows-1252, wrong IMHO.

 Windows-1252 is a proper superset of ISO-8859-1. Are you sure the page in
question doesn't have a single character not covered by ISO-8859-1 but covered
by Windows-1252? Anyway, mistaking ISO-8859-1 for Windows-1252 doesn't do any
harm when rendering the page.
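
For reference, the two encodings agree on every byte except 0x80 to 0x9F, where ISO-8859-1 has C1 control codes and Windows-1252 has printable characters such as the euro sign. A small Python check (the helper name is hypothetical; `cp1252` is Python's codec name for Windows-1252):

```python
def differing_bytes():
    """Return the byte values that cp1252 and ISO-8859-1 decode differently."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        latin1 = raw.decode("iso-8859-1")
        # Python's cp1252 codec leaves a few bytes undefined; errors="replace"
        # turns those into U+FFFD instead of raising an exception.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(value)
    return diffs


diffs = differing_bytes()
print(hex(min(diffs)), hex(max(diffs)))
print(b"\x80".decode("cp1252"))
```

So a page can contain a character covered by Windows-1252 but not by ISO-8859-1 only if it uses a byte in that range.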


re: comment #4
You're right that it doesn't always work. Instead of 'universal charset', you
may want to use one of the more restricted detectors. 
Assignee: blake → jshin
Severity: normal → enhancement
Component: General → Internationalization
Keywords: intl
OS: Windows 2000 → All
Product: Firebird → Browser
Hardware: PC → All
Version: unspecified → Trunk

Updated

14 years ago
Whiteboard: dupeme
(Reporter)

Comment 6

14 years ago
Adding a special case for UTF* might help in the Google case, but not the
others. A more reasonable approach would be to add a "Use parent charset"
setting in the "View -> Character Coding" menu and everyone will be happy,
including the Russians.

Please note that opening in NEW TAB or SAME WINDOW is different from opening in
NEW WINDOW. Thus I still consider this a bug, not an enhancement.

But anyway, there is a contradiction in the argument for using the parent charset.
Is the entire web built from one mother page defining what charset to use? Why
do we have the ability to select a default charset then?

If the Russian "double default character encodings" trouble is to be solved then
we need: selectable multiple default character encodings (that work rather
similarly to Auto-Detect).

Um, BTW, there is an Auto-Detect - Russian. Doesn't this work for the Russian
websites?
(Assignee)

Comment 7

14 years ago
Can you tell me why opening in a new tab is different from opening in a new
window? Also, can you tell me your scenario where using 'the default' charset is
better? 

> But anyway, there is contradiction in the argument of using 
> the parent charset. Is the entire web build from one mother page 
> defining what charset to use? Why
> do we have the ability to select a default charset then?

  You didn't pay attention to the 'when NO other source of information is
available' part. Mozilla relies on several different (actually almost 10) sources
of information to determine the document charset. The parent charset and the
default charset take rather __low__ priority in that mechanism. 
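
A prioritized lookup of that kind might look like the sketch below. The source names and their order are assumptions pieced together from this thread, not the actual Mozilla list; the only point taken from the comment is that parent and default sit near the bottom.

```python
from typing import Optional, Sequence, Tuple


def resolve_charset(sources: Sequence[Tuple[str, Optional[str]]]) -> Tuple[str, str]:
    """Return (source name, charset) for the first source that has a value."""
    for name, charset in sources:
        if charset:
            return name, charset
    raise ValueError("no charset source produced a value")


# Hypothetical ordering, highest priority first.
sources = [
    ("http-header", None),
    ("meta-tag", None),
    ("cached", None),
    ("autodetect", None),      # Autodetect set to Off
    ("parent", "utf-8"),       # charset of the referring page
    ("default", "iso-8859-1"), # user preference
]

print(resolve_charset(sources))
```

With every higher-priority source empty, the parent charset wins over the default, which is the behavior reported in this bug.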
(Reporter)

Comment 8

14 years ago
> Can you tell me why opening in a new tab is different from opening in a new
> window?

Why? I have not looked at the source, but I guess it's a bug. If you follow the
link http://www.ryde.net/bug/link.html there is a "how-to" to reproduce the bug.
Important, please note: the profile needs to be removed between the tests.

This is the very problem: Open in SAME WINDOW or NEW TAB works perfectly, but
open in NEW WINDOW does not.

> Also, can you tell me your scenario where using 'the default' charset is
> better?

Because this is the most common charset for the pages _I_ visit when __no other
source of information exists__. And I don't count the referring page as relevant.
(Reporter)

Comment 9

14 years ago
Ok, it might be relevant if the parent (referring) page is in the same
domain as the new page; then the parent charset can be used if no other source of
information exists.
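
A rough sketch of that same-domain condition, in Python. The function names are hypothetical, and comparing full hostnames is a simplification (it treats www.example.com and example.com as different sites).

```python
from urllib.parse import urlsplit


def same_host(parent_url, child_url):
    """True if both URLs have the same (case-insensitive) hostname."""
    parent_host = urlsplit(parent_url).hostname
    child_host = urlsplit(child_url).hostname
    return parent_host is not None and parent_host == child_host


def fallback_charset(parent_url, child_url, parent_charset, default_charset):
    """Use the parent charset only for links that stay on the same host."""
    if parent_charset and same_host(parent_url, child_url):
        return parent_charset
    return default_charset


# The case from the description: a Google result page linking to another site.
print(fallback_charset(
    "http://www.google.com/search?hl=en",
    "http://www.puttes.se/smorgasar/raksmorgas.htm",
    "utf-8",
    "iso-8859-1",
))
```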
(Reporter)

Comment 10

14 years ago
In Firefox 0.9 the situation is even worse. Instead of printing '?' for the
misencoded characters it randomly removes entire words and sentences.
Severity: enhancement → normal

Comment 11

13 years ago
I really think that it is a (serious) bug.

If you:
1. enter google.com;
2. hit Ctrl+n to open a new window;
3. enter bol.com.br in the new window.

The new window will use the encoding specified in google.com.

This is odd. First of all, this new window isn't related to the old window.
And, of course, the user has chosen to use the *default* encoding (i.e., the
user has chosen to *not use* the autodetect encoding feature).

Why shouldn't a new window use the default encoding?
I agree with Daniel.

It simply does not make any sense that a window inherits properties from an
unrelated window. Even though the bug disappears with the autodetect setting,
this is not the default option! Another point is that, even if the
autodetect setting is turned off, if you open google.com, close Firefox and open
bol.com.br, the bug does not show up. That shows us that the browser works with
bol.com.br with the autodetect setting off, and that the bug is caused by
the fact that Firefox is not correctly handling the different encodings of
the two pages. I do not understand why this bug is still unconfirmed, since so
many people are complaining about it.

Comment 13

13 years ago
(In reply to comment #11)
> I really think that it is a (serious) bug.
> 
> If you:
> 1. enter google.com;
> 2. hit Ctrl+n to open a new window;
> 3. enter bol.com.br in the new window.
> 
> The new window will use the encoding specified in google.com.

That is bug 158285.

Comment 14

13 years ago
*** Bug 266440 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 15

13 years ago
How about this? If the character encoding of the current (parent-to-be)
document/window is UTF-8 (or other forms of Unicode), a new window will be
opened without any pre-set character encoding so that the default character
encoding (set in the user's pref.) will be used. This will not fix all the
problems, but will solve most of problems. 

Why can't I just do the above for other encodings? Well, the 'follow the
parent/referer encoding' heuristic was introduced because it's needed (for
Japanese and Russians) and I don't want to break cases that need it.

Comment 16

13 years ago
(In reply to comment #15)
> How about this? If the character encoding of the current (parent-to-be)
> document/window is UTF-8 (or other forms of Unicode), a new window will be
> opened without any pre-set character encoding so that the default character
> encoding (set in the user's pref.) will be used. This will not fix all the
> problems, but will solve most of problems. 
> 
I think the real problem here is "what should the charset be
when new window is opened and the page doesn't contain charset info?"

We had number of cases where the parent-to-be page had a charset; but
child pages had no charset information.  (If bol.com.br has meta-charset, 
then it will be displayed correctly even after ctl+n from google.com)

The real fix is to have meta-charset in all pages.

Comment 17

13 years ago
(In reply to comment #16)
> I think the real problem here is "what should the charset be
> when new window is opened and the page doesn't contain charset info?"

Yes, exactly.

Why not use the character encoding defined as default in user preferences? I
think this is the behaviour that I expect.

Comment 18

13 years ago
(In reply to comment #17)
> Why not use the character encoding defined as default in user preferences? 
I can't agree with you 100%.

At some point in time, I believe, Netscape/Mozilla used the char encoding defined 
in user pref.  However, that behavior caused the problem that pages 
without a meta-charset were always displayed using the default.   
Apparently, users wanted the SMARTer behavior of inheriting 
the charset from the parent page.
( our ex-netscape evangelist momoi-san may be able to give us more info )

Having said that, I agree with comment #12 where 
"It simply does not make any sense that a window inherits properties 
from a not related window."   The key is __NOT RELATED WINDOW___

The current implementation fixes the encoding problem under the assumption 
that the new window is related to the parent. 
(Incidentally, we decided NOT to inherit properties for a new tab window.
 Tab browsing was introduced later in the dev cycle)

IMHO, THE REAL FIX IS TO HAVE META-CHARSET IN ALL HTML PAGES and 
we should close this bug. ( I am sure the same bug will re-surface again 
in future though.... )
Related to this seems to be bug 158285.

Comment 20

12 years ago
There should be a way to set the encoding, because we live in the real world,
and many sites are only checked with IE.
If we want to get Mozilla/Firefox to be used internationally,
we must make most of the pages usable as they are.
So let me override any settings, if I know what I'm doing.
I hate to set the encoding to ISO 8859-1 on each reload.
I tested it with the new 30gigs.com mail system in German.
There is no encoding set. The umlauts are symbols (? in a box) here.
I set from Unicode to ISO, press reload, set from Unicode to ISO
and so on.
This is a kind of NONSENSE too!
(In reply to comment #20)
> There should be a way to set the encoding, because we live in the real world,
> and many sites are only checked with IE.
> If we want to get Mozilla/Firefox to be used internationally,
> we must make most of the pages usable as they are.

Maybe to make the behaviour more similar to IE you should make the browser autodetect the character encoding. In my case (*), I could go to the menu "View -> Character Encoding -> Auto Detect" and choose Universal. I would also advise you to contact the webmaster of the site you are visiting to make the problem known to all concerned parties.


> So let me override any settings, if I know what I'm doing.
> I hate to set the encoding to ISO 8859-1 on each reload.
> I tested it with the new 30gigs.com mail system in German.
> There is no encoding set. The umlauts are symbols (? in a box) here.
> I set from Unicode to ISO, press reload, set from Unicode to ISO
> and so on.
> This is a kind of NONSENSE too!

In general, when the character encoding is not specified through the HTTP response headers or through an appropriate HTML meta tag, the default encoding in Mozilla Firefox (and other Mozilla browsers, I think) is ISO-8859-1, as the HTML standards suggest. This behaviour can be changed. In my case (*), I can go to the options panel (menu "Edit -> Preferences") and in the "General" section select "Languages", where I can change the default character encoding.
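
To check whether a page declares its encoding at all, something like the following Python sketch can help. It is only an approximation with regular expressions, not how a browser parses pages; the function name and the 1024-byte scan window are assumptions.

```python
import re

# Matches charset=... inside a <meta> tag (old http-equiv form or <meta charset>).
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)
HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(content_type, body):
    """Return the charset declared via HTTP or <meta>, or None if undeclared."""
    if content_type:
        match = HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # Only look at the start of the document, where a meta tag would normally be.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(declared_charset("text/html; charset=UTF-8", b""))
print(declared_charset("text/html", b"<html><head><title>x</title></head>"))
```

A result of `None` means the browser has to fall back on something else (autodetection, the parent charset, or the default), which is the situation this bug is about.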

The fact that you lose the character encoding when you reload is strange but unrelated to this bug. In principle, the character encoding is cached, which means that when revisiting that page or when reloading normally (and maybe even when force-reloading), the character encoding is preserved. Are you sure that the encoding is not set through HTTP or through HTML? If it is not, you should look for a bug report that matches your description or open a new one.

I cannot make further diagnosis because I cannot visit <http://3gigs.com> in German. In English, the character encoding is specified --at least-- through HTML meta tags... Since your report is not relevant to the bug described in this page, please continue your enquiries elsewhere, unless you have not expressed yourself correctly. Feel free to contact me via e-mail for help regarding this topic, or ask around the Mozillazine forums [http://forums.mozillazine.org/].

Cheers.


(*) I am using Mozilla Firefox 1.0.7 under Debian GNU/Linux, English version.
QA Contact: i18n