Closed
Bug 264871
Opened 21 years ago
Closed 14 years ago
Universal autodetect needs to be on by default (for all users)
Categories
(Core :: Internationalization, defect)
RESOLVED
WONTFIX
People
(Reporter: mnemo, Assigned: smontagu)
Details
(Keywords: intl)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
This problem is NOT related only to the URL that I just gave. I have been
running into this problem every now and then on different servers. Sometimes
Swedish chars appear correctly though, on most servers in fact, so I guess it
might be related to some webserver setting (MIME or whatnot?).
Reproducible: Always
Steps to Reproduce:
1. Surf to the URL I gave.
2. Look at the question marks appearing everywhere where there should have been
åäö chars.
Actual Results:
Nothing. I just chilled.
Expected Results:
It should have displayed the åäö chars instead of question marks.
Wrong character encoding: the URL is shown as UTF-8; with ISO-8859-1/-15 it is
ok. Because the server sends no encoding and the page itself contains no info
about it, Mozilla has to use the default, auto-detect, or the last-used encoding.
Do you have Auto-detect on? (see view -> character encoding)
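To illustrate the mismatch described above, here is a minimal Python sketch. It assumes the page bytes are ISO-8859-1 and that the browser decodes them as UTF-8; the sample string is made up for the example and is not taken from the reported URL.

```python
# Bytes as an ISO-8859-1 page would send them (sample text is an assumption).
raw = "åäö".encode("iso-8859-1")

# Decoding as UTF-8 fails for every byte, so each one becomes U+FFFD,
# which browsers typically render as a question-mark glyph.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the encoding the page was actually written in works.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(as_latin1)
```

Each of the three bytes is invalid on its own in UTF-8, which is why every Swedish letter turns into a separate replacement character.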
Ok, when I turned out automatic char encoding it looks right.
However, this might have revealed another bug. When I first noticed this bug I
kept changing the char encoding at edit -> prefs -> navigator -> language, but
even when I changed that to 8859 it did not work, and I did not even find the
setting in the view menu. Maybe these two places should be connected somehow, so if one
is looking for char encoding options one could click some button in prefs and
modify the view menu option too. Why are there two anyway? The one in the prefs
does not seem to have a function.
Also, I strongly recommend that you change the default to automatic encoding.
Here in Sweden MSIE users are complaining a lot about the åäö issue; in fact it's
their main argument for MSIE versus Mozilla/Firefox. Fixing this default setting
would radically improve Mozilla's ability to capture a larger userbase in Sweden!
(and I assume also in Norway, Denmark and quite a few other countries)
Comment 3•21 years ago
Turning off automatic char encoding detection should not make the page look
right.
If you find pages that automatic detection cannot identify correctly, you
should enter bugs in Browser/internationalisation for that.
As far as I can tell, Universal autodetect identifies the page you reference
correctly as windows-1252, so I have the feeling it was turning it on that made
the page display correctly, and for most users of European charsets the browser
works better with automatic detection turned on.
The encoding in the pref is the one used when there is no other source to use to
make a decision about what character encoding to use, it is of little use as
there most often is something else involved.
I think you are hit by the fact that when reaching a page through a link, Mozilla
reuses the encoding of the last page viewed, not the one from the preference.
What makes this bad is that Google uses UTF-8 to display results, while the
referenced pages use ISO-8859-1.
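The fallback chain described in this comment can be sketched roughly as follows. This is only an illustration, not Gecko's code: the function name, the parameter names and the exact ordering between auto-detection and the previous page's encoding are assumptions; the thread only states that the pref default is used last and that the last page's encoding wins over it when following a link.

```python
from typing import Optional


def pick_encoding(
    http_charset: Optional[str],
    meta_charset: Optional[str],
    detected: Optional[str],
    previous_page: Optional[str],
    pref_default: str,
) -> str:
    """Return the first available encoding source (hypothetical ordering)."""
    for candidate in (http_charset, meta_charset, detected, previous_page):
        if candidate:
            return candidate
    # The pref is consulted only when nothing else is available.
    return pref_default


# Following a link from a UTF-8 results page to an undeclared ISO-8859-1
# page, with auto-detect off: the previous page's encoding is reused.
print(pick_encoding(None, None, None, "UTF-8", "ISO-8859-1"))

# Same page with a detector that recognises the content.
print(pick_encoding(None, None, "windows-1252", "UTF-8", "ISO-8859-1"))
```

Under these assumptions the pref value is rarely reached, which matches the observation that changing it appeared to have no effect.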
Component: Browser-General → Accessibility APIs
Comment 4•20 years ago
If things work better with automatic detect, why isn't it on by default? I can't
reproduce this error myself but have heard complaints about it several times
before. It is a hurdle for Mozilla/Firefox acceptance here (it affects both).
If the solution to this issue is to turn on automatic detection, we need to
negotiate permission to turn this on by default in the Swedish localized
versions, preferably before Firefox 1.0... at least if it's not going to be
turned on by default in all versions of Mozilla and Firefox.
If this is not going to be solved in general, it needs to be solved in the sv-SE
l10n version, so if someone can inform me of the destiny of this bug ASAP I'd be
happy.
Keywords: intl
I completely agree, it needs to be the default in some context. Afaik far from
everyone in Sweden uses a 'localized' version though, so my hope is that
enabling automatic detection in the international version is feasible.
Comment 6•20 years ago
*** Bug 262315 has been marked as a duplicate of this bug. ***
Comment 7•20 years ago
From bug 262315 I've found a way to reproduce this which always works:
1. Go to www.seb.se
2. Click 'Börs & finans'
The page which loads (in a new window) has <?> for Swedish characters. Enabling
Auto-Detect resolves the problem.
Comment 8•20 years ago
Several other languages have the same problem. This needs to be resolved for all
languages, not just Swedish.
I'll post a separate bug about a temporary sv-SE-only solution.
Status: UNCONFIRMED → NEW
Component: Accessibility APIs → Browser-General
Ever confirmed: true
Comment 9•20 years ago
Should we consider this now to be a duplicate of bug 267644 that you entered ?
General comment :
French, German, Swedish and probably users of some other nationalities all have the
problem that the current heuristic for charset determination fails too frequently.
Turning auto-detect on by default certainly helps lower the failure rate and
gives a better user experience.
But there are several bugs recorded for auto-detect, and nobody is working on
correcting them. This makes it difficult to turn auto-detect on universally at
the moment.
Another option could be to add an option to always use the default charset
for new pages. This would be a separate bug, and it remains to be proven whether
it would actually work better than the current solution.
Summary: Swedish chars appear as question marks instead of åäö etc → Universal autodetect needs to be on by default (for Swedish users)
Comment 10•20 years ago
(In reply to comment #9)
> Should we consider this now to be a duplicate of bug 267644 that you entered ?
No. I mean bug 267644 as a workaround-fix-bug for the sv-SE build. Many Swedish
people consider themselves proficient in English and use en-US builds. Other
languages are affected too (see the mozilla-l10n list). Thus this is a bug and
needs to be resolved somehow, not only in the sv-SE builds but in all builds.
Whether or not it can be solved now is a different matter, but that still
doesn't make it a duplicate. I should have more time myself in a month or two,
so I'll try to take a look at the autodetect bugs.
As for the always-use-the-default-charset idea, I'd bet that's what IE does. It
also seems the reasonable way to do things: the default charset isn't really
default unless it's used by default when opening pages...
Comment 11•20 years ago
OK, I'm changing this to a request to turn Universal Detector on by default for
everybody, and setting a list of known problems with auto-detection as blocking it.
I might be missing some more.
The most surprising thing, in fact, is that there isn't already an open bug
requesting that, but I never saw one.
Reporter | ||
Comment 12•20 years ago
What are the current problems with auto-detect anyway?
Comment 13•20 years ago
See bugs listed as blocking this bug.
OS: Windows 2000 → All
Hardware: PC → All
Comment 14•20 years ago
One remark about this bug is that auto-detect apparently slows down page loading
in a non-negligible way, so turning it on by default would impact performance, and
this could be a problem even if the other points are solved.
Depends on: 306272
Updated•19 years ago
Reporter | ||
Comment 15•19 years ago
This bug is a showstopper for all users who speak a language that uses non-English letters. Still, it's been years without anyone fixing this? In a commercial product this would have been fixed long long long ago (indeed neither MSIE nor Opera has this problem). Why bother to report bugs at all when the devs just arrogantly ignore them like this? What is the problem with this particular bug anyway? Is it exceptionally hard to fix? Or just exceptionally boring?
Sorry for the political rant but I find this very annoying.
Comment 16•19 years ago
We seem to be making some progress on a few of the bugs blocking this one (bug 306272, whose fix seems to help bug 168526 too, and bug 177505, which was fixed a while ago by JGM).
OTOH I have to say I seem to have been seeing a regression for a while now. I now frequently see ISO-8859-1 content mis-identified as Cyrillic (windows-1251), which didn't happen before, especially for some reason inside spam. Unfortunately I don't have the time right now to create a good bug entry on that (or to check if there's already a duplicate), and to try to find a regression window.
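For readers unfamiliar with this kind of mis-identification, here is a small Python sketch of what it looks like. The sample text is an assumption chosen for illustration; it is not taken from any of the affected pages or mails.

```python
# ISO-8859-1 bytes for three Swedish letters (sample text is an assumption).
raw = "åäö".encode("iso-8859-1")

# If a detector wrongly picks windows-1251, the same bytes decode without
# any error, but as Cyrillic letters instead of the intended ones.
misread = raw.decode("windows-1251")

print(misread)
```

Because both encodings assign a letter to almost every high byte, the wrong guess produces plausible-looking text rather than an error, so a detector has only statistical hints to go on.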
Updated•18 years ago
Assignee: general → smontagu
QA Contact: general → i18n
Comment 17•14 years ago
I think this bug should be WONTFIXed. We have gone years and years without having chardet enabled by default for non-CJK, non-Cyrillic locales. So has IE.
Hence, to work in the browsers with the largest market share, sites have had to avoid relying on sniffing. There may be occasional isolated glitches with the admins of Swedish, etc., sites goofing, but the kind of legacy reliance on heuristic detection that the CJK and Cyrillic locales suffer from hasn't logically had a chance to arise for other locales. Also, a crucial difference between CJK/Cyrillic and other locales is that CJK/Cyrillic have multiple legacy encodings (legacy being anything but UTF-8) while the English, Swedish, Finnish, French, German, etc. locales have only had one serious legacy encoding (Windows-1252), so there's just one thing to fall back on when UTF-8 hasn't been declared.
Enabling chardet for all locales would make it possible for authors to rely on chardet in more locales. Building this kind of reliance where there isn't such a reliance already would be bad for
* user experience
* performance
* interoperability.
It would be bad for interoperability, because heuristic detection isn't well defined and, thus, would be different in subtle ways in different browsers.
It would be bad for performance, because there'd be more code to run.
It would be bad for user experience, because there are two ways to do heuristic detection and both are bad. One option involves buffering up data before showing the page. This is bad for the UX, because the user starts seeing content later. The other option (the Firefox way) is reloading the page. This takes longer overall when the reload happens, looks ugly and messes up focus if the user has already started interacting with the page.
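The buffering option can be sketched as below. This is a deliberately simplified illustration and not how Gecko's chardet works: the function name, the 1024-byte limit and the two-way choice between UTF-8 and windows-1252 are all assumptions made for the example.

```python
import codecs
from typing import Iterable, Tuple


def sniff_after_buffering(chunks: Iterable[bytes], limit: int = 1024) -> Tuple[str, bytes]:
    """Hold back up to `limit` bytes, then guess an encoding (hypothetical)."""
    buffered = b""
    for chunk in chunks:
        buffered += chunk
        if len(buffered) >= limit:
            break  # Nothing has been shown to the user up to this point.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte sequence cut off by the limit.
        decoder.decode(buffered, final=False)
        guess = "utf-8"
    except UnicodeDecodeError:
        guess = "windows-1252"
    return guess, buffered


guess, held = sniff_after_buffering([b"<p>", b"\xe5\xe4\xf6", b"</p>"])
print(guess, len(held))
```

The cost shows up in the loop: rendering cannot begin until enough bytes have arrived to make the guess, which is the delay the comment objects to.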
Comment 18•14 years ago
Back when this bug was filed, this was a big problem since many of the major Swedish web sites had this problem (one of the examples in the comments above is one of Sweden's major banks). We switched on auto-detect for the Swedish version, and it's been on since.
The web has evolved a long way from then, and I have seen no indications in recent years that we actually still need it.
Assignee | ||
Updated•14 years ago
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX