264871 - Universal autodetect needs to be on by default (for all users)

Reporter

Description

•

21 years ago

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.3) Gecko/20040910

This problem is NOT related only to the URL that I just gave. I have been running into this problem every now and then at different servers. Sometimes Swedish chars appear correctly though, on most servers in fact, so I guess it might be related to some webserver setting (MIME or whatnot?).

Reproducible: Always

Steps to Reproduce:
1. Surf to the URL I gave.
2. Look at the question marks appearing everywhere where there should have been åäö chars.

Actual Results: Nothing. I just chilled.

Expected Results: It should have displayed the åäö chars instead of question marks.

OstGote!

Comment 1

•

21 years ago

Wrong character encoding: the page is displayed as UTF-8; with ISO-8859-1/-15 it is OK. Because the server sends no encoding and the page itself contains no info about it, Mozilla has to use the default, auto-detect, or the last used encoding. Do you have Auto-Detect on? (See View -> Character Encoding.)
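
A minimal Python sketch of the mismatch described above, assuming a made-up sample string rather than the actual page: bytes written as ISO-8859-1 are not valid UTF-8, so a UTF-8 decode substitutes a replacement character for them, while an ISO-8859-1 decode recovers the text.

```python
# Hypothetical sample text; not taken from the reported page.
sample = "åäö"

# Bytes as an ISO-8859-1 page would carry them on the wire.
raw = sample.encode("iso-8859-1")

# Decoding with the wrong label: these bytes do not form valid UTF-8,
# so errors="replace" substitutes U+FFFD for them.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding with the right label recovers the original text.
as_latin1 = raw.decode("iso-8859-1")

print(repr(as_utf8))
print(as_latin1)
```

A browser that renders U+FFFD as a question mark would show roughly what the description reports.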

martin

Reporter

Comment 2

•

21 years ago

Ok, when I turned out automatic char encoding it looks right. However, this might have revealed another bug. When I first noticed this bug I kept changing the char encoding at Edit -> Prefs -> Navigator -> Language, but even if I changed that to 8859 it did not work, and I did not even find the thing in the View menu. Maybe these two places should be connected somehow, so if one is looking for char encoding options one could click some button in prefs and modify the View menu option too. Why are there two anyway? The one in the prefs does not seem to have a function?

Also, I strongly recommend that you change the default to automatic encoding. Here in Sweden MSIE users are complaining A LOT about the åäö issue; in fact it's their main argument for MSIE versus Mozilla/Firefox. Fixing this default setting would radically impact Mozilla's ability to capture a larger userbase in Sweden! (And I assume also in Norway, Denmark and quite a few other countries.)

Jean-Marc Desperrier

Comment 3

•

21 years ago

Turning off automatic character encoding detection should not make the page look right. If you find pages that automatic detection cannot identify correctly, you should file bugs in Browser/Internationalization for that. As far as I can tell, Universal autodetect identifies the page you reference correctly as windows-1252, so I have the feeling it was turning it on that made the page display correctly, and for most users of European charsets the browser works better with automatic detection turned on.

The encoding in the pref is the one used when there is no other source to use to decide which character encoding to apply; it is of little use, as there is most often something else involved. I think you are hit by the fact that when reaching a page through a link, Mozilla reuses the encoding of the last page viewed, not the one from the preference. What makes this bad is that Google uses UTF-8 to display results, while the referenced pages use ISO-8859-1.
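
The fallback behaviour described in this comment can be sketched roughly as follows. This is only an illustration in Python, not Mozilla's code: the function name, its parameters, and the exact priority between the detector and the previous page's encoding are all assumptions made for the example.

```python
from typing import Optional


def pick_encoding(declared: Optional[str],
                  detected: Optional[str],
                  previous_page: Optional[str],
                  pref_default: str) -> str:
    """Hypothetical decision order for choosing a page's encoding.

    declared      -- label from the HTTP header or the page itself, if any
    detected      -- detector's guess, or None when auto-detect is off
    previous_page -- encoding of the page the link was followed from
    pref_default  -- encoding configured in the preferences
    """
    # Assumed order: the first available source wins, and the
    # preference is consulted only when nothing else is available.
    for candidate in (declared, detected, previous_page):
        if candidate:
            return candidate
    return pref_default


# Following a link from a UTF-8 results page to an unlabeled page,
# with auto-detect off: the previous page's encoding wins over the pref.
print(pick_encoding(None, None, "UTF-8", "ISO-8859-1"))

# Same navigation with auto-detect on and a detector guess available.
print(pick_encoding(None, "windows-1252", "UTF-8", "ISO-8859-1"))
```

Under these assumptions the preference only matters when a page is opened with no label, no detector result, and no previous page, which matches the remark that the pref is "of little use".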

Component: Browser-General → Accessibility APIs

Mikael Hedberg

Comment 4

•

21 years ago

If things work better with automatic detection, why isn't it on by default? I can't reproduce this error myself, but I have heard complaints about it several times before. It is a hurdle for Mozilla/Firefox acceptance here (it affects both). If the solution to this issue is to turn on automatic detection, we need to negotiate permission to turn it on by default in the Swedish localized versions, preferably before Firefox 1.0, at least if it's not going to be turned on by default in all versions of Mozilla and Firefox. If this is not going to be solved in general, it needs to be solved in the sv-SE l10n version, so if someone can inform me of the fate of this bug ASAP I'd be happy.

Keywords: intl

martin

Reporter

Comment 5

•

21 years ago

I completely agree, it needs to be the default in some context. AFAIK, far from everyone in Sweden uses a 'localized' version though, so my hope is that enabling automatic detection in the international version is feasible.

Mikael Hedberg

Comment 6

•

21 years ago

*** Bug 262315 has been marked as a duplicate of this bug. ***

Mikael Hedberg

Comment 7

•

21 years ago

From bug 262315 I've found a way to reproduce this which always works:
1. Go to www.seb.se
2. Click 'Börs & finans'

The page which loads (in a new window) has <?> for Swedish characters. Enabling Auto-Detect resolves the problem.

Mikael Hedberg

Comment 8

•

21 years ago

Several other languages have the same problem. This needs to be resolved for all languages, not just Swedish. I'll post a separate bug about a temporary sv-SE-only solution.

Status: UNCONFIRMED → NEW

Component: Accessibility APIs → Browser-General

Ever confirmed: true

Jean-Marc Desperrier

Comment 9

•

21 years ago

Should we consider this now to be a duplicate of bug 267644 that you entered?

General comment: French, German, Swedish and probably some other nationalities' users all have the problem that the current heuristic for charset determination fails too frequently. Turning auto-detect on by default certainly helps lower the failure rate and gives a better user experience. But there are several bugs recorded for auto-detect, and nobody is working on correcting them. This makes it difficult to turn auto-detect on universally at the moment.

Another option could be to add an option to always use the default charset for new pages. This would be a separate bug, and it remains to be proven whether it would actually work better than the current solution.

Summary: Swedish chars appear as question marks instead of åäö etc → Universal autodetect needs to be on by default (for Swedish users)

Mikael Hedberg

Comment 10

•

21 years ago

(In reply to comment #9)
> Should we consider this now to be a duplicate of bug 267644 that you entered ?

No. I mean bug 267644 as a workaround fix for the sv-SE build. Many Swedish people consider themselves good at English and use en-US builds. Other languages are affected too (see the mozilla-l10n list). Thus this is a bug and needs to be resolved somehow, not only in the sv-SE builds but in all builds. Whether or not it can be solved now is a different matter, but that still doesn't make it a duplicate.

I should have more time myself in a month or two, so I'll try to take a look at the autodetect bugs. As for the always-use-the-default-charset idea, I'd bet that's what IE does. It also seems the reasonable way to do things - the default charset isn't really default unless it's used by default when opening pages...

Jean-Marc Desperrier

Comment 11

•

21 years ago

OK, I'm changing this to a request to turn Universal Detector on by default for everybody, and setting a list of known problems with auto-detection as blocking it. I might be missing some more. The most surprising thing, in fact, is that there isn't already an open bug requesting that, but I never saw one.

Component: Browser-General → Internationalization

Depends on: 168526, 177505, 178495, 181344

Summary: Universal autodetect needs to be on by default (for Swedish users) → Universal autodetect needs to be on by default (for all users)

martin

Reporter

Comment 12

•

21 years ago

What are the current problems with auto-detect anyway?

Jungshik Shin

Comment 13

•

21 years ago

See bugs listed as blocking this bug.

OS: Windows 2000 → All

Hardware: PC → All

Jean-Marc Desperrier

Updated

•

21 years ago

Depends on: 285435

Jean-Marc Desperrier

Comment 14

•

20 years ago

One remark about this bug is that auto-detect apparently slows down page loading in a non-negligible way, so turning it on by default would impact performance, and this could be a problem even if the other points are solved.

Smokey Ardisson (offline for a while; not following bugs - do not email)

Updated

•

20 years ago

Depends on: 306272

Jean-Marc Desperrier

Updated

•

20 years ago

Depends on: 301915, 306224
No longer depends on: 181344

martin

Reporter

Comment 15

•

19 years ago

This bug is a showstopper for all users that speak a language which uses non-English letters. Still it's been like years without anyone fixing this? In a commercial product this would have been fixed long long long ago (indeed neither MSIE nor Opera has this problem). Why bother to report bugs at all when the devs just arrogantly ignore them like this? What is the problem with this particular bug anyway? Is it exceptionally hard to fix? Or just exceptionally boring? Sorry for the political rant but I find this very annoying.

Jean-Marc Desperrier

Comment 16

•

19 years ago

We seem to be making some progress on a few of the bugs blocking this one (bug 306272, whose fix seems to help bug 168526 too, and bug 177505, which was fixed a while ago by JGM). OTOH I have to say I seem to have been seeing a regression for a while now. I now frequently see ISO-8859-1 content misidentified as Cyrillic (windows-1251), which didn't happen before, especially, for some reason, inside spam. Unfortunately I don't have the time right now to create a good bug entry on that (or to check whether there's already a duplicate), or to try to find a regression window.
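
To illustrate why this kind of misidentification is possible at all, here is a small Python sketch. It assumes a made-up sample string and does not reproduce the detector's logic: it only shows that the same bytes decode without error under both encodings, so a detector has nothing but statistics to go on.

```python
# Hypothetical sample; not taken from any affected page or message.
sample = "åäö"
raw = sample.encode("iso-8859-1")

# Both decodes succeed: every byte in this range maps to a letter
# in ISO-8859-1 and to a Cyrillic letter in windows-1251.
as_latin1 = raw.decode("iso-8859-1")
as_cyrillic = raw.decode("windows-1251")

print(as_latin1)
print(as_cyrillic)
```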

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

19 years ago

Assignee: general → smontagu

QA Contact: general → i18n

Henri Sivonen (:hsivonen)

Comment 17

•

15 years ago

I think this bug should be WONTFIXed.

We have gone years and years without having chardet enabled by default for non-CJK, non-Cyrillic locales. So has IE. Hence, to work in the browsers with the largest market share, sites have had to avoid relying on sniffing. There may be occasional isolated glitches with the admins of Swedish, etc., sites goofing, but the kind of legacy reliance on heuristic detection that the CJK and Cyrillic locales suffer from hasn't logically had a chance to arise for other locales.

Also, a crucial difference between CJK/Cyrillic and other locales is that CJK/Cyrillic have multiple legacy encodings (legacy being anything but UTF-8), while the English, Swedish, Finnish, French, German, etc. locales have only had one serious legacy encoding (Windows-1252), so there's just one thing to fall back on when UTF-8 hasn't been declared.

Enabling chardet for all locales would make it possible for authors to rely on chardet in more locales. Building this kind of reliance where there isn't such a reliance already would be bad for
* user experience
* performance
* interoperability.

It would be bad for interoperability, because heuristic detection isn't well defined and, thus, would be different in subtle ways in different browsers.

It would be bad for performance, because there'd be more code to run.

It would be bad for user experience, because there are two ways to do heuristic detection and both are bad. One option involves buffering up data before showing the page. This is bad for the UX, because the user starts seeing content later. The other option (the Firefox way) is reloading the page. This takes longer overall when the reload happens, looks ugly, and messes up focus if the user has already started interacting with the page.
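
The "just one thing to fall back on" point can be sketched in a few lines of Python. This is a hypothetical illustration, not browser code: the `decode_page` helper and the `FALLBACK` constant are names invented for the example, and the fallback is fixed to windows-1252 as an assumption for a Western European locale.

```python
from typing import Optional

# Assumed single legacy fallback for a Western European locale.
FALLBACK = "windows-1252"


def decode_page(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode with the declared label if present, else the fixed fallback.

    No sniffing is involved: the result depends only on the declaration.
    """
    encoding = declared or FALLBACK
    # errors="replace" keeps the sketch from raising on bytes that are
    # not valid in the chosen encoding.
    return raw.decode(encoding, errors="replace")


# Unlabeled legacy page: the fallback decodes it correctly.
print(decode_page("åäö".encode("windows-1252")))

# Page that declares UTF-8: the declaration is honoured.
print(decode_page("åäö".encode("utf-8"), declared="utf-8"))
```

With this approach an unlabeled UTF-8 page would decode incorrectly, which is the kind of pressure on authors to declare their encoding that the comment describes.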

Mikael Hedberg

Comment 18

•

15 years ago

Back when this bug was filed, this was a big problem since many of the major Swedish web sites had this problem (one of the examples in the comments above is one of Sweden's major banks). We switched on auto-detect for the Swedish version, and it's been on since. The web has evolved a long way from then, and I have seen no indications in recent years that we actually still need it.

Simon Montagu :smontagu

Assignee

Updated

•

15 years ago

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → WONTFIX

Categories

(Core :: Internationalization, defect)

People

(Reporter: mnemo, Assigned: smontagu)

References

(
URL
)

Details

(Keywords: intl)

Security

(public)
