Closed
Bug 248304
Opened 20 years ago
Closed 19 years ago
Poor text-encoding guessing for sites with no specified encoding (e.g. pound signs on UK sites)
Categories
(Camino Graveyard :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
Camino0.9
People
(Reporter: stuart.morgan+bugzilla, Assigned: sfraser_bugs)
Attachments
(2 files)
257 bytes, text/html
10.62 KB, patch (mikepinkerton: review+)
There have been several forum complaints about garbled text, especially numbers, on sites that don't have an encoding set (see the "More Buying Choices" box on the right of the test URL, before and after setting text encoding to ISO Latin 1). Changing the encoding manually works, but is a pain for users if it's a site they use often or navigate around very much. According to the reports, the other Moz family browsers don't have problems with the test URL or many other sites, so we should look into doing what they are doing to guess text encoding.
Reporter | ||
Comment 1•20 years ago
*** Bug 249196 has been marked as a duplicate of this bug. ***
Comment 2•20 years ago
*** Bug 257383 has been marked as a duplicate of this bug. ***
Comment 3•20 years ago
*** Bug 263704 has been marked as a duplicate of this bug. ***
Comment 4•20 years ago
Add note about pound signs to summary as this is one of the most noticeable effects of this bug.
Summary: Poor text-encoding guessing for sites with no specified encoding → Poor text-encoding guessing for sites with no specified encoding (e.g. pound signs on UK sites)
*** Bug 280172 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 6•19 years ago
This is a common complaint from the feedback.
Priority: -- → P2
Target Milestone: --- → Camino1.0
Comment 7•19 years ago
There are many duplicates marked directly against this bug, but it would be better to check whether each page is wrongly detected as SJIS, in which case it's a duplicate of bug 168526, or wrongly detected as GB18030, in which case it's a duplicate of bug 181344. If it's neither of those, that shows there are more problems than those two to solve to make the universal detector more reliable, which is interesting info that must not be lost because the duplicate marking was too quick. From a very superficial check, all the cases reported so far seem to be duplicates of bug 181344 (the pound sign wrongly leads the detector to interpret the page as GB18030). If so, I think it would be better to tag them as such, so that it's easier to check whether a fix for bug 181344 fixes them all.
Depends on: 181344
> According to the reports, the other Moz family browsers don't have problems
> with the test URL or many other sites
This is Camino specific and not one of those bugs.
Assignee | ||
Comment 9•19 years ago
This probably only happens in Camino because it uses the Universal Charset Detector.
Comment 10•19 years ago
this needs to be fixed for 0.9, we get a lot of feedback about this from non-US users.
Status: NEW → ASSIGNED
Target Milestone: Camino1.0 → Camino0.9
*** Bug 290317 has been marked as a duplicate of this bug. ***
*** Bug 292320 has been marked as a duplicate of this bug. ***
This is apparently happening on MacFixIt now/today, guessing a Shift JIS encoding.
Assignee | ||
Comment 14•19 years ago
Testcase. Interestingly, a single pound sign displays OK, but with more than one, the encoding is guessed incorrectly.
Assignee | ||
Comment 15•19 years ago
The testcase is incorrectly detected as GB18030, so that's bug 181344.
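To illustrate why pound signs are ambiguous at the byte level, here is a minimal Python sketch. It is not the detector's logic, only a demonstration that the byte 0xA3 (the pound sign in ISO-8859-1) is also meaningful in GB18030 and Shift_JIS, so the same bytes decode to different text depending on the guess. The sample string is made up for illustration.

```python
# Bytes a server might send for two pound signs when the page is
# ISO-8859-1 (Latin-1) but declares no charset. 0xA3 is "£" in Latin-1.
raw = "££".encode("latin-1")  # b"\xa3\xa3"

as_latin1 = raw.decode("latin-1")
# errors="replace" keeps the decoders from raising on invalid sequences.
as_gb18030 = raw.decode("gb18030", errors="replace")
as_shift_jis = raw.decode("shift_jis", errors="replace")

# Only the Latin-1 interpretation gives back the pound signs.
print(repr(as_latin1))
print(repr(as_gb18030))
print(repr(as_shift_jis))
```

Whichever multi-byte encoding is guessed, the pound signs are lost, which matches the garbling users reported on UK sites.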
Assignee | ||
Comment 16•19 years ago
OK, here's the deal. We hardcode the "intl.charset.detector" pref to "universal_charset_detector" in [PreferencesManager syncMozillaPrefs], which means that every Camino user will have this in their prefs.js file. If I take that out (and nuke the pref by hand), the testcase works. So do we want to expose toggling "Auto Detect" on and off via the Text Encoding menu?
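For anyone wanting to check whether their profile carries the hardcoded value, a small Python sketch along these lines could scan prefs.js. It assumes the usual `user_pref("name", "value");` line format; the helper name `read_string_pref` and the sample file contents are hypothetical, and it only handles string-valued prefs.

```python
import re

# Matches lines of the form: user_pref("name", "string value");
PREF_RE = re.compile(r'^\s*user_pref\(\s*"([^"]+)"\s*,\s*"([^"]*)"\s*\)\s*;')

def read_string_pref(prefs_text, name):
    """Return the string value of `name` in prefs.js text, or None if absent."""
    for line in prefs_text.splitlines():
        m = PREF_RE.match(line)
        if m and m.group(1) == name:
            return m.group(2)
    return None

# Made-up sample contents, for illustration only.
sample = '''\
user_pref("browser.startup.homepage", "about:blank");
user_pref("intl.charset.detector", "universal_charset_detector");
'''

detector = read_string_pref(sample, "intl.charset.detector")
print(detector)
```

In practice you would read the text from the prefs.js file in the profile folder instead of the inline sample.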
Assignee | ||
Comment 18•19 years ago
This patch does several things:
1. Flips the "universal_charset_detector" off for people running a build with this change for the first time (using a new pref version key), and removes the hardcoding of this pref.
2. Adds an "Automatically Detect Page Encoding" item to the bottom of the text encodings menu, which toggles the "universal_charset_detector" on and off (reloading the page when toggled).
3. Makes the Text Encodings menu not auto-update; we update it on display (this removes code that assumed that any menu item with a tag > 10 was an encoding item).
Attachment #187903 -
Flags: review?(pinkerton)
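The first two items of the patch description amount to a one-time, version-keyed pref migration plus a toggle. A rough Python sketch of that pattern follows; it is not the patch itself (which is Camino's Objective-C code). The version key name, the version number, and the use of an empty string to mean "detector off" are all assumptions made for illustration.

```python
DETECTOR_PREF = "intl.charset.detector"
UNIVERSAL = "universal_charset_detector"
# Hypothetical key and value: the real ones are defined in the patch.
PREFS_VERSION_KEY = "example.prefs_version"
CURRENT_PREFS_VERSION = 1

def migrate_prefs(prefs):
    """One-time migration: turn the detector off the first time a new build runs."""
    if prefs.get(PREFS_VERSION_KEY, 0) < CURRENT_PREFS_VERSION:
        prefs[DETECTOR_PREF] = ""  # assumption: empty string means "no detector"
        prefs[PREFS_VERSION_KEY] = CURRENT_PREFS_VERSION
    return prefs

def toggle_auto_detect(prefs):
    """Menu action: flip the detector; the caller would then reload the page."""
    enabled = prefs.get(DETECTOR_PREF, "") == UNIVERSAL
    prefs[DETECTOR_PREF] = "" if enabled else UNIVERSAL
    return not enabled

prefs = {DETECTOR_PREF: UNIVERSAL}  # state left behind by an older build
migrate_prefs(prefs)                # first run of the new build: detector off
first_run = prefs[DETECTOR_PREF]
toggle_auto_detect(prefs)           # user re-enables it from the menu
migrate_prefs(prefs)                # later runs leave the user's choice alone
later_run = prefs[DETECTOR_PREF]
```

Keying the migration on a stored version number is what lets the new default apply once without overriding a choice the user makes afterwards.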
Assignee | ||
Updated•19 years ago
Status: NEW → ASSIGNED
Comment 19•19 years ago
wait wait wait. we had to put that pref in otherwise a number of sites wouldn't render correctly (or at all) because the encoding would be wrong. I think it was many japanese/chinese/korean/russian sites, I don't remember the details, it's all hazy. Maybe look back into bugzilla to see what turning this stuff on was fixing. i know i turned that stuff on for a reason.
Comment 20•19 years ago
(In reply to comment #19)
> wait wait wait. we had to put that pref in otherwise a number of sites wouldn't
> render correctly (or at all) because the encoding would be wrong. I think it was
> many japanese/chinese/korean/russian sites, I don't remember the details, it's
Well, that's not quite true unless you're truly multilingual and have to (or can) read many different languages (C, J, K, and R). For most people, just using a language-specific detector, which works better than the universal detector, should suffice. Actually, Japanese and Russian users need the JA/RU detectors, but Chinese (both SC and TC) and Korean users don't need a detector most of the time, because for the latter group there is a single dominant encoding which can be set as the default.
Assignee | ||
Comment 21•19 years ago
I thought we put the pref in early on because we didn't want Camino to have a Text Encodings menu. But now we have one.
Comment 22•19 years ago
i recall putting it in close to 0.7 shipping, after we had an encoding menu. again, that's just a recollection.
Assignee | ||
Comment 23•19 years ago
That's probably because we used to have it in all-camino.js, but then they changed some of the i18n prefs to use this funky locale thing, so we had to move it into code. I don't see why we need to be any different than Firefox, and this patch makes us similar.
Assignee | ||
Comment 24•19 years ago
Bugs to look at when testing for regressions:
bug 180703 (duped to bug 153150)
Test pages:
http://www.rest.co.il/yoezer/ (hebrew)
http://aoshimak.tripod.co.jp/
http://www.geocities.co.jp/
http://forums.maccentral.com/wwwthreads/showthreaded.php?Cat=&Board=Lounge&Number=270015&Search=true&Forum=Lounge&Words=BiggerFoot&Match=Username&Searchpage=0&Limit=25&Old=1week&Main=270015
http://www.tvland.com/shows/
http://slashdot.org/article.pl?sid=02/10/20/156247&mode=thread&tid=141
https://bugzilla.mozilla.org/show_bug.cgi?id=168526 (yes, the whole bug)
Assignee | ||
Comment 25•19 years ago
All those urls look fine with the charset detector off.
The crash logs pasted into bug 281679 seem to have fooled the charset detector, too; I don't know what it's guessing, because there's *no* check in the text encoding menu.
Assignee | ||
Comment 27•19 years ago
It's probably guessing gb18030, which is a superset of gb2312 I believe.
Comment 28•19 years ago
Comment on attachment 187903 (Patch)
r=pink. good idea on the prefs version as well. we'll need that going forward.
Attachment #187903 -
Flags: review?(pinkerton) → review+
Assignee | ||
Comment 29•19 years ago
Checked in. Note that the first time you run a build with this change (which will be 20050705), the universal charset detector will be turned off. (If you go and run an older build, it will get turned back on.) You can toggle the detector on and off via the bottom item of the Text Encodings submenu.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 30•19 years ago
Pageload went from ~870ms to ~800ms with this change.
(In reply to comment #30)
> Pageload went from ~870ms to ~800ms with this change.
Smaller is better? Does that mean bug 234683 got fixed (found it by accident; perhaps what Mike was remembering in comment 19 or 22)?
Comment 32•19 years ago
*** Bug 305775 has been marked as a duplicate of this bug. ***