Closed Bug 213517 Opened 16 years ago Closed 6 years ago

Consider making x-user-defined an alias of windows-1252 for IE/Chrome/Safari compat

Categories

(Core :: HTML: Parser, defect)

x86
Windows 2000
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla29
Tracking Status
firefox28 --- fixed
firefox29 --- fixed

People

(Reporter: awernx, Assigned: hsivonen)

References

()

Details

(Whiteboard: [qa-])

Attachments

(3 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

This site uses a font to render Tamil script. But not all characters render
correctly. There are ? symbols at many places.

Reproducible: Always

Steps to Reproduce:
1. Go to the site.
2. Sometimes, the homepage renders ok. Navigate deeper


Actual Results:  
Improper rendering. Unreadable text

Expected Results:  
Proper rendering.
Reporter, that site is currently defined as in 'x-user-defined' encoding, and
uses a custom font.

Steps for problem reproduction :
- download the custom font, 
- install it (following the instructions), 
- set it as the font to use for 'x-user-defined', 
- (if needed) restart Mozilla/Firefox
- you will see most of the page displayed correctly
- But you have bad, strange characters where you should get ISO-8859-1
- Check in preference that you allow sites to set fonts
- Force the encoding of the page to ISO-8859-1 manually with the menu
- Everything gets displayed correctly

The stylesheet (http://www.vikatan.com/av/avstyles/css/av.css) forces the use of
 the curstom encoded Vikatan_TAM font for the part of the page in tamil, and
Arial for the part in english.

The font selection through stylesheet is not taken into account in the
'x-user-defined' encoding, but is if the encoding is set to 'ISO-8859-1'.

In my opinion, the site is better using "x-user-defined" than "ISO-8859-1" (if
just to help search engine).
Mozilla might have motives for disabling the CSS font rules for
"x-user-defined", but here it doesn't work right.

I expected this to have a duplicate somewhere or related bug, but can't find one.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: Tamil Font ( Vikatan ) does not render correctly → Stylesheet font no applied on x-user-defined page (with Tamil/Vikatan exemple)
Component: Other → Internationalization
Product: Tech Evangelism → Core
Version: unspecified → Trunk
Assignee: other → smontagu
QA Contact: other → i18n
This particular site has migrated to Unicode. However, our x-user-defined handling remains different from  other browsers, so sites that still declare x-user-defined and require an intentionally bogus (commandeering the Latin-1 Supplement for non-Latin use) font to view don't work with the same font in Firefox vs. other browsers.

It seems to me that there is no value* in retaining Firefox-unique x-user-defined handling. For non-XHR uses, we should consider making x-user-defined a label for windows-1252 (or making the label unrecognized if there are x-user-defined uses that commandeer Cyrillic range for non-Cyrillic use in places where it's common for IE to fall back on windows-1251).

* Unless one considers incompatibility as an incentive to migrate to UTF-8 and the proper Unicode ranges.
Summary: Stylesheet font no applied on x-user-defined page (with Tamil/Vikatan exemple) → Consider making x-user-defined an alias of windows-1252 for IE/Chrome/Safari compat
Turns out that WebKit/Blink special-case x-user-defined in <meta> instead of special-casing it for XHR:
https://bugs.webkit.org/show_bug.cgi?id=18270
https://mxr.mozilla.org/chromium/source/src/third_party/WebKit/Source/core/fetch/TextResourceDecoder.cpp#139

If we did the same, users who have Arial AM  (http://main.am/armenianchurch/ARIALAM.TTF ; Googling around suggests that font has been widely distributed in Armenia) installed could see http://archive.hetq.am/arm/ visually as Armenian and users who have Arial LatArm (apparently similiar font) installed could read see http://archive.aravot.am/ visually as Armenian.

According to StatCounter, Chrome has way more market share than we do in Armenia, so it might make sense to adopt this WebKit/Blink quirk.
Testing the sites mentioned in the WebKit bug, this would give http://www.bartamanpatrika.com/ a chance to render. The other sites mentioned in the WebKit bug no longer depend on user-installed non-Unicode fonts.
It seems like we could make that WebKit hack part of the HTML Standard by changing the encoding, if the encoding returned at the <meta> point is x-user-defined, into windows-1252. Shame on them for not bringing this up in a standards context.
(In reply to Henri Sivonen (:hsivonen) from comment #4)
> Testing the sites mentioned in the WebKit bug, this would give
> http://www.bartamanpatrika.com/ a chance to render. The other sites
> mentioned in the WebKit bug no longer depend on user-installed non-Unicode
> fonts.

Huh. Upon second load, this site no longer shows mojibake. I guess it has a slow-loading @font-face font these days or something.
Filed bug 944673 and bug 944671 about evangelizing the sites to make their archives readable cross-browser without special config or user action.
Attached patch Clone the WebKit/Blink hack (obsolete) — Splinter Review
Filed bug 944668 as follow-up.
Assignee: smontagu → hsivonen
Status: NEW → ASSIGNED
Component: Internationalization → HTML: Parser
I forgot how the error reporting works. Let's try again:
https://tbpl.mozilla.org/?tree=Try&rev=71fc0455f46c
Attachment #8340320 - Attachment is obsolete: true
It's worth noting that adopting the Chrome behavior would make our font settings for "User Defined" definitely useless. The setting is in practice useless already, since the real-world fonts that one would use with the feature use the Latin-1 range instead of the PUA range, but doing what Chrome does would remove even the remaining theoretical usefulness.

If we want to make the "User Defined" font setting actually make sense, then we should adopt IE's behavior instead of Chrome's behavior: Keeping x-user-defined as a distinct encoding while making it decode like windows-1252 (for non-XHR uses).

Do we want to get rid of the "User Defined" font prefs (the Chrome way) or do we want to make the "User Defined" font prefs actually work now that the feature has been broken for a decade and the feature is less relevant than it would have been a decade ago if it had worked back then?
If the feature has been broken over a decade, reviving it does not seem worth it today if Chrome does not have it. Let's embrace it as a win and remove more dead code. x-user-defined making sense should not be high on our priority list :-)
(In reply to Anne (:annevk) from comment #12)
> If the feature has been broken over a decade, reviving it does not seem
> worth it today if Chrome does not have it. Let's embrace it as a win and
> remove more dead code. x-user-defined making sense should not be high on our
> priority list :-)

Let's see if folks who could r+ the would-be dead code removal agree.

smontagu, jfkthame:

We have the following constraints:
 1) x-user-defined declared in HTML <meta> should decode like windows-1252.
 2) x-user-defined should decode like it currently does in the context of XHR responseText.

These constraints can be met either by cloning WebKit/Blink behavior exactly or my cloning IE behavior with an XHR exception. (A third option would be trying to take the WebKit/Blink approach and try to make it less ad hoc, but I'd rather not make up new behaviors that other browsers don't already have.)

# WebKit/Blink behavior

The label x-user-defined  is changed to windows-1252 in the code that handles HTML <meta> declarations.

As a consequence, the font preference for User Defined makes no sense (currently, it doesn't make sense, because it doesn't actually work; it hasn't worked for the past decade!) and we should also remove the User Defined font UI and the gfx code that treats User Defined as a distinct script / language group.

Pros:
 * Matches Chrome.
 * Allows us to remove mystery UI.
 * Potentially allows us to remove the gfx concept of encoding-coupled font prefs in the future.

Cons:
 * If a given usage pattern of x-user-defined has more than one font with the same codepoint to glyph mapping, the user has to have all the fonts installed to browser all the sites that declare these different fonts. Having only one font installed and configured via the User Defined font UI won't work for all sites. Concretely,  if the user has only Arial AM  installed,  the user won't be able to browse sites that declare Arial LatArm.  To browse those sites, the user also needs to have Arial LatArm installed.

# IE behavior with XHR exception

Change the XPCOM constructor for the x-user-defined decoder to instantiate the windows-1252 decoder. Special-case XHR to instantiate the current x-user-defined decoder (using straight C++ |new|) when decoding for responseText. (The use of x-user-defined in XHR is a Mozilla-documented hack that's actually in use, so we can't break that one even if IE doesn't support that hack.)

Pros:
 * Matches IE for non-XHR usage.
 * Allows users who have only Arial AM installed also to browse Arial LatArm sites *iff* they know how to operate the User Defined font UI. (Potentially the analogous scenario arises with other intentionally mis-encoded fonts for other scripts, but I don't know if pairs like Arial AM and Arial LatArm for other scripts, such as Tamil are actually in circulation.)
 * Conceptually a cleaner fix than the WebKit/Blink hack.

Cons:
 * Perpetuates the User Defined mystery UI in the font prefs.
 * Perpetuates the concept of gfx having encoding-dependent font fallback.

- -

Which fix should I implement? I.e. which one you'd rather r+?

I kinda like the second approach, since special-casing XHR gives a less ad hoc result than special-casing <meta>. But I also worry about perpetuating the gfx/UI side of x-user-defined.

Either fix would work (visually) for sites that keep their legacy HTML (i.e. resist proper migration to UTF-8) but throw in their special font via @font-face as an afterthought.
Flags: needinfo?(smontagu)
Flags: needinfo?(jfkthame)
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> (A third option would be
> trying to take the WebKit/Blink approach and try to make it less ad hoc, but
> I'd rather not make up new behaviors that other browsers don't already have.)

From IRC:
< zcorpan> hsivonen: so for the record let's consider #3 remove mystery UI, make x-user-defined alias of windows-1252 everywhere except XHR and in XHR it maps to the x-user-defined encoding
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Cons:
>  * If a given usage pattern of x-user-defined has more than one font with
> the same codepoint to glyph mapping, the user has to have all the fonts
> installed to browser all the sites that declare these different fonts.
> Having only one font installed and configured via the User Defined font UI
> won't work for all sites. Concretely,  if the user has only Arial AM 
> installed,  the user won't be able to browse sites that declare Arial
> LatArm.  To browse those sites, the user also needs to have Arial LatArm
> installed.

Considering bug 910187 comment 14, we probably shouldn't worry too much about the case where the user has only one of Arial AM and Arial LatArm installed.
Given that comment it seems the Blink/WebKit approach has only pros, no cons. And removing UI that has not worked anyway for over a decade and is obsolete these days (people will not be taught about it) seems like the right thing to do.
Yes, I'd love to see "User Defined" disappear from the font preferences dialog; nobody[1] has the faintest idea what it means or what to use it for anyway. We shouldn't spend time and effort to fix this if we have the option of banishing it instead.

(The whole font prefs dialog needs a radical overhaul, IMO, being largely based on pre-Unicode charsets that are pretty well obsolete. But that's for another day, another bug.)

[1] Obviously, that's a sweeping generalization, but I think it's near enough true!
Flags: needinfo?(jfkthame)
Hi everybody. Just my 2 cents in this discussion.
After some playing I found a very simple trick that works in Chrome/Opera/Safari.  

I experimented on this very old site in ArmSCII-8. http://users.freenet.am/~vm/  ( hosted on Freenet.am an Armenian Tripod analog) 
This site has NO CSS, Using old <font> tags. It is defined x-user-defined in META. 
So this is a worst case scenario.

Google Chrome didn't show it automatically, because there were no any CSS declared. So I go in Settings->Show advanced settings...   Entered in Customize font. Choosed an armscii-8 compatible font "Arial AM"  and now bingo it is visible.
This same trick worked in Opera and Safari also. It worked also in Gnome browser WEB ( ex-Epiphany )

Unfortunately it didn't work in Firefox.  

This was a worst case scenario, as  i said in another thread, when  the CSS is declared properly and the necessary fonts are present in the PC, then Chrome handles this situation automatically without any tricking and hacking. 

So i think that handling "user defined" in a Google Chrome way is a good choice, In my humble opinion. 

p.s. Just one minor thing. In Google Chrome the font customization place is very hard to find. I hope Firefox will not do the same thing :)
I must add , that this was a just an Armenian example. I don't know what is the situation in other languages.
(In reply to Henri Sivonen (:hsivonen) from comment #13)
> Which fix should I implement? I.e. which one you'd rather r+?

I prefer the first option (Webkit/Blink)
Flags: needinfo?(smontagu)
Attachment #8349319 - Flags: review?(smontagu)
Attachment #8349320 - Flags: review?(jfkthame)
(In reply to Jonathan Kew (:jfkthame) from comment #17)
> (The whole font prefs dialog needs a radical overhaul, IMO, being largely
> based on pre-Unicode charsets that are pretty well obsolete. But that's for
> another day, another bug.)

In case someone finds this bug report in the future while searching for related bugs: The Latin part of changing how the font prefs work is bug 756022.
Comment on attachment 8349319 [details] [diff] [review]
Clone the WebKit/Blink hack, with bitrot fixed

Review of attachment 8349319 [details] [diff] [review]:
-----------------------------------------------------------------

::: parser/html/nsHtml5StreamParser.cpp
@@ +1207,1 @@
>        newEncoding.EqualsLiteral("UTF-16LE")) {

Does this change come from another patch?

::: parser/htmlparser/tests/mochitest/test_bug672453.html
@@ +30,1 @@
>  ];

nit: trailing whitespace
Attachment #8349319 - Flags: review?(smontagu) → review+
Comment on attachment 8349320 [details] [diff] [review]
Remove the x-user-def language group

Review of attachment 8349320 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, provided tests are happy. There are several existing testcases that use charset=x-user-defined; are these affected at all by the changes here?

::: gfx/thebes/gfxOS2Fonts.cpp
@@ +506,5 @@
>  
>      nsTArray<nsString> familyArray;
>      ForEachFont(FontCallback, &familyArray);
>  
> +    // To be able to easily search for glyphs in other fonts, append theu

s/theu/the/
Attachment #8349320 - Flags: review?(jfkthame) → review+
(In reply to Simon Montagu :smontagu from comment #24)
> >        newEncoding.EqualsLiteral("UTF-16LE")) {
> 
> Does this change come from another patch?

No. It's a trivial consistency change to a comparison that's neighboring to the comparison being added. (As seen in the diff, nsHtml5MetaScannerCppSupplement.h already didn't compare for "UTF-16", which can't occur at that point anyway.)

> ::: parser/htmlparser/tests/mochitest/test_bug672453.html
> @@ +30,1 @@
> >  ];
> 
> nit: trailing whitespace

Fixed. Thanks.

(In reply to Jonathan Kew (:jfkthame) from comment #25)
> LGTM, provided tests are happy. There are several existing testcases that
> use charset=x-user-defined; are these affected at all by the changes here?

The tests looked OK on try.

> s/theu/the/

Fixed. Thanks.

Landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/91c405b0ab50
https://hg.mozilla.org/integration/mozilla-inbound/rev/59e7ca87c163
https://hg.mozilla.org/mozilla-central/rev/91c405b0ab50
https://hg.mozilla.org/mozilla-central/rev/59e7ca87c163
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
This patch is a subset of attachment 8349319 [details] [diff] [review] to backport the change to Aurora without string changes.

I think backporting to Aurora makes sense in order to have this (Chrome-consistent) mechanism for reading Armenian legacy sites in the same release that takes away the previous (Firefox-specific) mechanism for reading them.
Attachment #8355523 - Flags: review?(smontagu)
Comment on attachment 8355523 [details] [diff] [review]
Part 1 without error reporting for Aurora

Review of attachment 8355523 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, since I already reviewed the trunk version :)
Attachment #8355523 - Flags: review?(smontagu) → review+
Comment on attachment 8355523 [details] [diff] [review]
Part 1 without error reporting for Aurora

[Approval Request Comment]
> Bug caused by (feature/regressing bug #): 

This is an ancient bug, but it became more relevant to Firefox 28 due to bug 805374 removing a Firefox-specific workaround for Armenian legacy sites.

> User impact if declined: 

Legacy Armenian sites, such as the archive sites http://archive.hetq.am/arm/ and http://archive.aravot.am/ would be unreadable for the Firefox 28 release if this patch is not taken to Firefox 28. With this patch, the sites become readable under the same font installation conditions as in Chrome (also roughly the same as in IE).

Previously, up to and including Firefox 27, the sites were readable in Firefox in a Firefox-specific way by manually making a relatively obscure choice form the menu on a page-by-page basis. This Firefox-specific workaround for a Firefox-specific problem (this bug) went away in Firefox 28 (with the fix for bug 805374).

> Testing completed (on m-c, etc.):

Landed on m-c. The fix clones the behavior of Blink and WebKit, so the behavior introduced to Gecko here should be pretty safe.
 
> Risk to taking this patch (and alternatives if risky): 

Very low risk. The alternative to taking the patch is having the mentioned sites go unreadable without workaround in Firefox for the Firefox 28 cycle.

> String or IDL/UUID changes made by this patch:

None.
Attachment #8355523 - Flags: approval-mozilla-aurora?
Attachment #8355523 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Given in-testsuite coverage this fix will not be manually verified by QA. If you believe this warrants extra QA attention please nominate for testing by removing this whiteboard tag and adding the verifyme keyword. Please also provide any details you have that may inform our testing.
Whiteboard: [qa-]
Blocks: 1303529
You need to log in before you can comment on or make changes to this bug.