Closed Bug 849113 Opened 11 years ago Closed 9 years ago

Remove UI and HTML parser use of the "universal" encoding detector

Categories

(Core :: Internationalization, defect)

Type: defect, Priority: Not set, Severity: normal

RESOLVED FIXED

People

(Reporter: hsivonen, Assigned: hsivonen)

Details

(Whiteboard: [fixed by bug 805374 and bug 844115])

Attachments

(1 file, 2 obsolete files)

Bug 848842 makes fixing bug 844115 in one go infeasible. Let's remove the UI for activating the "universal" detector and stop using it in the HTML parser first.
We have no locales enabling this by default.
Attachment #722781 - Flags: review?
Comment on attachment 723454
Stop using the "universal" detector in the HTML parser and remove the UI, with assertion counts adjusted

Review of attachment 723454:
-----------------------------------------------------------------

Sorry, I can't review something which I don't believe in. IMO bug 844115 and all its dependencies are WONTFIX, but I admit the possibility that I'm not objective because of the amount of time that I've invested in encoding detection.
Attachment #723454 - Flags: review?(smontagu)
(In reply to Simon Montagu from comment #3)
> Sorry, I can't review something which I don't believe in. IMO bug 844115 and
> all its dependencies are WONTFIX, but I admit the possibility that I'm not
> objective because of the amount of time that I've invested in encoding
> detection.

Can you propose how to solve the following set of problems?

We have a detector labeled as "universal". The idea of a universal detector appeals to people, and from time to time people who don't know that the "universal" detector isn't actually universal turn it on by default for a localization (this has happened for Swedish and Traditional Chinese, both now reverted) or use it in new Gecko code even when a spec doesn't call for it (this happened with the File API, which per spec should use UTF-8 if there's no label).
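
To make the "use UTF-8 if there's no label" rule concrete, here is a minimal sketch of label-driven decoding with no detection step. It is an illustration only, not Gecko code: `decode_with_label` is a hypothetical helper, and Python's codec names stand in for spec-defined encoding labels, which is an assumption since the two label sets do not match exactly.

```python
import codecs
from typing import Optional


def decode_with_label(data: bytes, label: Optional[str] = None) -> str:
    """Decode bytes deterministically: declared label if usable, else UTF-8.

    Hypothetical helper for illustration. No content sniffing happens here,
    so the result depends only on the bytes and the label.
    """
    encoding = "utf-8"  # the fallback when there is no label
    if label is not None:
        try:
            # Normalize the label to a codec name Python knows about.
            encoding = codecs.lookup(label.strip()).name
        except LookupError:
            # Unknown label: fall back to UTF-8 rather than guessing.
            encoding = "utf-8"
    # Malformed sequences become U+FFFD instead of triggering a guess.
    return data.decode(encoding, errors="replace")
```

With this shape, unlabeled non-UTF-8 bytes decode to replacement characters instead of being guessed at, so the outcome is predictable across implementations.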

Using the universal detector exposes non-obvious implementation-specific mystery behavior to the Web. There seems to be neither an effort to standardize the details of the behavior nor an effort to make the "universal" detector actually universal.

I think it's not okay to expose implementation-specific mystery behavior as part of the Web platform. I also think it's not okay to use the enticing label "universal" for something that's not actually universal when people who see the label don't know that. Undoing changes made under the faulty assumptions that the detector is universal and that detection is good (as opposed to being a source of implementation-specific mystery) is always harder than making those changes in the first place.

(I'm saying that the "universal" detector is not universal, because it seems arbitrary that "universal" includes Hebrew and Thai but does not include Arabic and Vietnamese.)

Note that I am not proposing the removal of the CJK detection code that lives under the "universal" detector in the source tree. Also, if there is a clear need still for the Hebrew detector, I think we could have a detector labeled as a Hebrew detector in the menu.
Also, every time a Web author turns on any detector, that author gets an opportunity to publish Web content that depends on the implementation-specific behaviors of that detector. Consider how we'd feel if people were authoring content depending on a set of mystery behaviors in IE or Chrome.
Attached patch: Fix bitrot
Attachment #723454 - Attachment is obsolete: true
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 805374 and bug 844115]