Open Bug 1223829 Opened 9 years ago Updated 2 years ago

Remove space from XHTML definitions of &tdot; &TripleDot; &DotDot; and &DownBreve;

Categories

(Core :: DOM: HTML Parser, defect)

defect

Tracking

()

Tracking Status
firefox45 --- affected

People

(Reporter: fredw, Assigned: fredw)

References

()

Details

(Keywords: parity-chrome, parity-safari)

Attachments

(4 files)

Attached file testcase
The following entities are defined as a sequence space + combining char in the XML Entity spec:

DownBreve: U+0020 U+0311
tdot, TripleDot: U+0020 U+20DB
DotDot: U+0020 U+20DC
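
For reference, the replacement texts listed above can be written out as a small Python sketch. The table only transcribes those three lines; the helper names `code_points` and `without_leading_space` are made up for illustration and do not come from the spec or from Gecko.

```python
# Replacement text as listed above: U+0020 followed by a combining character.
SPEC_DEFINITIONS = {
    "DownBreve": "\u0020\u0311",
    "tdot": "\u0020\u20DB",
    "TripleDot": "\u0020\u20DB",
    "DotDot": "\u0020\u20DC",
}


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def without_leading_space(text):
    """Drop a single leading U+0020, if present (hypothetical helper)."""
    return text[1:] if text.startswith("\u0020") else text


for name, text in SPEC_DEFINITIONS.items():
    print(name, code_points(text), "->", code_points(without_leading_space(text)))
```

The right-hand column is what a "no space" definition would look like: the combining character alone.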

From the spec:

"For reasons explained further in [Charmod-norm], it is not advisable to start the replacement text of an entity with a combining character, as then potentially different results may be produced depending on the order in which entity expansion and Unicode normalisation are performed. As far as possible this specification uses non-combining characters, however, in the cases tdot, TripleDot and DotDot Unicode only has combining forms of the accents, and so the entity replacement text starts with a space, to avoid the possibility that the expansion of the entity combines with preceding text."

This is indeed how they are defined in htmlmathml-f.ent, but that seems to be lost when we expand the entity. I attach a "visual" testcase. I realized that today when I wrote a script test: http://tests.mathml-association.org/mathml/relations/html5-tree/entities.html
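
As a rough cross-check outside the browser: Python's standard `html` module ships the HTML5 named character reference table, so it can show what the HTML side expands these names to. This is only a sketch of the HTML definitions, not of Gecko's parser, and the helper name `html_expansion` is hypothetical.

```python
import html


def html_expansion(name):
    """Return the code points that the HTML5 table assigns to &name;."""
    expanded = html.unescape(f"&{name};")
    return [f"U+{ord(ch):04X}" for ch in expanded]


for name in ("DownBreve", "tdot", "TripleDot", "DotDot"):
    print(name, html_expansion(name))
```

Each name expands to a single combining character, with no leading U+0020.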

WebKit and Blink seem to have the same behavior as Gecko.
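
The ordering concern in the quoted spec text can be illustrated with a short Python sketch. It assumes a toy `expand` function that does a plain string replacement of `&DownBreve;` (a stand-in for real entity expansion, not how any parser works) and uses NFC as the normalisation form; both are choices made for the example only.

```python
import unicodedata


def expand(text, replacement):
    """Toy entity expansion: replace the reference with its replacement text."""
    return text.replace("&DownBreve;", replacement)


def nfc(text):
    """Apply Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


source = "a&DownBreve;"
for label, replacement in (("no space", "\u0311"), ("with space", "\u0020\u0311")):
    expand_then_normalise = nfc(expand(source, replacement))
    normalise_then_expand = expand(nfc(source), replacement)
    # The two orders agree only when the replacement cannot combine with "a".
    print(label, expand_then_normalise == normalise_then_expand)
```

Without the space, expanding first lets U+0311 compose with the preceding "a" into U+0203, while normalising first leaves "a" followed by U+0311. With the leading space the two orders give the same string.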
Assignee: nobody → fred.wang
Attached patch Patch
Attached file testcase (xhtml)
Here is the same testcase using XHTML. As David Carlisle noted, in that case Gecko does add a space (this is because the DTD https://dxr.mozilla.org/mozilla-central/source/dom/xml/htmlmathml-f.ent is used). Apparently the WHATWG is leaning towards keeping the current entity definitions (without a space before the combining char). So we should probably just fix htmlmathml-f.ent to match HTML5 instead.
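
The XHTML behaviour can be reproduced outside Gecko with any XML parser that expands entities declared in a DTD. Below is a minimal sketch using Python's `xml.etree.ElementTree` with an inline DTD. The two declarations are hand-written stand-ins for the current and the proposed definition of tdot, not lines copied from htmlmathml-f.ent, and `expand_tdot` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def expand_tdot(entity_value):
    """Parse <p>x&tdot;</p> with tdot declared as entity_value in an inline DTD."""
    doc = f'<!DOCTYPE p [<!ENTITY tdot "{entity_value}">]><p>x&tdot;</p>'
    return ET.fromstring(doc).text


# Assumed form of the current definition: space + combining character.
with_space = expand_tdot("&#x20;&#x20DB;")
# Assumed form of the proposed definition: combining character only.
without_space = expand_tdot("&#x20DB;")

for label, text in (("with space", with_space), ("without space", without_space)):
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

With the first declaration the parsed text keeps a U+0020 between "x" and U+20DB; with the second the combining character follows "x" directly, which matches what the HTML parser produces.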
If we change this, WebKit/Blink/Gecko will agree on "no space" for both HTML and XML. I think we should go ahead and do that. The HTML standard already requires this (in its XHTML section). It's arguably an oversight that it does so, though at this point it's easier to just change the implementations that do not agree (Gecko, and maybe Internet Explorer).
IE11 seems to do no space as well. I'll try to get htmlmathml-f.ent regenerated upstream without the spaces asap (I could do it now, but there is a process to be followed...). You are of course free to edit your local copy at any time. For your copy you may as well just delete the spaces. In the version I distribute with the entity spec I may decide to make it parameterised so you can have or not have the space (or I may not; it may be that the resulting complication of documenting how to set the parameter really isn't worth it, still deciding...).
Here is a version of the fix+test that instead removes the space from the XHTML definition.
Summary: &tdot; &TripleDot; &DotDot; and &DownBreve; should generate a space → Remove space from XHTML definitions of &tdot; &TripleDot; &DotDot; and &DownBreve;
Attachment #8705740 - Flags: review?(hsivonen)
Attachment #8705740 - Flags: review?(hsivonen) → review+
Mass bug change to replace various 'parity' whiteboard flags with the new canonical keywords. (See bug 1443764 comment 13.)
Whiteboard: [parity-webkit][parity-blink]
Severity: normal → S3