Bug 726420 - unicode-bidi:plaintext should make all-neutral paragraphs LTR
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Layout: Text
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: mozilla13
Assigned To: Simon Montagu :smontagu
Mentors:
Depends on:
Blocks: DirAuto
Reported: 2012-02-12 03:47 PST by Aharon (Vladimir) Lanin
Modified: 2012-02-23 00:54 PST
CC List: 7 users
smontagu: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
test case being submitted to W3C's HTML5 test suite (2.22 KB, text/html)
2012-02-12 03:52 PST, Aharon (Vladimir) Lanin
no flags
Patch (3.15 KB, patch)
2012-02-13 10:38 PST, Simon Montagu :smontagu
ehsan: review+
Aharon's test as reftest (4.32 KB, patch)
2012-02-13 10:39 PST, Simon Montagu :smontagu
ehsan: review+
An even stronger test case (also being submitted to HTML5 test suite) (2.46 KB, text/html)
2012-02-14 02:32 PST, Aharon (Vladimir) Lanin
no flags

Description Aharon (Vladimir) Lanin 2012-02-12 03:47:36 PST
Currently, in a unicode-bidi:-moz-plaintext element, paragraphs that contain no strongly left-to-right (bidi class L) or right-to-left (bidi classes R and AL) characters default to the element's computed direction style. This is bad because:
- It does not conform to http://www.w3.org/TR/css3-writing-modes/, which states that "the base directionality of each bidi paragraph for which the element forms the containing block is determined not by the element's computed ‘direction’ as usual, but by following the heuristic in rules P2 and P3 of the Unicode bidirectional algorithm". In turn, http://unicode.org/reports/tr9/#P3 states that "if a character is found in P2 and it is of type AL or R, then set the paragraph embedding level to one; otherwise, set it to zero." Paragraph level 0 is LTR.
- It will incorrectly display text intended for display using the standards above, such as text generated in existing plain text editors like gedit.
- It does not work well for a paragraph consisting of a phone number, which requires LTR.
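
The P2/P3 heuristic quoted in the first bullet can be sketched in a few lines of Python. This is only an illustration, not Gecko's implementation: the function name `paragraph_level` is made up for the example, the input is assumed to be a single paragraph that has already been split out, and the isolate-skipping that later versions of UAX #9 add to P2 is ignored.

```python
import unicodedata


def paragraph_level(paragraph: str) -> int:
    """Return the paragraph embedding level per UAX #9 rules P2 and P3.

    Hypothetical helper for illustration only.
    0 means an LTR paragraph, 1 means an RTL paragraph.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            # P2 found a strong left-to-right character first.
            return 0
        if bidi_class in ("R", "AL"):
            # P2 found a strong right-to-left character first.
            return 1
    # P3: no strong character at all, so the level is zero (LTR).
    return 0


print(paragraph_level("+1 (555) 010-9999"))  # phone number, all neutral or weak
print(paragraph_level("-->"))                # all neutral
print(paragraph_level("\u05d0!"))            # Hebrew alef followed by "!"
```

Under this rule the phone number and the arrow both get level 0, whatever the computed direction of the containing element.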

That said, I think that there is room for "fudging" a little and making the direction of a completely empty (not just all-neutral) paragraph match the direction of the preceding paragraph, and if there is none, then the computed direction. The purpose of this would be just to set the alignment (when text-align is start or end) such that when typing a sequence of RTL paragraphs in a unicode-bidi:plaintext textarea, one will not initially start off with the caret on the left for each new paragraph. This little tweak will have no effect whatsoever on the display of text outside of a textarea, where the direction of an empty paragraph is a moot point. But this is just icing; the basic point of this bug is that all-neutral paragraphs should default to LTR.
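
A rough sketch of the "fudge" for empty paragraphs, again in Python and purely illustrative. The names `first_strong_direction` and `paragraph_directions` are invented for this example, and the sketch assumes that "the direction of the preceding paragraph" means whatever direction that paragraph was resolved to, including LTR for an all-neutral one.

```python
import unicodedata


def first_strong_direction(paragraph):
    """Return "ltr" or "rtl" from the first strong character, else None."""
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def paragraph_directions(paragraphs, computed_direction):
    """Resolve a base direction for each paragraph (hypothetical helper).

    Non-empty paragraphs follow P2/P3 (all-neutral ones become "ltr").
    Completely empty paragraphs copy the preceding paragraph, or the
    element's computed direction when there is no preceding paragraph.
    """
    directions = []
    previous = computed_direction
    for paragraph in paragraphs:
        if paragraph == "":
            direction = previous
        else:
            direction = first_strong_direction(paragraph) or "ltr"
        directions.append(direction)
        previous = direction
    return directions


# An RTL paragraph, a fresh empty line, a numeric line, another empty line.
print(paragraph_directions(["\u05d0\u05d1", "", "123", ""], "rtl"))
```

The only place the computed direction enters is the very first paragraph, and only when it is empty.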
Comment 1 Aharon (Vladimir) Lanin 2012-02-12 03:52:57 PST
Created attachment 596451
test case being submitted to W3C's HTML5 test suite
Comment 2 Simon Montagu :smontagu 2012-02-12 09:18:46 PST
I don't agree here: in fact I went out of my way to implement this behaviour using NSBIDI_DEFAULT_LTR and NSBIDI_DEFAULT_RTL.

In terms of the Unicode Bidi Algorithm, this follows the second bullet point at http://unicode.org/reports/tr9/#HL1 (perhaps CSS should reference this?)

gedit also has a concept of default direction, initialized from the locale direction (I'm not sure if the user can override it).

Phone numbers ... OK, but other cases of numerals only may want to default to RTL, especially in an Arabic context.
Comment 3 Aharon (Vladimir) Lanin 2012-02-13 00:16:45 PST
> I went out of my way to implement this behaviour

Sorry to hear that ...

> this follows the second bullet point at http://unicode.org/reports/tr9/#HL1
> (perhaps CSS should reference this?)

CSS does *not* define a higher-level protocol. I think the absence of one there is binding. In other words, I do not think it is the right of an individual user agent to define one. It is a recipe for lack of interoperability. If you feel strongly that one should be defined, it needs to be taken up on www-style and put into the Writing Modes Level 3 spec.

> gedit also has a concept of default direction, initialized from the locale
> direction

I did not know that. Regrettable, IMO. Why should a document that looks one way for one user look another way for another user? Could I have a pointer to documentation?

> other cases of numerals only may want to default to RTL, especially in an Arabic context.

I think it would be very hard to find examples of RTL math that do not make use of any letters indicating unknowns or constants. Being in RTL letters, they would force the paragraph RTL anyway.
Comment 4 Aharon (Vladimir) Lanin 2012-02-13 03:18:27 PST
The following is not meant to be an argument one way or the other (I still think that the CSS definition has to be binding), only a clarification of a potentially confusing aspect of this issue.

One might think that the exact definition of unicode-bidi:plaintext is mostly moot, since it will usually be masked by dir=auto, which assigns unicode-bidi:plaintext by default for <pre> and <textarea>, and is probably more likely to be used than unicode-bidi:plaintext directly. That's because a part of the definition of dir=auto (which is not yet implemented in Gecko) is that the element is assigned a CSS direction using the first-strong algorithm: "If such a character is found and it is of bidirectional character type AL or R, the directionality of the element is 'rtl'. Otherwise, the directionality of the element is 'ltr'." (http://dev.w3.org/html5/spec/Overview.html#the-dir-attribute)

This means that <textarea dir=auto>--></textarea> should come out as "-->" (and not as "<--") regardless of whether the <textarea> is in an LTR or RTL page, and regardless of whether unicode-bidi:plaintext uses the computed direction for all-neutral paragraphs or not.

However, even if dir=auto is in use, the exact unicode-bidi:plaintext definition still matters. For example,

<pre dir=auto>
א!
-->
</pre>

will result in the second line being <-- if unicode-bidi:plaintext for all-neutral paragraphs depends on the computed direction of the element, but --> if all-neutral paragraphs are always LTR.
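
To make the difference concrete, here is a small Python sketch that resolves the base direction of the two lines in the <pre> example under both interpretations. It is an illustration only: `first_strong` and `base_directions` are hypothetical names, dir=auto is approximated as first-strong over the element's whole text, and the sketch computes base directions rather than performing the full reordering and mirroring that turns --> into <--.

```python
import unicodedata


def first_strong(text):
    """Return "ltr" or "rtl" from the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def base_directions(paragraphs, element_direction, neutral_uses_element_direction):
    """Resolve each paragraph's base direction (hypothetical helper).

    neutral_uses_element_direction=True models the behaviour reported in
    this bug; False models P2/P3 as written, where all-neutral means LTR.
    """
    fallback = element_direction if neutral_uses_element_direction else "ltr"
    return [first_strong(p) or fallback for p in paragraphs]


paragraphs = ["\u05d0!", "-->"]  # the two lines of the <pre dir=auto> example

# Approximation of dir=auto: first strong character in the element's text.
element_direction = first_strong("\n".join(paragraphs)) or "ltr"

print(element_direction)
print(base_directions(paragraphs, element_direction, True))
print(base_directions(paragraphs, element_direction, False))
```

With the element resolved to RTL by the alef on the first line, the second line is RTL under the first interpretation and LTR under the second.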
Comment 5 fantasai 2012-02-13 04:43:24 PST
Aharon is correct in his interpretation of css3-writing-modes: the 'direction' property is not consulted for 'unicode-bidi: plaintext'. CSS3 Writing Modes *is* the higher-level protocol, and in this particular instance it defers 100% to UAX9.

One of the intentions of this mode is to provide bidi reordering that is compatible with plaintext, so that it is possible to transfer content between HTML and plaintext (for example, between email and a Web email archive) without it getting scrambled. If some plaintext editors are using a different algorithm, that's too bad; we will never get compatibility if we try to emulate them because they don't all agree. In particular, an interpretation that is locale-dependent, such as the one you cite for gedit, would give different results on different platforms, and thus be quite unsuitable for the Web. If 'unicode-bidi: plaintext' were to default to the inherited 'direction', the same problem would occur in Web applications: a user of Hebrew GMail would not see the same rendering of a plaintext email as a user of English GMail.
Comment 6 Simon Montagu :smontagu 2012-02-13 10:38:29 PST
Created attachment 596719
Patch

So be it, but it's only a matter of time before the requests to change the spec will start coming in...
Comment 7 Simon Montagu :smontagu 2012-02-13 10:39:19 PST
Created attachment 596720
Aharon's test as reftest
Comment 8 Aharon (Vladimir) Lanin 2012-02-14 02:32:39 PST
Created attachment 596952
An even stronger test case (also being submitted to HTML5 test suite)
Comment 11 Aharon (Vladimir) Lanin 2012-02-23 00:54:34 PST
Thanks for doing this!
