Open Bug 218823 Opened 17 years ago Updated 10 years ago

RFE: Automatic paragraph direction of plain text (email, IRC messages, {optionally} textareas, etc.)

Categories

(Core :: Layout: Text and Fonts, enhancement)

People

(Reporter: bugzillamozilla, Assigned: smontagu)

References

(Blocks 1 open bug)

Details

(Keywords: rtl, Whiteboard: Please read comment #39 before commenting on textareas)

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624

There are many cases where paragraph direction can be assumed according to the
first character.

* In a textarea, if the user starts a paragraph by typing an LTR character (e.g.
an English "A"), it is safe to assume that the general direction of the
paragraph should remain LTR. The same is true with RTL characters (such as a
Hebrew "א") which are usually followed by RTL paragraphs.

* If the first character doesn't have explicit (or "strong") direction, then the
next character that does should set the paragraph direction.

* When a paragraph starts with a character whose direction differs from the one
intended by the author (such as a Hebrew sentence that starts with an English
word), there should be an option to manually override Mozilla, either by
providing a keyboard shortcut (Bug 98160) or via direction buttons in the
toolbar (Bug 96057 and Bug 119857). In the case of plain text, changing the
display of the textarea isn't sufficient. Mozilla should insert the relevant control
characters (if supported by the active encoding), so that the text/message
still displays properly on the recipient's side.

* This implementation should affect any plain text input widget, as well as the
display of plain text messages (mail, IRC, Gopher and so on). Currently, all
plain text emails that are displayed in Mozilla are rendered left-aligned and
LTR. Implementing my suggestion would fix this problem without requiring any
user intervention.
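
The first-strong-character rule described in the bullets above can be sketched
roughly as follows. This is only an illustration, not Mozilla code: the function
names are made up for the example, every line is assumed to be its own
paragraph, and LTR is assumed as the fallback when a paragraph has no strong
character at all.

```python
import unicodedata

# Bidi classes that count as "strong" (hypothetical constant names).
LTR_CLASSES = {"L"}
RTL_CLASSES = {"R", "AL"}


def paragraph_direction(paragraph, default="ltr"):
    """Return 'ltr' or 'rtl' according to the first strong character.

    Digits, punctuation and whitespace are not strong, so they are skipped.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in LTR_CLASSES:
            return "ltr"
        if bidi_class in RTL_CLASSES:
            return "rtl"
    return default  # assumption: fall back to LTR when nothing is strong


def message_directions(text):
    """Pair each line (assumed to be one paragraph) with its direction."""
    return [(line, paragraph_direction(line)) for line in text.split("\n")]
```

For instance, `paragraph_direction("123. שלום")` skips the digits and the
punctuation and is decided by the first Hebrew letter.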

Reference: At least two existing mail clients currently implement a similar
algorithm - KDE KMail and Microsoft Outlook XP.

Prog.

Reproducible: Always

Steps to Reproduce:
Paragraphs #1 and #4 show how context-based automatic direction should work.
Note that the first character sets the direction of the entire paragraph.

Paragraphs #2 and #3 show cases where ambiguous direction can be solved
(manually) through the use of RLM and LRM control characters. The user should
not have to be aware of the underlying workings of such characters, but merely
to use standard direction controls (Ctrl+Shift or Direction buttons). Mozilla
should transparently handle the insertion of these characters at the beginning
of the active paragraph, regardless of the current caret position.
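
A rough sketch of the manual override described above, i.e. putting an LRM or
RLM at the start of the paragraph that contains the caret. The function name is
hypothetical, paragraphs are assumed to be separated by "\n", and the
"supported by the active encoding" check is left out.

```python
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def set_paragraph_direction(text, caret, direction):
    """Return (new_text, new_caret) with a mark at the start of the
    paragraph that contains the caret offset."""
    mark = RLM if direction == "rtl" else LRM
    # The active paragraph starts right after the last newline before the caret.
    start = text.rfind("\n", 0, caret) + 1
    # Replace an existing leading mark instead of stacking several marks.
    if text[start:start + 1] in (LRM, RLM):
        return text[:start] + mark + text[start + 1:], caret
    # Otherwise insert the mark and shift the caret by one character.
    return text[:start] + mark + text[start:], caret + 1
```

The caret only moves when a new character is actually inserted, so the user
keeps typing at the same place in the text.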

Prog.
I'd like to take this, since I did a little preliminary work on it in the past.
As far as I remember, I didn't file a bug on it.
Assignee: mkaply → smontagu
CONFIRMING re Comment #3
Status: UNCONFIRMED → NEW
Ever confirmed: true
My suggested algorithm (hoping that I won't get accused of "blindly following
Unicode" ;-) ) is to follow
http://www.unicode.org/reports/tr9/#The_Paragraph_Level for text we receive. I'm
not yet convinced about adding control characters to text we produce as
suggested in comment 2: I need to consider possible scenarios.
Here are some links to related documents:

RFC 1555 - Hebrew Character Encoding for Internet Messages
http://www.faqs.org/rfcs/rfc1555.html

RFC 1556 - Handling of Bi-directional Texts in MIME
http://www.faqs.org/rfcs/rfc1556.html

Standard ECMA-48 - Control Functions for Coded Character Sets
http://cr.yp.to/bib/1991/ecma-48.pdf

[IS-1904]  (Israeli Standard): Application of Hebrew in mail
I don't have a link, but I'll soon have a copy of this document.

Prog.
The first three documents listed in comment #4 are very old (more than 10 
years), thus are quite irrelevant.
The last one (IS-1904) is in Hebrew, so its impact on international developers 
is probably quite limited.
Ooops!!!  In comment #7, I meant "comment #6".  Confused?
shouldn't textarea, being an html element, inherit its direction from the
document, or from its own attributes, rather than "decide" independently on
its direction?
When it is HTML, the spec is quite clear, see 
http://www.w3.org/TR/html401/struct/dirlang.html#h-8.2

But the bug is about plain text.
the bug is about plain text in textarea, among other things. and in a textarea,
IMHO mozilla should not set the direction of the text based on the language
used, but based on the inherited dir attribute.

so i think textareas should be excluded from this bug, and only be handled as
bug 96160 suggests.
There is one very commonly needed situation where implementing this algorithm in
browser textareas would be beneficial: web interfaces for plain text emails.

If the user composes a plain text email that will be rendered using
per-paragraph direction, then the user should see the same result during
composition. Now, I'm not sure how to make this comply with standards, but the
need is definitely there.

Prog.
that was bug 98160, of course.
i can think of many needs for this feature, and many other needs that can be
fulfilled by bending the rules. but that's not why we're here, right?
> but that's not why we're here, right?

Well, in my case, wrong. I'm not into any elusive ideals. I just want to use the
best damn browser in existence and although Mozilla qualifies, it still has room
to improve.

Back to the issue at hand. Possible routes are (1) evolving the standard (not
very likely), (2) providing this functionality as an option/pref or (3) giving
it up altogether. As someone who mostly corresponds using plain text, I'd hate
to see the latter happen.

Prog.
and what if someone wants to start a generally RTL paragraph with some english
word? i think the most natural solution, at least for text areas, but for
plain-text email messages in Mail & Newsgroups as well, is what is suggested in
bug 98160 - using ctrl+shift, or whatever is common on other platforms, if any.
or perhaps, a combination of the two solutions.
(In reply to comment #15)
> and what if someone wants to start a generally RTL paragraph with some english
> word? i think the most natural solution, at least for text areas, but for
> plain-text email messages in Mail & Newsgroups as well, is what is suggested in
> bug 98160 - using ctrl+shift, or whatever is common on other platforms, if any.
> or perhaps, a combination of the two solutions.

I've already suggested a solution for such cases, in comment #2:

> The user should not have to be aware of the underlying workings of such
> characters, but merely to use standard direction controls (Ctrl+Shift or
> Direction buttons). Mozilla should transparently handle the insertion of these
> characters in the beginning of the active paragraph, regardless of the current
> caret position.

Prog.
Blocks: 241587
you are not answering my concern. trying to guess the "right" direction based on
context may result in error, and in user dissatisfaction of the browser's
behaviour. therefore, i think we should stick to the inheritance + user-selected
(via keyboard shortcuts) direction scheme.
(In reply to comment #17)
> you are not answering my concern. 

I already did. See comment #12, but just to reiterate, let's do this over again...

> trying to guess the "right" direction based on
> context may result in error, and in user dissatisfaction of the browser's
> behaviour.

The same error may occur in plain text composition under Mail&News, yet I don't
think you object to this functionality, right? Users can learn how to make the
best use of automatic paragraph direction, as they do on Linux, as well as in
Windows applications such as Babylon Translator.

> therefore, i think we should stick to the inheritance + user-selected
> (via keyboard shortcuts) direction scheme.

Sticking to the old method will leave Mozilla textareas lagging behind the
functionality provided by mail applications, thus affecting many potential users
who use webmail via the browser. With automatic paragraph direction, webmail
users may compose plain text email with paragraphs of different directions and
still see the same layout as an intended recipient who uses a compatible mail
client (such as Outlook XP/2003 and KMail).

Prog.
prog,
a browser is not a mail client, and web mail is far from being the only, or even
the primary, use for textarea.

while in mail the situation is clear - you want to write a message, usually in
one language - on the world wide web you cannot predict the situation in which
a textarea will be used.

so yes, i don't object to it in mail, but i doubt if it will be of much use in
the browser.
I disagree with much of your comment, but instead of spamming this bug with a
lengthy discussion over webmail usage and trends, let's do this in email or in
one of the forums.

Prog.
Look at line 23 in the screen shot here:
http://linmagazine.co.il/misc/images/screenshot-bluefish-1648.jpg?1148020213

The code is actually OK, but displayed wrong. How do you propose to overcome
such problems?

The only reason that this seems to be an isolated problem is the fact that I
made sure that the HTML tags are in separate lines.

Unless we have a good way to deal with such cases, we are likely to get into
many messy situations which are worse than the current state (with the
directionality bookmarklets).
Your screenshot actually displays the opposite. It looks much more legible than
how a single direction textarea would have displayed the same code. In fact, it
further convinces me that this functionality could be useful not just for
plain-text webmail, but also for web authoring via the browser (which admittedly
is less common).

To make good use of automatic paragraph direction, users will have to know the
rules and how to override them if the program happens to choose the wrong
direction. The basic concepts of both are quite simple:

1. An English character in the beginning of a paragraph is "aligned" to the
left, while a Hebrew character makes the text aligned to the right.
2. To manually right align a paragraph, press the Right Ctrl+Shift. To
left-align a paragraph, press the Left Ctrl+Shift.

These instructions are somewhat simplified and dumbed down, but I'm sure that
most users will not find them difficult to follow.

Prog.
Did you miss my line here?
"The only reason that this seems to be an isolated problem is the fact that I
made sure that the HTML tags are in separate lines."

Don't get confused by the fact that I used a workaround.

This only works well when they are separate- when you have to deal with mixed
text in the same line, you end up with many cases like line 23.

Anyway, how are you planning to deal with cases like line 23 in the screenshot?
Remember- the text is actually ok there, just displayed wrong. Getting it to
display right (manually) produces wrong code.

Try to edit existing pages with Bluefish and you can see what I mean. It can be
a real mess. Again - don't get fooled by the fact that I used a workaround in my
screenshot.

(In reply to comment #23)
(In reply to comment #21)
Sorry, Shoshannah Forbes, but your example is not relevant.  We are trying to 
deal with *plain text*, and HTML code certainly does not qualify as such.
Formal languages (like HTML, or Java, ...) are a special case because they 
combine a generally LTR flow (all the syntactic words) with phrases which may 
be in a RTL language.  The orientation of each such phrase should be handled 
independently of the other phrases and of the surrounding LTR syntax.  A 
standard algorithm for plain text, even from Unicode, cannot cope with such 
complexity.
But this is not the problem at hand!
(In reply to comment #24)
> A 
> standard algorithm for plain text, even from Unicode, cannot cope with such 
> complexity.

That is my whole point.


> But this is not the problem at hand!

Why not? Editing a wiki or a weblog often involves code syntax - and this
is done using the browser's textarea. As the bug refers to textareas and not
just email, we are going to have to deal with such problems.

And as you wrote- we don't have the tools to do that.

That is why I am opposed to having automatic directionality in textareas.

(In reply to comment #23)
> Did you miss my line here?
> "The only reason that this seems to be an isolated problem is the fact that I
> made sure that the HTML tags are in separate lines."

No, I didn't. That's why I suggested educating users about the benefits and
limitations of this functionality, hence the two rules.

(In reply to comment #24)
> Sorry, Shoshannah Forbes, but your example is not relevant.  We are trying to 
> deal with *plain text*, and HTML code certainly does not qualify as such.

Actually, no one seems to object to implementing this algorithm in plain text
scenarios (such as the mail client). Tsahi and Shosh simply disagree with my
suggestion to also implement this in browser textareas, but if they read comment
#14 again, they'll see that this is only suggested as an optional behavior, not
as the default.

Prog.
Changing the summary to reflect comment #14, option 2:

> Possible routes are (1) evolving the standard (not very likely), (2) providing 
> this functionality as an option/pref or (3) giving it up altogether. As someone 
> who mostly corresponds using plain text, I'd hate to see the latter happen.

Prog.
Summary: RFE: Automatic paragraph direction of plain text (received email, textareas, IRC messages, etc.) → RFE: Automatic paragraph direction of plain text (email, IRC messages, {optionally} textareas, etc.)
Well, it will be OK for email, as long as we don't get confused by quote marks
and don't break things like the "quote colors" extension (which works great
with moofie's auto-direction extension, which means that this can be done).
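
One way to keep quote handling and auto-direction from stepping on each other
is to separate the quote prefix from the text before looking for the first
strong character. The sketch below is only an assumption about how that could
look; it is not taken from either extension, and all names are hypothetical.

```python
import re
import unicodedata

# Leading run of ">" quote marks, each optionally followed by one blank.
QUOTE_PREFIX = re.compile(r"^((?:>[ \t]?)*)(.*)$", re.DOTALL)


def split_quote(line):
    """Return (quote_depth, body) for one line of a plain text mail."""
    prefix, body = QUOTE_PREFIX.match(line).groups()
    return prefix.count(">"), body


def first_strong_direction(text, default="ltr"):
    """Direction of the first strong character, or the default if none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default


def annotate(line):
    """Return (quote_depth, direction, body) so that quote coloring and
    paragraph direction can be applied independently of each other."""
    depth, body = split_quote(line)
    return depth, first_strong_direction(body), body
```

The quote depth can then drive the coloring, while the direction is taken from
the quoted text itself.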
Mozilla Thunderbird displaying plain text Hebrew mail with auto-direction using
moofie's extension, and the "quote color" extension.
We should not break this level of support.
(In reply to comment #25)
I don't want to take side in the discussion between Prog and Shoshannah, 
but... 
Shoshannah maintains that the textarea is no substitute for an HTML editor, and 
she is right.  She infers that there is no point in enhancing it to support 
less ambitious endeavours (e.g. plain text), and the logical relation escapes 
me.
(In reply to comment #30)
> and the logical relation escapes 
> me.

In practice, there are many, many cases around the web where a textarea is used as
a "mini HTML editor". Two common examples are editing weblogs and editing wikis.

As these are very common uses, I am afraid that in an attempt to "support less
ambitious endeavours (e.g. plain text)" we are going to break current widely used
functionality (as I do not see any practical way for Mozilla to detect that we
are editing "plain text" vs. text with coded syntax in any given textarea).
Forgot to write (sorry for the spam): these comments are obviously only valid if
we attempt to implement this in textareas and not just in email.

If we are only going to implement this in email, then the only concern that I
have is outlined in Comment #29 .
(In reply to comment #29)
> Created an attachment (id=149620)
> screenshot - current status with extensions
> 
> Mozilla Thunderbird displaying plain text Hebrew mail with auto-direction using
> moofie's extension, and the "quote color" extension.
> We should not break this level of support.

shouldn't the vertical lines to the left of the message be on the right side?
Blocks: 296689
Blocks: Persian
Why is everyone assuming text widgets or plain text messages have paragraphs for
us to play with, to begin with? Think e.g. of one-liner text widgets;

Or suppose I like to type my line
breaks at ends of lines of the same
paragraph. What would you do then?
How will you tell whether the
previous line is a beginning of a
new paragraph or not?

(and this is regardless of how you're determining paragraph direction eventually).

Anyway, do textareas have DOM trees within them? I just inspected this bugzilla
page and it seemed to me like they don't, so what is being suggested w.r.t.
textareas? Creating a DOM subtree, or maybe doing paragraph handling non-DOMishly?

Another point: I'm not sure people would like having paragraphs switching
direction "on their own" as they're writing text. There's a difference between
seeing an incoming message in which you're not the one doing the writing, so
there's just one moment of surprise when it comes up, and composing a message,
when whenever you enter a new paragraph you may be surprised to find the text
switching sides.

Finally, it may be worthwhile considering an implementation of a more limited
feature, regardless of whether direction is auto-set or whether control chars
are used: The ability to set the direction of a text widget 'paragraph' manually
_for_viewing_or_composition_purposes_only_ without this affecting the submitted
text (i.e. the same as what happens when you do a Ctrl+Shift+X, but for a single
paragraph rather than the entire text widget, or for the rest of text from this
point on without affecting what you've typed previously).

Ok, that's enough blurb for one comment :-)
I suggest following the HTML specification, i.e. the dir attribute of the 
textarea.
(In reply to comment #36)
> I suggest following the HTML specification, i.e. the dir attribute of the 
> textarea.

Let me elaborate: The display of HTML is governed by the HTML specification, 
http://www.w3.org/TR/html/ and http://www.w3.org/TR/html4/, not by the Unicode 
specification.

This bug isn't a bug.
There are some proposals for CSS3 to support direction=auto for
paragraphs, text input, textarea, etc.  So it would be good to have such support
in place before the standard requires it.

Another thing is plain-text emails, which should follow the Unicode standard, not
HTML/CSS.  So the text widgets (input and view) of the mail application need this feature.
(In reply to rosennej@qsm.co.il, comment #37)
> This bug isn't a bug.

That's right, it's an RFE.

For some reason most of the comments are centered on textareas, so perhaps some
more details are needed to clarify what this RFE is actually about:

1. This RFE is *first and foremost* about auto-direction in the mail and IRC
clients - not about textareas.

2. Textarea implementation is requested as an *option*. The phrasing in the
summary is sufficiently explicit about this ("optionally" anyone?), but I've
also bothered to repeat this fact in the comments more than once. Please read them.

2.1 Textarea with auto-direction is needed by anyone who uses webmail and needs
some visual cue as to how the message will be displayed in auto-direction-aware
mail clients (e.g. Kmail and Outlook). Hotmail and Yahoo alone have more than
200,000,000 users, so it's fair to say that at least a few of them use
plain-text and BiDi and will find such an *option* useful.

Prog.
Whiteboard: Please read comment #39 before commenting on textareas
Depends on Bug 231701 - format=flowed DelSp=yes not supported (RFC 3676)

Because:  Lack of line-break/paragraph-separator detection causes many BiDi
problems, i.e. the mail client fails to auto-align paragraphs.
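
To illustrate why paragraph detection matters here, below is a simplified
sketch of rejoining format=flowed lines into paragraphs before any direction is
chosen. It follows the basic RFC 3676 idea (a line ending in a space continues
on the next line), but it ignores quoting entirely and the function name is
hypothetical, so treat it as an approximation rather than a full implementation.

```python
def unflow(body, delsp=False):
    """Join format=flowed lines into paragraphs (simplified, quoting ignored)."""
    paragraphs = []
    current = ""
    for line in body.splitlines():
        # Undo space-stuffing: one leading space is not part of the text.
        if line.startswith(" "):
            line = line[1:]
        # A line ending in a space is "flowed", i.e. the paragraph continues.
        # The signature separator "-- " is the exception and stays fixed.
        flowed = line.endswith(" ") and line != "-- "
        if flowed and delsp:
            line = line[:-1]  # DelSp=yes: the trailing space is only a marker
        current += line
        if not flowed:
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)  # message ended in the middle of a paragraph
    return paragraphs
```

Each returned string is one logical paragraph, so an auto-direction rule would
be applied once per entry instead of once per wrapped line.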
Depends on: 231701
*** Bug 296689 has been marked as a duplicate of this bug. ***
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: zach → layout.fonts-and-text
Keywords: rtl