Closed Bug 288453 Opened 20 years ago Closed 20 years ago

Right to left formatting characters ignored in plain text

Categories

(Core :: Layout: Text and Fonts, defect)

x86
Windows XP
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 177148

People

(Reporter: peter, Unassigned)

References

Details

(Keywords: rtl)

Attachments

(3 files, 3 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041217
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20041217

This relates to the following e-mail I sent in response to someone's signature:

On 31/03/2005 14:53, <someone> wrote:

>
> ...
>
> (-:  Support left-handed emoticons! 



Much software already does - just precede with the character RLE (U+202B) as
follows:

:-)

(and follow with PDF, U+202C, if you want to go back to normal on the same line)

=====

(There is an RLE character before ":-)".) I tried inserting RLM, RLE and RLO
before ":-)". The effect of any of these should have been to change the
direction of this set of direction-neutral characters to right-to-left, and to
mirror the parenthesis. And indeed this is what happens with Notepad, and what
happens in Mozilla when I insert a Hebrew character. In fact this works
correctly while editing this page in Bugzilla. But it does not work in the mail
composer, in the mail viewer, in the mail source viewer, or in Compose.
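
For anyone who wants to reproduce the test lines without typing invisible characters, here is a minimal sketch that builds them. It assumes only Python's standard `unicodedata` module; the helper names `make_test_lines` and `describe` are made up for illustration and are not part of any Mozilla tool. It also prints the bidi class of each character, which shows that ":" and "-" are technically weak separator types (CS and ES) rather than neutrals, although the Unicode bidi algorithm treats them like neutrals when no digits are nearby.

```python
import unicodedata

# Bidi control characters named in the report.
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

SMILEY = ":-)"


def make_test_lines():
    """Return (label, text) pairs, one per control character tried."""
    return [
        ("RLM", RLM + SMILEY),
        ("RLE", RLE + SMILEY + PDF),
        ("RLO", RLO + SMILEY + PDF),
    ]


def describe(text):
    """List (code point, bidi class) for each character of the text."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]


for label, text in make_test_lines():
    print(label, describe(text))
```

Pasting the generated strings into a plain text file saved as UTF-8 gives a test case equivalent to the one described above.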

Reproducible: Always

Steps to Reproduce:
Copy the quoted text above into a window

Actual Results:  
The line after "as follows" looks like ":-)"

Expected Results:  
The line after "as follows" should look like "(-:"

This should probably be under category "Editor Core" or similar, or "complex
scripts", but such categories have disappeared for Mozilla Suite which is the
program where I observed this problem.

While this particular application is trivial, this is a potentially significant
problem with displaying Hebrew and Arabic text with punctuation correctly.
Component: General → Layout: BiDi Hebrew & Arabic
Product: Mozilla Application Suite → Core
Version: unspecified → Trunk
Sorry about misallocation. I found the right component under "Core" and moved
this bug there. I note that to view this as intended it is necessary to manually
set the encoding of the Bugzilla page to UTF-8.
(In reply to comment #0)
> Steps to Reproduce:
> Copy the quoted text above into a window

Maybe related to bug 287502?

(In reply to comment #2)
>
> Maybe related to bug 287502?
> 
> 
This bug is indeed related, but not the same because it does not refer to
selection. In bug 287502 comment 3 someone wrote "In fact, the root
of this bug may be that Mozilla doesn't honor the right-to-left override, which
would be more serious." This bug is that underlying bug - although referring not
just to RLO but also to RLM and RLE.

This bug is also related to bug 9100. See bug 9100 comment 18. But no one can
appeal to the HTML spec on this as this bug is not about HTML but about plain
Unicode text e-mail, which should conform to the Unicode bidi algorithm whether
or not it contains displayable right-to-left characters. Bug 9100 seems to have
been closed because no one came up with a test case. Well, perhaps I have
succeeded. I don't seem to have succeeded with an HTML test case (but see Bidi
smiley check.html which I am about to upload), but I have succeeded with text
embedded in the initial report of this bug, which includes the RLE character
(even as in the Bugzilla database) and yet is not displayed correctly. In other
words, the bug seems to be in processing plain text (within <pre>...</pre>)
rather than HTML. But I can't find a test case in an HTML page - it must be
something to do with the stylesheets used for displaying Bugzilla pages.
Nevertheless, I have a test case in a plain text e-mail.

But even in the HTML test case RLM seems to be ignored, rather than providing an
RTL context for the following punctuation, although RLE, RLO and the HTML
directives are processed correctly.
Attached file Attempt to find this bug in HTML (obsolete)
This HTML appears to display correctly except for the smileys preceded by RLM
which are not reversed although they should be - and I note a spurious line
break at the PDF character within <pre>...</pre>.
Attached file The bug displayed in a plain text file (obsolete)
This bug is displayed when I open this plain text file in Mozilla (with UTF-8
encoding selected). None of the smileys are reversed.
This bug is partly suppressed when a Hebrew alef is inserted later in the plain
text file - although the problem of spurious new lines with PDF reappears.

I note that there is nothing in the Unicode definition of plain text (although
arguably there is in the HTML specification) to suggest that the bidi algorithm
does not apply to text containing bidi control characters but no displayable
RTL characters. It is certainly very strange that the appearance of one section
of a plain text file is changed by a change to a separate and following
section. This can hardly be a matter of efficiency, as it implies that the
software must scan a whole file, potentially megabytes, for RTL characters near
the end before starting to render the start of the file - so it must be far
more efficient (and not incompatible with the HTML spec - indeed probably the
real intention of its authors) for bidi processing to be invoked as soon as any
bidi control character is found.
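
The difference between the two triggering policies discussed above can be sketched as follows. This is only an illustration of the argument, not Mozilla's actual detection code: the function `first_bidi_trigger` and its `include_controls` flag are hypothetical, and the set of control code points is my own choice (the marks, embeddings, overrides and PDF).

```python
import unicodedata

# Explicit bidi control code points: LRM, RLM, LRE, RLE, PDF, LRO, RLO.
BIDI_CONTROLS = {"\u200e", "\u200f", "\u202a", "\u202b", "\u202c", "\u202d", "\u202e"}

# Bidi classes of displayable right-to-left characters (Hebrew, Arabic, ...).
RTL_CLASSES = {"R", "AL"}


def first_bidi_trigger(text, include_controls=True):
    """Return the index of the first character that would switch on bidi
    processing, or None if the text never needs it.

    With include_controls=False only displayable RTL characters count,
    which is the behaviour complained about in this bug.
    """
    for index, ch in enumerate(text):
        if ch in BIDI_CONTROLS:
            if include_controls:
                return index
            continue  # controls are skipped under the RTL-letters-only policy
        if unicodedata.bidirectional(ch) in RTL_CLASSES:
            return index
    return None


# The RLE sits right after "as follows:\n"; the alef is at the very end.
sample = "as follows:\n\u202b:-)\n" + "filler line\n" * 3 + "\u05d0"
print(first_bidi_trigger(sample))
print(first_bidi_trigger(sample, include_controls=False))
```

Under the first policy the scan stops at the RLE near the top of the sample; under the second it has to reach the alef at the end before bidi processing is switched on.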
(In reply to comment #3)

> But even in the HTML test case RLM seems to be ignored, rather than providing an
> RTL context for the following punctuation, although RLE, RLO and the HTML
> directives are processed correctly.


Let me get this issue out of the way first. "&rlm;:-)" in a paragraph with
left-to-right base direction should be displayed ":-)", because the neutral
characters are in conflict between the preceding right-to-left character and the
left-to-right characters following (or the end of the paragraph, which is also
considered left-to-right), and so they take the base direction of the paragraph.
"&rlm;:-)&rlm;" should be displayed "(-:". (I am using the HTML entities for
clarity, but the same applies to the raw characters in plain text, of course.)

http://www.unicode.org/reports/tr9/#Resolving_Neutral_Types
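
The two cases above can be checked with a heavily simplified sketch of the neutral-resolution rules (N1 and N2 in the report linked above). It assumes a single paragraph with left-to-right base direction and handles only strong characters plus neutrals and separators; digits, embeddings, overrides and bracket pairing are deliberately ignored, so this is not a full implementation of the bidi algorithm. The names `resolve_directions` and `display` are hypothetical.

```python
import unicodedata

RLM = "\u200f"

# Map strong bidi classes to a direction; everything else is treated as neutral.
STRONG = {"L": "L", "R": "R", "AL": "R"}

# A few mirrored pairs, enough for the smiley test case.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "<": ">", ">": "<"}


def resolve_directions(text, base="L"):
    """Resolve each character to "L" or "R" (simplified N1/N2)."""
    classes = [STRONG.get(unicodedata.bidirectional(ch)) for ch in text]
    resolved = []
    for i, cls in enumerate(classes):
        if cls is not None:
            resolved.append(cls)
            continue
        # Nearest strong type on each side; paragraph edges count as base.
        before = next((c for c in reversed(classes[:i]) if c), base)
        after = next((c for c in classes[i + 1:] if c), base)
        # N1: same direction on both sides wins. N2: otherwise use base.
        resolved.append(before if before == after else base)
    return resolved


def display(text):
    """Visual order for a left-to-right paragraph: reverse and mirror R runs."""
    dirs = resolve_directions(text, "L")
    out = []
    i = 0
    while i < len(text):
        j = i
        while j < len(text) and dirs[j] == dirs[i]:
            j += 1
        run = text[i:j]
        if dirs[i] == "R":
            run = "".join(MIRROR.get(ch, ch) for ch in reversed(run))
        out.append(run)
        i = j
    return "".join(out).replace(RLM, "")  # RLM itself is invisible


print(display(RLM + ":-)"))        # one RLM in front only
print(display(RLM + ":-)" + RLM))  # RLM on both sides
```

With a single leading RLM the smiley stays as ":-)", and with an RLM on each side it comes out as "(-:", which matches the expected results given above.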
Simon, thank you for comment #7. I have added lines to each of the attachments
showing the behaviour when the smiley is enclosed between two RLM's. This HTML
example now seems to display correctly.
Attachment #179261 - Attachment is obsolete: true
The bug is displayed in this plain text file, whether or not there is a
following RLM.
Attachment #179262 - Attachment is obsolete: true
The bug now seems to be fully suppressed by adding the alef later in the file,
although the problem with PDF remains.
Attachment #179263 - Attachment is obsolete: true
With a recent build, the smiley is reversed in the case with a PDF behind it,
and taking into account comment #7, I don't see any case of right to left
formatting characters ignored. 

Only the plain text file doesn't work correctly, but this is a more general problem.
Mozilla displays plain text by treating it as HTML text wrapped in <pre>, and an
awful lot of i18n no longer works there. Only basic LTR data with no composing
characters is still handled correctly there.
(In reply to comment #11)
> With a recent build, the smiley is reversed in the case with a PDF behind it,
> and taking into account comment #7, I don't see any case of right to left
> formatting characters ignored. 

I agree that there is now no problem with the HTML.
> 
> Only the plain text file doesn't work correctly, but this is a more general
> problem.
> Mozilla display plain text by treating it as <pre> surrounded html text, and an
> awful lot of i18n doesn't work anymore there. Only basic, LTR, with no composing
> characters data is still correctly handled there.

Yes, this is the bug, which relates to plain text only, not HTML - although
there is some support for RTL in plain text. I am renaming this bug to clarify
this, if I am allowed to.
Summary: Right to left formatting characters ignored → Right to left formatting characters ignored in plain text
This is a kind of "subset dependency" of bug 177148, for want of a better term.
I mean that fixing bug 177148 (à la attachment 104393) would fix this, but it
could also be fixed separately for plain text.

I am beginning to incline to just fix it for all cases. It's becoming clearer
and clearer that that is what people expect, whatever the HTML spec says.
Depends on: 177148

*** This bug has been marked as a duplicate of 177148 ***
Status: UNCONFIRMED → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
Reporter, could you verify whether, after the fix to bug 177148, everything is now working as you expect?
(In reply to comment #15)
> Reporter, could you verify if after the fix to bug 177148, everything is now
> working as you expect ?
> 
I can confirm that this is now working as expected in Firefox 1.5.0.1. Thank you.
Mass-assigning the new rtl keyword to RTL-related bugs (see bug 349193).
Keywords: rtl
Component: Layout: BiDi Hebrew & Arabic → Layout: Text
QA Contact: general → layout.fonts-and-text