Closed Bug 287502 Opened 15 years ago Closed 13 years ago

Right-to-left text reordering directives confuses text selection

Categories

(Core :: Selection, defect, trivial)

x86
Linux
defect
Not set
trivial

RESOLVED DUPLICATE of bug 246482

People

(Reporter: david_costanzo, Unassigned)

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050323
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8b2) Gecko/20050323

Normally, when you double-click on a word, it highlights and selects everything
between the surrounding whitespace.  Normally, when you select a word and
right-click, the "Search Web For" option includes the highlighted text.

However, if you're looking at HTML that includes non-printing character
entities, such as "&#8238;", this does not happen.  Instead, double-clicking on
a word may highlight some adjacent words (and may only highlight part of the
desired word).  Furthermore, right-clicking will show "Search Web For" followed
by some text that may not even be highlighted.


Reproducible: Always

Steps to Reproduce:
1. Open "text-selection-bug.html"
2. Select (by double-clicking) the word "You" in "You can't select it"
3. Right-Click

Actual Results:  
When you double-click on "You" the text "text. You can" is highlighted.
When you right-click, the "Search Web For" shows "s t r a n g e".
Pressing CTRL+C, selecting a text area, and pressing CTRL+V inserts the word
"strange".

Expected Results:  
When you double-click on "You", the text "You" should be highlighted.
When you right-click, the "Search Web For" should show the word "You".

I do not know the meaning of "&#8238;".  I received it in spam sent to my Yahoo!
account.  I noticed how strangely the text selection was behaving and decided to
report it.

I have given this a "trivial" severity because not only is this a minor cosmetic
problem, it only affects the rendering of bizarre or malicious HTML.
An HTML file that contains non-printing character entities (on my system, anyway).
This is the HTML that is mentioned in the repro scenario.

The entity "&#8238;" appears between each letter of the text "This is some
strange text.  You can't select it".
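For anyone who wants to rebuild a similar test page, here is a minimal Python sketch that interleaves the entity into a string. It is not the script used to produce the attachment; the function names are made up for illustration, and it puts the entity between every pair of characters (spaces included), which may differ slightly from the attached file.

```python
from html import escape

RLO_ENTITY = "&#8238;"  # numeric character reference for U+202E


def interleave_entity(text: str, entity: str = RLO_ENTITY) -> str:
    """Place `entity` between every pair of adjacent characters of `text`."""
    # escape() keeps characters such as the apostrophe safe inside HTML.
    return entity.join(escape(ch) for ch in text)


def build_repro_page(text: str) -> str:
    """Wrap the interleaved text in a bare-bones HTML page (hypothetical layout)."""
    return "<html><body><p>" + interleave_entity(text) + "</p></body></html>"


if __name__ == "__main__":
    print(build_repro_page("This is some strange text.  You can't select it"))
```

Saving the printed output to a file and opening it should give a page comparable to the one in the repro scenario.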
I can reproduce this behaviour with Mozilla 1.8b1. For the user it seems that
Mozilla only selects the visible text but it also selects the invisible
characters. E.g. you seem to select "This is" (= 7 real characters), but in fact
Mozilla selects "T&#8238;h&#8238;i&#8238;s" (= 3 invisible + 4 visible
characters). Replacing &#8238; with &#8237; or &#8236; gives the same effect.

FWIW, according to http://www.fileformat.info/info/unicode/char/202e/index.htm,
"&#8238;" is the Unicode character 'Right-To-Left Override' (U+202E).
Thanks for the URL.  That's the exact resource I was looking for last night.

It's interesting that "&#8238;" is a right-to-left override.  In fact, the root
of this bug may be that Mozilla doesn't honor the right-to-left override, which
would be more serious.

As I said, I received this repro scenario in a phishing scam e-mail.  The e-mail
had many apparent typos--transposing of letters.  I assumed it was either to
bypass a spam filter or because the spammer didn't know English.  But maybe he
was using a trick that transposes the letters in HTML (to fool the spam filter),
but uses Unicode control characters to re-assemble the text correctly on the
screen (to fool the human).  Anyway, this trick worked on Yahoo!'s spam filter.
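To make the guess above concrete, here is a small Python sketch of such a trick. It is purely illustrative and not reconstructed from the actual spam: it stores a word reversed and wraps it in a right-to-left override (U+202E) and a pop directional formatting character (U+202C), so that a bidi-aware renderer would draw it in the normal reading order. The function names are hypothetical, and the "filter" is just a substring test.

```python
import re

RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Matches one simple RLO ... PDF run (no nesting handled in this sketch).
_RLO_RUN = re.compile(f"{RLO}([^{PDF}]*){PDF}")


def obfuscate_word(word: str) -> str:
    """Store the word reversed, wrapped so it would display forwards."""
    return RLO + word[::-1] + PDF


def visual_order(text: str) -> str:
    """Approximate what a reader sees by reversing each simple RLO run."""
    return _RLO_RUN.sub(lambda m: m.group(1)[::-1], text)


if __name__ == "__main__":
    hidden = obfuscate_word("password")
    print("password" in hidden)                # naive substring filter misses it
    print("password" in visual_order(hidden))  # found after undoing the override
```

A filter that only looks at the stored character order never sees the word, while a check on the approximated visual order does.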

The original spam used another non-printing escape sequence that I
didn't include in my repro.  It was probably a left-to-right override (&#8237;).

If anyone has access to Internet Explorer, could you view the HTML attachment and
see if the second line reads from right to left?
(In reply to comment #3)
> If anyone has access to Internet Explorer, could you view HTML attachment and
> see if the second line reads from right-to-left?

Yes, with IE6 the second line does read from right-to-left (.ti tceles t'nac uoY
.txet egnarts emos si sihT), but both Mozilla 1.8b1 and IE6 seem to display the
examples from http://www.robinlionheart.com/stds/html4/dir correctly.
Reply to comment #4:
> both Mozilla 1.8b1 and IE6 seem to display the
> examples from http://www.robinlionheart.com/stds/html4/dir correctly.

Great page!  The text selection bug exists even when Mozilla correctly swaps the
reading order, so ignoring the override character entities is independent of
this bug.

By the way, the page links to a W3C recommendation that asks browsers to *ignore*
the directional override characters.  But that doesn't make this bug
irrelevant, because the bug still occurs when you override the direction with
HTML tags (which is the preferred way of doing it).

I've renamed the summary of this bug to reflect that the selection problem is
tied to the right-left direction of the text, not non-printing HTML characters.

By the way, I noticed a few other problems that may be related.  I'll file them
as separate bugs if you don't think they're the same as this bug.  First, if you
open attachment #178444, then save it to disk, the file changes--the character
entities are replaced with something else (probably the UTF-8 encoding of the
characters).  Second, if you select the bad text in attachment #178444,
right-click, and choose "View Selection Source", you don't see the character
entities.  This is inconsistent with what happens when you "View Page Source".
Summary: Non-printing HTML character entities confuses text selection → Right-to-left text reordering directives confuses text selection
This is an HTML fragment that uses the preferred method of reversing the text
direction.  It uses the "dir" attribute of the "p" element and "bdo" tags.  The
text selection still behaves strangely.
==> selection
Assignee: general → selection
Component: General → Selection
Product: Mozilla Application Suite → Core
QA Contact: general
Version: unspecified → Trunk
About the other problems:

- "View Selection Source" showing the data, not the entities

"View Selection Source" shows content interpreted from the DOM, not the original
content download. It's normal and per the spec that you don't get the entities.

BTW while testing under 1.7.3, I was seeing some strange spurious characters
within the selection; it doesn't happen with the trunk. Looks like it was an
occurrence of the recently solved security issue about leaking random data from
the stack in the selection.

- Save to disk saving the data, not the entities:

This only happens when you do a save "web page, complete", the behaviour will be
as you expect with "html only". Related to bug 115328. 
OTOH even if 115328 is solved, "save web page complete" works from the DOM
representation of the page, not the original content, so it can be seen as
normal behavior, and therefore INVALID/WONTFIX to request keeping the entities
in the saved version.

So these are features, not bugs.
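The effect of working from the parsed content rather than the original source can be imitated with Python's standard html module. This is only an analogy under stated assumptions: html.unescape stands in for the parser and html.escape for a serializer, and Mozilla's own serializer is not modeled here.

```python
from html import escape, unescape

# Original markup: numeric entities between the letters of "This".
source = "T&#8238;h&#8238;i&#8238;s"

# Parsing turns each entity into the character U+202E itself.
parsed = unescape(source)

# A generic serializer only rewrites &, <, > and quotes, so the override
# characters are written out as raw characters, not as entities.
reserialized = escape(parsed)

if __name__ == "__main__":
    print(len(source), len(parsed))   # entity text versus decoded characters
    print("&#8238;" in reserialized)  # the entities are gone
    print("\u202e" in reserialized)   # the raw characters remain
```

Once the entities have been decoded, nothing in the parsed text records that they were originally written as entities, so a round trip cannot restore them.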

About the bug itself, the last attachment makes it possible to confirm it, but I
wonder whether a more bidi-related component would be more appropriate. Is the
selection owner willing to handle bidi troubles?
Status: UNCONFIRMED → NEW
Ever confirmed: true
The main issue described here is a duplicate of bug 246482 (fixed on trunk). Marking as a duplicate of that bug.

Please report any remaining issues as separate bugs.

*** This bug has been marked as a duplicate of 246482 ***
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE