Open Bug 321444 Opened 19 years ago Updated 3 years ago

Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when pasting text from clipboard into editbox, textarea

Categories

(Core :: DOM: Serializers, defect)

People

(Reporter: gangleri, Unassigned)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

Hello!

This is a feature / enhancement request.

To understand the request, please consider the following scenario:
Users are writing an article for a wiki, or posting a message in a forum, etc. Generally they write in a form / editbox. Wiki contributors research their subject / topic and want to cite or reference available text.

a) *copying from a rendered page*
Now see http://yi.wiktionary.org/w/index.php?oldid=6888 . The page and its main area are RTL. If you copy the first 3 to 6 non-empty lines and paste the clipboard into http://pastebin.com/ you will also see a leading RLE and a trailing PDF there.

But normal users are not aware of what is in the clipboard. They will not use "pastebin.com", and the following happens:
- wiki articles are created starting with a general punctuation character (this result has been found in the contributions of several users)
- wiki articles have the punctuation characters inside the title (because of later modifications to the pasted text); such titles are extremely hard to find manually
- inside the page source, BiDi characters break the wiki syntax (because users are not aware that lines start with something they cannot see)
-- broken lists
-- broken tables
-- broken links to existing pages

In editboxes these general punctuation characters cause a lot of trouble. If the PDF is deleted, this will / might influence the evaluation of the nesting levels of the BiDi algorithm, which supports only a limited number of levels. Arrow movements between lines (especially if you move into "nothing"), selection with the mouse, <home>, <end> etc. get "out of control".

As a result, many users will stop editing RTL text because they will not understand what happens; they will not find a relation / determinism between what they do and how the system reacts.

b) *Copying from the editbox:*
Please verify
http://yi.wiktionary.org/w/index.php?oldid=6888&action=edit
See http://pastebin.com/477916
There are no dir="foo" there.
There is a control character at the end of line #7:
#
[[:category:&amp;#1510;&amp;#1522;&amp;#1463;&amp;#1496; - tsayt|&amp;#1510;&amp;#1522;&amp;#1463;&amp;#1496; - tsayt]]&amp;#8236;
This was *not* intended at the time I edited that page.

Conclusion: If selected text is copied to the clipboard, it makes sense to strip the BiDi control characters from its edges. The content of the clipboard can instead be embedded with "visible markup". This is more transparent for everyone.
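
The edge-stripping idea above can be sketched in a few lines of Python. This is only an illustration, not how Gecko handles the clipboard: the function name `strip_bidi_edges` and the exact set of characters in `BIDI_CONTROLS` are assumptions made for this sketch. Controls inside the text are deliberately left alone.

```python
# Assumed set of BiDi control characters for this sketch.
BIDI_CONTROLS = (
    "\u200e\u200f"                    # LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def strip_bidi_edges(text: str) -> str:
    """Remove BiDi control characters from the start and end of text only.

    Hypothetical helper: interior control characters are kept unchanged.
    """
    return text.strip(BIDI_CONTROLS)
```

For the scenario in a), a selection copied as RLE + text + PDF would come out as the bare text, while an RLM between two words would survive.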

best regards reinhardt [[user:gangleri]]

from
http://landfill.bugzilla.org/bugzilla-tip/show_bug.cgi?id=3298
== when marking / selecting BiDi text and copying it, will this automatically embed the selection in general BiDi punctuation characters

Reproducible: Sometimes

Steps to Reproduce:
Be patient!
http://pastebin.com/477920 does not show punctuation.
Neither does http://pastebin.com/477922 or http://pastebin.com/477925
from http://yi.wiktionary.org/wiki/%D7%94%D7%95%D7%A0%D7%98

Please recall pages you have created yourself and work there.



Does this all make sense?
Stripping the BiDi characters from the edges can make sense. Editors know the position where they are inserting and can insert embedding if needed.

What alternatives are available?
- Escape BiDi characters using HTML entities for RLM and LRM and &#nnnn; for Unicode characters where no HTML entity is available.
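
A minimal sketch of this escaping alternative, in Python. The function name `escape_bidi` and the choice of which characters get a numeric reference are assumptions for illustration; only `&lrm;` and `&rlm;` are taken from the suggestion above.

```python
# Named entities exist for LRM and RLM.
NAMED_ENTITIES = {"\u200e": "&lrm;", "\u200f": "&rlm;"}

# Assumed set of characters escaped numerically: LRE, RLE, PDF, LRO, RLO.
NUMERIC_ESCAPES = frozenset("\u202a\u202b\u202c\u202d\u202e")


def escape_bidi(text: str) -> str:
    """Replace invisible BiDi controls with visible HTML references."""
    out = []
    for ch in text:
        if ch in NAMED_ENTITIES:
            out.append(NAMED_ENTITIES[ch])
        elif ch in NUMERIC_ESCAPES:
            out.append(f"&#{ord(ch)};")  # decimal form, e.g. &#8236; for PDF
        else:
            out.append(ch)
    return "".join(out)
```

With this sketch, a trailing PDF (U+202C) becomes `&#8236;`, the same reference that appears at the end of line #7 in the pastebin example.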

What are the worst scenarios?
Imagine somebody is marking text in such a way that the content of the clipboard would not be balanced. Please take a look at
bug 320273 BiDi: request for a "BiDi balancing function" to avoid BiDi overlapping between objects
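
To make "not balanced" concrete, here is a rough Python sketch that counts unmatched embedding controls in a copied string. It does not reproduce what bug 320273 requests; the name `bidi_imbalance` and the simplification of ignoring paragraph boundaries and isolates are assumptions of this sketch.

```python
OPENERS = frozenset("\u202a\u202b\u202d\u202e")  # LRE, RLE, LRO, RLO
PDF = "\u202c"                                   # POP DIRECTIONAL FORMATTING


def bidi_imbalance(text: str) -> tuple[int, int]:
    """Return (unclosed_openers, stray_pdfs) for text.

    Simplified: paragraph separators, which also end embeddings,
    are not taken into account here.
    """
    depth = 0
    stray = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth:
                depth -= 1
            else:
                stray += 1  # a PDF with no opener before it
    return depth, stray
```

A selection that starts after an RLE but includes its PDF would report one stray PDF; a selection that includes the RLE but stops before the PDF would report one unclosed opener.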
Summary: Srip bidi control characters from edges when copying text into clopboard → Srip BiDi control characters from edges when copying text into clopboard
Summary: Srip BiDi control characters from edges when copying text into clopboard → Strip BiDi control characters from edges when copying text into clopboard
Severity: normal → enhancement
Component: General → DOM to Text Conversion
Product: Firefox → Core
Version: unspecified → Trunk
I agree that the copying of non-printing control characters is something which merits more consideration w.r.t. what would be the most useful behavior, but stripping these characters is out of the question. Please change bug title.
(In reply to comment #1)
> I agree that the copying of non-printing control characters is something which
> merits more consideration w.r.t. what would be the most useful behavior, but
> stripping these characters is out of the question. Please change bug title.
 
I'm the one who suggested this bug title to Reinhardt. Note that it only suggests stripping control chars from the edges of the selection (i.e. beginning and end). Why is this out of the question? What would you suggest as an alternative?

Myself, I'm still undecided on whether anything should be done here at all or not. I suggested filing a bug so a discussion can take place.
(In reply to comment #1)
> I agree that the copying of non-printing control characters is something which
> merits more consideration w.r.t. what would be the most useful behavior, but
> stripping these characters is out of the question. Please change bug title.

Thanks Eyal for the feedback! I agree that the old summary did not point to all related issues. Considering backward compatibility, what the old summary asked for would be impossible to achieve.

The enhancement request relates to a new functionality for users. Implementing it in the operating system could make more sense than implementing it in a browser. However, it can have its place in browsers too.

Whatever change in behaviour is implemented, users should have the opportunity to choose in advanced settings, else discussions will continue forever. The discussion about the default value of that additional parameter should be postponed until the specification of the feature / enhancement is made.

The main reason for the request is to provide a way for "helpless" users to deal with characters they can not see.

Copying may differ depending on whether users copy from a "rendered page" or from the content of an editbox / textbox. This relates, for example, to invalid Unicode characters, which may be coded in &#nnnn; / &#nnnn; notation in the editbox / textbox but would render as Unicode Character REPLACEMENT CHARACTER U+FFFD elsewhere.

Depending on where marking and copying is done, it will have implications for other topics such as cursor movements (see bug 283415 Caret must be moved by grapheme cluster boundaries) etc.

P.S. I do not know if colors could be used inside an editbox. There, invalid characters could be marked red. But these might all be issues for different bug reports / enhancement requests.
Summary: Strip BiDi control characters from edges when copying text into clopboard → Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when copying text into clopboard
P.S. to comment 4
The feature would only have effects if the content of the clipboard is pasted somewhere. Changing the coding is useful for input into another editbox / textbox, while an unmodified paste might be more suitable when the content is pasted into other applications.

after mid air collision with comment 3

(In reply to comment #2)
> I suggested filing a bug so a discussion can take place.

Thanks Uri for having this discussion here. The correct title is still unclear to me, but it seems to relate more and more to how the clipboard content is pasted. The problem was described mainly as an editing problem. If users paste the content into other applications, it is up to them. I can also live with the unchanged clipboard content being pasted into the browser's URL field. This is more or less "hacking", and people doing that should take responsibility and verify whether they get the required results or not.
*note*
This request does not require escaping of "&" as "& amp;" (without space). Escaping of "&" as "& amp;" (without space) would cause code similar to "&amp;amp;rlm;", "&amp;amp;amp;rlm;", "&amp;amp;amp;amp;rlm;" etc. when the text is copied and pasted multiple times.
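
The difference described in this note can be shown with a short Python sketch. The helper `escape_invisible_only` is hypothetical and handles only LRM and RLM; `html.escape` from the standard library stands in for a paste handler that also escapes "&".

```python
import html


def escape_invisible_only(text: str) -> str:
    """Hypothetical paste filter: make LRM/RLM visible, leave '&' alone."""
    return text.replace("\u200e", "&lrm;").replace("\u200f", "&rlm;")


def escape_everything(text: str) -> str:
    """Contrast case: also escape '&' (and '<', '>'), as html.escape does."""
    return html.escape(escape_invisible_only(text), quote=False)


def paste_repeatedly(escape, text: str, times: int) -> str:
    """Simulate copying and pasting the same text several times."""
    for _ in range(times):
        text = escape(text)
    return text
```

Under these assumptions, repeated pasting through `escape_invisible_only` leaves the text unchanged after the first pass, while `escape_everything` adds one more level of "&" escaping on every pass.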
Summary: Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when copying text into clopboard → Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when pasting text from clopboard into editbox, textarea
This might belong to another bug / feature request:

Please copy the text at
http://yi.wiktionary.org/wiki/user:bugzilla/unicode/Unicode_Characters_in_the_Combining_Diacritical_Marks_Block#examples 
from "examples" to links.

Please paste it into a textarea. Most of the Unicode Characters in the Combining Diacritical Marks Block can be seen *neither* on the original page *nor* in the textarea.

If you insert the clipboard at http://pastebin.com/ you will get results identical to the second part of http://pastebin.com/479467 .
Summary: Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when pasting text from clopboard into editbox, textarea → Visualization of BiDi control, Unicode whitespace and invalid Unicode characters when pasting text from clipboard into editbox, textarea
Assignee: nobody → dom-to-text
QA Contact: general
This could be handled by including a font like the specials.ttf included with Unibook at http://unicode.org/unibook/ which has glyphs for all of the Unicode control characters (in the private use range), and using that to display control characters in edit boxes.
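
Independently of a font-based approach, the same kind of visualization can be sketched at the text level in Python. Everything here is an assumption for illustration: the function name `visualize_invisibles`, the bracketed label format, and the rule for which characters count as "invisible" (format characters, non-ASCII space separators, and U+FFFD).

```python
import unicodedata


def visualize_invisibles(text: str) -> str:
    """Replace hard-to-see characters with a bracketed code point and name."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        is_format = category == "Cf"                    # BiDi controls, ZWJ, ...
        is_odd_space = category == "Zs" and ch != " "   # NBSP, EN SPACE, ...
        is_replacement = ch == "\ufffd"                 # REPLACEMENT CHARACTER
        if is_format or is_odd_space or is_replacement:
            name = unicodedata.name(ch, "UNNAMED")
            out.append(f"[U+{ord(ch):04X} {name}]")
        else:
            out.append(ch)
    return "".join(out)
```

Ordinary spaces, tabs and newlines pass through untouched, so only the characters a user cannot otherwise see are expanded.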
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee: dom-to-text → nobody
QA Contact: dom-to-text

Bulk-downgrade of unassigned, >=3 years untouched DOM/Storage bug's priority.

If you have reason to believe this is wrong, please write a comment and ni :jstutte.

Severity: normal → S4
Priority: -- → P5

:jstutte : The downgrade is wrong.

First, on principle - the fact that a bug is not getting attention from developers should not in itself self-perpetuate through auto-downgrading.

Concretely, S4 means "small or trivial", "low or no impact", merely cosmetic etc. That is not the case. This has a significant detrimental effect on the ability of right-to-left language users (and other users working with content rendered using complex Unicode) to copy text into an editable area and work with it. It is specifically significant in the context of Thunderbird, where editing text is of paramount significance; and text is occasionally copied from HTML messages.

It's true that we've all had to just live with the way things are, but it is at least an S3, I would say.

Severity: S4 → S3

In fact, I'd say the lack of visualization, at least as an option, constitutes a defect, not a mere enhancement request.

Type: enhancement → defect
Priority: P5 → --