Can't insert non-breaking space (U+00A0) in message composition, and existing nbsp converted into space (0x20) when surrounding characters are altered/deleted - STR: Comment #6

NEW
Unassigned

Status

()

P3
critical
9 years ago
a year ago

People

(Reporter: neil_mayhew, Unassigned)

Tracking

({dataloss, ux-consistency, ux-efficiency})

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [poor/fragile workaround: comment 4] [Tools for analysis: comment 6])

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5
Build Identifier: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.1.5) Gecko/20091130 Thunderbird/3.0

Typing Option-Space on Mac generates a non-breaking space (U+00A0). This works in other applications (e.g. TextEdit). In TB3rc2 on Snow Leopard, a regular space is inserted instead.

However, it works in the Insert HTML dialog, but only if I select the words either side of the break before opening that dialog. I can't insert just the space that way.

I get the same results if I use the Character Viewer (previously known as Character Palette) and click Insert. I can verify what type of space is inserted in several ways, such as by looking at the line breaking.

Reproducible: Always

Steps to Reproduce:
1. Open a new message window
2. Make the window narrow
3. Type words until the line wraps by one word
4. Remove the last space typed, leaving the insertion point between the words
5. Press Option-Space, or open Character Viewer and insert U+00A0
6. Observe line breaking
Actual Results:  
Line breaks before final word

Expected Results:  
Line breaks before second-to-last word

I am using Mac OS 10.6.2. My locale is Canada but the problem also exists with the standard US keyboard.
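
Besides watching the line breaking, the kind of space can also be checked by looking at code points once the text has been copied out of the composition. This is a minimal sketch, assuming the text is available as a plain JavaScript string; describeSpaces is a hypothetical helper name, not part of Thunderbird.

```javascript
// Hypothetical helper (not part of Thunderbird): list every regular space
// and non-breaking space in a string together with its code point.
function describeSpaces(text) {
  const names = { 0x20: "SPACE", 0xa0: "NO-BREAK SPACE" };
  const found = [];
  Array.from(text).forEach((ch, index) => {
    const codePoint = ch.codePointAt(0);
    if (codePoint in names) {
      const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index, codePoint: "U+" + hex, name: names[codePoint] });
    }
  });
  return found;
}

// One entry per space-like character, in order of appearance.
console.log(describeSpaces("100\u00A0Euro for you"));
```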
(Reporter)

Updated

9 years ago
Version: unspecified → 3.0
Status: UNCONFIRMED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 290565
Reopening/un-duping after we've just fixed a large chunk of the problem in bug 290565. Unfortunately, users still have no direct and reliable way to enter nbsp when composing, which is a major shortcoming given that some locales like French depend heavily on non-breaking spaces (e.g. to keep the last word and trailing punctuation together). There are plenty of use cases in many languages, e.g. users might just want to keep numbers and their units together, or initials and last names, etc.
> 100_Euro (where _ = nbsp)
> 100_times faster than...
> Thomas_D.
> J._K.

(In reply to Thomas D. from bug 290565 comment #44)
> At least non-breaking spaces are now getting sent when they are in the
> composition.
> 
> However, getting them into composition is still way too complicated, which
> looks like the main pain point of comment 0:
> - Any direct entry of nbsp into composition fails (alt+255, alt+0160,
> copy/paste using charmap app, Ctrl+Shift+Space not implemented)
> - pasting generally works but not always, e.g. can't paste only nbsp from
> notepad++
> - Insert > HTML: nbsp only works if context words are also inserted, not
> for adding only nbsp at cursor insertion point.
> For this basic everyday task, we can't expect our users to continue using
> workarounds involving copy/paste, Insert > HTML, or Jörg's great
> ThunderHTMLEdit addon
> (https://addons.mozilla.org/en-us/thunderbird/addon/thunderhtmledit/).

Such workarounds are obviously annoying in terms of ux-efficiency, and they also violate ux-consistency with other apps out there which just have Ctrl+Shift+Space to enter nbsp. Replacing user-entered data with something else is unwarranted dataloss -> critical.

> So we'll want to open a followup bug for that, which includes implementing
> the default keyboard shortcut for inserting nbsp, Ctrl+Shift+Space. I've
> tried and that part is actually fairly easy (apart from some focus issues),
> but unfortunately in my tests, even editor.insertHTML(" ") with a nbsp as the argument fails, so I
> don't know how to add the nbsp to the message source so that it lasts. Any
> ideas welcome.
Blocks: 290565
Severity: normal → critical
Status: RESOLVED → REOPENED
Ever confirmed: true
Keywords: dataloss, ux-consistency, ux-efficiency
OS: Mac OS X → All
Hardware: x86 → All
Resolution: DUPLICATE → ---
Summary: Can't insert non-breaking space (U+00A0) in message composer on Mac → Can't insert non-breaking space (U+00A0) in message composer
Version: 3.0 → Trunk
This is most likely a bug in Core > Editor, but I'll let others decide which product/component is best.
There's definitely something weird going on which involves word boundaries, internal dom node boundaries, or on-the-fly serializers or some such (sorry I'm not very familiar with these things).

Maybe for someone who knows more than me, the following observation can be helpful:

STR

1) In Windows Notepad, type "#_#" where _ is a nbsp entered with Alt+255.
2) In TB composition, type HelloWorld (without space), then paste #_# from Notepad between Hello and World, so we now have:
> Hello#_#World.
3) Shrink the window of TB composition and insert spaces before "Hello" until it starts to wrap into the next line

Actual result:
#_# has been correctly inserted, nbsp is still present in composition, as seen when the whole phrase "Hello#_#World" wraps into the next line, not just #World.

4) Now, use any known method to delete any one of the hash characters (#) around the non-breaking space, and insert spaces before Hello as needed to force wrapping.

Actual result:
The nbsp is replaced by a regular space (0x20), as seen when World alone wraps into next line

Expected result:
nbsp inserted by user by whichever method must be preserved, and not disappear randomly when surrounding characters are deleted or altered.

Observation:
Non-breaking spaces can only be inserted into composition with at least one leading and trailing other character, and will remain in composition if the sequence remains untouched.
Inserting nbsp alone, or with only one leading, or only one trailing other character fails (same applies for Insert > HTML).
Removing any of the leading or trailing characters inserted with nbsp will replace nbsp with normal space 0x20.

Maybe this observation gives a hint to where the problem is?
Summary: Can't insert non-breaking space (U+00A0) in message composer → Can't insert non-breaking space (U+00A0) in message composition, and existing nbsp converted into space (0x20) when surrounding characters are altered/deleted
Poor/fragile workaround:

Any indirect method which inserts the non-breaking space together with at least one leading and one trailing other character. Thereafter, never touch that character sequence in any way.

E.g., enter #, alt+255, # in Notepad, then copy/paste "#_#" into TB composition (where _ is nbsp).
Or if you want to get "huhu_!", type and select "huhu!", then Insert > HTML, place cursor before the exclamation mark, press Alt+255, then "Insert" (note that you have to select the words/characters surrounding the intended nbsp for this trick to work!).
Whiteboard: [poor/fragile workaround: comment 4]
Maybe there's something like DOM-internal text nodes, which come with a trimming function, and node boundaries are wrongly determined instead of morphing foo_bar into a single DOM-internal text node which must not be trimmed? Just speculating...

Comment 6

a year ago
str
Thunderbird users read here:

Please use ThunderHTMLedit 1.7 (attachment 8884657 [details], publication on AMO imminent). It clearly shows where NBSP are.

Just for fun, type |A<space>B| and look at the HTML. Then type |A<space><space>B| and look at the HTML. Lo and behold, you'll see NBSP which the editor added for you, since if it hadn't, the second space would have been collapsed. Now delete the second space and see what happens.
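
To illustrate why the editor swaps in an NBSP here, the following is a rough sketch of how default HTML rendering collapses runs of ordinary white space. collapseHtmlWhitespace is a hypothetical name and only approximates the rendering rule; it is not taken from Core::Editor.

```javascript
// Approximate default HTML rendering: a run of ordinary white space (space,
// tab, line feed, form feed, carriage return) is displayed as one space.
// U+00A0 is deliberately absent from the character class, so it survives.
function collapseHtmlWhitespace(text) {
  return text.replace(/[ \t\n\f\r]+/g, " ");
}

const twoSpaces = "A  B";           // A, two regular spaces, B
const nbspThenSpace = "A\u00A0 B";  // A, NBSP, regular space, B

console.log(collapseHtmlWhitespace(twoSpaces).length);     // 3: the second space is lost
console.log(collapseHtmlWhitespace(nbspThenSpace).length); // 4: both characters are kept
```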

So what I'm trying to say is that NBSP are handled a lot by the M-C Core::Editor and users never think of it.

I'm not sure how reasonable the expected result "and not disappear randomly when surrounding characters are deleted or altered" really is since the editor does remove NBSP when it deems fit.

BTW, I can reproduce your test easily:
Enter |A# #B|. In ThunderHTMLedit, replace the space with &#xA0; or &nbsp;. Then delete the two #'s in the "normal view" and check the HTML again. The NBSP is gone.

Firefox users read here:

Do this on http://www-archive.mozilla.org/editor/midasdemo/:
Enter |A# #B|. In the HTML view, replace the space with &#xA0; or &nbsp;. Then delete the two #'s in the "normal view" and check the HTML again. The NBSP is gone.
Status: REOPENED → NEW
Component: Message Compose Window → Editor
Product: Thunderbird → Core
Summary: Can't insert non-breaking space (U+00A0) in message composition, and existing nbsp converted into space (0x20) when surrounding characters are altered/deleted → Can't insert non-breaking space (U+00A0) in message composition, and existing nbsp converted into space (0x20) when surrounding characters are altered/deleted - STR: Comment #6
Whiteboard: [poor/fragile workaround: comment 4]

Updated

a year ago
Whiteboard: poor/fragile workaround: comment 4
Followup note: Once we have succeeded in finding a way of directly inserting a nbsp which survives alteration of surrounding characters, Thunderbird will want to implement a cmd_insertNbsp with shortcut key Ctrl+Shift+Space (with key=" "; VK_SPACE did not work in my tests) in a followup bug. Mind that cmd_insertNbsp must only be enabled when focus is in text input fields where we want and are able to accept that input.
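
For illustration only, the shortcut part could look roughly like the sketch below. It assumes an event object shaped like a DOM KeyboardEvent and a plain string as the text model; isInsertNbspShortcut and insertNbspAt are hypothetical names, and nothing here addresses the editor converting the nbsp afterwards.

```javascript
const NBSP = "\u00A0";

// Hypothetical check for the proposed Ctrl+Shift+Space shortcut.
// `event` is any object shaped like a DOM KeyboardEvent.
function isInsertNbspShortcut(event) {
  return event.key === " " &&
    event.ctrlKey === true &&
    event.shiftKey === true &&
    !event.altKey &&
    !event.metaKey;
}

// Hypothetical command body: insert a nbsp at `offset` in a plain string
// that stands in for the text of the focused input field.
function insertNbspAt(text, offset) {
  return text.slice(0, offset) + NBSP + text.slice(offset);
}

const keyPress = { key: " ", ctrlKey: true, shiftKey: true, altKey: false, metaKey: false };
if (isInsertNbspShortcut(keyPress)) {
  console.log(insertNbspAt("100Euro", 3).length); // 8: one character was added
}
```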
(In reply to Jorg K (GMT+2) from comment #6)
> Thunderbird users read here:
> 
> Please use ThunderHTMLedit 1.7 (attachment 8884657 [details], publication on
> AMO imminent). It clearly shows where NBSP are.

Thanks for that valuable information.
 
> I'm not sure how reasonable the expected result "and not disappear randomly
> when surrounding characters are deleted or altered" really is since the
> editor does remove NBSP when it deems fit.

I'm sure the expected result is very reasonable, but we need to try against all odds and find someone who is able to fix this in editor, or to convince TB council to offer Jörg a bigger contract so that he can use his excellent skills and energies to fix this for TB... ;)

N.B. Whiteboard entries traditionally come with brackets, has that syntax changed?
Whiteboard: poor/fragile workaround: comment 4 → [poor/fragile workaround: comment 4] [Tools for analysis: comment 6]

Comment 9

a year ago
Editor bugs get some attention these days from the excellent Japanese team around Masayuki, Makoto and others and of course Aryeh. One reason for fixing those bugs is compatibility with Chrome.

That said, using NBSP in web forms is not so common, so I'd imagine that this bug won't have a high priority for FF. In my seven years of TB usage I've never thought about NBSP, so I'm really surprised this is such an issue. However, I can see that for example "Le Monde" (http://www.lemonde.fr/) is using NBSP after/before the French quote characters « »: « La perte de Mossoul est un coup majeur porté au projet de construction d’un Etat islamique »
Priority: -- → P3
This looks to me like the *TESTS* for HTML editor's behaviour wrt nbsp...
So here's where "as Editor deems fit" space handling is mirrored...
Maybe this can assist to understand and locate the actual behaviour in editor's code.

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/editing/include/implementation.js#5204-5475

*** Behaviour definitions...

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/editing/include/implementation.js#5204
> function canonicalSpaceSequence(n, nonBreakingStart, nonBreakingEnd) {
>     // "If n is zero, return the empty string."
> ...

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/editing/include/implementation.js#5265
> function canonicalizeWhitespace(node, offset, fixCollapsedSpace) {
>     if (fixCollapsedSpace === undefined) {
>         // "an optional boolean argument fix collapsed space that defaults to
>         // true"
>         fixCollapsedSpace = true;
>     }
> ...


*** ...and consumers:

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/editing/include/implementation.js#6191
> ///// The delete command /////
> //@{
> commands["delete"] = {

https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/tests/editing/include/implementation.js#7829
> ///// The insertText command /////
> //@{
> commands.inserttext = {
> ...
> // "Canonicalize whitespace at (node, offset)."
>        canonicalizeWhitespace(node, offset);

Comment 11

a year ago
@JorgK: Yes, this is a big issue for French users (at least those who respect typography rules). We must use a nbsp inside quotes (« like this ») and before a colon (like this : but a normal space after). There may be some other cases, but none come to mind right now.
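
As a small illustration of the two rules just mentioned, and only those two, here is a sketch that swaps the regular space for a nbsp inside « » and before a colon. applyFrenchNbsp is a hypothetical helper operating on a plain string; it is not a proposal for what the composer should do automatically.

```javascript
const NBSP = "\u00A0";

// Hypothetical helper: apply the two rules named above to a plain string.
// - space after « and before » becomes a nbsp
// - space before a colon becomes a nbsp (the space after it stays regular)
function applyFrenchNbsp(text) {
  return text
    .replace(/« /g, "«" + NBSP)
    .replace(/ »/g, NBSP + "»")
    .replace(/ :/g, NBSP + ":");
}

const fixed = applyFrenchNbsp("Il a dit : « comme ceci »");
// Count the non-breaking spaces that were put in place.
console.log(fixed.split(NBSP).length - 1); // 3
```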

That being said, people using sane keyboard layouts (mainly BÉPO, the French Dvorak version, but also Linux or even MacOS AZERTY) have a way to insert nbsp easily (Maj+Space for BÉPO, not sure about the others).

So we don’t need a shortcut for that; I think this is not the responsibility of the composition software. Respecting the input characters, on the other hand, is. I’m not sure of the status here after you’ve fixed #290565. I’m waiting for 52.3, which I suppose will include your fix?

Comment 12

a year ago
I don't know the status here either. I fixed bug 290565, so NBSP will no longer get lost. TB 52.3 will include that fix. I also changed my add-on ThunderHTMLedit so you can see NBSP in the composition better.
I think the status here is exactly as described in the STR and actual results of either of comment 0, comment 3, comment 6.
I.e. most directly entered single non-breaking spaces can easily disappear when editing surrounding text; but when they survive in the HTML editor (using tricks as described, e.g. pasting with surrounding characters), they'll also get sent.

The (FF) editor team should look into this.
Along the lines of my comment 10, we need to fully understand the current behaviour first, and the underlying purpose, then try to find something better.

The problem is that in the HTML editor, multiple regular spaces do not create any distance between words, because they are collapsed when the HTML is rendered. But in email messages, we need the behaviour of a word processor, where spacing actually creates distance. That's achieved by converting a sequence of n regular spaces into something like n-1 nbsps and one regular space, so that the nbsps get rendered in HTML to create the distance, and the last space ensures we're not hard-linking words which the user never linked.

Then there's an algorithm for the case where you decide to delete your distance-creating whitespace: the editor tries to undo/maintain its own conversion by actively managing spaces, e.g. if you delete the last space after the sequence of auto-inserted nbsps, it'll delete the last nbsp instead and preserve the normal space. I guess that also happens in the single-nbsp case, where it's wrongly assumed that deleting the character after the nbsp must cause that nbsp to be converted into a normal, breaking space (wrongly assuming that any nbsp has been auto-inserted by entering normal spaces).

The surprising part is that apparently there is no check whether the deleted character is a normal space, and also that a single nbsp entered on its own is immediately converted into a normal space (which is wrong and probably not required for the other tricks to work). I think something like that happens. Don't take my word for it. We need to analyse.
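
To make the guess above concrete, here is a toy model that follows the description in this comment literally ("something like n-1 nbsps and one regular space"). It is not the actual Core::Editor code; encodeSpaceRun and shrinkSpaceRun are hypothetical names.

```javascript
// Illustrative only: encode a run of n typed spaces as n-1 nbsps followed
// by one regular space, as described in this comment.
function encodeSpaceRun(n) {
  if (n <= 0) {
    return "";
  }
  return "\u00A0".repeat(n - 1) + " ";
}

// Model of the suspected "undo" step: removing one character from an
// encoded run re-encodes the shorter run, so the trailing regular space
// is preserved and one nbsp disappears instead.
function shrinkSpaceRun(run) {
  return encodeSpaceRun(run.length - 1);
}

const run = encodeSpaceRun(2);             // nbsp + regular space
console.log(shrinkSpaceRun(run) === " ");  // true: only a regular space is left
```

In this model, a nbsp that the user typed on purpose is indistinguishable from one the encoder produced, which would match the suspicion that the single-nbsp case is handled as if it had been auto-inserted.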

First step would be to find out where that automagical conversion actually happens in code.
So from my layman's pov, if I understand this behaviour correctly (of which I'm not sure), these might be the required changes in the space management code:

1) A single nbsp entered by user must not be converted to 0x20 if preceded by a non-whitespace character (and we must ensure that this also works for rtl languages). Would that be a sufficient condition?
2) When deleting the character which follows a single nbsp, only if that character is a space, the nbsp must be deleted. If it's any other following character which gets deleted, don't delete the nbsp.
3) There might be more.
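
A minimal sketch of what rule 2 could mean, using the reading in which the nbsp is replaced by a regular space rather than removed (see the correction in the follow-up comment). It works on a plain string and only looks at one nbsp directly before the deleted character; deleteCharKeepingUserNbsp is a hypothetical name, not an existing editor function.

```javascript
const NBSP = "\u00A0";

// Hypothetical sketch of proposed rule 2: delete the character at `index`.
// A nbsp directly before it is turned into a regular space only when the
// deleted character was itself a regular space; otherwise it is kept.
function deleteCharKeepingUserNbsp(text, index) {
  const deleted = text[index];
  let before = text.slice(0, index);
  const after = text.slice(index + 1);
  if (deleted === " " && before.endsWith(NBSP)) {
    before = before.slice(0, -1) + " ";
  }
  return before + after;
}

// Deleting the "#" after the nbsp keeps the nbsp in place.
const kept = deleteCharKeepingUserNbsp("Hello#" + NBSP + "#World", 7);
console.log(kept.includes(NBSP)); // true
```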

Comment 15

a year ago
Or, as mentioned by someone else, just remove this ugly hack altogether. ;) nbsp are not meant for layout. People should either do plain/text, which is what 99 % of human sent emails require and don’t have such issues or fully embrace HTML and do the actual spacing with CSS.
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #14)
> So from my layman's pov, if I understand this behaviour correctly (of which
> I'm not sure), these might be the required changes in the space management
> code:
> 
> 1) A single nbsp entered by user must not be converted to 0x20 if preceded
> by a non-whitespace character (and we must ensure that this also works for
> rtl languages). Would that be a sufficient condition?
> 2) When deleting the character which follows a single nbsp, only if that
> character is a space, the nbsp must be deleted. If it's any other following
> character which gets deleted, don't delete the nbsp.
> 3) There might be more.

Correction to 2): s/deleted/substituted by regular space/

Jörg, does that sound like a viable plan to adjust the existing space management without removing it?
Could you find the spot in code where that space management is done? See comment 10.
Flags: needinfo?(jorgk)

Comment 17

a year ago
I did my investigation as per comment #6 and even updated my add-on ThunderHTMLedit to make the NBSP easily visible. Sadly, I really don't have any time to locate and fix the code in Core::Editor.
Flags: needinfo?(jorgk)