Closed Bug 290565 Opened 15 years ago Closed 2 years ago

Non-breaking spaces (0xA0/&nbsp;) are converted to spaces (0x20) in mail composer

Categories

(MailNews Core :: Composition, defect, critical)

Tracking

(thunderbird_esr52 55+ fixed, thunderbird55 fixed, thunderbird56 fixed)

RESOLVED FIXED
Thunderbird 56.0
Tracking Status
thunderbird_esr52 55+ fixed
thunderbird55 --- fixed
thunderbird56 --- fixed

People

(Reporter: mozilla, Assigned: jorgk-bmo)

References

(Depends on 3 open bugs)

Details

(Keywords: dataloss, Whiteboard: [we know it doesn't work][poor workaround: comment 30][ref bug 218277 comment 19, 21])

Attachments

(3 files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050404 Firefox/1.0.2
Build Identifier: 

When composing a message, all non-breaking spaces entered are converted to
regular spaces. Non-breaking spaces (character U+000000A0) are used in languages
like French to separate a word from some punctuation marks (!?:;«») and to ensure
that the mark stays next to the word and is not wrapped to the next line in
an automatic line wrapping environment. In the message compose window, entering
a non-breaking space (keysym 0xa0, nobreakspace) has the same effect as entering
a regular space: a space character (U+00000020) is inserted. The punctuation
mark is then wrapped alone to the next line if it exceeds the current line.

The HTML entity for a non-breaking space is &nbsp; and it is correctly handled
by the browser.

Reproducible: Always

Steps to Reproduce:
1. You must have the ability to enter non-breaking spaces with your keyboard. In
the symbol file used for your configuration (in /etc/X11/xkb/symbols/pc), there
should be a "nobreakspace" symbol, as in the "cz" keymap. To test if it works,
in OpenOffice.org Writer for example, entering a non-breaking space displays it
as a solid grey rectangle.
2. Compose a new message, enter "Bonjour[non-breaking space]!", where
[non-breaking space] is the character you enter with the correct key combination
or by copy-pasting it from elsewhere (a character chooser, for example).
3. Now enter as many spaces as needed before "Bonjour" to see the exclamation
point go alone on the next line.
Actual Results:  
The line is wrapped on the non-breaking space (which is not a non-breaking space
here, since it has been replaced by a regular space at the time you entered it),
making the exclamation point go alone on the next line.

Expected Results:  
The non-breaking space entered should never be converted and the wrapping
algorithm should treat the whole "Bonjour[non-breaking space]!" as one word,
making it wrap entirely on the next line if it exceeds the line.
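
The expected wrapping can be illustrated with a minimal sketch. It assumes a simple greedy wrapper that breaks only at ordinary spaces (U+0020); the function name `wrap` and the sample text are made up for illustration and are not taken from the Mozilla code.

```python
NBSP = "\u00a0"   # NO-BREAK SPACE
SPACE = "\u0020"  # ordinary space


def wrap(text, width):
    """Greedy line wrapping that treats only U+0020 as a break opportunity."""
    lines = []
    current = ""
    for word in text.split(SPACE):
        if not word:
            continue  # skip empty pieces produced by runs of ordinary spaces
        candidate = word if not current else current + SPACE + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


# With a real no-break space the word and its punctuation move together.
with_nbsp = wrap("Dites Bonjour" + NBSP + "!", 13)
# After the conversion reported here, the "!" can end up alone on a line.
with_space = wrap("Dites Bonjour" + SPACE + "!", 13)
```

In this sketch `with_nbsp` keeps "Bonjour" and "!" on the same line, while `with_space` leaves the exclamation point alone on the last line, which is the reported symptom.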

I noticed also the presence of this bug in all versions of Mozilla Mail I used.
This is also the case in Mozilla Composer.
This also applies to HTML forms and probably similar components. I always
thought it was IE that screwed up Wikipedia articles this way.
Depends on: 218277
over at bug 218277 comment 19, david made the point numerous times that «When using the HTML editor, hitting the space bar multiple times turns spaces into &nbsp;.  Mail composed using the HTML editor is often sent as text, and if this is the code used to convert that mail to text, those nbsp characters need to be converted to spaces.»
i never understood why this actually needed to be the case, but the checkin for that bug specifically eliminated the &nbsp; -> regular space conversion for textareas only. a checkin that would eliminate the conversion in all text fields would fix this problem.
I'd also like to mention that this is really a data loss issue. Text copied over from a different application which had NBSP characters will lose them.
also someone with privileges needs to set Product to Core, Hardware to All, and OS to All.
QA Contact: message-compose
Product: Core → MailNews Core
Assignee: mscott → nobody
Component: Message Compose Window → MailNews: Composition
OS: Linux → All
Product: Thunderbird → Core
QA Contact: message-compose → composition
Hardware: PC → All
Version: unspecified → Trunk
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b2pre) Gecko/20081021 SeaMonkey/2.0a2pre - Build ID: 20081021000739

I'm seeing this bug in the above build. Bug 332510 describes the same behaviour on Tb 1.5 on the Mac. Hence the Whiteboard entry -- in case someone fixes it before Tb2/Sm1 end-of-life.

I'm entering no-break spaces by clipboard copy from gvim and checking them by copying back from the "View source" window after sending the message to myself. They do indeed arrive as ordinary spaces, 0x20. The copy in the "Sent" folder has them too. Setting the "dataloss" keyword since the distinction between space (0x20) and no-break space (0xA0) is lost (cf. comment #3).

The same (wrong) behaviour is seen regardless of whether I send the test mail as ISO-8859-1 or as UTF-8.
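
For anyone repeating this check, here is a small sketch of what to look for in the raw bytes. It only uses Python's built-in codecs; the helper name `count_spaces` is hypothetical. A no-break space is the single byte 0xA0 in ISO-8859-1 and the two bytes 0xC2 0xA0 in UTF-8, so a surviving NBSP is visible in either encoding.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def count_spaces(raw, charset):
    """Decode raw message bytes and count no-break vs. ordinary spaces."""
    text = raw.decode(charset)
    return {"nbsp": text.count(NBSP), "space": text.count(" ")}


original = "Bonjour" + NBSP + "!"
latin1_bytes = original.encode("iso-8859-1")  # NBSP becomes the byte 0xA0
utf8_bytes = original.encode("utf-8")         # NBSP becomes the bytes 0xC2 0xA0

# What arrives when the bug strikes: the NBSP has become 0x20.
damaged_bytes = original.replace(NBSP, " ").encode("utf-8")

intact_counts = count_spaces(utf8_bytes, "utf-8")
damaged_counts = count_spaces(damaged_bytes, "utf-8")
```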
Keywords: dataloss
Whiteboard: [trunk and 1.8 Branch]
bump >critical to match dataloss
Severity: normal → critical
Duplicate of this bug: 532712
As I mentioned in the duplicate bug (sorry about that, I really did try to find a pre-existing one) you can replace a regular space with a non-breaking space by typing it on the keyboard if you use Insert HTML to edit a stretch of text around the space. I realize this doesn't help anyone actually trying to compose text, but it might help someone trying to work on the bug.

On Mac you can type a non-breaking space even on a US English keyboard.
Duplicate of this bug: 554014
Hi, sorry for the duplicate, I searched but found nothing.

NBSP is a standard character on many keyboard layouts (there are more layouts than the standard US one).

I read the old bug 218277 mentioned above and what david said, but as far as I can see there is no major advantage in replacing NBSP if the charset supports it especially in comparison to the dataloss. NBSP is a valid character for most Charsets and even if you copy a html mail into a plain text mail, it doesn’t hurt when thinking of where NBSP are inserted to html.

At least the user should have the possibility to choose. Maybe something for user.js like 
user_pref("mailnews.send_chars_as_i_type_do_not_mess_up_my_mail_you_idiot", true); /* ;) */
as mentioned in Bug 359303
commenting out line 1265 in nsPlainTextSerializer.cpp ¹
solves bug 359303 and also affects this bug.
It is now possible to send some NBSP, but
a) in text mode some of the NBSP are still replaced ²
b) in HTML mode every character in a (mixed) row of NBSP and/or spaces is replaced by NBSP, except the last one, which is replaced by a space.

So there must be at least two(?) more methods that replace valid characters with others for no good reason. Hints where to look?


¹ „aString.ReplaceChar(kNBSP, kSPACE);“
² if an NBSP follows a space, it is not replaced. If an NBSP does not follow a space, it is replaced by a space, which allows a further NBSP to stay … very strange.
This is an ongoing annoyance in SeaMonkey 2.8 and Thunderbird 12.  When I use the "Insert HTML" option and insert "&nbsp;", save, then re-open "Insert HTML", the "&nbsp;" is GONE, replaced by an ordinary space character.  It doesn't matter whether I'm composing in ISO-8859-1 or UTF-8 mode.

This bad behavior causes problems with certain characters, such as em-dash, en-dash and ellipsis, which normally are separated from the text by a non-breaking space on the left side and a regular space on the right, to ensure that these characters don't end up in column 1 due to line wrapping when the recipient reads the message.  Since one doesn't know the window width that the reader will be using, the only way one has of reasonably controlling the wrapping point in critical situations is through deliberate insertion of non-breaking spaces.

Moreover, if I insert the "&mdash;" code, it mangles it, so that after retrieving the source code of a draft message it has turned it into "—", but at least it still displays as an em-dash in SeaMonkey/Thunderbird.  (This may have been a concession to Microsoft Outlook; see NOTE at bottom.)

The only way I've found of being able to force a single non-breaking space in front of another character is to manually enter the hex code " " in the HTML view.  If I save the draft and re-open it, Thunderbird/SeaMonkey has changed " " into " ".  Is this bizarre, or what?

If I manually insert a valid HTML code such as &amp;, &nbsp;, etc., I expect both Thunderbird and SeaMonkey to honor it and not strip it out or transmogrify it.  Likewise, if I manually enter hex codes in the form &#nnn;, I expect those to be preserved, not translated into something else, or at least give the user the option to override program behavior.

NOTE: It's true that Microsoft Internet Explorer has historically been unable to display &mdash;, &ndash; and possibly other HTML characters correctly, and Microsoft Outlook may have the same bug, but this is a matter for users of those programs to push back to Microsoft for corrective action, as these are perfectly valid codes as defined by the W3C HTML standards, and have been for well over a decade.  I've seen the problem as late as MSIE 7, although it appears to have been fixed in MSIE 8.  Since I haven't used Outlook in years, I don't know how it currently behaves.
(In reply to Hugues De Keyzer from comment #0)

Bug as described by Hugues is still here, 8 years later. Thunderbird 24.2.0
Still here in Thunderbird 28.0.
It should be possible to enter non-breaking spaces using Shift-Space or some other modifier combination with space. Currently it seems like there is no way to enter them but to use a source editor like the Stationery extension. This is a basic feature that any decent text editor has. It's been several years. Thunderbird is at version 28 now, how come basic issues like this are still unresolved???
I get frustrated too. Unfortunately "editor" is an area with little, sometimes no coverage of developers. Plus, someone who can decide on and code a solution is needed. 

Not to minimize users' frustrations, but there is plenty of technical data here - non-technical comments are not so helpful and can be a distraction. Please see https://bugzilla.mozilla.org/page.cgi?id=etiquette.html

For the technically inclined, https://bugzilla.mozilla.org/buglist.cgi?v4=nbsp&f1=short_desc&o3=substring&list_id=9930879&short_desc=space%20nbsp&v3=breaking%20space&o1=nowordssubstr&j2=OR&classification=Client%20Software&classification=Components&f4=longdesc&v5=breaking-space&query_format=advanced&f3=longdesc&f2=OP&o4=substring&short_desc_type=anywordssubstr&f5=longdesc&component=Composition&component=Composition&component=Editor&component=Editor&component=Message%20Compose%20Window&component=Message%20Compose%20Window&component=Serializers&component=Serializers&product=Core&product=MailNews%20Core&product=Thunderbird gives a partial picture of the state of nbsp bugs. I believe we have some duplicates in here, if someone wants to clean up some bug reports.

Perhaps we can get some traction by finding someone who can make a decision on this. Until then, please be patient.
Summary: Non-breaking spaces are converted to spaces in mail composer → Non-breaking spaces (nbsp) are converted to spaces in mail composer
Whiteboard: [trunk and 1.8 Branch] → [we know it doesn't work][workaround:comment 5][ref bug 218277 comment 19, 21]
I agree with Geoffroy from April 2014... A year later and at version 31 now, nothing happened.

Is it so difficult to implement such a basic feature like being able to insert a non-breaking space in the editor, for text-only and for HTML ?

best regards
_~_  Meaulnes Legler
'¿') Zurich, Switzerland.
`-´
Ten years later, version 38: Still no way of getting a non-breaking space in TB. Why is this basic feature so difficult to integrate?
I don’t think it’s difficult. But now that there are almost no devs left (and likely no paid ones)…

A mail client seems to be the most difficult piece of software to do right; the last promising one (Nylas Mail) was dead before being usable. I’m staying with TB, but bugs like this one or the Address Book being a thing from the past don’t help…
I don't have time to perform tests, but a quick look at nsPlainTextSerializer.cpp, which bug 624666 links to, led to some interesting comments (lines ca. 1210+) :

/**
 * Prints the text to output to our current output device (the string mOutputString).
 * The only logic here is to replace non breaking spaces with a normal space since
 * most (all?) receivers of the result won't understand the nbsp and even be
 * confused by it.
 */

Might setting kNBSP to -1 be a quick and dirty but effective fix ?
Well, I’m not exactly sure about this but likely the whole https://dxr.mozilla.org/mozilla-beta/source/dom/base/nsPlainTextSerializer.cpp#1211-1241 should go. Dunno who to ping about that.
Depends on: 624666
I don't see any workaround in comment 5.
Summary: Non-breaking spaces (nbsp) are converted to spaces in mail composer → Non-breaking spaces (0xA0/&nbsp;) are converted to spaces (0x20) in mail composer
Whiteboard: [we know it doesn't work][workaround:comment 5][ref bug 218277 comment 19, 21] → [we know it doesn't work][ref bug 218277 comment 19, 21]
Bruno and Skippy, thanks for taking this up and sympathizing with our real situation where it's not easy to find someone to fix this type of bug, somewhere deep-down in age-old shared components like our HTML editor which are no longer getting much attention as nobody gets paid for that and nobody feels responsible.

Also thanks for trying to find starting points in source code, which is very helpful.

I played with the current behaviour a bit and it's really hopeless, nonsensical and disastrous in terms of UX. Sorry for the inconvenience. Suffice to say it's all relicts from time immemorial; the irony being that there's a "feature" of automagically creating non-breakable spaces from regular spaces which later creates the need for that other "feature" of eliminating them again, due to another "feature" of automatically downgrading messages from HTML to plaintext... You see now!? This bug is actually a feature! Just kidding...

Here's my understanding of the history of the story:

<History rant>
The HTML Editor component (used by TB composition) has a "feature" where multiple regular spaces typed in by the user get automagically converted to non-breaking spaces. I'm not quite sure why, but I can only guess that it was well-meant as a formatting assistance to create whitespace within texts, at a time when css styling was still in its infancy.

When that HTML editor got re-employed for mail composition, at a time when plaintext ASCII email was still the order of the day, and international versatile document encodings like UTF-8 were maybe not yet universally acceptable, that "trick" was now perceived as a problem as it would create "special characters" (non-breaking spaces) which were undesired for the preferred mail format at the time, which was plaintext. It was the time of Plaintext vs. HTML wars.

As you may have noticed, to this day, Thunderbird still downgrades your messages composed in HTML to plaintext when there's almost no HTML formatting in your message (and I have been at the forefront of taming that HTML-eating behaviour), due to an arguable "feature" called "Delivery format: Auto-Detect" (the auto-downgrading part of which is now optional at least, after my intervention). So even when you succeed in somehow preserving your non-breaking spaces in your draft (which is possible), at the time of sending, downgrading to plaintext by delivery-format auto-detect might spoil everything again by eating your non-breaking spaces. I don't think that the assumption of non-breaking spaces being indigestible for plaintext still holds, with formats like UTF-8 etc. etc.; even the simplest text editor on Windows, "Notepad", has absolutely no problem digesting non-breaking spaces...
</History rant>

And here's my take on the general direction of how to fix this (from my bug 347689, comment 4):

<Solution rant style="philosophy">
The main reason why we're currently wrongly eliminating non-breaking spaces seems to be that we're needlessly creating them in the first place when user inputs multiple spaces in HTML editor which we then automagically convert into a sequence of non-breaking spaces, followed by one breaking space. WHY!?

I'm totally convinced that we must get rid of all that automagical conversion circus. Whatever reason made us convert user input of multiple regular spaces into non-breaking spaces when editing HTML, that behaviour is now both BAD and OBSOLETE. I'm actually surprised that such non-standard hacks are still haunting us today; even more surprising, that they are coming from Mozilla as an advocate for standards compliance and teaching the web. An HTML editor which silently converts multiple regular spaces into non-breaking spaces is not only violating the standards, but also teaching users the wrong thing. They'll get used to our deviant automagical behaviour and assume that regular spaces "just work" to do the trick (whichever trick that might be, I'm not sure), a fatally wrong conclusion. Instead, Mozilla should be at the forefront of teaching the difference between normal spaces and non-breaking spaces, so that users are aware and can use the right one according to their purposes, and know that their favorite editor / mail composer / whatever app will just render whichever flavor exactly as entered. Surely non-breakable spaces are no longer an HTML design tool which should be encouraged for general layouting of whitespace. This seems to be a relict from times immemorial where CSS styling was not yet the order of the day.
</Solution rant>
(In reply to Bruno Pagani from comment #21)
> Well, I’m not exactly sure about this but likely the whole
> https://dxr.mozilla.org/mozilla-beta/source/dom/base/nsPlainTextSerializer.
> cpp#1211 should go. Dunno who to ping about that.
Depends on: 359303
This seems to be the right solution indeed. And I like my emails in UTF-8 plain-text. ;)
I completely agree with the sentiment that all the nbsp magic is obsolete and should go away, but I'm sure some people will argue that it's important that if the user types multiple spaces in the composer, they get wider spacing as a result, and doing this without replacing some spaces by nbsp's is complicated.

But I think a simple fix would be to simply make this a configurable preference.  Firefox and Thunderbird have built their reputation on being highly configurable, and this would be very easy to implement: simply have a single preference setting, under both Firefox and Thunderbird, which eliminates _all_ nbsp magic (stop replacing multiple spaces by nbsp's, and stop replacing nbsp's by spaces upon serialization), in other words, make U+00A0 NO-BREAK SPACE just as ordinary as U+202F NARROW NO-BREAK SPACE (say).  I think this would be a completely satisfactory solution: geeks who care about data preservation and who know what a non-breaking space is could use this preference, whereas users who care about compatibility and the former behavior would still get it.
(In reply to David A. Madore from comment #27)

David, thanks for rapid feedback :)

> I completely agree with the sentiment that all the nbsp magic is obsolete
> and should go away, but I'm sure some people will argue that it's important
> that if the user types multiple spaces in the composer, they get wider
> spacing as a result, and doing this without replacing some spaces by nbsp's
> is complicated.

Indeed. David's reference to the wider spacing effect made me revisit this in more detail.
Turns out there are so many cases which fail that I overlooked the main case where it actually works as designed, and is useful. So maybe I spoke too soon wrt the solution part of my comment 24. For HTML compositions only(!), converting consecutive spaces to nbsp's makes sense and is required to ensure wysiwyg, because otherwise, multiple spaces are conflated away by definition which significantly changes the layout. So we do need that word processing effect where typing multiple spaces actually creates wider spacing.

> But I think a simple fix would be to simply make this a configurable
> preference.

Given the usefulness and layout necessity of replacing multiple spaces with nbsp's in HTML, I don't think we should have a preference for this. We just have to stop eliminating nbsp's randomly on input and when sending HTML as plaintext.

So it looks to me that the grand plan of bug 347689 (sic) is still the way to go:

(In reply to David Baron :dbaron: ⌚️UTC-7 from 218277 comment #59)
> It seems like what we should really be doing is:
>  * when we're editing plain text, store multiple presses of space as spaces
>    (this is a change)
>  * when we're editing HTML, store multiple presses of space using
>    non-breaking spaces for all but the last press (tricky with deletion)
>    (we probably do this fine already)
>  * when serializing HTML to text, convert runs of non-breaking spaces
>    terminated by a space to spaces
>  * when using nsPlainTextSerializer to convert text to text (if we need to
>    use it at all, although we seem to now), don't mess with spaces

If it turns out we can alleviate the effects with less than that, I'm fine with that (maybe taming nsPlainTextSerializer to only replace {nbsp's followed by space} could go a long way?). In terms of UX, we should definitely try to stop the substantial and annoying dataloss for everyday scenarios reported in this bug asap.
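
The HTML-editing rule and the HTML-to-text rule from the quoted plan can be sketched in a few lines. This is only an illustration under assumptions: it is Python rather than the C++ of nsPlainTextSerializer, it works on plain strings instead of DOM text nodes, and both function names are hypothetical.

```python
import re

NBSP = "\u00a0"  # NO-BREAK SPACE


def spaces_to_html_storage(typed):
    """Store a run of N typed spaces (N >= 2) as N-1 NBSPs plus one space."""
    return re.sub(r" {2,}", lambda m: NBSP * (len(m.group()) - 1) + " ", typed)


def html_text_to_plain(text):
    """Turn only NBSP runs that are terminated by a space back into spaces."""
    return re.sub(NBSP + r"+ ", lambda m: " " * len(m.group()), text)


stored = spaces_to_html_storage("a   b")                 # three typed spaces
round_trip = html_text_to_plain(stored)                  # back to three spaces
deliberate = html_text_to_plain("Bonjour" + NBSP + "!")  # left untouched
```

A deliberately entered NBSP that is not followed by a space, as in the French punctuation case, survives the conversion. A deliberate NBSP that happens to be followed by a space would still be converted by this rule, so the sketch narrows the data loss rather than removing it completely.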
I'm really happy that people are showing interest in this bug again. Thank you all!

Regarding the multiple whitespaces -> no-break space translation in HTML, I would like to give a vote and motivation against it. HTML is a code, and the fact that multiple spaces are not counted in the output is a feature to help visualizing and structuring the code better without affecting the output. I think this need can hold true even when composing an email in HTML: I might want to keep the code behind it clean and structured with the help of whitespaces, but without affecting the look of the final email.

Wouldn't it be possible to add a shortcut for no-break space – say, ctrl+space – in Thunderbird, so that the users can simply input no-break space when they need a no-break space, and a whitespace when they want a whitespace, keeping the behaviour of the latter as originally intended in HTML (collapse) and plain-text (no collapse but breakable)?

(By the way, my keyboard layout actually has a no-break space, as ctrl+alt+space, that can be used in all UTF-8 applications.)
From a UX perspective, I see two main aspects of this bug where we currently fail:

1) Most manually entered nbsp's are immediately removed. E.g., when you enter Alt+255, or Insert > HTML > &nbsp;, they are immediately converted from non-breaking spaces (0xA0) into regular spaces (0x20).

2) All nbsp's which are initially preserved in the draft are converted to regular spaces when Delivery-format: Auto-Detect converts an HTML message to plaintext before sending. E.g., nbsp's from automagical conversion of multiple spaces entered are initially preserved in draft, as seen in source view. Or you can write text containing real nbsp's in Notepad, then copy into your HTML composition (nbsp's must be with words), ensure there's no HTML-triggering formatting. In the draft, nbsp's are still present. After sending, because of auto-detect conversion to plaintext, all nbsp's are gone.

Poor WORKAROUND for this bug:

* Per 2) above, you could compose your message text in other apps like Notepad and then copy into TB composition. Ensure that your nbsp's are actually preserved in saved draft, as they might get lost in the process of copying. Continue with next step.
* Then, ensure your message gets sent as HTML, by using any one of the following methods:
  - Add visible formatting anywhere in your message, like bold, colors, etc.
  - Add css style anywhere in your message, or define an HTML signature with styles in your account settings
  - Switch off Delivery-Format: Auto-Detect globally: Tools > Options > Composition > General > Send Options >
    Remove the checkmark for [ ] Send messages as plaintext if possible
* If you're still using Delivery-Format: Auto-Detect, run some minimal test cases for any solutions involving HTML/styles and see if it really gets sent as HTML. Things like <p>, <pre> or <tt> still get eaten if without styles or other formatting.
Whiteboard: [we know it doesn't work][ref bug 218277 comment 19, 21] → [we know it doesn't work][poor workaround: comment 30][ref bug 218277 comment 19, 21]
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #30)
> Poor WORKAROUND for this bug:
>   - Switch off Delivery-Format: Auto-Detect globally: Tools > Options >
> Composition > General > Send Options >
>     Remove the checkmark for [ ] Send messages as plaintext if possible
> * If you're still using Delivery-Format: Auto-Detect, run some minimal test

Wrt switching off: [ ] Send messages as plaintext if possible
More precisely, recipient-centric Delivery-Format: Auto-Detect will still be active, but you'll switch off the message-centric Auto-Downgrading part of it, which is what we want here. Of course, if any of your recipients are marked as prefers-plaintext, you might still get into trouble depending on your other Send Options.
(In reply to pglpm0 from comment #29)
> I'm really happy that people are showing interest in this bug again. Thank
> you all!

We're always interested, alas, fixing is the harder part...

> Regarding the multiple whitespaces -> no-break space translation in HTML, I
> would like to give a vote and motivation against it. HTML is a code, and the
> fact that multiple spaces are not counted in the output is a feature to help
> visualizing and structuring the code better without affecting the output. I
> think this need can hold true even when composing an email in HTML: I might
> want to keep the code behind it clean and structured with the help of
> whitespaces, but without affecting the look of the final email.

Yeah, I was initially thinking the same, but that's more like a reason to keep the current substitution (multiple spaces -> nbsp's). Composing a message is more like word processing, so we need to match the user expectation that multiple spaces will result in more visible whitespace.

> Wouldn't it be possible to add a shortcut for no-break space – say,
> ctrl+space – in Thunderbird, so that the users can simply input no-break
> space when they need a no-break space, and a whitespace when they want a
> whitespace, keeping the behaviour of the latter as originally intended in
> HTML (collapse) and plain-text (no collapse but breakable)?

Yes, once TB no longer eliminates nbsp's as soon as they are entered (this bug), we definitely want a shortcut for non-breaking space, probably ux-consistent with major word processors, so it'll be Ctrl+Shift+Space. Alt+0160 or Alt+255 would also be working again.

I don't think we'll want to enable conflating behaviour of multiple spaces. Due to Word processor habit formation, I think for most users it would be very odd if you'd type 5 normal spaces and your cursor wouldn't even move; I think that's exactly what would happen without the automagic substitution. (Of course, there might be technical alternatives other than nbsp's to preserve the whitespace, but they all come with their own disadvantages again, and hard to code.) While having the keyboard shortcut for non-breaking spaces is good, forcing users to use non-breaking spaces where they just want to create some space in front of a line, or between words, would be too cumbersome.

The real problem of this bug is not automagical substitution of spaces -> nbsp's, but the other way round: We're too greedy to convert nbsp's -> spaces, on input, and when sending HTML as plaintext.
I sent an HTML message containing |huhu&nbsp;!| as HTML and plain text.

Result:
Content-Type: multipart/alternative;
 boundary="------------26EE3AE20039D0A3270828F0"
Content-Language: de-DE

This is a multi-part message in MIME format.
--------------26EE3AE20039D0A3270828F0
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit

huhu !


--------------26EE3AE20039D0A3270828F0
Content-Type: text/html; charset=windows-1252
Content-Transfer-Encoding: 8bit

<html>
  <head>
    <meta http-equiv="content-type" content="text/html;
      charset=windows-1252">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p><tt>huhu !</tt><br>
    </p>
  </body>
</html>

--------------26EE3AE20039D0A3270828F0--

Looking at the message in a HEX editor I see 0xA0 (NBSP in windows-1252) in the HTML, but in the plain text part I see 0x20.

This could be related to
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPlainTextSerializer.cpp#1231-1250
as pointed out in bug 624666 comment #10.

I'll run the plain text encoder with nsIDocumentEncoder::OutputPersistNBSP to see what happens.
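
The intended effect of such a flag can be modelled in a few lines. This is a sketch only: it is not the Gecko implementation, the numeric flag value is an arbitrary assumption (the real value of nsIDocumentEncoder::OutputPersistNBSP is not given in this bug), and `serialize_text` is a hypothetical stand-in for the serializer's output step.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE

# Hypothetical flag bit for this sketch; not the real nsIDocumentEncoder value.
OUTPUT_PERSIST_NBSP = 1 << 0


def serialize_text(text, flags=0):
    """Model of a plain text output step with an opt-out for NBSP replacement."""
    if flags & OUTPUT_PERSIST_NBSP:
        return text  # keep U+00A0 exactly as the author entered it
    return text.replace(NBSP, " ")  # legacy behaviour: NBSP becomes 0x20


legacy = serialize_text("huhu" + NBSP + "!")
persisted = serialize_text("huhu" + NBSP + "!", OUTPUT_PERSIST_NBSP)
```

Without the flag the model reproduces the 0x20 seen in the plain text part above; with it, the 0xA0 survives.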
Simple fix to avoid losing NBSP when converting HTML to plain text.

I did minimal testing, just sent |huhu&nbsp;!| as plain text, and voilà, I got an 0xA0 after the huhu.

Those who feel inclined can download a Windows try build ...

Once completed, builds and logs will be available at:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-3ee41b94524088cef05ae44136b143b90c078ec4/
Attachment #8884559 - Flags: review?(acelists)
Comment on attachment 8884559 [details] [diff] [review]
290565-nbsp.patch (v1)

Review of attachment 8884559 [details] [diff] [review]:
-----------------------------------------------------------------

OK, this fixes the part when you manage to get &nbsp; or 0xA0 in the HTML composition, then converting the message to plain text (e.g. for plain+HTML delivery format) keeps the non-breaking space.

When editing such a message (e.g. Edit as new) still loses the non-breaking spaces.
Also getting the non-breaking space into the composition in the first place is still hard. Insert->HTML, typing &nbsp; converts to space. I only managed it via the ThunderHTMLedit addon by writing the &nbsp; directly into the message HTML source.
Will these parts be covered?
Attachment #8884559 - Flags: review?(acelists) → review+
(In reply to :aceman from comment #35)
> OK, this fixes the part when you manage to get &nbsp; or 0xA0 in the HTML
> composition, then converting the message to plain text (e.g. for plain+HTML
> delivery format) keeps the non-breaking space.
Indeed.

> When editing such a message (e.g. Edit as new) still loses the non-breaking
> spaces.
Oh yes, I didn't know/test. I'll look into it.

> Also getting the non-breaking space into the composition in the first place
> is still hard. Insert->HTML, typing &nbsp; converts to space.
Interesting, I don't use that stone-age HTML editor. But that code goes through serialisation as well, no wonder M-C loses it there. Oh, comment #30 mentions this: Insert > HTML and Alt+255 get removed immediately.

> I only managed it via the ThunderHTMLedit addon by writing the &nbsp;
> directly into the message HTML source.
That's what I did.

> Will these parts be covered?
Looks like it, but not today :-( I'm glad I didn't take the bug ;-)

Maybe the best solution to *all* problems including bug 359303 would be to remove
https://dxr.mozilla.org/mozilla-central/source/dom/base/nsPlainTextSerializer.cpp#1231-1250,
but changing M-C code is a long-winded process and I'd need to fix any M-C breakage and write M-C tests :-(
(In reply to Jorg K (GMT+2) from comment #36)
> > When editing such a message (e.g. Edit as new) still loses the non-breaking
> > spaces.
> Oh yes, I didn't know/test. I'll look into it.
Actually, that's not the case. I inserted my |huhu&nbsp;!| and saved as draft where I received an 0xA0. Editing this draft (as new) and saving or sending it maintains the 0xA0. So no problem here.

Since you can't copy those NBSP (bug 359303) and ThunderHTMLedit seems to lose them (I'll have to fix that) it's hard to tell whether you have one or not. The only way to be sure is save the message as draft, extract as .eml file and look at it in a hex editor.
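
As an alternative to a hex editor, the check can be scripted. The following is a rough sketch using Python's standard `email` package; the helper name `nbsp_per_part` and the sample message are made up for illustration, modelled on the multipart test message shown earlier in this bug.

```python
from email import message_from_bytes

NBSP = "\u00a0"  # NO-BREAK SPACE


def nbsp_per_part(raw_message):
    """Count no-break spaces in every text/* part of a raw .eml message."""
    msg = message_from_bytes(raw_message)
    counts = {}
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip the multipart container itself
        payload = part.get_payload(decode=True)  # raw bytes of this part
        charset = part.get_content_charset() or "us-ascii"
        text = payload.decode(charset, errors="replace")
        counts[part.get_content_type()] = text.count(NBSP)
    return counts


sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/alternative; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain; charset=windows-1252\r\n"
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"\r\n"
    b"huhu !\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/html; charset=windows-1252\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"<p><tt>huhu\xa0!</tt></p>\r\n"
    b"--XYZ--\r\n"
)
result = nbsp_per_part(sample)
```

For a saved draft, the same function could be fed the bytes of the extracted .eml file, read in binary mode.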
It's also not true that NBSP can't be added with Insert > HTML. If I do and save the draft, it contains a 0xA0. You can even edit with Insert > HTML and the 0xA0 are maintained.
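
As an alternative to the hex editor, here is a small Python sketch that counts U+00A0 in the text parts of a saved message. It uses only the standard library `email` package; the function name `count_nbsp` is made up for this example. Decoding the parts first means the check also works when the body is quoted-printable or base64 encoded, where a raw byte search for 0xA0 would miss it.

```python
from email import message_from_bytes, policy

NBSP = "\u00a0"


def count_nbsp(raw_eml: bytes) -> int:
    """Count U+00A0 characters in all decoded text parts of a raw message."""
    msg = message_from_bytes(raw_eml, policy=policy.default)
    total = 0
    for part in msg.walk():
        # Multipart containers have maintype "multipart" and are skipped.
        if part.get_content_maintype() == "text":
            # get_content() undoes the transfer encoding and the charset.
            total += part.get_content().count(NBSP)
    return total
```

For a draft extracted to a file (the name draft.eml is only an example), read the file in binary mode and pass the bytes to `count_nbsp`; a result of 0 means the non-breaking spaces did not survive.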
New version that won't lose NBSP and will show NBSP as &nbsp; and tabs as &#x09; so we can see what's going on. I'll publish this on AMO now.
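
The idea of making these characters visible can be sketched in a few lines of Python. This is not the addon's code, just an illustration of the mapping described above; `show_invisibles` is a name invented for this example.

```python
# Mapping taken from the description above: NBSP -> &nbsp;, tab -> &#x09;
VISIBLE = {"\u00a0": "&nbsp;", "\t": "&#x09;"}


def show_invisibles(text: str) -> str:
    """Replace NBSP and tab with visible markers; leave everything else alone."""
    return text.translate(str.maketrans(VISIBLE))
```

Regular spaces are left as they are, so a marker in the output always means a real 0xA0 or tab was present in the input.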

With this version you can see that:
1) Insert > HTML &nbsp; works.
2) Saving this as a draft works.
3) Editing the draft works.
4) Sending the draft as HTML works and with the patch
   attached here, sending as plain text works as well.
5) Adding NBSP as ALT+255 doesn't work.

So as far as I'm concerned, TB now fully supports NBSP in compose, send, edit as new, downgrade to plain text.

That you still can't copy NBSP and paste them into Notepad++ is covered in bug 359303.
https://hg.mozilla.org/comm-central/rev/2a5ae8fc279b45758964d61fcd4989e63338ae06
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 56.0
Assignee: nobody → jorgk
Comment on attachment 8884559 [details] [diff] [review]
290565-nbsp.patch (v1)

Old and annoying bug with pretty risk-free fix, so let's uplift.
Attachment #8884559 - Flags: approval-comm-esr52?
Attachment #8884559 - Flags: approval-comm-beta+
https://hg.mozilla.org/comm-central/rev/18217986ac6a0d930b6dc7d693045c2eb7045b50

I noticed that when you manage to get a NBSP into a plain text composition to start with (using ThunderHTMLedit) that gets lost without this additional patch. So now we're 100% waterproof.
Jörg, thanks a lot for rapidly picking up on my detailed impetus to fix this in bug 624666 comment 7. You rock!!! Great teamwork, starting from the constructive feedback from users like Bruno and Skippy above (comment 19 ff.) which got my attention. Could have been fixed long ago with that kind of cooperative interest and effort. Our users will certainly appreciate and hope to see more of such synergies to eliminate long-standing bugs...

So a large chunk of this problem has been fixed, which is great. Full stop. At least non-breaking spaces are now getting sent when they are in the composition.

However, getting them into composition is still way too complicated, which looks like the main pain point of comment 0:
- Any direct entry of nbsp into composition fails (alt+255, alt+0160, copy/paste using charmap app, Ctrl+Shift+Space not implemented)
- pasting generally works but not always, e.g. can't paste only nbsp from notepad++
- Insert > HTML: &nbsp; only works if context words are also inserted, not for adding only &nbsp; at cursor insertion point.
For this basic everyday task, we can't expect our users to continue using workarounds involving copy/paste, Insert > HTML, or Jörg's great ThunderHTMLEdit addon (https://addons.mozilla.org/en-us/thunderbird/addon/thunderhtmledit/).

So we'll want to open a followup bug for that, which includes implementing the default keyboard shortcut for inserting nbsp, Ctrl+Shift+Space. I've tried and that part is actually fairly easy (apart from some focus issues), but unfortunately in my tests, even editor.insertHTML("&nbsp;") fails, so I don't know how to add the nbsp to the message source so that it lasts. Any ideas welcome.
Depends on: 532712
Attachment #8884559 - Flags: approval-comm-esr52? → approval-comm-esr52+
@ThomasD: As I wrote in #532712, I think that users waiting for nbsp to be respected don’t really care about a shortcut being added for that, but mostly care about their standard insertion method actually working. No one wants to use a different key combination to insert nbsp in TB than they use in other software. ;)

15 years, still annoying. Same issue in Firefox, by the way…

(In reply to yekcim from comment #47)

> 15 years, still annoying. Same issue in Firefox, by the way…

Strange, I can't reproduce this since I wrote that message… Sorry for the noise.

It is still happening for me and definitely not fixed per https://bugzilla.mozilla.org/show_bug.cgi?id=359303.
