Bug 218277 - no-break spaces (nbsp) in submitted text are replaced by breakable spaces
Status: RESOLVED FIXED
Keywords: dataloss, intl
Product: Core
Classification: Components
Component: HTML: Form Submission
Version: Trunk
Platform: x86 Windows 2000
Importance: -- critical with 37 votes
Target Milestone: mozilla1.9alpha1
Assigned To: David Baron :dbaron: ⌚️UTC-10
: Ashish Bhatt
: Andrew Overholt [:overholt]
URL: http://mess.genezys.net/NoBreak/
Duplicates: 194498 195946 219774 251404 310877 375954
Blocks: 290565 194498 213628 test-suites
Reported: 2003-09-04 03:08 PDT by Hadrien Nilsson
Modified: 2012-05-11 07:10 PDT
CC List: 45 users
asa: blocking‑aviary1.0-
dbaron: blocking1.8b-
asa: blocking1.8b3-
sayrer: in‑testsuite+

Attachments
Code source of the test page located at http://mess.genezys.net/NoBreak/ (1.20 KB, application/octet-stream)
2003-09-04 03:11 PDT, Hadrien Nilsson
no flags
This patch corrects the « nsPlainTextSerializer::Output » function (1.06 KB, patch)
2003-11-26 05:42 PST, Hadrien Nilsson
no flags
Testcase (528 bytes, text/html)
2004-10-02 15:41 PDT, Olivier Mengué
no flags
Persist nbsp characters only when serializing the value of form controls. (7.23 KB, patch)
2004-10-19 13:55 PDT, Johnny Stenback (:jst, jst@mozilla.com)
bryner: review+
dbaron: superreview+
patch_v2 (4.89 KB, patch)
2006-08-06 16:47 PDT, jpl24
no flags

Description Hadrien Nilsson 2003-09-04 03:08:07 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6

Take a string containing the letter 'a' 30 times, 5 no-break spaces
(character 0xA0 or 160 in iso-8859-1, iso-8859-15 or unicode) and a dot '.'

Note : you can input a no-break space on Windows by holding the Alt key down
and typing 0160 on the keypad

Input this string into a text input element in a form. Submit the form and
display the submitted string. The 5 no-break spaces have changed to 1 breakable
space (0x20 or 32). The 5 no-break spaces should have been kept.

You can test this on http://mess.genezys.net/NoBreak/

This is quite important when you submit French texts as this language uses a lot
of no-break spaces (for instance before those punctuation marks --> ? ! : ;)

Reproducible: Always

Steps to Reproduce:
1. Input a string with many no-break spaces into a text input in a form
2. Submit the data
3. Display the data received

Actual Results:  
A single no-break space (0xA0) is replaced by one breakable space (0x20)
Multiple consecutive no-break spaces are replaced by one breakable space

Expected Results:  
Mozilla should keep the no-break spaces just as they appear in the original
input string.
Comment 1 Hadrien Nilsson 2003-09-04 03:11:55 PDT
Created attachment 130899
Code source of the test page located at http://mess.genezys.net/NoBreak/

Will need PHP version 4 or better
Comment 2 Boris Zbarsky [:bz] (still a bit busy) 2003-09-23 07:03:22 PDT
*** Bug 219774 has been marked as a duplicate of this bug. ***
Comment 3 Vincent Robert 2003-10-03 08:22:32 PDT
I would like to point out that Mozilla just *transforms* 0xA0 bytes (non-breakable
space) into 0x20 bytes (breakable space). Why ?

HTTP posting should just send bytes through the network, and not analyse them.
As you can see with the test page attached, Internet Explorer does not make that
transformation.
Comment 4 Hadrien Nilsson 2003-11-25 02:45:29 PST
I discovered I was wrong about one point :
« The 5 no-break space have changed to 1 breakable space (0x20 or 32) »

Instead, each no-break space (0xA0) is replaced by one breakable space (0x20).
For instance, « a   b », encoded as :
0x0061, 0x00A0, 0x00A0, 0x00A0, 0x0062

will be converted to :
0x0061, 0x0020, 0x0020, 0x0020, 0x0062

I downloaded and compiled the source code and successfully reproduced the bug. I
will try to find the origin of that bug.
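
The one-for-one conversion described above can be sketched outside Mozilla as follows. This is only an illustration: `std::u16string` stands in for Mozilla's string class, and `replaceNbspWithSpace` is a hypothetical helper, not a function from the tree.

```cpp
#include <algorithm>
#include <string>

// Hypothetical stand-in for the conversion described above:
// every U+00A0 becomes U+0020, one for one (runs are not merged).
std::u16string replaceNbspWithSpace(std::u16string text) {
    std::replace(text.begin(), text.end(), u'\u00A0', u' ');
    return text;
}
```

Fed the five code units 0x0061, 0x00A0, 0x00A0, 0x00A0, 0x0062, this helper yields five code units again, with each 0x00A0 turned into 0x0020.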
Comment 5 Hadrien Nilsson 2003-11-25 07:08:32 PST
I found the problem. The scary thing is that... it is done on purpose ! Here are
more details about the bug :

(btw, I used the official source archive : MozillaFirebird-source-0.7.tar.gz)

Here is the function call stack I had when debugging (the most recent call is
written first, the parent calls follow) :

nsTextControlFrame::GetText(nsTextControlFrame * const 0x03a03028, nsString *
0x0012f00c {""}) line 2758
nsTextControlFrame::GetValue(nsTextControlFrame * const 0x03a030b4, nsAString &
{...}, int 0x00000001) line 2949 + 54 bytes
nsPlaintextEditor::OutputToString(nsPlaintextEditor * const 0x03a81320, const
nsAString & {...}, unsigned int 0x00000418, nsAString & {...}) line 1424 + 39 bytes
nsDocumentEncoder::EncodeToString(nsDocumentEncoder * const 0x0392a438,
nsAString & {...}) line 935 + 39 bytes
nsPlainTextSerializer::Flush(nsPlainTextSerializer * const 0x0392ac08, nsAString
& {...}) line 453
nsPlainTextSerializer::FlushLine() line 1273
nsPlainTextSerializer::Output(nsString & {"u   v"}) line 1296
-----------------------------------------------------------------------------------
So here we are, the problem is in 
void nsPlainTextSerializer::Output(nsString& aString)
In file :
/mozilla/content/base/src/nsPlainTextSerializer.cpp line 1279

Here is the function :

/**
 * Prints the text to output to our current output device (the string
 * mOutputString).
 * The only logic here is to replace non breaking spaces with a normal space since
 * most (all?) receivers of the result won't understand the nbsp and even be
 * confused by it.
 */
void 
nsPlainTextSerializer::Output(nsString& aString)
{
  if (!aString.IsEmpty()) {
    mStartedOutput = PR_TRUE;
  }

  // First, replace all nbsp characters with spaces,
  // which the unicode encoder won't do for us.
  static PRUnichar nbsp = 160;
  static PRUnichar space = ' ';
  aString.ReplaceChar(nbsp, space);

  mOutputString->Append(aString);
}
--------------------------------------------------------------------
So here you see the special replacement. Doing such a thing is bad.

These three lines of code should be removed :
  static PRUnichar nbsp = 160;
  static PRUnichar space = ' ';
  aString.ReplaceChar(nbsp, space);

I removed them in my personal build and rebuilt the browser. I tried the page
on http://mess.genezys.net/NoBreak and everything worked nicely.

Internet Explorer and Opera behave correctly. They do not replace 0x00A0 (160)
chars with 0x0020 (32) chars. The same should be done for Mozilla.
--------------------------------------------------------------------------

Now that the bug was found, who could fix it in the CVS tree ?
Comment 6 Hadrien Nilsson 2003-11-25 07:10:39 PST
D'oh ! I forgot I had reordered it. Please read the stack trace with the
parent calls first instead of « (the most recent call is written first, the
parent calls follow) : »
Comment 7 Hadrien Nilsson 2003-11-26 05:42:03 PST
Created attachment 136358
This patch corrects the « nsPlainTextSerializer::Output » function
Comment 8 Vincent Robert 2003-11-27 01:36:09 PST
Bug 195946 may be resolved too with this patch... Need some tests.
Comment 9 Hadrien Nilsson 2003-11-27 02:04:53 PST
> Vincent Robert :
> Bug 195946 may be resolved too with this patch... Need some tests.

I did some tests, the patch also corrects bug 195946 and bug 213628
Comment 10 Asa Dotzler [:asa] 2003-12-03 10:28:02 PST
We changed this behavior over two years ago, see bug 62189, thinking we were
doing the right thing. 
Comment from nsPlainTextSerializer.cpp where this happens: "Prints the text to
output to our current output device (the string mOutputString). The only logic
here is to replace non breaking spaces with a normal space since most (all?)
receivers of the result won't understand the nbsp and even be confused by it."

If we're going to change this, it probably needs to happen early in an alpha or
beta cycle so we can have some time to see if we break some applications that
might expect Mozilla's current behavior. 
Comment 11 Bob Clary [:bc:] 2003-12-03 10:56:22 PST
Asa, from what I can tell, the nsPlainTextSerializer.cpp was checked in by Vidur
with this nbsp->sp conversion directly from nsHTMLToTXTSinkStream. bug 62189
cleaned up a bunch of stuff and added comments to nsPlainTextSerializer::Output
but didn't change the already existing nbsp->sp conversion.

I see no reason to be doing this conversion and no justification found in the
comments. Perhaps Daniel Glazman and others can take a look at this and comment.

I agree that this is something for alpha and not beta.
Comment 12 Hadrien Nilsson 2004-01-05 04:31:51 PST
As suggested, I set a review request flag for 1.7a.
Comment 13 piers 2004-01-22 11:32:24 PST
Comment on attachment 136358
This patch corrects the « nsPlainTextSerializer::Output » function

Requesting r from Daniel per comment 11.
Comment 14 Daniel Glazman (:glazou) 2004-01-22 11:47:36 PST
I can't think of any reason why this patch is wrong but I would like a few other
opinions before I give this r=.

Akkana, Joe : we need your help here. Do you think the proposed change is safe?
Comment 15 Joe Francis 2004-01-23 20:29:33 PST
Well, someone should enumerate the common cases where the plaintext serializer
is used, and think about the results.  Example: you compose mail in composer in
the html view, and then choose the "send as plaintext" option when you get a
warning about one of your recipients not savvy to html.  Are there relevant
RFC's on plaintext mail that address this?  Are there internationalization
issues with 0xA0?  

My experience doesn't help much here.  I know more about which situations demand
nbsp's in our internal representation of the data while editing is in progress,
rather than the various consumers of output from the plaintext serializer.

If I understand this bug correctly, though, one side effect should be that
copy/paste of nbsp's within a plaintext editor should transform them to spaces?
 So if you use nbsp's in some ascii art to prevent a line from wrapping, and
then copy/paste that line in a plaintext editing session, it will suddenly wrap
(if long enough), ya?
Comment 16 chris hofmann 2004-02-12 18:00:53 PST
time to make the call in this for 1.7a.   how much, and what kind of testing has
been done with the patch?
Comment 17 David Baron :dbaron: ⌚️UTC-10 2004-02-12 18:06:36 PST
Someone should figure out what codepaths this is used for and make sure this
doesn't happen where we don't want it.  Furthermore, I don't see any reason this
should block the release.
Comment 18 Hadrien Nilsson 2004-02-13 02:58:20 PST
Here is some information I could gather :
---------------------------------------------------------------
The function to change is :
protected nsPlainTextSerializer::Output(nsString& aString)

called by:
	protected nsPlainTextSerializer::FlushLine()
	protected nsPlainTextSerializer::EndLine(PRBool aSoftlinebreak)
	protected nsPlainTextSerializer::OutputQuotesAndIndent(PRBool
stripTrailingSpaces /* = PR_FALSE */)
	protected nsPlainTextSerializer::Write(const nsAString& aString)

class nsPlainTextSerializer used in
NS_NewPlainTextSerializer(nsIContentSerializer** aSerializer)

NS_NewPlainTextSerializer(nsIContentSerializer** aSerializer) used in
\layout\build\nsLayoutModule.cpp(466):MAKE_CTOR(CreatePlainTextSerializer,
nsIContentSerializer, NS_NewPlainTextSerializer)

CreatePlainTextSerializer found twice in
\layout\build\nsLayoutModule.cpp in the array :

static const nsModuleComponentInfo gComponents[]

{ "plaintext content serializer",
    NS_PLAINTEXTSERIALIZER_CID,
    NS_CONTENTSERIALIZER_CONTRACTID_PREFIX "text/plain",
    CreatePlainTextSerializer },
    
{ "plaintext sink",
 NS_PLAINTEXTSERIALIZER_CID,
 NS_PLAINTEXTSINK_CONTRACTID,
 CreatePlainTextSerializer },
---------------------------------------------------------------
Now I don't have a good global vision of the project so I can't say what
gComponents is used for. I hope this could help anyway.

From a user's point of view I didn't see any side effect with my personal build
of Mozilla.

From a logical point of view I just think like Bob Clary : I see no reason to be
doing this conversion.

Did someone else try the patch with his own build ?
Comment 19 David Baron :dbaron: ⌚️UTC-10 2004-02-13 10:26:33 PST
I can see a very good reason to be doing this conversion.  When using the HTML
editor, hitting the space bar multiple times turns spaces into &nbsp;.  Mail
composed using the HTML editor is often sent as text, and if this is the code
used to convert that mail to text, those nbsp characters need to be converted to
spaces.
Comment 20 Hadrien Nilsson 2004-02-13 22:27:42 PST
Comment #19: "those nbsp characters need to be converted to spaces"

This depends on the charset, and does not only concern no-break spaces.

When I'm using the iso-8859-1 charset, I have the right to use the 0xA0
character with this charset (and others which know about 0xA0, like
windows-1252, iso-8859-15, unicode flavours,...), the 0xA0 is a perfectly valid
character, like é, è, à, ç...

Conversion from characters written in the html editor and not included in the
charset selected by the user is another issue. 

Btw, I found that the Thunderbird/Mozilla mail components fail to handle 0xA0
characters. While editing, when I insert some 0xA0 characters they also get
converted into 0x20, this is very annoying as my default mail encoding is
iso-8859-15. Other iso-8859-15 characters are kept. So this is just a matter of
consistency. Why should some valid characters be changed, and some others kept ?

At best the patch will also correct the mail component. At worst it will do
nothing, and then another bug should be filed about this 0xA0 conversion.
Comment 21 David Baron :dbaron: ⌚️UTC-10 2004-02-13 22:32:09 PST
My point is that when the user types <SPACE> <SPACE>, we should not send
non-breaking space characters in plaintext messages.  Does the patch break that?
Comment 22 Hadrien Nilsson 2004-02-16 04:47:57 PST
Comment #21: there is a side effect of the patch in the mail/news component.
Here are the results of my tests.

A) I composed mails with Thunderbird 0.5 in html mode (my default encoding is
iso-8859-15). The sending format is set to « Auto-Detect ».

1) I only used ascii characters, and started a line with two spaces.
result: the mail was sent as :

Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

info: 7bit, so I only got 0x20 spaces.
user expectation: OK. With plain text renderer, spaces are not merged together,
so having 0x20 spaces is fine for separation, moreover I did type 0x20 characters. 

2) I used ascii letters but inserted manually several 0xA0 characters :

Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit

info: 7bit, this means my 0xA0 characters were converted to 0x20 spaces.
user expectation: Not OK. The 0xA0 I inserted manually should be kept, and so
the Content-Transfer-Encoding should be 8bit.

3) I used characters with accent, and started a line with two spaces.
result:

Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit

info: In a hex viewer I don't see any 0xA0 spaces.
user expectation: OK. The Content-Transfer-Encoding is 8bit to handle the
characters with accent. No 0xA0 characters were inserted manually so none should
appear.

4) Same test as A3) but I forced the sending in html
result:

Content-Type: text/html; charset=ISO-8859-15
Content-Transfer-Encoding: 8bit

info: the line which starts with two spaces is encoded as : 0xA0 0x20
user expectation: the rendering is OK. The insertion of 0xA0 in order to do a
margin ? Why not.
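
The reasoning used in tests A1) to A4), namely that a body labelled 7bit cannot contain a 0xA0 byte, can be sketched as below. `fitsIn7Bit` is a hypothetical name, and the check only looks at byte values; a real 7bit body has other constraints too, such as line length.

```cpp
#include <algorithm>
#include <string>

// Returns true when every byte fits in 7 bits, i.e. the body could be
// labelled "Content-Transfer-Encoding: 7bit" as far as byte values go.
bool fitsIn7Bit(const std::string& body) {
    return std::none_of(body.begin(), body.end(), [](char c) {
        return static_cast<unsigned char>(c) >= 0x80;
    });
}
```

A body made only of ascii letters and 0x20 spaces passes the check, while a single surviving 0xA0 byte (or an accented iso-8859-15 letter) makes it fail, which is why 8bit shows up in those cases.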


B) I composed mails with Mozilla 1.5 with the patch applied, in html mode (my
default encoding is iso-8859-1). The sending format is set to « Auto-Detect ».

1) Same test as A1)
result:

Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit

info: this time, the 0xA0 automatically inserted was kept, the
Content-Transfer-Encoding has been set to 8bits instead of 7bit.
user expectation: Not OK. I never inserted any 0xA0 characters manually. I just
inserted regular spaces. So I should only get 0x20 spaces.

2) Same test as A2) (manually inserting 0xA0 characters)
result:

Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

info: same as A2).
user expectation: Not OK. Same as A2).

3) Same test as B1) but I forced the format to text mode.
info: same as B1).
user expectation: I expected to avoid the 0xA0 composer behaviour, but same as B1)

-------------------------------
In all cases the mail data itself is perfectly valid, but I do not always get
expected characters :

With the patch, I get unexpected 0xA0 characters instead of 0x20 ones.
In both cases, I get unexpected 0x20 characters instead of 0xA0 ones.

The problem comes from the fact that the composer does not make any distinction
between a 0xA0 inserted manually by the user, which should be always kept, and
its own automatically generated 0xA0 in order to create a margin, which could be
kept in html mode, but should be removed in text mode.

Daniel may have better ideas than mine to deal with this issue, but here you
are anyway :
- have a real text mode for the mail composer. In that case the automatic 0xA0
insertion would not be activated ;
- use another technique for margins ;
- make a difference in the composer between user's 0xA0, and automatic 0xA0. For
example :

 Four spaces<span class="moznbsp">    </span>then three spaces<span
class="moznbsp">   </span>done.

 Automatic 0xA0 would be enclosed with a span in order to make the difference.
- use the entity "&nbsp;" for automatic 0xA0, and real 0xA0 for user's 0xA0.
When sending in text/plain mode, &nbsp; would be converted to 0x20 and real 0xA0
would be kept.
- store somewhere in memory where the automatic 0xA0 are located
-------------------------------

A temporary-not-perfect solution would be to keep the 0xA0 --> 0x20 conversion
but at the mail component level, not in the nsPlainTextSerializer class itself,
so at least we would relieve the browser of this bug (and then open a new bug
about the mail component or about the composer).
Comment 23 Hadrien Nilsson 2004-02-17 11:05:10 PST
Today I've been inspecting the code in order to implement the
"temporary-not-perfect solution" but then I thought about a nice improvement in
the way the composer could handle spaces.

In Comment #22 I proposed to enclose 0xA0 with a special span element :

Four spaces<span class="moznbsp">    </span>done.

I didn't really like the fact of using 0xA0 characters as the user just inserted
normal 0x20 characters. But I discovered those 0x20 characters can be kept given
the CSS class :

.moznbsp { white-space: pre; }

This works just fine.

----------------------------------
Small note :
Well, if you want to be stricter the CSS 1 recommendation says that white-space
applies to: block-level elements, so maybe it should be changed to :

Four spaces<div class="moznbsp">    </div>done.
.moznbsp { white-space: pre; display: inline; }

but with the span it has been working so far in many browsers.
----------------------------------

Summary :

hitting once the space key in the composer will insert a 0x20 character.
hitting once again the space key in the composer will insert another 0x20, and
enclose both of them into : <span class="moznbsp">  </span>
hitting again the space key will insert 0x20 characters into the span element :
<span class="moznbsp">       </span>

0xA0 inserted by the user will be managed just like another normal character.

I'm going to look around in the source code to find where the automatic 0xA0
insertion mechanism is made and think about how to implement such a solution
(but maybe somebody else knows more about this routine and could do the change
more easily).
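
The wrapping rule summarised above could look roughly like this when applied to a finished string instead of keystroke by keystroke. It is only a sketch: `wrapSpaceRuns` is a hypothetical helper, the text is treated as plain bytes, and HTML escaping is left out.

```cpp
#include <cstddef>
#include <string>

// Wraps every run of two or more 0x20 characters in the proposed span;
// single spaces and all other characters are copied unchanged.
std::string wrapSpaceRuns(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] != ' ') {
            out += text[i++];
            continue;
        }
        // Find the end of the current run of spaces.
        std::size_t end = i;
        while (end < text.size() && text[end] == ' ') ++end;
        const std::string run = text.substr(i, end - i);
        if (run.size() >= 2) {
            out += "<span class=\"moznbsp\">" + run + "</span>";
        } else {
            out += run;
        }
        i = end;
    }
    return out;
}
```

Since the span only ever contains 0x20 characters, a plain-text export would just need to drop the tags, and any 0xA0 in the text would be one the user actually entered.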
Comment 24 Aaron Kaluszka 2004-05-24 17:19:58 PDT
Note that bug 213628 is dataloss and this potentially could be.
Comment 25 Olivier Mengué 2004-10-02 15:31:58 PDT
This bug is major on French Wikipédia (http://fr.wikipedia.org/) because in
French non-breaking spaces have to be used before colons, exclamation/question
marks...

Many articles correctly written by IE users using nbsp are altered by editors
using Mozilla/Firefox user agent without even being aware of this bug.

This bug should at least be mentioned in the Release Notes of the next version,
or better, fixed.


This is an HTML/XHTML bug as a readonly control will always have its "current
value" (see http://www.w3.org/TR/html401/interact/forms.html#current-value ) 
different from the "initial value" if the ''initial value'' contains nbsp (which
is not the definition of "readonly"). (and the bug is not limited to readonly
controls).

Severity should be changed to major. Keywords 'conversion' and 'xhtml' should be
added.
Comment 26 Olivier Mengué 2004-10-02 15:41:34 PDT
Created attachment 160872
Testcase

This is a testcase that can be used locally (without involving form submission
over HTTP).

"bug!" is shown in the page if your user agent has the bug.
Comment 27 Vincent Robert 2004-10-18 17:59:05 PDT
This bug is still NEW, and severity normal...

But I just broke my whole wiki content because of it ! 

I was transferring my database content using phpMyAdmin, and my content did
contain the 'à' character which is encoded as 'Ã[nbsp]' in UTF-8.

But when submitting this using phpMyAdmin, 'Ã[nbsp]' is simply transformed into
'Ã[sp]', and my char is lost as the result is not a valid UTF-8 character.

Fortunately, I have IE to upload my text again but I am wondering how would
someone under Linux have repaired the problem.
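
The breakage described above can be reproduced at the byte level. In UTF-8, 'à' is the two bytes 0xC3 0xA0, and turning 0xA0 into 0x20 leaves a lead byte without its continuation byte. The sketch below uses hypothetical helper names and a deliberately simplified check (it validates the lead/continuation byte pattern only, not overlong forms or surrogates).

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Replaces every 0xA0 byte with 0x20, as happens when UTF-8 bytes are
// handled as if each byte were one iso-8859-1 character.
std::string byteWiseNbspToSpace(std::string bytes) {
    std::replace(bytes.begin(), bytes.end(), '\xA0', ' ');
    return bytes;
}

// Simplified UTF-8 structure check (lead and continuation bytes only).
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (lead < 0x80) extra = 0;                 // ascii
        else if ((lead & 0xE0) == 0xC0) extra = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) extra = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) extra = 3;  // 11110xxx
        else return false;  // continuation byte or invalid lead byte
        if (extra > bytes.size() - i - 1) return false;  // truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

With these helpers, the bytes 0xC3 0xA0 pass the check, while the converted bytes 0xC3 0x20 do not.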
Comment 28 Brendan Eich [:brendan] 2004-10-18 18:17:04 PDT
Cc'ing jst in case he has any insights.

/be
Comment 29 Johnny Stenback (:jst, jst@mozilla.com) 2004-10-19 13:53:17 PDT
One possibility here is to skip that code, but only when serializing the
value of form controls. I'll attach a patch...
Comment 30 Johnny Stenback (:jst, jst@mozilla.com) 2004-10-19 13:55:54 PDT
Created attachment 162609
Persist nbsp characters only when serializing the value of form controls.
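
The patch itself lives in the attachment and is not reproduced on this page, so the following is only a rough sketch of the idea in its title. All names here (`kPersistNbsp`, `PlainTextOutput`) are hypothetical and `std::u16string` stands in for Mozilla's string classes.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical output flag; the name and value are illustrative only.
constexpr std::uint32_t kPersistNbsp = 1u << 0;

// Minimal stand-in for a plain-text serializer's output stage.
class PlainTextOutput {
public:
    explicit PlainTextOutput(std::uint32_t flags) : mFlags(flags) {}

    // Appends text, converting U+00A0 to U+0020 unless the caller
    // asked for no-break spaces to be kept (e.g. form control values).
    void Output(std::u16string text) {
        if (!(mFlags & kPersistNbsp)) {
            std::replace(text.begin(), text.end(), u'\u00A0', u' ');
        }
        mOutput += text;
    }

    const std::u16string& Result() const { return mOutput; }

private:
    std::uint32_t mFlags;
    std::u16string mOutput;
};
```

In this sketch a caller serializing a form control would construct the object with `kPersistNbsp`, while the mail path would pass no flag and keep today's conversion.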
Comment 31 Hadrien Nilsson 2004-10-25 07:05:38 PDT
Great! It looks like the temporary-not-perfect solution I thought of. It seems sensible
now, as this bug is about the browser only.

I may open a new bug for the mail component.
Comment 32 Vincent Robert 2004-10-29 14:00:39 PDT
Sorry for the spam but...
Pleaaaaase. Check this patch in for 1.0 final.
Comment 33 Johnny Stenback (:jst, jst@mozilla.com) 2004-10-29 15:57:30 PDT
We're way too close to the release of 1.0 to consider taking this kind of
change. If you strongly disagree you can set the blocking-aviary1.0 flag to ?
and make a case for this, but I doubt it'll go through :(
Comment 34 Vincent Robert 2004-10-29 16:04:09 PDT
You may be right, but the thing is that this software is opensource (aka community based), and now 
that the community has found a solution to a very annoying problem, there seems to be no way to 
help solve this problem :-(

I think there is something else we could do to help instead of just waiting for a guru to check in this 
patch, but I don't know what... Testing ? Voting ? What can be done ?
Comment 35 Aaron Kaluszka 2004-10-29 17:05:18 PDT
Increasing severity and requesting block status due to dependent bugs and the
existence of a patch.
Comment 36 Johnny Stenback (:jst, jst@mozilla.com) 2004-10-29 19:39:08 PDT
(In reply to comment #34)
> You may be right, but the thing is that this software is opensource (aka
> community based), and now that the community has found a solution to a very
> annoying problem, there seems to be no way to help solve this problem :-(

Yeah, you're right, it's open source n' all, and the community is what matters
here. What we, the community, everyone involved, are dealing with here though is
balancing between the risk of this patch causing problems and delaying the release
(or even worse, causing embarrassing problems that we don't notice before the
release goes out the door) and the benefit of shipping with this fix. IOW, is
this problem worse to the community as a whole than the Firefox 1.0 release
being a failure due to a problem that could be caused by this fix?

With all the amazing effort being put into this release, in the actual product,
and at spreadfirefox.com et al, I *really* don't want to screw this up. So
unless there's a *huge* benefit to a *lot* of users in taking any given fix, I
wouldn't want to take the fix at this point...
Comment 37 Benoit 2004-10-31 06:05:16 PST
Well I don't think we have a keyword for that, but FWIW I can confirm this bug
is indeed important for French-speaking users, and for French l10n in general.
Comment 38 Jon Wiley 2004-10-31 09:02:58 PST
Is 251404 a duplicate of this bug?
Comment 39 Asa Dotzler [:asa] 2004-11-05 15:15:56 PST
not this late in the game. sorry. not a blocker.
Comment 40 David Baron :dbaron: ⌚️UTC-10 2004-11-25 11:34:27 PST
Comment on attachment 162609
Persist nbsp characters only when serializing the value of form controls.

>+    // Normally &npsp; is replaced with a space character when

&nbsp;, not &npsp;

sr=dbaron  ***if*** you're sure that the editor never generates non-breaking
spaces when the user presses the space bar in text inputs (e.g., by pressing
space multiple times).
Comment 41 Benoit 2004-11-28 14:01:09 PST
Most probably a duplicate of #194498 (or probably the reverse since this one has
a patch with r/sr and dependencies)
Comment 42 Alexander Schremmer 2004-11-28 14:07:42 PST
(In reply to comment #41)
> Most probably a duplicate of #194498 (or probably the reverse since this one has
> a patch with r/sr and dependencies)

But this one is newer so it is rather the duplicate.
Comment 43 Frankie 2004-12-06 12:30:39 PST
*** Bug 251404 has been marked as a duplicate of this bug. ***
Comment 44 Benoit 2005-01-16 07:32:49 PST
Requesting for 1.8b since there is a patch which could be checked in now (and
hoping that it would make it in Firefox 1.1 as well)
Comment 45 Benoit 2005-01-26 12:44:56 PST
Adding intl keyword since this particularly affects French-speaking users who
(should) have to use nbsp extensively (see comment #27, comment #29 and comment #41)
Comment 46 Daniel Glazman (:glazou) 2005-01-31 00:53:54 PST
Sorry but I don't understand why this is a problem in French, and I am not sure
why we need this fix. And I am a native French speaker who has the book "Règles
Typographiques en vigueur à l'Imprimerie Nationale" on his work desk, who uses
French-speaking wikis.

Typographical rules in French allow only one non-breaking space before a
semi-colon, a question mark or an exclamation point. I've never seen prose
with more than one unless it's in preformatted style, like in code for instance.
In that case - for a wiki - a pre element is the solution, isn't it?

Furthermore, David Baron is perfectly right: that code is needed for email...
The editor can't preserve multiple spaces if we don't turn some of them into &nbsp;.

Again, I am not convinced we need this fix, I can't see the pluses, and we have
one possible side-effect on email.
Comment 47 Adrien Beau 2005-01-31 07:05:15 PST
(In reply to comment #46)
> Sorry but I don't understand why this is a problem in French,

This is not especially a problem in French, but since French uses more
non-breaking spaces than other languages, this is a problem dear to many French
Mozilla users.

> and I am not sure why we need this fix.

We need to fix this for two reasons: a Mozilla user cannot input a non-breaking
space in a web form; and most importantly, Mozilla destroys the work of other
users of web forms by removing the non-breaking spaces they might have put in
the form. Look at it this way: you edit a document, you save it without touching
anything, yet the text has been modified and you cannot do anything about it.

> Typographical rules in French allow only one non-breaking space before a
> semi-colon, a question mark or an exclamation point.

You forgot a few cases.

> I've never seen prose with more than one unless it's in preformatted style,
> like in code for instance.

The problem is not with using several non-breaking spaces. The problem is using
even a single non-breaking space. You cannot do that.

> Furthermore, David Baron is perfectly right: that code is needed for email...

That code is *currently* needed for email, and that's a purely terrible design.
Unfortunately the code is deeply embedded at the core of Mozilla, so it cannot
be changed easily.

> The editor can't preserve multiple spaces if we don't turn some of them into
> &nbsp;.

Oh yes it can, just turn some of them into something else than a non-breaking
space! Use a character in the Unicode private area, for example.
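
A rough sketch of that last suggestion follows, under stated assumptions: U+E000 is an arbitrary private-use code point picked for illustration, and `kEditorSpace` and `serializeEditorText` are hypothetical names, not anything that exists in the editor.

```cpp
#include <algorithm>
#include <string>

// Hypothetical private-use code point standing in for a space the
// editor generated itself; U+E000 is an arbitrary illustrative choice.
const char16_t kEditorSpace = u'\uE000';

// On output, only the editor's own marker becomes U+0020;
// a U+00A0 typed or pasted by the user passes through untouched.
std::u16string serializeEditorText(std::u16string text) {
    std::replace(text.begin(), text.end(), kEditorSpace, u' ');
    return text;
}
```

The point of the design is that the serializer no longer has to guess where a U+00A0 came from, because the editor never produces one on its own.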
Comment 48 Aaron Kaluszka 2005-01-31 08:39:28 PST
If this problem is not fixed here, then bug 213628 absolutely must be fixed
since in that case _Mozilla doesn't work_.
Comment 49 Benoit 2005-01-31 09:34:50 PST
Bug #194498 comment 22 explains why it is so tricky and exposes some possible
solutions based on charset detection, or by using another character internally
for editor-generated spaces (using it for user-entered data would probably work
too, although it may cause problems when copy/pasting from a text control to
another application). In any case, I'm confident that a solution will eventually
be found. 

My two cents on this: in my understanding, 'nbsp's created by the editor
internally are never "alone", they are either preceded or followed by other nbsp
characters, line returns or 'classic' spaces. 

So, instead of preserving any nbsp found in a text control like attachment
162609 does, let's try preserving only nbsps found "alone" in a form control
(input text, input file or textarea). That is, when they are surrounded by
letters (any language, not only latin ones), numbers or punctuation signs. It
would fix 99% of the cases, including the upload problem described in bug
#213628 (I can't possibly imagine the use for a file name with multiple nbsps in
a row. If it exists it is probably a virus trying to hide a file extension anyway). 

The only problem I think of would be with so-called ASCII art and the like,
which is not so important IMHO, and for which users should rightfully use "pre"
or the associated command in their wiki syntax.

Is that a sensible approach? I've been thinking of this all day long, but I may
not have understood all the ins and outs. I don't know for example how it would
affect performance.
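
The heuristic proposed above might be sketched like this. The names are hypothetical, `std::u16string` stands in for the real string type, and treating an nbsp at the very start or end of the value as "not alone" is an assumption of the sketch, since the comment does not cover that case.

```cpp
#include <cstddef>
#include <string>

const char16_t kNbsp = u'\u00A0';

// True for the neighbours that, per the heuristic, mark an nbsp as
// probably editor-generated: spaces, other nbsps and line breaks.
static bool isSpaceLike(char16_t c) {
    return c == u' ' || c == kNbsp || c == u'\n' || c == u'\r';
}

// Keeps an nbsp only when both neighbours exist and are not space-like;
// every other nbsp is converted to U+0020.
std::u16string convertNonIsolatedNbsp(const std::u16string& in) {
    std::u16string out = in;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] != kNbsp) continue;
        // Neighbours are read from the unmodified input so that earlier
        // conversions do not influence later decisions.
        const bool alone = i > 0 && i + 1 < in.size() &&
                           !isSpaceLike(in[i - 1]) && !isSpaceLike(in[i + 1]);
        if (!alone) out[i] = u' ';
    }
    return out;
}
```

This is a single linear pass over the value, so the cost should be comparable to the existing blanket replacement.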
Comment 50 Vincent Robert 2005-01-31 09:42:33 PST
This bug is not about mail or multiple nbsp's. This bug is about data loss in
form serialization in the browser. Characters provided in a form are not the
characters received by the server, and that is not a correct behaviour.

It has nothing to do with mails, and the last patch provide a way to keep the
current mail behavior while making the browser behaviour correct.
Comment 51 Benoit 2005-01-31 10:06:58 PST
(In reply to comment #50)
> This bug is not about mail or multiple nbsp's. This bug is about data loss in
> form serialization in the browser. Characters provided in a form are not the
> characters received by the server, and that is not a correct behaviour.

The relation with mail is that the same code (editor) is used for both, and that
it generates "fake" nbsp characters for internal use when you type multiple
spaces in a row (see comment #40 and bug #194498 comment 12). That's why they
are removed at submission time, because in most cases they are in fact generated
by the editor and thus not desired. 

But that code is just being a little over-zealous, since it also removes nbsp
characters entered manually by the user, by the file selector (upload case), by
copy-paste from a word processor (French quote marks and punctuation) or
existing nbsp characters that were pre-filled in the form input from a remote
database (wikipedia case).
Comment 52 Hadrien Nilsson 2005-01-31 11:03:30 PST
The patch deactivates the 0x00A0 removal for Web forms only. So at least the
browser component would be kept bug-free, and many related bugs could be closed.

As I stated in comment #22 and comment #23, the composer uses an ugly method to
keep adjacent spaces. In an ideal world, this method would be replaced by
another one which does not use the 0x00A0 character hack. Therefore, the 0x00A0
to 0x0020 conversion would vanish.

Unfortunately this is quite a lot of work, so the current patch is a very good and
safe temporary solution (thanks Johnny).

(For Daniel, here is a reminder from the « Lexiques des règles typographiques en
usage à l'Imprimerie Nationale (2002) » :
http://psydk.org/gecko-bugs/french-spaces.jpg )
Comment 53 Akkana Peck 2005-01-31 11:21:07 PST
Did anyone ever answer dbaron's question in comment 40?
Comment 54 David Baron :dbaron: ⌚️UTC-10 2005-01-31 11:38:43 PST
*** Bug 194498 has been marked as a duplicate of this bug. ***
Comment 55 Benoit 2005-01-31 14:41:37 PST
(In reply to comment #53)
> Did anyone ever answer dbaron's question in comment 40?

I think it was Daniel's point in comment #46. Here is what he said on IRC:

[09:53] <Benoit-> I see, so the result would be that with the patch applied it
would *generate* nbsps when someone uses multiple spaces in a text form input?
[09:54] <glazou> right

In my understanding, that means the latest patch is not good enough but it could
be tweaked a bit to do what I describe in comment #49 (conserve only _single_
nbsps).

Here is what it does in nsPlainTextSerializer.cpp:

+  if (!(mFlags & nsIDocumentEncoder::OutputPersistNBSP)) {
+    // First, replace all nbsp characters with spaces,
+    // which the unicode encoder won't do for us.
+    static PRUnichar nbsp = 160;
+    static PRUnichar space = ' ';
+    aString.ReplaceChar(nbsp, space);
+  }

To that could be added something like (that's really pseudo-code, I don't know
how you'd do char concatenation here but you get the picture)

+  else {
+    // nbsp characters which are surrounded by spaces or line breaks
+    // are probably generated by the editor and not meant by the user.
+    // We are converting them anyway.
+    static PRUnichar nbsp = 160;
+    static PRUnichar space = ' ';
+    static PRUnichar linebreak = '\n';
+
+    // multiple nbsp characters
+    aString.ReplaceSubstring(nbsp+nbsp, space+space);
+
+    // nbsp characters following or preceding a space
+    aString.ReplaceSubstring(nbsp+space, space+space);
+    aString.ReplaceSubstring(space+nbsp, space+space);
+
+    // nbsp characters beginning or ending a line (the latter could be trimmed)
+    aString.ReplaceSubstring(nbsp+linebreak, space+linebreak);
+    aString.ReplaceSubstring(linebreak+nbsp, linebreak+space);
+  }

That "else" would never be reached when not in a form control (e.g. in mail). It
would take care of all the edge cases I can think of, though some of them are probably not
necessary at all. I know it's a very dirty hack; I'd at least have done that
using only one regular expression, but I don't think we can do that in C++, can we?
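
The if/else logic above can be sketched as stand-alone C++. This is only an illustration under stated assumptions: it works on a std::u16string rather than on Gecko's string classes, the function name ConvertNbsp and the persistNbsp parameter (standing in for the OutputPersistNBSP flag) are hypothetical, and it checks each nbsp against its neighbours in the original string instead of chaining substring replacements.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-alone helper; it is not Gecko code and does not use nsString.
// persistNbsp stands in for the OutputPersistNBSP flag mentioned in the patch.
std::u16string ConvertNbsp(const std::u16string& input, bool persistNbsp) {
  const char16_t kNbsp = u'\u00A0';
  const char16_t kSpace = u' ';
  const char16_t kLineBreak = u'\n';

  // Characters that make a neighbouring nbsp look editor-generated.
  auto isSpacing = [&](char16_t c) {
    return c == kNbsp || c == kSpace || c == kLineBreak;
  };

  std::u16string output = input;
  for (std::size_t i = 0; i < input.size(); ++i) {
    if (input[i] != kNbsp) {
      continue;
    }
    if (!persistNbsp) {
      // Outside form controls: replace every nbsp with a space.
      output[i] = kSpace;
      continue;
    }
    // In form controls: keep an nbsp only when it stands alone, that is,
    // when neither neighbour is an nbsp, a space or a line break.
    const bool spacingBefore = i > 0 && isSpacing(input[i - 1]);
    const bool spacingAfter = i + 1 < input.size() && isSpacing(input[i + 1]);
    if (spacingBefore || spacingAfter) {
      output[i] = kSpace;
    }
  }
  return output;
}
```

Looking at the neighbours in the original string avoids any dependence on the order of the replacements. An nbsp at the very start or end of the whole string is left alone here, which matches the pseudo-code but may not be what is wanted.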

(Just trying to be helpful here, sorry if you think I'm being overly naive and
consider this as bugspam, that's not my intention - I'd try this at home, see
if it builds and send a proper patch if I was not in a time of exams)
Comment 56 Vincent Robert 2005-01-31 15:42:37 PST
The last patch does NOT affect mail or composer. Comment #30 says that it
applies the nbsp-to-sp conversion to everything but web forms.

In Comment #31, Hadrien says that he will file a new bug for Mail. I think he is
referring to the current wrong behaviour of Mail described in Comment #22. This
wrong behaviour is not corrected by the last patch because it only affects
web forms.

Just trying to clarify some thoughts :)
Comment 57 Hadrien Nilsson 2005-02-01 01:17:23 PST
I have two ideas to keep adjacent spaces in the editor without the NBSP method.

1) First idea

Use the private Unicode area for Gecko and create an "EditorInternalNbsp". Some
value carefully chosen in the range 0xE000..0xF8FF. The editor would insert this
new character instead of 0x00A0.

Pros :
- minor changes to the editor.

Cons : 
- changes needed in the html renderer ;
- non portable html files.

2) Second idea

Instead of using 0x00A0 (NBSP) sequences, a pair of 0x0020 (SPACE) and 0xFEFF
(ZERO WIDTH NBSP) could be used.

Here is a scenario after pressing the space key 3 times in the editor :
Currently :  0x00A0 0x00A0 0x0020
Suggestion : 0x0020 0xFEFF 0x0020 0xFEFF 0x0020

When converting from html to text, a (0x0020 0xFEFF) pair would change to a
single 0x0020 character.

Pros : 
- no change in the html renderer, adjacent spaces are kept with this method ;
- 0x00A0 is always kept and becomes a normal char ;
- portable html files, meaning the content can be sent as-is.

Cons :
- in the editor, the user has to press the arrow key twice to move the caret ;
- when copying & pasting text, the 0xFEFF values are kept, and may be rendered
strangely in the destination software (for example, Notepad renders the 0xFEFF
as dots).

For the last two items, the editor could be made aware of (0x0020 0xFEFF) sequences
and move the caret by one space only (I think it would be the same kind of code as
for managing surrogate pairs). As for copying & pasting, the html to text converter
could be aware of this kind of sequence too, and convert each one to a single 0x0020
character (the same idea as converting an html representation to a plain text
representation, like <li> to "-").
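
A minimal C++ sketch of the second idea, assuming plain std::u16string buffers rather than the editor's real data structures. The function names EncodeSpaceRun and DecodeSpacePairs are hypothetical and only illustrate the encoding shown in the scenario above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers illustrating the (0x0020 0xFEFF) pair idea; not Gecko code.
const char16_t kSpace = u' ';
const char16_t kZeroWidthNbsp = u'\uFEFF';

// Encode `count` presses of the space key: every space except the last
// one is followed by a ZERO WIDTH NBSP, so no two spaces are adjacent.
std::u16string EncodeSpaceRun(std::size_t count) {
  std::u16string result;
  for (std::size_t i = 0; i < count; ++i) {
    result.push_back(kSpace);
    if (i + 1 < count) {
      result.push_back(kZeroWidthNbsp);
    }
  }
  return result;
}

// Convert to plain text: each (0x0020 0xFEFF) pair collapses to a single
// 0x0020. Any other character, including 0x00A0, is copied unchanged.
std::u16string DecodeSpacePairs(const std::u16string& input) {
  std::u16string result;
  for (std::size_t i = 0; i < input.size(); ++i) {
    const bool followsSpace = i > 0 && input[i - 1] == kSpace;
    if (input[i] == kZeroWidthNbsp && followsSpace) {
      continue;  // drop the marker half of the pair
    }
    result.push_back(input[i]);
  }
  return result;
}
```

With this scheme a real 0x00A0 typed by the user never needs to be touched, because only the 0xFEFF marker is removed on output.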

Comment 58 Akkana Peck 2005-02-01 09:59:14 PST
> Use the private Unicode area for Gecko and create an "EditorInternalNbsp".

I'd argue that the new character should be treated as a space, not as a
non-breaking space; the non-breaking aspect of nbsp isn't why we're using it
here, and if someone actually wants non-breaking behavior they'll use a real nbsp.

The important part would be modifying layout to do the right thing with the new
character.  If you could interest a layout person in this, the editor and
serializer portions probably wouldn't be that difficult.  

My guess is that the layout changes wouldn't be too difficult either, for
someone familiar with the code: the new character would be treated exactly like
normal spaces: no new behavior, just a new char which gets the old behavior.

> 0xFEFF (ZERO WIDTH NBSP)

Interesting idea, but ...the argument here is that nobody actually wants 0xFEFF
in output, while they do want 0xA0?  I don't think most people thought anybody
wanted 0xA0 when the current code was originally written, so I'd be leery of
solutions which assumed "Our character is important, but this other character
must not be ..."

I also worry that the new caret-moving code in gecko and the code for
inserting/deleting would be fairly tricky to account for sometimes skipping the
zwsp characters.  There would be a potential for a lot of new bugs there if it
wasn't tested very thoroughly.
Comment 59 David Baron :dbaron: ⌚️UTC-10 2005-02-01 13:09:37 PST
It seems like what we should really be doing is:
 * when we're editing plain text, store multiple presses of space as spaces
(this is a change)
 * when we're editing HTML, store multiple presses of space using non-breaking
spaces for all but the last press (tricky with deletion) (we probably do this
fine already)
 * when serializing HTML to text, convert runs of non-breaking spaces terminated
by a space to spaces
 * when using nsPlainTextSerializer to convert text to text (if we need to use
it at all, although we seem to now), don't mess with spaces

Does this make sense?
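
The third point (converting runs of non-breaking spaces terminated by a space) could look roughly like the following C++ sketch. It assumes a std::u16string buffer instead of the serializer's own string type, and the name ConvertNbspRuns is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Gecko code: converts a run of nbsp characters to
// spaces only when the run is immediately followed by a regular space.
std::u16string ConvertNbspRuns(const std::u16string& input) {
  const char16_t kNbsp = u'\u00A0';
  const char16_t kSpace = u' ';

  std::u16string output = input;
  std::size_t i = 0;
  while (i < input.size()) {
    if (input[i] != kNbsp) {
      ++i;
      continue;
    }
    // Find the end of this run of nbsp characters.
    std::size_t runEnd = i;
    while (runEnd < input.size() && input[runEnd] == kNbsp) {
      ++runEnd;
    }
    // Convert the run only if a regular space terminates it.
    if (runEnd < input.size() && input[runEnd] == kSpace) {
      for (std::size_t j = i; j < runEnd; ++j) {
        output[j] = kSpace;
      }
    }
    i = runEnd;
  }
  return output;
}
```

An nbsp between two non-space characters is never followed by a space, so it survives, while the nbsp-nbsp-space pattern the HTML editor produces for repeated presses of the space key comes out as plain spaces.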
Comment 60 Hadrien Nilsson 2005-02-02 06:41:51 PST
> It seems like what we should really be doing is:
>  * when we're editing plain text, store multiple presses of space as spaces
> (this is a change)

Yes, there is a need for a full text editor in the mail component. No need of
the NBSP-for-adjacent-spaces trick then. Fine for me.

>  * when we're editing HTML, store multiple presses of space using non-breaking
> spaces for all but the last press (tricky with deletion) (we probably do this
> fine already)

The editor actually does this, and this behaviour is ok when editing HTML. Fine
for me.

>  * when serializing HTML to text, convert runs of non-breaking spaces terminated
> by a space to spaces

When you are editing a plain text mail, currently the mail component seems to do
HTML editing internally, and converts the HTML to text. That's why the
conversion from NBSP to SP was needed. With a true plain text editor (as you
suggest in the first point), this conversion won't be needed anymore. 

The conversion from NBSP to SP would be kept at a higher level, for example
when the user starts writing a mail in HTML and then decides to switch to plain
text mode while still editing his message. He knows this may lead to some losses
or some conversion tricks.

Fine for me.

>  * when using nsPlainTextSerializer to convert text to text (if we need to use
> it at all, although we seem to now), don't mess with spaces

YES. In that case we can come back to the original patch which just removes the
conversion. Fine for me.
 
> Does this make sense?

I think you've got it right, David; your ideas look sensible imho.

So changes are :
- a real plain text edition mode for the editor ;
- a move of the NBSP to SP conversion code out of the nsPlainTextSerializer.

I'm clueless about the new location of the NBSP to SP conversion, but I guess
there is already a piece of code whose goal is to convert HTML to plain text ?
Comment 61 timeless 2005-02-02 08:31:01 PST
> So changes are :
> - a real plain text edition mode for the editor ;

this so isn't happening.
Comment 62 David Baron :dbaron: ⌚️UTC-10 2005-02-17 11:58:39 PST
Too late for 1.8b1; I'll try to do this for 1.8b2.
Comment 63 Asa Dotzler [:asa] 2005-04-17 11:17:33 PDT
dbaron, should we try to keep this in the b2 timeframe (next few days?) or move
it to 1.8b3?
Comment 64 Asa Dotzler [:asa] 2005-06-14 11:32:05 PDT
not a blocker. dbaron will keep this on his list in case he has time.
Comment 65 Timwi 2005-07-31 15:41:58 PDT
This is a duplicate of bug 195946.
Comment 66 David Baron :dbaron: ⌚️UTC-10 2005-07-31 17:38:09 PDT
*** Bug 195946 has been marked as a duplicate of this bug. ***
Comment 67 Simon Montagu :smontagu 2005-10-03 00:04:23 PDT
*** Bug 310877 has been marked as a duplicate of this bug. ***
Comment 68 Sergey «Mithgol the Webmaster» Sokoloff 2006-01-07 03:59:58 PST
Is there any chance of this being fixed in Fx 2.0?

(FYI, I've just run into this bug while editing Russian Wikipedia, ru.wikipedia.org... I've had to use lots of &nbsp; instead, in order to fix the Wikipedia article, and that in turn annoyed several further editors who thought that the HTML entity, when used a lot, makes the source code of a Wiki article nearly unreadable, and makes reading its diffs a real pain. They were obviously right. Now I'm starting to think that merely using Firefox on Wikipedia should sometimes be considered a severe case of vandalism, and the wikiuser's account should be banned from the system until he/she starts using some better designed browser.)
Comment 69 Denis Jacquerye 2006-08-03 14:51:03 PDT
This bug has been opened for 3 years now. 
What's stopping from fixing it?

This is the most disturbing 'feature'.
If users type nbsp, copy and paste it or type the Alt+num value it's because they want it.
Comment 70 Boris Zbarsky [:bz] (still a bit busy) 2006-08-03 15:32:45 PDT
> What's stopping from fixing it?

Lack of time.  Feel free to help out -- see comment 59 for what needs to be done.
Comment 72 David A. Madore 2006-08-05 06:18:37 PDT
I can understand that more time would be needed for an elaborate fix, but there is already a patch to fix this problem, attached to this bug, which has been reviewed and superreviewed: what is stopping it from getting committed?  Why are two years not sufficient to check in a patch which has already been approved?

This bug has such dramatic consequences everywhere (disastrous edits on Wikipedia en masse, for one thing) that I'm willing to stand on my knees and beg, or donate money, or whatever, but I don't have check-in permission so I can't commit the patch myself...
Comment 73 Boris Zbarsky [:bz] (still a bit busy) 2006-08-06 10:13:26 PDT
> what is stopping it from getting committed?

Just the need to merge it to trunk.  I'd be happy to commit if someone does that.  Note that nsIDocumentEncoder is an IDL file now, not a .h....
Comment 74 Boris Zbarsky [:bz] (still a bit busy) 2006-08-06 10:17:07 PDT
jpl24, would you mind updating that patch to tip?
Comment 75 jpl24 2006-08-06 12:07:01 PDT
Sure, no problem.
Comment 76 Akkana Peck 2006-08-06 12:17:18 PDT
The patch would still convert all nbsps, though, right?
So runs of multiple spaces typed by the user would generate 0x20 0xa0 0x20 0xa0 etc. with this patch even if it were updated to the trunk.

There were several suggestions to avoid that (use zwsp, don't convert nbsp when it's by itself but do convert it if it's interspersed with spaces) but the patch that was posted here doesn't attempt to address those.
Comment 77 David A. Madore 2006-08-06 12:59:24 PDT
[To answer Akkana Peck's objection.]  What I understand from the discussion so far and from the diff itself is that the patch we're talking about suppresses conversion of nbsp to space on plain text input forms, not on HTML composer.  Now on plain text input forms, no spurious nbsp's are generated by Mozilla hacks (contrary to composer), so there any nbsp found there is a genuinely input nbsp (either by user input or by initial form value).  Am I misunderstanding something?  Because if this is correct, there is really no reason not to apply this patch, it can't break anything.

Of course, I still think it's a bug to convert nbsp to space on composer text, but it's not nearly as bad as converting nbsp to space on plain text input forms which is what the patch (again, if I understand correctly) does.

Please correct me if I'm wrong.
Comment 78 Boris Zbarsky [:bz] (still a bit busy) 2006-08-06 14:33:01 PDT
I believe comment 77 is correct.
Comment 79 jpl24 2006-08-06 16:47:01 PDT
Created attachment 232457
patch_v2

This is an updated version of jst's patch. It fixes the test case in this bug, but does not change composer behavior.

Do we have any unit tests for the plaintext serializer?
Comment 80 Akkana Peck 2006-08-06 16:52:06 PDT
If the plaintext edit rules used for form fields don't insert the nbsp characters, then by all means let's get this fixed for forms (then the debate re. html composer can continue: there are valid arguments both ways). I thought that was the holdup, that both html and plaintext edit rules are inserting nbsps when the user types a run of spaces.

Is there a form test case somewhere that reports exactly what got submitted? The test case referenced in this bug's URL field doesn't do that, but it would be useful both for testing the patch and as a regression test afterward.
Comment 81 Akkana Peck 2006-08-06 16:55:16 PDT
> Do we have any unit tests for the plaintext serializer?

The DOM to Text conversion tests (in htmlparser/tests/outsinks, iirc) that run from Tinderbox test the html and plaintext serializer, and it would definitely be worth adding a case to those to cover this issue. They don't test form submission, though, which takes a slightly different path through the parser (but which might require a manual test).
Comment 82 Boris Zbarsky [:bz] (still a bit busy) 2006-08-06 20:20:20 PDT
> Do we have any unit tests for the plaintext serializer?

Not really.  Bug 333060 more or less covers that (initing the nsIDocumentEncoder with text/plain as the type should do it).  It should be possible to test a variety of output types and flags and such, including whatever flags form submission uses in this case.

Akkana is probably right that we should add a test to the outsinks if we can too, esp. since tinderbox actually runs those.
Comment 83 David Baron :dbaron: ⌚️UTC-10 2006-08-06 21:16:38 PDT
(In reply to comment #73)
> > what is stopping it from getting committed?
> 
> Just the need to merge it to trunk.

I thought comment 40 was, but maybe that was addressed and I don't see it.
Comment 84 Boris Zbarsky [:bz] (still a bit busy) 2006-08-06 22:08:49 PDT
I did verify that typing spaces in a text input and textarea generates spaces, not non-breaking spaces, when we hit nsPlainTextSerializer::Output.

Checked in that patch to trunk; marking fixed.  I've filed bug 347689 on implementing the proposal in comment 59.
Comment 85 Sergey «Mithgol the Webmaster» Sokoloff 2006-08-07 08:41:09 PDT
Would it be too imprudent to nominate the patch of this dataloss for 1.8.1 blockers? (I mean, somewhere about 2006-08-08, if it survives on the trunk with no regressions reported in 2 days.)
Comment 86 Boris Zbarsky [:bz] (still a bit busy) 2006-08-07 08:47:19 PDT
This patch can't land on 1.8 as-is -- it changes interfaces.
Comment 87 Robert Sayre 2006-10-26 13:33:14 PDT
RCS file: /cvsroot/mozilla/testing/mochitest/tests/test_bug218277.html,v
done
Checking in tests/test_bug218277.html;
/cvsroot/mozilla/testing/mochitest/tests/test_bug218277.html,v  <--  test_bug218277.html
initial revision: 1.1
done
Comment 88 Paul Borokhov (lensovet) 2006-11-25 11:34:55 PST
when will this become functional on other browsers, such as camino, ffx, etc?
Comment 89 Boris Zbarsky [:bz] (still a bit busy) 2006-11-26 10:09:33 PST
Whenever they update to Gecko 1.9.  So Firefox 3 and whatever Camino, etc versions will be using that version of Gecko.
Comment 90 Paul Borokhov (lensovet) 2006-11-26 10:26:42 PST
FYI, the fact that this fix was intentionally made to only affect TEXTAREAs and not all text fields has resulted in bug 213628 and bug 290565 remaining not fixed when they could have been resolved at the same time. can someone explain why this patch was intentionally castrated?
Comment 91 Boris Zbarsky [:bz] (still a bit busy) 2006-11-26 10:38:54 PST
> the fact that this fix was intentionally made to only affect TEXTAREAs and
> not all text fields

This patch affects <textarea>, <input type="text">, and <input type="password">.

> has resulted in bug 213628 and bug 290565 remaining not
> fixed when they could have been resolved at the same time

Bug 213628 should have been fixed by this patch.  Can you point to a testcase that demonstrates that it's not?

Bug 290565 would need to be fixed in totally different code.

> can someone explain

See comment 19 and comment 21.  And in general, please just read bugs carefully and in their entirety before commenting.  It'll help prevent the hundreds of emails you've now generated.
Comment 92 Sergey Svishchev 2007-02-11 02:51:23 PST
(In reply to comment #89)
> Whenever they update to Gecko 1.9.  So Firefox 3 and whatever Camino, etc
> versions will be using that version of Gecko.
> 
How can a browser using Gecko 1.9 be identified from its User-Agent string?
Comment 93 Boris Zbarsky [:bz] (still a bit busy) 2007-02-11 08:51:18 PST
It'll have "rv:1.9" in it.
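
As a rough illustration, such a check could be a plain substring test. The function name LooksLikeGecko19 is hypothetical, and this is only a sketch: it matches any revision string that begins with "1.9", and User-Agent strings can be changed by the user, so it should not be treated as a reliable test.

```cpp
#include <string>

// Hypothetical helper: naive substring test for the "rv:1.9" token.
// It also matches longer revisions that start with 1.9, such as rv:1.9.0.1.
bool LooksLikeGecko19(const std::string& userAgent) {
  return userAgent.find("rv:1.9") != std::string::npos;
}
```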
Comment 94 Jesse Ruderman 2007-04-02 00:49:17 PDT
*** Bug 375954 has been marked as a duplicate of this bug. ***
Comment 95 Simon Paquet [:sipaq] 2007-05-24 04:18:42 PDT
Comment on attachment 136358
This patch corrects the « nsPlainTextSerializer::Output » function

Since this bug has been resolved as FIXED, I'm removing the
review request.
Comment 96 Shift 2008-09-11 14:28:27 PDT
This bug still exists.

When I copy a non breaking space from Firefox, it is automatically converted to a normal space.

Please, reopen this case.

My u.a. : Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008090210 Mandriva/1.9.0.1-15mdv2009.0 (2009.0) Firefox/3.0.1
Comment 97 Benoit 2008-09-11 14:39:52 PDT
Cut'n'paste would be another bug, this one was about form submission. 

Feel free to open a new bug on cut'n'paste if it doesn't exist yet.
Comment 98 Bogdan Butnaru 2011-02-14 04:56:48 PST
As far as I can tell this still happens in 4.0b11. For example, NBSP characters typed in the <textarea>s used by GMail and Launchpad are converted to normal spaces. 

I suspect it also happens with the textarea I’m typing this in. On my system, the second paragraph of comment 96 (above) wraps between “converted to” and “a normal”. Below I type the exact same paragraph, but I’ll type NBSPs instead of  normal spaces after “automatically”:

When I copy a non breaking space from Firefox, it is automatically converted to a normal space.
Comment 99 Bogdan Butnaru 2011-02-14 04:59:50 PST
Note that while I was typing the above comment, the text wrapped correctly (everything from “automatically”, including that word, went on the next line before I clicked “save changes”); however the text above wraps between “to” and “a”.
Comment 100 Bogdan Butnaru 2011-02-14 05:05:16 PST
Hmm. Sorry for the many messages, it appears I’m wrong. Looking at the source for the above test, there is a “&nbsp;” where I typed NBSP characters. There’s apparently a different bug of Firefox that causes it to line-wrap at those entities; can anyone point me to the relevant bug, or to an explanation why that isn’t a bug?
Comment 101 Benoit 2011-02-14 05:11:26 PST
What you are experiencing seems to be a case of server-side conversion, about which Firefox can't do anything.

The testcase attached to this bug still returns "no bug" and a preview edit on a Wikipedia page shows that NBSPs are correctly parsed and submitted by Firefox 4 beta 11.

If this very sentence full of NBSPs is wrapping, you should probably file a bug on the Bugzilla product itself.
