Closed
Bug 218277
Opened 21 years ago
Closed 18 years ago
no-break spaces (nbsp) in submitted text are replaced by breakable spaces
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9alpha1
People
(Reporter: moz, Assigned: dbaron)
References
()
Details
(Keywords: dataloss, intl)
Attachments
(4 files, 1 obsolete file)
1.20 KB,
application/octet-stream
|
Details | |
1.06 KB,
patch
|
Details | Diff | Splinter Review | |
528 bytes,
text/html
|
Details | |
4.89 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Let's have a string containing 30 times the letter 'a', 5 no-break space
(character 0xA0 or 160 in iso-8859-1, iso-8859-15 or unicode) and a dot '.'
Note : you can input no-breack space on Windows with by letting the Alt key down
and type 0160 on the keypad
Input this string into a text input element in a form. Submit the form and
display the submitted string. The 5 no-break space have changed to 1 breakable
space (0x20 or 32). The 5 no-break spaces should have been kept.
You can test this on http://mess.genezys.net/NoBreak/
This is quite important when you submit french texts as this language uses a lot
of no-break spaces (for instance before those punctuation marks --> ? ! : ;)
Reproducible: Always
Steps to Reproduce:
1.input a string with many no-break space in a text input in a form
2.submit the data
3.display the data received
Actual Results:
One single no-break space (0xA0) are replaced by one breakable space (0x20)
Continuous multiple no-break spaces are replaced by one breakable space
Expected Results:
Mozilla should keep the no-break spaces just as they appear in the original
input string.
Reporter | ||
Comment 1•21 years ago
|
||
Will need PHP version 4 or better
Updated•21 years ago
|
Whiteboard: DUPEME
Comment 2•21 years ago
|
||
*** Bug 219774 has been marked as a duplicate of this bug. ***
Comment 3•21 years ago
|
||
I would like to point that Mozilla just *transforms* 0xA0 bytes (non-breakable
space) into 0x20 bytes (breakable space). Why ?
HTTP posting should just send bytes through the network, and not analyse them.
As you can see with the test page attached, Internet Explorer does not make that
transformation.
Reporter | ||
Comment 4•21 years ago
|
||
I discovered I was wrong about one point :
« The 5 no-break space have changed to 1 breakable space (0x20 or 32) »
Instead, each no-break space (0xA0) is replaced by one breakable space (0x20).
For instance, « a b », encoded as :
0x0061, 0x00A0, 0x00A0, 0x00A0, 0x0062
will be converted to :
0x0061, 0x0020, 0x0020, 0x0020, 0x0062
I downloaded and compiled the source code and successfully reproduced the bug. I
will try to find the origin of that bug.
Reporter | ||
Comment 5•21 years ago
|
||
I found the problem. The scary thing is that... it is done on purpose ! Here are
more details about the bug :
(btw, I used the official source archive : MozillaFirebird-source-0.7.tar.gz)
Here is the function call stack I had when debugging (the most recent call is
written first, the parent calls follow) :
nsTextControlFrame::GetText(nsTextControlFrame * const 0x03a03028, nsString *
0x0012f00c {""}) line 2758
nsTextControlFrame::GetValue(nsTextControlFrame * const 0x03a030b4, nsAString &
{...}, int 0x00000001) line 2949 + 54 bytes
nsPlaintextEditor::OutputToString(nsPlaintextEditor * const 0x03a81320, const
nsAString & {...}, unsigned int 0x00000418, nsAString & {...}) line 1424 + 39 bytes
nsDocumentEncoder::EncodeToString(nsDocumentEncoder * const 0x0392a438,
nsAString & {...}) line 935 + 39 bytes
nsPlainTextSerializer::Flush(nsPlainTextSerializer * const 0x0392ac08, nsAString
& {...}) line 453
nsPlainTextSerializer::FlushLine() line 1273
nsPlainTextSerializer::Output(nsString & {"u v"}) line 1296
-----------------------------------------------------------------------------------
So here we are, the problem is in
void nsPlainTextSerializer::Output(nsString& aString)
In file :
/mozilla/content/base/src/nsPlainTextSerializer.cpp line 1279
Here is the function :
/**
* Prints the text to output to our current output device (the string
mOutputString).
* The only logic here is to replace non breaking spaces with a normal space since
* most (all?) receivers of the result won't understand the nbsp and even be
* confused by it.
*/
void
nsPlainTextSerializer::Output(nsString& aString)
{
if (!aString.IsEmpty()) {
mStartedOutput = PR_TRUE;
}
// First, replace all nbsp characters with spaces,
// which the unicode encoder won't do for us.
static PRUnichar nbsp = 160;
static PRUnichar space = ' ';
aString.ReplaceChar(nbsp, space);
mOutputString->Append(aString);
}
--------------------------------------------------------------------
So here you see the special replacement. This is bad to do such a thing.
Those three code lines should be removed :
static PRUnichar nbsp = 160;
static PRUnichar space = ' ';
aString.ReplaceChar(nbsp, space);
I removed them in my personnal build and rebuild the browser. I tried the page
on http://mess.genezys.net/NoBreak and everything worked lovely.
Internet Explorer and Opera behave correctly. They do not replace 0x00A0 (160)
chars with 0x0020 (32) chars. The same should be done for Mozilla.
--------------------------------------------------------------------------
Now that the bug was found, who could fix it in the CVS tree ?
Reporter | ||
Comment 6•21 years ago
|
||
D'oh ! I forgot I made reorganisation. Please read the trace stack with the
parent calls first instead of « (the most recent call is written first, the
parent calls follow) : »
Reporter | ||
Comment 7•21 years ago
|
||
Reporter | ||
Updated•21 years ago
|
Flags: blocking1.6b?
Comment 8•21 years ago
|
||
Bug 195946 may be resolved too with this patch... Need some tests.
Reporter | ||
Comment 9•21 years ago
|
||
> Vincent Robert :
> Bug 195946 may be resolved too with this patch... Need some tests.
I did some tests, the patch also corrects bug 195946 and bug 213628
Comment 10•21 years ago
|
||
We changed this behavior over two years ago, see bug 62189, thinking we were
doing the right thing.
Comment from nsPlainTextSerializer.cpp where this happens: "Prints the text to
output to our current output device (the string mOutputString).The only logic
here is to replace non breaking spaces with a normal space sincemost (all?)
receivers of the result won't understand the nbsp and even be confused by it."
If we're going to change this, it probably needs to happen early in an alpha or
beta cycle so we can have some time to see if we break some applications that
might expect Mozilla's current behavior.
Flags: blocking1.6b? → blocking1.6b-
Comment 11•21 years ago
|
||
Asa, from what I can tell, the nsPlainTextSerializer.cpp was checked in by Vidur
with this nbsp->sp conversion directly from nsHTMLToTXTSinkStream. bug 62189
cleaned up a bunch of stuff and added comments to nsPlainTextSerializer::Output
but didn't change the already existing nbsp->sp conversion.
I see no reason to be doing this conversion and no justification found in the
comments. Perhaps Daniel Glazman and others can take a look at this and comment.
I agree that this is something for alpha and not beta.
Reporter | ||
Comment 12•21 years ago
|
||
As suggested, I set a review request flag for 1.7a.
Flags: blocking1.7a?
Comment 13•21 years ago
|
||
Comment on attachment 136358 [details] [diff] [review]
This patch corrects the « nsPlainTextSerializer::Output » function
Requesting r from Daniel per comment 11.
Attachment #136358 -
Flags: review?(daniel)
I can't think of any reason why this patch is wrong but I would like a few other
opinions before I give this r=.
Akkana, Joe : we need your help here. Do you think the proposed change is safe?
Comment 15•21 years ago
|
||
Well, someone should enumerate the common cases where the plaintext serializer
is used, and think abuo the results. Example: you compose mail in composer in
the html view, and then choose the "send as plaintext" option when you get a
warning about one of your recipients not savvy to html. Are there relevant
RFC's on plaintext mail that address this? Are there internationalization
issues with 0xA0?
My experience doens't help much here. I know more about which situations demand
nbsp's in our internal representation of the data while editing is in progress,
rather than the various consumers of output from the plaintext serializer.
If I understand this bug correctly, though, one side effect should be that
copy/paste of nbsp's within a plaintext editor should transform them to spaces?
So if you use nbsp's in soem ascii art to prevent a line from wrapping, and
then copy/paste that line in a plaintext editting session, it will suddenly wrap
(if long enough), ya?
Comment 16•21 years ago
|
||
time to make the call in this for 1.7a. how much, and what kind of testing has
been done with the patch?
Assignee | ||
Comment 17•21 years ago
|
||
Someone should figure out what codepaths this is used for and make sure this
doesn't happen where we don't want it. Furthermore, I don't see any reason this
should block the release.
Flags: blocking1.7a? → blocking1.7a-
Reporter | ||
Comment 18•21 years ago
|
||
Here is some piece of information I could gather :
---------------------------------------------------------------
The function to change is :
protected nsPlainTextSerializer::Output(nsString& aString)
called by:
protected nsPlainTextSerializer::FlushLine()
protected nsPlainTextSerializer::EndLine(PRBool aSoftlinebreak)
protected nsPlainTextSerializer::OutputQuotesAndIndent(PRBool
stripTrailingSpaces /* = PR_FALSE */)
protected nsPlainTextSerializer::Write(const nsAString& aString)
class nsPlainTextSerializer used in
NS_NewPlainTextSerializer(nsIContentSerializer** aSerializer)
NS_NewPlainTextSerializer(nsIContentSerializer** aSerializer) used in
\layout\build\nsLayoutModule.cpp(466):MAKE_CTOR(CreatePlainTextSerializer,
nsIContentSerializer, NS_NewPlainTextSerializer)
CreatePlainTextSerializer found twice in
\layout\build\nsLayoutModule.cpp in the array :
static const nsModuleComponentInfo gComponents[]
{ "plaintext content serializer",
NS_PLAINTEXTSERIALIZER_CID,
NS_CONTENTSERIALIZER_CONTRACTID_PREFIX "text/plain",
CreatePlainTextSerializer },
{ "plaintext sink",
NS_PLAINTEXTSERIALIZER_CID,
NS_PLAINTEXTSINK_CONTRACTID,
CreatePlainTextSerializer },
---------------------------------------------------------------
Now I don't have a good global vision of the project so I can't say what
gComponents is used for. I hope this could help anyway.
From a user point of view I didn't see any side effect with my personnal Mozilla
version of Mozilla.
From a logical point of view I just think like Bob Clary : I see no reason to be
doing this conversion.
Did someone else try the patch with his own build ?
Assignee | ||
Comment 19•21 years ago
|
||
I can see a very good reason to be doing this conversion. When using the HTML
editor, hitting the space bar multiple times turns spaces into . Mail
composed using the HTML editor is often sent as text, and if this is the code
used to convert that mail to text, those nbsp characters need to be converted to
spaces.
Reporter | ||
Comment 20•21 years ago
|
||
Comment #19: "those nbsp characters need to be converted to spaces"
This depends on the charset, and does not only concern no-break spaces.
When I'm using the iso-8859-1 charset, I have the right to use the 0xA0
character with this charset (and others which know about 0xA0, like
windows-1252, iso-8859-15, unicode flavours,...), the 0xA0 is a perfeclty valid
character, like é, è, à, ç...
Conversion from characters written in the html editor and not included in the
charset selected by the user is another issue.
Btw, I found that Thunderbird/Mozilla mail components fails to manage 0xA0
characters. During edition when I insert some 0xA0 characters they also get
converted into 0x20, this is very annoying as my default mail encoding is
iso-8859-15. Other iso-8859-15 characters are kept. So this is just a matter of
consistency. Why some valid characters should be changed, and some others kept ?
At best the patch will also correct the mail component. At worst it will do
nothing, and then another bug file should be open about this 0xA0 conversion.
Assignee | ||
Comment 21•21 years ago
|
||
My point is that when the user types <SPACE> <SPACE>, we should not send
non-breaking space characters in plaintext messages. Does the patch break that?
Reporter | ||
Comment 22•21 years ago
|
||
Comment #21: there is a side effect of the patch in the mail/news component.
Here are the results of my tests.
A) I composed mails with Thunderbird 0.5 in html mode (my default encoding is
iso-8859-15). The sending format is set to « Auto-Detect ».
1) I only used ascii characters, and started a line with two spaces.
result: the mail was sent as :
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
info: 7bit, so I only got 0x20 spaces.
user expectation: OK. With plain text renderer, spaces are not merged together,
so having 0x20 spaces is fine for separation, moreover I did type 0x20 characters.
2) I used ascii letters but inserted manually several 0xA0 characters :
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 7bit
info: 7bit, this means my 0xA0 characters were converted to 0x20 spaces.
user expectation: Not OK. The 0xA0 I inserted manually should be kept, and so
the Content-Transfer-Encoding should be 8bit.
3) I used characters with accent, and started a line with two spaces.
result:
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit
info: In an hexa viewer I don't see any 0xA0 spaces.
user expectation: OK. The Content-Transfer-Encoding is 8bit to handle the
characters with accent. No 0xA0 characters were inserted manually so none should
be appear.
4) Same test than A3) but I forced the sending in html
result:
Content-Type: text/html; charset=ISO-8859-15
Content-Transfer-Encoding: 8bit
info: the line which starts with two spaces is encoded as : 0xA0 0x20
user expectation: the rendering is OK. The insertion of 0xA0 in order to do a
margin ? Why not.
B) I composed mails with Mozilla 1.5 with the patch applied, in html mode (my
default encoding is iso-8859-1). The sending format is set to « Auto-Detect ».
1) Same test than A1)
result:
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 8bit
info: this time, the 0xA0 automatically inserted was kept, the
Content-Transfer-Encoding has been set to 8bits instead of 7bit.
user expectation: Not OK. I never inserted any 0xA0 characters manually. I just
inserted regular spaces. So I should only get 0x20 spaces.
2) Same test than A2) (manually inserting 0xA0 characters)
result:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
info: same as A2).
user expectation: Not OK. Same as A2).
3) Same test than B1) but I forced the format to text mode.
info: same as B1).
user expectation: I expected to avoid the 0xA0 composer behaviour, but same as B1)
-------------------------------
In all cases the mail data itself is perfectly valid, but I do not always get
expected characters :
With the patch, I get unexpected 0xA0 characters instead of 0x20 ones.
In both cases, I get unexpected 0x20 characters instead of 0xA0 ones.
The problems comes from the fact the composer does not make any difference
between a 0xA0 inserted manually by the user, which should be always kept, and
its own automatically generated 0xA0 in order to create a margin, which could be
kept in html mode, but should be removed in text mode.
Daniel may have better ideas than mines to deal with this issue, but here you
are anyway :
- have a real text mode for the mail composer. In that case the automatic 0xA0
insertion would not be activated ;
- use another technic for margins ;
- make a difference in the composer between user's 0xA0, and automatic 0xA0. For
example :
Four spaces<span class="moznbsp"> </span>then three spaces<span
class="moznbsp"> </span>done.
Automatic 0xA0 would be enclosed with a span in order to make the difference.
- use the entity " " for automatic 0xA0, and reals 0xA0 for user's 0xA0.
When sending in text/plain mode, &nbps; would be converted to 0x20 and real 0xA0
would be kept.
- store somewhere in memory where are the automatic 0xA0 located
-------------------------------
A temporary-not-perfect solution would be to keep the 0xA0 --> 0x20 conversion
but at the mail component level, not in the nsPlainTextSerializer class itself,
so at least we would relieve the browser of this bug (and then open a new bug
about the mail component or about the composer).
Reporter | ||
Comment 23•21 years ago
|
||
Today I've been inspecting the code in order to to implement the
"temporary-not-perfect solution" but then I thought about a nice improvement in
the way the composer could handle spaces.
In Comment #22 I proposed to enclose 0xA0 with a special span element :
Four spaces<span class="moznbsp"> </span>done.
I didn't really like the fact of using 0xA0 characters as the user just inserted
normal 0x20 characters. But I discovered those 0x20 characters can be kept given
the CSS class :
.moznbsp { white-space: pre; }
This works just fine.
----------------------------------
Small note :
Well, if you want to be stricter the CSS 1 recommandation says that white-space
applies to: block-level elements, so maybe it should be changed to :
Four spaces<div class="moznbsp"> </div>done.
.moznbsp { white-space: pre; display: inline; }
but with the span it has been working so far in many browsers.
----------------------------------
Summary :
hitting once the space key in the composer will insert a 0x20 character.
hitting once again the space key in the composer will insert another 0x20, and
enclose both of them into : <span class="moznbsp"> </span>
hitting again the space key will insert 0x20 characters into the span element :
<span class="moznbsp"> </span>
0xA0 inserted by the user will be managed just like another normal character.
I'm going to look around in the source code to find where the automatic 0xA0
insertion mechanism is made and think about how to implement such a solution
(but maybe somebody else knows more about this routine and could do the change
more easily).
Comment 24•21 years ago
|
||
Note that bug 213628 is dataloss and this potentially could be.
Comment 25•20 years ago
|
||
This bug is major on french Wikipédia (http://fr.wikipedia.org/) because in
french non-breaking spaces have to be used before colon, exclamation/question
mark...
Many articles correctly written by IE users using nbsp are altered by editors
using Mozilla/Firefox user agent without even being aware of this bug.
This bug should be at least written in in the Release Notes of the next version,
or better, fixed.
This is an HTML/XHTML bug as a readonly control will always have its "current
value" (see http://www.w3.org/TR/html401/interact/forms.html#current-value )
different from the "initial value" if the ''initial value'' contains nbsp (which
is not the definition of "readonly"). (and the bug is not limited to readonly
controls).
Severity should be changed to major. Keywords 'conversion' and 'xhtml' should be
added.
Comment 26•20 years ago
|
||
This is a testcase that can be used locally (without involving form submission
over HTTP).
"bug!" is shown in the page if your user agent has the bug.
Comment 27•20 years ago
|
||
This bug is still NEW, and severity normal...
But I just broke my whole wiki content because of it !
I was transferring my database content using phpMyAdmin, and my content did
contain the 'à' character which is encoded as 'Ã[nbsp]' in UTF-8.
But when submitting this using phpMyAdmin, 'Ã[nbsp]' are simply transformed in
'Ã[sp]', and my char is lost as it is not a valid UTF-8 character.
Fortunately, I have IE to upload my text again but I am wondering how would
someone under Linux have repaired the problem.
Comment 28•20 years ago
|
||
Cc'ing jst in case he has any insights.
/be
Comment 29•20 years ago
|
||
One possibility here is to stop calling that code only when serializing the
value of form controls. I'll attach a patch...
Comment 30•20 years ago
|
||
Updated•20 years ago
|
Attachment #162609 -
Flags: superreview?(dbaron)
Attachment #162609 -
Flags: review?(bryner)
Reporter | ||
Comment 31•20 years ago
|
||
Great! It looks like the temporary-not-perfect I though of. It seems sensible
now, as this bug is about the browser only.
I may open a new bug for the mail component.
Comment 32•20 years ago
|
||
Sorry for the spam but...
Pleaaaaase. Check this patch in for 1.0 final.
Comment 33•20 years ago
|
||
We're way too close to the release of 1.0 to consider taking this kind of
changes. If you strongly disagree you can set the blocking-aviary1.0 flag to ?
and make a case for this, but I doubt it'll go through :(
Comment 34•20 years ago
|
||
You may be right, but the thing is that this software is opensource (aka community based), and now
that the community has found a solution to a very annoying problem, there seem to have no way to
help solving this problem :-(
I think there is something else we could do to help instead of just waiting for a guru to check in this
patch, but I don't know what... Testing ? Voting ? What can be done ?
Comment 35•20 years ago
|
||
Increasing severity and requesting block status due to dependent bugs and the
existence of a patch.
Comment 36•20 years ago
|
||
(In reply to comment #34)
> You may be right, but the thing is that this software is opensource (aka
community based), and now
> that the community has found a solution to a very annoying problem, there seem
to have no way to
> help solving this problem :-(
Yeah, you're right, it's open source n' all, and the community is what matters
here. What we, the community, everyone involved, is dealing with here though is
balancing between risk of this patch causing problems and delaying the release
(or even worse, causing embarasing problems that we don't notice before the
release goes out the door) and the benefit of shipping with this fix. IOW, is
this problem worse to the community as a whole than the Firefox 1.0 release
being a failure due to a problem that could be caused by this fix?
With all the amazing effort being put into this release, in the actual product,
and at spreadfirefox.com et al, I *really* don't want to screw this up. So
unless there's a *huge* benefit to a *lot* of users in taking any given fix, I
wouldn't want to take the fix at this point...
Comment 37•20 years ago
|
||
Well I don't think we have a keyword for that, but FWIW I can confirm this bug
is indeed important for French-speaking users, and for French l10n in general.
Comment 38•20 years ago
|
||
Is 251404 a duplicate of this bug?
Comment 39•20 years ago
|
||
not this late in the game. sorry. not a blocker.
Flags: blocking1.7a-
Flags: blocking1.6b-
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0-
Updated•20 years ago
|
Attachment #162609 -
Flags: review?(bryner) → review+
Assignee | ||
Comment 40•20 years ago
|
||
Comment on attachment 162609 [details] [diff] [review]
Persist nbsp characters only when serializing the value of form controls.
>+ // Normally &npsp; is replaced with a space character when
, not &npsp;
sr=dbaron ***if*** you're sure that the editor never generates non-breaking
spaces when the user presses the space bar in text inputs (e.g., by pressing
space multiple times).
Attachment #162609 -
Flags: superreview?(dbaron) → superreview+
Comment 41•20 years ago
|
||
Most probably a duplicate of #194498 (or probably the reverse since this one has
a patch with r/sr and dependencies)
Comment 42•20 years ago
|
||
(In reply to comment #41)
> Most probably a duplicate of #194498 (or probably the reverse since this one has
> a patch with r/sr and dependencies)
But this one is newer so it is rather the duplicate.
Comment 43•20 years ago
|
||
*** Bug 251404 has been marked as a duplicate of this bug. ***
Comment 44•20 years ago
|
||
Requesting for 1.8b since there is a patch which could be checked in now (and
hoping that it would make it in Firefox 1.1 as well)
Flags: blocking1.8b?
Comment 45•20 years ago
|
||
Adding intl keyword since this affects particularly french-speaking users who
(should) have to use nbsp extensively (see comment #27, comment #29 and comment #41)
Keywords: intl
Sorry but I don't understand why this is a problem in french, and I am not sure
why we need this fix. And I am a native french speaker who has the book "Règles
Typographiques en vigueur à l'Imprimerie Nationale" on his work desk, who uses
french-speaking wikis.
Typographical rules in french allow only one non-breaking space before a
semi-colon, a question mark or an exclamation point. I've never seen a prose
with more than one unless it's in preformatted style, like in code for instance.
In that case - for a wiki - a pre element is the solution, isn't it?
Furthermore, David Baron is perfectly right: that code is needed for email...
The editor can't preserve multiple spaces if we don't turn some of them into .
Again, I am not convinced we need this fix, I can't see the plusses, and we have
one possible side-effect on email.
Comment 47•20 years ago
|
||
(In reply to comment #46)
> Sorry but I don't understand why this is a problem in french,
This is not especially a problem in French, but since French uses more
non-breaking spaces than other languages, this is a problem dear to many French
Mozilla users.
> and I am not sure why we need this fix.
We need to fix this for two reason: a Mozilla user cannot input a non-breaking
space in a web form; and most importantly, Mozilla destroys the work of other
users of web forms by removing the non-breaking spaces they might have put in
the form. Look at it this way: you edit a document, you save it without touching
anything, yet the text has been modified and you cannot do anything about it.
> Typographical rules in french allow only one non-breaking space before a
> semi-colon, a question mark or an exclamation point.
You forgot a few cases.
> I've never seen a prose with more than one unless it's in preformatted style,
> like in code for instance.
The problem is not with using several non-breaking spaces. The problem is using
even a single non-breaking space. You cannot do that.
> Furthermore, David Baron is perfectly right: that code is needed for email...
That code is *currently* needed for email, and that's a purely terrible design.
Unfortunately the code is deeply embedded at the core of Mozilla, so it cannot
be changed easily.
> The editor can't preserve multiple spaces if we don't turn some of them into
> .
Oh yes it can, just turn some of them into something else than a non-breaking
space! Use a character in the Unicode private area, for example.
Comment 48•20 years ago
|
||
If this problem is not fixed here, then bug 213628 absolutely must be fixed
since in that case _Mozilla doesn't work_.
Comment 49•20 years ago
|
||
Bug #194498 comment 22 explains why it is so tricky and exposes some possible
solutions based on charset detection, or by using another character internally
for editor-generated spaces (using it for user-entered data would probably work
too, although it may cause problems when copy/pasting from a text control to
another application). In any case, I'm confident that a solution will eventually
be found.
My two cents on this: in my understanding, 'nbsp's created by the editor
internally are never "alone", they are either preceded or followed by other nbsp
characters, line returns or 'classic' spaces.
So, instead of preserving any nbsp found in a text control like attachment
162609 [details] [diff] [review] does, let's try preserving only nbsps found "alone" in a form control
(input text, input file or textarea). That is, when they are surrounded by
letters (any language, not only latin ones), numbers or punctuation signs. It
would fix 99% of the cases, including the upload problem described in bug
#213628 (I can't possibly imagine the use for a file name with multiple nbsps in
a row. If it exists it is probably a virus trying to hide a file extension anyway).
The only problem I think of would be with so-called ASCII art and the like,
which is not so important IMHO, and for which users should rightfully use "pre"
or the associated command in their wiki syntax.
Is that a sensible approach? I've been thinking of this all day long, but I may
not have understood all the ins and outs. I don't know for example how it would
affect performance.
Comment 50•20 years ago
|
||
This bug is not about mail or multiple nbsp's. This bug is about data loss in
form serialization in the browser. Characters provided in a form are not the
characters received by the server, and that is not a correct behaviour.
It has nothing to do with mails, and the last patch provide a way to keep the
current mail behavior while making the browser behaviour correct.
Comment 51•20 years ago
|
||
(In reply to comment #50)
> This bug is not about mail or multiple nbsp's. This bug is about data loss in
> form serialization in the browser. Characters provided in a form are not the
> characters received by the server, and that is not a correct behaviour.
The relation with mail is that the same code (editor) is used for both, and that
it generates "fake" nbsp characters for internal use when you type multiple
spaces in a row (see comment #40 and bug #194498 comment 12). That's why they
are removed at submission time, because in most cases they are in fact generated
by the editor and thus not desired.
But that code is just being a little over-zealous, since it also removes nbsp
characters entered manually by the user, by the file selector (upload case), by
copy-paste from a Word processor (french quote marks and punctuation) or
existing nbsp characters that were pre-filled in the form input from a remote
database (wikipedia case).
Reporter | ||
Comment 52•20 years ago
|
||
The patch unactivates the 0x00A0 removal for Web forms only. So at least the
browser component would be kept bug-free, and many related bugs could be closed.
As I stated in comment #22 and comment #23, the composer uses an ugly method to
keep adjacent spaces. In an ideal world, this method would be replaced by
another one which does not use the 0x00A0 character hack. Therefore, the 0x00A0
to 0x0020 conversion would vanish.
Unfortunately this is quite a big work, so the current patch is a very good and
safe temporary solution (thanks Johnny).
(For Daniel, here is a reminder from the « Lexiques des règles typographiques en
usage à l'Imprimerie Nationale (2002) » :
http://psydk.org/gecko-bugs/french-spaces.jpg )
Comment 53•20 years ago
|
||
Did anyone ever answer dbaron's question in comment 40?
Assignee | ||
Comment 54•20 years ago
|
||
*** Bug 194498 has been marked as a duplicate of this bug. ***
Comment 55•20 years ago
|
||
(In reply to comment #53)
> Did anyone ever answer dbaron's question in comment 40?
I think it was Daniel's point in comment #46. Here is what he said on IRC:
[09:53] <Benoit-> I see, so the result would be that with the patch applied it
would *generate* nbsps when someone uses multiple spaces in a text form input?
[09:54] <glazou> right
In my understanding, that means the latest patch is not good enough but it could
be tweaked a bit to do what I describe in comment #49 (conserve only _single_
nbsps).
Here is what it does in nsPlainTextSerializer.cpp:
+ if (!(mFlags & nsIDocumentEncoder::OutputPersistNBSP)) {
+ // First, replace all nbsp characters with spaces,
+ // which the unicode encoder won't do for us.
+ static PRUnichar nbsp = 160;
+ static PRUnichar space = ' ';
+ aString.ReplaceChar(nbsp, space);
+ }
To that could be added something like (that's really pseudo-code, I don't know
how you'd do char concatenation here but you get the picture)
+ else {
+ // nbsp characters which are surrounded by spaces or line breaks
+ // are probably generated by the editor and not meant by the user.
+ // We are converting them anyway.
+ static PRUnichar nbsp = 160;
+ static PRUnichar space = ' ';
+ static PRUnichar linebreak = '\n';
+
+ // multiple nbsp characters
+ aString.ReplaceSubstring(nbsp+nbsp, space+space);
+
+ // nbsp characters following or preceding a space
+ aString.ReplaceSubstring(nbsp+space, space+space);
+ aString.ReplaceSubstring(space+nbsp, space+space);
+
+ // nbsp characters beginning or ending a line (the latter could be trimmed)
+ aString.ReplaceSubstring(nbsp+linebreak, space+linebreak);
+ aString.ReplaceSubstring(linebreak+nbsp, linebreak+space);
+ }
That "else" would never be reached when not in a form control (e.g. in mail). It
would take care of all edge case I can think of, some of them are probably not
necessary at all. I know it's a very dirty hack, I'd at least have done that
using only one regular expression but I don't think we can do that in c++ can we?
(Just trying to be helpful here, sorry if you think I'm being overly naive and
consider this as bugspam, that's not in my intention - I'd try this at home, see
if it builds and send a proper patch if I was not in a time of exams)
Comment 56•20 years ago
|
||
The last patch does NOT affect mail or composer. Comment #30 says that it
applies the nbsp-to-sp convertion to everything but web forms.
In Comment #31, Hadrien says that he will file a new bug for Mail. I think he is
refering to the current wrong behaviour of Mail described in Comment #22. This
wrong behaviour is not corrected by the last patch because it does only affect
web forms.
Just trying to clarify some thoughts :)
Reporter | ||
Comment 57•20 years ago
|
||
I got two ideas to keep adjacent spaces in the editor without the NBSP method.
1) First idea
Use the private Unicode area for Gecko and create an "EditorInternalNbsp". Some
value carefully chosen in the range 0xE000..0xF8FF. The editor would insert this
new character instead of 0x00A0.
Pros :
- minor changes to the editor.
Cons :
- changes needed in the html renderer ;
- non portable html files.
2) Second idea
Instead of using 0x00A0 (NBSP) sequences, a pair of 0x0020 (SPACE) and 0xFEFF
(ZERO WIDTH NBSP) could be used.
Here you are a scenario after pressing the space key 3 times in the editor :
Currently : 0x00A0 0x00A0 0x0020
Suggestion : 0x0020 0xFEFF 0x0020 0xFEFF 0x0020
When converting from html to text, a (0x0020 0xFEFF) pair would change to a
single 0x0020 character.
Pros :
- no change in the html render, adjacent spaces are kept with this method ;
- 0x00A0 are always kept and becomes a normal char ;
- portable html files, meaning the content can be sent as-is.
Cons :
- in the editor, the user as to type twice on the arrow key to move the caret ;
- when copying & pasting text, the 0xFEFF values are kept, and may be rendered
strangely in the destination software (for example, Notepad renders the 0xFEFF
as dots).
In the last two items, the editor would be aware of (0x0020 0xFEFF) sequences
and could move of one space only (I think it would be the same kind of code that
when managing surrogate pairs). About copy & pasting, the html to text converter
could be aware of the kind of sequence too, and convert them to single 0x0020
characters (same idea than converting html representation to plain text
representation, like <li> to "-").
Comment 58•20 years ago
|
||
> Use the private Unicode area for Gecko and create an "EditorInternalNbsp".
I'd argue that the new character should be treated as a space, not as a
non-breaking space; the non-breaking aspect of nbsp isn't why we're using it
here, and if someone actually wants non-breaking behavior they'll use a real nbsp.
The important part would be modifying layout to do the right thing with the new
character. If you could interest a layout person in this, the editor and
serializer portions probably wouldn't be that difficult.
My guess is that the layout changes wouldn't be too difficult either, for
someone familiar with the code: the new character would be treated exactly like
normal spaces: no new behavior, just a new char which gets the old behavior.
> 0xFEFF (ZERO WIDTH NBSP)
Interesting idea, but ...the argument here is that nobody actually wants 0xFEFF
in output, while they do want 0xA0? I don't think most people thought anybody
wanted 0xA0 when the current code was originally written, so I'd be leery of
solutions which assumed "Our character is important, but this other character
must not be ..."
I also worry that the new caret-moving code in gecko and the code for
inserting/deleting would be fairly tricky to account for sometimes skipping the
zwsp characters. There would be a potential for a lot of new bugs there if it
wasn't tested very thoroughly.
Assignee | ||
Comment 59•20 years ago
|
||
It seems like what we should really be doing is:
* when we're editing plain text, store multiple presses of space as spaces
(this is a change)
* when we're editing HTML, store multiple presses of space using non-breaking
spaces for all but the last press (tricky with deletion) (we probably do this
fine already)
* when serializing HTML to text, convert runs of non-breaking spaces terminated
by a space to spaces
* when using nsPlainTextSerializer to convert text to text (if we need to use
it at all, although we seem to now), don't mess with spaces
Does this make sense?
Reporter | ||
Comment 60•20 years ago
|
||
> It seems like what we should really be doing is:
> * when we're editing plain text, store multiple presses of space as spaces
> (this is a change)
Yes, there is a need for a full text editor in the mail component. No need of
the NBSP-for-adjacent-spaces trick then. Fine for me.
> * when we're editing HTML, store multiple presses of space using non-breaking
> spaces for all but the last press (tricky with deletion) (we probably do this
> fine already)
The editor actually does this, and this behaviour is ok when editing HTML. Fine
for me.
> * when serializing HTML to text, convert runs of non-breaking spaces terminated
> by a space to spaces
When you are editing a plain text mail, currently the mail component seems to do
HTML editing internally, and converts the HTML to text. That's why the
conversion from NBSP to SP was needed. With a true plain text editor (as you
suggest in the first point), this conversion won't be needed anymore.
The conversion from NBSP to SP would be kept at an higher level, for example
when the user starts writing a mail in HTML and then decides to switch to plain
text mode while still editing his message. He knows this may lead to some losses
or some conversion tricks.
Fine for me.
> * when using nsPlainTextSerializer to convert text to text (if we need to use
> it at all, although we seem to now), don't mess with spaces
YES. In that case we can come back to the original patch which just removes the
conversion. Fine for me.
> Does this make sense?
I think you get it right David, your ideas look sensible imho.
So changes are :
- a real plain text edition mode for the editor ;
- a move of the NBSP to SP conversion code out of the nsPlainTextSerializer.
I'm clueless about the new location of the NBSP to SP conversion, but I guess
there is already a piece of code which goal is to convert HTML to plain text ?
Comment 61•20 years ago
|
||
> So changes are :
> - a real plain text edition mode for the editor ;
this so isn't happening.
Assignee | ||
Comment 62•20 years ago
|
||
Too late for 1.8b1; I'll try to do this for 1.8b2.
Assignee: form-submission → dbaron
Flags: blocking1.8b?
Flags: blocking1.8b2+
Flags: blocking1.8b-
Comment 63•20 years ago
|
||
dbaron, should we try to keep this in the b2 timeframe (next few days?) or move
it to 1.8b3?
Updated•20 years ago
|
Flags: blocking1.8b2+ → blocking1.8b3+
Assignee | ||
Updated•19 years ago
|
Target Milestone: --- → mozilla1.8beta3
Comment 64•19 years ago
|
||
not a blocker. dbaron will keep this on his list in case he has time.
Flags: blocking1.8b3+ → blocking1.8b3-
Comment 65•19 years ago
|
||
This is a duplicate of bug 195946.
Assignee | ||
Comment 66•19 years ago
|
||
*** Bug 195946 has been marked as a duplicate of this bug. ***
Comment 67•19 years ago
|
||
*** Bug 310877 has been marked as a duplicate of this bug. ***
Comment 68•19 years ago
|
||
Is there any chance of this fixed in Fx 2.0?
(FYI, I've just ran into this bug while editing Russian Wikipedia, ru.wikipedia.org... I've had to use lots of instead, in order to fix the Wikipedia article, and that in turn annoyed several further editors who thought that HTML entity, while used much, makes the source code of Wiki article nearly unreadable, and reading diffs of it is made a real pain. They were obviously right. Now I start thinking that mere using of Firefox in Wikipedia sometimes should be considered as a severe case of vandalism, and the wikiuser's account should be banned from the system until he/she starts using some better designed browser.)
Updated•19 years ago
|
Summary: no-break spaces in submited text are replaced by breakable spaces → no-break spaces in submitted text are replaced by breakable spaces
Comment 69•18 years ago
|
||
This bug has been opened for 3 years now.
What's stopping from fixing it?
This is the most distrubing 'feature'.
If users type nbsp, copy and paste it or type the Alt+num value it's because they want it.
Comment 70•18 years ago
|
||
> What's stopping from fixing it?
Lack of time. Feel free to help out -- see comment 59 for what needs to be done.
Comment 71•18 years ago
|
||
> What's stopping from fixing it?
Lack of time. Feel free to help out -- see comment 59 for what needs to be done.
Comment 72•18 years ago
|
||
I can understand that more time would be needed for an elaborate fix, but there is already a patch to fix this problem, attached to this bug, which has been reviewed and superreviewed: what is stopping it from getting committed? Why are two years not sufficient to check in a patch which has already been approved?
This bug has such dramatic consequences everywhere (disastrous edits on Wikipedia en masse, for one thing) that I'm willing to stand on my knees and beg, or donate money, or whatever, but I don't have check-in permission so I can't commit the patch myself...
Comment 73•18 years ago
|
||
> what is stopping it from getting committed?
Just the need to merge it to trunk. I'd be happy to commit if someone does that. Note that nsIDocumentEncoder is an IDL file now, not a .h....
Comment 74•18 years ago
|
||
jpl24, would you mind updating that patch to tip?
Comment 75•18 years ago
|
||
Sure, no problem.
Comment 76•18 years ago
|
||
The patch would still convert all nbsps, though, right?
So runs of multiple spaces typed by the user would generate 0x20 0xa0 0x20 0xa0 etc. with this patch even if it were updated to the trunk.
There were several suggestions to avoid that (use zwsp, don't convert nbsp when it's by itself but do convert it if it's interspersed with spaces) but the patch that was posted here doesn't attempt to address those.
Comment 77•18 years ago
|
||
[To answer Akkana Peck's objection.] What I understand from the discussion so far and from the diff itself is that the patch we're talking about suppresses conversion of nbsp to space on plain text input forms, not on HTML composer. Now on plain text input forms, no spurious nbsp's are generated by Mozilla hacks (contrary to composer), so there any nbsp found there is a genuinely input nbsp (either by user input or by initial form value). Am I misunderstanding something? Because if this is correct, there is really no reason not to apply this patch, it can't break anything.
Of course, I still think it's a bug to convert nbsp to space on composer text, but it's not nearly as bad as converting nbsp to space on plain text input forms which is what the patch (again, if I understand correctly) does.
Please correct me if I'm wrong.
Comment 78•18 years ago
|
||
I believe comment 77 is correct.
Comment 79•18 years ago
|
||
This is an updated version of jst's patch. It fixes the test case in this bug, but does not change composer behavior.
Do we have any unit tests for the plaintext serializer?
Attachment #162609 -
Attachment is obsolete: true
Comment 80•18 years ago
|
||
If the plaintext edit rules used for form fields don't insert the nbsp characters, then by all means let's get this fixed for forms (then the debate re. html composer can continue: there are valid arguments both ways). I thought that was the holdup, that both html and plaintext edit rules are inserting nbsps when the user types a run of spaces.
Is there a form test case somewhere that reports exactly what got submitted? The test case referenced in this bug's URL field doesn't do that, but it would be useful both for testing the patch and as a regression test afterward.
Status: NEW → ASSIGNED
Comment 81•18 years ago
|
||
> Do we have any unit tests for the plaintext serializer?
The DOM to Text conversion tests (in htmlparser/tests/outsinks, iirc) that run from Tinderbox test the the html and plaintext serializer, and it would definitely be worth adding a case to those to cover this issue. They don't test form submission, though, which takes a slightly different path through the parser (but which might require a manual test).
Comment 82•18 years ago
|
||
> Do we have any unit tests for the plaintext serializer?
Not really. Bug 333060 more or less covers that (initing the nsIDocumentEncoder with text/plain as the type should do it). It should be possible to test a variety of output types and flags and such, including whatever flags form submission uses in this case.
Akkana is probably right that we should add a test to the outsinks if we can too, esp. since tinderbox actually runs those.
Assignee | ||
Comment 83•18 years ago
|
||
(In reply to comment #73)
> > what is stopping it from getting committed?
>
> Just the need to merge it to trunk.
I thought comment 40 was, but maybe that was addressed and I don't see it.
Comment 84•18 years ago
|
||
I did verify that typing spaces in a text input and textarea generates spaces, not non-breaking spaces, when we hit nsPlainTextSerializer::Output.
Checked in that patch to trunk; marking fixed. I've filed bug 347689 on implementing the proposal in comment 59.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Updated•18 years ago
|
Blocks: test-suites
Comment 85•18 years ago
|
||
Would it be too imprudent to nominate the patch of this dataloss for 1.8.1 blockers? (I mean, somewhere about 2006-08-08, if it survives on the trunk with no regressions reported in 2 days.)
Comment 86•18 years ago
|
||
This patch can't land on 1.8 as-is -- it changes interfaces.
Updated•18 years ago
|
Flags: in-testsuite?
Updated•18 years ago
|
Summary: no-break spaces in submitted text are replaced by breakable spaces → no-break spaces (nbsp) in submitted text are replaced by breakable spaces
Whiteboard: DUPEME
Comment 87•18 years ago
|
||
RCS file: /cvsroot/mozilla/testing/mochitest/tests/test_bug218277.html,v
done
Checking in tests/test_bug218277.html;
/cvsroot/mozilla/testing/mochitest/tests/test_bug218277.html,v <-- test_bug218277.html
initial revision: 1.1
done
Flags: in-testsuite? → in-testsuite+
Comment 88•18 years ago
|
||
when will this become functional on other browsers, such as camino, ffx, etc?
Comment 89•18 years ago
|
||
Whenever they update to Gecko 1.9. So Firefox 3 and whatever Camino, etc versions will be using that version of Gecko.
Target Milestone: mozilla1.8beta3 → mozilla1.9alpha
Comment 90•18 years ago
|
||
FYI, the fact that this fix was intentionally made to only affect TEXTAREAs and not all text fields has resulted in bug 213628 and bug 290565 remaining not fixed when they could have been resolved at the same time. can someone explain why this patch was intentionally castrated?
Comment 91•18 years ago
|
||
> the fact that this fix was intentionally made to only affect TEXTAREAs and
> not all text fields
This patch affects <textarea>, <input type="text">, and <input type="password">.
> as resulted in bug 213628 and bug 290565 remaining not
> fixed when they could have been resolved at the same time
Bug 213628 should have been fixed by this patch. Can you point to a testcase that demonstrates that it's not?
Bug 290565 would need to be fixed in totally different code.
> can someone explain
See comment 19 and comment 21. And in general, please just read bugs carefully and in their entirety before commenting. It'll help prevent the hundreds of emails you've now generated.
Comment 92•18 years ago
|
||
(In reply to comment #89)
> Whenever they update to Gecko 1.9. So Firefox 3 and whatever Camino, etc
> versions will be using that version of Gecko.
>
How Gecko 1.9-using browser can be identified from User-Agent string?
Comment 93•18 years ago
|
||
It'll have "rv:1.9" in it.
Comment 95•18 years ago
|
||
Comment on attachment 136358 [details] [diff] [review]
This patch corrects the « nsPlainTextSerializer::Output » function
Since this bug has been resolved as FIXED, I'm removing the
review request.
Attachment #136358 -
Flags: review?(daniel)
Comment 96•16 years ago
|
||
This bug still exists.
When I copy a non breaking space from Firefox, it is automatically converted to a normal space.
Please, reopen this case.
My u.a. : Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008090210 Mandriva/1.9.0.1-15mdv2009.0 (2009.0) Firefox/3.0.1
Comment 97•16 years ago
|
||
Cut'n'paste would be another bug, this one was about form submission.
Feel free to open a new bug on cut'n'paste if it doesn't exist yet.
Comment 98•14 years ago
|
||
As far as I can tell this still happens in 4.0b11. For example, NBSP characters typed in the <textarea>s used by GMail and Launchpad are converted to normal spaces.
I suspect it also happens with the textarea I’m typing this in. On my system, the second paragraph of comment 96 (above) wraps between “converted to” and “a normal”. Below I type the exact same paragraph, but I’ll type NBSPs instead of normal spaces after “automatically”:
When I copy a non breaking space from Firefox, it is automatically converted to a normal space.
Comment 99•14 years ago
|
||
Note that while I was typing the above comment, the text wrapped correctly (everything from “automatically”, including that word, went on the next line before I clicked “save changes”); however the text above wraps between “to” and “a”.
Comment 100•14 years ago
|
||
Hmm. Sorry for the many messages, it appears I’m wrong. Looking at the source for the above test, there is a “ ” where I typed NBSP characters. There’s apparently a different bug of Firefox that causes it to line-wrap at those entities; can anyone point me to the relevant bug, or to an explanation why that isn’t a bug?
Comment 101•14 years ago
|
||
What you are experiencing seems to be a case of server-side conversion, about which Firefox can't do anything.
The testcase attached to this bug still returns "no bug" and a preview edit on a Wikipedia page shows that NBSPs are correctly parsed and submitted by Firefox 4 beta 11.
If this very sentence full of NBSPs is wrapping, you should probably file a bug on the Bugzilla product itself.
Updated•6 years ago
|
Component: HTML: Form Submission → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•