Closed Bug 171405 Opened 22 years ago Closed 21 years ago

Composer editor deletes all the &nbsp; characters on pages using UTF-8 character set

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: bugzilla2, Assigned: t_mutreja)

References

Details

Attachments

(1 file)

PatchV1.0 22 years ago Tanu Mutreja 911 bytes, patch

Jim Booth

Reporter

Description

•

22 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.2b) Gecko/20020928
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.2b) Gecko/20020928

All the Non-breaking space characters (&nbsp;) are deleted when you switch to
HTML Source View.  

Reproducible: Always

Steps to Reproduce:
1. Open Composer Test Page from the Debug menu
2. Observe the third line that says "This sentence has two &nbsp; tags between
each word."
3. Switch to HTML SOURCE view and note that it DOES NOT contain any &nbsp;s,
just two spaces.
4. Make any change to the source code
5. Switch back to NORMAL view, and note that the double spaces are now ignored
in the display  (The sentence appears with single spaces.)
6. Switch back to HTML SOURCE view and manually replace the double spaces with
&nbsp;&nbsp;
7. Switch to NORMAL view and back to HTML SOURCE view.  Note the &nbsp;s are
gone again
8. Change 
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">  TO
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
     and resave the file.
9. Close Composer and reopen it with the same file (or change View/Character
Coding to Western (ISO-8859-1))
10. Switch to HTML SOURCE view and manually replace the double spaces with
&nbsp;&nbsp; again
11. Switch to NORMAL view and back to HTML SOURCE view.  Note the &nbsp;s ARE
THERE NOW! Now the formatting of the page will remain correct.

Actual Results:  
&nbsp; characters are converted to spaces and subsequently ignored (displayed as
single spaces)

Expected Results:  
Keep the formatting as it was originally meant to be.  (Retain multiple spaces
between words where the author intended.)

Bug is also present in Mozilla 1.1 (20020826) and 1.2 Alpha (20020910) at least.

Workaround: check the character set on any page before you open it in Composer,
and if it's UTF-8, open it first in another editor and change the character set.
 You'll then have to manually fix any extended characters that display incorrectly.

Also can be reproduced by setting View/Character Coding to UTF-8 and then typing
in multiple spaces between words in a blank page.

kinmoz

Comment 1

•

22 years ago

I have a feeling this is serializer related ... but over to jfrancis first to
make sure.

Assignee: kin → jfrancis

Joe Francis

Comment 2

•

22 years ago

kin may be hesitant to hand off to serializer sans investigation, but i'm not. 
This has to be serializer.

Assignee: jfrancis → harishd

Status: UNCONFIRMED → NEW

Component: Editor: Core → DOM to Text Conversion

Ever confirmed: true

Tanu Mutreja

Assignee

Comment 3

•

22 years ago

In nsHTMLContentSerializer.cpp, we check the charset and, based on that, 
convert the character to the corresponding entity. Right now we do this 
conversion only for charset "ISO-8859-1". 
From what I understand, character references are an encoding-independent 
mechanism, so I don't understand the reasoning behind doing this only for 
ISO-8859-1. Any pointers?
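
The behaviour described above can be sketched roughly as follows. This is a minimal Python illustration, not the actual code in nsHTMLContentSerializer.cpp: the function name `serialize_text`, the tiny entity table, and the exact form of the charset check are all assumptions made for the example.

```python
# Hypothetical sketch of charset-gated entity conversion (not Mozilla's code).
ENTITIES = {"\u00a0": "&nbsp;", "\u00e9": "&eacute;"}  # tiny illustrative table


def serialize_text(text: str, charset: str) -> str:
    """Replace characters with named entities only when charset is ISO-8859-1."""
    if charset.upper() != "ISO-8859-1":
        # Any other charset: characters are written out unchanged, so a
        # no-break space is emitted as the raw U+00A0 character.
        return text
    return "".join(ENTITIES.get(ch, ch) for ch in text)


sample = "two\u00a0\u00a0spaces"
latin1_out = serialize_text(sample, "ISO-8859-1")  # entities are emitted
utf8_out = serialize_text(sample, "utf-8")         # raw U+00A0 is kept
```

Under this kind of gating, a UTF-8 page would presumably show U+00A0 in the source view as a raw character that is visually indistinguishable from an ordinary space, which matches what the reporter sees.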

Heikki Toivonen (remove -bugzilla when emailing directly)

Comment 4

•

22 years ago

I have no idea why we do that. My advice would be to use Bonsai to see who
introduced those lines and ask them, if possible.

Tanu Mutreja

Assignee

Comment 5

•

22 years ago

Thanks Heikki. This bug seems to be a side effect of the patch for bug 65324. 
CC'ing JST and Nhotta for their input.

I feel this bug is valid only for "nbsp". It's correct that UTF-8 has a code 
point for space, so a space does not need a reference like "nbsp", but HTML 
then squeezes all adjacent spaces into a single space. 
That is exactly the case here, and it seems correct (unless there is some 
specification for UTF-8 to treat adjacent spaces in a way similar to nbsp). 
Also, looking at the list of character references that fall in the range of 
127 to 256, I feel that no HTML-specific action is taken for them. Based on 
this assumption, I'm attaching a patch here...
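
The whitespace point can be illustrated with a small Python sketch. It is a simplification of what a renderer does, and the helper name `collapse_whitespace` is made up for the example: runs of ASCII whitespace collapse to one space, while U+00A0 is not part of that set and survives.

```python
import re

# Runs of ASCII whitespace are collapsed; U+00A0 (no-break space) is not
# in this character class, so it is left alone.
ASCII_WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Collapse each run of ASCII whitespace into a single space."""
    return ASCII_WHITESPACE_RUN.sub(" ", text)


with_spaces = collapse_whitespace("two  spaces")           # ordinary spaces
with_nbsp = collapse_whitespace("two\u00a0\u00a0spaces")   # no-break spaces
```

This is why a double no-break space that has been turned into two ordinary spaces shows up as a single space in Normal view.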

Assignee: harishd → t_mutreja

Tanu Mutreja

Assignee

Comment 6

•

22 years ago

Attached patch PatchV1.0

Irrespective of the "charset" value, this patch treats "&nbsp;" as a special
case and retains it for all encodings.
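
The idea can be sketched in Python as below. The attached patch is C++ and is not reproduced in this report, so this is only an illustration of the approach as described; `serialize_text` and the entity table are hypothetical names.

```python
# Hypothetical sketch of the idea described for PatchV1.0; the real patch
# is C++ and its contents are not shown in this report.
NBSP = "\u00a0"
LATIN1_ENTITIES = {"\u00e9": "&eacute;", "\u00fc": "&uuml;"}  # illustrative subset


def serialize_text(text: str, charset: str) -> str:
    """Always emit &nbsp;, and convert other entities only for ISO-8859-1."""
    convert_all = charset.upper() == "ISO-8859-1"
    out = []
    for ch in text:
        if ch == NBSP:
            out.append("&nbsp;")             # special case: kept for every encoding
        elif convert_all and ch in LATIN1_ENTITIES:
            out.append(LATIN1_ENTITIES[ch])  # existing charset-gated behaviour
        else:
            out.append(ch)
    return "".join(out)


utf8_out = serialize_text("caf\u00e9\u00a0bar", "utf-8")
latin1_out = serialize_text("caf\u00e9\u00a0bar", "ISO-8859-1")
```

In this sketch only the no-break space bypasses the charset check; every other character keeps the existing behaviour.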

nhottanscp

Comment 7

•

22 years ago

I am not sure if everybody wants &nbsp;. 
I think this should be a pref for the serializer like the charset check (bug
169590).

Dwayne C. Litzenberger

Comment 8

•

22 years ago

I have a similar problem.  &nbsp; is being converted first into real spaces,
then into &Acirc;&nbsp; when publishing in Composer.  It happens in this version:

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1

but not in:

Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
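
The "&Acirc;&nbsp;" pattern is what you get when the UTF-8 bytes of a no-break space are read back as ISO-8859-1. Whether that is what happens during publishing here is an assumption and is not confirmed in this bug. The Python sketch below only demonstrates the byte-level effect; the entity table is illustrative.

```python
# U+00A0 encoded as UTF-8 is the two bytes C2 A0.
utf8_bytes = "\u00a0".encode("utf-8")

# Reading those bytes as ISO-8859-1 yields two characters:
# U+00C2 (A with circumflex) followed by U+00A0.
misread = utf8_bytes.decode("iso-8859-1")

# Hypothetical entity table, for illustration only.
ENTITIES = {"\u00c2": "&Acirc;", "\u00a0": "&nbsp;"}
as_entities = "".join(ENTITIES.get(ch, ch) for ch in misread)
```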

Boris Zbarsky [:bzbarsky]

Updated

•

22 years ago

Blocks: 179525

Jim Booth

Reporter

Comment 9

•

21 years ago

Checking back through my old bugs, this one seems to be fixed now.  

Can someone confirm that and mark it WFM?

Jim Booth

Reporter

Comment 10

•

21 years ago

Marking as WFM.  Can't reproduce my test case anymore.  Some other patch must
have fixed this.

Status: NEW → RESOLVED

Closed: 21 years ago

Resolution: --- → WORKSFORME

Categories

(Core :: DOM: Serializers, defect)

Security

(public)
