Closed
Bug 171405
Opened 22 years ago
Closed 21 years ago
Composer editor deletes all the &nbsp; characters on pages using UTF-8 character set
Categories
(Core :: DOM: Serializers, defect)
RESOLVED
WORKSFORME
People
(Reporter: bugzilla2, Assigned: t_mutreja)
Attachments
(1 file)
911 bytes, patch
User-Agent: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.2b) Gecko/20020928
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.2b) Gecko/20020928

All the non-breaking space characters (&nbsp;) are deleted when you switch to HTML Source View.

Reproducible: Always

Steps to Reproduce:
1. Open Composer Test Page from the Debug menu
2. Observe the third line that says "This sentence has two &nbsp; tags between each word."
3. Switch to HTML SOURCE view and note that it DOES NOT contain any &nbsp;s, just two spaces.
4. Make any change to the source code
5. Switch back to NORMAL view, and note that the double spaces are now ignored in the display (the sentence appears with single spaces).
6. Switch back to HTML SOURCE view and manually replace the double spaces with &nbsp;
7. Switch to NORMAL view and back to HTML SOURCE view. Note the &nbsp;s are gone again
8. Change
   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
   TO
   <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
   and resave the file.
9. Close Composer and reopen it with the same file (or change View/Character coding to Western (ISO-8859-1))
10. Switch to HTML SOURCE view and manually replace the double spaces with &nbsp; again
11. Switch to NORMAL view and back to HTML SOURCE view. Note the &nbsp;s ARE THERE NOW! Now the formatting of the page will remain correct.

Actual Results: &nbsp; characters are converted to spaces and subsequently ignored (displayed as single spaces)

Expected Results: Keep the formatting as it was originally meant to be. (Retain multiple spaces between words where the author intended.)

Bug is also present in Mozilla 1.1 (20020826) and 1.2 Alpha (20020910) at least.

Workaround: check the character set on any page before you open it in Composer, and if it's UTF-8, open it first in another editor and change the character set. You'll then have to manually fix any extended characters that display incorrectly.
Also can be reproduced by setting View/Character Coding to UTF-8 and then typing in multiple spaces between words in a blank page.
I have a feeling this is serializer related ... but over to jfrancis first to make sure.
Assignee: kin → jfrancis
Comment 2•22 years ago
Kin may be hesitant to hand this off to the serializer without investigation, but I'm not. This has to be the serializer.
Assignee: jfrancis → harishd
Status: UNCONFIRMED → NEW
Component: Editor: Core → DOM to Text Conversion
Ever confirmed: true
Comment 3•22 years ago (Assignee)
In nsHTMLContentSerializer.cpp, we check the charset and, based on that, convert characters to their corresponding entities. Right now we do this conversion only for the charset "ISO-8859-1". From what I understand, character references are an encoding-independent mechanism, so I don't understand the reasoning behind doing it only for ISO-8859-1. Any pointers?
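The behavior described in this comment can be sketched roughly as follows. This is a Python illustration, not the code in nsHTMLContentSerializer.cpp: the function name `serialize_text` and the three-entry entity table are hypothetical and exist only to show what a charset-gated entity conversion looks like.

```python
# Hypothetical illustration only; not the actual Gecko serializer code.
# A tiny subset of the HTML Latin-1 entity table (code point -> entity name).
LATIN1_ENTITIES = {
    0x00A0: "nbsp",
    0x00A9: "copy",
    0x00E9: "eacute",
}


def serialize_text(text: str, charset: str) -> str:
    """Replace known characters with entities, but only for ISO-8859-1."""
    if charset.upper() != "ISO-8859-1":
        # Any other charset: characters are written out unchanged, so a
        # no-break space (U+00A0) is emitted as a raw character.
        return text
    return "".join(
        f"&{LATIN1_ENTITIES[ord(ch)]};" if ord(ch) in LATIN1_ENTITIES else ch
        for ch in text
    )
```

Under this assumed model, `serialize_text("a\u00a0b", "UTF-8")` returns the string with a raw U+00A0, which most text views render just like an ordinary space, while the same call with "ISO-8859-1" yields `a&nbsp;b`.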
I have no idea why we do that. My advice would be to check in Bonsai who introduced those lines and ask them, if possible.
Comment 5•22 years ago (Assignee)
Thanks Heikki. This bug seems to be a side effect of the patch for bug 65324. CC'ing JST and Nhotta for their input. I feel this bug is valid only for "nbsp". It's correct that UTF-8 has a code point for space and hence a space does not need a reference like "nbsp", but HTML then squeezes all adjacent spaces into a single space. That is exactly what happens here, and it seems correct (unless some specification requires UTF-8 to treat adjacent spaces the way nbsp is treated). Also, looking at the list of character references that fall in the range of 127 to 256, I feel that no HTML-specific action is taken for them. Based on this assumption, I'm attaching a patch here...
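The "squeezing" of adjacent spaces mentioned above can be illustrated with a small Python sketch. It models only the default collapsing of ASCII whitespace runs in HTML text; the helper name `collapse_whitespace` is hypothetical and this is not how the layout engine is implemented.

```python
import re

# ASCII whitespace that HTML collapses during normal rendering.
# U+00A0 (no-break space) is deliberately not part of this set.
_COLLAPSIBLE = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text: str) -> str:
    """Collapse each run of ASCII whitespace into a single space."""
    return _COLLAPSIBLE.sub(" ", text)
```

An explicit character class is used instead of `\s` because Python's `\s` also matches U+00A0 in `str` patterns. With this sketch, two ordinary spaces between words become one, while a space followed by U+00A0 is left alone, which is why losing the no-break space changes how the page is displayed.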
Assignee: harishd → t_mutreja
Comment 6•22 years ago (Assignee)
Irrespective of the "charset" value, this patch treats "&nbsp;" as a special case and retains it for all encodings.
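The attached patch is not reproduced here. As a rough Python illustration of the idea it describes, assuming the special case simply maps U+00A0 to its entity regardless of charset, one might write the following (the name `serialize_keep_nbsp` is hypothetical):

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def serialize_keep_nbsp(text: str) -> str:
    """Emit every U+00A0 as the &nbsp; entity, whatever the charset is.

    All other characters pass through unchanged in this sketch.
    """
    return text.replace(NBSP, "&nbsp;")
```

Because the function takes no charset argument, the no-break space survives a round trip through the source view for UTF-8 pages as well as ISO-8859-1 pages in this simplified model.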
Comment 7•22 years ago
I am not sure if everybody wants &nbsp;. I think this should be a pref for the serializer like the charset check (bug 169590).
Comment 8•22 years ago
I have a similar problem. is being converted first into real spaces, then into  when publishing in Composer. It happens in this version: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020913 Debian/1.1-1 but not in: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.1) Gecko/20020826
Checking back through my old bugs, this one seems to be fixed now. Can someone confirm that and mark it WFM?
Comment 10•21 years ago (Reporter)
Marking as WFM. Can't reproduce my test case anymore. Some other patch must have fixed this.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WORKSFORME