Open Bug 195420 Opened 22 years ago Updated 1 month ago

[ComposerSourceView] Composer converts non-Unicode characters (trademark &trade, copyright &copy) and accented characters to Unicode symbols, despite charset given

Categories

(Core :: DOM: Serializers, defect)

Tracking

()

People

(Reporter: sbrown3, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: editorbase-)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3b) Gecko/20030220
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3b) Gecko/20030220

In a document containing the codes &#8482; or &trade;, these codes are converted to a single 'TM' character when the document is saved. This occurs even after changing the setting in Preferences to 'Retain Original Source Formatting'.

I need to retain the code because I paste the source from Composer's '<HTML> Source' page into a memo field in an Oracle (7.3) database, which converts the 'TM' into a question mark.

Also, a tool for inserting such codes (or symbols, with the code being inserted into the source) into a document in Composer would be useful.

Reproducible: Always

Steps to Reproduce:
1. Open a new document in Composer.
2. Type 'Mozilla' (no quotes).
3. Go to the '<HTML> Source' page.
4. Append the code &#8482; or &trade; to the text (Mozilla) you typed.
5. Go to the 'Normal' page.
6. Return to the '<HTML> Source' page.
7. Repeat after changing the setting in Preferences to 'Retain Original Source Formatting'.

Actual Results: The code &#8482; or &trade; is converted to ™.

Expected Results: The codes should be retained exactly as I typed them (as is the case with other codes: &gt;, &lt;, &amp;, &nbsp;, etc.).
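For readers following the report, here is a minimal Python sketch of why the original spelling is lost on the round trip through 'Normal' view. It is not Composer code: `parse_text` is a hypothetical helper that only uses Python's standard `html.unescape` to mimic what an HTML parser does with character references. The point is that the named form, the numeric form, and the literal character all end up as the same text, so a serializer has to decide on its own how to write the character back out.

```python
import html


def parse_text(source: str) -> str:
    """Hypothetical stand-in for an HTML parser: decode character references."""
    return html.unescape(source)


# Three different spellings in the source view ...
named = parse_text("Mozilla&trade;")
numeric = parse_text("Mozilla&#8482;")
literal = parse_text("Mozilla\u2122")  # U+2122 TRADE MARK SIGN typed directly

# ... all become the same text once parsed, so the original spelling
# cannot be recovered from the parsed text alone.
print(named == numeric == literal)  # True
```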
I see this in Netscape 7 on Mac OS X. Sounds like a bug for --> DOM to Text Conversion. akkana -- is this a dupe?
Assignee: composer → harishd
Status: UNCONFIRMED → NEW
Component: Editor: Composer → DOM to Text Conversion
Ever confirmed: true
Keywords: nsbeta1
OS: Windows 2000 → All
QA Contact: petersen → sujay
Hardware: PC → All
Whiteboard: editorbase
Looks like we have another character not being output as an entity when it should be.
When it is saved, it is saved as "&#8482;". I think the editor (in source view) maintains entities for the Latin-1 set only, not for others like &trade; and &euro;, because &aacute; and &copy; are kept as entities in the source view.
Should OutputEncodeHTMLEntities (rather than OutputEncodeLatin1Entities) be the default for ISO-8859-1? It seems that wouldn't regress bug 65324.

Reporter: in the meantime, you can try setting this pref in your prefs.js (or user.js or editor.js):

    pref("editor.encode_entity", "html"); // which includes &sup2; &alpha; &trade; etc

But note that this will cause Composer to entity-ize 8-bit accented letters, Greek letters, and other special markup symbols as defined in HTML4. So it will only work if your Oracle 7.3 product understands the set of HTML4 entities.
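To make the difference between the two modes discussed above concrete, here is a rough Python sketch. It is an illustration under assumptions, not the Gecko serializer: `encode_entities` and its `mode` values are hypothetical names, and the "latin1" mode is assumed to cover only `&`, `<`, `>` and the named characters in the U+00A0 to U+00FF range. The HTML 4 entity table comes from Python's standard `html.entities.codepoint2name`.

```python
from html.entities import codepoint2name  # HTML 4.01 entity table


def encode_entities(text: str, mode: str = "latin1") -> str:
    """Hypothetical sketch: replace characters with named entities.

    mode="latin1": only &, <, > and named characters in U+00A0..U+00FF.
    mode="html":   every character that has an HTML 4 entity name.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        name = codepoint2name.get(cp)
        if name is None:
            out.append(ch)  # no HTML 4 entity for this character
        elif mode == "html" or cp in (38, 60, 62) or 160 <= cp <= 255:
            out.append(f"&{name};")
        else:
            out.append(ch)  # has a name, but outside the Latin-1 mode
    return "".join(out)


sample = "caf\u00e9 \u00a9 Mozilla\u2122 \u20ac"  # e-acute, copyright, trademark, euro
latin1_only = encode_entities(sample, "latin1")
all_html4 = encode_entities(sample, "html")
print(latin1_only)
print(all_html4)
```

In this sketch the Latin-1 mode leaves the trademark and euro signs as raw characters, which matches the behaviour described in the comments above, while the HTML mode writes them as &trade; and &euro;.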
editorbase-
Whiteboard: editorbase → editorbase-
adt: nsbeta1-
Keywords: nsbeta1 → nsbeta1-
Assignee: harishd → dom-to-text
Severity: normal → minor
QA Contact: sujay
Summary: Composer converts Unicode to character for trademark symbol. → Composer converts Unicode to character for trademark, copyright &copy and probably other special characters
*** Bug 288866 has been marked as a duplicate of this bug. ***
*** Bug 288384 has been marked as a duplicate of this bug. ***
*** Bug 354943 has been marked as a duplicate of this bug. ***
Based on the dupes, I'm altering the severity back to "normal" and (hopefully) improving the summary.
Severity: minor → normal
Summary: Composer converts Unicode to character for trademark, copyright &copy and probably other special characters → Composer converts non-Unicode characters (trademark &trade, copyright &copy) and accented characters to Unicode symbols, despite charset given
Workaround: instead of selecting "File > Save", select "File > Save and change Character encoding" and choose "Western (ISO-8859-1)".
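A small Python sketch of why choosing ISO-8859-1 on save could help here. This is an assumption about the mechanism, not a description of Composer internals: `save_as_latin1` is a hypothetical helper that uses Python's standard `xmlcharrefreplace` error handler, which writes any character the target charset cannot represent as a numeric character reference. That lines up with the earlier comment that the trademark sign is saved as "&#8482;".

```python
def save_as_latin1(text: str) -> bytes:
    """Hypothetical sketch: encode to ISO-8859-1, falling back to
    numeric character references for anything outside that charset."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


# U+2122 (trademark) is not in ISO-8859-1; U+00A9 (copyright) is.
saved = save_as_latin1("Mozilla\u2122 \u00a9 2003")
print(saved)  # b'Mozilla&#8482; \xa9 2003'
```

Note that the copyright sign stays a raw byte (0xA9) in this sketch, so a consumer that cannot handle 8-bit characters would still need it written as an entity.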
Assignee: dom-to-text → nobody
QA Contact: dom-to-text

Mass-removing myself from cc; search for 12b9dfe4-ece3-40dc-8d23-60e179f64ac1 or any reasonable part thereof, to mass-delete these notifications (and sorry!)

Severity: normal → S3
Summary: Composer converts non-Unicode characters (trademark &trade, copyright &copy) and accented characters to Unicode symbols, despite charset given → [ComposerSourceView] Composer converts non-Unicode characters (trademark &trade, copyright &copy) and accented characters to Unicode symbols, despite charset given