Bugzilla

Comment 3

•

23 years ago

Attached file: test case

Comment 4

•

23 years ago

Attached patch: patch (obsolete)

Comment 5

•

23 years ago

How to verify this fix: Open the attached test case in the browser and choose File -> Edit Page. Inside Composer, select Save As and choose a different name to save it. Open the new HTML file in a text editor (e.g. Notepad) and check its source. Before the patch, &sup2; will be displayed as "²". After the patch, it should be displayed as "&sup2;".

Status: NEW → ASSIGNED

Comment 6

•

23 years ago

naoki, could you r=?

Comment 7

•

23 years ago

This was actually done intentionally. For Latin1 documents, it is preferred to use entities even when the characters can be encoded in the document charset. I think this is also done in order to keep the save behavior the same as in 4.x. I think we want to keep the current behavior unless we get complaints from users. cc to akkana, bobj, jst

Comment 8

•

23 years ago

resolve as wfm per naoki's comment.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → WORKSFORME

Comment 9

•

23 years ago

Don't rush ;-) I encountered this problem while trying to figure out why nsIEditor::OutputToString() doesn't return entities even though I'm specifying the nsIDocumentEncoder::OutputEncodeEntities flag. I had set the document encoding to UTF-8. Is it expected that nobody composes Latin documents in UTF-8 encoding? What about ISO-8859-15? Should the output differ between ISO-8859-1 and ISO-8859-15? I don't think so. The non-breaking space character is especially problematic. As long as you stick to the 0xA0 code point, it won't be portable across character set encodings (e.g. JIS0208 doesn't have such a code point). The only solution for this is to use the &nbsp; entity instead of a charset-encoding-dependent code point. Re-opening.
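The portability point can be illustrated outside Gecko. The following is a minimal Python sketch, not Gecko code: it uses Python's own codecs as stand-ins, with `shift_jis` assumed here as a representative JIS X 0208 based encoding, and `raw_bytes` is a hypothetical helper name.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def raw_bytes(ch, charset):
    """Return the bytes for ch in charset, or None if ch cannot be encoded."""
    try:
        return ch.encode(charset)
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    for charset in ("iso-8859-1", "iso-8859-15", "utf-8", "shift_jis"):
        # The raw character's bytes depend on the charset (or do not exist);
        # the entity is plain ASCII and has the same bytes everywhere.
        print(charset, raw_bytes(NBSP, charset), "&nbsp;".encode(charset))
```

The raw character changes representation from one charset to the next and has none at all in the JIS-based one, while the entity spelling is the same ASCII byte sequence in every case.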

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Comment 10

•

23 years ago

I am confused. &#160; does not mean code point 160 in the current charset; it always means U+00A0 in Unicode. Why doesn't &#160; work while &nbsp; works? Does it indicate another problem?

Status: REOPENED → ASSIGNED

Comment 11

•

23 years ago

There are two cases. 1) Use entities only if the character cannot be encoded in the target charset. 2) Use available entities as much as possible. For #1, that is done as a fallback for the charset conversion. This is basically the way the serializer is implemented. For #2, the entity mapping can be applied before the charset conversion. This is not done by the serializer currently. nsISaveAsCharset has an option to do this, which also takes a flag for the kind of entities (e.g. Latin1, Symbol). I think a similar feature can be added to the serializer.
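The two cases can be sketched as follows. This is a rough Python illustration under stated assumptions, not the serializer's or nsISaveAsCharset's actual code: the function names are hypothetical, Python's `html.entities.codepoint2name` table stands in for Gecko's entity tables, and the `lo`/`hi` range is a simplified stand-in for the "kind of entities" flag (Latin1 only here).

```python
from html.entities import codepoint2name


def entity_for(ch):
    """Named entity if one is known for ch, else a numeric reference."""
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else f"&#{ord(ch)};"


def encode_fallback_only(text, charset):
    """Case 1: use an entity only when the charset cannot encode the character.

    Encodes character by character, so this sketch is only suitable for
    stateless charsets such as ASCII, ISO-8859-x, or UTF-8.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(charset)
        except UnicodeEncodeError:
            out += entity_for(ch).encode("ascii")
    return bytes(out)


def encode_entities_first(text, charset, lo=0xA0, hi=0xFF):
    """Case 2: map characters in [lo, hi] to entities before charset conversion."""
    mapped = "".join(
        entity_for(ch) if lo <= ord(ch) <= hi else ch for ch in text
    )
    # Anything left that the charset cannot encode still gets the case 1 fallback.
    return encode_fallback_only(mapped, charset)


if __name__ == "__main__":
    for charset in ("ascii", "iso-8859-1", "utf-8"):
        print(charset,
              encode_fallback_only("x\u00b2", charset),
              encode_entities_first("x\u00b2", charset))
```

With case 1 the output depends on what the target charset happens to contain; with case 2 the Latin1 characters come out as entities whatever the charset is.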

Comment 12

•

23 years ago

Please look at the problem from an embedding application developer's point of view. As for me, I'm not using Gecko's charset converter. Thus, #1 is helpless. The existence of the nsIDocumentEncoder::OutputEncodeEntities flag implies #2, and it's easily done by removing the test of the document charset encoding.

Comment 13

•

23 years ago

Any comments or updates?

Comment 14

•

23 years ago

Reassigning this one to naoki. The question now is about the possible impact this patch may have. Naoki is better aware of those issues than I am.

Assignee: shanjian → nhotta

Status: ASSIGNED → NEW

Comment 15

•

23 years ago

> I'm not using Gecko's charset converter. Thus, #1 is helpless.

nsIEntityConverter may be used to encode Unicode to entities.

> Existence of nsIDocumentEncoder::OutputEncodeEntities flag implies #2

If the flag is set then we should create the entity (CER) before the charset conversion. I think the serializer is working that way.

Status: NEW → ASSIGNED

Comment 16

•

23 years ago

>> Existence of nsIDocumentEncoder::OutputEncodeEntities flag implies #2
> If the flag is set then we should create the entity (CER) before the
> charset conversion. I think the serializer is working that way.

Only when the document charset encoding is ISO-8859-1. It should be consistent regardless of the document charset encoding.
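To make the inconsistency concrete, here is a small Python model of the behavior being described. It is only a sketch under assumptions: `OUTPUT_ENCODE_ENTITIES` is a hypothetical stand-in for the nsIDocumentEncoder flag (its numeric value is arbitrary), `serialize` and `charset_gated` are made-up names, and the real serializer logic may differ in detail.

```python
from html.entities import codepoint2name

# Hypothetical stand-in for nsIDocumentEncoder::OutputEncodeEntities.
# The numeric value is arbitrary and is not Gecko's.
OUTPUT_ENCODE_ENTITIES = 1 << 0

LATIN1_NAMES = {"iso-8859-1", "latin-1", "latin1"}


def serialize(text, charset, flags, charset_gated):
    """Encode text, optionally turning Latin1 characters into named entities.

    charset_gated=True models the behavior reported in this bug: the flag
    is honored only when the document charset is ISO-8859-1.
    charset_gated=False models the requested behavior: the flag alone decides.
    """
    want_entities = bool(flags & OUTPUT_ENCODE_ENTITIES)
    if charset_gated and charset.lower() not in LATIN1_NAMES:
        want_entities = False
    if want_entities:
        # Every code point in U+00A0..U+00FF has a named HTML entity.
        text = "".join(
            f"&{codepoint2name[ord(ch)]};" if 0xA0 <= ord(ch) <= 0xFF else ch
            for ch in text
        )
    return text.encode(charset)


if __name__ == "__main__":
    for gated in (True, False):
        for charset in ("iso-8859-1", "iso-8859-15", "utf-8"):
            print(gated, charset,
                  serialize("a\u00a0b", charset, OUTPUT_ENCODE_ENTITIES, gated))
```

In the gated model the same flag yields an entity for ISO-8859-1 but raw bytes for ISO-8859-15 and UTF-8; without the gate the three outputs are identical.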

Comment 17

•

23 years ago

If that is the case, then we need to remove that check from the serializer and move the special-case handling up to the editor.