Closed
Bug 20062
Opened 25 years ago
Closed 25 years ago
Send messages with non-encoded NBSPs
Categories
(Core :: DOM: Editor, defect, P3)
VERIFIED
FIXED
M13
People
(Reporter: sfraser_bugs, Assigned: nhottanscp)
Attachments
(2 files)
2.06 KB, text/html
404 bytes, patch
I've seen a couple of times that HTML messages posted to newsgroups with 5.0 contain raw non-breaking space characters (character 160, i.e. 0xA0) in the middle of HTML text. These should have been converted to &nbsp; somewhere.
Updated•25 years ago
Status: NEW → ASSIGNED
Target Milestone: M13
Comment 1•25 years ago
This isn't supposed to happen now that we're using Naoki's new entity converter. I'll look into what's happening (maybe I'm calling the converter with the wrong flags).
Reporter
Comment 2•25 years ago
Assignee
Comment 3•25 years ago
This is expected, because the entity conversion is applied as a fallback for charset conversion. Since nbsp is in ISO-8859-1, no fallback (i.e. no entity conversion) happens. I think that option is currently used for messenger only (because it may benefit message search). But if this is undesirable, it can be changed easily by resetting the flag.
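As a rough analogue of the fallback behaviour described in this comment, the Python sketch below encodes text to ISO-8859-1 and replaces only the characters the charset cannot represent with decimal numeric character references. It assumes Python's `xmlcharrefreplace` error handler behaves like the `attr_FallbackDecimalNCR` fallback mentioned in comment 5; the function name `charset_then_ncr_fallback` is hypothetical and not part of nsISaveAsCharset.

```python
def charset_then_ncr_fallback(text: str, charset: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: encode to the mail charset first; only characters
    the charset cannot represent fall back to decimal NCRs (&#NNNN;)."""
    return text.encode(charset, errors="xmlcharrefreplace")

# NBSP (U+00A0) exists in ISO-8859-1, so it comes out as the raw byte 0xA0.
print(charset_then_ncr_fallback("a\u00a0b"))  # b'a\xa0b'
# The euro sign (U+20AC) does not, so it falls back to a decimal NCR.
print(charset_then_ncr_fallback("5\u20ac"))   # b'5&#8364;'
```

Because the entity step only runs on encoding failure, NBSP never reaches it when the target charset is ISO-8859-1.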
Comment 4•25 years ago
Naoki, if I change the flags to change the fallback option, will we still get double quotes? We don't want to go back to where " was always encoded into &quot;, since lots of people complained about that. What's the right flag to use to get &nbsp; but not &quot;?
Assignee
Comment 5•25 years ago
No, ", &, <, > are always excluded from the conversion. The flag is set in the mail/news code, mailnews/base/util/nsMsgI18N.cpp line 409: change nsISaveAsCharset::attr_EntityAfterCharsetConv + nsISaveAsCharset::attr_FallbackDecimalNCR to nsISaveAsCharset::attr_htmlTextDefault.
Updated•25 years ago
Assignee: akkana → rhp
Status: ASSIGNED → NEW
Comment 6•25 years ago
Sounds like this is Rich's bug, not mine, then; but I've changed nsHTMLContentSinkStream.cpp to follow Naoki's suggestion. But Naoki: even after I make that change, I still don't see the entities; I just see spaces, the same thing I saw when I was passing the flags nsISaveAsCharset::attr_EntityAfterCharsetConv | nsISaveAsCharset::attr_FallbackDecimalNCR.
Assignee
Updated•25 years ago
Assignee: rhp → nhotta
Assignee
Comment 7•25 years ago
Actually it's my bug. I need to change the flag in mail/news code.
Assignee
Updated•25 years ago
Status: NEW → ASSIGNED
Assignee
Comment 8•25 years ago
There are two issues around this bug.
1) Currently, messenger uses the Unicode interface to get data from the editor, then converts it to the mail charset (using nsISaveAsCharset) inside messenger. This is why the editor change didn't affect the messenger output. One benefit of getting Unicode is that it makes text manipulation easier (e.g. parsing the body and generating links and mailto: URLs, as messenger does). If we use the stream interface instead, we need to consider moving some of that text manipulation into the editor. That issue, though, is separate from this bug.
2) Sending nbsp (and other Latin-1 characters) without entity encoding is the current behavior of HTML mail send. Not using entities is legal, because the mail is labeled as ISO-8859-1, which includes nbsp as code point 0xA0. The output can be changed by just flipping the flag, but I need to know whether the change is really needed. For message searching, I don't think either IMAP or local search (the current server and client) supports entity decoding (e.g. we cannot search for &Eacute; but can search for É).
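The search concern can be illustrated with a toy example. It assumes a search that does a plain substring match on the stored body without decoding entities; that is an assumption for illustration, not a description of the IMAP or local search code, and the message bodies are made up.

```python
import html

# Hypothetical message bodies: the same word stored two ways.
body_with_entities = "Caf&eacute; opens at nine"
body_raw = "Caf\u00e9 opens at nine"

query = "Caf\u00e9"  # what the user types: "Café"

# A naive substring search only matches the raw form.
print(query in body_with_entities)                 # False
print(query in body_raw)                           # True
# Decoding entities first would make both forms searchable.
print(query in html.unescape(body_with_entities))  # True
```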
I agree that using raw NBSP is as legitimate as using raw e-acutes in ISO-8859-1 HTML files. But the rationale for using raw codes for alphanumerics and punctuation was to make search/find easier, and I don't think that rationale holds for NBSP. Maybe we should entity-ize NBSP along with the mandatory entities ('<', '>', '&', etc.).
Comment 10•25 years ago
I'm agnostic on this issue currently. Other than the fact that inserting the raw NBSP code point instead of the entity is not widely done, is there a strong reason why we must use the entity representation? Are there other processes that depend on the NBSP entity? BTW, what is the relevance of the news article link above?
Comment 11•25 years ago
I'd like to see nbsp turned into the entity &nbsp;. I've been confused more than once by the non-entityizing of nbsp, wondering why all the nbsp's in the editor's OutputHTML function were disappearing when printed to stdout, and whether it was a bug in the sink streams. Plain ASCII users use nbsp's even if they can't display characters like e-acute, and by the time we get to the point of converting from Unicode to ASCII, we're past the point where we can decide which flags to pass to the converter.
Assignee
Comment 12•25 years ago
The editor uses a flag to do the entity conversion before the charset conversion (for Save/Save As), so &nbsp; is always generated. I think this option is good for the editor: because the charset label (via a META tag) is optional, it's safer to use as many entities as possible. For mail, we always label the charset, so we don't have to always generate entities. Regarding treating nbsp specially, I prefer no special handling (i.e. I do not want to change the interface only for nbsp). Other Latin-1 characters may also be invisible, depending on the glyph availability of the installed fonts. I'll hold the change until we have a reason to change this for mail send.
Comment 13•25 years ago
Currently, we special-case the mandatory entities:
"&lt;" represents the < sign
"&gt;" represents the > sign
"&amp;" represents the & sign
"&quot;" represents the " mark
So I was suggesting that we might add "&nbsp;" to this list even though it is not mandatory, the assumption being that raw 0xA0 is not very useful...
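A minimal Python sketch of that proposal. The names `MANDATORY`, `escape_html_text`, and `entityize_nbsp` are hypothetical and do not correspond to anything in the Mozilla code: the four mandatory entities are always applied, and NBSP is added to the table only when the flag is set.

```python
# The four mandatory entities listed above.
MANDATORY = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}

def escape_html_text(text: str, entityize_nbsp: bool = True) -> str:
    """Hypothetical escaper: mandatory entities always, NBSP optionally."""
    table = dict(MANDATORY)
    if entityize_nbsp:
        table["\u00a0"] = "&nbsp;"
    # Single pass, character by character, so the "&" inside generated
    # entities is never escaped a second time.
    return "".join(table.get(ch, ch) for ch in text)

print(escape_html_text('a\u00a0<b> & "c"'))
# a&nbsp;&lt;b&gt; &amp; &quot;c&quot;
```

Doing all replacements in one lookup pass avoids the classic ordering bug where escaping "&" after the other entities would turn "&lt;" into "&amp;lt;".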
Assignee
Comment 14•25 years ago
Yes, those four characters are excluded from the entity conversion interface (i.e. the interface does not generate entities for them). The four characters are entity-encoded before they reach the entity converter (this is needed for HTML escaping). I am not sure whether nbsp belongs in the same category, but if we add nbsp, the content sink/parser probably needs changes.
Comment 15•25 years ago
It sounds like we're in agreement. Where possible, let's convert these characters back to their entity versions when emitting HTML/XIF.
Assignee
Comment 16•25 years ago
Akkana, can that be done in ContentSinks?
Comment 17•25 years ago
The content sinks depend on nsISaveAsCharset to do this entity encoding, so the sinks will output whatever that class returns.
Assignee
Comment 18•25 years ago
If we want that capability, the interface needs to be extended to accept character-based options in addition to the category options (e.g. Latin1, Symbol, etc.). Or I can flip the flag now to do the entity conversion before the charset conversion.
Comment 19•25 years ago
Maybe flipping the order is the best solution, then. What's the disadvantage of that?
Assignee
Comment 20•25 years ago
All the Latin-1 characters >= 160 are converted to entities. So far, I have not heard a clear advantage or disadvantage of doing that; it seems to be a matter of taste. Let me flip the flag early in M13 (so that I can at least close the bug).
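A sketch of the flipped order (entity conversion before charset conversion). It uses Python's `html.entities.codepoint2name` table as a stand-in for the converter's own entity tables, which is an assumption; `entities_then_charset` is a hypothetical name. Only code points from 0xA0 upward are entity-ized, so ASCII, including the four mandatory characters, is left to the separate escaping step.

```python
from html.entities import codepoint2name

def entities_then_charset(text: str, charset: str = "iso-8859-1") -> bytes:
    """Hypothetical helper: replace every character >= U+00A0 that has an
    HTML 4 named entity, then encode; characters with no named entity that
    the charset cannot represent still fall back to decimal NCRs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp >= 0xA0 and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(ch)
    return "".join(out).encode(charset, errors="xmlcharrefreplace")

print(entities_then_charset("caf\u00e9\u00a0ok"))  # b'caf&eacute;&nbsp;ok'
```

Compared with encoding first and falling back only on failure, this always produces &nbsp; regardless of the target charset, at the cost of entity-izing every Latin-1 letter as well.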
Assignee
Comment 21•25 years ago
Assignee
Comment 22•25 years ago
There is an entity-related bug, bug 22315, which needs to be resolved before checking in the fix.
Assignee
Comment 23•25 years ago
Checked in; now it always generates &nbsp;.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Comment 24•25 years ago
I know I'm coming in way too late here, but nonetheless I'll suggest that encoding nbsp's as entities is not the right thing to do for mail. There are plenty of mail clients that read ISO-8859-1 but not HTML. When these receive HTML mail, they would be better off seeing the nbsp as 0xA0, which will render correctly, rather than as a clutter of '&nbsp;', which will make the mail even harder to read. For non-mail use, I agree that &nbsp; is superior.
Assignee
Comment 25•25 years ago
Adding phil for his opinion.
Comment 26•25 years ago
verified in 1/7 build.
Assignee
Comment 27•24 years ago
This issue is raised again in bug 27376, which mentions the HTML spec. I think we want to reconsider the current behavior (for both mail and Composer).
>HTML 4.01 5.3 says
>A given character encoding may not be able to express all characters of the
>document character set. For such encodings, or when hardware or software
>configurations do not allow users to input some document characters directly,
>authors may use SGML character references.