Last Comment Bug 6672 - Japanese text is removed from HTML mail
: Japanese text is removed from HTML mail
Product: MailNews Core
Classification: Components
Component: Internationalization (show other bugs)
: Trunk
: x86 Windows NT
P3 normal (vote)
: M11
Assigned To: Jean-Francois Ducarroz
: Katsuhiko Momoi
Depends on: 5894
Blocks: 7228
  Show dependency treegraph
Reported: 1999-05-18 12:52 PDT by nhottanscp
Modified: 2008-07-31 01:22 PDT (History)
6 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---

diff file (796 bytes, text/plain)
1999-08-05 11:33 PDT, nhottanscp
no flags Details
Possible reimplementation of nsHTMLContentSinkStream::Write (607 bytes, text/plain)
1999-08-05 14:51 PDT, Akkana Peck
no flags Details

Description User image nhottanscp 1999-05-18 12:52:52 PDT
Tested with 1999-05-17-09-M6 build on Japanese NT.

I created a new mail windows and typed both Japanese and ascii, messenger only
gets text in ascii range from Ender.
So I cannot send Japanese text with HTML mail.
Comment 1 User image nhottanscp 1999-05-18 12:54:59 PDT
M6, cannot send Japanese HTML mails (blocks the testing).
Comment 2 User image Greg Kostello 1999-05-19 09:46:59 PDT
A few weeks back we added a new signature to OutputHTML to take in an
nsIOStream* object. This new method returned back a stream of HTML in the
document specified charset. The old OutputHTML method is still there, but it
only returns back Unicode characters. The mail code need only call the method
using the new parameter to get the properly encoded document.

Reassigning to Jean-Francois to make the change in the mail/news code.
Comment 3 User image nhottanscp 1999-05-19 13:14:59 PDT
nsIDOMEditorAppCore.h does not contain the stream version.
nsEditorAppCore.h does have the stream version but it does not take the charset
Ender interface should also change if messenger change to use the stream
Comment 4 User image Jean-Francois Ducarroz 1999-05-19 14:17:59 PDT
Changing Messenger for using nsIOstream will imply a lot of work that we will have to trash anyway in the next
milestone as we are rewriting the backend of message compose. Sorry but too late for this change. If HTML is really
a blocking problem especially for Japanese, we should maybe change the default for message composition to be plain
text and not HTML
Comment 5 User image nhottanscp 1999-06-02 16:42:59 PDT
Has the nsIOstream version been provided by Ender? Otherwise, we would like
nsString version to be fixed.
Comment 6 User image Greg Kostello 1999-06-03 08:11:59 PDT
I am transitioning the Ender code to Akkana. We should have the stream API
completed by the end of M7.
Comment 7 User image Akkana Peck 1999-06-22 14:26:59 PDT
The new nsIStream methods are in place in the editor interface.  I'm not quite
sure what's being asked for in this bug -- could someone please take a look at
the interfaces that are there now and let me know if we need to do something

For instance, the new editor shell doesn't have any stream methods; does it need
to?  I think I had some in the editor app core, but then the app core went away.
Comment 8 User image Jean-Francois Ducarroz 1999-08-04 13:32:59 PDT
Comment 9 User image nhottanscp 1999-08-04 14:33:59 PDT
This still happens (with my local build pulled 8/1).
I put three Japanese characters in the editor (0x3042, 0x3044, 0x3046) but those
are not in the unicode text returned from the following call,
m_editor->GetContentsAs(format.GetUnicode(), flags, &bodyText);
in nsMsgCompose::SendMsg.
Other text (e.g. html tags) were returned from the editor.
I have Japanese NT, let me know if further testing is needed.
Comment 10 User image nhottanscp 1999-08-05 11:33:59 PDT
I debugged with my Japanese NT and patched nsHTMLContentSinkStream::Write to
make Japanese to work. Note that this is not a fix proposal, I am not familiar
with the code. Someone in Ender should fix this.
In the code, there are two cases handled (stream and nsString). For nsString, I
think we don't need conversion since both source and target are UCS2. So I
changed to just append the string. Japanese was treated as Latin1 and converted
In order to make Japanese to work I had to change one more thing. The same
problem as bug#10940, nbsp 0x00A0 was returned to messenger and Japanese
(ISO-2022-JP) converter cannot map this character. So I changed to replace nbsp
to space. This should be done somewhere else to create an entity (&nbsp)
I'll attach my diff.
Comment 11 User image nhottanscp 1999-08-05 11:33:59 PDT
Created attachment 1143 [details]
diff file
Comment 12 User image Akkana Peck 1999-08-05 11:50:59 PDT
I looked at the patch, and I don't understand it.  We send the buffer through
the unicode encoder, then if we're doing the string (rather than stream) case we
have special-case logic for that one html entity.

Shouldn't the unicode encoder, or a service related to it, handle that?  Isn't
this wrapped up with the ongoing issue of having a service to encode entities,
bug 8865?  It shouldn't be the job of the nsHTMLContentSinkStream, because then
we'll have to duplicate the logic in the nsHTMLToTextSinkStream and any other
output content sink that might appear in the future.
Comment 13 User image nhottanscp 1999-08-05 13:11:59 PDT
>I looked at the patch, and I don't understand it.
If you're talking about the entity, I put the hack there just make it work.
Ender eng. should find out the right solution.
> Note that this is not a fix proposal, I am not familiar
> with the code. Someone in Ender should fix this.
Please let me know for any help for debugging/testing (I have Japanese NT).
Comment 14 User image Akkana Peck 1999-08-05 13:28:59 PDT
I don't see that this is an Ender issue.  It looks to me like an entity/I18n
issue, related to bug 8865 and dependant on that bug.
Comment 15 User image nhottanscp 1999-08-05 13:42:59 PDT
> I don't see that this is an Ender issue.  It looks to me like an entity/I18n
> issue, related to bug 8865 and dependant on that bug.
There are two problems, the entity bug is a secondary. How about the problem of
applying incorrect (and probably unnecessary) conversion to get UCS2 from UCS2?
I think that needs to be fixed, is that also related to an entity/I18n issue?
Comment 16 User image Akkana Peck 1999-08-05 13:53:59 PDT
I don't see in the patch where you've avoided doing any conversions; all it does
is loop over the string before it's appended to mString.  (I wonder if bugzilla
isn't showing me the whole patch, or something?)

Which conversion are you saying doesn't need to be there -- the EncodeToBuffer
call?  That could well be right -- I haven't been able to find anyone who admits
to understanding what EncodeToBuffer does or when it needs to be called.  If you
know of a case where it's being called and shouldn't be, I'm quite willing to
believe you.
Comment 17 User image nhottanscp 1999-08-05 14:00:59 PDT
> I don't see in the patch where you've avoided doing any conversions;
No, I didn't but I ignored the result of the conversion then appened the
original string instead of converted one.
> Which conversion are you saying doesn't need to be there -- the EncodeToBuffer
> call?
Yes, I think that's not needeed when appending to mString because both strings
are UCS2.
Comment 18 User image Akkana Peck 1999-08-05 14:51:59 PDT
Created attachment 1146 [details]
Possible reimplementation of nsHTMLContentSinkStream::Write
Comment 19 User image Akkana Peck 1999-08-05 14:55:59 PDT
Okay, thanks for the explanation.  I've attached a possible rewrite of
nsHTMLContentSinkStream::Write.  Could you take a look at it and see if you
think it would solve the problem, and/or plug it in and see if it works okay?
Or, if it's easier (let me know), I can check it in inside an ifdef.
Comment 20 User image bobj 1999-08-06 11:29:59 PDT
CC'd tague because of the entity issue.  I agree with Akkana, the entity
issue probably should be resolved by the fix for bug 8865, but that is planned
for M10.  If we could apply a M9 workaround, then it would allow the testing
of the mail sending code.  Would it be possible to modify the code that already
generates other entities (e.g., < or &eacute) to also generate   ?
That code will probably be modified in conjuction with the fix for bug 8865.
This temporary fix would give us 3 more weeks of testing Japanese mail sending.
Tague, Can you look into this and confer with Akkana?
Comment 21 User image nhottanscp 1999-08-06 11:56:59 PDT
> and/or plug it in and see if it works okay?
Yes, I plugged in the code and tested a simple ascii html mail, worked fine.
Japanese mail cannot be tested/verified because of the entity issue. But the
change should handle any unicode equally ascii or Japanese, so I think that's no
problem. I would like this to be checked in (no ifdef) so if the entity issue is
resolved then we can send non-ascii html correctly.
I didn't know how to check stream case so please check if that part is also
working before check in.
Comment 22 User image Akkana Peck 1999-08-06 13:54:59 PDT
The fix to not convert in the nsString case is checked in (no ifdef).
Comment 23 User image Jean-Francois Ducarroz 1999-08-06 14:52:59 PDT
by removing the two nbsp entities in defaultHTMLBody.html, nhotta and myself
were able to send HTML Japanese message. The only problem left is if you delete
the default body before typing your message, ender(?) will add a nbsp at the end
of the data and therefore Japanese mail composition will failed. it's same than
for plain text japanese message (bug 10940)
Comment 24 User image Jean-Francois Ducarroz 1999-08-06 17:46:59 PDT
Except entity issue with nbsp character in the body, HTML Japanese mail works
now. We need to wait for bug 8865 to be fixed in order to be able to close this
bug report.
Comment 25 User image Jean-Francois Ducarroz 1999-08-30 15:43:59 PDT
Move for M11. Still waiting on 8865 to close it...
Comment 26 User image nhottanscp 1999-09-15 14:07:59 PDT
Hooked up the entity converter to nsMsgSend.cpp.
Comment 27 User image Katsuhiko Momoi 1999-09-17 07:15:59 PDT
** Compared 9/15/99 Win32 M11 build to 9/16/99 Win32 M11 build **

Beginning with the 9/16 build, I see &nbsp's in the Japanese HTML
mail body when I input ASCII spaces. The number of SPACES match
the number of NBSP's. Whereas in the 9/15 build, I see only spaces
in HTML (i.e. workaround) for the same ASCII space input.
I'll mark fix verified.

Note You need to log in before you can comment on or make changes to this bug.