Closed
Bug 23418
Opened 25 years ago
Closed 24 years ago
"File | save as file" has problems with Japanese messages saved as plain text
Categories
(MailNews Core :: Internationalization, defect, P2)
Tracking
(Not tracked)
VERIFIED
FIXED
M18
People
(Reporter: momoi, Assigned: nhottanscp)
References
Details
(Whiteboard: nsbeta3+, patch in hand, reviewed)
Attachments
(7 files)
** Observed with 1/6/99 Win32 build **
When a non-ASCII msg is saved as a file using the File | Save as menu,
the file is currently saved without any extension and "as is" whether
the option selected by the user is HTML or plain text.
Some users may be tempted to supply the .html extension themselves.
If so, the result will be a Unicode-encoded, somewhat deficient HTML
file. This needs to be corrected.
I guess HTML format should save as is -- JPN msg should be saved in
HTML format and in JIS.
We should also try supplying the .txt extension and see what happens.
Also, if there isn't one, we should file a separate bug for
the format option malfunction at the save as dialog window.
Reporter | ||
Comment 1•25 years ago
|
||
Assignee | ||
Updated•25 years ago
|
Assignee: nhotta → jefft
Assignee | ||
Comment 2•25 years ago
|
||
Reassign to jefft.
Save as text/html -> Currently saved as UTF-8 (wrong) - we need to convert UTF-8
to the mail charset (e.g. ISO-2022-JP).
Save as text/plain -> IQA needs to test the current behavior. The spec is to
convert the data to the platform file charset. Below is the code to get the platform
file charset.
#define NS_IMPL_IDS
#include "nsIPlatformCharset.h"
#undef NS_IMPL_IDS

// Ask the platform charset service which charset the file system uses.
nsCOMPtr<nsIPlatformCharset> platformCharset;
nsAutoString aPlatformCharset;
rv = nsComponentManager::CreateInstance(NS_PLATFORMCHARSET_PROGID, nsnull,
                                        NS_GET_IID(nsIPlatformCharset),
                                        getter_AddRefs(platformCharset));
if (NS_SUCCEEDED(rv))
{
  rv = platformCharset->GetCharset(kPlatformCharsetSel_FileName,
                                   aPlatformCharset);
}
Reporter | ||
Comment 3•25 years ago
|
||
Currently, when we save as .txt format, this is what happens:
1. Save as text without the user explicitly supplying the .txt extension
--> saves the whole msg, including vCard etc., in
ISO-2022-JP. This looks like the source data itself.
2. Save as text, supplying the .txt extension yourself
--> saves without vCard and other extra parts, but in ISO-2022-JP
rather than the expected Shift_JIS for JPN Windows.
Reporter | ||
Comment 4•25 years ago
|
||
Both case 1 and case 2 should save without extra parts and into Shift_JIS
rather than in ISO-2022-JP.
Comment 5•25 years ago
|
||
I have just added a new function in nsMsgI18N.h named msgCompFileSystemCharset
(oops, not a generic enough name, maybe we should rename it) that gives you back
the file system character set.
Reporter | ||
Comment 6•25 years ago
|
||
I would like to designate this as beta1 since "file | save as" is something
users do use and its UI should be non-confusing and
its resulting effects should be consistent and correct.
Keywords: beta1
Updated•25 years ago
|
Status: NEW → ASSIGNED
Updated•25 years ago
|
Whiteboard: [PDT+] → [PDT+] Land by: 2/11/00
Comment 9•25 years ago
|
||
Ok, I have most of this done for plain text, but I have a question on saving
messages as HTML. Why can't I emit a META tag header with UTF-8 as the output
for the saved HTML web page? With the current architecture, your suggestion of
going back to the original charset of the message is full of problems and I
don't think it will give us a very good result.
Comments?
- rhp
Reporter | ||
Comment 10•25 years ago
|
||
I think users will want to see the original data encoding.
On most platforms, you can then open such a file with
a text editor and see & edit the content without a problem.
That is not going to be the case if you save into UTF-8.
That and in the past we have been saving the original data
as is under HTML extension and I don't think we should break the
familiar behavior.
Comment 11•25 years ago
|
||
I understand your point and I think I have an idea how to fix this...but just
to clarify, I don't think this feature really works on 4.x. When you do a Save
As for a mail message in 4.x (HTML format), you basically get the raw contents
of the message dumped to the file with RFC822 headers, etc.
We'll do better than this for 5.x :-)
- rhp
Comment 12•25 years ago
|
||
Ok, I've tried to improve this. Is this all going to work 100%?
Probably not. For plain text, I am getting the output from libmime and saving
it as plain text in the charset of the system.
For HTML, I am trying to output the message in the original charset without any
conversion.
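The policy described above can be sketched as follows. This is only an illustration under stated assumptions: `SaveFormat` and `ChooseOutputCharset` are hypothetical names that do not appear in the Mozilla tree, and the charset names are passed in as plain strings rather than obtained from libmime or the platform charset service.

```cpp
#include <string>

// Hypothetical enum for illustration; not a real Mozilla type.
enum class SaveFormat { PlainText, Html };

// Returns the charset the saved file would be written in:
// plain text uses the file system charset, HTML keeps the message charset.
std::string ChooseOutputCharset(SaveFormat format,
                                const std::string& messageCharset,
                                const std::string& fileSystemCharset) {
  if (format == SaveFormat::PlainText) {
    return fileSystemCharset;  // e.g. Shift_JIS on Japanese Windows
  }
  return messageCharset;       // e.g. ISO-2022-JP for a Japanese mail
}
```

For a Japanese message on Japanese Windows, this would pick Shift_JIS for a plain text save and ISO-2022-JP for an HTML save.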
momoi san,
Now, there are a million combinations that can happen here and scenarios that
will probably break, but what I need you to do is help me with the major issues
and I'll debug from there.
- rhp
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 13•25 years ago
|
||
Hi, Rich. In the case of HTML msg, are you going to strip out
some headers and MIME structural material? Under 4.x, we didn't
do any of that and saved the original msg. In case we run into
complicated problems, can we go back to that simple msg
saving method for HTML?
Comment 14•25 years ago
|
||
Actually, the way this works is as follows:
- Save As: .EML (or no extension) - Saves the raw RFC822 message to a file
- Save As: .TXT - Does its best job at converting what you see in your message
display into a .TXT file (in the native charset of the system)
- Save As: .HTML - Again, it outputs HTML that will give you a display similar
to what you see in your 3 pane message display
Actually, Communicator 4.x "Save As" HTML doesn't really work. It simply saves
the raw RFC822 text into a file. So, in 5.x we will actually generate something
that is an HTML document without any RFC822 messages.
- rhp
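The extension-based behavior listed in comment 14 can be sketched as a small classifier. This is a minimal sketch, not the code in nsMessenger.cpp: `SaveKind` and `ClassifyByExtension` are hypothetical names, the comparison is assumed to be case-insensitive (the thread uses both ".TXT" and ".txt"), and treating unknown extensions as a raw save is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical enum for illustration; not a real Mozilla type.
enum class SaveKind { RawRfc822, PlainText, Html };

// Picks the save behavior from the file name's extension.
// No extension (or .eml) means the raw RFC822 message is saved.
SaveKind ClassifyByExtension(const std::string& fileName) {
  const std::string::size_type dot = fileName.find_last_of('.');
  if (dot == std::string::npos) {
    return SaveKind::RawRfc822;
  }
  std::string ext = fileName.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (ext == "txt") return SaveKind::PlainText;
  if (ext == "html") return SaveKind::Html;
  return SaveKind::RawRfc822;  // assumption: anything else is saved raw
}
```

Under these assumptions, "test.TXT" is saved as plain text, "msg.html" as HTML, and "msg" or "msg.eml" as the raw message.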
Reporter | ||
Comment 15•25 years ago
|
||
Great! Look forward to seeing the results in today's build.
Reporter | ||
Comment 16•25 years ago
|
||
I'm not done looking at this yet, but so far I have found a couple of problems.
1. Even though the Save As dialog shows possible extension types, it does not supply one
automatically unless the user writes it in. Otherwise it's all saved in .eml format.
2. When saving into .txt format, the headers (To:, CC:, Date:, etc.) and their content are
separated into different lines, e.g.
From:
Jane Banning
Comment 17•25 years ago
|
||
Momoi,
The separate line problem is in the HTML - TEXT converter. Table conversion is
pretty bad. Don't reopen this bug on that issue.
- rhp
Reporter | ||
Comment 18•25 years ago
|
||
Thanks. Browser save as has the same problems as you say for .txt format.
The charset conversion to system charset in .txt format seems to be working well.
Reporter | ||
Comment 19•25 years ago
|
||
Rich, sorry it's taking me longer to verify this fix.
I've found some definite problems:
1. We are saving the original MIME-encoded subject headers
in UTF-8 rather than in the original encoding indicated by the
MIME charset in the header when using the HTML save option.
2. Earlier .txt (when supplied by the user) files were generally
saved correctly -- that is what I reported. But now with this
build, all I see is the same data as .eml file. What happened
since then?
3. We don't automatically supply the extensions even though the
file | save as file dialog window clearly shows the
possible extension types. Users are used to the application
automatically supplying the extension in the dialog if none
is entered.
Re-opening for the above reasons... there may have been regressions
in the last few days also for .txt type.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 20•25 years ago
|
||
The build in which I saw the problems reported above was: Win32 2000021708.
Comment 21•25 years ago
|
||
Ok, well the REGRESSION that we are seeing (i.e. extensions are ignored) is a
bug that I think I found in nsString().
Rick: On line 517 of xpcom/ds/nsStr.cpp, there is a line:
const char* destLast = root+((aDest.mLength-1)*aDelta); //pts to last char in aDest (likely null)
but if you really want to point to the NULL character of this string, the "-1"
shouldn't be there. I changed the line to remove the -1 and it started working
again. With the -1, you never get a hit.
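The class of off-by-one described above can be illustrated with a standalone sketch. This is not the code from xpcom/ds/nsStr.cpp; `ScanRange`, `FindCharShortEnd` and `FindCharFullEnd` are hypothetical helpers that only show how an end pointer computed from `length - 1` excludes the final character from a forward scan.

```cpp
#include <cstddef>

// Scans the half-open range [str, end) for target.
// Returns the index of the first match, or -1 if there is none.
static long ScanRange(const char* str, const char* end, char target) {
  for (const char* p = str; p < end; ++p) {
    if (*p == target) return static_cast<long>(p - str);
  }
  return -1;
}

// End pointer placed AT the last character: that character is never examined.
long FindCharShortEnd(const char* str, std::size_t length, char target) {
  if (length == 0) return -1;
  return ScanRange(str, str + (length - 1), target);
}

// End pointer placed one past the last character, where the terminating
// null of a C string sits: every character is examined.
long FindCharFullEnd(const char* str, std::size_t length, char target) {
  return ScanRange(str, str + length, target);
}
```

Searching "abc" for 'c' succeeds only with the full end pointer, which is consistent with the "you never get a hit" symptom when the match involves the last character.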
The headers issue sounds like a bug I will tackle. Also, I will look at adding
the extensions for these files.
Technically, I'm supposed to be on vacation today, but I'll see what I can do.
- rhp
Status: REOPENED → ASSIGNED
Comment 22•25 years ago
|
||
Ok, I think I have a fix for the header conversion stuff, but I need Naoki's
help now.
I will attach a patch that you can apply from the same level as your /mozilla
directory. It fixes the headers being converted into UTF-8. Now, I have a
problem with the ConvertFromUnicode() call. I pass in a bunch of unicode data
(I think) and I want it converted, but for one of the Smoketest email messages,
the ConvertFromUnicode() routine seems to truncate after the text "Subject: ".
Can you help me with this one?
Other than that, I think this is close to being fixed.
- rhp
Whiteboard: [PDT+] Land by: 2/11/00 → [PDT+]
Comment 23•25 years ago
|
||
Comment 24•25 years ago
|
||
Assignee | ||
Comment 25•25 years ago
|
||
I had just started to rebuild the tree. I'll try the patch when it's done.
I looked at the patch. I have not followed this bug in detail, so the header is
supposed to be converted to the charset specified in MIME header instead of
UTF-8? I am not sure which part of the code fixed it.
Anyway, I'll test after the build is done.
Comment 26•25 years ago
|
||
Right, we should be saving the file as the original message charset. The thing
that fixed the problem was taking out the "header=quoting" line. This changed
the behavior in libmime.
If you can try to save that message off, you'll see the conversion problem.
Thanks for the help!
- rhp
Assignee | ||
Comment 27•25 years ago
|
||
Okay, my build completed. And I applied the patch and did nmake at
mailnews/base. But headers are still saved as UTF-8 (I saved as html).
I set a break point at the changed line as below but didn't hit.
--- 1417,1423 ----
ConvertBufToPlainText(m_msgBuffer);
rv = ConvertFromUnicode(msgCompFileSystemCharset(), m_msgBuffer,
&conBuf);
Comment 28•25 years ago
|
||
You won't hit that line unless you save as plain text. Make sure you are saving
the file as "test.TXT".
- rhp
Assignee | ||
Comment 29•25 years ago
|
||
I saved as .TXT (selected it from the popup and added the extension manually); that
works (saved correctly) but it didn't hit the break point.
For save as html, just above the break point I mentioned, it checks the value of
m_doCharsetConversion.
In my case that's 0 (false); do I need to change my environment?
BTW, I think the reason for the conversion failure you saw is probably that you are using
US Windows. In that case, the file system charset is windows-1252 and the converter
will fail to convert the Japanese text.
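A sketch of why a Western file system charset could cut the output after "Subject: ": a converter that stops at the first character it cannot represent returns only the leading ASCII part. The assumptions here are significant. `ConvertUntilFailure` is a hypothetical function, ISO-8859-1 (U+0000 to U+00FF) stands in for windows-1252, and stopping at the first failure is an assumed behavior, not a description of the real ConvertFromUnicode().

```cpp
#include <string>

// Converts UTF-16 text to a single-byte stand-in charset (ISO-8859-1).
// Stops at the first unit that does not fit in one byte and reports
// through *complete whether the whole input was converted.
std::string ConvertUntilFailure(const std::u16string& text, bool* complete) {
  std::string out;
  *complete = true;
  for (char16_t unit : text) {
    if (unit > 0xFF) {  // e.g. any kanji or kana
      *complete = false;
      break;
    }
    out.push_back(static_cast<char>(unit));
  }
  return out;
}
```

With a header such as "Subject: " followed by Japanese text, only "Subject: " survives and `*complete` is false.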
Updated•25 years ago
|
Whiteboard: [PDT+] → [PDT+] [2/19/00]
Comment 30•25 years ago
|
||
I've landed my change to nsStr which partially fixes this problem. RHP -- it's
up to you now.
Comment 31•25 years ago
|
||
Ok, I have this boiled down to an issue of "Save As" into HTML still converting
the headers to UTF-8. I know what I need to do so this shouldn't be too hard to fix.
- rhp
Comment 32•25 years ago
|
||
Ok, I have this fixed now and will get it into the tree when I get approval.
- rhp
Comment 33•25 years ago
|
||
Ok, this should be much better than it was now. - rhp
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Comment 34•25 years ago
|
||
Reopening this bug. I found a test case that broke it, but the good thing is
that I have it fixed and just need to get permission to checkin.
- rhp
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 35•25 years ago
|
||
Just updating the whiteboard. I have a pretty simple fix in hand.
- rhp
Whiteboard: [PDT+] [2/19/00] → [PDT+] [2/22/00]
Updated•25 years ago
|
Status: REOPENED → ASSIGNED
Comment 36•25 years ago
|
||
Ok, I think my latest checkins fixed this problem now. - rhp
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 37•25 years ago
|
||
I still see 2 problems with ** 2/24/2000 Win32 build **
1. Extensions need to be supplied to get the right results.
--> Maybe this should go to another bug.
2. When saving into .txt file format, it does not save into
the system charset. So this means that JPN mail msg is saved
in ISO-2022-JP rather than in Shift_JIS.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 38•25 years ago
|
||
I have the first problem fixed in my tree...let's not file a new bug on this.
On the second one, I save the plain text file in the following charset:
nsAutoCString(msgCompFileSystemCharset())
If this is wrong, I would have Naoki look at it.
I really want to get this bug resolved...I've spent a lot of time on this one
issue.
- rhp
Comment 39•25 years ago
|
||
I've tested this to the best of my abilities on my machine. I am converting
the Unicode text from the mail message into the charset I get from:
msgCompFileSystemCharset()
I can send you a recent diff to look at what is going into the tree eventually
to fix the extension request.
- rhp
Assignee: rhp → nhotta
Status: REOPENED → NEW
Assignee | ||
Comment 41•25 years ago
|
||
msgCompFileSystemCharset() returns Shift_JIS on my NT-J.
I debugged the conversion code; it's getting Shift_JIS as the conversion charset.
So it's supposed to convert from Unicode to Shift_JIS.
But the data it's getting is ISO-2022-JP in UCS2 format. The Shift_JIS converter
just removes the padded zeros from each character, the result we get is
ISO-2022-JP.
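The effect described in comment 41 can be reproduced with a small sketch: ISO-2022-JP is a 7-bit encoding, so if its bytes are merely zero-padded to 16 bits instead of being decoded, every unit looks like ASCII to the next converter and passes through unchanged. `WidenBytes` and `NarrowAsciiUnits` are hypothetical helpers. The second one is a toy stand-in for a Unicode to Shift_JIS converter that handles only the ASCII range and emits '?' for anything else.

```cpp
#include <string>

// Zero-pads each byte to a 16-bit unit WITHOUT decoding the charset.
std::u16string WidenBytes(const std::string& bytes) {
  std::u16string out;
  for (unsigned char b : bytes) {
    out.push_back(static_cast<char16_t>(b));
  }
  return out;
}

// Toy converter: ASCII-range units map to the same single byte.
// Non-ASCII units would need a real mapping table, so they become '?'.
std::string NarrowAsciiUnits(const std::u16string& units) {
  std::string out;
  for (char16_t u : units) {
    out.push_back(u < 0x80 ? static_cast<char>(u) : '?');
  }
  return out;
}
```

Feeding the ISO-2022-JP bytes `ESC $ B F | K \ ESC ( B` through both steps returns the same bytes, so the file ends up in ISO-2022-JP even though the target charset was Shift_JIS.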
Assignee | ||
Comment 42•25 years ago
|
||
On my machine it happens with nsMessenger.cpp 1.136 or later
but does not happen with 1.135 (at least for the body).
Assignee: nhotta → rhp
Status: ASSIGNED → NEW
Comment 43•25 years ago
|
||
will investigate.
Status: NEW → ASSIGNED
Whiteboard: [PDT+] [2/22/00] → [PDT+]
Comment 44•25 years ago
|
||
Ok, I think I have a fix for the Save As Text. To tell for sure, I'm going to
have to send 2 patches to Naoki for him to try.
Let me generate those and I will send it in email. This will also address the
extension issue.
- rhp
Comment 45•25 years ago
|
||
Ok, after working with naoki and momoi san, I have this fixed...really :-) I
will get this reviewed.
- rhp
Whiteboard: [PDT+] → [PDT+] CAN CHECKIN FIX ANY TIME
Updated•25 years ago
|
Summary: "File | save as file" saves in Unicode when user supplies html extension → [FIXED] "File | save as file" saves in Unicode when user supplies html extension
Comment 46•25 years ago
|
||
Ok, this one should be fixed once and for all :-)
- rhp
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 25 years ago
Resolution: --- → FIXED
Summary: [FIXED] "File | save as file" saves in Unicode when user supplies html extension → "File | save as file" saves in Unicode when user supplies html extension
Reporter | ||
Comment 47•25 years ago
|
||
** Checked with 3/7/2000 Win32 build **
Saving as into .eml, .html, and .txt is now generally working.
You can let the dialog supply an extension or supply one yourself.
Either way, this works.
I have not seen any problem with saving into .eml and .html files.
There is a problem, however, when saving into a .txt file, particularly with
Japanese (ISO-2022-JP) messages. The data is cut off in the process
of saving. I'll append 2 images showing one such example.
I'll also upload a test msg file showing 4 JPN msgs + 1 UTF-8 (JPN)
test message.
It's hard to predict where the cut off occurs saving into .txt
format. But in all 4 messages, it does. I've seen something similar
before in processing ISO-2022-JP and mistaking some bytes as
HTML tags or special characters. It also does not look like we
handle ASCII well saving into .txt format. You get a "?" symbol
in one of the msgs. The cut off may be occurring for more than
one reason.
I'm re-opening this...
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 48•25 years ago
|
||
Reporter | ||
Comment 49•25 years ago
|
||
Reporter | ||
Comment 50•25 years ago
|
||
The image above is a relatively benign case of a few characters at the
end being cut off. The 5 test messages show much more severe
truncation of one kind or another.
The test file is attached below.
Reporter | ||
Comment 51•25 years ago
|
||
Reporter | ||
Comment 52•25 years ago
|
||
Put the mailbox file into your Local folder and
save each one into .txt format under Japanese Windows.
Well, this could be a problem for rhp.
Naoki, can you help with debugging this problem?
Reporter | ||
Comment 53•25 years ago
|
||
I'm going to re-do the quoted portion below because some crucial words
have been omitted due to my poor typing.
"It's hard to predict where the cut off occurs saving into .txt
format. But in all 4 messages, it does. I've seen something similar
before in processing ISO-2022-JP and mistaking some bytes as
HTML tags or special characters. It also does not look like we
handle ASCII well saving into .txt format. You get a "?" symbol
in one of the msgs. The cut off may be occurring for more than
one reason."
should read:
"It's hard to predict where the cut off occurs saving into .txt
format. But in all *5* messages, it does. I've seen something similar
before in processing ISO-2022-JP and mistaking some bytes as
HTML tags or special characters. It also does not look like we
handle ASCII *space* well saving into .txt format. You get a "?" symbol
in one of the msgs. The cut off may be occurring for more than
one reason."
I've indicated addition of 2 words with * *.
Comment 54•25 years ago
|
||
I'll investigate, but I've put so much time into this already, I need to focus
on a bunch of other issues. Sorry, this is good enough for beta. I will change
the summary, clear the whiteboard, etc.. and work on it later.
- rhp
Status: REOPENED → ASSIGNED
Keywords: beta1
Summary: "File | save as file" saves in Unicode when user supplies html extension → "File | save as file" has problems with Japanese messages saved as plain text
Whiteboard: [PDT+] CAN CHECKIN FIX ANY TIME
Target Milestone: M14 → M17
Comment 55•25 years ago
|
||
*** Bug 39357 has been marked as a duplicate of this bug. ***
Updated•25 years ago
|
Target Milestone: M17 → M18
Comment 56•24 years ago
|
||
I don't seem to be getting cutoff files when saved. Can you retest this?
- rhp
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 57•24 years ago
|
||
Sorry. Somehow this escaped my attention for a while.
I looked at this problem with the 9/8/2000 Win32 build.
The same problem still exists.
The body text: XXXPlain text
is saved as: XXXPlain te
thus missing the last 2 letters.
I tried a number of Japanese messages both HTML and plain text
type, they all suffer from this problem.
I'm re-opening this bug. I think you need to have a Japanese
Windows system to see this problem clearly because it involves
saving into Shift_JIS. I don't see this problem when saving
ASCII mail.
Naoki, please take a look at this. There seems to be a problem
in converting to Shift_JIS.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 58•24 years ago
|
||
Naoki,
If you can see if this is a conversion error, that would be great.
- rhp
Assignee: rhp → nhotta
Status: REOPENED → NEW
Assignee | ||
Comment 59•24 years ago
|
||
I can reproduce this when saving as a text file but not as html.
It always cuts at the bottom of the file, so I assume there is something wrong in
the length calculation.
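One hypothetical way a length miscalculation can drop the tail of a file: if the number of bytes written is taken from the count of UTF-16 units instead of the size of the converted buffer, each two-byte character costs one byte at the end. The thread does not say what the actual fix was, so this is only an illustration. `EncodeToy` is NOT a real Shift_JIS encoder; it just emits one byte for ASCII and two bytes for everything else.

```cpp
#include <cstddef>
#include <string>

// Toy multi-byte encoder (not real Shift_JIS): ASCII takes one byte,
// any other UTF-16 unit takes two bytes.
std::string EncodeToy(const std::u16string& text) {
  std::string out;
  for (char16_t u : text) {
    if (u < 0x80) {
      out.push_back(static_cast<char>(u));
    } else {
      out.push_back(static_cast<char>(u >> 8));
      out.push_back(static_cast<char>(u & 0xFF));
    }
  }
  return out;
}

// Buggy: uses the number of UTF-16 units as the byte count to write.
std::string BytesWrittenWithInputLength(const std::u16string& text) {
  const std::string encoded = EncodeToy(text);
  return encoded.substr(0, text.size());
}

// Correct: writes the full converted buffer.
std::string BytesWrittenWithEncodedLength(const std::u16string& text) {
  return EncodeToy(text);
}
```

For two Japanese characters followed by "Plain text", the buggy variant writes 12 of the 14 encoded bytes and the saved text ends in "Plain te".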
Reporter | ||
Comment 60•24 years ago
|
||
Saving into a plain text file is problematic whether the message in
question is HTML or plain text type.
Of all the saving options, this one is probably used the most.
Nominating for nsbeta 3.
Keywords: nsbeta3
Assignee | ||
Comment 63•24 years ago
|
||
Updated•24 years ago
|
Whiteboard: nsbeta3+ → nsbeta3+, patch in hand need review
Assignee | ||
Updated•24 years ago
|
Whiteboard: nsbeta3+, patch in hand need review → nsbeta3+, patch in hand, reviewed
Assignee | ||
Comment 64•24 years ago
|
||
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
Comment 65•24 years ago
|
||
Verified with win32 2000091909 and linux 2000091906 build. It's fixed.
Status: RESOLVED → VERIFIED
Updated•20 years ago
|
Product: MailNews → Core
Updated•16 years ago
|
Product: Core → MailNews Core