Closed Bug 1521007 Opened 5 years ago Closed 5 years ago

Sending documents from MS Word can break document names

Categories

(MailNews Core :: Simple MAPI, defect)

defect
Not set
normal

Tracking

(thunderbird_esr6065+ fixed, thunderbird65 fixed, thunderbird66 fixed)

RESOLVED FIXED
Thunderbird 66.0
Tracking Status
thunderbird_esr60 65+ fixed
thunderbird65 --- fixed
thunderbird66 --- fixed

People

(Reporter: mikekaganski, Assigned: mikekaganski)

References

Details

Attachments

(4 files, 1 obsolete file)

Attached patch CP_UTF8.diff (obsolete) — Splinter Review

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36

Steps to reproduce:

Sending documents from MS Word on Windows can make the documents' names unreadable in the composed email, when the current language for non-Unicode applications (ACP) is not UTF-8.

Word uses previously-undocumented feature of MapiMessage struct [1] to take a special value of ulReserved equal to CP_UTF8, when it means "strings are UTF-8-encoded instead of ACP-encoded". Word does not use MAPISendMessageW (which support is being added in bug 1048658, taking UTF-16 strings), even when it is present.

To resolve the issue, TB needs to support this possible special value.

Steps:

  1. Open attachment 8957565 [details] from bug 1048658 in MS Word (tested with Word 2016) on Windows. Its name contains an English, a Cyrillic, and a Greel character, and is only representable in ACP if it is set to UTF-8 (beta function in Windows 10 at the moment); for testing, non-UTF-8 ACP is required.
  2. Use its "Send by email" feature (for Word 2016, it's File->Share->Email->Send as Attachment).

Actual results:

The composer window with the document appears, where subject, and the attached file are filled. In both, the file name if broken, and is not "aбγ.odt" as it should be.

Expected results:

The file name must have been "aбγ.odt" in both Subject, and Attachments fields.

The attachment is a proposed patch. Unfortunately, I couldn't test the code that handles addresses; a Mail Merge test is required.

[1] https://docs.microsoft.com/ru-ru/windows/desktop/api/mapi/ns-mapi-mapimessage

See Also: → 1048658
Attachment #9037465 - Attachment is patch: true
Attachment #9037465 - Attachment mime type: text/x-patch → text/plain

Hmm, another bug with no audience ;-)

I'll rebase this for bug 1048658 an get it landed, thanks!

Rebased for bug 1048658. I'll comment in a moment.

Attachment #9037465 - Attachment is obsolete: true
Assignee: nobody → mikekaganski
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 9037532 [details] [diff] [review]
1521007-CP_UTF8.patch

Review of attachment 9037532 [details] [diff] [review]:
-----------------------------------------------------------------

Great, we continue cleaning up the mess ;-)

::: mailnews/mapi/mapihook/src/msgMapiHook.cpp
@@ +577,5 @@
>    if (aMessage->lpOriginator)
>    {
>      nsAutoString From;
> +    if (!bUTF8)
> +        From.Append(NS_ConvertASCIItoUTF16(aMessage->lpOriginator->lpszAddress));

Was this NS_ConvertASCIItoUTF16() correct to start with? Should that have been NS_CopyNativeToUnicode() like elsewhere?
Also three times further down for To, CC, Bcc.
Attachment #9037532 - Flags: review+
Comment on attachment 9037532 [details] [diff] [review]
1521007-CP_UTF8.patch

Review of attachment 9037532 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/mapi/mapihook/src/msgMapiHook.cpp
@@ +577,5 @@
>    if (aMessage->lpOriginator)
>    {
>      nsAutoString From;
> +    if (!bUTF8)
> +        From.Append(NS_ConvertASCIItoUTF16(aMessage->lpOriginator->lpszAddress));

Well - as I mentioned, I have been unable to check this in action... *possibly* this is expected to come Web-encoded and be actually ASCII? then this "addresses" section would not require special handling wrt CP_UTF8. The Microsoft documentation doesn't mention that this flag affects anything except "lpszSubject, lpszNoteText, lpszMessageType, lpszDateReceived, lpszConversationID", while I *know* (because I see that) that at least file descriptions' file names are affected (but not sure about file paths; would need to setup a temp path in a Unicode-named directory to try to make sure; for now I left filepath unchanged)...

FWIW, I have left a feedback on the MS documentation page to inform them about missing documentation bits - who knows, nowadays they are quite open.

(In reply to Mike Kaganski from comment #4)

possibly this is expected to come Web-encoded and be actually ASCII?
Hey, you're saying that MS would actually follow the international standard of RFC 6532 that allows raw UTF-8 in headers but nothing else? Sadly I've seen too many bugs where people complain about windows-1252 headers that don't work.

Anyway, we can go with this for now and follow-up later. Now over to bug 594239 or does the patch not need rebasing there? Since we're landing two MAPI bugs today, we can leave that one for another day ;-)

For the record: I duplicated my question at https://social.msdn.microsoft.com/Forums/en-US/4d029672-e1e6-4d79-9b13-d027e79bdf6e, so possibly some information will appear later.

I'm OK with proceeding to bug 594239 :-) - are you going to rebase it, or should I?

You could, if you woudn't mind. On changed lines, it would be good to fix any white-space sins, and to change the indentation to two spaces as I've done in the SendMailW bug.

Of course! I'll do it after this change gets merged, is that OK? (It's easier for me to update then continue the patch.)

Of course :-)

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/41d0ee56c665
Support UTF-8 encoded MAPI data to avoid broken document names when sending documents from MS Word. r=jorgk

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 66.0
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e87ac6b1e47e
Follow-up: fix variable name of UTF-8 indicator. r=me DONTBUILD

Sorry, I had to interfere with this a little more. We use 'a' for call parameters/arguments, I've never seen a 'b' prefix.

Looking at the code again, I came across this:
pFile->InitWithNativePath(nsDependentCString((const char*)aFiles[i].lpszPathName));

What happens if the data is UTF-8? The pathname won't be, just the file name? Maybe we need to use InitWithPath instead? I guess you could try on a system with non-UTF-8 system locale and a folder of テスト using MS Word (which I don't have). In case we need to change this, just attach a follow-up patch here.

(In reply to Jorg K (GMT+1) from comment #13)

Sorry, I had to interfere with this a little more. We use 'a' for call parameters/arguments, I've never seen a 'b' prefix.

Sorry - it's my current habit (a LibreOffice codebase convention for bool params; excuse me).

Looking at the code again, I came across this:
pFile->InitWithNativePath(nsDependentCString((const char*)aFiles[i].lpszPathName));

What happens if the data is UTF-8? The pathname won't be, just the file name? Maybe we need to use InitWithPath instead?

:-) just what I described in comment 4.

(In reply to Mike Kaganski from comment #14)

:-) just what I described in comment 4.

Sorry, I'm unable to memorise all the ~200 bugmails I read/glance at every day (Friday: "only" 160, Thursday 203) :-(

So can you simply create a folder called テスト ?

(In reply to Jorg K (GMT+1) from comment #13)

Looking at the code again, I came across this:
pFile->InitWithNativePath(nsDependentCString((const char*)aFiles[i].lpszPathName));

What happens if the data is UTF-8? The pathname won't be, just the file name? Maybe we need to use InitWithPath instead? I guess you could try on a system with non-UTF-8 system locale and a folder of テスト using MS Word (which I don't have).

(In reply to Jorg K (GMT+1) from comment #15)

So can you simply create a folder called テスト ?

Word prepares a temp file for sending, with the name like

C:\Users\mikek\AppData\Local\Microsoft\Windows\INetCache\Content.Word~WRD0002.docx.docx

so I'll try first setting LocalAppData to such a directory; if this won't work, I'll create a user with such a funny name.

Attached patch PathUTF8.diffSplinter Review

(In reply to Jorg K (GMT+1) from comment #13)

You were absolutely right.
I have tested with a user conveniently named テストабв, and was able to see the UTF-8 paths arriving from Word. Here's the patch addressing that.

The only thing left (to a follow-up patch) is to add a DWORD registry value "SupportUTF8" to HKLM\SOFTWARE\Clients\Mail\Mozilla Thunderbird, which I discovered by reading source code of MAPISendMailHelper (see [1] for details). I haven't yet found where the relevant registry value (DLLPath) is set.

[1] https://social.msdn.microsoft.com/Forums/en-US/4d029672-e1e6-4d79-9b13-d027e79bdf6e/what-is-the-true-scope-of-cputf8-in-mapimessageulreserved

I tried to find the place where the registry settings are written. I supposed that's mail/installer/windows/nsis/shared.nsh (or suite/installer/windows/nsis/shared.nsh) and macro SetClientsMail in them, and that gets into dist\bin\uninstall\helper.exe; but adding WriteRegDWORD HKLM "$0" "SupportUTF8" 1 after WriteRegStr HKLM "$0" "DLLPath" "$6", and running resulting uninstall\helper.exe there does not add SupportUTF8 reg key.

What am I missing?

Comment on attachment 9037756 [details] [diff] [review]
PathUTF8.diff

Excellent :-) - BTW, no spaces between function name and parameters.

As for the installer, I have no idea, last I looked I was searching for 00B0D0F3BAA7, but that found the same file you found. I think it's material for a follow-up bug and we can ask the M-C installer experts.
Attachment #9037756 - Flags: review+
Attached patch SupportUTF8.diffSplinter Review

Oh I found my mistake! It did it right all the time; just I checked 32-bit registry hive when it wrote to 64-bit hive.

So this patch adds the registry setting, so that clients could check that TB supports passing UTF-8 strings when using MAPISendMail API.

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/609c28fdc2a7
Follow-up: also check for UTF-8 when copying file path. r=jorgk DONTBUILD

(In reply to Jorg K (GMT+1) from comment #19)

BTW, no spaces between function name and parameters.

Sorry for that - and it was my misunderstanding: I seem to have read your review from bug 393302, where I )mis)read it as if you required the space to be present. So I started to add it now, doing thing that I personally dislike (I don;t like those spaces!) because of that. Excuse me.

Comment on attachment 9037773 [details] [diff] [review]
SupportUTF8.diff

OK, let's finish it here then.
Attachment #9037773 - Flags: review+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/1948a0ff606a
Follow-up: Set SupportUTF8 registry key to indicate that Thunderbird is supporting UTF-8 at the MAPISendMail API. r=jorgk DONTBUILD
Comment on attachment 9037532 [details] [diff] [review]
1521007-CP_UTF8.patch

Approval for all four landed changesets.
Attachment #9037532 - Flags: approval-comm-esr60+
Attachment #9037532 - Flags: approval-comm-beta+
Attachment #9037532 - Flags: approval-comm-esr60+
Attachment #9037532 - Flags: approval-comm-beta+
Attachment #9038002 - Flags: approval-comm-beta+
Component: Untriaged → Simple MAPI
Product: Thunderbird → MailNews Core
Depends on: 1531724
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: