Closed Bug 1686034 Opened 4 years ago Closed 4 years ago

Saving/dragging a message from gmail IMAP folder with accent/diacritic (e.g. Envoyés = Sent folder) fails with uncaught exceptions

Categories

(MailNews Core :: Networking: IMAP, defect, P2)

Thunderbird 84

Tracking

(thunderbird_esr78 unaffected, thunderbird85 affected, thunderbird86 affected)

RESOLVED FIXED
86 Branch
Tracking Status
thunderbird_esr78 --- unaffected
thunderbird85 --- affected
thunderbird86 --- affected

People

(Reporter: math.clavel, Assigned: rnons)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files, 6 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:84.0) Gecko/20100101 Firefox/84.0

Steps to reproduce:

I’m using Thunderbird with a French gmail account.
French is using diacritics like é, à, ç_
In French, the gmail "sent" folder is called "Envoyés".

I have noticed the problem on my install. To reproduce, I have used portable version of Thunderbird from portableapps and only added a gmail account with default options.

The problem is not in version 78.6.0, but at least from 84.0_Beta_1 to 85.0b3.
I have reproduced it on English and French version of T.

In a gmail imap folder with a name containing a "é", right-click on a message and select save to disk.

Actual results:

The save windows isn’t displayed.

Expected results:

The save windows should be displayed.

Bonjour Mathieu, merci beaucoup! You are quite right, this is happening and there's a bug somewhere between Thunderbird and gmail!
Given that this affects the localized French version of "Sent" folder (Envoyé), imo this is pretty serious and should be fixed asap, although some important functionality like reply and forward is still functional.

Reproduced on Daily 86.0a1 (2021-01-11) (64-bit)

STR

1.) gmail IMAP account in TB (does not reproduce e.g. on gmx.de)
2.) In that account, create a TB account subfolder / gmail label with an accent/diacritic, e.g. "testé" (location of the folder doesn't matter)
3.) Clear error console
4.) move a random message into that folder (simple saved draft "Hello world" will suffice; I also tested a received message)
5.) Select the message so that it gets displayed in message reader
6.) Ctrl+S to save the selected message / try drag n drop
7.) For comparison, try Reply/Forward

Actual result

  • Step 5: displaying the message in folder with accent (testé) triggers an error:

    Uncaught Exception { name: "NS_ERROR_ILLEGAL_VALUE", message: "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIMessenger.msgHdrFromURI]", result: 2147942487, filename: "chrome://messenger/content/msgHdrView.js", lineNumber: 796, columnNumber: 0, data: null, stack: "onEndMsgDownload@chrome://messenger/content/msgHdrView.js:796:33\n", location: XPCWrappedNative_NoHelper }
    msgHdrView.js:796

  • Step 6: trying to save or drag the message fails with error (does nothing):

    Uncaught Exception { name: "NS_ERROR_ILLEGAL_VALUE", message: "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIMsgMessageService.messageURIToMsgHdr]", result: 2147942487, filename: "chrome://messenger/content/mailCommands.js", lineNumber: 411, columnNumber: 0, data: null, stack: "SaveAsFile@chrome://messenger/content/mailCommands.js:411:55\nMsgSaveAsFile@chrome://messenger/content/mailWindowOverlay.js:2228:13\ndoCommand@chrome://messenger/content/mail3PaneWindowCommands.js:923:9\ndoCommand@chrome://messenger/content/mailTabs.js:842:23\ndoCommand@chrome://messenger/content/tabmail.js:577:27\ngoDoCommand@chrome://global/content/globalOverlay.js:101:18\noncommand@chrome://messenger/content/messenger.xhtml:1:12\n", location: XPCWrappedNative_NoHelper }
    mailCommands.js:411

    Thereafter, there'll be some peripheral errors like failure of cmd_saveAsFile. Drag attempt fails with a slightly different error but apparently same cause.

  • Step 7: Reply / Forward fortunately works without further errors.

Expected Result:

  • Saving or dragging message should succeed and not cause uncaught exceptions.
Status: UNCONFIRMED → NEW
Component: Untriaged → Networking: IMAP
Ever confirmed: true
Product: Thunderbird → MailNews Core
Severity: -- → S2
Priority: -- → P2
Summary: Saving a message to disk from a gmail imap folder containing a diacritic is not working → Saving/dragging a message from gmail IMAP folder with accent/diacritic (e.g. Envoyés = Sent folder) fails with uncaught exceptions

I suspect this is regressed by bug 1571672.

Regressed by: 1571672
Flags: needinfo?(remotenonsense)
No longer regressed by: 1571672

Ah, or maybe you're right about bug 1571672 since this bug pre-dates that. But bug 1685450 is reported for nightly, but that's before bug 1685033. landed.

Bug 1685033 certainly could have had an impact though, so is this bug reproducible on current nightly?

(In reply to Magnus Melin [:mkmelin] from comment #4)

Actually, I very much suspect https://hg.mozilla.org/comm-central/rev/7b6317fb00899a2ded12809a019dd9686e35a61e - bug 1685033. Can you check?

Bug 1685033 changed two lines, I tried reverting the change to nsIMsgFolder.idl . Can still reproduce.

Yes, this bug is reproducible on current nightly.

What I found is https://searchfox.org/comm-central/rev/76c81fad320aa4ab20135ac507c65c9316044d4a/mailnews/imap/src/nsImapMailFolder.cpp#861,883 is different from https://searchfox.org/comm-esr78/rev/95877c623c116297fcdc29d313b22553d7d6a62a/mailnews/imap/src/nsImapMailFolder.cpp#866,888. With nightly, a msg folder named testé will be saved as a mailbox file named testé on disk. But with esr78, the mailbox file name will be test&AOk-.

Flags: needinfo?(remotenonsense)

Thanks, it would be from bug 1571672 then.

Any message in a folder with a name containing multi-byte UTF8 on a server supporting UTF8=ACCEPT won't save a message and won't show source. Turning off utf8 accept in config editor fixes the problem. This is something I never tested when implementing UTF8=ACCEPT in bug 1571672. I'll check into it deeper but I guess a workaround is to set mail.server.default.allow_utf8_accept false and restart tb. This will re-discover the folders and bring back the non-UTF8 encoded names and then works OK. (I did see a server error pop up on restart but it didn't recur and I could save messages again.)

When upgrading from TB 78.6x86 to TB 85.0b3x64, the folder "Envoyés" from Gmail account was duplicated in the interface.
It wasn't on my computer at the time so I can't be sure, but a new one with all the previously sent messages was created.
The old one was emptied (by sync I think) and newly sent messages were copied in it by TB as configured.
I didn't try to save messages from that folder to disk before removing it.

The problem is since bug 1571672, mailbox file can be non-ASCII, as a result msg uri is non-ASCII. But the functions for querying msg from an URI still only accepts char*.

Main changes are three idl files:

  • nsIMsgMailNewsUrl.idl: to fix msgHdr
  • nsIMsgMailSeesion.idl: to fix view source
  • nsIMsgMessageService.idl: to fix save as

Other changes are to make it compile.

Assignee: nobody → remotenonsense
Status: NEW → ASSIGNED

I tried the patch and copy via drag/drop and view source now work! However, when I save the message I'm getting a server error like this:

The current operation on 'gds-<?><?>-envoy<?>s' did not succeed. The mail server for account w4k9vws@gmail.com responded: [NONEXISTENT] Unknown Mailbox: gds-<?><?>-envoy<?>s (Failure).

Where <?> is the "replacement" UTF8 char (a ? inside a black diamond) I think U+FFFD doesn't print in bugilla, sorry.

The mailbox name is actually gds-ää-envoyés.

Before the patch, neither copy, view source nor save-as worked at all.

Is is a good idea to change URIs to UTF-8?

-  readonly attribute ACString URI;
+  readonly attribute AUTF8String URI;

Shouldn't the non-ASCII string in the folder name not just be escaped in the URL (NS_MsgEscapeEncodeURLPath)?
https://searchfox.org/comm-central/search?q=NS_MsgEscapeEncodeURLPath&path=&case=false&regexp=false

How does a local folder with a non-ASCII character get encoded in the mailbox: URL? Have you looked how/why it works when saving from a local folder? Try with one where the name can be encoded in your local ANSI code page (like windows-1252), like "Envoyé" and one where it can't, like "RÓŻNE". Note that TB does rename replacement, for example "RÓŻNE" is 79357a77 on my system. Your system locale shouldn't be UTF-8 (on Windows 10) for these experiments. BTW, the funny hashing up is done here: https://searchfox.org/comm-central/search?q=ConvertibleToNative&redirect=false

A further nit in the patch is:

   nsAutoCString urlString;
   if (NS_SUCCEEDED(tURI->GetSpec(urlString))) {
-    *aURL = ToNewCString(urlString);
-    NS_ENSURE_ARG_POINTER(aURL);
+    aURL = urlString;
   }
   return rv;

You don't need urlString at all any more, you can get the spec directly into aURL. There are more places in the patch like this.

More remarks:

  • test&AOk- is the MUTF-7 encoding of testé.
  • I don't understand the commit message: The problem is since bug 1571672, mailbox file can be non-ASCII, as a result msg uri is non-ASCII.

(Local) mailbox files could always be non-ASCII and IMAP folder names are still encoded in MUTF-7, aren't they? Bug 1571672 added support for RFC 6855, but that's not related to folder names.

I think you should work out where it's going wrong exactly and that fix that. There also seem to be remaining issues looking at comment #11. Additionally, this code here makes no sense:

if (aURI.IsEmpty()) return NS_ERROR_OUT_OF_MEMORY;

Also, this should use .get():

rv = DecomposeMailboxURI(ToNewCString(uri), getter_AddRefs(folder), &msgKey);

OK, I got that part wrong, seems like folder names are in UTF-8 when mail.server.default.allow_utf8_accept is switched on. So instead of
imap://klaus%2Ebartosch%40gmail%2Ecom@imap.gmail.com:993/fetch%3EUID%3E/INBOX/test%26AOk-%3E1
we get
imap://klaus%2Ebartosch%40gmail%2Ecom@imap.gmail.com:993/fetch%3EUID%3E/INBOX/test%C3%A9%3E1
So the folder name testé is escaped in UTF-8.

The debug from comment #14 was from nsMsgMailNewsUrl::CreateURL(). In nsImapService::MessageURIToMsgHdr() the URL is not escaped and we have
imap-message://klaus.bartosch%40gmail.com@imap.gmail.com/INBOX/testé#1
That's with Ping's patch. A little sad that the handling is so inconsistent.

To answer my own question, for mailbox: URLs the escaping is more consistent, in nsMailboxService::MessageURIToMsgHdr() we see
mailbox-message://nobody@Local%20Folders/test%C3%A9#1

Looks like Ping's patch adds the change in nsIMsgMailNewsUrl.idl was just "missed" in bug 1571672 given that things were already moved to UTF-8 here:
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l2.12
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l4.12
and elsewhere in the patch. I'm not sure those changes are correct, maybe proper escaping would have done the trick.

Attached patch debug-imap.patch (obsolete) — Splinter Review

Just a bit of debug.

With the patch, here's what I see in the log when I try to save-as a message. (Again the <?> is invalid "replacement" utf8 chars that bugzilla markup refuses to show.)

2021-01-14 21:39:51.793333 UTC - [Parent 24637: IMAP]: I/IMAP 0x7efd19fa0000:imap.gmail.com:A:SendData: 6 select "gds-<?><?>-envoy<?>s"
2021-01-14 21:39:51.845819 UTC - [Parent 24637: IMAP]: D/IMAP ReadNextLine [rv=0x0 stream=0x7efd2c6d1710 nb=68 needmore=0]
2021-01-14 21:39:51.845834 UTC - [Parent 24637: IMAP]: I/IMAP 0x7efd19fa0000:imap.gmail.com:A:CreateNewLineFromSocket: 6 NO [NONEXISTENT] Unknown Mailbox: gds-<?><?>-envoy<?>s (Failure)

The imap select that occurs when the folder is just selected has the proper mailbox name and works OK with or without the patch:

2021-01-14 21:39:23.323068 UTC - [Parent 24637: IMAP]: I/IMAP 0x7efd1a18a800:imap.gmail.com:A:SendData: 11 select "gds-ää-envoyés"

Looks like Ping's patch adds the change in nsIMsgMailNewsUrl.idl was just "missed" in bug 1571672 given that things were already moved to UTF-8

Yes, I don't see this as a regression but a feature I failed to test for and so it got left out.

Hmm, the more important part of the comment was: "I'm not sure those changes are correct, maybe proper escaping would have done the trick" given that mailbox: URLs are always ASCII even if they contain non-ASCII characters which are escaped (mailbox-message://nobody@Local%20Folders/test%C3%A9#1). I left a comment in bug 1571672 which has a few more issues. Overall, it would be good if all the URL schemes followed the same rules. With Ping's patch you further cement-in the diversion: Escaping in mailbox: URLs and UTF-8 in imap: URLs. Maybe the reversal of some of the changes in bug 1571672 and escaping the imap: URL would be the better fix here.

Let's say it another way: mailbox: URLs could always contain non-ASCII characters, that's why they did proper escaping throughout. IMAP folders were always in MUTF-7 (which uses 7bit (ASCII) characters). So there proper escaping wasn't necessary. So instead of adding it in bug 1571672, everything was moved to UTF-8 and a few bits were missed. However, for consistency, escaping would potentially have been the better way.

Attached patch fix-save-as.diff (obsolete) — Splinter Review

Ignoring Klaus B's issues for now, this fixes the save-as problem I mentioned in comment 11. Whoever wrote the original code seemed a bit uncertain if the encoding is UTF8 and it turns out it's now not. It is more like single byte ISO-<whatever> where the character is encoded in a single non-ascii byte like octal 344 in this example:

Alt-d   \344   228   E4  U+00E4  &auml;     ä  small a, umlaut mark

The changes I made here were actually in my obsolete patches from when I added UTF8=ACCEPT but, for unknown reason, I took them out. (Probably because I didn't test save-as, copy and view source on a UTF8 named folder so I missed the effect this has.)

I tested this on top of Ping's patch so I assume it is also needed to fix all the problems.
P/S: This probably only works for Western chars that are encoded in a single byte. Not sure about other sets, but probably not.

I agree escaping is a possible fix, but one downside is having to decide when to escape and when to unescape.

Gene, I tried creating a folder named gds-ää-envoyés, seems to work fine for me.

Ping, I don't think the problem is creating the folder. Your patch fixed copying to another folder in the account and viewing message source. But it fail when I tried to do a save-as. With my patch with yours, save-as works, at least for the folder named gds-ää-envoyés. I was going to test with other names having non-western chars.

Well, I tried moving a msg into the folder, and view source, save-as worked fine.

Target Milestone: --- → 86 Branch

Comment on attachment 9197260 [details] [diff] [review]
fix-save-as.diff

This is another band-aid on top of the confusion of which encoding you're dealing with. Looks like we have three encodings: UTF-8, MUTF-7 and also some ANSI/ISO/windows-* encoding. The latter can come from the operating system encoding of file names. On Windows, it uses the local ANSI code page, so windows-1252 for Western systems, and many others for Eastern European systems.

The replacement character <?> happens, when such a "native" ANSI string is interpreted as UTF-8. The results depend on the OS settings, for example which system locale is selected on Windows. Windows 10 also supports UTF-8 as system locale in which case you won't see the issue. Mac generally uses UTF-8 for filenames and on Linux, it's anyone's guess, it's just a byte string that can be interpreted. Looking at bug 1686415 comment #0 you've managed to confuse Debian's file manager by writing a file name it can't interpret ("Messages envoys.msf (codage non valide)"), likely an ANSI/ISO/windows-* character where UTF-8 was expected.

So instead of spot-fixing things what may or may not fail on a particular system, IMHO, you should first define which charset is used internally. If you then interface with the file system, you need to use NS_CopyNativeToUnicode() and friends.

As stated before, mailbox: URLs are "escaped" (percent-encoded) to the very end, no UTF-8 is pushed into them.

The way forward here is to carefully inspect all the charset handling in from bug 1571672 and fix it. You also need to decide whether UTF-8 is really needed in those URLs.

Please don't check this in with all the problems mentioned in comment #12 and comment #13.

Attached image 1686034.png

Well, I tried moving a msg into the folder, and view source, save-as worked fine.

That's possible, but there are so many encoding issues here that depend on parameters which can be different between you and Gene: OS, which charset does the OS use for file names, what is the folder charset, etc. Please refer to comment #24 for a list of issues. Sure, some scenarios will be fixed by your patch. I'm on Windows with system locale windows-1252 and "Save as" from an IMAP folder called testé results in this. Which OS and system locale are you using?

These are fixes for the issues I mentioned.
Also fixes https://hg.mozilla.org/comm-central/rev/0054ea73e658#l16.149 from bug 1571672.

Altogether I don't think pushing UTF-8 here is the right way to go, but if you do, you should fix the string processing at the very least. All this needs a (big) follow-up to address the charset processing.

Attachment #9197277 - Flags: review?(mkmelin+mozilla)
Attachment #9197277 - Flags: feedback?(remotenonsense)

Holding off on the checkin-needed-tb for now until the discussion is done.

See Also: → 1686415

I've looked at this for a while and debugged it a bit. For making saving a message work, you will likely need to convert nsMessenger::SaveAs(const nsACString& aURI, ...) and underlying SaveMessageToDisk(). This will ripple on badly, just look at all the nsACString URI parameters everywhere:
https://searchfox.org/comm-central/search?q=ACString.*url&path=.idl&case=false&regexp=true

It might be cheaper to percent-encode/escape the URLs instead of driving UTF-8 any further.

Attached patch fix-save-as.patch (obsolete) — Splinter Review

This fixes the "Save As" for me. As I said in the previous comment, there are many other places where things need to get the same treatment, for example, when saving attachments.

For the people who want to understand what's going on, here's the theory:

Looks like I was wrong when saying that lack of NS_CopyNativeToUnicode() was the cause (comment #24). The issue seems to be this: The user selects a message and the message URI is correctly encoding the testé as Unicode. That URI is then passed to nsIMessenger.saveAs() and meets a ACString parameter. Note the JS string is UTF-16 encoded, but that UTF-16 will then be concerted to whatever XPCOM thinks is appropriate for ACString. Now, despite popular belief, ACString is not restricted to ASCII, it's restricted to so-called Latin1 which is essentially windows-1252. So we get an unwanted conversion from Unicode to ANSI, and it's downhill from there. I did break at nsImapService::SaveMessageToDisk() and saw testé there, and I believe that what I saw was ANSI (one byte for é) and not UTF-8, 2 bytes.

Gene observed quite correctly that NS_ASSERTION(mozilla::IsUtf8(fileName), fails, but the fix is to pass in proper UTF-8 there.

That said, the theory doesn't explain why "Save As" works for Ping and not for us.

I rest my case for a while and let other's decide whether they want to drive the "UTF-8 in URL" game further or drop back to percent-encoding like for mailbox: URLs.

(In reply to Klaus B. from comment #29)

I've looked at this for a while and debugged it a bit. For making saving a message work, you will likely need to convert nsMessenger::SaveAs(const nsACString& aURI, ...) and underlying SaveMessageToDisk(). This will ripple on badly, just look at all the nsACString URI parameters everywhere:
https://searchfox.org/comm-central/search?q=ACString.*url&path=.idl&case=false&regexp=true

Would every ACString in the search need to change to AUTF8String even though most of those are in the /mozilla area?

It might be cheaper to percent-encode/escape the URLs instead of driving UTF-8 any further.

Not saying this is wrong, but I think the uri in the url string is (usually) copied into the folder name sent via imap. With UTF8 turned on, the folder name sent via IMAP must be utf8. The server error you show in comment 26 is due to the string sent via imap not being the utf8 string the server expects.
P/S: This collided with comment 30 which I haven't read yet so it may or may not be relevant

Ok, I'm confused. I see at least 3 "Klaus B" patches. Are they all needed or are some obsolete? I just want to test if I can do save-as at this point. I'm sure I need the last one in comment 30. Also, I assume these apply on top of Ping's patch in comment 10.

OK, my comment #30 is correct. Try adding this to nsImapService::SaveMessageToDisk():

  printf("=== nsImapService::SaveMessageToDisk |%s|\n", aMessageURI);
  for (size_t i = 0; i < strlen(aMessageURI); i++) printf("=== nsImapService::SaveMessageToDisk char |%c|\n", aMessageURI[i]);

Result is:

=== nsImapService::SaveMessageToDisk |imap-message://klaus.bartosch%40gmail.com@imap.gmail.com/INBOX/testé#1|
=== nsImapService::SaveMessageToDisk char |i|
=== nsImapService::SaveMessageToDisk char |m|
=== nsImapService::SaveMessageToDisk char |a|
=== nsImapService::SaveMessageToDisk char |p|
[snip]
=== nsImapService::SaveMessageToDisk char |é|
=== nsImapService::SaveMessageToDisk char |#|
=== nsImapService::SaveMessageToDisk char |1|

So there you have your single-character é in ANSI.

Would every ACString in the search need to change to AUTF8String even though most of those are in the /mozilla area?

Sorry, mostly in mailnews/base/public/nsIMessenger.idl, but all the other ones in mailnews/ need inspection.

Not saying this is wrong, but I think the uri in the url string is (usually) copied into the folder name sent via imap. With UTF8 turned on, the folder name sent via IMAP must be utf8. The server error you show in comment 26 is due to the string sent via imap not being the utf8 string the server expects.

Well, what to send via IMAP and how you handle URLs are two different things. It only happens that the folder name is in the URL. You are quite right in saying that my change in attachment 9197473 [details] [diff] [review] is far away from the IMAP issue. I assume the ANSI string that gets into "SaveAs" does some further damage down the road, as I said: "it's downhill from there". If you're URL comes to you with an ANSI character in it, you're, well, in a bad position.

Patches: Only the last one (fix-save-as.patch) is needed for testing.

Ben, can you please comment on this. You've worked on the de-RDF and on
mailnews/base/public/nsIFolderLookupService.idl
30 nsIMsgFolder getFolderForURL(in ACString aUrl);
41 nsIMsgFolder getOrCreateFolderForURL(in ACString aUrl);
IMAP is has busily converted URL parameters to AUTF8String here
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l2.12
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l4.12 (and more)
and a lot of APIs were missed that's why there are now some encoding issues when JS-UTF-16-string gets passed into a ACString parameter and silently converted to Latin1 :-( - What's your take on going UTF-8 everywhere? It's certainly not what the mailbox: URLs are doing.

Flags: needinfo?(benc)

Maybe OT and/or a gmail bug or feature?:
I have a gmail folder called .bashrc for testing another related bug. When I copy messages to it from another folder having non-ascii name and then delete the message in .bashrc, the message gets deleted in the non-ascii named folder that I copied from. It's like the label get removed from both when I only intent to remove it from folder .bashrc. (Probably source folder being non-ascii not important, haven't tried ascii named folder.)

(See edit below)
With Klaus' comment 30 patch along with Ping's patch I can now save-as emails on linux within folder names
gds-ää-envoyés
Skråp-bucketᄂ

The 2nd one has a non-Latin1 char U+FFA4 which is 3 bytes 0xEF 0xBE 0xA4.

The filenames I see for these are:
gds-ää-envoyés.msf
Skråp-bucketᄂ.msf

Unfortunately I also see these phantom files getting created too alongside the "good" files:
gds-ää-envoyés.msf
Skråp-bucketᄂ.msf
The 2nd one took a bit longer to appear.

In the imap log, the "proposed url" seems to show utf8 strings (that get sent to imap server):

2021-01-15 22:03:16.428539 UTC - [Parent 33443: Main Thread]: D/IMAP proposed url = Skräppost--åeds folder for connection Skråp-bucketᄂ has To Wait = false can run = false
2021-01-15 22:03:16.428594 UTC - [Parent 33443: Main Thread]: D/IMAP proposed url = Skräppost--åeds folder for connection gds-ää-envoyés has To Wait = false can run = false

And for the "currentURL" imap log shows % encoded:

2021-01-15 22:03:14.217789 UTC - [Parent 33443: IMAP]: I/IMAP 0x7f91a2d32800:imap.gmail.com:NA:ProcessCurrentURL:imap://w4k9vws%40gmail%2Ecom@imap.gmail.com:993/select%3E/Skr%C3%A5p-bucket%EF%BE%A4: = currentUrl

Of course, the folder names are encoded in imap strings and sent as correct utf8, e.g.:

2021-01-15 22:03:14.740320 UTC - [Parent 33443: IMAP]: I/IMAP 0x7f91a2d32800:imap.gmail.com:A:SendData: 6 select "Skråp-bucketᄂ"

Finally, I just noticed that messages with well-defined .jpg attachments show the attachment inline OK but there are no links at the bottom for downloading the attachments. Don't know what is causing this. So, I can't check if the patch(s) affect attachment downloads.

Edit: I just reverted both patches back to trunk with "hg shelve" and can now see attachment links and save .jpg attachments OK. Putting the patches back (hg unshelve) and now I can't see the attachment links again.

Updated D101706 to include Klaus's patch, thanks a lot. I can't explain how it worked for me though.

Finally, I just noticed that messages with well-defined .jpg attachments show the attachment inline OK but there are no links at the bottom for downloading the attachments. Don't know what is causing this. So, I can't check if the patch(s) affect attachment downloads.

Gene, good catch! Fixed as well.

My patch only fixed the known problems, I agree with Klaus that there are a whole lots of URI types need to be changed to AUTF8String. On the other hand, we can consider reverting to use MUTF7 for mailbox filename. My patch only affects the boundary between JS <-> cpp. Reverting to use MUTF7 for mailbox filename can be done independently.

Attachment #9197277 - Flags: feedback?(remotenonsense)

👍 (U+1F44D): The attachment links are now showing and save OK within the folders with non-ascii chars, e.g., folder "Skråp-bucketᄂ👍" after reverting previous patches from this bug and applying patch https://phabricator.services.mozilla.com/D101706?id=387443&download=true

Just a general comment: Bug 1571672 made changes so the very core of Thunderbird: What Thunderbird understands to be a folder URL. Before that was and ASCII string, now you're allowing UTF-8. As you saw, that led to some "basic" problems. However, if you go down this road, you need to re-test the entire system on messages in those folders and the folders themselves:

  • move message in and out of the folder, delete message
  • save message, save attachment from message, delete/detach attachment from message
  • rename, delete folder
  • search folder, run quick search on folder, include folder in filter action
  • does Gloda still correctly index messages in such a folder
  • compact, repair folder
  • convert folder storage from mailbox to maildir and vice versa
    I'm sure I have forgotten many things.

First we need to decide whether allowing UTF-8 in folder URIs is desired, if so, all this needs to be tested.

Comment on attachment 9197277 [details] [diff] [review]
fix-string-issues.patch (needs rebasing)

Ping, I'd appreciate if you didn't ignore my input here. I can see that you adopted one hunk from my patch here:

-  nsAutoCString urlString;
-  if (NS_SUCCEEDED(tURI->GetSpec(urlString))) {
-    *aURL = ToNewCString(urlString);
-    NS_ENSURE_ARG_POINTER(aURL);
-  }
+  rv = tURI->GetSpec(aURL);

I attached the patch to show you what needs to be fixed in your patch. You can either take all the changes into your patch, or leave them in the follow-up patch. Let me repeat again that

+  rv = DecomposeMailboxURI(ToNewCString(uri), getter_AddRefs(folder), &msgKey);

is wrong and

-  *aURI = ToNewCString(mURI);
-  if (!*aURI) return NS_ERROR_OUT_OF_MEMORY;
+  aURI = mURI;
+  if (aURI.IsEmpty()) return NS_ERROR_OUT_OF_MEMORY;

is equally wrong. It's regrettable that this was approved in a review.

So please look at the patch.

Attachment #9197277 - Flags: feedback?(remotenonsense)

Comment on attachment 9197016 [details]
Bug 1686034 - Use AUTF8String for uri attribute/arguments to support non-ASCII msg folder. r=mkmelin

This needs another review and the issues pointed out in fix-string-issues.patch need to be fixed.

save message, save attachment from message, delete/detach attachment from message

I assume that would need further changes to nsIMessenger.idl like here:

void saveAttachment(in ACString contentTpe, in ACString url, in ACString displayName, in ACString messageUri, in boolean isExternalAttachment);
Attachment #9197473 - Attachment is obsolete: true
Attachment #9197187 - Attachment is obsolete: true

(In reply to Klaus B. from comment #41)

Comment on attachment 9197277 [details] [diff] [review]
fix-string-issues.patch

Ping, I'd appreciate if you didn't ignore my input here. I can see that you adopted one hunk from my patch here:

I made some changes based on your comment, sorry I didn't open the patch.

What's the difference between PromiseFlatCString and ToNewCString?

Are the changes of nsCString to nsACString in your patch really necessary?

(In reply to Ping Chen (:rnons) from comment #44)

What's the difference between PromiseFlatCString and ToNewCString?

Please do your own research on the matter. ToNewCString allocates a new string that the caller needs to free later. In your case, it's leaked since you don't capture the result. PromiseFlat(C)String is some C++ voodoo magic that guarantees that a nsA(C)String is really null terminated so you can use the .get() method on it.

Are the changes of nsCString to nsACString in your patch really necessary?

That depends on the definition of "really necessary". Here an example. My proposed change is:

-      nsAutoCString uriStr;
-      nsBuildLocalMessageURI(baseMessageURI.get(), m_messageKey, uriStr);
-      aURI = uriStr;
+      nsBuildLocalMessageURI(baseMessageURI.get(), m_messageKey, aURI);

Will the code without the change work? Yes. Is the code elegant? No. Is the code efficient? No, there is processing around the unnecessary temporary variable and extra string allocation and copying. Do I need the nsCString to nsACString change? Yes, I do since sadly nsBuildLocalMessageURI() needlessly has a nsCString parameter which infringes the Mozilla string guide here:
https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings
Note that some function do have a nsCString parameter for a reason, but it should be changed if possible. Strings in Mozilla are very tricky and it takes a while to get a good grip on them. But there are some simple recipes you can follow. I'm happy to answer any questions.

In order to maintain the quality and maintainability of the code base, all the changes I proposed are absolutely necessary. Let me mention that development has traditionally followed (mild) "boy scout" rules, that means that if you come to a (camping) site that could use some clean-up you do the clean-up to improve the code base. Feel free to merge all the changes into your patch (as you've already partly done), I claim no authorship. Cancelling a feedback request without actually looking at the content is not recommended.

Apologies for sneaking

-
-  // Extra step here prevents chars flagged as invalid, don't know why.
-  nsAutoString temp;
-  temp.Assign(nsDependentString(aName));
-  return SubscribeToFolder(temp, true, nullptr);
+  return SubscribeToFolder(nsDependentString(aName), true, nullptr);

into the patch. That corrects some damage done in bug 1571672. I though I'd slip it in here since this bug here is correcting issues introduced there. In view of all the issues from that bug 1571672 (see bug 1571672 comment #86) you can add it to your patch or leave it out since there will be more clean-up needed.

Anyway, and I know it sounds like a broken record: I suggest to first evaluate the idea of UTF-8 in URLs before going any further. That's why I NI'ed BenC who is the resident C++/backend guru who had a lot of contact with this area in de-RDF.

Thanks, updated to use PromiseFlatCString.

I didn't pick nsCString to nsACString changes. My goal is to update IDL files and do minimal changes to get it compile. I can see your point of improving the code base. But I think it can and should be done separately. I can also argue why using PromiseFlatCString, why not changing DecomposeMailboxURI to take nsACString as well, then if following the line, the patch size can't be controlled.

As I said in comment 38, whether to use UTF8 folder name can be decided separately. I don't think my patch needs to be backed out or blocked by it.

I can also argue why using PromiseFlatCString, why not changing DecomposeMailboxURI to take nsACString as well ...

Good point, that could indeed be done but then the change would ripple on inside the function. Changing to nsACString doesn't have a ripple-on effect.

whether to use UTF8 folder name can be decided separately. I don't think my patch needs to be backed out or blocked by it.

Well, if is was decided not to go with UTF-8 in URIs, your patch wouldn't be necessary at all and we'd have to find a different solution.

On a different topic, did you read comment #40 and #43? You made a few changes in nsIMessenger.idl, but have you tested deleting/saving an attachment from a message in such a folder?

Comment on attachment 9197277 [details] [diff] [review]
fix-string-issues.patch (needs rebasing)

This needs rebasing since a few of the hunks here were put into the main patch. Personally, there wouldn't be much left and I don't see why "minimal changes to get it compile" would be a guiding principle when there are some easy additional wins to be made.

Attachment #9197277 - Attachment description: fix-string-issues.patch → fix-string-issues.patch (needs rebasing)
Attachment #9197277 - Flags: feedback?(remotenonsense)
Flags: needinfo?(benc)

Ben, that wasn't quite the expected answer. Please refer to comment #35.

Flags: needinfo?(benc)

Sorry, didn't mean to clear the NI - still looking at it!

BTW, most of the bugs summarized in bug 124287 are about special characters (?/#) in folder names, not about encoding issues. I can only see two encoding bugs, bug 822230 and bug 689942 and it's well possible those have been fixed in the meantime. This issue here is a regression from bug 1571672 which introduced new encoding issues, by admitting UTF-8 into URLs which don't pass through certain interfaces. Comment #37 is more proof of that since, for example, gds-ää-envoyés is the ANSI interpretation of the UTF-8 representation of ä and é. So those new issues should not go onto the "old pile" in the meta bug. Please comment on the idea of admitting UTF-8 into URIs and also the consequences for the two functions quoted in comment #35.

After the mid-air collision: Thanks for looking into it!

Here are my thoughts:

Ultimately, I think we should be using nsIURI for passing URIs about. That would remove all ambiguity and chance for errors - there's very well defined points for setting and mutating them and it's always very clear what the rules are.
But that's not going to happen for a long long time, as all our nsIURI-derived classes are hopelessly mixed up with request state and can't be used as URIs. Sigh.

In any case, we do have to support unicode in our URIs. So the question is:

whenever we have a uri being passed as a string, how do we know what kind of uri encoding the function is expecting?

I would say that an obvious convention would be:

  • AUTF8String: a utf-8 string containing a URI, with no percent-encoding.
  • ACString: an ASCII string, containing a percent-encoded URI (uri is converted to utf-8 first, then percent-encoded)
  • AString: a utf-16 (or is it ucs-2?) string, no percent-encoding.

(Another convention would just be to ensure that every string URI parameter had a comment, saying which form it was expected to be in. But I can't say I like that idea).
All functions that take URIs as strings should be following these same conventions, and any which don't should be considered incorrect, and fixed.

If we can decide on the conventions, then suddenly it's much clearer which functions are correct and which ones need to be fixed.

I realise this doesn't directly address the concrete IMAP folder-name issue in this bug, but I think it's an important first step to sorting this stuff out (and a huge number of the ones under bug 124287).
I can dig into this bug specifically if anyone needs me to (although it might be a couple of days before I can get onto it).

Flags: needinfo?(benc)

Oh, and going with the above convention, I'd say AUTF8String (a non-percent-encoded URI) should be the preferred one, which fits in with Pings changes (although I think a the implementations probably expect percent-encoding, so would require some more work).

I think also more in general, strings should move towards AUTF8String, or there are easily encoding surprises when implementing the component in JavaScript instead. The only question is when to do what, as certain changes has very large ripple-effects.

If we can decide on the conventions, then suddenly it's much clearer which functions are correct and which ones need to be fixed.

Somehow those conventions are mutually exclusive. Once the first URL type moves to UTF-8/AUTF8String, you need to change all interfaces to that type, as we're seeing here, although some other URLs are still percent-encoded, like mailbox: URLs. So at least during a transitional period, it's actually more unclear since a lot of percent-encoded URLs will pass through interfaces designed for UTF-8 URLs. I haven't looked at the other URL types we have in the system (news:, smtp:, mailto:, ldap: and other address book types).

That said, Ben, do you see any need to "upgrade" getFolderForURL(in ACString aUrl) and getOrCreateFolderForURL(in ACString aUrl)? My concern is that moving to UTF-8 will need careful retesting as per comment #40. We can already see that encoding issues bugs are popping up.

Note some ugly special cases already implemented here:
https://searchfox.org/comm-central/rev/a6cd650b1ba33f48251a94769bfda0009cf803e8/mail/components/extensions/parent/ext-mail.js#1480

Altogether this seems like a mediums size project that unfortunately comes trailing as a regression fix from bug 1571672.

Off-topic:
Also, UTF-8 is not the "cure all" fix, since at least for interfacing with the Windows file system, you still need to convert to the "native" ANSI code page, which on Western systems is windows-1252 (Latin1), but not on Eastern European systems (windows-1250), let alone Greek (windows-1253), Russian (windows-1251), Turkish (windows-1254), Japanese (windows-932), etc. systems. Fun reading here:
https://searchfox.org/mozilla-central/rev/dac45cc7020dfddbcc937827810dd11550c07dc3/xpcom/io/nsNativeCharsetUtils.cpp#27
CP_ACP is the local ANSI code page that corresponds to the system locale.

Comment on attachment 9197277 [details] [diff] [review] fix-string-issues.patch (needs rebasing) Review of attachment 9197277 [details] [diff] [review]: ----------------------------------------------------------------- Looks reasonable, f+ for now since unclear which parts would apply
Attachment #9197277 - Flags: review?(mkmelin+mozilla) → feedback+
Attached patch fix-string-issues.patch (obsolete) — Splinter Review

Same as before, two hunks removed which made it into the main patch. I suggest Ping adopt the changes into his patch. I noticed that in nsNntpService::MessageURIToMsgHdr() there is still an incorrect DecomposeNewsMessageURI(ToNewCString(uri), ..., so it really doesn't make much sense to split this.

Attachment #9197277 - Attachment is obsolete: true
Attached patch D101706.diff (obsolete) — Splinter Review

Merged patch. Sorry no HG header since Phab's "download raw diff" doesn't provide it.

Could someone please address comment #40 comment #43 and the second paragraph in comment #55.

(In reply to Klaus B. from comment #55)

Somehow those conventions are mutually exclusive. Once the first URL type moves to UTF-8/AUTF8String, you need to change all interfaces to that type

You don't have to change all the interfaces. But you might have to percent-encode/decode as required before calling an interface.
And if the encoding/decoding gets to be too annoying then that's an argument for changing some interfaces.

, as we're seeing here, although some other URLs are still percent-encoded, like mailbox: URLs. So at least during a transitional period, it's actually more unclear since a lot of percent-encoded URLs will pass through interfaces designed for UTF-8 URLs.

("UTF-8 URL" is probably not a helpful term. All our URIs could potentially contain unicode parts, the question is: percent-encoded UTF-8 or plain UTF-8 (or even UTF-16)?).

That said, Ben, do you see any need to "upgrade" getFolderForURL(in ACString aUrl) and getOrCreateFolderForURL(in ACString aUrl)?

Amazingly enough, the nsIFolderLookupService comments are already pretty clear that it wants it's URIs percent-encoded (although it's not very clear that they should be percent-encoded UTF-8, as per general URI conventions, so I will update it).
So leaving them as ACString seems fine. If we find loads of code having to perform percent-encoding before calling getFolderForURL(), then maybe that's an argument for changing it to take AUTF8String instead.

My concern is that moving to UTF-8 will need careful retesting as per comment #40. We can already see that encoding issues bugs are popping up.

Note some ugly special cases already implemented here:
https://searchfox.org/comm-central/rev/a6cd650b1ba33f48251a94769bfda0009cf803e8/mail/components/extensions/parent/ext-mail.js#1480

Well, at least that one's consistent: nsIMsgDBFolder.URI is defined as AUTF8String, thus no percent-encoding.
(but I'll bet at least one of the other folder types does percent-encode it's URI attribute ;-)

Altogether this seems like a mediums size project that unfortunately comes trailing as a regression fix from bug 1571672.

Yep, it's an arse. But if it wasn't bug 1571672 that brought it up, then some other bug would have. The root cause is that we've been really inconsistent about how URIs are passed about and how they are mapped to things like IMAP names or to filesystem names or whatever...

Off-topic:
Also, UTF-8 is not the "cure all" fix, since at least for interfacing with the Windows file system, you still need to convert to the "native" ANSI code page, which on Western systems is windows-1252 (Latin1), but not on Eastern European systems (windows-1250), let alone Greek (windows-1253), Russian (windows-1251), Turkish (windows-1254), Japanese (windows-932), etc. systems. Fun reading here:
https://searchfox.org/mozilla-central/rev/dac45cc7020dfddbcc937827810dd11550c07dc3/xpcom/io/nsNativeCharsetUtils.cpp#27
CP_ACP is the local ANSI code page that corresponds to the system locale.

And mapping URI parts to/from filesystem names has a whole heap of other considerations too (eg don't call anything CON, PRN, AUX, NUL etc on windows). Knowing for certain that your URI is percent-encoded, plain UTF-8 or UTF-16 is a good starting point. But you need an explicit mapping to take the chunk representing the folder name, say, and to turn it into a filesystem-safe path. Currently this is scattered all over the place, and so many places in the code just pick a chunk out of a URI and use that as a directory name. Hence the number of bugs gathered under bug 124287...

I noticed that in nsNntpService::MessageURIToMsgHdr() there is still an incorrect DecomposeNewsMessageURI(ToNewCString(uri),

Thanks, updated.

Save attachment seems to work for me, I didn't test delete attachment. If it doesn't work, I would be happy to fix it in a followup patch. I think each bug/patch has its scope. I prefer to not include too much in one patch.

May I recall the course of action here: In bug 1571672 it was decided that UTF-8 encoding is a good thing to replace MUTF-7 for folder names. Without looking at the consequences, interfaces in nsIMessenger.idl and nsIMsgMessageService.idl where changed from ACString to AUTF8String for convenience reasons. It's not even clear why that was done, most likely the existing code just added the folder name into the URI which was no longer ASCII/MUTF-7 but UTF-8 now and something stopped working. Instead of looking at the issue and the boarder picture, the API was changed so a particular issue was resolved:
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l2.12
https://hg.mozilla.org/comm-central/rev/0054ea73e658#l4.12 (and more)

If your rules were true, all URLs passing through those generic interfaces would now have to drop the percent-encoding. That is not the case since the other big URL class, mailbox:, still does percent-encoding. In this bug here, further rippling-on of moving to AUTF8String is promoted, breaking the rules you established even further. And if my predictions are right, there will be more changes required to handle saving/deleting of attachments (I could be wrong on this).

IOW, if your rules were true, those interfaces would need to be reverted to ACString since via other URL types they carry percent-encoded URLs. Instead, the IMAP processing should have percent-encoded the URLs instead of changing generic APIs with a huge ripple-on effect and now having "raw" UTF-8 and percent-encoded UTF-8 pass through the same API. I agree that if the aim is to move to "raw" UTF-8 URLs, then you have to start somewhere, "widen" the API and infringe the rules as is proposed here. IMHO that would best be done after some stock take, not in regression-fix mode.

Bug 124287 only references two encoding issues, the rest is special character stuff (?\/#.^, etc.).

Attachment #9197729 - Attachment is obsolete: true
Attachment #9197731 - Attachment is obsolete: true
Attachment #9197844 - Flags: review?(mkmelin+mozilla)

The patches seem to fix up the "Save As" for me with a "testé" folder.

I completely retract my proposed convention of using AUTF8String for non-percent-encoded URIs.
Turns out AUTF8String is used for percent-encoded stuff all over the place (including by nsIURI and nsIURL, which I'm kind of taking as the gold standard here).

  • So if you're passing percent-encoded URIs, then use ACString or AUTF8String. Makes no difference.
  • For non-percent-encoded URIs (like nsIMsgFolder.URI), it should be AUTF8String.
  • Main thing: if there's any possible confusion about the encoding, it needs documenting in a comment.

Definitely a few other issues. I just entered Bug 1687412 as a result of wading through this stuff in the debugger. It causes an extra (badly-named) folder to be created, and it still occurs with these patches. But it could have been doing it for ages...

... I would be happy to fix it in a followup patch.

Please had over to bug 1686415.

Ben, re. bug 1687412: Have you seen comment #37? gds-ää-envoyés.msf is UTF-8 interpreted as ANSI.

If you don't mind, I'm setting checkin-needed-tb now.

Usually you get review on all the patches in a bug before checking in.

Comment on attachment 9197844 [details] [diff] [review] fix-string-issues.patch Review of attachment 9197844 [details] [diff] [review]: ----------------------------------------------------------------- LGTM, r=mkmelin
Attachment #9197844 - Flags: review?(mkmelin+mozilla) → review+

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/e30558214296
Use AUTF8String for uri attribute/arguments to support non-ASCII msg folder. r=mkmelin
https://hg.mozilla.org/comm-central/rev/817298abd30f
Follow-up: Simplify string processing. r=mkmelin

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Save attachment seems to work for me, I didn't test delete attachment. If it doesn't work, I would be happy to fix it in a followup patch.

Sadly my predictions from comment #40 turned out to be true: Head to bug 1687452.

Attachment #9197260 - Attachment is obsolete: true
Keywords: regression
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: