[.csv,.txt,.tab only] addressbook export/import: characters outside the locale codepage lost: use UTF8 and/or let the user choose the charset

RESOLVED FIXED in Thunderbird 45.0

Status

Type: defect
Priority: --
Severity: critical
Status: RESOLVED FIXED
Opened: 18 years ago
Closed: 4 years ago

People

(Reporter: sspitzer, Assigned: jorgk)

Tracking

(Blocks 1 bug, {dataloss, intl})

Trunk
Thunderbird 45.0

Details

(Whiteboard: nab-imp)

Attachments

(2 attachments, 2 obsolete attachments)

issues with import / export of non ascii addressbook data

from nsAddressBook.cpp:

// XXX i18n TODO
// is this right?  
// do we want escaped utf8?  base64 encoded data?
// the import code appears to expect it in the system charset
// so we'll do that for now.  
// 
// one reason against doing the system char set:
// my machine is set to US-ASCII, but I can have japanese names in my
// addressbook.  but if I go to export to a .csv or .tab file
// the conversion will fail.
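The failure mode described in the code comment above can be reproduced outside Thunderbird. The following is a minimal Python sketch, not the Mozilla code: the sample name, the helper `encode_strict`, and the use of `ascii` as a stand-in for a US-ASCII system charset are all assumptions made for illustration. It shows that a strict conversion fails outright, while a lossy one silently replaces each character.

```python
# Illustration only: "ascii" stands in for a US-ASCII system charset.
name = "山田"


def encode_strict(text, charset):
    """Return the encoded bytes, or None if charset cannot represent text."""
    try:
        return text.encode(charset)
    except UnicodeEncodeError:
        return None


# Strict conversion: the whole field fails to convert.
strict_result = encode_strict(name, "ascii")

# Lossy conversion: every unrepresentable character becomes "?".
lossy_result = name.encode("ascii", errors="replace")
```

Either outcome loses the original name, which is why this bug carries the dataloss keyword.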

one possible solution is to continue to do this for .csv or .tab export.

ldif export should be doable, as I think ldif data is required to be in UTF-8 
or something

when I add XML import / export, I'll probably do the same thing as LDIF.

nhotta / ji, comments?
Whiteboard: nab-imp
I think export to a .csv or .tab file needs to depend on system charset.
The system needs to support the charset to make the file readable anyway. 
Keywords: intl
Status: NEW → ASSIGNED
Summary: issues with import / export of non ascii addressbook data → [.csv,.txt,.tab only] issues with import / export of non ascii addressbook data
Whiteboard: nab-imp → nab-imp,dmose-dataloss
Keywords: dataloss
Whiteboard: nab-imp,dmose-dataloss → nab-imp
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity.  Only changing open bugs to
minimize unnecessary spam.  Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: normal → critical
Depends on: 153882
Product: Browser → Seamonkey
The export function works flawlessly in 1.7.2, but with 1.7.5, I get a hard
error on the export request - "Not enough arguments
[nsIAddressBook.exportAddressBook]" every time.
to comment 3: that's bug 271895
Assignee: sspitzer → mail
Status: ASSIGNED → NEW
I get the same
Not enough arguments [nsIAddressBook.exportAddressBook]
error message mentioned by dkowal@paritysys.net, but I don't have any non-ascii data in my address book to the best of my ability to determine. So the title of this bug seems too restrictive. 
This is the current title:
"[.csv,.txt,.tab only] issues with import / export of non ascii addressbook data"
It seems as if this is a more general bug that "Tools>>Export" just doesn't work in 1.7.5. I am going to adjust the bug title to reflect that.
(In reply to comment #5)
> I am going to adjust the bug title to reflect that.

Ah. I don't have permission to change the bug title. Well, someone should change it to be broader. Or create a separate bug report. 

(In reply to comment #5)
> I get the same
> Not enough arguments [nsIAddressBook.exportAddressBook]
> error message mentioned by dkowal@paritysys.net, but I don't have any
> non-ascii data in my address book to the best of my ability to determine. So
> the title of this bug seems too restrictive. 
> ...
> It seems as if this is a more general bug that "Tools>>Export" just doesn't
> work in 1.7.5. I am going to adjust the bug title to reflect that.

The bug you mention was fixed by bug 271895 almost a year ago, and is fixed in versions after 1.7.5. Please *always* try the latest version before reporting problems.
*** Bug 173912 has been marked as a duplicate of this bug. ***
At least on Windows, we can use a 'BOM-prefixed' UTF-8 (or UTF-16)
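As a rough illustration of the BOM-prefixed UTF-8 idea, here is a minimal Python sketch. It is not Thunderbird code: the function name `export_csv_utf8_bom` and the sample rows are hypothetical. Python's `utf-8-sig` codec prepends the byte order mark (EF BB BF) when encoding.

```python
import csv
import io


def export_csv_utf8_bom(rows):
    """Serialize rows as CSV and return BOM-prefixed UTF-8 bytes."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" writes the UTF-8 BOM before the encoded text.
    return buf.getvalue().encode("utf-8-sig")


data = export_csv_utf8_bom([
    ["Name", "Email"],
    ["山田太郎", "yamada@example.com"],
])
```

Whether a given importer honours the BOM is a separate question and has to be tested per application.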
Component: Address Book → MailNews: Address Book
Product: Mozilla Application Suite → Core
Summary: [.csv,.txt,.tab only] issues with import / export of non ascii addressbook data → [.csv,.txt,.tab only] addressbook export/import : characters outside the locale codepage lost
OS: Windows 2000 → All
Hardware: PC → All
Summary: [.csv,.txt,.tab only] addressbook export/import : characters outside the locale codepage lost → [.csv,.txt,.tab only] addressbook export/import: characters outside the locale codepage lost: use UTF8 and/or let the user choose the charset
QA Contact: nbaca → addressbook
Product: Core → MailNews Core
Assignee: mail → nobody
Blocks: 157010
Duplicate of this bug: 254118
Today I got a list of people (Chinese) and eventually I had to re-type everything because it was impossible to import UTF-8 CSV/TAB in any email program and gmail totally failed as well. Please fix this and let me choose the encoding. Encoding is at least as important as which field to map to which info. In 2011 character encoding shouldn't be an issue anymore.
Posted patch: Always export as UTF-8 (obsolete)
... because Microsoft wants it like that:
https://support.microsoft.com/en-us/kb/933855

And of course for users who want to export Asian names from their address books.
Attachment #8645494 - Flags: review?(mkmelin+mozilla)
Gah, despite that article it seems Outlook (at least) does not export to UTF-8, and it also doesn't properly allow importing CSV contacts in UTF-8 with or without BOM.
Well, we just forget about it then.
Comment on attachment 8645494
Always export as UTF-8

Forget patch for now. We need something more elaborate to allow selection of export character set. BTW, same is true for import.
Attachment #8645494 - Flags: review?(mkmelin+mozilla)
Gmail offers two options for export
 - gmail csv => utf-8
 - outlook csv => Windows-1252 

I don't think there's really any sense using anything except utf-8. If it works to re-import our exported csv to thunderbird, maybe that's still the best option. Then you can convert the file for outlook if you really need to. As it is now you can't do that.
You decide, the simple patch is there. I think imitating Gmail would make sense, that is, offer two options.

I can confirm that UTF-8 import in Thunderbird works right now, I tried it yesterday: bug 1188306 comment #8. Even Notepad can save UTF-8 as ANSI for the conversion, if required.
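The "export UTF-8, then convert for Outlook if needed" workflow can be sketched in a few lines of Python. This is only an illustration of the conversion step: the function name `utf8_csv_to_cp1252` is hypothetical, and Windows-1252 is assumed as the target because that is what Gmail's Outlook CSV option is reported to use above.

```python
def utf8_csv_to_cp1252(data):
    """Convert UTF-8 CSV bytes (with or without BOM) to Windows-1252 bytes.

    Raises UnicodeEncodeError if the text contains a character that
    Windows-1252 cannot represent, so no data is dropped silently.
    """
    text = data.decode("utf-8-sig")  # strips a leading BOM if present
    return text.encode("cp1252")


converted = utf8_csv_to_cp1252("Müller,mueller@example.com\r\n".encode("utf-8"))
```

The conversion is strict on purpose: a file with names outside Windows-1252 is rejected rather than written with missing characters.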
To just fix the dataloss issue, we could just CopyUTF16toUTF8 on failure here: http://hg.mozilla.org/comm-central/annotate/7965091ed556/mailnews/addrbook/src/nsAbManager.cpp#l797
Hmm. This is more complicated, isn't it? You need to preprocess all the data to see whether it can be exported with the "system character set". If not, use UTF-8, so CopyUTF16toUTF8 (see: http://mxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgI18N.cpp#50). You can't happily export until you hit an error and then switch to UTF-8. Or am I misunderstanding something here?
Yes you're right, or just start over if we hit an error.
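The pre-check discussed in the last two comments could look roughly like the Python sketch below. It is not the nsAbManager.cpp logic: the function name `choose_export_charset` and the charset names passed in are assumptions. The point is that the charset is decided for the whole file before anything is written, so the output never mixes encodings.

```python
def choose_export_charset(fields, system_charset):
    """Return system_charset if every field fits in it, else "utf-8".

    fields is an iterable of strings covering all exported values.
    """
    try:
        for field in fields:
            field.encode(system_charset)  # strict: raises on first failure
    except UnicodeEncodeError:
        return "utf-8"
    return system_charset


western = choose_export_charset(["Müller", "Smith"], "cp1252")
mixed = choose_export_charset(["Smith", "山田"], "cp1252")
```

"Start over on error" gives the same result with one pass in the common case: export with the system charset, and if any field fails, discard the partial output and rerun the export as UTF-8.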
I've been considering adding an option to export either with the "system character set" (current behaviour) or UTF-8. Sadly, the export function has no dialogue. You click "Export..." and up comes the file selection. So where would be a good spot to add the UI to make the choice?
Flags: needinfo?(richard.marti)
Flags: needinfo?(mkmelin+mozilla)
What about showing a dialog with the different file format options after clicking "Export...", and then showing the save dialog with the correct format selected? The pro would be that the formats could be explained better. The con would be the additional dialog. But the user already has to choose the correct format in the file chooser now, without understanding what the formats are for.

Right now LDIF is pre-selected, and I don't know if Outlook understands this format. If the user only sets the file name, he may end up with a file he can't import into Outlook. A better description in the additional dialog could help the user choose the right one.
Flags: needinfo?(richard.marti)
Good approach, thanks. I've wondered in the past what LDIF is. We could also mention the version number in the explanatory text, for example vCard 2.1 in our case. Let's see what Magnus says.
You could do that, but since you still have to select the file you'll save to it would be kind of an abnormal UI. 

For an easy solution you could just add the UTF-8 versions in the format selector

 Comma Separated (Outlook compatible)
 Comma Separated (UTF-8)
 Tab Separated (Outlook compatible)
 Tab Separated (UTF-8)
 vCard
 LDIF

I think LDIF could be last, as I really doubt people use that much.
(I wouldn't add a vCard version number.)
Flags: needinfo?(mkmelin+mozilla)
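One way to read the format-selector proposal above is as a table that maps each entry to a delimiter and a charset. The Python sketch below is purely illustrative: `ExportFormat`, `SYSTEM_CHARSET`, `EXPORT_FORMATS` and `settings_for` are hypothetical names, and treating "Outlook compatible" as "system charset" is an assumption based on the current behaviour described earlier in this bug. No charset is stated for vCard and LDIF because the proposal does not change them.

```python
from collections import namedtuple

ExportFormat = namedtuple("ExportFormat", ["label", "delimiter", "charset"])

# Placeholder meaning "whatever the platform charset is" (assumption).
SYSTEM_CHARSET = "system"

EXPORT_FORMATS = [
    ExportFormat("Comma Separated (Outlook compatible)", ",", SYSTEM_CHARSET),
    ExportFormat("Comma Separated (UTF-8)", ",", "utf-8"),
    ExportFormat("Tab Separated (Outlook compatible)", "\t", SYSTEM_CHARSET),
    ExportFormat("Tab Separated (UTF-8)", "\t", "utf-8"),
    ExportFormat("vCard", None, None),  # not a delimited text format
    ExportFormat("LDIF", None, None),   # not a delimited text format
]


def settings_for(label):
    """Look up (delimiter, charset) for the entry picked in the selector."""
    for fmt in EXPORT_FORMATS:
        if fmt.label == label:
            return fmt.delimiter, fmt.charset
    raise KeyError(label)
```

With this shape the export code needs no extra dialog: the file picker's selected filter alone determines both the delimiter and the encoding.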
If it's possible to add these lines to the format selector, then that is better.
I prefer not to mention Outlook. Besides, who knows what will be "Outlook compatible" in the future.
Assignee: nobody → mozilla
Attachment #8645494 - Attachment is obsolete: true
Attachment #8677677 - Flags: ui-review?(richard.marti)
Attachment #8677677 - Flags: review?(mkmelin+mozilla)
Note to the reviewer: The lines in this code are hopelessly longer than 80 characters, so I didn't bother adhering to the limit. However, I did shorten lines where possible.
Comment on attachment 8677677
Proposed solution (v1): Let user choose between system charset or UTF-8

Looks good.
Attachment #8677677 - Flags: ui-review?(richard.marti) → ui-review+
Comment on attachment 8677677
Proposed solution (v1): Let user choose between system charset or UTF-8

Review of attachment 8677677:
-----------------------------------------------------------------

LGTM, thx! r=mkmelin

::: mail/locales/en-US/chrome/messenger/addressbook/addressBook.properties
@@ +138,2 @@
>  TABFiles=Tab Delimited
> +TABFilesSysCharset=Tab Delimited  (System Charset)

nit: double space snuck in here
Attachment #8677677 - Flags: review?(mkmelin+mozilla) → review+
Carrying forward Magnus' r+ and Richard's ui-r+
Fixed nit (removed double space).
Thanks for the quick turn-around!
Attachment #8677677 - Attachment is obsolete: true
Attachment #8678337 - Flags: ui-review+
Attachment #8678337 - Flags: review+
Keywords: checkin-needed
Status: NEW → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0