Bug 117236 - [.csv,.txt,.tab only] addressbook export/import: characters outside the locale codepage lost: use UTF8 and/or let the user choose the charset
Status: RESOLVED FIXED
nab-imp
Keywords: dataloss, intl
Product: MailNews Core
Classification: Components
Component: Address Book
Version: Trunk
Platform: All All
Importance: -- critical with 7 votes
Target Milestone: Thunderbird 45.0
Assigned To: Jorg K (GMT+1)
Duplicates: 173912 254118
Depends on: 153882
Blocks: 157010
Reported: 2001-12-28 11:14 PST by (not reading, please use seth@sspitzer.org instead)
Modified: 2015-11-10 03:53 PST (History)

Attachments
Always export as UTF-8 (2.04 KB, patch)
2015-08-09 13:56 PDT, Jorg K (GMT+1)
no flags
Screenshot of proposed solution. (8.37 KB, image/png)
2015-10-22 13:34 PDT, Jorg K (GMT+1)
no flags
Proposed solution (v1): Let user choose between system charset or UTF-8 (11.93 KB, patch)
2015-10-22 13:39 PDT, Jorg K (GMT+1)
mkmelin+mozilla: review+
richard.marti: ui‑review+
Proposed solution (v1b): Let user choose between system charset or UTF-8 (11.93 KB, patch)
2015-10-23 14:05 PDT, Jorg K (GMT+1)
jorgk: review+
jorgk: ui‑review+

Description (not reading, please use seth@sspitzer.org instead) 2001-12-28 11:14:58 PST
issues with import / export of non ascii addressbook data

from nsAddressBook.cpp:

// XXX i18n TODO
// is this right?  
// do we want escaped utf8?  base64 encoded data?
// the import code appears to expect it in the system charset
// so we'll do that for now.  
// 
// one reason against doing the system char set:
// my machine is set to US-ASCII, but I can have japanese names in my
// addressbook.  but if I go to export to a .csv or .tab file
// the conversion will fail.

one possible solution is to continue to do this for .csv or .tab export.

ldif export should be doable, as I think ldif data is required to be in UTF-8 
or something

when I add XML import / export, I'll probably do the same thing as LDIF.

nhotta / ji, comments?
Comment 1 ji 2002-01-08 11:14:05 PST
I think export to a .csv or .tab file needs to depend on system charset.
The system needs to support the charset to make the file readable anyway. 
Comment 2 Brant Gurganus 2003-01-18 17:58:09 PST
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity.  Only changing open bugs to
minimize unnecessary spam.  Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Comment 3 dkowal 2005-02-28 11:53:57 PST
The export function works flawlessly in 1.7.2, but with 1.7.5, I get a hard
error on the export request - "Not enough arguments
[nsIAddressBook.exportAddressBook]" every time.
Comment 4 R.K.Aa. 2005-03-16 07:13:50 PST
to comment 3: that's bug 271895
Comment 5 Stephen Mercer 2005-12-11 18:39:25 PST
I get the same
Not enough arguments [nsIAddressBook.exportAddressBook]
error message mentioned by dkowal@paritysys.net, but as far as I can tell I don't have any non-ascii data in my address book. So the title of this bug seems too restrictive.
This is the current title:
"[.csv,.txt,.tab only] issues with import / export of non ascii addressbook data"
It seems as if this is a more general bug that "Tools>>Export" just doesn't work in 1.7.5. I am going to adjust the bug title to reflect that.
Comment 6 Stephen Mercer 2005-12-11 18:43:17 PST
(In reply to comment #5)
> I am going to adjust the bug title to reflect that.

Ah. I don't have permission to change the bug title. Well, someone should change it to be broader. Or create a separate bug report. 

Comment 7 Mark Banner (:standard8) 2005-12-12 04:31:55 PST
(In reply to comment #5)
> I get the same
> Not enough arguments [nsIAddressBook.exportAddressBook]
> error message mentioned by dkowal@paritysys.net, but I don't have any
> non-ascii data in my address book to the best of my ability to determine. So
> the title of this bug seems too restrictive. 
> ...
> It seems as if this is a more general bug that "Tools>>Export" just doesn't
> work in 1.7.5. I am going to adjust the bug title to reflect that.

The bug you mention was fixed by bug 271895 almost a year ago, and is fixed in versions after 1.7.5. Please *always* try the latest version before reporting problems.
Comment 8 Wayne Mery (:wsmwk, NI for questions) 2006-01-25 13:09:52 PST
*** Bug 173912 has been marked as a duplicate of this bug. ***
Comment 9 Jungshik Shin 2006-01-25 19:52:20 PST
At least on Windows, we can use a 'BOM-prefixed' UTF-8 (or UTF-16).
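
A minimal sketch of the "BOM-prefixed UTF-8" idea from comment 9, written in standalone C++ rather than with Mozilla's string classes. The helper name WithUtf8Bom is hypothetical and does not come from the tree; the input is assumed to be valid UTF-8 already.

```cpp
#include <string>

// The three-byte UTF-8 encoding of U+FEFF, used as a byte order mark.
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: returns the bytes to write to the export file.
// `utf8Payload` is assumed to already be valid UTF-8.
std::string WithUtf8Bom(const std::string& utf8Payload) {
  // Avoid emitting a second BOM if the payload already starts with one.
  if (utf8Payload.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0) {
    return utf8Payload;
  }
  return kUtf8Bom + utf8Payload;
}
```

Whether a given importer honours the BOM is a separate question; comment 14 reports that Outlook did not, with or without it.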
Comment 11 Wayne Mery (:wsmwk, NI for questions) 2011-02-28 03:24:16 PST
*** Bug 254118 has been marked as a duplicate of this bug. ***
Comment 12 Reinhard 2011-09-16 09:34:02 PDT
Today I got a list of people (Chinese) and eventually I had to re-type everything because it was impossible to import UTF-8 CSV/TAB in any email program and gmail totally failed as well. Please fix this and let me choose the encoding. Encoding is at least as important as which field to map to which info. In 2011 character encoding shouldn't be an issue anymore.
Comment 13 Jorg K (GMT+1) 2015-08-09 13:56:45 PDT
Created attachment 8645494
Always export as UTF-8

... because Microsoft wants it like that:
https://support.microsoft.com/en-us/kb/933855

And of course for users who want to export Asian names from their address books.
Comment 14 Magnus Melin 2015-08-10 00:54:21 PDT
Gah, despite that article it seems Outlook (at least) does not export to UTF-8, and it also doesn't properly allow importing CSV contacts in UTF-8 with or without BOM.
Comment 15 Jorg K (GMT+1) 2015-08-10 01:21:21 PDT
Well, we just forget about it then.
Comment 16 Jorg K (GMT+1) 2015-08-10 01:25:23 PDT
Comment on attachment 8645494
Always export as UTF-8

Forget patch for now. We need something more elaborate to allow selection of export character set. BTW, same is true for import.
Comment 17 Magnus Melin 2015-08-10 13:16:12 PDT
Gmail offers two options for export
 - gmail csv => utf-8
 - outlook csv => Windows-1252 

I don't think there's really any sense using anything except utf-8. If it works to re-import our exported csv to thunderbird, maybe that's still the best option. Then you can convert the file for outlook if you really need to. As it is now you can't do that.
Comment 18 Jorg K (GMT+1) 2015-08-10 14:06:42 PDT
You decide, the simple patch is there. I think imitating Gmail would make sense, that is, offer two options.

I can confirm that UTF-8 import in Thunderbird works right now, I tried it yesterday: bug 1188306 comment #8. Even Notepad can save UTF-8 as ANSI for the conversion, if required.
Comment 19 Magnus Melin 2015-08-21 13:02:46 PDT
To just fix the dataloss issue, we could just CopyUTF16toUTF8 on failure here: http://hg.mozilla.org/comm-central/annotate/7965091ed556/mailnews/addrbook/src/nsAbManager.cpp#l797
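
For readers unfamiliar with what a UTF-16 to UTF-8 copy involves, here is a standalone sketch. It is not Mozilla's CopyUTF16toUTF8 and makes no claim about how that function behaves; Utf16ToUtf8 and AppendUtf8 are hypothetical names, and replacing unpaired surrogates with U+FFFD is a choice made for this example only.

```cpp
#include <cstddef>
#include <string>

// Appends one Unicode code point to `out` as UTF-8 (1 to 4 bytes).
static void AppendUtf8(char32_t cp, std::string& out) {
  if (cp < 0x80) {
    out += static_cast<char>(cp);
  } else if (cp < 0x800) {
    out += static_cast<char>(0xC0 | (cp >> 6));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else if (cp < 0x10000) {
    out += static_cast<char>(0xE0 | (cp >> 12));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  } else {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
  }
}

// Hypothetical stand-in for a UTF-16 to UTF-8 copy.
// Unpaired surrogates are replaced with U+FFFD (an example-only policy).
std::string Utf16ToUtf8(const std::u16string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    char32_t cp = in[i];
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: combine with the following low surrogate, if any.
      if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
        ++i;
      } else {
        cp = 0xFFFD;
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD;  // Low surrogate without a preceding high surrogate.
    }
    AppendUtf8(cp, out);
  }
  return out;
}
```

Unlike a conversion to a legacy codepage, this mapping is defined for every code point, which is why it cannot lose address book data.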
Comment 20 Jorg K (GMT+1) 2015-08-21 15:37:43 PDT
Hmm. This is more complicated, isn't it? You need to preprocess all the data to see whether it can be exported with the "system character set". If not, use UTF-8, so CopyUTF16toUTF8 (see: http://mxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgI18N.cpp#50). You can't happily export until you hit an error and then switch to UTF-8. Or am I misunderstanding something here?
Comment 21 Magnus Melin 2015-08-22 12:13:44 PDT
Yes you're right, or just start over if we hit an error.
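
The "preprocess everything first" approach from comments 20 and 21 can be sketched as follows. This is standalone C++, not code from nsAbManager.cpp. As a loud assumption, the "system character set" is taken to be ISO-8859-1 so that the representability check is trivial; a real system charset would need a real converter. ExportCharset, FitsInLatin1 and ChooseCharset are hypothetical names.

```cpp
#include <string>
#include <vector>

enum class ExportCharset { System, Utf8 };

// Assumption for illustration: the "system character set" is ISO-8859-1,
// which can represent exactly the code points U+0000..U+00FF.
static bool FitsInLatin1(const std::u16string& s) {
  for (char16_t c : s) {
    if (c > 0xFF) {
      return false;
    }
  }
  return true;
}

// Scans every field before anything is written, so the output file
// uses a single encoding from the first byte to the last.
ExportCharset ChooseCharset(const std::vector<std::u16string>& fields) {
  for (const auto& field : fields) {
    if (!FitsInLatin1(field)) {
      return ExportCharset::Utf8;
    }
  }
  return ExportCharset::System;
}
```

Deciding up front avoids the problem raised in comment 20: a file that starts in the system charset and switches to UTF-8 halfway through is readable in neither.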
Comment 22 Jorg K (GMT+1) 2015-10-22 02:06:28 PDT
I've been considering adding an option to export either with the "system character set" (current behaviour) or UTF-8. Sadly, the export function has no dialogue. You click "Export..." and up comes the file selection. So where would be a good spot to add the UI to make the choice?
Comment 23 Richard Marti (:Paenglab) 2015-10-22 07:51:19 PDT
What about showing a dialog with the different file format options after clicking "Export...", and then showing the save dialog with the chosen format selected? The pro: the formats could be explained better. The con: an additional dialog. But the user already has to choose the correct format in the file chooser without understanding what each one is for.

Currently LDIF is pre-selected, and I don't know whether Outlook understands this format. If the user only sets the file name, he may be unable to import the file into Outlook. A better description in the additional dialog could help the user choose the right one.
Comment 24 Jorg K (GMT+1) 2015-10-22 08:53:22 PDT
Good approach, thanks. I've wondered in the past what LDIF is. We could also mention the version number in the explanatory text, for example vCard 2.1 in our case. Let's see what Magnus says.
Comment 25 Magnus Melin 2015-10-22 12:24:58 PDT
You could do that, but since you still have to select the file you'll save to it would be kind of an abnormal UI. 

For an easy solution you could just add the UTF-8 versions in the format selector

 Comma Separated (Outlook compatible)
 Comma Separated (UTF-8)
 Tab Separated (Outlook compatible)
 Tab Separated (UTF-8)
 vCard
 LDIF

I think LDIF could be last, as I really doubt people use that much.
(I wouldn't add a vCard version number.)
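
The format selector proposed in comment 25 amounts to a table that pairs each label with a file format and a charset. A hedged sketch of such a table, in standalone C++: the type and function names are hypothetical, the labels are copied from the list above, and treating vCard and LDIF as "charset defined by the format" is an assumption of this example.

```cpp
#include <string>
#include <vector>

enum class Format { Csv, Tab, VCard, Ldif };
// FormatDefined: the charset is not a user choice for this entry
// (an assumption made for this sketch).
enum class Charset { System, Utf8, FormatDefined };

struct ExportFilter {
  std::string label;
  Format format;
  Charset charset;
};

// Hypothetical builder for the file picker entries, in display order.
std::vector<ExportFilter> BuildExportFilters() {
  return {
      {"Comma Separated (Outlook compatible)", Format::Csv, Charset::System},
      {"Comma Separated (UTF-8)", Format::Csv, Charset::Utf8},
      {"Tab Separated (Outlook compatible)", Format::Tab, Charset::System},
      {"Tab Separated (UTF-8)", Format::Tab, Charset::Utf8},
      {"vCard", Format::VCard, Charset::FormatDefined},
      {"LDIF", Format::Ldif, Charset::FormatDefined},
  };
}
```

With this shape, the index of the entry picked in the file chooser is enough to determine both the delimiter and the encoding, so no extra dialog is needed.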
Comment 26 Richard Marti (:Paenglab) 2015-10-22 12:54:17 PDT
If it's possible to add these lines to the format selector, then this is better.
Comment 27 Jorg K (GMT+1) 2015-10-22 13:34:02 PDT
Created attachment 8677669
Screenshot of proposed solution.

I prefer not to mention Outlook. Besides, who knows what will be "Outlook compatible" in the future.
Comment 28 Jorg K (GMT+1) 2015-10-22 13:39:30 PDT
Created attachment 8677677
Proposed solution (v1): Let user choose between system charset or UTF-8
Comment 29 Jorg K (GMT+1) 2015-10-22 13:45:18 PDT
Note to the reviewer: The lines in this code are hopelessly longer than 80 characters, so I didn't bother adhering to the limit; however, I did shorten lines where possible.
Comment 30 Richard Marti (:Paenglab) 2015-10-23 00:31:02 PDT
Comment on attachment 8677677
Proposed solution (v1): Let user choose between system charset or UTF-8

Looks good.
Comment 31 Magnus Melin 2015-10-23 13:13:43 PDT
Comment on attachment 8677677
Proposed solution (v1): Let user choose between system charset or UTF-8

Review of attachment 8677677:
-----------------------------------------------------------------

LGTM, thx! r=mkmelin

::: mail/locales/en-US/chrome/messenger/addressbook/addressBook.properties
@@ +138,2 @@
>  TABFiles=Tab Delimited
> +TABFilesSysCharset=Tab Delimited  (System Charset)

nit: double space snuck in here
Comment 32 Jorg K (GMT+1) 2015-10-23 14:05:30 PDT
Created attachment 8678337
Proposed solution (v1b): Let user choose between system charset or UTF-8

Carrying forward Magnus' r+ and Richard's ui-r+
Fixed nit (removed double space).
Thanks for the quick turn-around!