[.csv,.txt,.tab only] addressbook export/import: characters outside the locale codepage lost: use UTF8 and/or let the user choose the charset

RESOLVED FIXED in Thunderbird 45.0

Status

MailNews Core
Address Book
--
critical
RESOLVED FIXED
16 years ago
2 years ago

People

(Reporter: (not reading, please use seth@sspitzer.org instead), Assigned: Jorg K (GMT+2))

Tracking

(Blocks: 1 bug, {dataloss, intl})

Trunk
Thunderbird 45.0
dataloss, intl

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: nab-imp)

Attachments

(2 attachments, 2 obsolete attachments)

issues with import / export of non ascii addressbook data

from nsAddressBook.cpp:

// XXX i18n TODO
// is this right?  
// do we want escaped utf8?  base64 encoded data?
// the import code appears to expect it in the system charset
// so we'll do that for now.  
// 
// one reason against doing the system char set:
// my machine is set to US-ASCII, but I can have japanese names in my
// addressbook.  but if I go to export to a .csv or .tab file
// the conversion will fail.
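
The failure described in that source comment can be reproduced outside Mozilla code. The following is a minimal Python sketch, not the nsAddressBook.cpp implementation: the `encode_for_export` helper and the sample name are hypothetical, and `"ascii"` stands in for a US-ASCII system charset.

```python
def encode_for_export(text: str, system_charset: str) -> bytes:
    """Encode one address book field in the given charset (hypothetical helper)."""
    # errors="strict" raises instead of silently dropping characters
    return text.encode(system_charset, errors="strict")


name = "山田太郎"  # illustrative Japanese display name, not real data

try:
    encode_for_export(name, "ascii")
    conversion_failed = False
except UnicodeEncodeError:
    # The system charset cannot represent the name, so the export would lose data
    conversion_failed = True

# UTF-8 can represent every Unicode character, so this always succeeds
utf8_bytes = encode_for_export(name, "utf-8")
```

With a lenient error mode (for example `errors="replace"`) the export would not fail, but the unrepresentable characters would be replaced, which is the data loss this bug is about.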

one possible solution is to continue to do this for .csv or .tab export.

ldif export should be doable, as I think ldif data is required to be in UTF-8 
or something

when I add XML import / export, I'll probably do the same thing as LDIF.

nhotta / ji, comments?

Updated

16 years ago
Whiteboard: nab-imp

Comment 1

16 years ago
I think export to a .csv or .tab file needs to depend on system charset.
The system needs to support the charset to make the file readable anyway. 
Keywords: intl

Updated

16 years ago
Status: NEW → ASSIGNED
Summary: issues with import / export of non ascii addressbook data → [.csv,.txt,.tab only] issues with import / export of non ascii addressbook data
Whiteboard: nab-imp → nab-imp,dmose-dataloss

Updated

15 years ago
Keywords: dataloss
Whiteboard: nab-imp,dmose-dataloss → nab-imp

Comment 2

15 years ago
By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and
<http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss
bugs are of critical or possibly higher severity.  Only changing open bugs to
minimize unnecessary spam.  Keywords to trigger this would be crash, topcrash,
topcrash+, zt4newcrash, dataloss.
Severity: normal → critical

Updated

14 years ago
Depends on: 153882
Product: Browser → Seamonkey

Comment 3

12 years ago
The export function works flawlessly in 1.7.2, but with 1.7.5, I get a hard
error on the export request - "Not enough arguments
[nsIAddressBook.exportAddressBook]" every time.

Comment 4

12 years ago
to comment 3: that's bug 271895

Updated

12 years ago
Assignee: sspitzer → mail
Status: ASSIGNED → NEW

Comment 5

12 years ago
I get the same
Not enough arguments [nsIAddressBook.exportAddressBook]
error message mentioned by dkowal@paritysys.net, but I don't have any non-ascii data in my address book to the best of my ability to determine. So the title of this bug seems too restrictive. 
This is the current title:
"[.csv,.txt,.tab only] issues with import / export of non ascii addressbook data"
It seems as if this is a more general bug: "Tools>>Export" just doesn't work in 1.7.5. I am going to adjust the bug title to reflect that.

Comment 6

12 years ago
(In reply to comment #5)
> I am going to adjust the bug title to reflect that.

Ah. I don't have permission to change the bug title. Well, someone should change it to be broader. Or create a separate bug report. 

(In reply to comment #5)
> I get the same
> Not enough arguments [nsIAddressBook.exportAddressBook]
> error message mentioned by dkowal@paritysys.net, but I don't have any
> non-ascii data in my address book to the best of my ability to determine. So
> the title of this bug seems too restrictive. 
> ...
> It seems as if this is a more general bug that "Tools>>Export" just doesn't
> work in 1.7.5. I am going to adjust the bug title to reflect that.

The bug you mention was fixed by bug 271895 almost a year ago, and is fixed in versions after 1.7.5. Please *always* try the latest version before reporting problems.
*** Bug 173912 has been marked as a duplicate of this bug. ***

Comment 9

12 years ago
At least on Windows, we can use a 'BOM-prefixed' UTF-8 (or UTF-16)
Component: Address Book → MailNews: Address Book
Product: Mozilla Application Suite → Core
Summary: [.csv,.txt,.tab only] issues with import / export of non ascii addressbook data → [.csv,.txt,.tab only] addressbook export/import : characters outside the locale codepage lost
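
As an illustration of the "BOM-prefixed UTF-8" idea from comment 9, here is a minimal Python sketch. It is not Thunderbird code; the `export_csv_with_bom` helper and the sample rows are hypothetical. It relies on Python's `utf-8-sig` codec, which prepends the byte order mark when encoding.

```python
import codecs
import csv
import io


def export_csv_with_bom(rows):
    """Return CSV bytes encoded as UTF-8 with a leading byte order mark."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    # "utf-8-sig" writes the BOM bytes EF BB BF before the encoded text
    return buffer.getvalue().encode("utf-8-sig")


data = export_csv_with_bom(
    [["Display Name", "Email"], ["山田太郎", "taro@example.com"]]
)
```

A reader that understands the BOM can detect UTF-8 from the first three bytes. Whether a given importer honours it is a separate question, as the later comments about Outlook show.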

Updated

9 years ago
OS: Windows 2000 → All
Hardware: PC → All
Summary: [.csv,.txt,.tab only] addressbook export/import : characters outside the locale codepage lost → [.csv,.txt,.tab only] addressbook export/import: characters outside the locale codepage lost: use UTF8 and/or let the user choose the charset

Comment 10

9 years ago
The comment mentioned in comment 0 is from rev. 1.74 of nsAddressBook.cpp.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/mailnews/addrbook/src/Attic/nsAddressBook.cpp&rev=1.74&root=/cvsroot&mark=1450-1460#1434

The code is now located in nsAbManager.cpp:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/mailnews/addrbook/src/nsAbManager.cpp&rev=1.179&mark=662-669#651
QA Contact: nbaca → addressbook
Product: Core → MailNews Core
Assignee: mail → nobody
Blocks: 157010
Duplicate of this bug: 254118

Comment 12

6 years ago
Today I got a list of people (Chinese) and eventually I had to re-type everything because it was impossible to import UTF-8 CSV/TAB in any email program and gmail totally failed as well. Please fix this and let me choose the encoding. Encoding is at least as important as which field to map to which info. In 2011 character encoding shouldn't be an issue anymore.
(Assignee)

Comment 13

2 years ago
Created attachment 8645494 [details] [diff] [review]
Always export as UTF-8

... because Microsoft wants it like that:
https://support.microsoft.com/en-us/kb/933855

And of course for users who want to export Asian names from their address books.
Attachment #8645494 - Flags: review?(mkmelin+mozilla)

Comment 14

2 years ago
Gah, despite that article it seems Outlook (at least) does not export to UTF-8, and it also doesn't properly allow importing CSV contacts in UTF-8 with or without BOM.
(Assignee)

Comment 15

2 years ago
Well, we just forget about it then.
(Assignee)

Comment 16

2 years ago
Comment on attachment 8645494 [details] [diff] [review]
Always export as UTF-8

Forget patch for now. We need something more elaborate to allow selection of export character set. BTW, same is true for import.
Attachment #8645494 - Flags: review?(mkmelin+mozilla)

Comment 17

2 years ago
Gmail offers two options for export
 - gmail csv => utf-8
 - outlook csv => Windows-1252 

I don't think there's really any sense using anything except utf-8. If it works to re-import our exported csv to thunderbird, maybe that's still the best option. Then you can convert the file for outlook if you really need to. As it is now you can't do that.
(Assignee)

Comment 18

2 years ago
You decide, the simple patch is there. I think imitating Gmail would make sense, that is, offer two options.

I can confirm that UTF-8 import in Thunderbird works right now, I tried it yesterday: bug 1188306 comment #8. Even Notepad can save UTF-8 as ANSI for the conversion, if required.
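
The conversion step mentioned above (export as UTF-8, then convert for Outlook if needed) can be sketched in a few lines of Python. This is only an illustration under assumptions: `convert_utf8_to_cp1252` is a hypothetical helper, and Windows-1252 is taken as the target because comment 17 names it for Gmail's Outlook CSV.

```python
def convert_utf8_to_cp1252(utf8_bytes: bytes) -> bytes:
    """Re-encode UTF-8 CSV data as Windows-1252 (hypothetical helper)."""
    # "utf-8-sig" accepts input with or without a leading BOM
    text = utf8_bytes.decode("utf-8-sig")
    # Raises UnicodeEncodeError if a character has no Windows-1252 equivalent
    return text.encode("cp1252")


source = "Name,Email\r\nRené,rene@example.com\r\n".encode("utf-8")
converted = convert_utf8_to_cp1252(source)
```

The conversion only works when every character fits the target codepage. For names outside it, the strict encode raises an error rather than losing characters silently.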

Comment 19

2 years ago
To just fix the dataloss issue, we could just CopyUTF16toUTF8 on failure here: http://hg.mozilla.org/comm-central/annotate/7965091ed556/mailnews/addrbook/src/nsAbManager.cpp#l797
(Assignee)

Comment 20

2 years ago
Hmm. This is more complicated, isn't it? You need to preprocess all the data to see whether it can be exported with the "system character set". If not, use UTF-8, so CopyUTF16toUTF8 (see: http://mxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgI18N.cpp#50). You can't happily export until you hit an error and then switch to UTF-8. Or am I misunderstanding something here?
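
The preprocessing idea in comment 20 (check all data first, then pick one charset for the whole file) could look roughly like the following Python sketch. It is not the nsAbManager.cpp logic; `choose_export_charset` is a hypothetical name, and the fields are passed in as plain strings.

```python
def choose_export_charset(fields, system_charset):
    """Return system_charset if every field fits in it, otherwise "utf-8"."""
    for field in fields:
        try:
            field.encode(system_charset)
        except UnicodeEncodeError:
            # One unrepresentable field is enough to switch the whole file
            return "utf-8"
    return system_charset


western = choose_export_charset(["Alice", "René"], "cp1252")
mixed = choose_export_charset(["Alice", "山田太郎"], "cp1252")
```

Deciding before any bytes are written avoids a file that starts in one charset and continues in another, which is the concern raised in the comment.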

Comment 21

2 years ago
Yes you're right, or just start over if we hit an error.
(Assignee)

Comment 22

2 years ago
I've been considering adding an option to export either with the "system character set" (current behaviour) or UTF-8. Sadly, the export function has no dialogue. You click "Export..." and up comes the file selection. So where would be a good spot to add the UI to make the choice?
Flags: needinfo?(richard.marti)
Flags: needinfo?(mkmelin+mozilla)
What about showing a dialog with the different file format options after clicking "Export...", and then showing the save dialog with the correct format selected? The pro would be that the formats could be better explained. The con would be the additional dialog. But currently the user also has to choose the correct format in the file chooser without understanding what each one is for.

Currently LDIF is pre-selected, and I don't know whether Outlook understands this format. If the user only sets the file name, they may be unable to import the file into Outlook. A better description in the additional dialog could help them choose the right format.
Flags: needinfo?(richard.marti)
(Assignee)

Comment 24

2 years ago
Good approach, thanks. I've wondered in the past what LDIF is. We could also mention the version number in the explanatory text, for example vCard 2.1 in our case. Let's see what Magnus says.

Comment 25

2 years ago
You could do that, but since you still have to select the file you'll save to it would be kind of an abnormal UI. 

For an easy solution you could just add the UTF-8 versions in the format selector

 Comma Separated (Outlook compatible)
 Comma Separated (UTF-8)
 Tab Separated (Outlook compatible)
 Tab Separated (UTF-8)
 vCard
 LDIF

I think LDIF could be last, as I really doubt people use that much.
(I wouldn't add a vCard version number.)
Flags: needinfo?(mkmelin+mozilla)
If it's possible to add these lines to the format selector, then that is better.
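
The format selector entries proposed in comment 25 can be modeled as a small lookup table. The Python sketch below is illustrative only: the `FORMATS` table and `resolve_format` helper are hypothetical, the labels are the ones suggested in comment 25 (not necessarily the strings that landed), and vCard and LDIF are left out because they are not delimited text formats.

```python
import locale

# label -> (delimiter, charset); None means "use the system charset"
FORMATS = {
    "Comma Separated (Outlook compatible)": (",", None),
    "Comma Separated (UTF-8)": (",", "utf-8"),
    "Tab Separated (Outlook compatible)": ("\t", None),
    "Tab Separated (UTF-8)": ("\t", "utf-8"),
}


def resolve_format(label, system_charset=None):
    """Return (delimiter, charset) for a selector label (hypothetical helper)."""
    delimiter, charset = FORMATS[label]
    if charset is None:
        # Fall back to the locale's preferred encoding when none is supplied
        charset = system_charset or locale.getpreferredencoding(False)
    return delimiter, charset


tab_utf8 = resolve_format("Tab Separated (UTF-8)")
comma_system = resolve_format("Comma Separated (Outlook compatible)", "cp1252")
```

Putting the charset choice in the file type list keeps the existing flow of clicking "Export..." and getting a single save dialog, without an extra step.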
(Assignee)

Comment 27

2 years ago
Created attachment 8677669 [details]
Screenshot of proposed solution.

I prefer not to mention Outlook. Besides, who knows what will be "Outlook compatible" in the future.
(Assignee)

Updated

2 years ago
Assignee: nobody → mozilla
(Assignee)

Comment 28

2 years ago
Created attachment 8677677 [details] [diff] [review]
Proposed solution (v1): Let user choose between system charset or UTF-8
Attachment #8645494 - Attachment is obsolete: true
Attachment #8677677 - Flags: ui-review?(richard.marti)
Attachment #8677677 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 29

2 years ago
Note to the reviewer: The lines in this code are hopelessly longer than 80 characters, so I didn't bother adhering to the limit. However, I did shorten lines where possible.
Comment on attachment 8677677 [details] [diff] [review]
Proposed solution (v1): Let user choose between system charset or UTF-8

Looks good.
Attachment #8677677 - Flags: ui-review?(richard.marti) → ui-review+

Comment 31

2 years ago
Comment on attachment 8677677 [details] [diff] [review]
Proposed solution (v1): Let user choose between system charset or UTF-8

Review of attachment 8677677 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thx! r=mkmelin

::: mail/locales/en-US/chrome/messenger/addressbook/addressBook.properties
@@ +138,2 @@
>  TABFiles=Tab Delimited
> +TABFilesSysCharset=Tab Delimited  (System Charset)

nit: double space snuck in here
Attachment #8677677 - Flags: review?(mkmelin+mozilla) → review+
(Assignee)

Comment 32

2 years ago
Created attachment 8678337 [details] [diff] [review]
Proposed solution (v1b): Let user choose between system charset or UTF-8

Carrying forward Magnus' r+ and Richard's ui-r+
Fixed nit (removed double space).
Thanks for the quick turn-around!
Attachment #8677677 - Attachment is obsolete: true
Attachment #8678337 - Flags: ui-review+
Attachment #8678337 - Flags: review+
(Assignee)

Updated

2 years ago
Keywords: checkin-needed
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 45.0

Comment 33

2 years ago
https://hg.mozilla.org/comm-central/rev/7ee33e041ca1