The default bug view has changed. See this FAQ.

Import of address book in CSV format fails if data contains accented/international characters

RESOLVED FIXED in Thunderbird 43.0

Status

Thunderbird
Address Book
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: Guy De Ceulaer, Assigned: Jorg K (GMT+1))

Tracking

({intl, regression})

38 Branch
Thunderbird 43.0
intl, regression

Thunderbird Tracking Flags

(thunderbird40 affected, thunderbird41 fixed, thunderbird42 fixed, thunderbird43 fixed, thunderbird_esr3840+ fixed)

Details

(Whiteboard: [regression:TB35])

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

2 years ago
Created attachment 8639803 [details]
test.csv with one contact containing accented character

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Build ID: 20150630154324

Steps to reproduce:

Import an list of addresses from a CSV file.
The file contains some records with accented characters, like for the firstname André (e-accent aigu at the end).

The file is created by a program, but this bug didn't occur in the past, so the problem lies not there.


Actual results:

The import fails with message "Error importing address book xxxx, all addresses may not have been imported". In fact, none is imported.

The import of exactly the same CSV file worked in previous releases.


Expected results:

Import should succeed.

When I remove the accented characters with Notepad, the import works again.

The bug can be reproduced by creating an address with an accented character, exporting to a CSV file and trying to import again. It will fail. The attached test.csv is an example.
(Assignee)

Comment 1

2 years ago
Created attachment 8645394 [details]
Another example

I exported my address book and tried to re-import it and that failed.
I tracked it down to an address with an accented character.

This is exactly was is reported in comment #0 (last two lines).

Works with TB 31.6.
(Assignee)

Updated

2 years ago
Keywords: regression
OS: Unspecified → Windows 7
Hardware: Unspecified → x86_64
(Assignee)

Updated

2 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment hidden (obsolete)
(Assignee)

Comment 3

2 years ago
OK, found the problem.

In the case where the import contains the accented character (é or í in the attached examples), the input is not considered UTF-8 compatible and *no* character set is returned at all:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgUtils.cpp?offset=200#2420

This was regressed by bug 1074585 in https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e.

Just assigning ISO-8859-1 at the end of the function will make the import work again in this case:
if (aCharset.IsEmpty()) aCharset.AssignLiteral("ISO-8859-1");

The function *MUST* return a characterset! I will NI the people who broke it ;-)

General comment:
I think it's pretty embarrassing that the CSV import doesn't work (and there is no text for it either). I was looking through the Outlook crashes and one comment was "Can't import CSV". That's how I found the bug. So the poor user tried the Outlook import first, that crashed (my fault, since I hadn't disabled it properly). Then he tried CSV and that didn't work either.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
(Assignee)

Updated

2 years ago
OS: Windows 7 → All
Hardware: x86_64 → All
(Assignee)

Updated

2 years ago
Summary: Import of address book in CSV format fails if foreign accented characters → Import of address book in CSV format fails if data contains accented/international characters
Blocks: 1074585
Keywords: intl
Whiteboard: [regression:TB35]

Comment 4

2 years ago
Great detective work, thx Jorg! I think we should just fix the caller - assign charset ISO-8859-1 (or UTF-8?) on failure here:
https://dxr.mozilla.org/comm-central/source/mailnews/import/text/src/nsTextAddress.cpp#53
Flags: needinfo?(mkmelin+mozilla)
(Assignee)

Comment 5

2 years ago
I think that should be two fixed:
One quick one and one with a UI change that lets the user select the character set if it can't be determined automatically.

Given that TB is used a lot in Europe and America, we should at least detect Windows-1252 (superset of ISO8859-1).

Comment 6

2 years ago
https://support.microsoft.com/en-us/kb/933855 suggests UTF-8 is what's always used nowadays. (Haven't tested it though.)
(Assignee)

Comment 7

2 years ago
Hmm, well, that's another problem: The format that TB uses upon export. I exported and got "Western". We can change that too, but we should also do something useful if it's not UTF-8 upon import.

The problem of this bug is that the é/í are already encoded in "Western", so the UTF-8 test fails (saw that in debugging). Just assigning "UTF-8" gives �, tried that, too.
(Assignee)

Comment 8

2 years ago
Workaround for import: Convert "Western" encoded CSV files to UTF-8, for example in Notepad++ on Windows.

Updated

2 years ago
Duplicate of this bug: 1178728
(Assignee)

Comment 10

2 years ago
Created attachment 8645489 [details] [diff] [review]
Proposed solution: Quick fix.

Whoever has time and deems this important can review this.

Code was reestablished in the caller from
https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e

(Magnus suggested to put something into the caller (comment #4), that's what I've done.)
Flags: needinfo?(Pidgeot18)
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)

Comment 11

2 years ago
nsMsgUtils.cpp should probably be changed to return error if no charset was detected

Comment 12

2 years ago
MsgDetectCharsetFromFile I mean.
(Assignee)

Comment 13

2 years ago
If you always want to export UTF-8, then the change would be here:
https://dxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbManager.cpp#702
https://dxr.mozilla.org/comm-central/source/mailnews/addrbook/src/nsAbManager.cpp#793
Should I add this to the patch?
(Assignee)

Comment 14

2 years ago
(In reply to Magnus Melin from comment #11)
> nsMsgUtils.cpp should probably be changed to return error if no charset was
> detected

Which error would you like?

Comment 15

2 years ago
NS_ERROR_FAILURE perhaps?
Export should be in another bug, maybe bug 117236.
(Assignee)

Comment 16

2 years ago
Created attachment 8645493 [details] [diff] [review]
Proposed solution: Quick fix (v2)

As per Magnus' comment #15.
Attachment #8645489 - Attachment is obsolete: true
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 17

2 years ago
Some homework:
MsgDetectCharsetFromFile has two callers:
http://mxr.mozilla.org/comm-central/search?string=MsgDetectCharsetFromFile
One in /mailnews/import/text/src/nsTextAddress.cpp (where the new return value is handled) and one in nsMsgAttachmentHandler.cpp. There, in nsMsgAttachmentHandler::PickCharset, the return value is just returned. This function has one caller only:
http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1023
which ignores the return value.

In conclusion: Changing the behaviour and returning a failure code from MsgDetectCharsetFromFile now won't cause any unexpected side-effects.
(Assignee)

Comment 18

2 years ago
Created attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

Better way to do it.
Attachment #8645493 - Attachment is obsolete: true
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
Attachment #8645836 - Flags: review?(mkmelin+mozilla)

Comment 19

2 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

Review of attachment 8645836 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thx Jorg! r=mkmelin
Attachment #8645836 - Flags: review?(mkmelin+mozilla) → review+

Updated

2 years ago
Assignee: nobody → mozilla
(Assignee)

Comment 20

2 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Approval Request Comment]
Regression caused by (bug #): 1074585
User impact if declined: Major function not working.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky):
None.
Attachment #8645836 - Flags: approval-comm-esr38?
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-aurora?
(Assignee)

Updated

2 years ago
Keywords: checkin-needed

Comment 21

2 years ago
https://hg.mozilla.org/comm-central/rev/908c9564e958
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-thunderbird43: --- → fixed
Keywords: checkin-needed
Resolution: --- → FIXED

Comment 22

2 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

http://hg.mozilla.org/releases/comm-esr38/rev/97b27d9d64f1
Attachment #8645836 - Flags: approval-comm-esr38? → approval-comm-esr38+

Updated

2 years ago
status-thunderbird_esr38: affected → fixed
tracking-thunderbird_esr38: ? → 40+

Comment 23

2 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

https://hg.mozilla.org/releases/comm-beta/rev/9d3eefbed90e
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-beta+
Attachment #8645836 - Flags: approval-comm-aurora?

Comment 24

2 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Triage Comment]

https://hg.mozilla.org/releases/comm-aurora/rev/3b0aef562c40
Attachment #8645836 - Flags: approval-comm-aurora+

Updated

2 years ago
status-thunderbird41: affected → fixed
status-thunderbird42: affected → fixed
Target Milestone: --- → Thunderbird 43.0
You need to log in before you can comment on or make changes to this bug.