Import of address book in CSV format fails if data contains accented/international characters

RESOLVED FIXED in Thunderbird 43.0

Status

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: guy.deceulaer, Assigned: jorgk)

Tracking

({intl, regression})

38 Branch
Thunderbird 43.0
intl, regression

Thunderbird Tracking Flags

(thunderbird40 affected, thunderbird41 fixed, thunderbird42 fixed, thunderbird43 fixed, thunderbird_esr3840+ fixed)

Details

(Whiteboard: [regression:TB35])

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

4 years ago
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Build ID: 20150630154324

Steps to reproduce:

Import an list of addresses from a CSV file.
The file contains some records with accented characters, like for the firstname André (e-accent aigu at the end).

The file is created by a program, but this bug didn't occur in the past, so the problem lies not there.


Actual results:

The import fails with message "Error importing address book xxxx, all addresses may not have been imported". In fact, none is imported.

The import of exactly the same CSV file worked in previous releases.


Expected results:

Import should succeed.

When I remove the accented characters with Notepad, the import works again.

The bug can be reproduced by creating an address with an accented character, exporting to a CSV file and trying to import again. It will fail. The attached test.csv is an example.
(Assignee)

Comment 1

4 years ago
Posted file Another example
I exported my address book and tried to re-import it and that failed.
I tracked it down to an address with an accented character.

This is exactly was is reported in comment #0 (last two lines).

Works with TB 31.6.
(Assignee)

Updated

4 years ago
Keywords: regression
OS: Unspecified → Windows 7
Hardware: Unspecified → x86_64
(Assignee)

Updated

4 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment hidden (obsolete)
(Assignee)

Comment 3

4 years ago
OK, found the problem.

In the case where the import contains the accented character (é or í in the attached examples), the input is not considered UTF-8 compatible and *no* character set is returned at all:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgUtils.cpp?offset=200#2420

This was regressed by bug 1074585 in https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e.

Just assigning ISO-8859-1 at the end of the function will make the import work again in this case:
if (aCharset.IsEmpty()) aCharset.AssignLiteral("ISO-8859-1");

The function *MUST* return a characterset! I will NI the people who broke it ;-)

General comment:
I think it's pretty embarrassing that the CSV import doesn't work (and there is no text for it either). I was looking through the Outlook crashes and one comment was "Can't import CSV". That's how I found the bug. So the poor user tried the Outlook import first, that crashed (my fault, since I hadn't disabled it properly). Then he tried CSV and that didn't work either.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
(Assignee)

Updated

4 years ago
OS: Windows 7 → All
Hardware: x86_64 → All
(Assignee)

Updated

4 years ago
Summary: Import of address book in CSV format fails if foreign accented characters → Import of address book in CSV format fails if data contains accented/international characters

Updated

4 years ago
Blocks: 1074585
Keywords: intl
Whiteboard: [regression:TB35]
Great detective work, thx Jorg! I think we should just fix the caller - assign charset ISO-8859-1 (or UTF-8?) on failure here:
https://dxr.mozilla.org/comm-central/source/mailnews/import/text/src/nsTextAddress.cpp#53
Flags: needinfo?(mkmelin+mozilla)
(Assignee)

Comment 5

4 years ago
I think that should be two fixed:
One quick one and one with a UI change that lets the user select the character set if it can't be determined automatically.

Given that TB is used a lot in Europe and America, we should at least detect Windows-1252 (superset of ISO8859-1).
https://support.microsoft.com/en-us/kb/933855 suggests UTF-8 is what's always used nowadays. (Haven't tested it though.)
(Assignee)

Comment 7

4 years ago
Hmm, well, that's another problem: The format that TB uses upon export. I exported and got "Western". We can change that too, but we should also do something useful if it's not UTF-8 upon import.

The problem of this bug is that the é/í are already encoded in "Western", so the UTF-8 test fails (saw that in debugging). Just assigning "UTF-8" gives �, tried that, too.
(Assignee)

Comment 8

4 years ago
Workaround for import: Convert "Western" encoded CSV files to UTF-8, for example in Notepad++ on Windows.
Duplicate of this bug: 1178728
(Assignee)

Comment 10

4 years ago
Posted patch Proposed solution: Quick fix. (obsolete) — Splinter Review
Whoever has time and deems this important can review this.

Code was reestablished in the caller from
https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e

(Magnus suggested to put something into the caller (comment #4), that's what I've done.)
Flags: needinfo?(Pidgeot18)
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)
nsMsgUtils.cpp should probably be changed to return error if no charset was detected
MsgDetectCharsetFromFile I mean.
(Assignee)

Comment 14

4 years ago
(In reply to Magnus Melin from comment #11)
> nsMsgUtils.cpp should probably be changed to return error if no charset was
> detected

Which error would you like?
NS_ERROR_FAILURE perhaps?
Export should be in another bug, maybe bug 117236.
(Assignee)

Comment 16

4 years ago
As per Magnus' comment #15.
Attachment #8645489 - Attachment is obsolete: true
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
(Assignee)

Comment 17

4 years ago
Some homework:
MsgDetectCharsetFromFile has two callers:
http://mxr.mozilla.org/comm-central/search?string=MsgDetectCharsetFromFile
One in /mailnews/import/text/src/nsTextAddress.cpp (where the new return value is handled) and one in nsMsgAttachmentHandler.cpp. There, in nsMsgAttachmentHandler::PickCharset, the return value is just returned. This function has one caller only:
http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1023
which ignores the return value.

In conclusion: Changing the behaviour and returning a failure code from MsgDetectCharsetFromFile now won't cause any unexpected side-effects.
(Assignee)

Comment 18

4 years ago
Better way to do it.
Attachment #8645493 - Attachment is obsolete: true
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
Attachment #8645836 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

Review of attachment 8645836 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thx Jorg! r=mkmelin
Attachment #8645836 - Flags: review?(mkmelin+mozilla) → review+
Assignee: nobody → mozilla
(Assignee)

Comment 20

4 years ago
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Approval Request Comment]
Regression caused by (bug #): 1074585
User impact if declined: Major function not working.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky):
None.
Attachment #8645836 - Flags: approval-comm-esr38?
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-aurora?
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
https://hg.mozilla.org/comm-central/rev/908c9564e958
Status: NEW → RESOLVED
Last Resolved: 4 years ago
status-thunderbird43: --- → fixed
Keywords: checkin-needed
Resolution: --- → FIXED
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

http://hg.mozilla.org/releases/comm-esr38/rev/97b27d9d64f1
Attachment #8645836 - Flags: approval-comm-esr38? → approval-comm-esr38+

Updated

4 years ago
status-thunderbird_esr38: affected → fixed
tracking-thunderbird_esr38: ? → 40+
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

https://hg.mozilla.org/releases/comm-beta/rev/9d3eefbed90e
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-beta+
Attachment #8645836 - Flags: approval-comm-aurora?
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Triage Comment]

https://hg.mozilla.org/releases/comm-aurora/rev/3b0aef562c40
Attachment #8645836 - Flags: approval-comm-aurora+

Updated

4 years ago
status-thunderbird41: affected → fixed
status-thunderbird42: affected → fixed
Target Milestone: --- → Thunderbird 43.0
You need to log in before you can comment on or make changes to this bug.