Closed Bug 1188306 Opened 9 years ago Closed 9 years ago

Import of address book in CSV format fails if data contains accented/international characters

Categories

(Thunderbird :: Address Book, defect)

38 Branch
defect
Not set
normal

Tracking

(thunderbird40 affected, thunderbird41 fixed, thunderbird42 fixed, thunderbird43 fixed, thunderbird_esr3840+ fixed)

RESOLVED FIXED
Thunderbird 43.0
Tracking Status
thunderbird40 --- affected
thunderbird41 --- fixed
thunderbird42 --- fixed
thunderbird43 --- fixed
thunderbird_esr38 40+ fixed

People

(Reporter: guy.deceulaer, Assigned: jorgk-bmo)

References

Details

(Keywords: intl, regression, Whiteboard: [regression:TB35])

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Build ID: 20150630154324

Steps to reproduce:

Import an list of addresses from a CSV file.
The file contains some records with accented characters, like for the firstname André (e-accent aigu at the end).

The file is created by a program, but this bug didn't occur in the past, so the problem lies not there.


Actual results:

The import fails with message "Error importing address book xxxx, all addresses may not have been imported". In fact, none is imported.

The import of exactly the same CSV file worked in previous releases.


Expected results:

Import should succeed.

When I remove the accented characters with Notepad, the import works again.

The bug can be reproduced by creating an address with an accented character, exporting to a CSV file and trying to import again. It will fail. The attached test.csv is an example.
Attached file Another example
I exported my address book and tried to re-import it and that failed.
I tracked it down to an address with an accented character.

This is exactly was is reported in comment #0 (last two lines).

Works with TB 31.6.
Keywords: regression
OS: Unspecified → Windows 7
Hardware: Unspecified → x86_64
Status: UNCONFIRMED → NEW
Ever confirmed: true
OK, found the problem.

In the case where the import contains the accented character (é or í in the attached examples), the input is not considered UTF-8 compatible and *no* character set is returned at all:
https://dxr.mozilla.org/comm-central/source/mailnews/base/util/nsMsgUtils.cpp?offset=200#2420

This was regressed by bug 1074585 in https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e.

Just assigning ISO-8859-1 at the end of the function will make the import work again in this case:
if (aCharset.IsEmpty()) aCharset.AssignLiteral("ISO-8859-1");

The function *MUST* return a characterset! I will NI the people who broke it ;-)

General comment:
I think it's pretty embarrassing that the CSV import doesn't work (and there is no text for it either). I was looking through the Outlook crashes and one comment was "Can't import CSV". That's how I found the bug. So the poor user tried the Outlook import first, that crashed (my fault, since I hadn't disabled it properly). Then he tried CSV and that didn't work either.
Flags: needinfo?(mkmelin+mozilla)
Flags: needinfo?(Pidgeot18)
OS: Windows 7 → All
Hardware: x86_64 → All
Summary: Import of address book in CSV format fails if foreign accented characters → Import of address book in CSV format fails if data contains accented/international characters
Blocks: 1074585
Keywords: intl
Whiteboard: [regression:TB35]
Great detective work, thx Jorg! I think we should just fix the caller - assign charset ISO-8859-1 (or UTF-8?) on failure here:
https://dxr.mozilla.org/comm-central/source/mailnews/import/text/src/nsTextAddress.cpp#53
Flags: needinfo?(mkmelin+mozilla)
I think that should be two fixed:
One quick one and one with a UI change that lets the user select the character set if it can't be determined automatically.

Given that TB is used a lot in Europe and America, we should at least detect Windows-1252 (superset of ISO8859-1).
https://support.microsoft.com/en-us/kb/933855 suggests UTF-8 is what's always used nowadays. (Haven't tested it though.)
Hmm, well, that's another problem: The format that TB uses upon export. I exported and got "Western". We can change that too, but we should also do something useful if it's not UTF-8 upon import.

The problem of this bug is that the é/í are already encoded in "Western", so the UTF-8 test fails (saw that in debugging). Just assigning "UTF-8" gives �, tried that, too.
Workaround for import: Convert "Western" encoded CSV files to UTF-8, for example in Notepad++ on Windows.
Attached patch Proposed solution: Quick fix. (obsolete) — Splinter Review
Whoever has time and deems this important can review this.

Code was reestablished in the caller from
https://hg.mozilla.org/comm-central/rev/fdfc4f1a8c6e

(Magnus suggested to put something into the caller (comment #4), that's what I've done.)
Flags: needinfo?(Pidgeot18)
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)
nsMsgUtils.cpp should probably be changed to return error if no charset was detected
MsgDetectCharsetFromFile I mean.
(In reply to Magnus Melin from comment #11)
> nsMsgUtils.cpp should probably be changed to return error if no charset was
> detected

Which error would you like?
NS_ERROR_FAILURE perhaps?
Export should be in another bug, maybe bug 117236.
As per Magnus' comment #15.
Attachment #8645489 - Attachment is obsolete: true
Attachment #8645489 - Flags: review?(rkent)
Attachment #8645489 - Flags: review?(mkmelin+mozilla)
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
Some homework:
MsgDetectCharsetFromFile has two callers:
http://mxr.mozilla.org/comm-central/search?string=MsgDetectCharsetFromFile
One in /mailnews/import/text/src/nsTextAddress.cpp (where the new return value is handled) and one in nsMsgAttachmentHandler.cpp. There, in nsMsgAttachmentHandler::PickCharset, the return value is just returned. This function has one caller only:
http://mxr.mozilla.org/comm-central/source/mailnews/compose/src/nsMsgSend.cpp#1023
which ignores the return value.

In conclusion: Changing the behaviour and returning a failure code from MsgDetectCharsetFromFile now won't cause any unexpected side-effects.
Better way to do it.
Attachment #8645493 - Attachment is obsolete: true
Attachment #8645493 - Flags: review?(mkmelin+mozilla)
Attachment #8645836 - Flags: review?(mkmelin+mozilla)
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

Review of attachment 8645836 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, thx Jorg! r=mkmelin
Attachment #8645836 - Flags: review?(mkmelin+mozilla) → review+
Assignee: nobody → mozilla
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Approval Request Comment]
Regression caused by (bug #): 1074585
User impact if declined: Major function not working.
Testing completed (on c-c, etc.): Yes.
Risk to taking this patch (and alternatives if risky):
None.
Attachment #8645836 - Flags: approval-comm-esr38?
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-aurora?
Keywords: checkin-needed
https://hg.mozilla.org/comm-central/rev/908c9564e958
Status: NEW → RESOLVED
Closed: 9 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

http://hg.mozilla.org/releases/comm-esr38/rev/97b27d9d64f1
Attachment #8645836 - Flags: approval-comm-esr38? → approval-comm-esr38+
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

https://hg.mozilla.org/releases/comm-beta/rev/9d3eefbed90e
Attachment #8645836 - Flags: approval-comm-beta?
Attachment #8645836 - Flags: approval-comm-beta+
Attachment #8645836 - Flags: approval-comm-aurora?
Comment on attachment 8645836 [details] [diff] [review]
Proposed solution: Quick fix (v3)

[Triage Comment]

https://hg.mozilla.org/releases/comm-aurora/rev/3b0aef562c40
Attachment #8645836 - Flags: approval-comm-aurora+
Target Milestone: --- → Thunderbird 43.0
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: