Import in text format (txt,csv,tab): Should support non-ascii display names and email addresses

RESOLVED FIXED in Thunderbird 16.0

Status

defect
--
critical
RESOLVED FIXED
17 years ago
7 years ago

People

(Reporter: cavin, Assigned: hiro)

Tracking

(Blocks 1 bug, {dataloss, intl})

Trunk
Thunderbird 16.0
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

This is spun off from bug 91295, as the code currently only works for ASCII
display names and email addresses. Some notes from bug 91295:

a)  how does your code handle non-ASCII display names?

b) will this code work with IDNs in email addresses?  (does eudora support 
that?) see http://bugzilla.mozilla.org/show_bug.cgi?id=127399.
QA contact to myself.
Nominating for nsbeta1 since this used to be working.
Keywords: intl, nsbeta1
QA Contact: nbaca → ji
Discussed in mail news bug meeting.  Decided to minus this bug.
Keywords: nsbeta1 → nsbeta1-
Target Milestone: --- → mozilla1.2alpha
Blocks: 157010
Blocks: 157673
cavin, is this a bug for Eudora import?
No, I think this is a problem for all import.
Okay, a non-ASCII e-mail address is not legal at this point, and I think no
commercial mailer supports it, so we can focus on the non-ASCII display name problem.
nominate for nsbeta1
Keywords: nsbeta1- → nsbeta1
QA contact to Marina. Please reassign to the appropriate person.
QA Contact: ji → marina
Gregg, please look at this bug too 
-dassi
Mail triage team: nsbeta1+/adt3
Keywords: nsbeta1 → nsbeta1+
Summary: Import: Should support for non-ascii display names and email addresses → Import: Should support non-ascii display names and email addresses
Whiteboard: [adt3]
Target Milestone: mozilla1.2alpha → mozilla1.4beta
I received e-mail with the source from line:

From: =?8859_1?B?S2FybGr8cmdlbg==?= Xxxxxxxxx <xxxxxxxxx@xxxxxx.com>

and Mozilla renders that first name as a Chinese character, 翽, which is U+7FFD
meaning "sounds of wings flapping". Well, arguably the input is garbage (it's
nothing like the guy's real name, and I usually get an accurate display of it
although it includes a u with umlaut), so I should expect to see garbage, but
the input certainly seems to be intended as an ISO 8859-1 string, which cannot
include Chinese characters.

I am using Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4) Gecko/20030624.
Product: MailNews → Core
should Bug 163501 dup to or depend on this bug?
Assignee: cavin → nobody
Severity: normal → critical
Keywords: dataloss
OS: Windows NT → All
QA Contact: marina
Mark and others, 

a) is this really an import issue? or is it garden variety AB?

b) with no dupes or votes it's hard to imagine there aren't other bugs that have fixed or duplicate parts of this bug.  Might some be
 bug 164121    	
 bug 129407
 bug 207998
And after those, what's not listed that's left to implement?

for the record, the spinoff reference in comment 0 is to end of bug 91295 comment 10.
QA Contact: import
(In reply to comment #12)
> a) is this really an import issue? or is it garden variety AB?

Not sure, I can't say I've really tried it, but given the lack of comments/other bugs about it, I'd be surprised if it's still a big problem.

> b) with no dupes or votes it's hard to imagine there aren't other bugs that
> have fixed or duplicate parts of this bug.  Might some be
>  bug 164121     
>  bug 129407
>  bug 207998
> And after those, what's not listed that's left to implement?

There's one somewhere about allowing different character sets for import, that's probably the more relevant one in this case. So we could probably close this one now.
The e-mail mentioned in comment 10 now, in Thunderbird 2.0.0.6, displays with a Unicode unknown character symbol, a question mark in a diamond, rather than the Chinese character. As the input is garbage, that is probably correct behaviour, so at least this part of the bug seems to have been fixed.
Product: Core → MailNews Core
Target Milestone: mozilla1.4beta → ---
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100411 Icedove/3.0.4

The message from comment 10 is still unreadable, even though my iconv knows that "8859_1" is a valid codeset and "S2FybGr8cmdlbg==" decodes properly to "Karljürgen".
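The decoding described above can be reproduced by hand. This is a minimal Python sketch, assuming the nonstandard charset label "8859_1" is meant as ISO 8859-1; the helper name decode_b_word is hypothetical and this is not how Mozilla's MIME decoder is implemented.

```python
import base64


def decode_b_word(payload: str, charset: str = "iso-8859-1") -> str:
    """Decode the base64 payload of an RFC 2047 'B' encoded-word.

    Assumption: the header's charset label "8859_1" is treated as
    ISO 8859-1, so that codec is used as the default here.
    """
    raw = base64.b64decode(payload)  # bytes as they appear on the wire
    return raw.decode(charset)       # map each byte to its Latin-1 character


# Payload taken from the From: line quoted in comment 10.
display_name = decode_b_word("S2FybGr8cmdlbg==")
```

The payload contains the single byte 0xFC, which is "ü" in ISO 8859-1, so a decoder that honours the declared charset should not produce a CJK character here.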
Duplicate of this bug: 164121
Summary: Import: Should support non-ascii display names and email addresses → Import in text format (txt,csv,tab): Should support non-ascii display names and email addresses
Whiteboard: [adt3]
Assignee: nobody → hiikezoe
Depends on: 703503
Posted patch Proposed fix (obsolete)
This patch cannot be applied without the fix for bug 703175, since both patches touch nsTextAddress::DetermineDelim.

This patch uses MsgDetectCharsetFromFile which is introduced in bug 703503.
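To illustrate the general idea of detecting a charset before importing a text address book, here is a rough Python sketch. It is not a description of what MsgDetectCharsetFromFile actually does; the function names, the order of the checks, and the windows-1252 fallback are all assumptions made for the example.

```python
import codecs


def guess_text_charset(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the charset of a txt/csv/tab address book (hypothetical helper)."""
    # A byte order mark is the strongest hint, so check for it first.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # Python's utf-16 codec reads the BOM itself
    # Without a BOM, accept UTF-8 only if the whole file decodes strictly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to a legacy single-byte charset.
        return fallback


def decode_address_book(data: bytes) -> str:
    """Decode file contents using the guessed charset."""
    return data.decode(guess_text_charset(data), errors="replace")
```

With this approach a Latin-1 export containing "ü" fails the strict UTF-8 check and is decoded with the fallback, while a UTF-8 export keeps its non-ASCII display names intact.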
Attachment #576874 - Flags: review?(dbienvenu)
Posted patch Some tests (obsolete)
Attachment #576876 - Flags: review?(mconley)
Duplicate of this bug: 522865
Duplicate of this bug: 226813
Comment on attachment 576876
Some tests

I haven't tried running the tests, but the code looks good to me.  Thanks.
Attachment #576876 - Flags: review?(mconley) → review+
Attachment #576874 - Attachment is obsolete: true
Attachment #576874 - Flags: review?(dbienvenu)
Attachment #637389 - Flags: review?(dbienvenu)
Carrying over review+.
Attachment #576876 - Attachment is obsolete: true
Attachment #637391 - Flags: review+
Attachment #637389 - Attachment is obsolete: true
Attachment #637389 - Flags: review?(dbienvenu)
Attachment #638259 - Flags: review?(dbienvenu)
Comment on attachment 638259
Adapt to the latest trunk and some cleanup from the previous patch

Looks reasonable, tests pass. Thanks, Hiro!
Attachment #638259 - Flags: review?(mozilla) → review+
https://hg.mozilla.org/comm-central/rev/c9097070b413
https://hg.mozilla.org/comm-central/rev/e6d59effc2e1
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 16.0
Comment on attachment 638259
Adapt to the latest trunk and some cleanup from the previous patch

>-        numQuotes += MsgCountChar(line, '"');
>+        numQuotes += line.CountChar(PRUnichar('"'));
Noooooooooooooooooooooo!
Depends on: 773840
Depends on: 803835