full name from system should not be assumed to be iso-8859-1

RESOLVED DUPLICATE of bug 469797

Status

Thunderbird
Preferences
--
minor
12 years ago
9 years ago

People

(Reporter: Egmont Koblinger, Assigned: Nicolás Lichtmaier)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

12 years ago
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; X11; Linux i686; hu) Opera 8.54
Build Identifier: Mozilla Thunderbird 1.5 (20060408)

I have a full UTF-8 environment. My locale is set to LANG=hu_HU.UTF-8 (and no
LC_* environment variables set), and /etc/passwd contains the full names
encoded in UTF-8.

When Thunderbird is started for the first time, it asks me to enter my full
name, and this field is initialized with the value read from /etc/passwd.
However, it is interpreted as if it was encoded in iso-8859-1, which is wrong.
Hence my full name is displayed with accented characters in "double" UTF-8
encoding.
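
The "double" UTF-8 effect described above can be reproduced outside Thunderbird. The following is a minimal sketch, not the Thunderbird code: `Latin1ToUtf8` is a hypothetical helper that treats every input byte as an ISO-8859-1 character and re-encodes it as UTF-8, which is what happens when UTF-8 data is misread as ISO-8859-1.

```cpp
#include <string>

// Hypothetical helper (not part of Thunderbird): interpret every byte of
// `bytes` as an ISO-8859-1 code point and encode that code point as UTF-8.
std::string Latin1ToUtf8(const std::string& bytes) {
  std::string out;
  for (unsigned char c : bytes) {
    if (c < 0x80) {
      // ASCII is identical in both encodings.
      out += static_cast<char>(c);
    } else {
      // Code points 0x80..0xFF need two bytes in UTF-8.
      out += static_cast<char>(0xC0 | (c >> 6));
      out += static_cast<char>(0x80 | (c & 0x3F));
    }
  }
  return out;
}
```

Feeding it the UTF-8 bytes of "é" (0xC3 0xA9) yields the four bytes 0xC3 0x83 0xC2 0xA9, which a UTF-8 terminal renders as "Ã©". ASCII-only names pass through unchanged, which is why the problem only shows up with accented characters.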

Although there's no documentation about the encoding of /etc/passwd, it's
definitely a wrong idea to assume a fixed charset that is unable to satisfy
international needs. Either assuming UTF-8 independently from the locale,
or assuming the encoding of the current locale would be a much better
approach. Since using any non-Unicode charset only leads to a big unfixable
mass of bugs, and since every modern distro uses UTF-8 by default, I don't
really care what Thunderbird does for old-style locales, but if my locale is
UTF-8 and /etc/passwd is valid UTF-8 too, then Thunderbird should assume it is
encoded in UTF-8.
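
The "is valid UTF-8" condition in the paragraph above can be checked mechanically. Here is a rough sketch of such a check, assuming a strict reading of UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF). `IsValidUtf8` is a name invented for this illustration and does not come from the Mozilla tree.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `s` is well-formed UTF-8.
bool IsValidUtf8(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) {  // plain ASCII byte
      ++i;
      continue;
    }
    std::size_t len = 0;
    unsigned long cp = 0;         // decoded code point
    unsigned long min_value = 0;  // smallest code point allowed for `len`
    if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_value = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_value = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_value = 0x10000;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + len > n) return false;  // sequence is cut off
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (cont & 0x3F);
    }
    // Reject overlong encodings, surrogates and out-of-range values.
    if (cp < min_value || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    i += len;
  }
  return true;
}
```

Text with accented characters in ISO-8859-1 almost never passes this check, because a lone byte such as 0xE9 is an incomplete UTF-8 sequence. That is what makes "valid UTF-8" a usable signal here.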


Reproducible: Always

Steps to Reproduce:
Create a new user, make sure the full name contains accents encoded in UTF-8
in /etc/passwd, and make sure that the locale is set to a UTF-8 one.

Actual Results:  
Accents in the full name are wrong when I'm asked to enter it.

Expected Results:  
Full name being displayed correctly when Thunderbird is started for the first time.
(Reporter)

Comment 1

12 years ago
Created attachment 218824 [details] [diff] [review]
worksforme utf8 patch

With this patch, /etc/passwd is assumed to be encoded in UTF-8.

Comment 2

12 years ago
-> new (since it has a patch).

Egmont: you have to ask for review (use the edit action, set r? for the patch e.g. to bsmedberg) if you want the patch to be included in Thunderbird.
Severity: normal → minor
Status: UNCONFIRMED → NEW
Ever confirmed: true
Hardware: PC → All
Summary: Double UTF-8 in full name → full name from system should not be assumed to be iso-8859-1
(Reporter)

Updated

12 years ago
Attachment #218824 - Flags: review?(bsmedberg)

Comment 3

11 years ago
Comment on attachment 218824 [details] [diff] [review]
worksforme utf8 patch

Moving the review request to the mail address that Ben actually reads.
Attachment #218824 - Flags: review?(bsmedberg) → review?(benjamin)

Comment 4

11 years ago
Comment on attachment 218824 [details] [diff] [review]
worksforme utf8 patch

This is not quite correct: we should use the correct locale setting. This can be achieved using the NS_CopyNativeToUnicode function.
Attachment #218824 - Flags: review?(benjamin) → review-
(Reporter)

Comment 5

11 years ago
> we should use the correct locale setting

Neither approach is correct. There is simply no correct way to do it, since the charset metainfo of /etc/passwd is not available anywhere. IMHO using the current locale would be bizarre, since the current locale is a process-specific value: it can differ from one process to another, while the contents of /etc/passwd are not locale-specific. (Probably the best way would be to use the OS's default locale, but that's not available, since it is stored in a distribution-dependent way.)

The Linux world is switching to UTF-8 very fast; it's almost at the point of having UTF-8 everywhere. Hence it's important that if both the current locale is a UTF-8 one and /etc/passwd is valid UTF-8 too (which is the case in all modern distros), then UTF-8 should be assumed. This is achieved both by my patch and by your proposal. In any other case there's no good way, only random heuristics. The one you propose happens to work in situations where mine doesn't, and vice versa. I'd say it's irrelevant which heuristic is chosen, so I'll let you decide. But now that this bug has been sitting here for exactly a year, I'd be glad to see _any_ solution applied to the UTF-8 problem, if possible. Thanks :)
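
To illustrate the point above that the locale charset belongs to the process rather than to /etc/passwd, here is a small sketch assuming a POSIX system. `ProcessCodeset` is a name made up for this example, while `setlocale` and `nl_langinfo(CODESET)` are the standard C and POSIX calls.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper: name of the character set selected by the calling
// process's environment (LANG / LC_* variables).
// Note: this changes the process-wide LC_CTYPE locale as a side effect.
std::string ProcessCodeset() {
  // An empty locale name means "take the locale from the environment".
  std::setlocale(LC_CTYPE, "");
  return nl_langinfo(CODESET);
}
```

Starting the same program once with LANG=hu_HU.UTF-8 and once with LANG=C will typically give two different answers, even though /etc/passwd is byte-for-byte the same in both runs.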

Updated

10 years ago
Assignee: mscott → nobody
(Assignee)

Comment 6

9 years ago
Created attachment 357587 [details] [diff] [review]
new patch, first tries native encoding, then ISO-8859-1

What about this? I try the native encoding first; if that fails, it keeps the current behaviour. I've tested it, and on my UTF-8 system it works even if I edit /etc/passwd and replace my name with an ISO-8859-1 encoded one.
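
The try-native-then-fall-back idea described in this comment can be sketched in portable C++ as follows. This is not the attached patch. It assumes the native encoding is UTF-8 and uses a hand-written `DecodeUtf8` as a hypothetical stand-in for NS_CopyNativeToUnicode. `DecodeLatin1` and `DecodeFullName` are likewise names invented for the illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the native-charset conversion, assuming the
// native charset is UTF-8. Returns false if `in` is not well-formed.
bool DecodeUtf8(const std::string& in, std::u32string& out) {
  out.clear();
  for (std::size_t i = 0; i < in.size();) {
    const unsigned char lead = static_cast<unsigned char>(in[i]);
    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min_value = 0;  // smallest code point allowed for `len`
    if (lead < 0x80) {
      len = 1; cp = lead;
    } else if ((lead & 0xE0) == 0xC0) {
      len = 2; cp = lead & 0x1F; min_value = 0x80;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3; cp = lead & 0x0F; min_value = 0x800;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4; cp = lead & 0x07; min_value = 0x10000;
    } else {
      return false;
    }
    if (i + len > in.size()) return false;
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char cont = static_cast<unsigned char>(in[i + k]);
      if ((cont & 0xC0) != 0x80) return false;
      cp = (cp << 6) | (cont & 0x3F);
    }
    if (cp < min_value || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
      return false;
    }
    out.push_back(cp);
    i += len;
  }
  return true;
}

// ISO-8859-1 maps each byte to the code point with the same value,
// so this conversion cannot fail.
std::u32string DecodeLatin1(const std::string& in) {
  std::u32string out;
  for (unsigned char c : in) out.push_back(c);
  return out;
}

// Try the (assumed) native encoding first, then keep the old behaviour.
std::u32string DecodeFullName(const std::string& raw) {
  std::u32string decoded;
  if (DecodeUtf8(raw, decoded)) return decoded;
  return DecodeLatin1(raw);
}
```

With this ordering, both the UTF-8 bytes 0xC3 0xA9 and the ISO-8859-1 byte 0xE9 decode to U+00E9, matching the behaviour reported in the comment. The fallback only triggers because the strict decoder rejects the input. A native charset that accepts every byte sequence would never reach it.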
Attachment #357587 - Flags: review?

Updated

9 years ago
Attachment #357587 - Flags: review? → review?(benjamin)

Updated

9 years ago
Assignee: nobody → nick
Status: NEW → ASSIGNED

Comment 7

9 years ago
Comment on attachment 357587 [details] [diff] [review]
new patch, first tries native encoding, then ISO-8859-1

>diff -r 24de3997ea60 toolkit/components/startup/src/nsUserInfoUnix.cpp

>+    if(NS_CopyNativeToUnicode(fullname, str) == NS_OK) {

Instead of == NS_OK, please make this

if (NS_SUCCEEDED(NS_CopyNativeToUnicode(fullname, str)))

Also note that the code style doesn't use snuggly parens for control structures.

Otherwise this looks good. Give me a patch with those nits fixed and this will be good to go. Thanks!
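
As background for the NS_SUCCEEDED nit above: an nsresult is a 32-bit code whose top bit marks failure, so more than one value can mean success, and comparing against NS_OK alone would treat the others as errors. The sketch below models that convention with hypothetical stand-ins (`Result`, `kOk`, `kSuccessWithNote`, `kFailure`, `Succeeded`). It does not use the real Mozilla headers and makes no claim about which codes NS_CopyNativeToUnicode actually returns.

```cpp
#include <cstdint>

// Hypothetical stand-ins modelling the nsresult convention.
using Result = std::uint32_t;

constexpr Result kOk = 0x00000000u;               // plain success
constexpr Result kSuccessWithNote = 0x00000001u;  // success, but not kOk
constexpr Result kFailure = 0x80004005u;          // top bit set: failure

// Success is defined by the severity (top) bit being clear.
constexpr bool Succeeded(Result r) { return (r & 0x80000000u) == 0; }

// The form criticised in the review: rejects any success code but kOk.
constexpr bool EqualsOk(Result r) { return r == kOk; }
```

The two checks agree on `kOk` and `kFailure` but disagree on `kSuccessWithNote`, which is the case the macro form guards against.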
Attachment #357587 - Flags: review?(benjamin) → review-
(Assignee)

Comment 8

9 years ago
Created attachment 357872 [details] [diff] [review]
version 2

Ok, this is a new version. I hope it's ok this time =).
Attachment #357587 - Attachment is obsolete: true
Attachment #357872 - Flags: review?(benjamin)

Updated

9 years ago
Attachment #357872 - Flags: review?(benjamin) → review+

Comment 9

9 years ago
So I've already committed a patch for this in bug 469797. It's currently on mozilla-central trunk and is awaiting approval for the 1.9.1 branch.
(Assignee)

Comment 10

9 years ago
Funny... the patch in bug 469797 was rejected in this bug, since it blindly assumed UTF-8...
(Assignee)

Comment 11

9 years ago
Correction: it doesn't assume UTF-8, but it assumes the user encoding matches the system encoding...
(Assignee)

Updated

9 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 469797