Closed Bug 382398 Opened 17 years ago Closed 14 years ago

checksetup.pl localized messages should be output in the console's charset

Categories

(Bugzilla :: Installation & Upgrading, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 4.0

People

(Reporter: vitaly.fedrushkov, Assigned: mkanat)

References

Details

(Keywords: intl)

Attachments

(1 file, 3 obsolete files)

Problem running checksetup.pl from a console that is not UTF-8 capable.

We have messages.html.tmpl in UTF-8, which is right, but Windows users (other
than Cygwin bash users) use different text charsets -- for example, Windows-1251
here in Russia.

Keeping messages in different charsets within a single file is not good.

[based on bug 352608 comment 3]
As a note, in case I don't fix this -- the solution is for Bugzilla::Install::Util::install_string to encode things into the console charset if the console charset is not UTF-8.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Summary: localized checksetup.pl charset → checksetup.pl localized messages should be output in the console's charset
Target Milestone: --- → Bugzilla 3.2
Keywords: l12y
Keywords: l12y → intl
How about checking the 'LANG' shell environment variable?
If user uses cmd.exe of Windows, we can assume that user can use utf-8.
Elsewhere, I think we should assume that the shell cannot display utf-8 when $ENV{LANG} doesn't include '.UTF-8'.
No, you can just use the POSIX locale functions for it, and they should return something sensible on Windows, I think. The only difficult part is that POSIX locales don't map to Encode's understanding of character sets, necessarily.
(In reply to comment #2)
> If user uses cmd.exe of Windows, we can assume that user can use utf-8.

Wrong assumption: Russian Windows uses codepage 1251 for text windows.  Cygwin bash works well, however.
So we're talking about

   binmode(STDOUT, ":encoding($charset)");
   binmode(STDERR, ":encoding($charset)");

here, and we're trying to find a way to determine $charset?
(In reply to comment #5)
> So we're talking about
> 
>    binmode(STDOUT, ":encoding($charset)");
>    binmode(STDERR, ":encoding($charset)");
> 
> here, and we're trying to find a way to determine $charset?

  If :encoding($charset) will properly translate utf-8 into that charset, then yeah. I'm not sure if the POSIX locale functions work on Windows or not, but if they do, that would possibly give us the info we need on all platforms. Otherwise there might be some Win32:: function we can use.
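To illustrate the question above, here is a minimal sketch of what an :encoding() layer does to a Perl character string. It writes into an in-memory buffer rather than STDOUT so the bytes can be inspected; the sample text and variable names are hypothetical, and the text is assumed to have already been decoded from UTF-8 into characters.

```perl
use strict;
use warnings;

# Hypothetical sample: the Cyrillic letter U+0416 as a Perl character string
# (i.e. already decoded, for example with Encode::decode('UTF-8', $bytes)).
my $text = "\x{0416}";

# Write through an :encoding() layer into an in-memory buffer instead of
# STDOUT, so the resulting bytes can be examined.
my $buffer = '';
open(my $fh, '>:encoding(cp1251)', \$buffer) or die "open failed: $!";
print {$fh} $text;
close($fh) or die "close failed: $!";

# Print the encoded bytes as hex; in Windows-1251 this letter is one byte.
print unpack('H*', $buffer), "\n";
```

The layer only translates correctly if the string holds characters rather than raw UTF-8 bytes, so the decoding step matters as much as the binmode call.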
Can we rely on console windows using True Type fonts?  Then we could enforce codepage 65001.
I don't think so. Even worse, I suspect codepages may not be a subset of cp 65001.
Yeah, I'm pretty sure all Windows consoles use bitmap fonts by default.
Workaround, tested on Russian Windows:

Select Lucida Console as cmd window font
Run chcp 65001 before checksetup.pl
Bugzilla 3.2 is restricted to security bugs only. Moreover, this bug is either assigned to nobody or got no traction for several months now. Rather than retargetting it at each new release, I'm clearing the target milestone and the bug will be retargetted to some sensible release when someone starts fixing this bug for real (Bugzilla 3.8 more likely).
Target Milestone: Bugzilla 3.2 → ---
Severity: normal → enhancement
Target Milestone: --- → Bugzilla 3.8
Attached patch v1 (obsolete) — Splinter Review
Okay, this does it. I didn't test it on Windows, but I did test that the POSIX::setlocale function works on Windows (which it does).

If you try to print out a character that your encoding doesn't support, Perl throws warnings.
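The warnings come from characters that have no mapping in the target encoding. As a rough sketch, and not something taken from the patch, a check like the following could detect such text ahead of time; can_encode is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of the attached patch): returns 1 if every
# character of $text has a mapping in $encoding, 0 otherwise.
sub can_encode {
    my ($encoding, $text) = @_;
    # FB_CROAK makes encode() die on the first unmappable character.
    # $text is a private copy, so encode() modifying it in place is harmless.
    my $ok = eval { Encode::encode($encoding, $text, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}

# U+00E9 (e with acute accent) exists in cp1252; U+0416 (Cyrillic) does not.
print can_encode('cp1252', "V\x{E9}rification") ? "ok\n" : "unmappable\n";
print can_encode('cp1252', "\x{0416}")          ? "ok\n" : "unmappable\n";
```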
Assignee: installation → mkanat
Status: NEW → ASSIGNED
Attachment #434739 - Flags: review?(LpSolit)
To work correctly, it requires the patch from bug 550765.
Depends on: 550765
Comment on attachment 434739 [details] [diff] [review]
v1

This is a huge improvement over what we have currently, but there are still a few bits which are displayed incorrectly, see the output of checksetup.pl below, with french templates installed. Problems are:

1)
Wide character in print at Bugzilla/Install/Requirements.pm line 340.
Vérification des modules Perl DBD disponibles�

2)
ATTENTION : Vous devez définir le paramètre max_allowed_packet dans votre
configuration MySQL à au moins 3276750. Actuellement, il est défini à 1048576.
Vous pouvez définir ce paramètre dans la section [mysqld] de votre fichier de
configuration MySQL.

-----------

C:\Program Files\Apache Software Foundation\Apache2.2\htdocs\bugzilla>checksetup.pl
* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7100

Vérification des modules Perl.
Vérification de              CGI.pm (v3.33)   ok: v3.45 trouvé
Vérification de          Digest-SHA (tout)    ok: v5.48 trouvé
Vérification de            TimeDate (v2.21)   ok: v2.24 trouvé
Vérification de            DateTime (v0.28)   ok: v0.53 trouvé
Vérification de   DateTime-TimeZone (v0.79)   ok: v1.11 trouvé
Vérification de                 DBI (v1.41)   ok: v1.609 trouvé
Vérification de    Template-Toolkit (v2.22)   ok: v2.22 trouvé
Vérification de          Email-Send (v2.16)   ok: v2.198 trouvé
Vérification de          Email-MIME (v1.861)  ok: v1.863 trouvé
Vérification de Email-MIME-Encodings (v1.313)  ok: v1.313 trouvé
Vérification de Email-MIME-Modifier (v1.442)  ok: v1.444 trouvé
Vérification de                 URI (tout)    ok: v1.52 trouvé

Wide character in print at Bugzilla/Install/Requirements.pm line 340.
Vérification des modules Perl DBD disponibles�
Vérification de              DBD-Pg (v1.45)    non trouvé
Vérification de           DBD-mysql (v4.00)   ok: v4.011 trouvé
Vérification de          DBD-Oracle (v1.19)    non trouvé

Les modules Perl suivants sont optionnels :
Vérification de                  GD (v1.20)   ok: v2.44 trouvé
Vérification de               Chart (v2.1)    ok: v2.4.1 trouvé
Vérification de         Template-GD (tout)    ok: v1.56 trouvé
Vérification de          GDTextUtil (tout)    ok: v0.86 trouvé
Vérification de             GDGraph (tout)    ok: v1.44 trouvé
Vérification de            XML-Twig (tout)    ok: v3.34 trouvé
Vérification de          MIME-tools (v5.406)  ok: v5.427 trouvé
Vérification de         libwww-perl (tout)    ok: v5.829 trouvé
Vérification de         PatchReader (v0.9.4)  ok: v0.9.5 trouvé
Vérification de           perl-ldap (tout)    ok: v0.39 trouvé
Vérification de         Authen-SASL (tout)    ok: v2.13 trouvé
Vérification de          RadiusPerl (tout)    ok: v0.17 trouvé
Vérification de           SOAP-Lite (v0.710.06) ok: v0.710.10 trouvé
Vérification de            JSON-RPC (tout)    ok: v0.96 trouvé
Vérification de          Test-Taint (tout)    ok: v1.04 trouvé
Vérification de         HTML-Parser (v3.40)   ok: v3.64 trouvé
Vérification de       HTML-Scrubber (tout)    ok: v0.08 trouvé
Vérification de Email-MIME-Attachment-Stripper (tout)    ok: v1.316 trouvé
Vérification de         Email-Reply (tout)    ok: v1.202 trouvé
Vérification de         TheSchwartz (tout)     non trouvé
Vérification de      Daemon-Generic (tout)     non trouvé
Vérification de            mod_perl (v1.999022)  non trouvé
***********************************************************************
* MODULES OPTIONNELS                                                  *
***********************************************************************
* Certains modules Perl ne sont pas indispensables pour Bugzilla,     *
* mais en installant la dernière version, vous pourrez accéder à des  *
* fonctionnalités supplémentaires.                                    *
*                                                                     *
* Les modules optionnels que vous n'avez pas installés sont listés    *
* ci-dessous, avec le nom de la fonctionnalité qu'ils activent. Sous  *
* ce tableau se trouvent les commandes pour installer chaque module.  *
***********************************************************************
*    MODULE NAME * ENABLES FEATURE(S)                                 *
***********************************************************************
*    TheSchwartz * File d'attente de courrier                         *
* Daemon-Generic * File d'attente de courrier                         *
*       mod_perl * mod_perl                                           *
***********************************************************************
* Note pour les utilisateurs Windows                                  *
***********************************************************************
* Pour installer les modules listés ci-dessous, vous devez d'abord    *
* exécuter la commande suivante en tant qu'administrateur :           *
*                                                                     *
*   ppm repo add theory58S http://cpan.uwinnipeg.ca/PPMPackages/10xx/
***********************************************************************
COMMANDES POUR INSTALLER LES MODULES OPTIONNELS :

    TheSchwartz: ppm install TheSchwartz
 Daemon-Generic: ppm install Daemon-Generic
       mod_perl: ppm install mod_perl

Reading ./localconfig...

OPTIONAL NOTE: If you want to be able to use the 'difference between two
patches' feature of Bugzilla (which requires the PatchReader Perl module
as well), you should install patchutils from:

    http://cyberelk.net/tim/patchutils/

Vérification de           DBD-mysql (v4.00)   ok: v4.011 trouvé
Checking for           MySQL (v4.1.2)  ok: found v5.5.1-m2-community

ATTENTION : Vous devez définir le paramètre max_allowed_packet dans votre
configuration MySQL à au moins 3276750. Actuellement, il est défini à 1048576.
Vous pouvez définir ce paramètre dans la section [mysqld] de votre fichier de
configuration MySQL.

Suppression des modèles compilés existants.
Précompilation des modèles.terminé.
Checking for        GraphViz (any)     ok: found
Attachment #434739 - Flags: review?(LpSolit) → review-
Unless there is a technical limitation, we should really take it for 3.6. Otherwise the output is unreadable, with all lines being of the form:

Vérification des modules Perl�
Vérification de              CGI.pm (v3.33)   ok: v3.45 trouvé
Vérification de          Digest-SHA (tout)    ok: v5.48 trouvé
Vérification de            TimeDate (v2.21)   ok: v2.24 trouvé
Flags: blocking3.6?
Target Milestone: Bugzilla 3.8 → Bugzilla 3.6
  It's too much of an enhancement and refactoring at this point to take for 3.6.

  Bug 550765 should resolve the issues with checksetup.pl, provided that the templates are stored in UTF-8 and the user's terminal encoding is UTF-8 (which should be the most common encoding for modern terminals).
Flags: blocking3.6? → blocking3.6-
Target Milestone: Bugzilla 3.6 → Bugzilla 3.8
(In reply to comment #16)
> templates are stored in UTF-8 and the user's terminal encoding is UTF-8 (which
> should be the most common encoding for modern terminals).

It's not on Windows, which is what comment 15 is about.
  Ahh. Well, that's been a problem for quite some time (since Bugzilla 3.2), and it's what this bug is about. This patch affects every command-line script in Bugzilla, though, not just checksetup, so I don't want to mess around with that while we're in an RC stage.

  FWIW, there are many languages (Russian, CJK, anything that isn't ISO-8859-1) that will do nothing but throw warnings on Windows's default charset, so checksetup.pl will become entirely a string of warnings. I think that's not a safe thing to do post-RC, also, but it's probably OK for 3.8 because we will have some time to test and get feedback and see if it really is a problem in practical situations.
Attached patch v2 (obsolete) — Splinter Review
Okay, I figured it out. There were two problems:

1) We were calling init_console twice, which was leading to double-encoding characters.
2) We didn't set encoding() on STDERR.
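A small sketch of the first problem, assuming the double encoding amounts to running already-encoded bytes through the encoder a second time. The guard function add_encoding_once is a hypothetical name; it only shows the general idea of checking a handle's layers before adding another one.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text  = "\x{E9}";                 # e with acute accent, as a character
my $once  = encode('UTF-8', $text);   # two bytes
my $twice = encode('UTF-8', $once);   # each of those bytes encoded again

print unpack('H*', $once),  "\n";
print unpack('H*', $twice), "\n";

# Hypothetical guard: add an :encoding layer only if the handle has none yet.
# Returns 1 if a layer was added, 0 if one was already present.
sub add_encoding_once {
    my ($fh, $encoding) = @_;
    return 0 if grep { /^encoding/ } PerlIO::get_layers($fh);
    binmode($fh, ":encoding($encoding)") or die "binmode failed: $!";
    return 1;
}
```

For the second problem, the same guard would be applied to both handles, for example by looping over \*STDOUT and \*STDERR.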
Attachment #434739 - Attachment is obsolete: true
Attachment #440395 - Flags: review?(LpSolit)
Without your patch, the output on Windows 7 is:

* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600

V├®rification des modules PerlÔǪ
V├®rification de              CGI.pm (v3.33)   ok: v3.48 trouv├®


With your patch:

* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600

VÚrification des modules Perlà
VÚrification de              CGI.pm (v3.33)   ok: v3.48 trouvÚ


This is only slightly better; all letters with accents are still rendered incorrectly.
Hum, even though the shell uses cp1252, the last few lines of checksetup.pl are displayed correctly when using cp850.
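One possible reading of the two outputs above, offered as an assumption rather than a confirmed diagnosis: the script emits UTF-8 (without the patch) or cp1252 (with the patch), while the console interprets the bytes as cp850. The sketch below simulates that by encoding a sample string and decoding the bytes as cp850; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Make STDOUT accept the non-ASCII characters printed below.
binmode(STDOUT, ':encoding(UTF-8)');

# Sample text: "V", e-acute, "rification", followed by an ellipsis (U+2026).
my $text = "V\x{E9}rification\x{2026}";

# Bytes as the script would write them, then as a cp850 console reads them.
my $utf8_seen_as_cp850   = decode('cp850', encode('UTF-8',  $text));
my $cp1252_seen_as_cp850 = decode('cp850', encode('cp1252', $text));

print "$utf8_seen_as_cp850\n";
print "$cp1252_seen_as_cp850\n";
```

If this reading is right, the encoding chosen for output has to match the console's active code page, not the locale's ANSI code page.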
(In reply to comment #20)
> VÚrification des modules Perlà
> VÚrification de              CGI.pm (v3.33)   ok: v3.48 trouvÚ

  I can't reproduce this issue. Using the current French templates, the lines appear correctly for me using Windows's default terminal settings. Do you have something unusual about your terminal configuration?

  I do see a problem with a single message in checksetup.pl--the one printed about the DBD modules. But that's it.
  Also, you might want to try throwing some debug code into set_output_encoding to see what Bugzilla thinks your terminal's encoding is. Mine says cp1252.
Okay, so the problem that I was experiencing (and possibly that you were experiencing as well) is that CGI.pm sets binmode on STDOUT, but only on Windows! I'm going to report it to them as a bug.
I've reported the CGI.pm bug here:

https://rt.cpan.org/Ticket/Display.html?id=57524
Attached patch v3 (obsolete) — Splinter Review
Okay, this works around the CGI.pm bug. Calling set_output_encoding over and over is harmless, because it does nothing if the output encodings are already correct.
Attachment #440395 - Attachment is obsolete: true
Attachment #445575 - Flags: review?(LpSolit)
Attachment #440395 - Flags: review?(LpSolit)
Hum, this change doesn't help. The output remains the same.
I added some debug code into set_output_encoding() as follows:

sub set_output_encoding {
    # If we've already set an encoding layer on STDOUT, don't
    # add another one.
    my @stdout_layers = PerlIO::get_layers(STDOUT);
print "\nSTDOUT layers are " . join("/", @stdout_layers) . "\n";
    return if grep(/^encoding/, @stdout_layers);

    my $encoding;
    my $locale = setlocale(LC_CTYPE);
print "LC_CTYPE = $locale\n";
    if ($locale =~ /\.([^\.]+)$/) {
        $encoding = $1;
print "found encoding $encoding\n";
        if (ON_WINDOWS) {
            $encoding = "cp$encoding";
print "Windows detected. Setting encoding to $encoding\n";
        }
    }

    $encoding = Encode::resolve_alias($encoding) if $encoding;
print "encoding alias is $encoding\n";
    ...
}


And now the output of checksetup.pl becomes:

C:\Program Files\Bugzilla\bugzilla>..\perl\perl\bin\perl.exe checksetup.pl -t

STDOUT layers are unix/crlf
LC_CTYPE = French_Switzerland.1252
found encoding 1252
Windows detected. Setting encoding to cp1252
encoding alias is cp1252
* Bugzilla 3.7 avec Perl 5.10.1
* sur Win7 Build 7600

VÚrification des modules Perlà

STDOUT layers are unix/crlf
LC_CTYPE = French_Switzerland.1252
found encoding 1252
Windows detected. Setting encoding to cp1252
encoding alias is cp1252
VÚrification de              CGI.pm (v3.33)   ok: v3.48 trouvÚ

STDOUT layers are unix/crlf/encoding(cp1252)/utf8
VÚrification de          Digest-SHA (tout)    ok: v5.48 trouvÚ

STDOUT layers are unix/crlf/encoding(cp1252)/utf8
VÚrification de            TimeDate (v2.21)   ok: v2.24 trouvÚ


Is the mix encoding(cp1252)/utf8 expected?
Have you tried explicit 'chcp 65001' before checksetup.pl?  Any changes in output?
(In reply to comment #29)
> Have you tried explicit 'chcp 65001' before checksetup.pl?  Any changes in
> output?

What's that?
chcp returns 850, even though LC_CTYPE says 1252.
Oh, and chcp 65001 before checksetup.pl has no effect, with or without the patch applied.
(In reply to comment #32)
> Oh, and chcp 65001 before checksetup.pl has no effect, with or without the
> patch applied.

I take that back. I changed the font used by cmd.exe to Lucida, and now your trick works great, without mkanat's patch!
Ohhh, I think maybe we have to use a different function to get the console encoding, on Windows. I know what it is, I'll provide another patch and see if it makes a difference.
Attached patch v4 — Splinter Review
Okay, this patch uses OutputCP instead of setlocale, now, on Windows. Does this fix your problem?
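A rough sketch of the approach described here, not the code from the attached patch: on Windows, ask the console for its output code page, and elsewhere fall back to the codeset suffix of LC_CTYPE. The function name console_encoding is hypothetical, and calling Win32::Console::OutputCP() as a plain function is assumed to behave as described in the comment.

```perl
use strict;
use warnings;
use Encode ();
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper; name and structure are illustrative only.
sub console_encoding {
    my $encoding;
    if ($^O eq 'MSWin32') {
        # Assumed to return the console's active output code page as a
        # number (e.g. 850), i.e. the value that chcp reports.
        require Win32::Console;
        $encoding = 'cp' . Win32::Console::OutputCP();
    }
    else {
        # Elsewhere, take the codeset suffix of LC_CTYPE (e.g. "UTF-8").
        my $locale = setlocale(LC_CTYPE, '') || '';
        ($encoding) = $locale =~ /\.([^.]+)$/;
    }
    return undef unless $encoding;
    # resolve_alias returns a false value for names Encode does not know.
    return Encode::resolve_alias($encoding) || undef;
}

my $encoding = console_encoding();
print defined $encoding
    ? "console encoding: $encoding\n"
    : "console encoding unknown\n";
```

Returning undef when nothing usable is found lets the caller leave the output handles untouched instead of guessing.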
Attachment #445575 - Attachment is obsolete: true
Attachment #445601 - Flags: review?(LpSolit)
Attachment #445575 - Flags: review?(LpSolit)
As I said in comment 33, I see no difference now that I set the font to Lucida, so I cannot review your patch as "ok, this fixes my problem". Vitaly, does this patch help in your case?
Attachment #445601 - Flags: review?(LpSolit) → review?(timello)
Attachment #445601 - Flags: review?(timello) → review+
Comment on attachment 445601 [details] [diff] [review]
v4

It works! I tested it using cp1252. I printed some Portuguese words with accents which were written in UTF-8. They were all printed the way they should be. I suppose it will work for other languages too.
Flags: approval?
Flags: approval? → approval+
Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla.pm
modified checksetup.pl
modified Bugzilla/Install/Requirements.pm
modified Bugzilla/Install/Util.pm
Committed revision 7257.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Keywords: relnote
Added to the release notes in bug 604256.
Keywords: relnote