Closed Bug 320156 Opened 14 years ago Closed 12 years ago

Internationalization and Localization for Litmus

Categories

(Webtools Graveyard :: Litmus, defect, P1, critical)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zach, Assigned: coop)

Attachments

(1 file, 1 obsolete file)

One of the things that came out of the Litmus BOF at the Mozilla Summit was the need for internationalization and localization support in Litmus. Bugzilla has a pretty good system for localized templates, so we should look at that and incorporate it into Litmus. Secondly, we need to allow testers to indicate the localization of the product under test (auto-detecting for Firefox when possible).

We should also examine the roadmap and determine how much of a priority this is for Litmus.
So, um, part of this is already in place. I added the locale selection and auto-detect (for Firefox) for running tests in advance of the Thunderbird Test Day (today). I even updated the advanced search tools. This will need some testing and perhaps extension, but it is largely there.

With regard to localizing Litmus itself, I think we can certainly try to design Litmus in such a way that it is easy to localize, but I don't think this can be a primary concern. If there is someone in the development community with experience localizing webtools who would like to help us out with this, I would welcome their assistance.

Since we're not tying Litmus directly to Bugzilla anymore at the auth level, I also don't think we need to tie ourselves to their template model unless it does happen to be the best available paradigm.
QA Contact: ccooper → litmus
Bumping the priority and severity on this since we're actively seeking help from Mozilla Japan on this now.

We have a few decisions we need to make here:

1) How do we track user language/locale preference? As part of their core user data? As part of each test run?

2) How do we want to present testcases to testers in other languages? Do we want to have completely separate test runs/testgroups/subgroups/testcases in (e.g.) Japanese? Or do we want to maintain translated versions of each testcase, and default to displaying en-US versions where no translation exists?

3) Do we want to translate the entire interface, or just testcases?

I'm sure there's more...that's just what comes to mind immediately.

Some useful links:

Setting up MySQL to use utf-8: 
http://community.postnuke.com/index.php?name=News&file=article&sid=2831

clouserw's l10n blog post:
http://micropipes.com/blog/2007/07/26/ten-tips-for-website-localization/
Severity: normal → major
Priority: -- → P2
I've got the staging server to the point where it's accepting Japanese input. See testcase #4532 as an example. Note: this is a new, bogus placeholder Japanese testcase with some kanji I cribbed out of the Japanese Litmus tutorial on QMO.

Here's the list of changes I made:
* dumped Litmus schema and data separately
* updated schema, set database level character set to utf8, and database collate to utf8_unicode_ci
* updated all schema references from latin1_bin->utf8_unicode_ci and latin1->utf8
* dropped compound contact_info key from users because it violates the key length under utf8 
* reimported data
* changed the charset to utf-8 in the global/header.html.tmpl template
* added 'use utf8;' to the mod_perl startup params

As you can see, the summary doesn't display correctly when piped through the html template filter, but that's probably a change we want to make to the filter itself rather than to (almost) all our templates.
So that seems to be only a partial solution. The Japanese test is appearing, but it is being translated into HTML entities prior to being stored in the database.
Hi cooper, zach. Thanks for working on this bug.

(In reply to comment #4)
> So that seems to be only a partial solution. The Japanese test is appearing,
> but it is being translated into HTML entities prior to being stored in the
> database.

Yes, testcases #4528 and #4531 were translated into numeric character references before being sent to the server. Firefox probably does this automatically when the page encoding is Latin-1 (ISO-8859-1).

Here are the HTTP headers captured with the Live HTTP Headers extension:
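
As a rough illustration of the behaviour described above (a Python sketch, not Litmus code; the function name `submit_as_latin1` is made up for this example): when form text has to fit into an ISO-8859-1 submission, characters that Latin-1 cannot represent can only survive as numeric character references, and those references are what the server ends up storing.

```python
import html


def submit_as_latin1(text: str) -> bytes:
    """Mimic form text being submitted from an ISO-8859-1 page.

    Characters with no Latin-1 representation are replaced by numeric
    character references (&#NNNN;), so the server never sees the raw text.
    """
    return text.encode("latin-1", errors="xmlcharrefreplace")


summary = "Firefox 製品テスト"
wire = submit_as_latin1(summary)
print(wire)  # the Japanese characters arrive as &#...; references

# The original text is only recoverable by un-escaping the references again.
restored = html.unescape(wire.decode("latin-1"))
print(restored == summary)
```

Plain ASCII text passes through unchanged, which is presumably why the problem only shows up once non-Latin testcases are entered.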
http://litmus-stage.mozilla.org/index.cgi
----------------------------------------
GET /index.cgi HTTP/1.1
Host: litmus-stage.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; ja-JP-mac; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: ja,en-us;q=0.7,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: Shift_JIS,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://litmus-stage.mozilla.org/manage_testcases.cgi

HTTP/1.x 200 OK
Date: Thu, 09 Aug 2007 01:25:58 GMT
Server: Apache/2.0.55 (Red Hat)
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=ISO-8859-1
----------------------------------------
As you can see on the last line, the Content-Type: header specifies ISO-8859-1, but the generated HTML is:
----------------------------------------
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ja" lang="ja">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
...
----------------------------------------
The content attribute of the meta element says charset=utf-8.
The HTTP Content-Type: header should also say charset=UTF-8.
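
A small Python sketch of the mismatch check described above, using the two values quoted in this comment. The helper names are hypothetical. It only compares the two declarations; in browsers the HTTP header normally takes precedence over the meta element, which is why the page was treated as ISO-8859-1.

```python
import codecs
from email.message import Message


def declared_charset(content_type: str) -> str:
    """Return the normalised charset named in a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # None if there is no charset parameter
    if charset is None:
        raise ValueError(f"no charset in {content_type!r}")
    return codecs.lookup(charset).name  # canonical codec name


def charsets_agree(http_header: str, meta_content: str) -> bool:
    """True when the HTTP header and the meta element name the same charset."""
    return declared_charset(http_header) == declared_charset(meta_content)


http_header = "text/html; charset=ISO-8859-1"  # from the captured response
meta_content = "text/html; charset=utf-8"      # from the <meta> element
print(charsets_agree(http_header, meta_content))
```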

And if I set the character encoding in Firefox to UTF-8 by hand (ignoring the HTTP headers) and add a testcase, the Japanese characters are sent as UTF-8 strings, not numeric character references.
In that case, testcases like #4531 and #4545 are generated.
Their characters are saved (or mis-converted after being read from MySQL) as garbage characters.
# I cannot access the raw data saved in MySQL, so I don't know at which point the strings get broken.
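
One common way such garbage characters arise is UTF-8 bytes being interpreted as ISO-8859-1 somewhere between the browser, the application and the database. The Python sketch below only demonstrates that mechanism; it is an assumption that this is what happened on the staging server, and the function names are invented for the example.

```python
def misread_as_latin1(text: str) -> str:
    """Show what UTF-8 bytes look like when decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo the misreading, provided no bytes were lost along the way."""
    return mojibake.encode("latin-1").decode("utf-8")


original = "製品テスト"
garbled = misread_as_latin1(original)
print(repr(garbled))    # three Latin-1 characters per Japanese character
print(repair(garbled))  # round-trips back to the original text
```

If the stored data has only been misread once like this, it can be repaired mechanically; if it was converted twice or truncated, it usually cannot.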
(In reply to comment #5)
> (In reply to comment #4)
> > So that seems to be only a partial solution. The Japanese test is appearing,
> > but it is being translated into HTML entities prior to being stored in the
> > database.
> 
> Yes, testcases #4528 and #4531 were translated into numeric character
> references before being sent to the server. Firefox probably does this
> automatically when the page encoding is Latin-1 (ISO-8859-1).

Of course this behavior is not acceptable.
When we store Japanese as numeric character references, it is difficult to edit, and in some places we see the raw references, for example:
 - the popup tooltip shown when we click the comment icon in the result ID row of the search results page
 - the name row of the 'Select a Testgroup and Subgroup to Test' page for a test run.

I'm also trying to install and hack on Litmus, but I ran into trouble installing some CPAN modules and haven't managed it yet.
I think the difficulty is that our webserver is setting the default character encoding to Western (ISO-8859-1) instead of Unicode (UTF-8). If I change the character encoding manually, the display/submission works. Sites like QMO default correctly to UTF-8.

BTW, to avoid dealing with the many layers of Litmus at this early stage, I'm using coop_test.cgi on the staging server.
Modified the Litmus::CGI->header call to set the default charset to utf-8 on the staging server, and this is allowing the Japanese characters to appear correctly in the output with translation. Note: this change only appears on the staging server right now.

I haven't done an end-to-end test of putting Japanese text into the system->database->retrieving said data successfully out the other side. I did start to play around with the existing Japanese Firefox test run (15), but didn't get very far due to other distractions today, and that test run was input under the wrong encoding scheme anyway.

Zach: if you get a chance, can you determine whether there is an easy way to utf8-scrub all incoming and outgoing data automatically, e.g. subclassing an existing method or something? Doing the end-to-end test I mentioned above for a subset of the scripts (say test runs) should determine what scrubbing is required and how pervasive those changes will need to be.
(In reply to comment #8)
> correctly in the output with translation.

...er, that should read 'without any translation, e.g. to HTML entities'
I confirmed that the HTTP header now says utf-8 and that coop_test.cgi handles UTF-8 strings well, but litmus-stage.mozilla.org does not yet.
json.cgi still specifies ISO-8859-1 in its Content-Type: header, but I'm not sure whether that will cause a problem.


As far as I can tell from the Perl 5.8+ documentation on UTF-8 handling, we should:
 - add the "use utf8;" pragma when the Perl script itself contains UTF-8 strings
 - use binmode to tell Perl explicitly to use utf8 when we use standard I/O
   binmode STDIN,  ":utf8";
   binmode STDOUT,  ":utf8";
 - use the 3-argument version of open() when reading UTF-8 files
   open(FH, "<:utf8", $filename);
 - encode/decode all input and output for other I/O, including the DB
Concerning the DB, if the module defines filters, as DB_File does, you can:
  use utf8;                          # this source contains UTF-8 literals
  use Fcntl;                         # O_RDWR, O_CREAT
  use DB_File;
  use Encode qw/ encode decode /;

  binmode STDOUT, ":utf8";           # print decoded strings as UTF-8

  my %hash;
  # open read-write so that the store below succeeds
  my $db = tie( %hash, 'DB_File', "filename", O_RDWR|O_CREAT, 0666, $DB_HASH ) or die;
  # encode on the way into the file, decode on the way out
  $db->filter_store_key  ( sub { $_ = encode("utf8", $_) } );
  $db->filter_store_value( sub { $_ = encode("utf8", $_) } );
  $db->filter_fetch_key  ( sub { $_ = decode("utf8", $_) } );
  $db->filter_fetch_value( sub { $_ = decode("utf8", $_) } );

  $hash{"**raw UTF-8 strings here**"} = "**raw UTF-8 strings here**";
  while( my( $key, $value ) = each %hash ){
      print "$key:$value\n";
  }
If the DB interface module doesn't support filter functions, we have to hack the module or write a simple wrapper to encode/decode all input and output.
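
The "simple wrapper" idea can be sketched as follows. This is Python for illustration only, not Litmus code: `Utf8Store` is a hypothetical name, and a plain dict stands in for a storage layer that only understands bytes. The point is that encoding and decoding happen in exactly one place, at the storage boundary.

```python
class Utf8Store:
    """Wrap a bytes-only key/value store so callers only ever see str."""

    def __init__(self, raw):
        self._raw = raw  # any mapping that stores bytes keys and values

    def __setitem__(self, key: str, value: str) -> None:
        # Encode on the way in.
        self._raw[key.encode("utf-8")] = value.encode("utf-8")

    def __getitem__(self, key: str) -> str:
        # Decode on the way out.
        return self._raw[key.encode("utf-8")].decode("utf-8")

    def items(self):
        for k, v in self._raw.items():
            yield k.decode("utf-8"), v.decode("utf-8")


backend = {}  # stand-in for the real storage layer
store = Utf8Store(backend)
store["概要"] = "製品テスト"
print(store["概要"])
```

Application code above the wrapper never calls encode or decode itself, which keeps the change to existing code small.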


I'm not a Perl hacker and I may have some of this wrong, so please check reliable documentation. I think this will help:
http://ahinea.com/en/tech/perl-unicode-struggle.html
or search Google for keywords like 'perl 5.8 utf8 mysql' to find more.
Just some sample UTF-8 input data.

This is a translation of the first testcase of the L10N subgroup.
If you need some sample UTF-8 input, you can use this.
We shouldn't need to worry about DB_File or any of the file IO stuff, since we're using DBD::mysql, not straight berkeleydb. I'll look tomorrow at where we need to plug into to filter/scrub the data on its way into and out of the database.
(In reply to comment #10)
> I confirmed that the HTTP header now says utf-8 and that coop_test.cgi handles
> UTF-8 strings well, but litmus-stage.mozilla.org does not yet.
> json.cgi still specifies ISO-8859-1 in its Content-Type: header, but I'm not
> sure whether that will cause a problem.

Oops, I had an error in how I was appending the charset to the header. This should be fixed now.

We'll need to do some encoding/decoding to be sure, but the goal here is to be as minimally invasive as possible with the existing code.

Zach: I found a module on CPAN that may help us out -- Class::DBI::utf8

http://search.cpan.org/dist/Class-DBI-utf8/lib/Class/DBI/utf8.pm

I've installed the module on the staging server, and tried plugging it into Testcase.pm, but haven't had a chance to test it yet. I'm going to be busy today prepping for the on-site, so if you have a chance to investigate it, I would appreciate it.
dynamis: when you get the chance, can you please add your thoughts on comment #2? 

We'll get the technical details sorted out eventually, but it's more important for us to know what you think the priority is in terms of localization, e.g. testcase presentation vs. entire interface. If you think testers can use your Litmus tutorial to navigate the english interface, then we can concentrate more on the testcase display.

Personally, I'm a fan of the AMO model where we would have a single testcase that could have translations into many languages, and we would default to display the en-US version where no translation existed. How to make language display selectable (perhaps a select box or simple option list in the Welcome! user box?) remains to be decided.
(In reply to comment #12)
> We shouldn't need to worry about DB_File or any of the file IO stuff ...
Of course not. The DB_File example was just to make the idea clear. I only wanted to say that we need to make a simple wrapper if the DBI doesn't support utf-8 by itself.

(In reply to comment #14)
> testcase presentation vs. entire interface. If you think testers can use your
> Litmus tutorial to navigate the english interface, then we can concentrate more
> on the testcase display.

We need a fully translated interface to get more users to join QA testing. Users hate English, and as long as the Litmus UI is in English, we l10n owners must maintain some other place and documents to recruit QA testers and gather their results.

I know that supporting utf-8 input and output is not so difficult, and we will be able to start using Litmus for our QA if at least Japanese testcases are in there. As for the website UI (a small amount of mostly unchanging English), we actually have a workaround.
There is an extension that dynamically replaces the English text of websites with Japanese, based on a translation table specific to each site. It is called Japanize:
http://japanize.31tools.com/
That's the reason I didn't strongly request UI i18n.

We have already started making a translation table for the Litmus site, and if users install Japanize, they can use Litmus with a Japanese-translated UI.
Japanese translation table for Litmus:
http://japanize.31tools.com/index.cgi/view?target=litmus.mozilla.org

Of course this is not an ideal solution, but it works as a temporary workaround.
So, at least for the Japanese team, we should work on content (testcase) i18n support first.

> Personally, I'm a fan of the AMO model where we would have a single testcase
> that could have translations into many languages, and we would default to
> display the en-US version where no translation existed. How to make language
> display selectable (perhaps a select box or simple option list in the Welcome!
> user box?) remains to be decided.

Yes, I agree. We should have a single testcase that contains its translations, not a separate testcase for each of 50+ languages. We need to make it possible for testcases to carry translated strings and to use one of them according to the user/client.
But at the same time, we must be able to create testcases specific to certain locales. For example, English users don't test multibyte language input or domains, but Japanese and other users must. Some settings are l10n-dependent, and there are some testcases that should be run only for some locales.
That is, English is the default, translations should be added to it, and non-English-only testcases should also be supported.
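
The model discussed here (one testcase carrying several translations, an en-US fallback, and optional locale-only testcases) could look roughly like the Python sketch below. The class, field names and sample strings are all hypothetical and do not reflect the Litmus schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

DEFAULT_LOCALE = "en-US"


@dataclass
class Testcase:
    translations: Dict[str, str]             # locale -> summary text
    only_locales: Optional[Set[str]] = None  # None means "all locales"

    def applies_to(self, locale: str) -> bool:
        """Locale-only testcases are shown just to the listed locales."""
        return self.only_locales is None or locale in self.only_locales

    def summary_for(self, locale: str) -> str:
        # Prefer the tester's locale, then the en-US text; a locale-only
        # testcase with no en-US text falls back to whatever exists.
        for candidate in (locale, DEFAULT_LOCALE):
            if candidate in self.translations:
                return self.translations[candidate]
        return next(iter(self.translations.values()))


common = Testcase({"en-US": "Open a new tab", "ja": "新しいタブを開く"})
ime_only = Testcase({"ja": "日本語入力のテスト"}, only_locales={"ja"})

print(common.summary_for("fr"))      # no French translation, so en-US is used
print(ime_only.applies_to("en-US"))  # hidden from en-US testers
```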

About how to select the language, we should do 3 things:
 - Client locale/language detection
   # BTW, we should detect the client locale and use it as the default selection for
   # the "extra configuration information" - "locale" selectbox.
   # Right now we must change it every time, which is very annoying, and we sometimes select the wrong one.
 - User settings
 - Support changing the language by hand with a language selectbox.
Displaying English by default plus the above 3 kinds of language selection is already implemented on the QMO site. I asked jay to install the i18n module and I set it up.
# http://quality.mozilla.org/ja/node/324/ is the first translation sample. ;)

QA testers will use and see both the Litmus and QMO sites, and I think it's better if both support the same or similar things.
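
A Python sketch of how the three selection mechanisms might be combined. The precedence used here (manual choice, then saved user setting, then Accept-Language detection, then en-US) is an assumption for illustration; the comment above only asks that all three be supported. The function names are hypothetical, and the sample header is the one from the capture earlier in this bug.

```python
from typing import List, Optional, Sequence, Tuple


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags from an Accept-Language header, best first."""
    weighted: List[Tuple[float, int, str]] = []
    for position, item in enumerate(header.split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0  # a tag without q= has the highest weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if tag.strip():
            weighted.append((-quality, position, tag.strip()))
    return [tag for _, _, tag in sorted(weighted)]


def choose_locale(manual: Optional[str], user_pref: Optional[str],
                  accept_language: Optional[str],
                  supported: Sequence[str], default: str = "en-US") -> str:
    """Manual choice beats the saved preference, which beats detection."""
    by_lower = {loc.lower(): loc for loc in supported}
    candidates = [manual, user_pref] + parse_accept_language(accept_language or "")
    for candidate in candidates:
        if candidate and candidate.lower() in by_lower:
            return by_lower[candidate.lower()]
    return default


header = "ja,en-us;q=0.7,en;q=0.3"
print(choose_locale(None, None, header, ["en-US", "ja"]))
```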

(In reply to comment #2)
> Bumping the priority and severity on this since we're actively seeking help
> from Mozilla Japan on this now.

Sorry, I forgot to reply to this.

> 1) How do we track user language/locale preference? As part of their core user
> data? As part of each test run?

As I wrote above, we should support both. It's annoying to select the language/locale for each test run, so we should use client locale detection and/or a user setting. But at the same time, we must support selecting it by hand with a drop-down select box so that users can test multiple locales, etc.

> 2) How do we want to present testcases to testers in other languages? Do we
> want to have completely separate test runs/testgroups/subgroups/testcases in
> (e.g.) Japanese? Or do we want to maintain translated versions of each
> testcase, and default to displaying en-US versions where no translation exists?

For common testcases, it's better to use one testcase for every language, containing the translations. English should be displayed if no translation exists for the testcase.
And at the same time, we should be able to support creating locale-only testcases.

> 3) Do we want to translate the entire interface, or just testcases?

Of course we want to translate everything if possible. There are not so many UI strings, and the translation work required is not so huge. We'll translate them if we can.
But this will need a lot of work, and so far we have a workaround (Japanize). So the priority for this is lower than 1) and 2) above.

There is one more important task: make it easy to find the same test across versions/products. We don't want to translate almost identical testcases (differing only in version number) twice.
If one person does all the translation at once, he can notice an existing translation, but if not, the same work will be done twice by different translators.

I think the priorities (considering the work needed for each) are:
 1st - Accept UTF-8
 2nd - locale/language user setting and/or client detection, to avoid the annoying manual selection that sometimes causes mis-reports.
 3rd - make it easy to find the same test across versions/products
 4th - make testcases support translations, to avoid far too many testcases with 50+ locales in the future.
       # with few locales this is not such a large problem, but it should be done before many other locales start translating.
 5th - support site UI translations.

Thanks.
Zach: this is dragging. If you're not going to be able to work on this, please reassign to me.
No reply from Zach, so I'm taking this officially and will try to have something up by the end of the week.
Assignee: zach → ccooper
Severity: major → critical
Priority: P2 → P1
Status: NEW → ASSIGNED
Sorry Coop. I'm going to be totally crazed for the next week and a half or so (until the 15th basically), and then things should settle down. I'd be happy to work on the next phases of this and help where I can.
In my casual testing so far, this has allowed me to create objects (testgroups/subgroups/testcases) with utf8 characters that make the round-trip to the db and back without getting converted into html entities.

Zach: can you review the patch for sanity and do some poking on the staging install (this patch is live there) to verify that we're not losing any data and that utf8 data is in fact being propagated everywhere? I'm working on Selenium tests here to test the same.
Attachment #283028 - Flags: review?(zach)
Comment on attachment 283028 [details] [diff] [review]
Add utf8_columns to all db classes

I actually had to make some additional changes to FormWidget.pm and Litmus::Template to get things encoded/decoded properly. There may be some more instances where similar changes need to be made, but we have the FormWidget.pm to use if they do pop up. 

I'm comfortable enough with these changes to check them in now.
Attachment #283028 - Attachment is obsolete: true
Attachment #283028 - Flags: review?(zach)
(In reply to comment #20)
> but we have the FormWidget.pm
> to use if they do pop up. 

er, the FormWidget.pm _example_ to use
Landed.

Asai: I encourage you to update the existing testcases/subgroups/testgroups with proper Japanese characters now that Litmus will actually accept them.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thanks a lot!
I confirmed Litmus can handle Japanese (UTF-8) characters now.
In the manage testcases/subgroups/testgroups UI, everything works fine! ;)
# I'm adding our testcases...

But unfortunately it's not perfect yet. :(
In the run tests UI, the text of testcases/subgroups/testgroups shows garbage characters.

Steps to check the output:
 1) open https://litmus.mozilla.org/run_tests.cgi
    You can see garbage characters in the testgroup names on this page
 2) select the "Japanese Firefox 2.0 製品テスト" test run or just open
    https://litmus.mozilla.org/run_tests.cgi?test_run_id=15
 3) You'll see the "Your Chosen Test Run" page
    You can see garbage characters in the testgroup names on this page
 4) Select the OS etc. and submit the configuration.
 5) You'll see the "Select a Testgroup and Subgroup to Test" page
    You can see garbage characters in the testgroup/subgroup names on this page
 6) submit
 7) You'll see the "Enter Test Results" page
    You can see garbage characters in the testcase information on this page

I believe you can fix this soon, since all you need to do is apply to the test run pages the same changes you already made for the manage pages.
# If you fix this soon, we can use Litmus for the upcoming Firefox 3 beta 1 QA. ;)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
dynamis: I think you must have just narrowly missed my check-in for bug 402578 that landed this morning. 

I see proper (well, it looks proper to me) Japanese now following the steps you posted above.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Depends on: 402578
Resolution: --- → FIXED
Ah, you resolved it just yesterday.
I confirmed all output is correct.
 --> verified

Great work!
Status: RESOLVED → VERIFIED
I added more testgroups/subgroups/testcases in Japanese and found problems with the newly added testcases.

 1) open the Japanese Firefox 2.0 test run:
    https://litmus.mozilla.org/run_tests.cgi?test_run_id=15
 2) select the platform etc. and submit the configuration
 3) select "Japanese 2.0 Surf's Up - A Fun Smoketest!" and submit
 4) You'll see the following error:
    Litmus has suffered an internal error - undef error - Wide character in
    subroutine entry at /usr/lib/perl5/site_perl/5.8.5/Text/Markdown.pm line 194.
 0) the same error happens with the Japanese 2.0 Smoketests groups

 a) open the manage testcases page:
    https://litmus.mozilla.org/manage_testcases.cgi
 b) select Firefox, 2.0 branch, Japanese 2.0 Surf's Up - A Fun Smoketest!
 c) select some testcase
 d) select a testcase whose ID is one of:
    4755, 4757, 4760, 4761, 4763
 e) The testcase info will not change
    # the info of the previously selected testcase remains
 x) the same thing happens with the following testcases of the Smoketests group:
    4771, 4782, 4796, 4797, 4803

If you select one of these testcases first on the manage testcases page, the info of the selected testcase is shown correctly. So I think these testcases must have been added to the database correctly.
I'm not sure what the difference is between these testcases and the others... :(
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → ASSIGNED
The Text::Markdown module seems to be having some trouble internally dealing with certain utf-8 characters. I would personally be all for disabling this module entirely, but I think others on the QA rely on it.

I will try some experiments with the Text::Markdown module later tonight.
The Markdown.pm module takes snippets of text, breaks them up, and makes subroutines out of some of the component parts to generate the replacement HTML markup. This works great with Latin charsets, but perl still doesn't handle unicode well in subroutines and package names (http://perldoc.perl.org/utf8.html).

In many cases, the Markdown syntax still works on the Japanese testcases, but as you noticed, there must be some problem characters in certain testcases that perl just cannot handle internally, even when running in "use utf8" mode. I've added an eval{} block around all the places where the Markdown syntax gets called so that even if the Markdown does fail on one of these problem characters, we can just substitute in the original markup.
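
The guard described above follows a general pattern: attempt the formatting step, and if it fails, show the original text rather than an error page. A Python sketch of that pattern follows. It is not the Litmus change itself; `fragile_renderer` is a hypothetical stand-in for a formatter that fails on some input.

```python
from typing import Callable


def render_or_fallback(source: str, render: Callable[[str], str]) -> str:
    """Return render(source), or the untouched source if rendering fails."""
    try:
        return render(source)
    except Exception:
        # Same idea as the eval{} guard: a formatter failure must not
        # take the whole page down.
        return source


def fragile_renderer(text: str) -> str:
    # Hypothetical formatter that cannot cope with non-ASCII input.
    text.encode("ascii")  # raises UnicodeEncodeError for Japanese text
    return "<p>" + text + "</p>"


print(render_or_fallback("hello", fragile_renderer))
print(render_or_fallback("製品テスト", fragile_renderer))
```

The trade-off is that a failure is silent: the tester sees unformatted markup instead of an error, so problem testcases are only noticed by reading the output.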

dynamis: because of the limitations mentioned above, I would recommend that you revisit your Japanese testcases and change the markup to be proper HTML rather than Markdown (if you were relying on Markdown to begin with). I've also added instructions to that effect to the beginning of the formatting help blurb. Like I said, the steps/results *should* still appear correctly in most cases, but if you want to be sure, you should use HTML markup.

If you need help with the HTML syntax at all, just let me know.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Thanks Chris. I confirmed that all test runs are shown correctly without errors.

And as far as I checked, all Markdown formatting works correctly now. I think we don't need to use HTML markup.
It seems that adding the eval{} block solved the problem completely, and Markdown now works correctly with UTF-8 characters too.

So you can remove the instructions for UTF-8 characters at the beginning of the formatting help. ;)
I will continue adding/editing more and more Japanese testcases with Markdown formatting, and if I find something wrong, I'll report it again and use HTML formatting for that testcase.
# Only in that case, please add the instructions again.


BTW, I found another bug:
The preview of the testcase info on the manage testcases page shows garbage characters. I found this bug before you updated Litmus with the Markdown workaround, and I think it is not related to the Markdown problem.
# While I was testing a little while ago, the Markdown output behavior changed;
# your post to this bug was committed just before I reported the newly found problem.
I was wrong. I found some cases where Markdown does not work correctly with Japanese testcases. That happens when we use automatic links (though not always).
# there may be other cases where Markdown does not work correctly with utf-8

I'm not sure which character (or which Markdown structure) causes the problem, so I'll use proper HTML markup (not Markdown) where the problem occurs, as you recommend.

So, don't remove the instructions for UTF-8 characters at the beginning of the
formatting help.
(In reply to comment #29)
> BTW, I found another bug:
> The preview of the testcase info on the manage testcases page shows garbage
> characters. I found this bug before you updated Litmus with the Markdown
> workaround, and I think it is not related to the Markdown problem.

Checking in preview.cgi;
/cvsroot/mozilla/webtools/litmus/preview.cgi,v  <--  preview.cgi
new revision: 1.3; previous revision: 1.2
done
Product: Webtools → Webtools Graveyard