Open Bug 469732 Opened 11 years ago Updated 6 years ago

Generic maketext capability

Categories: Bugzilla :: Bugzilla-General, enhancement
People: Reporter: vitaly.fedrushkov, Unassigned
References: Blocks 3 open bugs
Attachments: 2 files, 3 obsolete files

Attached patch Proof of concept (obsolete)
A proof of concept patch, think of it as a technology demonstrator.

All known lexicons are loaded statically.  Existing template/LANG take precedence over (template/en + locale/LANG).  Of course, template/LANG may use l() calls as well.

Surely not ready for review, but feedback is VERY appreciated.
Attached patch Test case (obsolete)
Test case for attachment 353111

Feel free to hack other templates, do not forget to run 

  xgettext.pl -g -D template/en -o locale/en/en.po

when done.  Then go edit en.po.  And yes, .po files are editable with Virtaal ;-)
Blocks: 412161
Comment on attachment 353112
Test case

I'm really *against* this kind of change. The code becomes unreadable. We have no easy way to know what we are writing.
Is

  [% IF bugs.size == 0 %]
    [% terms.zeroSearchResults %].
  [% ELSIF bugs.size == 1 %]
    One [% terms.bug %] found.
  [% ELSE %]
    [% bugs.size %] [%+ terms.bugs %] found.
  [% END %]

more readable than

  [% l('[_1] bug(s) found.',bugs.size) %]

?

I am trying to summarize such concerns on a Wiki:

https://wiki.mozilla.org/Bugzilla:L10n:Maketext#Drawbacks_and_concerns

Improvement ideas are welcome.
(In reply to comment #3)
> Is ... more readable than ... ?

For me, yes, because I have no idea what's behind this function without having to look again at another file. So a developer like me would have to read the sequence

.cgi -> .pm -> .tmpl <-> .po

to know what a given step will produce. I'm really not interested in this last step. Having to read sentences out of their context is painful, IMO, and forces us to ping-pong continually between templates and .po files. And when you want to add code, I don't know how you plan to manage duplicated strings. Things could quickly become a mess. Maybe my view is too narrow, but that's my opinion.
(In reply to comment #4)
> For me, yes, because I have no idea what's behind this function without having
> to look again at another file. So a developer like me would have to read the

Maybe I misunderstand the way this goes, but I understand that if all's well, then you don't need to look at a .po file at all, because the sentence is there in plain English. Strip the [% l('') %] brackets and you're there.

> want to add code, I don't know how you plan to manage duplicated strings.

Strings have the original English string as hash key, don't they? There wouldn't be any duplication.

Oh my. I hope I'm right. Maybe I should get my feet wet with all this stuff first.
(In reply to comment #5)
> Oh my. I hope I'm right. Maybe I should get my feet wet with all this stuff
> first.

  You're right. :-)
Status: UNCONFIRMED → NEW
Ever confirmed: true
(In reply to comment #5)
> then you don't need to look at a .po file at all, because the sentence is there
> in plain English. Strip the [% l('') %] brackets and you're there.

In that case, what you plan to do is ridiculous, IMO. The key passed to the l() function should be short, to make it useful. Otherwise, every time you fix a typo somewhere, you have to fix it in the template *AND* in the .po file. I really don't see how duplicating strings will improve anything.
(In reply to comment #7)
> The key passed to the l() function should be short, to make it useful.

  No, then you really would have to go read the .po file.

> Else everytime you fix a typo
> somewhere, you have to fix it in the template *AND* in the .po file.

  Um, no. There doesn't need to be an English .po file, only a .po file for other languages. Or at least, we can make things work this way.

  This gettext and l() system is how every serious software project does localization--we're just joining the ranks of people who develop software in the 21st century.
(In reply to comment #8)
>   Um, no. There doesn't need to be an English .po file, only a .po file for
> other languages. Or at least, we can make things work this way.

  In fact--Vitaly, I'd like to make this a requirement--no English .po file needed. So this would eliminate use of Locale::Maketext::Fuzzy.

  Also, I'd like to see if we can get the {{{ }}} syntax working, with a Template Toolkit modification.
(In reply to comment #8)
>   Um, no. There doesn't need to be an English .po file, only a .po file for
> other languages. Or at least, we can make things work this way.

Anyway, every time you fix a typo, localizers will have to update their .po file (because the key is different), even if their translated strings won't change. This is IMO bad design.


>   This gettext and l() system is how every serious software project does
> localization

Do they translate whole HTML pages? They usually only translate small strings at once, such as menus and such AFAIK.
(In reply to comment #9)
>   In fact--Vitaly, I'd like to make this a requirement--no English .po file
> needed. So this would eliminate use of Locale::Maketext::Fuzzy.

A generated .po file (1:1 from l('') brackets) would be ok, though, I'd say. This would keep the behaviour identical across languages at runtime.
(In reply to comment #10)
> Do they translate whole HTML pages? They usually only translate small strings
> at once, such as menus and such AFAIK.

I just checked, and e.g. The Gimp 2.4.x has strings translated in the HTML pages themselves for the help files. They do not use .po files for them, probably for the reasons above.
(In reply to comment #10)
> Anyway, every time you fix a typo, localizers will have to update their .po
> file (because the key is different), even if their translated strings won't
> change. This is IMO bad design.

  They already have to manage conflicts anyhow. This is a totally trivial point, too--it's not like we correct 100 typos every day, or even more than maybe 10 per release.

> Do they translate whole HTML pages? 

  Web apps do, yes.
(In reply to comment #13)
>   They already have to manage conflicts anyhow.

In this case, you don't make their life easier, and you make ours more difficult. I see no benefit that is worth the effort.


(In reply to comment #11)
> A generated .po file (1:1 from l('') brackets) would be ok, though, I'd say.

Generated how? By checksetup.pl? If we have to maintain it manually, we will very quickly have both files out of sync.
(In reply to comment #14)
> (In reply to comment #13)
> >   They already have to manage conflicts anyhow.
> 
> In this case, you don't make their life easier, and make ours more difficult. I
> see no benefit which worths the effort.

My life *will* be easier if I don't have to separate language from code changes any more. I'm willing and happy to manage typo or re-wording conflicts.

> (In reply to comment #11)
> > A generated .po file (1:1 from l('') brackets) would be ok, though, I'd say.
> 
> Generated how? By checksetup.pl? If we have to maintain it manually, we will
> very quickly have both files out of sync.

Yeah, checksetup.pl.
Let me tell you about my experience with gettext-based projects.
Once upon a time, when I wanted to contribute to the free software community, I joined a big free software project (which I won't name, out of courtesy). I submitted a .po file for a piece of software without having any idea what the software was about: no context, no nothing, just a plain text file with strings to localize. I submitted it and it was included in the project...

Another one, which I'm still involved in, is the gnu.org site: I'm responsible for the French localization there. The idea was: we don't have enough localizers because localizing HTML files is difficult, so let's move to gettext (no, this wasn't my idea :) ).
This move happened 6 months ago or so, and I haven't seen any new contributors, whatever the language. What I did see was the work of converting ~200 files into .po format, for me as for the other l10n coordinators, and this was also a huge amount of work on the developer side. And each time a comma or some other insignificant detail is modified, I have to go manually through the ten-line string that was modified.

So, no, I don't think that "gettext" is the 21st century way to localize, and it is not *the* answer to the localization problems we currently face. That said, I have no other solution to offer.
Anyway, if this becomes the only way to localize Bugzilla in the future, I will give up. This is not an ultimatum (and I'm nobody in this project :-) ), this is just a fact: I don't have the luxury of spending time on this move.
(In reply to comment #15)
> My life *will* be easier if I don't have to separate language from code changes
> any more.

I doubt you can reasonably separate both, even with .po files. How do you plan to pass classes (for styles) or other <b></b>, <u></u>, <em></em>, .... HTML tags?
(In reply to comment #17)
> How do you plan
> to pass classes (for styles) or other <b></b>, <u></u>, <em></em>, .... HTML
> tags?

  We don't have much of that in the middle of sentences. For some files where we have lots of that, like the release notes, it'd probably be better to just let people translate the whole page, and skip gettext.
(In reply to comment #17)
> I doubt you can reasonably separate both, even with .po files. How do you plan
> to pass classes (for styles) or other <b></b>, <u></u>, <em></em>, .... HTML
> tags?

Please note that some languages have different traditions for text emphasis.

Where English uses bold and italics, the Russian typesetting tradition uses extra spacing.  However, the wide use of Western software is slowly changing this.

For some scripts certain styles are impractical.  Underlining may hurt Devanagari and Mkhedruli.  Kanji does not use italics; katakana is used instead.

From a 'pristine' l10n standpoint: explicit <b></b> tags outside gettext calls are bad practice.  Perhaps we should consistently use <em></em> to tell translators 'what to say' and leave the decision on 'how to say' to them.
(In reply to comment #18)
>   We don't have much of that in the middle of sentences. For some files where
> we have lots of that, like the release notes, it'd probably be better to just
> let people translate the whole page, and skip gettext.

Agreed, translating documentation or help templates poses no problems within the current architecture.  It is sophisticated code with scattered short strings that causes trouble for translators when it changes.
(In reply to comment #16)
> submitted a .po file of a piece of software, about which I strictly don't know
> what it was about: no context, no nothing, just a plain text file with strings
> to localize.

Perhaps you were somehow poorly equipped, if your .po editor of choice is unable to display context at the same time.  See

http://www.gnu.org/software/automake/manual/gettext/C-Sources-Context.html

for example.

> So, no, I don't think that "gettext" is the 21st century localization way and
> it is not *the* answer to solve the current localization problems we face.

It certainly isn't.  Maketext is somewhat better than gettext.  There is also l20n to consider (https://wiki.mozilla.org/L20n) but AFAICT it is stuck on the drawing board.

I'd really _like_ to discuss the future of l20n, but there was no feedback to my posts on mozilla.dev.apps.bugzilla or on localizers@bugzilla.org.  It is just not relevant to this bug.

> Though, I've no other solution to offer.
> Whatever, if this is the future only way to localize Bugzilla, I will give up.
> This is not an ultimatum (and I'm nobody in this project :-) ), this is just a
> fact: I have not the luxury to spend time for this move.

Backwards compatibility should be preserved, see bug 407752 comment 31.
(In reply to comment #11)
> A generated .po file (1:1 from l('') brackets) would be ok, though, I'd say.
> This would keep the behaviour identical accross languages at runtime.

There is no need for such a 1:1 file: as long as the _AUTO feature is not turned off, strings not matched by the Lexicon pass through unchanged.  This also makes it safe for translations to be out of sync; users will see the English message anyway.

But numeric inflection logic MUST go away, see bug 412161 (and its first summary).  We should always write 'bug(s)', exactly because himorin does not care :-)

So a minimal en.po file will contain all strings with parameters, along with their corresponding %quant() calls in English.
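
As an illustration of the fall-through described in this comment, here is a minimal Python sketch. It is not Bugzilla's code (which is Perl and uses Locale::Maketext); the `LEXICONS` table, the `l` function, and the German sample string are hypothetical and only show the lookup behaviour: an untranslated key is returned as-is, with its `[_N]` placeholders filled in.

```python
import re

# Hypothetical lexicons, keyed by the English source string.
LEXICONS = {
    "de": {"Log in": "Anmelden"},
    "en": {},  # empty on purpose: every key falls through unchanged
}

_PLACEHOLDER = re.compile(r"\[_(\d+)\]")


def l(lang, key, *args):
    """Look up key in the lexicon for lang; fall back to the key itself.

    This imitates the _AUTO-style behaviour discussed above: a missing
    entry is not an error, the English text is simply shown.
    """
    template = LEXICONS.get(lang, {}).get(key, key)
    # Replace [_1], [_2], ... with the matching positional argument.
    return _PLACEHOLDER.sub(lambda m: str(args[int(m.group(1)) - 1]), template)


print(l("de", "Log in"))
print(l("de", "Save changes"))
print(l("en", "[_1] bug(s) found.", 3))
```

With this shape, a translation that lags behind the templates degrades to English text rather than to a missing string.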
(In reply to comment #22)
> We should always write 'bug(s)', exactly because himorin does not
> care :-)

  No, as I said, no Locale::Maketext::Fuzzy. Please write l("blah bug", "blah bugs") as is standard in other gettext solutions.
(In reply to comment #10)
> Anyway, every time you fix a typo, localizers will have to update their .po
> file (because the key is different), even if their translated strings won't
> change. This is IMO bad design.

This is an epidemic misconception about how gettext should work, and sadly it works this way in some projects.

Kindly read TPJ13 and webl10n again.  When you write the code, your text strings are addressed to the translator, not to the end user, unless your end user speaks your language and the string is static.  You are expected to provide enough context for accurate and unambiguous translation.   This is called 'what to say'.  Spelling and punctuation are _not_ essential, as your text IS a primary key.

If you want to fix a typo, fix it in the .po file, not in code.  Other languages don't care; the typo will not have survived translation anyway.  And proofread your .po files, not the code.  Do not hesitate to add a key to the .po file if it wasn't there (was left to _AUTO).

But if you change the meaning of the phrase, then, and only then, you really change the text in code.  All corresponding .po entries will become fuzzy (in the original l10n sense, not the Perl module) and attract translators' attention.
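
To make the "fuzzy" idea concrete, here is a rough Python sketch of what a catalog merge does when a source string changes. It only approximates the behaviour of gettext-style merge tools; `merge_catalog`, the similarity cutoff, and the German sample translations are assumptions made for illustration, and the matching is done with the standard `difflib` module rather than any real gettext algorithm.

```python
import difflib


def merge_catalog(old_catalog, new_msgids, cutoff=0.6):
    """Carry translations over to a new list of source strings.

    Returns {msgid: (translation, is_fuzzy)}. An exact match keeps its
    translation; a near match reuses the old translation but flags it
    fuzzy so a translator reviews it; anything else starts empty.
    """
    merged = {}
    for msgid in new_msgids:
        if msgid in old_catalog:
            merged[msgid] = (old_catalog[msgid], False)
            continue
        close = difflib.get_close_matches(
            msgid, list(old_catalog), n=1, cutoff=cutoff
        )
        if close:
            merged[msgid] = (old_catalog[close[0]], True)
        else:
            merged[msgid] = ("", False)
    return merged


old = {"One bugg found.": "Ein Bug gefunden.", "Log in": "Anmelden"}
new = ["One bug found.", "Log in", "Save changes"]
for msgid, (text, fuzzy) in merge_catalog(old, new).items():
    print(msgid, "->", repr(text), "(fuzzy)" if fuzzy else "")
```

In this sketch the corrected key reuses the old translation and is only flagged for review, while a genuinely new string starts untranslated.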
(In reply to comment #10)
> Anyway, every time you fix a typo, localizers will have to update their .po
> file (because the key is different), even if their translated strings won't
> change. This is IMO bad design.

Large projects have already addressed this problem, for example:

http://linux.die.net/man/1/msguntypot
(In reply to comment #9)
> Also, I'd like to see if we can get the {{{ }}} syntax working, with a
> Template Toolkit modification.

This would be easy, as we already have Template::Parser overridden.

(In reply to comment #23)
>   No, as I said, no Locale::Maketext::Fuzzy.

Don't get me wrong, this does not involve Locale::Maketext::Fuzzy.  As seen in attachment 353112, '[_1] bug(s) found' is essentially a static key, not a fuzzy match.  The '(s)' here is a reminder to translators to use %quant, if applicable.

It has an _exact_ entry in the Lexicon, so there is no performance impact.  This exact entry contains the %quant() variants.  They are expanded by Locale::Maketext using the Bugzilla::L10N::en implementation of quant().  For English, there are two or three variants:

  %quant(%1,bug,bugs,No bugz) found

alternatively, English translator can write

  %numerate(%1 bug found.,%1 bugs found.,No bugz found!)

if this looks more human readable.

> Please write l("blah bug", "blah bugs") as is standard in other gettext solutions.

To my knowledge, there is no purpose in writing 'l("blah bug", "blah bugs")' in the code.  Am I missing something?

No, we're not using GNU gettext.  It is Perl's object-oriented Maketext, designed to overcome well-known gettext limitations while still sharing many tools and file formats.
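
The quant() expansion described in this comment can be sketched in a few lines of Python. This is an illustration under assumptions, not the Bugzilla::L10N::en code: the `quant` helper below only mimics the singular/plural/zero choice, and the `EN_LEXICON` table with callable entries is a hypothetical stand-in for the `%quant(...)` entry shown above.

```python
def quant(num, singular, plural=None, zero=None):
    """Pick the wording for a count: zero form, singular, or plural."""
    if num == 0 and zero is not None:
        return zero
    if num == 1:
        return f"{num} {singular}"
    # Fall back to a naive English plural when none is supplied.
    return f"{num} {plural if plural is not None else singular + 's'}"


# Hypothetical English lexicon: the key is the exact string used in the
# template, the value builds the final wording for a given count.
EN_LEXICON = {
    "[_1] bug(s) found.": lambda n: quant(n, "bug", "bugs", "No bugz") + " found",
}


def l(key, *args):
    """Exact-match lookup; unknown keys are returned unchanged."""
    entry = EN_LEXICON.get(key)
    return entry(*args) if entry is not None else key


for count in (0, 1, 5):
    print(l("[_1] bug(s) found.", count))
```

The template keeps a single static key, and the per-language entry decides how many variants it needs.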
(In reply to comment #23)
>   No, as I said, no Locale::Maketext::Fuzzy.

Locale::Maketext::Fuzzy was discussed (not implemented) to address another problem:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/webtools/bugzilla/template/en/default/global/user-error.html.tmpl&rev=1.267&root=/cvsroot#1208

or 

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/webtools/bugzilla/template/en/default/global/user-error.html.tmpl&rev=1.267&root=/cvsroot#111

This is a real translators' nightmare and, believe me, worth fixing with some modern l10n technology.  If we don't want to improve this -- just disband the l10n team :-)))
(In reply to comment #24)
> If you want to fix a typo, fix it in .po, not in code.

How do you fix a typo in the English .po file if it doesn't exist?? You have no other choice than to fix it in the template itself if you use the _AUTO feature.
Let's see if this makes templates more readable.

{{ any text }} may be used instead of [% l("any text") %]

Extra whitespace is stripped, too.
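
For readers who want to try the idea outside Bugzilla, the rewrite described in this comment can be approximated with a single regular-expression substitution. The Python sketch below is an assumption-laden illustration, not the attached patch: `expand_double_braces` is a hypothetical name, and the sketch does not escape double quotes inside the braces, so such text would produce a broken directive.

```python
import re

# Non-greedy capture so that several {{ ... }} on one line stay separate.
# Without re.DOTALL, '.' does not cross line breaks.
_DOUBLE_BRACES = re.compile(r"\{\{\s*(.*?)\s*\}\}")


def expand_double_braces(template_text):
    """Rewrite {{ text }} into [% l("text") %], trimming outer whitespace."""
    return _DOUBLE_BRACES.sub(
        lambda m: '[% l("' + m.group(1) + '") %]', template_text
    )


print(expand_double_braces("<p>{{  Log in  }}</p>"))
print(expand_double_braces("{{ Save }} or {{ Cancel }}"))
```

Text outside the braces is left untouched, so ordinary Template Toolkit directives pass through unchanged.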
Attachment #353111 - Attachment is obsolete: true
Attached patch Test case v01 (obsolete)
Some 'Russian' messages intentionally left English.
Attachment #353112 - Attachment is obsolete: true
(In reply to comment #28)
> How do you fix a typo in the english .po file if this one doesn't exist??

en.po should be there.  At minimum, it will contain only the l() calls with parameters.

Other msgids can be added at will, for example to correct a spelling error.
Comment on attachment 353450
Double braces added

>+++ Bugzilla/Template/Parser.pm	2008-12-17 21:56:07.296875000 +0500

>+    # syntactic sugar for [% l("string") %]
>+    # FIXME: is '.*?' a valid syntax for all supported Perl versions??
>+    $text =~ s/\{\{\s*(.*?)\s*\}\}/[% l("$1") %]/g;

Err wait, this means external tools from TT won't be able to correctly parse and compile the templates?!
(In reply to comment #32)
> Err wait, this means external tools from TT

  Like what? We don't use anything. They already couldn't produce correct templates anyhow--all of our filters, variables, and utf8 code is in Bugzilla::Template.
(In reply to comment #33)
>   Like what? We don't use anything.

Oh, you are right. I had in mind we were using ttree to compile templates.
Comment on attachment 353451
Test case v01

Sorry, wrong patch; will fix tonight.
Attachment #353451 - Attachment is obsolete: true
Test case v01 attempt 2

Some 'Russian' messages intentionally left English.
Blocks: 150049
Too late for 3.4.
Target Milestone: Bugzilla 3.4 → Bugzilla 4.0
We are going to branch for Bugzilla 4.4 next week and this bug is either too invasive to be accepted for 4.4 at this point or shows no recent activity. The target milestone is reset and will be set again *only* when a patch is attached and approved.

I ask the assignee to reassign the bug to the default assignee if you don't plan to work on this bug in the near future, to make it clearer which bugs should be fixed by someone else.
Target Milestone: Bugzilla 4.4 → ---