Easier translation of web parts

RESOLVED FIXED

Status

Mozilla Localizations
Infrastructure
9 years ago
8 years ago

People

(Reporter: Dwayne Bailey, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

9 years ago
This bug is raised following an email thread that Friedel Wolff started; the posts are pasted here so that people have the full picture.

Summary: The web parts are hard for translators to translate, even for those who can program.  Let's try to identify some way to reduce this difficulty.

---------------------------
Hallo Pascal, Seth, Wil

We are currently planning for a new effort in the next few months to
help a few new African languages get started with Firefox localisation.
You might have seen the activity in the Swahili team that we are
currently assisting. There are obviously lots of things to address, but
I'm writing specifically about the web pages that need translation.

I just provided the Afrikaans translation to bug 482781. The original
English file is here:
http://viewvc.svn.mozilla.org/vc/projects/mozilla.com/trunk/fr/firefox/all-beta.html?revision=23169&content-type=text%2Fplain

This is way beyond the technical ability of most of the people we will
be working with. I have chatted to Pascal about this before, and he assured
me that all professional translators can translate in PHP, HTML and
JavaScript. The people we will be working with can definitely not do it,
and even if they can, it is a waste of their time, and error prone. It
is interesting to note that even the people already in the Mozilla
ecosystem made some mistakes in that bug. (I guess it is very possible
that my own translation contains mistakes as well.)

Additionally, it makes it very hard to do any type of meaningful review
in an automated or semi-automated way as we are able to do with all our
translations. I don't know if I'm using the same terminology as I did
with previous web page translations. Because we can't use translation
tools to do this, people are likely to not even use editors with spell
checking functionality - definitely not the behaviour we want to
encourage, I believe.

So I am asking: is there a way that we can entirely re-engineer this
process so that it is _radically_ simpler? (In other words, not just
streamlined.) For me the goal we should be working towards is a complete
divorce of translation from PHP and JavaScript, with the absolute minimal amount of
knowledge of HTML needed just to handle inline elements like hyperlinks.
Can this be achieved? It is obviously easily doable on some level, as we
know other places are using solutions that allow translators to do
things without knowledge of programming languages.

I have ideas on how I would like to do it, based on what I'm familiar
with, but I'm curious to hear what we are able to do. I am a big
supporter of gettext, and I believe that caching should be able to
alleviate all performance concerns that might exist. (I have heard this
to be a concern about gettext before.) You guys obviously know better
about internal requirements at mozilla.com.

I'm curious to hear what you think is possible.

Keep well
Friedel

--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/monolingual-translation-formats-considered-harmful
(Reporter)

Comment 1

9 years ago
Hi All,

Friedel has shown me these pages before but I think this is the first
time I've looked hard at it.  Even when opened in a text editor with
highlighting I found it hard to work out what to translate.

I have another suggestion for an approach to simplify this.  I realise
that having the page working as is and readable is probably quite
important.

My suggestion is that we put all the localisations at the top of the
page as PHP variables and then use those variables in the actual page
content. That way all the items that need localisation are in one place.
The added benefit is that the Translate Toolkit already has a php2po
tool that could extract those items that need localisation and pump them
back. Clean and simple yet still workable for whoever is testing the
page.

Just an idea that I'll leave with you guys.  Should this be discussed in
a bug?
(Reporter)

Comment 2

9 years ago
To elaborate on the ideas in comment 1.  I'm not a PHP developer so I don't know about scope of variables etc.  But here is a snippet to convey the general feeling of what I'm proposing; it's pulled from the link Friedel gave, but simplified.

<?php
// All translatable strings live in one array at the top of the file.
$l = array();
$l['page_title'] = 'Firefox web browser | Help us test the latest beta';
$l['main-feature-heading'] = 'Help test the future of Firefox!';
?>

<?php
    // The $body_* variables are for compatibility with pre-existing css
    $page_title = $l['page_title'];
?>

<div id="main-feature">
        <h2><?php echo $l['main-feature-heading']; ?></h2>
</div>


-----------------

As I said I don't code PHP so I'm not sure what happens to everything here.
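
For illustration only, here is a rough Python sketch of what pulling those strings out for translators could look like. It is not the Translate Toolkit's php2po; extract_strings and to_po are hypothetical helpers, and the sketch assumes the strings are written as single-quoted $l['key'] = 'value'; assignments with no escaped quotes inside them.

```python
import re

# Matches lines such as: $l['page_title'] = 'Some text';
# Assumption: keys and values are single-quoted and contain no escaped quotes.
ASSIGNMENT = re.compile(r"""^\s*\$l\['([^']+)'\]\s*=\s*'([^']*)'\s*;""", re.MULTILINE)

PHP_SOURCE = """<?php
$l['page_title'] = 'Firefox web browser | Help us test the latest beta';
$l['main-feature-heading'] = 'Help test the future of Firefox!';
?>
<h2><?php echo $l['main-feature-heading']; ?></h2>
"""


def extract_strings(php_source):
    """Return (key, English text) pairs in the order they appear."""
    return ASSIGNMENT.findall(php_source)


def to_po(pairs):
    """Render the pairs as PO entries with empty translations."""
    entries = []
    for key, text in pairs:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        # The key is kept as a reference comment so it can be mapped back later.
        entries.append(f'#: {key}\nmsgid "{escaped}"\nmsgstr ""\n')
    return "\n".join(entries)
```

Running to_po(extract_strings(PHP_SOURCE)) yields one msgid/msgstr pair per assignment. The line that only echoes a string is ignored, because the pattern matches assignments at the start of a line.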

Comment 3

9 years ago
You're ignoring that our web parts don't consist of just translation.

Btw, linking to empty blog posts is just not constructive.
(Reporter)

Comment 4

9 years ago
(In reply to comment #3)
> You're ignoring that our web parts don't consist of just translation.

I'm having to guess what you mean here.  Do you mean that some teams need to customise these pages or fix things?

If that is so it still does not change the case for the majority of teams who don't need to customise or fix anything, they just need to translate.

I think my suggestion in comment 2:
1) Allows both translation and customisation to be achieved together
2) Keeps the page and the translations together for easier debugging
3) Most importantly, makes translation easy since all translatable text is at the top of the file (so you know what to translate) and follows PHP escaping (so there is only one type of escaping).

I think it's a workable solution.  What do you think?

> Btw, linking to empty blog posts is just not constructive.

??

Comment 5

9 years ago
(In reply to comment #4)
> If that is so it still does not change the case for the majority of teams who
> don't need to customise or fix anything, they just need to translate.
> 
> I think my suggestion in comment 2:
> 1) Allows both translation and customisation to be achieved together
> 2) Keeps the page and the translations together for easier debugging
> 3) Most importantly, makes translation easy since all translatable text is at
> the top of the file (so you know what to translate) and follows PHP escaping
> (so there is only one type of escaping).

I would like to see someone from Mozilla respond directly to Dwayne's suggestion.  If we are going to make any progress, we should stay on point in this bug and respond to points.  

If there are links to other bugs where we have previously discussed this, please link them.  If not, for documentation's sake, can we respond in this bug?

> > Btw, linking to empty blog posts is just not constructive.
> 
> ??

Please stay on point in this bug.  All I want to see is a reason why we can or cannot make progress.  If we are going to cut and paste emails, please make sure all links are relevant to the bug.  

In comment 1, I followed a link that looked like a default signature to Friedel's email, giving brief thoughts about monolingual file formats.

Is this bug about simplifying Mozilla's web-l10n or monolingual translation formats or both?  Please clarify because the link there is confusing to me.
(Reporter)

Comment 6

9 years ago
(In reply to comment #5)
> In comment 1, I followed a link that looked like a default signature to
> Friedel's email, giving brief thoughts about monoligual file formats.  
> 
> Is this bug about simplifying Mozilla's web-l10n or monolingual translation
> formats or both?  Please clarify because the link there is confusing to me.

OK, now I understand. My fault, I just did a Ctrl-A, Ctrl-V when pasting Friedel's mail into the bug.

I think Friedel's email was clear on the purpose.  But if it helps clarify things: this is about web-l10n and making its localisation easier; it's not about monolingual files.

Comment 7

9 years ago
Ah, OK.

I have raised the question in our internal discussions before, and yes, the current way we do our webpages makes us hit the limits in how many locales we can take and release.

We need to come up with something better, definitely.

I don't think that multilingual files are going to be an option, as we need to support the addition and removal of content. 

I'm not sure just yet on how much we need the ability to tweak the living shit out of our HTML, especially for particularly verbose languages or RTL languages.

Nor am I sure whether the same answer holds true for our start page snippets, for the majority of our web pages, and for getting started (which is really the only page with localized content).

My personal feeling is that restricting ourselves to some kind of wiki markup, for which we can create good UI/UE for the common parts, would be the way to go, but that's just my personal "I can mediawiki fast" way of thinking.

We're going to have an intern over the summer who did some UI on placeables, Jeremy. I'm looking forward to discussing with the group, including him, how we can present HTML such that it's hard to get wrong, and still easy to get right. And, for Pascal and Stas, easy to confirm it's right.

Comment 8

9 years ago
> Summary: The web parts are hard for translators to translate, even for those who can program.  Let's try to identify some way to reduce this difficulty.

We have been working with lots of localizers on web parts over the years, many of them without any HTML or programming knowledge. I can recall only four people complaining about us using HTML: both of you in the Afrikaans team, Yannig, our Occitan localizer, and a localizer from an Indian locale who wanted to use OmegaT and not a text editor. That doesn't mean everybody is in love with HTML, but it is simple enough to have allowed us to grow fast on the web in many locales. Every time I had to explain the basics of HTML face to face to new localizers, it took no more than 15 minutes for them to be comfortable with it, and they were neither programmers nor geniuses. It does take some time, and we are all short of it, it's true.

In 3 years we have grown from a monolingual mozilla.com to good quality web parts for 62 locales, and we are very far from having an army of localizers. I do not know of any open source web project that has reached this level of professional localized web presence, so the above claim does not seem to correspond to what we have been experiencing in the last three years.

What I have heard people complain about has always been the *length* of pages, the little time given to translate them, the marketing jargon used, or how difficult the productization of Firefox Central was, not the formatting of documents. There were some complaints about strings in JavaScript, specifically in the Tips page, and I do understand these complaints, because the page was not entirely well thought out for localization for lack of time during release last year (at least it was localizable, though). But that doesn't make it a rule, since we very marginally have strings in JavaScript and almost none in PHP for our general work.

> I just provided the Afrikaans translation to bug 482781. The original English file is here:
> http://viewvc.svn.mozilla.org/vc/projects/mozilla.com/trunk/fr/firefox/all-beta.html?revision=23169&content-type=text%2Fplain
> This is way beyond the technical ability of most of the people we will be working with.

In a short time we have already had 33 locales done without a big push on it. On more than one occasion, we had a page translated into 50 languages in 2 or 3 weeks, which is crucial when we are in a rush for a major update, for example. I don't think that externalizing strings would improve that response time, at least not in the short term, since our localizers would have to give up some of the expertise they have acquired over their years in our process.

What I find worrying in the above sentence is that you don't seem to envision these people working with Mozilla directly, since you would apparently act as a proxy between them and us. I really don't know who these people are, but why not ask them to contact me directly for help? If these people are part of emerging Mozilla communities, shouldn't they get involved in our processes first before rejecting them?

A few points:

1/ We already do have separation of strings from HTML for some easy translation web parts such as the google snippets or online surveys. We also store strings used in our PHP scripts in a single file, main.lang; not gettext, OK, but potentially convertible to/from .po. I remember that Wil said Verbatim would have support for our .lang format.
2/ The number of compulsory pages to translate is very, very limited, and they usually contain no strings in JavaScript and very limited PHP, usually with only one PHP string to translate, containing the page title. I will agree with you that a few new localizers tend to forget this title and leave it in English.
3/ We have lots of localizers in our community who are happy with the current process and are key to the success of Mozilla, so any alternative solution would have to take into account the well-being of our current Mozilla community first.
4/ Your proposed changes (all strings in PHP variables) would mean that the website would have to be rewritten in English too, or that a parallel version would have to be maintained as a reference for localization. I can foresee a maintenance burden there and no benefit for people working on the en-US version.
5/ We have more and more localized content (not just central as Pike mentioned): our landing pages have space promoting community portals, and our firstrun and whatsnew pages more and more frequently have extra texts for surveys targeting specific locales, marketing campaigns or information relevant to a single locale.
6/ The biggest and longest work on web parts for localizers was productization for central.
7/ The QA process would have to be revisited.
8/ mozillamessaging would have to follow mozilla.com's new localization process because they share the same localization community.
9/ As of 3.5, we will no longer ask localizers to translate long pages such as Tips or Organic, which represented the bulk of localization for 3.0 (like 80% of content) but account for less than 2% of visits. With one year of data, we now know that 4 pages (whatsnew/firstrun/landing/central) amount to more than 95% of page views for locales. Out of these 4 pages, only one has significant content length: Central, with 400 words. So we are not updating it for current locales, and we will work on a much smaller and easier central page in the future. FYI, a new locale currently needs to translate about 1000 words for the web parts (the tips page alone was 1400 words...). This reduction of scope and better focus on visitors will allow us to scale to more locales much more easily, while allowing our most active localizers to be part of marketing campaigns involving the creation of mini-sites (some of them probably with gettext).
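
Point 1 says main.lang is potentially convertible to and from .po. A rough Python sketch of one direction follows, under a loudly stated assumption about the layout: a line starting with ";" holds the English source, the next non-blank line holds the translation, and lines starting with "#" are comments. This thread does not describe the format, so treat that layout, the parse_lang / po_quote / lang_to_po names and the sample strings as hypothetical.

```python
# Placeholder content, not real main.lang data.
SAMPLE = """# header comment
;Download Firefox
[xx] Download Firefox

;Free Download
"""


def parse_lang(text):
    """Return (source, translation) pairs from the assumed .lang layout."""
    pairs = []
    source = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(";"):
            if source is not None:
                pairs.append((source, ""))  # previous source had no translation
            source = line[1:].strip()
        elif source is not None:
            pairs.append((source, line))
            source = None
    if source is not None:
        pairs.append((source, ""))  # trailing source without a translation
    return pairs


def po_quote(text):
    """Escape backslashes and double quotes for a PO string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def lang_to_po(text):
    """Render the parsed pairs as PO entries."""
    return "\n".join(
        f"msgid {po_quote(src)}\nmsgstr {po_quote(dst)}\n"
        for src, dst in parse_lang(text)
    )
```

An untranslated source string comes out with an empty msgstr, which is how a PO editor would normally expect to see it.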

I understand the problem that you are facing within translate.org, since your mission is to localize software in languages spoken by populations where it is more difficult to find somewhat computer-savvy contributors. We definitely want to make it easier for as many locales as possible to have quality official web parts; that's why we build tools, hire people specialized in Web QA, contract graphic designers to make it look good...

But please also understand that the process in place, while not perfect, has proven to be efficient and so far scalable, and is a good compromise between the needs of our current localizers, marketing, developers and IT.

On my side, I can definitely better document our processes for new localizers who are less experienced than our current community. There is also room for improvement in our HTML to make the occasional JavaScript and PHP calls less intrusive, and I am committed to doing my best to help the less technical contributors.

(BTW, I also understand what benefits gettext can bring, especially since I am a localizer for a web project outside of Mozilla using gettext and in their context I think it was the right tool, we are just in a very different context.)

Comment 9

8 years ago
It's been a year since the last comment in this bug, so I'll take the liberty of summarizing and closing.

Doing l10n should always become easier, and there are multiple angles in which we're currently working. There's pontoon in the works to localize web pages inline, which I guess would fit the initial comment. Best described at http://diary.braniecki.net/2010/04/19/pontoon-introduction/, I guess. We've got verbatim up for quite a few web properties, too. The webdashboard got better.

While the majority of our in-product web pages are still plain HTML, I expect those to migrate to new foundations as soon as they're stable, offer the right features (read: not just translation), and are performant enough for the main mozilla.com site. I don't see keeping this bug open helping us.

If there are pages with particularly hilarious PHP code still in them, or other pieces, I'd suggest filing bugs on those individual pages in Websites, mozilla.com component.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED