Closed Bug 671725 Opened 13 years ago Closed 4 years ago

Integrate "translation memory" features for wiki localization

Categories

(developer.mozilla.org Graveyard :: Localization, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: openjck, Unassigned)

References

()

Details

Store mappings of English phrases to translated phrases for use in UI elements, table headings, notification boxes, and the like, with support for looking them up from in-page script code.

Wiki user "Pike" (https://wiki.mozilla.org/User:AxelHecht) had the following comment on this feature:

"This is generally known as 'Translation Memory' in Localization land. We should be specific to which extent we support this, but localizers would sure be happy to get full support here. If we do, we should probably rename this, if we don't, be specific on the difference."
Whiteboard: u= c= p= → u=localizer c=localization p=
Priority: -- → P4
The support needed from Mozilla starts with (and can be pretty much limited to) a complete and up-to-date en-US tree in the l10n repository or repositories, identical in structure to the l10n target translations.


Regards

smo
John, do you mean something like this:

http://www.frenchmozilla.fr/transvision/


The TMX (translation memory) files are available in a standard format in the directories at:

http://www.frenchmozilla.fr/transvision/TMX/
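
For readers unfamiliar with TMX, here is a minimal sketch of reading source/target pairs out of a TMX document with the Python standard library. The sample document and the helper name load_tmx_pairs are made up for illustration; real Transvision exports will differ in content and may carry extra attributes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample document, not taken from the Transvision exports.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence"
          adminlang="en-US" creationtool="example"
          creationtoolversion="1.0" o-tmf="plain text"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Bookmarks</seg></tuv>
      <tuv xml:lang="fr"><seg>Marque-pages</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-US"><seg>History</seg></tuv>
      <tuv xml:lang="fr"><seg>Historique</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def load_tmx_pairs(tmx_text, source_lang, target_lang):
    """Return a dict mapping source segments to target segments."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    pairs = {}
    for tu in root.iter("tu"):
        # Collect the segment text of each language variant in this unit.
        segments = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is not None:
                segments[tuv.get(XML_LANG)] = "".join(seg.itertext())
        # Keep only units that have both languages.
        if source_lang in segments and target_lang in segments:
            pairs[segments[source_lang]] = segments[target_lang]
    return pairs


pairs = load_tmx_pairs(SAMPLE_TMX, "en-US", "fr")
```

Using itertext() flattens any inline markup inside a segment, which is a simplification; a real importer would need to decide how to treat embedded tags.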
John, not sure if I was misreading the feature page?

I thought this is about using translation memory to aid the translation of documents on MDN, and possibly sumo if they're sharing that part of the code base.

John's initial request may not go as far, but may be more like the strings we have on verbatim for sumo, https://localize.mozilla.org/projects/sumo/?

Either way, this does not have a lot to do with the Firefox strings, which comments 1 and 2 refer to.
philippe and Alex: Thanks for the feedback. Jay wrote the original feature description, so I will be sure to ping him to ask if this is what he had in mind.
(In reply to comment #2)
> John do you mean something like this : 
> 
> http://www.frenchmozilla.fr/transvision/ 

proof of concept: using TMX material from this URL and OmegaT we could pretranslate 80% of fennec before the actual localization kicked off.

There are a few things that should be corrected though, either in the TMX files themselves or in the translation, such as the handling of HTML tags.

Otherwise big help, merci beaucoup!

smo
I'd like the localization community to weigh in mostly on this, since they'd be using it. Basically my idea is to have substitution strings, where the UI and site content itself can inject localized text by specifying, essentially, a variable name that would get mapped based on the user's locale to the appropriately translated text string.

Having a UI for localizers to translate strings would of course be a huge win.
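
A minimal sketch of the substitution-string idea, assuming a simple in-memory table keyed by string name and locale. The table contents, the key names, and the localize helper are hypothetical and only illustrate the lookup with a fallback to en-US.

```python
# Hypothetical string table: key -> locale -> translated text.
STRINGS = {
    "obsolete_notice": {
        "en-US": "This feature is obsolete.",
        "fr": "Cette fonctionnalité est obsolète.",
    },
    "see_also": {
        "en-US": "See also",
    },
}


def localize(key, locale, default_locale="en-US"):
    """Return the text for key in locale, falling back step by step."""
    translations = STRINGS.get(key, {})
    # 1. Exact locale match, e.g. "fr".
    if locale in translations:
        return translations[locale]
    # 2. Base language match, e.g. "fr" for "fr-CA".
    base = locale.split("-")[0]
    if base in translations:
        return translations[base]
    # 3. Default locale, or the key itself if nothing is known.
    return translations.get(default_locale, key)
```

Returning the key itself for unknown strings keeps a page renderable while making the missing entry easy to spot.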
So this feature is to utilize our l10n translation memory to support wiki content localization? Is there a reason to use this rather than tapping Google Translate or Microsoft Translator services to try to pre-translate wiki docs? Is our translation memory more accurate for technical text?
(In reply to comment #6)
> I'd like the localization community to weigh in mostly on this, since they'd
> be using it. Basically my idea is to have substitution strings, where the UI
> and site content itself can inject localized text by specifying,
> essentially, a variable name that would get mapped based on the user's
> locale to the appropriately translated text string.
> 
> Having a UI for localizers to translate strings would of course be a huge
> win.

AFAIK this is what the Pontoon project is about:
http://browserland.com/firefox-latest-news/2011/05/31/matjaz-horvat-moving-forward-with-pontoon/
(in reply to comment 7) 
There's no translation memory until somebody creates it, either by aligning the existing source and target language translations or by building it sentence by sentence. 
Re "our translation memory": if by "our TM" the materials at http://www.frenchmozilla.fr/transvision/ are meant, the TMX files (TMs for all the languages present in Mozilla products) are as accurate (i.e. as good or bad) as the translation done so far on Mozilla products, because TMX files are built from the existing material.

Re "is our translation memory more accurate for technical text?": I don't quite understand. It depends on which "translation memory" is meant. First, there are awesome TMs and there are awful TMs; their quality, i.e. accuracy, depends on their authors. Second, the /transvision/ memories are surely not very useful for, say, SUMO article translations or wiki page localization, just as an engineering glossary is useless when medical terminology is asked for.

Regards

smo
Blocks: 756266
No longer blocks: 756266
Blocks: 756266
Version: Kuma → unspecified
Component: Website → Landing pages
No longer blocks: 756266
Component: Landing pages → Localization
Summary: Phrase translation service → Build "translation memory" features
Whiteboard: u=localizer c=localization p=
Whiteboard: [localization]
Does it make sense for MDN to address this? I would think it would make more sense for the translation tool (Verbatim, Pontoon, etc.) to take care of this.
Whiteboard: [localization] → [localization][triaged]
Flags: needinfo?
Whiteboard: [localization][triaged] → [localization][triaged][feature]
We should look at this (need or not) as part of our l10n revamp project.

Btw, this may be an extension of http://transvision.mozfr.org/
Flags: needinfo?
The translation of actual documentation happens inside MDN, and l10n technologies to support that should be in scope for the MDN editing environment.

We actually have a variety of options:

Termbases: A selected set of words and phrases with explanations and reference translations. This is mostly manually curated and ensures that localizers understand the terminology used on MDN and translate it consistently.

Translation Memory: This stores existing translations and suggests them when similar untranslated text comes up. It would help greatly for structured documentation like reference material, where we're trying to use consistent language in English and should do the same in our localized content. The challenge here is to find the alignment between content in different languages.

Machine Translation: Something like Google Translate. In the future, we might employ a consistent interface to open source tools, which the Intellego project is trying to create [1]. The challenge is getting our wiki markup understood, and finding an engine that creates good content for technical documentation.

[1] https://wiki.mozilla.org/Intellego
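
To make the translation memory option concrete, here is a minimal sketch of fuzzy lookup using difflib from the Python standard library. The memory contents, the suggest helper, and the 0.7 threshold are assumptions for illustration; production TM tools use their own matching and scoring.

```python
from difflib import SequenceMatcher

# Hypothetical memory of already-translated segments (English -> French).
MEMORY = {
    "Returns the number of elements in the array.":
        "Renvoie le nombre d'éléments du tableau.",
    "Returns a new array.":
        "Renvoie un nouveau tableau.",
}


def suggest(segment, memory, threshold=0.7):
    """Return (score, source, target) tuples for similar stored segments.

    Results are sorted best match first; anything scoring below the
    threshold is dropped.
    """
    scored = []
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and lower for weaker matches.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            scored.append((score, source, target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


results = suggest("Returns the number of elements in the list.", MEMORY)
```

A near match like this one would be shown to the localizer as a suggestion to edit, not applied automatically.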
For a TM that just does TM and nothing else you may want to look at amaGama https://github.com/translate/amagama. We use that for Pootle, and Verbatim should be pulling TM matches from there also. There is no reason why other TM data can't be in there. We populate it with all the translations from FOSS desktop tools, and it can be loaded with any resources such as PO, TMX, or monolingual data after conversion.
Given all of the user comments and discussion, this looks like it's out of scope for the MDN platform.
Severity: normal → enhancement
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
If this seems to be out of scope for the MDN platform, then I fear we haven't done a good enough job describing how the MDN platform can benefit from incorporating translation memory into the translation portion of the platform.

Translation memory allows translators to leverage previously translated content across a variety of different projects within a translation environment. Essentially, if a translator, or a team of translators, are translating content on HTML within MDN, using a translation memory feature, they can be provided with matching translation segments that they have already translated in other parts of MDN on the same subject, effectively ensuring the consistency of the translation across the team/content and allowing them to complete the translation in a much shorter time frame. Translation memory thus can lead to more accurate translations, shorter turnaround time, and a reduced workload for a community of volunteer translators.

I would be happy to have a call with anyone on the MDN team who would like additional explanation on why this would be useful within the MDN platform. It should be noted that this feature is a ubiquitous standard in all translation tools today, especially those which focus on translating documentation.
While I definitely see the value for this (and would like to have it, in fact), finding time to build it, and to do it right, is a big deal. I wonder if we could try to make a volunteer opportunity of it.
(In reply to Eric Shepherd [:sheppy] from comment #15)
> While I definitely see the value for this (and would like to have it, in
> fact), finding time to build it, and to do it right, is a big deal. I wonder
> if we could try to make a volunteer opportunity of it.

I could refer you to several members of the Indian community who have expressed previous interest in developing a translation memory feature, if that is indeed the path that the MDN team would like to take.
I agree with :mars's comment - that the TM platform shouldn't exist in MDN code.

I also agree with :teoli and :openjck - TM should be part of a robust l10n tool like Pootle, Pontoon, etc.

Is there a common l10n platform with TM features? If so, a better bug would be "Integrate TM from ____ into MDN translation interface."
(In reply to Luke Crouch [:groovecoder] from comment #17)
> I agree with :mars comment - that the TM platform shouldn't exist in MDN
> code.
> 
> I also agree with :teoli and :openjck - TM should be part of a robust l10n
> tool like Pootle, Pontoon, etc.
> 
> Is there a common l10n platform with TM features? If so, a better bug would
> be "Integrate TM from ____ into MDN translation interface."

What you would be looking for is a server-side TM solution, which we don't have. TM is already a part of Pontoon, and Pontoon would be very flexible to the needs of MDN. Perhaps it would make sense to replace the MDN translation interface with Pontoon? Or if Pontoon's currently not yet ready for MDN, helping Matjaz make it ready for MDN to implement?
Pontoon only publishes translations to SVN, Transifex, or files. [1] MDN article content is stored in a MySQL database. So, to use Pontoon on MDN we would have to either:

1. Switch MDN article storage to SVN, Transifex, or files
2. Add a publish-to-MDN feature to Pontoon (probably using the PUT API[2])

#1 is practically impossible

Matjaz, can you give any estimate on #2?

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Localization/Localizing_with_Pontoon#Publishing_your_localization
[2] https://developer.mozilla.org/en-US/docs/User:lmorchard/PUT-API
Flags: needinfo?(m)
We also have bug 845961 open suggesting that we replace Verbatim with Pontoon.
Since this request is for TM and not actually for a CAT tool replacement, I think a better bet is to look at amaGama for translation memory. Pootle and Verbatim have been using TM from amaGama for ages. An amaGama instance can store any good paired TM resources, and an API query can give you the results really quickly.
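
A rough sketch of what such a lookup could look like from the MDN side. The host name, the URL layout (/tmserver/<source>/<target>/unit/<text>), and the response fields ("source", "target", "quality") are assumptions for illustration and should be checked against the amaGama documentation; the code makes no network request and only works on a made-up sample response.

```python
import json
from urllib.parse import quote

# Assumed base URL of an amaGama instance; replace with a real deployment.
BASE_URL = "https://amagama.example.org/tmserver"

# Made-up response body in the assumed shape.
SAMPLE_RESPONSE = (
    '[{"source": "File", "target": "Fichier", "quality": 100.0},'
    ' {"source": "Files", "target": "Fichiers", "quality": 80.0}]'
)


def build_query_url(source_lang, target_lang, text, base_url=BASE_URL):
    """Build a lookup URL using the assumed path layout."""
    return "{}/{}/{}/unit/{}".format(
        base_url, source_lang, target_lang, quote(text, safe="")
    )


def best_match(response_body):
    """Return the target text of the highest-quality match, or None."""
    matches = json.loads(response_body)
    if not matches:
        return None
    return max(matches, key=lambda match: match["quality"])["target"]


url = build_query_url("en", "fr", "Open file")
suggestion = best_match(SAMPLE_RESPONSE)
```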
amaGama is very possible with MDN. So, I'm re-opening this and updating the summary to reflect the change of scope from "build" to "integrate".
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: Build "translation memory" features → Integrate "translation memory" features for wiki localization
If you evaluate the amaGama API, please also have a look at Transvision, which was mentioned by several people in this bug, is maintained by the Mozilla community, and is based on indexing all of our code repositories for all locales:
http://transvision.mozfr.org/

https://github.com/mozfr/transvision/wiki/JSON-API

Regards

Pascal
(In reply to Luke Crouch [:groovecoder] from comment #19)
> Pontoon only publishes translations to SVN, Transifex, or files. [1] MDN
> article content is stored in a MySQL database. So, to use Pontoon on MDN we
> would have to either:
> 
> 1. Switch MDN article storage to SVN, Transifex, or files
> 2. Add a publish-to-MDN feature to Pontoon (probably using the PUT API[2])
> 
> #1 is practically impossible
> 
> Matjaz, can you give any estimate on #2?
> 
> [1]
> https://developer.mozilla.org/en-US/docs/Mozilla/Localization/
> Localizing_with_Pontoon#Publishing_your_localization
> [2] https://developer.mozilla.org/en-US/docs/User:lmorchard/PUT-API

I think implementing Transvision or amaGama on MDN is the most straightforward way of resolving this bug. But it will require some effort on both sides: displaying results on MDN and feeding translation memory with MDN strings.

I don't know, however, how much sense it makes to turn MDN into a localization platform, because tomorrow we might need more features like translation history, comparing translations from other locales, machine translation, etc. So switching to Pontoon (or Pootle or any other l10n tool) sounds like the right long-term solution.

(Note that Pontoon already has support for Transvision and amaGama, but neither of them has been fed with MDN strings, so this part of the job remains to be done in any case.)

That being said, Pontoon (or Pootle or any other similar l10n tool I can think of) only works with strings that are stored in repositories (SVN, HG, Git), Transifex, or files, as Luke already mentioned. Adding support for publishing to the MDN DB would require some time, but is doable in Q3. Is there a GET API, too? We also need to import strings from somewhere.
Flags: needinfo?(m)
From an implementation standpoint, of course, most of the time these strings will be accessed from KumaScript macros, so we can have a KumaScript method that returns a translation for us. That will be excellent.

Having an interface for the actual editing/importing of strings, including adding new ones when creating or adding to new macros, will be critical, as :mathjazz said.
This is all great input for the feature - thanks everyone.

Any way we do it, it will be a significant project. Now we have a much better understanding of how much time it will take and how many dependencies there may be. We'll look at this again during our next prioritization meetup. (Should be by Q4)
Priority: P4 → --
Whiteboard: [localization][triaged][feature]
MDN Web Docs' bug reporting has now moved to GitHub. From now on, please file content bugs at https://github.com/mdn/sprints/issues/ and platform bugs at https://github.com/mdn/kuma/issues/.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 4 years ago
Resolution: --- → WONTFIX
Product: developer.mozilla.org → developer.mozilla.org Graveyard