Last Comment Bug 399014 - we need an l10n-merge tool
: we need an l10n-merge tool
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Localization (show other bugs)
: Trunk
: All All
: -- normal (vote)
: ---
Assigned To: Zibi Braniecki [:gandalf][:zibi]
:
Mentors:
Depends on:
Blocks: 398954 403215
  Show dependency treegraph
 
Reported: 2007-10-08 08:27 PDT by Axel Hecht [:Pike]
Modified: 2010-01-18 08:01 PST (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Pur first attempt; the latest source on the Google Doc from our website (6.42 KB, application/x-zip-compressed)
2007-10-13 15:05 PDT, Armen Zambrano [:armenzg] (EDT/UTC-4)
no flags Details
brief instructions on how to make run our little attempt to make an l10n-merge tool (2.16 KB, text/plain)
2007-10-21 22:17 PDT, Armen Zambrano [:armenzg] (EDT/UTC-4)
no flags Details
l10n-merge script package (42.88 KB, application/zip)
2007-10-31 07:02 PDT, dynamis (Tomoya ASAI)
no flags Details
patch to fix treatment of ignored entities (396 bytes, patch)
2007-11-09 08:59 PST, dynamis (Tomoya ASAI)
no flags Details | Diff | Splinter Review
fix a typo (3.17 KB, patch)
2007-11-09 09:09 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review
merge bug 391680, Paths and filter.py (10.91 KB, patch)
2007-11-09 09:13 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review
factor out common parts of CompareCollector, subclass for the rest (2.09 KB, patch)
2007-11-15 02:52 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review
merging the more standards-compliant regexps from trunk (1.81 KB, patch)
2007-11-19 14:35 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review
add back a bookmarksparser (4.55 KB, patch)
2007-11-19 14:57 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review
complete patch for l10n-merge (27.60 KB, patch)
2007-11-20 08:42 PST, Axel Hecht [:Pike]
no flags Details | Diff | Splinter Review

Description Axel Hecht [:Pike] 2007-10-08 08:27:22 PDT
We need a tool to merge l10n-impact changes in en-US into an existing localization, both for localization work, and for the build process.

If we can include such a tool into the build, we could generate builds for testing independently of the current status of a localization, and make that process much more agile.

Dynamis was having an intern working on this, can you report on the current status?
Comment 1 Armen Zambrano [:armenzg] (EDT/UTC-4) 2007-10-12 21:12:28 PDT
Hello all, my name's Armen and I'm working with Rueen Fiez and Vincent Lam on this tool; You can see our progress on the following website:
http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool

We have no knowledge of python, we don't know how are the l10n tools are used and we are strugglinh with a lot of assignments, please understand that we wrote it from scratch based on the examples that Axel has sent.

Said that this is our initial testing that we got after six hours "trying to code something" rather than keep on reading without knowing what to learn
http://docs.google.com/Doc?id=dnkkbhp_65p4whj5

During this week we will have to have something working for our open source course, so if you can give us some feedback, we will really appreciate.

Regards, Armen on behalf of our team

PS = Dynamis could you please post somewhere some of your code? or a blog? or somehting
PS2 = why is it called autol10n-MERGE the tool? why MERGE?
Comment 2 Armen Zambrano [:armenzg] (EDT/UTC-4) 2007-10-13 15:05:00 PDT
Created attachment 284789 [details]
Pur first attempt; the latest source on the Google Doc from our website
Comment 3 Armen Zambrano [:armenzg] (EDT/UTC-4) 2007-10-21 22:17:33 PDT
Created attachment 285695 [details]
brief instructions on how to make run our little attempt to make an l10n-merge tool

I have added some instructions on how to use the first attachment to this bug. This Tuesday, Oct 23rd, we are going to work more on our tool adding the functionality to go over subfolders and making more changes to all the DTD and properties files.
We would also like to add more flexibility with the regular expressions

Please any feedback will be much appreciated, we will be on #seneca and #l10n
Armen, Rueen and Vincent
Comment 4 Axel Hecht [:Pike] 2007-10-29 06:55:11 PDT
Hi Armen,

to me, the l10n-merge tool and the code you're working on are two separate tools. They hopefully would be able to share code on the side of writing changes, but that's about it.

Regarding the 0.2 version, I looked at the script on Google Docs, and I bet I found a bug. Take a look at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/locales/en-US/chrome/browser/setDesktopBackground.dtd&rev=1.1&mark=6, where 'color' shows up in the entity name. This is one of the reasons why I'd want your tool to work on top of the parsing arch and only postprocess the values, and not the complete files. Sadly my poking for the actual l10n-merge tool works didn't turn out too successful so far.
Comment 5 Armen Zambrano [:armenzg] (EDT/UTC-4) 2007-10-30 09:44:16 PDT
Here is our 0.2.2 attempt from last week and the Google Docs code wasn't updated:
http://matrix.senecac.on.ca/~azambran/mozilla/l10nmerge0.2.2.zip
I have added this line:
   return re.sub(r'(.*)([Cc])olor', r'\1\2olour', instring)

I don't know how is the right way to put code and discuss about it.

Axel, do you want us to file another bug for our project?


Comment 6 Chris Cooper [:coop] 2007-10-30 14:28:20 PDT
Pike: John and I were just discussing this, or at least I think we were. 

When you speak of the l10n-merge tool, are you looking for something that will create a diff of string resources between the previous build and the current build, and mark any omissions with easily searchable placeholder text for localizers to find? 

Just looking for a little more direction here, if you haven't heard anything more from dynamis.
Comment 7 dynamis (Tomoya ASAI) 2007-10-31 06:58:40 PDT
Hi all. Sorry coming late.

I will attach my l10n-merge script based on compare-locale.py.

What the script will do:
 - supporting dtd, properties and inc files
 - keep white space and comments
 - keep entity order
 - add new en-US entities at the end of file (and their comment)
 - comment out obsolate l10n entities (and their comment)
   # or remove them with -c option
 - copy new en-US file to l10n
 - remove obsolate file
   # or backup them with -b option (filename -> filename~)

workflow for each file is:
 1. split en-US/l10n file into the 'entity blocks' and make key table
 2. compare key en-US/l10n key table
 3. comment out or remove obsolate l10n 'entity blocks'
 4. add new en-US entity at the end of l10n file

entity block is:
 - First comment block of the file is considered as 'header' and don't touch it.
   # It is in many case license block and not comment for the first
   # entity. So, we don't consider it part of the first entity block.
 - Preceding white spaces and comments of each entity will be included
 - Following comment in the same line will also included
   # we sometimes add comment after the entity in the same line

example:
   en-US:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: 
 5: <!-- comments before entity2 -->
 6: <!ENTITY entity2 "value2"> <!-- comment in the same line -->
 7: <!-- comments before entity3 -->
 8: <!ENTITY entity3 "value3">

       header: line1
entity block1: line2~3
entity block2: line4~6
entity block3: line7~8

   if l10n is:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: <!-- comments before entity3 -->
 5: <!ENTITY entity3 "value3">
 6: <!ENTITY entity4 "value4">

   merged l10n will be:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: 
 5: <!-- comments before entity2 -->
 6: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) -->
 6: <!ENTITY entity2 "value2"> <!-- comment in the same line -->
 7: <!-- comments before entity3 -->
 8: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) 
 9: <!ENTITY entity3 "value3"> -->
Comment 8 dynamis (Tomoya ASAI) 2007-10-31 07:02:59 PDT
Created attachment 286829 [details]
l10n-merge script package

Our l10n-merge python script based on Axel's compare-locale script

This l10n-merge script may still have some bugs but as far as I tested working fine.
Comment 9 dynamis (Tomoya ASAI) 2007-10-31 07:09:19 PDT
oops, line 8 was just my typo :(
>  8: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) 
obsolate entities, not new entities.

This is just my typo here, not bug of the script.
# I wrote above example by hand.
Comment 10 Axel Hecht [:Pike] 2007-10-31 08:22:44 PDT
Thanks to Toshihiro Kura and dynamis for getting us this far.

I have been looking at the code already, and we'll need to do some merging and factorization. Paths.py and Parser.py need to be merged, and we should factor the common code between CompareLocales.py and L10nMerge.py as well as compare-locales and l10n-merge.py.

There are a few minor nits like spelling fixes, and the options should be a param, those are simple, though.

(In reply to comment #5)
> 
> I don't know how is the right way to put code and discuss about it.

I suggest that you take a look at L10nMerge.py. Yes, we're going to do some changes to it, but it should give you a really good idea on how to modify just the entity values.

> Axel, do you want us to file another bug for our project?

Yes, please.
Comment 11 dynamis (Tomoya ASAI) 2007-11-01 07:39:04 PDT
(In reply to comment #10)
> I have been looking at the code already, and we'll need to do some merging and
> factorization. Paths.py and Parser.py need to be merged, and we should factor
> the common code between CompareLocales.py and L10nMerge.py as well as
> compare-locales and l10n-merge.py.

We didn't have enough time and didn't care enough about factorization. I concentrated to make the script works strictly as I want.

I don't enough time now and if you merge common parts of l10n-merge and compare-locales, it's very nice.

To make the script works strictly, most important and complex part of the code is regular expression in Parser.py I think. Regular expression of compare-locale was not enough strict (in fact bug found) and I completely rewrote them.

Call me when you finished merging l10n-merge and compare-locales. I'll check if merged one still works as I expected. And if you will not share my regular expression for compare-locales, I will look latest compare-locales too and make patch for the minor bug (if it still exists).
Thanks.
Comment 12 dynamis (Tomoya ASAI) 2007-11-09 08:59:33 PST
Created attachment 288005 [details] [diff] [review]
patch to fix treatment of ignored entities

My peer found the bug of L10nMerge.py and this is the patch to fix it.

Without this patch, l10n-merge will comment out l10n only ignored entities.
For example browser.search.order.[3-9] in region.properties.
They are defined only in l10n files but should not be commented out as obsolate.
Comment 13 Axel Hecht [:Pike] 2007-11-09 09:09:47 PST
Created attachment 288008 [details] [diff] [review]
fix a typo

I have a bunch of patches already, I should shovel those out the wild. I'm currently having my own little bzr rep.

I'd say that I basically reviewed the original zip by dynamis, if someone would volunteer to review the merge patches, that'd be cool. I'll be attaching a final patch against mozilla/testing/tests/l10n in the end.

First patch is a spelling fix.
Comment 14 Axel Hecht [:Pike] 2007-11-09 09:13:09 PST
Created attachment 288010 [details] [diff] [review]
merge bug 391680, Paths and filter.py

Dynamis, this one should be actually interesting to review, it's incorporating bug 391680, which changes the way I ignore entities and files, and it may, sorry for that, obsolete the patch you just attached.
Comment 15 Axel Hecht [:Pike] 2007-11-15 02:52:12 PST
Created attachment 288828 [details] [diff] [review]
factor out common parts of CompareCollector, subclass for the rest

More merge foo, CompareCollector should stay unchanged, and merge should just subclass that one.
Comment 16 Axel Hecht [:Pike] 2007-11-19 14:35:36 PST
Created attachment 289389 [details] [diff] [review]
merging the more standards-compliant regexps from trunk
Comment 17 Axel Hecht [:Pike] 2007-11-19 14:57:29 PST
Created attachment 289393 [details] [diff] [review]
add back a bookmarksparser

I added a new bookmarks parser, the old one seems to be a tad too picky (busting es-AR, at least). The new one is based on HTMLParser.HTMLParser, and loosely on what Clint did for Minotaur. I think this one is a tad more strict, though.

The bookmarks parser can't do merges yet, I think that will be a good starting point for the l10n-fork team.

With a small nit

-      os.mkdir(os.path.dirname(l_fl))
+      os.makedirs(os.path.dirname(l_fl))

the code is now able to create new localizations from sparse ones, in the case of en-IN,

sparse-en-IN/browser/profile/bookmarks.html
sparse-en-IN/browser/searchplugins/a1books.xml
sparse-en-IN/browser/searchplugins/amazon-en-GB.xml
sparse-en-IN/browser/searchplugins/eBay-en-IN.xml
sparse-en-IN/browser/searchplugins/list.txt
sparse-en-IN/browser/searchplugins/yahoo-en-IN.xml
sparse-en-IN/other-licenses/branding/firefox/brand.properties
sparse-en-IN/toolkit/chrome/global/intl.properties

Making that work good might still take a bit.

Remaining tasks are

- unbust compare-locales, CompareLocales
- actually share common code
- some further clean up, at least the global options var needs to die.

PS: Using yield in the __iter__ function of Parser is likely a change that should be made for other parser impls, too. Not necessarily in this bug, though.
Comment 18 Axel Hecht [:Pike] 2007-11-20 08:42:00 PST
Created attachment 289508 [details] [diff] [review]
complete patch for l10n-merge

Complete patch for l10n-merge to mozilla/testing/tests/l10n.

I factored more code, and I all but the parser patches on L10N_MERGE_BRANCH incrementally, to ease review. http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=L10N_MERGE_BRANCH&branchtype=match&dir=%2Fmozilla%2Ftesting%2Ftests%2Fl10n%2F&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-19&maxdate=2007-11-21&cvsroot=%2Fcvsroot has the check-ins between the trunk and the current tip on that branch.

This patch is actually nothing but a 
cvs update -j L10N_MERGE_BASE -j L10N_MERGE_BRANCH

I left a few things up to volunteers or follow-up bugs:

- for some use-cases, I'd prefer to add entities without comments, in particular, for build purposes
- for build automation as well, I think it's be better to have the generated files in a separate dir. I'm not so fond of cvs update -C, it doesn't always work for me, it might leave dangling files, and I expect it to be more time intensive, too
- there's more room for factorization, in particular between CompareLocales.compare and L10nMerge.merge
- BookmarksParser can't merge yet, l10n-fork will need that, though
- Parser should use generators for __iter__, like BookmarksParser does now

Dynamis offered a review, I'd like to have a review from our python tool crew, too, any volunteers?
Comment 19 Axel Hecht [:Pike] 2007-12-06 03:05:27 PST
Comment on attachment 289508 [details] [diff] [review]
complete patch for l10n-merge

Benjamin, could you do a review on this? The original code from dynamis and his intern has a review + comments, where the comments are actually implemented in the follow up patches I checked in on my branch.

Comment 18 still has the round up description.
Comment 20 Axel Hecht [:Pike] 2007-12-30 05:04:47 PST
Comment on attachment 289508 [details] [diff] [review]
complete patch for l10n-merge

Cancelling review requests, my own tree is looking different now.
Comment 21 Axel Hecht [:Pike] 2008-04-30 04:39:27 PDT
Status update, gandalf and I have forked code that could prove useful.

More on gandalf's code is on http://wiki.braniecki.net/Projects:mozpyl10n, my code is at http://hg.mozilla.org/users/axel_mozilla.com/tooling/.

Some of the differences might have performance impact, so I'm not sure if both are equally well suited to be run as part of the build.

Gandalf signed up for making his code feature-compat so that we can have a perf-shoot-out. And I owe him a beer if his code wins.

Re-assigning to gandalf for that action item.
Comment 22 Axel Hecht [:Pike] 2010-01-18 08:01:50 PST
Marking FIXED, we have at least one 1l0n-merge in production now.

Note You need to log in before you can comment on or make changes to this bug.