Closed Bug 399014 Opened 17 years ago Closed 14 years ago

we need an l10n-merge tool

Categories

(Core :: Internationalization: Localization, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: Pike, Assigned: zbraniecki)

References

Details

Attachments

(2 files, 8 obsolete files)

We need a tool to merge l10n-impact changes in en-US into an existing localization, both for localization work, and for the build process.

If we can include such a tool into the build, we could generate builds for testing independently of the current status of a localization, and make that process much more agile.

Dynamis was having an intern working on this, can you report on the current status?
Hello all, my name's Armen and I'm working with Rueen Fiez and Vincent Lam on this tool; You can see our progress on the following website:
http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool

We have no knowledge of python, we don't know how are the l10n tools are used and we are strugglinh with a lot of assignments, please understand that we wrote it from scratch based on the examples that Axel has sent.

Said that this is our initial testing that we got after six hours "trying to code something" rather than keep on reading without knowing what to learn
http://docs.google.com/Doc?id=dnkkbhp_65p4whj5

During this week we will have to have something working for our open source course, so if you can give us some feedback, we will really appreciate.

Regards, Armen on behalf of our team

PS = Dynamis could you please post somewhere some of your code? or a blog? or somehting
PS2 = why is it called autol10n-MERGE the tool? why MERGE?
I have added some instructions on how to use the first attachment to this bug. This Tuesday, Oct 23rd, we are going to work more on our tool adding the functionality to go over subfolders and making more changes to all the DTD and properties files.
We would also like to add more flexibility with the regular expressions

Please any feedback will be much appreciated, we will be on #seneca and #l10n
Armen, Rueen and Vincent
Hi Armen,

to me, the l10n-merge tool and the code you're working on are two separate tools. They hopefully would be able to share code on the side of writing changes, but that's about it.

Regarding the 0.2 version, I looked at the script on Google Docs, and I bet I found a bug. Take a look at http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/browser/locales/en-US/chrome/browser/setDesktopBackground.dtd&rev=1.1&mark=6, where 'color' shows up in the entity name. This is one of the reasons why I'd want your tool to work on top of the parsing arch and only postprocess the values, and not the complete files. Sadly my poking for the actual l10n-merge tool works didn't turn out too successful so far.
Here is our 0.2.2 attempt from last week and the Google Docs code wasn't updated:
http://matrix.senecac.on.ca/~azambran/mozilla/l10nmerge0.2.2.zip
I have added this line:
   return re.sub(r'(.*)([Cc])olor', r'\1\2olour', instring)

I don't know how is the right way to put code and discuss about it.

Axel, do you want us to file another bug for our project?


Pike: John and I were just discussing this, or at least I think we were. 

When you speak of the l10n-merge tool, are you looking for something that will create a diff of string resources between the previous build and the current build, and mark any omissions with easily searchable placeholder text for localizers to find? 

Just looking for a little more direction here, if you haven't heard anything more from dynamis.
Hi all. Sorry coming late.

I will attach my l10n-merge script based on compare-locale.py.

What the script will do:
 - supporting dtd, properties and inc files
 - keep white space and comments
 - keep entity order
 - add new en-US entities at the end of file (and their comment)
 - comment out obsolate l10n entities (and their comment)
   # or remove them with -c option
 - copy new en-US file to l10n
 - remove obsolate file
   # or backup them with -b option (filename -> filename~)

workflow for each file is:
 1. split en-US/l10n file into the 'entity blocks' and make key table
 2. compare key en-US/l10n key table
 3. comment out or remove obsolate l10n 'entity blocks'
 4. add new en-US entity at the end of l10n file

entity block is:
 - First comment block of the file is considered as 'header' and don't touch it.
   # It is in many case license block and not comment for the first
   # entity. So, we don't consider it part of the first entity block.
 - Preceding white spaces and comments of each entity will be included
 - Following comment in the same line will also included
   # we sometimes add comment after the entity in the same line

example:
   en-US:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: 
 5: <!-- comments before entity2 -->
 6: <!ENTITY entity2 "value2"> <!-- comment in the same line -->
 7: <!-- comments before entity3 -->
 8: <!ENTITY entity3 "value3">

       header: line1
entity block1: line2~3
entity block2: line4~6
entity block3: line7~8

   if l10n is:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: <!-- comments before entity3 -->
 5: <!ENTITY entity3 "value3">
 6: <!ENTITY entity4 "value4">

   merged l10n will be:
 1: <!-- header comment (license block) -->
 2: <!-- comments before entity1 -->
 3: <!ENTITY entity1 "value1">
 4: 
 5: <!-- comments before entity2 -->
 6: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) -->
 6: <!ENTITY entity2 "value2"> <!-- comment in the same line -->
 7: <!-- comments before entity3 -->
 8: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) 
 9: <!ENTITY entity3 "value3"> -->
Our l10n-merge python script based on Axel's compare-locale script

This l10n-merge script may still have some bugs but as far as I tested working fine.
oops, line 8 was just my typo :(
>  8: <!-- XXX l10n merge: new entities (Wed Oct 31 03:38:16 2007) 
obsolate entities, not new entities.

This is just my typo here, not bug of the script.
# I wrote above example by hand.
Thanks to Toshihiro Kura and dynamis for getting us this far.

I have been looking at the code already, and we'll need to do some merging and factorization. Paths.py and Parser.py need to be merged, and we should factor the common code between CompareLocales.py and L10nMerge.py as well as compare-locales and l10n-merge.py.

There are a few minor nits like spelling fixes, and the options should be a param, those are simple, though.

(In reply to comment #5)
> 
> I don't know how is the right way to put code and discuss about it.

I suggest that you take a look at L10nMerge.py. Yes, we're going to do some changes to it, but it should give you a really good idea on how to modify just the entity values.

> Axel, do you want us to file another bug for our project?

Yes, please.
(In reply to comment #10)
> I have been looking at the code already, and we'll need to do some merging and
> factorization. Paths.py and Parser.py need to be merged, and we should factor
> the common code between CompareLocales.py and L10nMerge.py as well as
> compare-locales and l10n-merge.py.

We didn't have enough time and didn't care enough about factorization. I concentrated to make the script works strictly as I want.

I don't enough time now and if you merge common parts of l10n-merge and compare-locales, it's very nice.

To make the script works strictly, most important and complex part of the code is regular expression in Parser.py I think. Regular expression of compare-locale was not enough strict (in fact bug found) and I completely rewrote them.

Call me when you finished merging l10n-merge and compare-locales. I'll check if merged one still works as I expected. And if you will not share my regular expression for compare-locales, I will look latest compare-locales too and make patch for the minor bug (if it still exists).
Thanks.
My peer found the bug of L10nMerge.py and this is the patch to fix it.

Without this patch, l10n-merge will comment out l10n only ignored entities.
For example browser.search.order.[3-9] in region.properties.
They are defined only in l10n files but should not be commented out as obsolate.
Attached patch fix a typo (obsolete) — Splinter Review
I have a bunch of patches already, I should shovel those out the wild. I'm currently having my own little bzr rep.

I'd say that I basically reviewed the original zip by dynamis, if someone would volunteer to review the merge patches, that'd be cool. I'll be attaching a final patch against mozilla/testing/tests/l10n in the end.

First patch is a spelling fix.
Assignee: nobody → l10n
Status: NEW → ASSIGNED
Attachment #288008 - Flags: review?
Dynamis, this one should be actually interesting to review, it's incorporating bug 391680, which changes the way I ignore entities and files, and it may, sorry for that, obsolete the patch you just attached.
Attachment #288010 - Flags: review?
Attachment #288010 - Flags: review? → review?(bugzilla)
More merge foo, CompareCollector should stay unchanged, and merge should just subclass that one.
Blocks: 403215
Attached patch add back a bookmarksparser (obsolete) — Splinter Review
I added a new bookmarks parser, the old one seems to be a tad too picky (busting es-AR, at least). The new one is based on HTMLParser.HTMLParser, and loosely on what Clint did for Minotaur. I think this one is a tad more strict, though.

The bookmarks parser can't do merges yet, I think that will be a good starting point for the l10n-fork team.

With a small nit

-      os.mkdir(os.path.dirname(l_fl))
+      os.makedirs(os.path.dirname(l_fl))

the code is now able to create new localizations from sparse ones, in the case of en-IN,

sparse-en-IN/browser/profile/bookmarks.html
sparse-en-IN/browser/searchplugins/a1books.xml
sparse-en-IN/browser/searchplugins/amazon-en-GB.xml
sparse-en-IN/browser/searchplugins/eBay-en-IN.xml
sparse-en-IN/browser/searchplugins/list.txt
sparse-en-IN/browser/searchplugins/yahoo-en-IN.xml
sparse-en-IN/other-licenses/branding/firefox/brand.properties
sparse-en-IN/toolkit/chrome/global/intl.properties

Making that work good might still take a bit.

Remaining tasks are

- unbust compare-locales, CompareLocales
- actually share common code
- some further clean up, at least the global options var needs to die.

PS: Using yield in the __iter__ function of Parser is likely a change that should be made for other parser impls, too. Not necessarily in this bug, though.
Complete patch for l10n-merge to mozilla/testing/tests/l10n.

I factored more code, and I all but the parser patches on L10N_MERGE_BRANCH incrementally, to ease review. http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=L10N_MERGE_BRANCH&branchtype=match&dir=%2Fmozilla%2Ftesting%2Ftests%2Fl10n%2F&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007-11-19&maxdate=2007-11-21&cvsroot=%2Fcvsroot has the check-ins between the trunk and the current tip on that branch.

This patch is actually nothing but a 
cvs update -j L10N_MERGE_BASE -j L10N_MERGE_BRANCH

I left a few things up to volunteers or follow-up bugs:

- for some use-cases, I'd prefer to add entities without comments, in particular, for build purposes
- for build automation as well, I think it's be better to have the generated files in a separate dir. I'm not so fond of cvs update -C, it doesn't always work for me, it might leave dangling files, and I expect it to be more time intensive, too
- there's more room for factorization, in particular between CompareLocales.compare and L10nMerge.merge
- BookmarksParser can't merge yet, l10n-fork will need that, though
- Parser should use generators for __iter__, like BookmarksParser does now

Dynamis offered a review, I'd like to have a review from our python tool crew, too, any volunteers?
Attachment #288005 - Attachment is obsolete: true
Attachment #288008 - Attachment is obsolete: true
Attachment #288010 - Attachment is obsolete: true
Attachment #288828 - Attachment is obsolete: true
Attachment #289389 - Attachment is obsolete: true
Attachment #289393 - Attachment is obsolete: true
Attachment #289508 - Flags: review?(bugzilla)
Attachment #288008 - Flags: review?
Attachment #288010 - Flags: review?(bugzilla)
Attachment #284789 - Attachment is obsolete: true
Attachment #285695 - Attachment is obsolete: true
Comment on attachment 289508 [details] [diff] [review]
complete patch for l10n-merge

Benjamin, could you do a review on this? The original code from dynamis and his intern has a review + comments, where the comments are actually implemented in the follow up patches I checked in on my branch.

Comment 18 still has the round up description.
Attachment #289508 - Flags: review?(benjamin)
Comment on attachment 289508 [details] [diff] [review]
complete patch for l10n-merge

Cancelling review requests, my own tree is looking different now.
Attachment #289508 - Flags: review?(bugzilla)
Attachment #289508 - Flags: review?(benjamin)
Status update, gandalf and I have forked code that could prove useful.

More on gandalf's code is on http://wiki.braniecki.net/Projects:mozpyl10n, my code is at http://hg.mozilla.org/users/axel_mozilla.com/tooling/.

Some of the differences might have performance impact, so I'm not sure if both are equally well suited to be run as part of the build.

Gandalf signed up for making his code feature-compat so that we can have a perf-shoot-out. And I owe him a beer if his code wins.

Re-assigning to gandalf for that action item.
Assignee: l10n → gandalf
Status: ASSIGNED → NEW
Marking FIXED, we have at least one 1l0n-merge in production now.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.