Closed Bug 403215 Opened 13 years ago Closed 12 years ago
Python script that will help create from a translated locale a regional build
170.66 KB, application/octet-stream
1017 bytes, application/octet-stream
7.85 KB, patch
|Details | Diff | Splinter Review|
Release 1.0 - Faster localization process and new functionality (tool can auto generate search engine localization file with predefined keys).
228.44 KB, application/x-zip-compressed
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:22.214.171.124) Gecko/20071025 Firefox/126.96.36.199 Build Identifier: Hello all, my name's Armen and I'm working with Rueen Fiez and Vincent Lam on this python script; You can see our progress on the following website: http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool We have little understanding of python, how the l10n/build process works and what this script could end up looking like, please understand that we wrote it from scratch based on the examples that Axel has sent us. We started writing on the wrong BUG (bug 399014 - we need an l10n-merge tool) and now we have brought the discussion here. We are going to be working on our 0.2 release during this coming week and we will write down if we need help Regards, Armen on behalf of our team Reproducible: Always Steps to Reproduce: N/A
We are going to ask for review on the 0.2.3 release
Assigning to Armen at his request.
Assignee: nobody → azambran
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment on attachment 288014 [details] The regular expresion has ben fixed and does not touch keys but only values Fixing mime type.
Download 0.2.3 - http://zenit.senecac.on.ca/wiki/index.php/Image:L10nmerge0.2.3.zip We are stuck with the idea on how to apply more regular expressions changes. Do we create a file with a row for each string to be replaced and read from it? How do we apply more than one regular expression substitution to a string? ---------------------- Please if you want to test it you can follow this instructions: http://zenit.senecac.on.ca/wiki/index.php/Release_0.2.3_Instructions Rueen has added some screenshots and explains it in plain words: http://rueenfiez.wordpress.com/2007/11/15/automated-localization-tool-02-release/ And as mentioned in other comments, this is the page were we write information about the script: http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool
Adding a dependency on l10n-merge, bug 399014. The parsing/output routines there will likely be worthwhile. To get a different name for this compared to l10n-merge, what do you guys think about calling it l10n-fork?
Depends on: 399014
I think it is a better name that what we had I believe that it refocuses us into what it is supposed to do We will be posting more stuff during this week
This attachment is supposed to do multiple regular expressions changes to the DTD and the properties files of an l10n source tree I think it will be amazing if it could create a parallel l10n tree with the changes applied to the new regional. We want to get rid of creating .bak files Inside the file, you can find "todo.txt", "changes.txt", "instructions.txt" and "release.txt" with plenty of information. Please if someone could just give me an idea of how would they: 1) create a parallel l10n tree - (I am thinking of copying from SOURCE to DESTINATION and then apply changes) 2) read the changes from changes.txt which contains multiple changes and recreate the regular expressions - (I am thinking of an array with COMPILED regular expressions and then iterate through them)
Attachment #288014 - Attachment is obsolete: true
> Please if someone could just give me an idea of how would they: > 1) create a parallel l10n tree - (I am thinking of copying from SOURCE to > DESTINATION and then apply changes) I'm not sure I know what you mean. I _think_ you're saying you should make a full copy of a tree, eg. "cp -R en-US en-CA". Either way, I think that would be a fine approach. As far as I know, an l10n tree isn't too big so this shouldn't eat disk space. > 2) read the changes from changes.txt which contains multiple changes and > recreate the regular expressions - (I am thinking of an array with COMPILED > regular expressions and then iterate through them) > I would read the changes file line-by-line and process them one at a time. It seems to me that there could be thousands of lines (I could be wrong though), and you don't want that many regular expression objects live at the same time. The pseudo-code might look something like this: (open changes file) (open new properties/dtd/etc file) while readline: if (line is a comment): pass else: (extract before/after versions of the string) (create regex base on those strings) (apply regex to file) It'll be more complicated than that, for sure, but that's the general idea. Ping me on IRC if you've got follow-up questions.
Ben, we have used shutil.copytree for tree copying Ben, I have been fooling around with the file idea from which to read the regular expressions and I will be posting a preliminar for it later on Axel: | "The parsing/output routines there will likely be worthwhile." I wish I knew python before I took this project, after my sessions today, I think that going back to dynamis code might help me a lot TO EVERYONE: I have added TXT files in the attachment that helps an outsider to know what the code is supposed to do and I have some comments in my code that could help others. -> Is there a page explaining the l10n scripts? -> Is there a page explaining in which context to use each script?
This is my own attempt to read from the file "changes.txt" the substitutions that I need to apply to the l10n source tree First method gives me an idea on how to create a dictionary with "original value" as the key and the "regionalized value" as the value to the key
Comment on attachment 292015 [details] This for testing over an l10n tree. Unzip this and overwrite l10nFork.py with the latest version > def __init__(self): > self.data =  I don't see self.data being used anywhere -- you can probably get rid of this whole method. (__init__() is not required). > def translate(self, node): > dtdParser = DTDParser() > propParser = PropertiesParser() > if(node.endswith('.dtd')): > f = open(node,'r+') > dtdParser.read(node) > f.write((t.callback(dtdParser.contents)).encode("utf-8")) t.callback() should be self.callback() -- this only works here because you've instantiated a Tool() globally as 't'. Try changing the 't = Tool()' line to something else and see what happens :). > # Use for DEBUGGING > #print dtdParser.contents > f.close() > elif(node.endswith('.properties')): > f = open(node,'r+') > propParser.read(node) > f.write((t.callback(propParser.contents)).encode("utf-8")) Same here. > # Use for DEBUGGING > #print propParser.contents > f.close() > def process(self, forkedLocale): > directories = [forkedLocale] I'm assuming you're putting this into a list because you want to support multiple locales at some point. If this is the case, I suggest making process() require a list, so it would be called like.. 't.process([newLocaleName])'. I have some general comments as well: * Lines should be kept to a maximum length of 80 characters. This is true for code and comments. * I'd really like it if this script accepted command line options rather than forcing me to input things. This will let it be used in an automated way. See the 'getopt' module for how to this easily. * Debugging statements will need to be removed at some point. * Please submit this as patch next time. It makes the review process much much easier. Overall, it appears to work. Once the things listed above are fixed I'll be happy to r+ this. If you have any follow-up questions feel free to ping me on IRC.
Attachment #292015 - Flags: review?(bhearsum) → review-
It doesn't obsolete any of the previous. - To test, download 0.3.2 and overwrite l10nFork.py with this new attachment Ben, things I have fixed: - fixed 80 cols per line - removed DEBUG statements - Attached as Patch rather than an archive file - Using optparse for arguments Could NOT fix: - Using self.process and remove t=Tool() --> I believe it has to do that I am working inside of __name__ = '__main__', NOT SURE - In the other side, it works when I call self.applyChanges() Left to be done for the script: - Read from file the changes to be applied to the l10n tree rather than using hard-coded regular expressions --> MY ATTEMPTS/PRACTICE is in attachment "readChanges.py" NOTE: I will retake the development after my exams are done (next Friday) and for now I will JUST be fixing easy things to fix
Attachment #292111 - Flags: review?(bhearsum)
Comment on attachment 292111 [details] [diff] [review] 0.3.3 - fixed most of Ben's suggestions - using optparse for arguments > # DESCRIPTION > # applyChanges() is passed a string to be translated. It is responsible for > # changing all parser.contents values. We insert the translation rules > # in here via regular expressions. This function needs to be optomized > # since their could be hundreds of translations, thus resulting in a > # bloated callback() function. Need to fix the spelling mistake and change callback() to applyChanges() > # DESCRIPTION > # translate() uses the DTDParser and PropertiesParser from the Parser.py > # class to parse each .dtd and .properties file in the given directory. > # Its contents are then passed into our callback() function string by > # string and changed in their. Here too. > f.write((self.applyChanges(dtdParser.contents)).encode("utf-8")) Good :) > >if __name__ == '__main__': > # TODO the options are not implemented yet > # Receiving ARGUMENTS is implemented > # > # example of what to pass to the script > # $> python l10nFork.py --file=changes.txt > # to get help type $> python l10nFork.py -h and will print these options > usage = 'usage: %prog [options] forkedLocale regionToGenerate' > > parser = OptionParser(usage=usage) > #TODO option to be implemented later on If the --file option doesn't work right now you should note that in the usage statement. > parser.add_option("-f", "--file", dest="filename", default="changes.txt", > help="the FILE from where to read changes", metavar="FILE") > > (options, args) = parser.parse_args() > > if len(args) < 2: > parser.error('At least 2 parameters required: locale and regional (e.g.') > t = Tool() > forkedLocale = args > newLocaleRegion = args > if os.path.exists(forkedLocale) and os.path.exists(newLocaleRegion) == False: > shutil.copytree(forkedLocale, newLocaleRegion) > t.process(newLocaleRegion) > else: > #REVIEW I don't handle the NON existance of "forkedLocale" - TODO > #REVIEW should we throw Exception instead or is this good enough > print "The directory " + newLocaleRegion \ > + " already exists. Please choose another one.\n" > If forkedLocale doesn't exist it sounds like this script can't do anything. If that's the case, printing an error message and exiting is the right behaviour. You should print to stderr here, though, that's where error messages belong. Look at sys.stderr for how to do that. You should also return a non-zero value when you exit with an error. This is another thing that will aid the scriptability. This is a simple 'return 1'. > Could NOT fix: > - Using self.process and remove t=Tool() --> I believe it has to do that I am working inside of __name__ = '__main__', NOT SURE Sorry, I didn't mean to give the impression that you needed to change it there. You *must* have an instance of a Tool() (eg, 't') to be able to call it. This version of the script does exactly the right thing :). This is shaping up quite well, good work. I only r-'ed because of the error handling comments a couple paragraphs up.
Attachment #292111 - Flags: review?(bhearsum) → review-
The creation code should be merged with http://lxr.mozilla.org/mozilla/source/tools/l10n/l10n.py (which needs an additional argument itself to not just copy over from en-US on demand). For the general setup, I guess that we'd end up with the modification code in the target directory. Being able to forcefully overwrite might be good, too. I guess that the "oops, did you really want to do that" detection needs to be a tad more complex. And instead of doing stderr and return 1 like bhearsum suggested, just use parser.error() again, that just does that for you. Function comments should really go into the function body, like def foo(): ''' This is my lengthy comment. It will be displayed if you do help foo. ''' and_now_implement = True For the directory tree iteration, I'd use os.walk. CompareLocales.FileCollector does some of that, though dead-on ugly. I might do that on my playground, I just checked in a bunch of stuff in http://hg.mozilla.org/users/axel_mozilla.com/tooling/. There's a static helper function Parser.getParser, which should help you avoid the special casing you're currently doing in the loop. I'm looking forward to see the actual forking code to be integrated with l10n-merge. Do you have ideas on how to factor that code? Not sure if that stuff doesn't call for a real class approach instead of hashes of strings for the output of the parsers.
Comment on attachment 292015 [details] This for testing over an l10n tree. Unzip this and overwrite l10nFork.py with the latest version r- based on my and bhearsum's comments, but I agree, this is going forward.
Attachment #292015 - Flags: review?(l10n) → review-
Sorry, for late reply (fixing computer, preparing for other projects and holiday's mood) I have added the suggested comments into this revision. The following has not been tackled yet: > For the directory tree iteration, I'd use os.walk. CompareLocales.FileCollector > does some of that, though dead-on ugly. I might do that on my playground, I > just checked in a bunch of stuff in > http://hg.mozilla.org/users/axel_mozilla.com/tooling/. > > There's a static helper function Parser.getParser, which should help you avoid > the special casing you're currently doing in the loop. > > I'm looking forward to see the actual forking code to be integrated with > l10n-merge. Do you have ideas on how to factor that code? Not sure if that > stuff doesn't call for a real class approach instead of hashes of strings for > the output of the parsers.
Comment on attachment 294966 [details] [diff] [review] 0.3.4 - Fixed comments, using Documentation strings, used Parser.error() ># After execution if you go to: ># <newLocaleRegion>/browser/chrome/browser/preferences/colors.dtd ># you can see changes if we still aply the change "color" to "colour" Fix the spelling mistake. This is good. I think the next step here is to address the things you explicitly didn't in this version, and define/use an input file to read changes from.
Attachment #292015 - Attachment description: 0.3.2 - creates new regional l10n tree with the changes applied to it → This for testing over an l10n tree. Unzip this and overwrite l10nFork.py with the latest version
Attachment #294966 - Flags: review?(bhearsum) → review+
Comment on attachment 294966 [details] [diff] [review] 0.3.4 - Fixed comments, using Documentation strings, used Parser.error() I'm going with an r- still. This got better, yes, but we're still processing the whole file instead of just the localized strings. My own fork on hg.mozilla.org got a good deal better, but I still don't have anybody volunteering to review it :-( The biggest deficiency is that l10n-merge is currently broken, didn't have time to update that yet.
Attachment #294966 - Flags: review?(l10n) → review-
Thanks for the review. This bug will be continued by Rueen. He will continue the fixes and reviews. Armen
This is the newest version of the l10n Auto Fork tool - Version 0.4. The next release will be out shortly with new changes. For more information regarding the project as well as the features being incorporated with each release, check out this wiki: http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool#Project_Plan
Instructions: 1. Unzip folder 2. Open command-line 3. Navigate to folder 4. Type: "l10nFork.py --source en --new en-CA --verbose" 5. Navigate to ./en-CA/browser/chrome/branding/brand.properties 6. Verify that original "color=color" (the one in the en version) has been changed to new "color=colour" (en-CA version) This tool has been turned into a "real" command-line utility now. Typing "l10nFork.py --help" or "l10nFork.py -h" would yield the following: Options: -h, --help "Show help message and exit" -v, --verbose "Show details of Forking process" -a, --abspath "Shows the absolute path of the files being parsed" -t, --template "Create a localization template in your cwd" -i, --import "Use this option to import a localization template. To generate one, use the '-t' flag" -s --source "The original source that will be forked" -n --new "The name given to the new forked tree" (In reply to comment #20) > Created an attachment (id=300224) [details] > l10n Auto-Fork Version 0.4 > This is the newest version of the l10n Auto Fork tool - Version 0.4. The next > release will be out shortly with new changes. For more information regarding > the project as well as the features being incorporated with each release, check > out this wiki: > http://zenit.senecac.on.ca/wiki/index.php/Automated_localization_build_tool#Project_Plan
(Forgot to mention) By the 0.7 Release - which is scheduled sometime in March - the tool will be able to generate a localization template, as well as apply them when creating a new forked source tree. In which case, the tool would be used like: ***(The following commands will work at 0.7 release - not now)*** STEP 1: Generate a template using this command: ("l10nFork.py -t" or "l10nFork.py --template") STEP 2: Localizer will use the template - which will be in DTD, properties, or xml format still need to be determined... - and fill in localization information STEP 3: Apply the template and output forked source tree using this command: ("l10nFork.py -s en -n en-CA -i SomeLocaleFile.properties -v") *I apologize for so many consecutive bug replies.
Release 0.5 (added: validation; template generator)
Attachment #300224 - Attachment is obsolete: true
Template generated by running the following command: "l10nFork.py -t" or "l10nFork.py --template" System will output a new properties file to your cwd. Checks for existing one first.
Release 0.6 - Completed Localization Template functionality
Attachment #300972 - Attachment is obsolete: true
[l10n Auto-Fork Release 0.7] Download 0.7 --> http://zenit.senecac.on.ca/wiki/imgs/L10nFork0.7.zip The tool is now able to localize the default search engines. (eg; www.google.com = www.google.in). Localizes the XML files in /browser/locales/en-XY/searchplugins/ (just google right now, will do same for Wikipedia, etc.. before 0.8 release)
Would appreciate review of the auto-fork tool. Walks full source tree looking for DTD and properties files and makes changes to their Key values based on localization template (found in attachment). Notes: The tool has been tested on a full FF source tree. Successfully changed strings and default google search to those specified in "locale_template.properties" file located in attachment. Results of testing on full FF source can be found here (http://rueenfiez.files.wordpress.com/2008/03/ss2.jpg) and (http://rueenfiez.files.wordpress.com/2008/03/ss1.jpg) (Plan to allow for changes to default RSS feeds, dictionary, and more default search engines in future releases)
So far, this tool is able to localize/regionalize: -Strings -Default search engines -Default dictionary By having the localizer enter the change he/she wants to make into a simple .properties file, thus, eliminating the need for them to read any code. For my future release I plan to: -Be able to localize the default RSS feeds -Optomize the Python script's functions to increase localization speed (not really necessary but would be nice if there is a way. Copying the whole FF folder takes a long time for the Python script) I requested review, would appreciate any feedback or even direction on what else the Script should localize/regionalize
Comment on attachment 311697 [details] Able to localize strings, default search engines, and default dictionary of a full Firefox build. Hi Ruen, comment 18 still applies. Add a <!ENTITY center.label "this will break"> to your sample, and you'll see why. Zbigniew "Gandalf" Braniecki is writing a related library, which might come in handy, too. You can check his code at http://svn.braniecki.net/wsvn/Mozpyl10n/. On the general code: Please don't put comments after code lines, source code should in general not be wider than 80 chars. If you put the comments before the code it's commenting on, things will be much more readable. I at least had a hard time to decipher and find things. For debug output, I suggest looking at the logging module. That enables you to set a debugging level at the beginning, and then you can just use .info or .debug etc inside the code. Much more readable, again.
Attachment #311697 - Flags: review?(l10n) → review-
Hi Axel, Thank you so much for reviewing my code. I'm going to get started on the points you mentioned right now. Again, thanks very much for taking the time for the review.
- Implemented changes Axel requested - Seperate localization templates implemented for Strings/Labels and Search Defaults - Fixed critical bug found when first char of KEY is different from first char of VALUE Plan to implement functionality that localizes default RSS feeds in final release. Also plan to optomize localization process. This is tested on full en-US FF build. Screenshot of default Amazon search in en-US Firefox build being localized to German (http://rueenfiez.files.wordpress.com/2008/04/ss1r9_amaazon.jpg)
wonder how close this is to helping us to produce builds for places like es-MX and other locales?
Anything left to be done in this bug? Axel, WONTFIX?
Comment on attachment 316511 [details] Release 1.0 - Faster localization process and new functionality (tool can auto generate search engine localization file with predefined keys). We're going to do this differently in the end.
Attachment #316511 - Flags: review?(l10n) → review-
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.