Closed Bug 608004 Opened 11 years ago Closed 11 years ago

l10n repack jobs should be able to do multiple repacks per job

Categories

(Release Engineering :: General, defect, P4)

All
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [l10n])

Attachments

(1 file, 3 obsolete files)

We currently fire 1 job for every locale, which means we pay the setup/teardown cost per locale, even though a single slave ends up handling many jobs for a set of repacks. We should be able to do N repacks in a job to save (setup/teardown cost) * (N-1). This is especially helpful in releases, where l10n repacks affect end-to-end time non-trivially.
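A back-of-the-envelope sketch of the claimed savings; the ~30-second configure overhead and 74-locale count are figures from comment 4 later in this bug, used here as assumptions for illustration only:

```python
# Rough cost model for batching N repacks into one job. The numbers are
# assumptions taken from comment 4 in this bug: ~30 s of per-job
# setup overhead (configure etc.) and 74 locales on Windows.
OVERHEAD_S = 30
N_LOCALES = 74

one_job_per_locale = OVERHEAD_S * N_LOCALES   # overhead paid N times
one_job_for_all = OVERHEAD_S                  # overhead paid once
saved_s = one_job_per_locale - one_job_for_all

# Matches the formula above: (setup/teardown cost) * (N - 1)
assert saved_s == OVERHEAD_S * (N_LOCALES - 1)
print("saved ~%.1f minutes" % (saved_s / 60.0))  # ~36.5 minutes
```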
Depends on: 608005
We also have to make sure that we can report the logs of each locale individually as we post to each sub-tinderbox page ("Mozilla-l10n-$locale"). I guess we also have to give each locale its own exit status rather than sharing the same job exit status (as some locales can be red, others orange, and others green).

Axel, anything else for us to watch for?
Whiteboard: [l10n]
Do people still use that? I thought everyone was looking at the dashboard these days. I'd argue that we shouldn't make design decisions based on Tinderbox support, too. Status for new things should come through Buildbot JSON.
According to my coconuts (sorry, should put that online, but busy with l10n-changesets), l10n-repacks spend their time in three steps, generally:

configure
installers
partial mar

configure is not really sweet, but not too bad, either. installers and partial take the biggest part, varying by platform.

Sounds like a premature optimization.

And yes, if we tune our infrastructure to reporting, it should be forwards, not backwards.

PS: coconuts is a django app on top of the status db dumps that catlee put up.
(In reply to comment #3)
> According to my coconuts (sorry, should put that online, but busy with
> l10n-changesets), l10n-repacks spend their time in three steps, generally:
> 
> configure
> installers
> partial mar
> 
> configure is not really sweet, but not too bad, either. installers and partial
> take the biggest part, varying by platform.

configure alone adds about 30 minutes on the end2end time on Windows. (~30s * 74 locales) -- when we're talking about an overall end2end time of 6 hours, that's pretty significant.

> Sounds like a premature optimization.

Optimization is only part of this. The other parts are:
- Moving logic out of BuildFactories
- Enabling one-build-per-builder releases, which helps a ton in the scheduling area.

> And yes, if we tune our infrastructure to reporting, it should be forwards, not
> backwards.

Sorry, I don't quite follow. The specific thing I'm concerned about is: Is it a requirement that l10n builds send mail to the per-locale Tinderbox trees?
If we run configure via

make -f client.mk /Users/axelhecht/src/central/mozilla-central/../fx-debug/Makefile 

that should stop being an issue. Requires mozconfigs to be used, though.

I don't see any value moving logic or one-build-per-builder, from the given context.

Regarding reporting, anything that makes reporting harder is basically a no-go. That's true in this very abstract form, independent of whether it affects existing or non-existing reports. Just generally, reporting is much more important than scheduling.
(In reply to comment #5)
> Regarding reporting, anything that makes reporting harder is basically a
> no-go. That's true in this very abstract form, independent of whether it
> affects existing or non-existing reports. Just generally, reporting is
> much more important than scheduling.

I don't disagree with that general principle. Can you please answer my question explicitly though? I'm having trouble interpreting your answer.
Yes, people still look at the individual tinderbox pages, mostly because the top-level all-locales page drops builds, and usually has way more builds than one can look at.

Tinderbox is not something that I call reporting, though. As such, "works on tinderbox" isn't an argument beyond "doesn't regress the cruft we have".
(In reply to comment #7)
> Yes, people still look at the individual tinderbox pages, mostly because the
> top-level all-locales page drops builds, and usually has way more builds than
> one can look at.

OK. I'll make sure that whatever does or doesn't happen here doesn't regress that.

(In reply to comment #5)
> If we run configure via
> 
> make -f client.mk
> /Users/axelhecht/src/central/mozilla-central/../fx-debug/Makefile 
> 
> that should stop being an issue. Requires mozconfigs to be used, though.

What Makefile is fx-debug/Makefile? http://hg.mozilla.org/mozilla-central/file/39a979e26931/Makefile.in ?
That's the generated Makefile in my local objdir. Using my quick test, it doesn't matter if it's an absolute or relative path in that command either, luckily.
(In reply to comment #9)
> That's the generated Makefile in my local objdir. Using my quick test, it
> doesn't matter if it's an absolute or relative path in that command either,
> luckily.

Ok, cool. Thanks for that info
If things get too complicated, would it help to have a TBPL page for the Mozilla-l10n tbox and drop the per-locale tbox pages?

Would this page still work http://l10n.mozilla.org/~axel/nightlies/?
Please, let's not extend this bug to changing l10n reporting unless it ends up overly difficult to maintain.
Depends on: 609901
Attached patch l10n repack logic, in a script! (obsolete) — Splinter Review
This is a somewhat rough version, but I'd love some feedback on it. Notable things:
- Some stuff is just a copy of existing logic in buildbotcustom (eg, postUploadCmdPrefix)
- I've tried to separate the l10n logic in such a way that bits that can be shared with nightly l10n builds are not in release-specific spots. These are in build/l10n.py
- The way I've written these, we'll want to start using a separate mozconfig for l10n builds. This is something we've wanted to do for awhile (bug 482447) anyways.
- There's some messy stuff with paths in here due to MSYS -> env var -> MSYS crap. (Basically, path constructed with path.join(), stuffed into an env var, and read by another msys program do not work.) I'm OK with what I ended up with here, but frustration is a bias -- let me know if you'd like to see it cleaner.
- I'm planning to write a wrapper similar to that in bug 508896 that will be invoked from Buildbot. I thought about reading json directly here but in my brain, it seems to partly defeat the purpose of moving things out of Buildbot if the main scripts themselves depend on buildbot-configs stuff.
- This supports multiple repacks without doing any of the setup/teardown twice.

I tested on mac (10.5), linux (32-bit), and windows
Attachment #489009 - Flags: review?(catlee)
Blocks: 482447
Comment on attachment 489009 [details] [diff] [review]
l10n repack logic, in a script!

This is obscuring a ton of things that are worth understanding when looking at repack builds.

I don't think we should go down this route, and I feel very strongly about that.
Attachment #489009 - Flags: review-
(In reply to comment #14)
> Comment on attachment 489009 [details] [diff] [review]
> l10n repack logic, in a script!
> 
> This is obscuring a ton of things that are worth understanding when
> looking at repack builds.
> 
> I don't think we should go down this route, and I feel very strongly about that.

If you're going to object so strongly, please be more specific. The ultimate goal here is getting the repack logic out of Buildbot and making it more parallelizable. Do you have an alternate implementation suggestion?
More parallelizable by making it less modular? That cannot work.

Also, the idea of moving logic out of one piece of software into another is not a goal in itself. Right now, you can at least point people to "check what our builds are doing", and this bug is taking that away.
If the script has a for loop (per locale), we should be able to report/notify in that for loop.

I only see goodness coming from this.
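As a sketch of that idea (all names here are hypothetical, not taken from the actual patch): the job loops over locales, catches failures per locale, and reports each status from inside the loop, so one red locale doesn't paint the whole job red:

```python
def repack_locale(locale):
    # Stand-in for the real repack work; raises on failure.
    if locale == "bad":
        raise RuntimeError("compare-locales failed")

def run_job(locales, report=print):
    """Repack each locale, reporting a per-locale status from inside
    the loop rather than a single shared job exit status."""
    statuses = {}
    for locale in locales:
        try:
            repack_locale(locale)
            statuses[locale] = "success"
        except Exception as e:
            statuses[locale] = "failed: %s" % e
        # Per-locale notification, e.g. to "Mozilla-l10n-$locale"
        report("Mozilla-l10n-%s: %s" % (locale, statuses[locale]))
    return statuses
```

With this shape, `run_job(["de", "bad", "fr"])` reports success for de and fr and a failure only for the broken locale.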
I consider the logic move to be bad for Mozilla, really. Right now, at least for those who have first-class access to the data, you can just check which step failed with what result and decide if it's a locale problem or an infrastructure problem. Like the ssh upload stuff: just some easy db queries, and you get exactly when an upload step failed on which slave.
What's preventing a slave side script from populating such a db?
I don't see why buildbot is a prereq for this.
And there's various other things we can do to make querying easier -- such as post processing logs. Optimizing for reporting is the wrong thing to do IMHO. We should optimize for efficiency, clarity, maintainability, and other things. Reporting issues can be fixed with tools.
A) reporting is not a post-processing step, but a first class requirement. I know that none of our build infrastructure does that, but maybe that's part of the reason why we're not shipping? Or why I am running my own build system?
B) efficiency, clarity and whatnot are nice to haves, not goals. Nor are they bound to what you're doing.
C) Good coding style has nothing to do with spreading the code across more files with lengthier command lines that no one but one releng system is ever going to use.

You're really just ripping out features that, once you add them back, will get you into a worse place than before.
Mozharness should be able to trim the command line args needed to as few as zero.

Reporting can be done even more efficiently in a slave side script than in buildbot afaict.

I think you're arguing that if we do this, we need to do it well, and I agree.
I'm arguing that if you want to do something well, focus on the things that matter instead of those that don't.

Create real mozconfigs, use failsafe mercurial steps, generate reports on infrastructure problems, refactor the build factories to not do something different all around. Make configure use the build system to be quick.
Comment on attachment 489009 [details] [diff] [review]
l10n repack logic, in a script!

This has changed a fair bit since I posted it, I'll attach a new one for feedback later.
Attachment #489009 - Attachment is obsolete: true
Attachment #489009 - Flags: review?(catlee)
Attached patch updated scripts (obsolete) — Splinter Review
This is a much much more tested version, and includes a thin .sh wrapper. Despite our conversation yesterday I ended up reading release configs directly in create-release-repacks.py. The only alternative I could find that was reconfigless release friendly was parsing the release/branch config in the .sh wrapper, which is worse IMHO.

I expanded the config reading functions in release.info to do some validation.

I've tested from within Buildbot across win32, mac (ppc+i386 universal), and 32-bit linux. Buildbotcustom changes to enable that are incoming.
Attachment #489656 - Flags: review?(catlee)
Attached patch buildbotcustom changes (obsolete) — Splinter Review
All my testing was done with "force build". I still need to make sure I can get proper properties/sourcestamp propagation through the Schedulers (specifically, sourcestamp.revision and the release_config property).
Attachment #489657 - Flags: feedback?(catlee)
Comment on attachment 489656 [details] [diff] [review]
updated scripts

Nothing in this patch changes my mind.

r-.

I don't accept this change in infrastructure, frankly, as it's regressing existing functionality and it's not providing promise for the future.

I'm also not open to arguments like "reporting can be done...", as years of experience have shown that nothing's going to happen. I'm not going to hide my frustration here.

That said, if you're interested in constructive feedback on the problems you want to solve, I'm happy to discuss those in detail, grouped as it makes sense.
Attachment #489656 - Flags: review-
Blocks: 608008
New in this version:
- purge_builds.py support, with smarter defaults in that script
- use mercurial/update calls everywhere to ensure we always get the most recent _RELEASE tag. will also get us using share when possible, once that patch lands
- use l10n mozconfigs from $platform/$branch/release/l10n-mozconfig
- inherit merge option for release config


Axel, I appreciate that you care about our l10n builds but frankly, your unrequested review is not going to stop us from moving forward here. I am more than happy to work with you and do our best to accommodate your reporting requirements but especially for release builds (which are my current target) they are not the primary design consideration.
Attachment #489656 - Attachment is obsolete: true
Attachment #489657 - Attachment is obsolete: true
Attachment #490155 - Flags: review?(catlee)
Attachment #489657 - Flags: feedback?(catlee)
Attachment #489656 - Flags: review?(catlee)
I do expect that the trouble for release factories will be significantly eased by bug 525438, fwiw.

Anyway, I'll make this bug part of a post to .governance.
CCing LegNeato on this one. Transparency of release builds is on his list, IIRC.
Comment on attachment 490155 [details] [diff] [review]
scripts updated with purge_builds and a few other things

>+def readConfig(configfile, keys=[], required=[]):
>     c = {}
>     execfile(configfile, c)
>-    return c['releaseConfig']
>+    for k in keys:
>+        c = c[k]
>+    items = c.keys()
>+    err = False
>+    for key in required:
>+        if key not in items:
>+            err = True
>+            log.error("Required item `%s' missing from %s" % (key, c))
>+    if err:
>+        raise
>+    return c

You need to raise some kind of exception here.

>+    build.misc.cleanupObjdir(sourceRepoName, objdir, appName)
>+    l10nRepackPrep(sourceRepo, sourceRepoName, revision, objdir,
>+                   mozconfigPath, l10nRepoDir, makeDirs, localeSrcDir, env)
>+    input_env = downloadReleaseBuilds(stageServer, product, brand, version,
>+                                      buildNumber, platform)
>+    env.update(input_env)
>+
>+    err = False
>+    for l in locales:
>+        try:
>+            repackLocale(l, l10nRepoDir, l10nBaseRepo, revision,
>+                         localeSrcDir, l10nIni, compareLocalesRepo, env, merge)
>+        except Exception, e:
>+            err = True
>+            log.error("Error creating locale '%s': %s", l, e)
>+            pass
>+
>+    if err:
>+        raise

Here too.  The 'pass' above isn't required either.

>+    l10nRepoDir = path.split(releaseConfig["l10nRepoClonePath"])[-1]
>+    stageSshKey = path.join("~", ".ssh", branchConfig["stage_ssh_key"])

Does this "~" need to be expanded with os.path.expanduser?

r+ otherwise.
Attachment #490155 - Flags: review?(catlee) → review+
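For reference, the bare `raise` statements the review flags could be replaced along these lines. This is a sketch only: `ConfigError` is an assumed name, and the validation is reduced to a plain dict check rather than the patch's `execfile`-based config loading.

```python
class ConfigError(Exception):
    """Assumed exception type; the actual patch may use something else."""

def validate_config(config, required):
    # Collect every missing key, then raise a concrete exception instead
    # of a bare `raise` (which only re-raises inside an `except` block).
    missing = [key for key in required if key not in config]
    if missing:
        raise ConfigError("Required items missing: %s" % ", ".join(missing))
    return config
```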
Whoops, thanks for catching those empty raises.

We actually *must* preserve the ~ rather than expanding it, because it's being set as an environment variable. A path which has been fully expanded by a Windows app (python), set as env var, then read in again by a Windows app (python), then passed to MSYS' ssh will not work, because of quoting or slashes (can't remember which).
Comment on attachment 490155 [details] [diff] [review]
scripts updated with purge_builds and a few other things

changeset:   1271:9cba7a9fd13e
Attachment #490155 - Flags: checked-in+
Active in staging now; will be turned on in production with the first 0.8.0 releases there.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 608005
No longer depends on: 608005, 609901
Depends on: 804090
Product: mozilla.org → Release Engineering