Closed
Bug 970738
Opened 10 years ago
Closed 10 years ago
Build some jacuzzis
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Unassigned)
References
Details
Attachments
(2 files, 2 obsolete files)
10.71 KB,
patch
|
bhearsum
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
3.61 KB,
patch
|
bhearsum
:
review+
catlee
:
checked-in+
|
Details | Diff | Splinter Review |
We can speed up build times & reduce disk space by reducing the set of machines that certain build types are runnable on. (small pools of machines == jacuzzis)
Comment 1•10 years ago
|
||
Catlee, Rail and I talked about this this morning. We want to start trying different allocation methods as soon as possible. Because of this, we're going to use static files on a server immediately. These will be replaced by a proper service later. The server will have a few different endpoints: * /$builder. A GET to this will return a the list of slaves that should be used for this builder. * /fallback/$pool. A GET to this will return a list of slaves that can be used for any builder. It is intended to be used as a fallback mechanism. I will work on generating static content for some builders for these endpoints. We will adjust allocations to try out different builder/slave combinations throughout this week. To start with, we'll try 1 on demand and 2 spot instances for some builders on Cedar. Each Buildbot master will talk to the server as part of nextSlave. It will make a GET request to /$builder. If the response is 2xx it will only consider the slaves returned in the body for the build. If the response is anything else, it will make a GET request to /fallback/$pool to obtain the list of fallback slaves. Responses from all unique endpoints will be cached for some period of time (5-10min?) If neither /$builder nor /fallback/$pool return a useful response, the master will consider all slaves associated with that builder. Catlee is going to work on this part. watch_pending.py will also be updated to talk to this service to figure out which slaves should be started for a pending job. It will use the same logic as the Buildbot master for fallback. Rail will be working on this part.
Reporter | ||
Comment 2•10 years ago
|
||
WIP for buildbot nextSlave: https://github.com/catlee/buildbotcustom/compare/master...jacuzzi
Reporter | ||
Comment 3•10 years ago
|
||
So the jacuzzi client-side code is going to interact in fun ways with this: http://hg.mozilla.org/build/buildbot-configs/file/default/mozilla/master_common.py#l188 I was seeing this situation in staging: - builder A had a relatively new request, compared to builder B. - builder A had slave S1 allocated to it in the jacuzzi service - builders A,B have same sets of slaves configured - prioritizeBuilders therefore dropped A from the list of builders. Because it has dropped builders, it triggers the builder loop again, which happens after the processing below is done - builder B tries to find slaves, can't find any since S1 is the only one connected and it's allocated to builder A - prioritizeBuilders runs again. repeat ad nauseam, and generate lots of log data in the meanwhile.
Reporter | ||
Comment 4•10 years ago
|
||
Attachment #8378637 -
Flags: feedback?(nthomas)
Attachment #8378637 -
Flags: feedback?(jhopkins)
Attachment #8378637 -
Flags: feedback?(bhearsum)
Reporter | ||
Comment 5•10 years ago
|
||
Attachment #8378638 -
Flags: feedback?(nthomas)
Attachment #8378638 -
Flags: feedback?(jhopkins)
Attachment #8378638 -
Flags: feedback?(bhearsum)
Comment 6•10 years ago
|
||
Comment on attachment 8378637 [details] [diff] [review] jacuzzi client for nextSlave Looks good. As discussed, a couple of possible improvements: - fetch a single 'cache control' file to indicate that all caches need to be updated (fetched). - store all data in a single file (compressed) and poll that for changes Either of these changes would eliminate the need to poll many files.
Attachment #8378637 -
Flags: feedback?(jhopkins) → feedback+
Comment 7•10 years ago
|
||
Comment on attachment 8378638 [details] [diff] [review] jacuzzi integration for prioritizeBuilders Looks good. I might suggest adding a couple of comments indicating the intention of this code.
Attachment #8378638 -
Flags: feedback?(jhopkins) → feedback+
Comment 8•10 years ago
|
||
(In reply to John Hopkins (:jhopkins) from comment #6) > - store all data in a single file (compressed) and poll that for changes This might be something we can add in addition to the existing endpoints, but I think it's important to keep at least the /builders/* and /machines/* endpoints around so that other tools can make use of them. I suspect this is something that we'll tackle when we write the real server.
Comment 9•10 years ago
|
||
Comment on attachment 8378637 [details] [diff] [review] jacuzzi client for nextSlave Review of attachment 8378637 [details] [diff] [review]: ----------------------------------------------------------------- ::: misc.py @@ +331,5 @@ > + if exc_info: > + log.err() > + log.msg("Jacuzzi %i: %s" % (id(self), msg)) > + else: > + log.msg("Jacuzzi %i: %s" % (id(self), msg)) These messages might be a little confusing as is. If I saw "Jacuzzi 123456: blah blah" in a log I would think that 123456 is the id of a combination of builders/slaves. AFAICT, the same id will be used for a bunch of different jacuzzis. Maybe s/Jacuzzi/JacuzziAllocator/? @@ +343,5 @@ > + if self.allocated_cache: > + cache_time, slaves = self.allocated_cache > + if cache_time > time.time(): > + # TODO: This could get spammy > + self.log("fresh cache: %s" % slaves) I'm OK with spammy to start with. It's better than being too quiet. @@ +373,5 @@ > + # Check the cache for this builder > + self.log("checking cache for builder %s" % str(buildername)) > + c = self.cache.get(buildername) > + if c: > + self.log("cache hit") Is it still a cache it if the cache has expired? @@ +407,5 @@ > + self.cache[buildername] = (time.time() + self.CACHE_MAXAGE, slaves) > + # Filter the list of available slaves by the set the service > + # returned to us > + return [s for s in available_slaves if s.slave.slavename in slaves] > + except urllib2.HTTPError, e: You might be able to make this block simpler by passing available_slaves to get_allocated_slaves, and letting it filter. If you do that, I think you can get rid of the fake 404 stuff and just call it in the "if missing_cache" block above, and in here. @@ +414,5 @@ > + # of our list of available slaves > + if e.code == 404: > + try: > + if not fake_404: > + self.log("got 404; falling back to looking for allocated slaves") Unless I'm misreading, this isn't true. The only difference between a fake 404 and a real 404 is whether or not you cache anything in self.missing_cache. @@ +628,5 @@ > > +_nextAWSSlave_wait_sort = safeNextSlave(J(_nextAWSSlave(aws_wait=60, recentSort=True))) > +_nextAWSSlave_wait_sort_skip_spot = safeNextSlave(J(_nextAWSSlave(aws_wait=60, recentSort=True, > + skip_spot=True))) > +_nextAWSSlave_nowait = safeNextSlave(_nextAWSSlave()) Wow, such wraps.
Attachment #8378637 -
Flags: feedback?(bhearsum) → feedback+
Updated•10 years ago
|
Attachment #8378638 -
Flags: feedback?(bhearsum) → feedback+
Comment 10•10 years ago
|
||
Comment on attachment 8378637 [details] [diff] [review] jacuzzi client for nextSlave Review of attachment 8378637 [details] [diff] [review]: ----------------------------------------------------------------- +1 to the comments Ben made leading to simplification. Helper functions for common code blocks may help too. ::: misc.py @@ +342,5 @@ > + self.log("checking cache allocated slaves") > + if self.allocated_cache: > + cache_time, slaves = self.allocated_cache > + if cache_time > time.time(): > + # TODO: This could get spammy cache_time implies creation time to me, so maybe cache_expiry_time.
Attachment #8378637 -
Flags: feedback?(nthomas) → feedback+
Comment 11•10 years ago
|
||
Comment on attachment 8378638 [details] [diff] [review] jacuzzi integration for prioritizeBuilders Review of attachment 8378638 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla/master_common.py @@ +187,5 @@ > + try: > + # Filter the available slaves through the jacuzzi bubbles.. > + slaves = J.get_slaves(b[1].name, slaves) > + except Exception: > + twlog.err("handled exception talking to jacuzzi; trying to carry on") Deliberately twlog instead of log here ?
Attachment #8378638 -
Flags: feedback?(nthomas) → feedback+
Reporter | ||
Comment 12•10 years ago
|
||
Comment on attachment 8378638 [details] [diff] [review] jacuzzi integration for prioritizeBuilders Review of attachment 8378638 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla/master_common.py @@ +187,5 @@ > + try: > + # Filter the available slaves through the jacuzzi bubbles.. > + slaves = J.get_slaves(b[1].name, slaves) > + except Exception: > + twlog.err("handled exception talking to jacuzzi; trying to carry on") Yes. The log() method here doesn't handle exceptions.
Reporter | ||
Comment 13•10 years ago
|
||
Attachment #8378637 -
Attachment is obsolete: true
Attachment #8380643 -
Flags: review?(bhearsum)
Reporter | ||
Comment 14•10 years ago
|
||
Attachment #8378638 -
Attachment is obsolete: true
Attachment #8380645 -
Flags: review?(bhearsum)
Comment 15•10 years ago
|
||
Comment on attachment 8380645 [details] [diff] [review] jacuzzi integration for prioritizeBuilders Review of attachment 8380645 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine with your pastebin fix.
Attachment #8380645 -
Flags: review?(bhearsum) → review+
Comment 16•10 years ago
|
||
Comment on attachment 8380643 [details] [diff] [review] jacuzzi client for nextSlave Review of attachment 8380643 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me. Turn on the bubbles!
Attachment #8380643 -
Flags: review?(bhearsum) → review+
Reporter | ||
Comment 17•10 years ago
|
||
Comment on attachment 8380643 [details] [diff] [review] jacuzzi client for nextSlave https://hg.mozilla.org/build/buildbotcustom/rev/93c609f031a4
Attachment #8380643 -
Flags: checked-in+
Reporter | ||
Comment 18•10 years ago
|
||
Comment on attachment 8380645 [details] [diff] [review] jacuzzi integration for prioritizeBuilders https://hg.mozilla.org/build/buildbot-configs/rev/743f092b924b
Attachment #8380645 -
Flags: checked-in+
Comment 19•10 years ago
|
||
in production.
Comment 20•10 years ago
|
||
URL-encoding of builder names went into production.
Comment 21•10 years ago
|
||
Catlee mentioned yesterday that we should put b2g_b2g-inbound_hamachi_eng_dep in a jacuzzi and remove the non-eng one. I did that this morning.
Reporter | ||
Comment 22•10 years ago
|
||
I think we're all done here.
Reporter | ||
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•6 years ago
|
Component: General Automation → General
You need to log in
before you can comment on or make changes to this bug.
Description
•