Closed Bug 513056 Opened 15 years ago Closed 14 years ago

do hg.m.o polling outside of the Buildbot master process

Categories

(Release Engineering :: General, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [buildmasters][automation])

Attachments

(4 files, 4 obsolete files)

We need to pull out the polling of hg.m.o for a couple of reasons:
* Performance - the less work we have to do on the master, the better. Parsing so many pushlogs all the time is a suspected slow spot.
* Remove duplicate polling - since we separated the masters we poll mozilla-central and other repositories in multiple places. Once polling is centralized, away from the master we can remove that.

The idea is to poll every repository we need to, once, and then notify all appropriate masters of all changes. It would be good to only send wanted changes to each master but sending all changes to all masters would be fine too.
Depends on: 513058
Is this worth doing compared to the concept of a shared pushlog db for all affected repos? If we'd get some momentum behind that, we might be off easier, and more robust.

Not that I know how you've planned to fix this one, and how much risk that is.
I forgot to mention it in the summary, but it's something we need on the road to multiple master instances being fed jobs by the same scheduling system, too. Perf is the main concern at the moment, though.

I plan to get your review on the patches when they're ready, but I'm taking a simple approach, I think:
1) Factor out all polling code into base classes; have barebones classes that inherit them, and ChangeSource. These will be functionality equivalent to the current HgPoller and HgAllLocalesPoller classes.
2) Create subclasses of the new bases that will run outside of Buildbot and send the changes where we want them to goo.

Put simply:
The bases will be responsible for polling, the children will be responsible for passing the changes along.
I had to add Properties support to the Buildbot Change objects to make it possible to make it possible for the locale polling to work outside of the Buildbot process. http://github.com/djmitche/buildbot/commit/bcd04d730f38cd9303e7e22efce7e21778b2c35b
No longer depends on: 513058
Depends on: 513058
This patch passes the unit tests but is untested inside a master.

Needs a bit more work probably, but here's what's done so far:
* Separation of Buildbot specific code and general polling code. The latter is in the Base* classes, the former in Buildbot*
* removal changeHook() (because the Change object supports properties now)
* A few small things, including: remove unnecessary kwargs from HgLocalePoller, spacing fixes
* Update unit tests with new names

If anyone has time to look at this I'd appreciate it, but there's definitely no need for a real review yet.
Attached patch [wip] standalone hg pollers (obsolete) — Splinter Review
Same thing here, WIP, here's what I've done:
* Create pollers that use Sendchange to forward to the master.
* Create minimal config (ini format)

I still need support for queuing up changes when a master goes down and I want to find a better way to deal with shutdown (this is hard). Other than that, I believe it's feature complete. Needs more testing before a real review.
Depends on: 514242
This patch (and the incoming buildbot-configs one) is phase 1 of pulling out the pollers. At a high level, this patch moves all of the Buildbot specific code into subclasses, and leaves the HgPolling stuff in the bases. Renames have been made to most classes to match their new roles. The changeHook has disappeared in favour of the new Change Properties, which will also be used in the out of process pollers. Some minor changes have happened here too: spacing nits, removal of unnecessary constructor arguments, etc.
Attachment #397953 - Attachment is obsolete: true
Attachment #398199 - Flags: review?(ccooper)
Attachment #398199 - Flags: review?(catlee)
This one is pretty straightforward - just switches all the affected masters to using the new names for the pollers. Some unused imports were removed too.
Attachment #397955 - Attachment is obsolete: true
Attachment #398200 - Flags: review?(kairo)
Attachment #398200 - Flags: review?(gozer)
Attachment #398200 - Flags: review?(catlee)
Attached patch out of process HGPollers (obsolete) — Splinter Review
Here's my attempt at out of process polling. The highlights:
* .ini config file that should get the script to do all of the polling we currently do, and report the changes to all the right places.
* BaseSendchangeHgPoller.submitChange is called when there's a change that needs sending. It handles sending to all of its masters and supports a variable number of retries and customizable interval between retrying. Defaults to an infinite amount of retries spaced at 1 minute apart. I think the other two pollers are self explanatory.
* Handles shutting down on SIGTERM, SIGQUIT, or SIGINT. I don't know of a better way than sleeping to try to shut down cleanly.
* Logging is handled by twisted, configurable on the command line.

This is currently running out of ~cltbld/bhearsumblah/tools/buildfarm/hg on staging-master. You can have a look at the logs there if you want to. I've been watching the waterfall and comparing it to pm and pm02 - we haven't missed any changes that they picked up.
Attachment #398730 - Flags: review?(catlee)
This one is pretty straightforward. It probably seems silly to have this place the previous refactor patches but I don't want to have to back out *everything* if the oop polling script blows up.
Attachment #398731 - Flags: review?(catlee)
Attachment #398200 - Flags: review?(catlee) → review+
Comment on attachment 398731 [details] [diff] [review]
patch to disable HgPoller's in mozilla2* and tryserver configs

I like deleting code.
Attachment #398731 - Flags: review?(catlee) → review+
Comment on attachment 398730 [details] [diff] [review]
out of process HGPollers

Just a few nits...

>+        for s in self.senders:
>+            # The last argument here is the number of retries that have
>+            # happened so far. Needs to be tracked here in case we geadjustedChangeTime, properties        # overlapped failures

Something funky going on with the comment here.

>+    def failure(self, res, sender, change, adjustedChangeTime, properties,
>+                retries):
>+        if res.check(networkErrors) in networkErrors:

This is a bit awkward, it can probably be written as
    if res.check(networkErrors):
Attachment #398730 - Flags: review?(catlee) → review-
Comment on attachment 398200 [details] [diff] [review]
master side patch for poller refactor

Looking good from my end.
Attachment #398200 - Flags: review?(gozer) → review+
I've addressed your review comments in this version, Chris.
Attachment #398730 - Attachment is obsolete: true
Attachment #399716 - Flags: review?(catlee)
Attachment #398200 - Flags: review?(kairo) → review+
As said on the buildbot mailing list, http://thread.gmane.org/gmane.comp.python.buildbot.devel/4950, I continue to think that this is a design bug in the making. The problem that we're not having a centralized change management on hg.m.o should be fixed there, and not hacked around by making things more complex.
Attachment #398199 - Attachment is obsolete: true
Attachment #399745 - Flags: review?(catlee)
Attachment #398199 - Flags: review?(ccooper)
Attachment #398199 - Flags: review?(catlee)
Attachment #399745 - Flags: review?(catlee) → review+
Attachment #399716 - Flags: review?(catlee) → review+
Depends on: 515686
Comment on attachment 398200 [details] [diff] [review]
master side patch for poller refactor

changeset:   1497:16da32d6d680
Attachment #398200 - Flags: checked-in+
Comment on attachment 399745 [details] [diff] [review]
fix compare_attrs in BuildbotHgAllLocalesPoller

changeset:   407:eaf5fa3e5740
Attachment #399745 - Flags: checked-in+
Comment on attachment 399745 [details] [diff] [review]
fix compare_attrs in BuildbotHgAllLocalesPoller

Backed out because silly me forgot that these patches depend on Change Properties
Attachment #399745 - Flags: checked-in+ → checked-in-
Attachment #398200 - Flags: checked-in+ → checked-in-
Seems like this isn't going to happen soon.
Assignee: bhearsum → nobody
Component: Release Engineering → Release Engineering: Future
Status: ASSIGNED → NEW
Mass move of bugs from Release Engineering:Future -> Release Engineering. See
http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
Whiteboard: [buildmasters][automation]
Can we close this one, now that we do polling in the scheduler master?
Yeah, having a separate scheduler master is functionally equivalent. RESO FIXED, in my opinion.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: