Closed Bug 555664 Opened 14 years ago Closed 13 years ago

packaged tests/talos can pull older build when merging changes

Categories

(Release Engineering :: General, defect, P3)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: catlee)

References

Details

(Whiteboard: [automation])

Attachments

(1 file, 1 obsolete file)

Consider the following:
* two pushes come in a few minutes apart (more than the treeStableTimer)
* the earlier push gets a slower build slave, the later one a faster slave (either through a more recent objdir, or ix vs VM, or xserve vs mini)
* so the later checkin finishes first
* test/talos is congested enough to merge the sendchanges for the two builds
* we use the last sendchange to look up the urls, which is the earlier revision, and hence don't test the most recent revision

I think we should test the most recent revision in any set of sendchanges, if we can figure out what that is somehow.
Whiteboard: [automation]
I looked into this and it is looking like the changes list on the test running side gives the time that the change was received on that master not the time the build was started (which makes sense).  

Two options that i can think of are putting something in the sendchange (comment field?) that we can use to figure out test running-side order and another would be to look through the change list on the test master and look for the url with the highest buildid.
Priority: -- → P3
I think sendchange supports a 'when' parameter, doesn't it?
Ehsan thinks this a big problem.  I think it's a problem, but not a huge problem, because of the 100s of pending jobs at crunch time, only a subset can be merged together.  e.g. 'Rev3 WINNT 6.1 mozilla-central opt test jsreftest' jobs can only be merged with other 'Rev3 WINNT 6.1 mozilla-central opt test jsreftest' jobs.  I'm going to swag that having more than 5 pending builds of any particular type is pretty rare.

In terms of fixing it, we may be able to use buildbot's nextBuild hook to sort the pending builds by date...as long as we can get the changeset's pushtime quickly...Hitting hg.m.o every time we want to start a build is too slow.
Thinking more about this, I think what you really want is to make sure that the tests are considered in the same order as the builds started.

Builds start in FIFO order according to push time.  If there are multiple builds pending, we will run the latest.

So ordering test runs according to build start time preserves ordering by push time in the usual case.  It also allows for the fact that sometimes we do builds out of push-order due to manual triggering, or nightlies.  I think for these cases, it's better to have the tests run according to build start time rather than push time.  e.g. there are tests pending for revision B, but you want to re-do a build (or just tests) on revision A, which was pushed before B.  If you sort test jobs according to push time, then the new pending job for A would be sorted earlier than the pending job for B, and since we pick the "latest" job from the pile, the pending job for B would be run when we got a free slave.  

Instead, if you sort test jobs according to build start time (and treat manual re-tests w/o builds as having a build start time of the time when the re-test was requested), then the pending job for A sorts after B, and will be chosen as the next test to run.

That's the idea anyway...convincing buildbot to do this is another matter!
Assignee: nobody → catlee
So the idea here is to use the buildid of the request as the key to sort requests by. More recent requests, with higher buildids, sort last, and so end up being built/tested according to buildbot's merging rules.

Dustin had an idea for how we could get rid of merging altogether and also solve this problem, but it's a bit more involved:
* disable treeStableTimer (easy)

* update buildbotcustom.misc.mergeRequests to not merge most things by default. requests older than N hours should be merged just so we don't end up with piles of stuff we can never get to. (medium)

* update buildbot-configs' build prioritization (in master_common.py) to sort builders by the time of last build, rather than the time of oldest request. Also figure out how to do this without starving try builds. (hard)

* add a nextBuild function to every builder so it chooses the latest build first instead of the earliest build first (medium).

* rework our tools/expectations to deal with higher number of pending requests

The desired outcome here is that we'd always do the latest pending build/test request, and then if we have time we can work our way backwards through 'skipped' requests.

Right now this seems like a huge scary change to me...but maybe I just have to sleep on it. The attached patch maintains the current merging behaviour, but ensures that test requests are ordered properly.
Attachment #512660 - Attachment is obsolete: true
Attachment #513211 - Flags: review?(dustin)
Attachment #513211 - Flags: review?(bhearsum)
Attachment #513211 - Flags: review?(dustin) → review+
Attachment #513211 - Flags: review?(bhearsum) → review+
Comment on attachment 513211 [details] [diff] [review]
Reorder requests according to buildid

http://hg.mozilla.org/build/buildbotcustom/rev/2e15e3b5d96d
Attachment #513211 - Flags: checked-in+
This should be fixed now!
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: