Last Comment Bug 555664 - packaged tests/talos can pull older build when merging changes
: packaged tests/talos can pull older build when merging changes
Product: Release Engineering
Classification: Other
Component: Other (show other bugs)
: other
: x86 All
P3 normal (vote)
: ---
Assigned To: Chris AtLee [:catlee]
: 602372 627603 (view as bug list)
Depends on:
  Show dependency treegraph
Reported: 2010-03-29 02:38 PDT by Nick Thomas [:nthomas]
Modified: 2013-08-12 21:54 PDT (History)
9 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---

Reorder requests according to buildid (18.44 KB, patch)
2011-02-15 16:52 PST, Chris AtLee [:catlee]
no flags Details | Diff | Splinter Review
Reorder requests according to buildid (19.96 KB, patch)
2011-02-17 12:10 PST, Chris AtLee [:catlee]
bhearsum: review+
dustin: review+
catlee: checked‑in+
Details | Diff | Splinter Review

Description User image Nick Thomas [:nthomas] 2010-03-29 02:38:45 PDT
Consider the following:
* two pushes come in a few minutes apart (more than the treeStableTimer)
* the earlier push gets a slower build slave, the later one a faster slave (either through a more recent objdir, or ix vs VM, or xserve vs mini)
* so the later checkin finishes first
* test/talos is congested enough to merge the sendchanges for the two builds
* we use the last sendchange to look up the urls, which is the earlier revision, and hence don't test the most recent revision

I think we should test the most recent revision in any set of sendchanges, if we can figure out what that is somehow.
Comment 1 User image John Ford [:jhford] CET/CEST Berlin Time 2010-04-01 08:13:13 PDT
I looked into this and it is looking like the changes list on the test running side gives the time that the change was received on that master not the time the build was started (which makes sense).  

Two options that i can think of are putting something in the sendchange (comment field?) that we can use to figure out test running-side order and another would be to look through the change list on the test master and look for the url with the highest buildid.
Comment 2 User image Chris AtLee [:catlee] 2010-04-01 08:31:47 PDT
I think sendchange supports a 'when' parameter, doesn't it?
Comment 3 User image Chris AtLee [:catlee] 2011-01-20 18:32:26 PST
*** Bug 627603 has been marked as a duplicate of this bug. ***
Comment 4 User image Chris AtLee [:catlee] 2011-01-20 18:47:44 PST
Ehsan thinks this a big problem.  I think it's a problem, but not a huge problem, because of the 100s of pending jobs at crunch time, only a subset can be merged together.  e.g. 'Rev3 WINNT 6.1 mozilla-central opt test jsreftest' jobs can only be merged with other 'Rev3 WINNT 6.1 mozilla-central opt test jsreftest' jobs.  I'm going to swag that having more than 5 pending builds of any particular type is pretty rare.

In terms of fixing it, we may be able to use buildbot's nextBuild hook to sort the pending builds by long as we can get the changeset's pushtime quickly...Hitting hg.m.o every time we want to start a build is too slow.
Comment 5 User image Chris AtLee [:catlee] 2011-01-21 06:09:14 PST
Thinking more about this, I think what you really want is to make sure that the tests are considered in the same order as the builds started.

Builds start in FIFO order according to push time.  If there are multiple builds pending, we will run the latest.

So ordering test runs according to build start time preserves ordering by push time in the usual case.  It also allows for the fact that sometimes we do builds out of push-order due to manual triggering, or nightlies.  I think for these cases, it's better to have the tests run according to build start time rather than push time.  e.g. there are tests pending for revision B, but you want to re-do a build (or just tests) on revision A, which was pushed before B.  If you sort test jobs according to push time, then the new pending job for A would be sorted earlier than the pending job for B, and since we pick the "latest" job from the pile, the pending job for B would be run when we got a free slave.  

Instead, if you sort test jobs according to build start time (and treat manual re-tests w/o builds as having a build start time of the time when the re-test was requested), then the pending job for A sorts after B, and will be chosen as the next test to run.

That's the idea anyway...convincing buildbot to do this is another matter!
Comment 6 User image Chris AtLee [:catlee] 2011-02-15 16:52:21 PST
Created attachment 512660 [details] [diff] [review]
Reorder requests according to buildid

So the idea here is to use the buildid of the request as the key to sort requests by. More recent requests, with higher buildids, sort last, and so end up being built/tested according to buildbot's merging rules.

Dustin had an idea for how we could get rid of merging altogether and also solve this problem, but it's a bit more involved:
* disable treeStableTimer (easy)

* update buildbotcustom.misc.mergeRequests to not merge most things by default. requests older than N hours should be merged just so we don't end up with piles of stuff we can never get to. (medium)

* update buildbot-configs' build prioritization (in to sort builders by the time of last build, rather than the time of oldest request. Also figure out how to do this without starving try builds. (hard)

* add a nextBuild function to every builder so it chooses the latest build first instead of the earliest build first (medium).

* rework our tools/expectations to deal with higher number of pending requests

The desired outcome here is that we'd always do the latest pending build/test request, and then if we have time we can work our way backwards through 'skipped' requests.

Right now this seems like a huge scary change to me...but maybe I just have to sleep on it. The attached patch maintains the current merging behaviour, but ensures that test requests are ordered properly.
Comment 7 User image Chris AtLee [:catlee] 2011-02-17 12:10:38 PST
Created attachment 513211 [details] [diff] [review]
Reorder requests according to buildid
Comment 8 User image Chris AtLee [:catlee] 2011-03-07 14:48:15 PST
Comment on attachment 513211 [details] [diff] [review]
Reorder requests according to buildid
Comment 9 User image Chris AtLee [:catlee] 2011-03-08 08:20:51 PST
This should be fixed now!
Comment 10 User image Nick Thomas [:nthomas] 2012-08-21 14:48:04 PDT
*** Bug 602372 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.