Closed Bug 613429 Opened 15 years ago Closed 15 years ago

Builds scheduled with unimportant changes have the last unimportant change's revision used in the sourcestamp

Categories

(Release Engineering :: General, defect, P2)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: dustin)

Details

(Whiteboard: [buildbot])

Attachments

(1 file)

Scenario: User Bob pushes revision A, which is determined by the scheduler to be unimportant (e.g. by adding DONTBUILD to the comments) User Charlie pushes revision B, which is important. The scheduler triggers builds at the appropriate time. The build's sourcestamp has two changes, A and B. But the sourcestamp's revision is A (the unimportant one), however it's actually checking out B (the important one). Expected behaviour would be to have B as the sourcestamp's revision, since it's the more recent of the two. We get A as the sourcestamp because the unimportant changes come after the important ones when the build is being created here: http://hg.mozilla.org/build/buildbot/file/f189210bc7de/master/buildbot/schedulers/basic.py#l132 By the time the build comes around, it pulls the sourcestamp from the DB again, and so gets the changes in a different order (the proper order, specified here http://hg.mozilla.org/build/buildbot/file/f189210bc7de/master/buildbot/db/connector.py#l569). For some reason, the Mercurial step ignores the sourcestamp's revision and uses the last change's (http://hg.mozilla.org/build/buildbot/file/f189210bc7de/master/buildbot/steps/source.py#l1018) revision, so we end up checking out the correct revision. So this sucks mostly from a reporting standpoint. Tinderbox will think we've built revision B, but the DB thinks we've built revision A. The binaries are correct, and tests will be also associated with B. Easiest fix I can think of right now is to change that line in the scheduler to say all_changes = unimportant + important. I'd like to file an upstream ticket for this, but trac is dead.
Summary: Builds scheduled with unimportant changes have the unimportant changes revision used in the sourcestamp → Builds scheduled with unimportant changes have the last unimportant change's revision used in the sourcestamp
we could also do all_changes = sorted(important + unimportant, key=lambda c:c.changeid)
Assignee: nobody → catlee
Priority: -- → P2
This seems to affect all (or at least a lot of) multi-changeset pushes. The last three pushes to mozilla-central (ending with 6e4659bdc601) have all had their builds show up on http://build.mozilla.org/builds/running.html tagged with the first changeset in the push, not the last changeset. This confuses TBPL, making it not show gray letters for those builds. (and also confuses developers looking at running.html and find that their missing builds don't seem to be there -- at least not with the expected cset-ID)
(In reply to comment #2) > The last three pushes to mozilla-central (ending with 6e4659bdc601) have all > had their builds show up on http://build.mozilla.org/builds/running.html tagged > with the first changeset in the push, not the last changeset. Just to be clear -- by "first" I mean "oldest", and by "last" I mean "newest".
It's not clear if this is the same root cause or not. Symptoms are definitely the same.
I think comment #2 is unrelated, but I'm not 100% sure so I'll work in this bug for now. As an example of what the DB shows, on 'Rev3 Fedora 12x64 mozilla-central debug test mochitests-1/5', build 314 on buildbot-master1, there were two changes: 1) 61fbdf66e57974ce792bea262728c0fdeae219fa at Fri 19 Nov 2010 11:11:13 2) 16c2e141d4182698d46ea33a020192a4153c5f8a at Fri 19 Nov 2010 10:56:28 The DB has two separate sourcestamps and two separate buildsets/buildrequests for these. Since 2) is the later of the two in this ordering, its revision and build were used to test, even though it was the earlier of the two temporally. One interesting bit is that the two buildrequests were submitted at exactly the same time (11:11:15). This shouldn't happen because the schedulers run every minute, and there's a 15 minute gap between these changes. There are several instance of this message in the log: 2010-11-19 10:53:41-0800 [-] received a Change with when > now; assuming the change happened now The clock on the machine looks ok....
Comments #2 and on are a different problem (bug 613832). I've filed this upstream as http://buildbot.net/trac/ticket/1065.
I'm on it. Should we shoot for tomorrow's downtime?
Assignee: catlee → dustin
This just affects the main scheduler, so it can land any time. Restarting that master is safe.
Attached patch 613429-r1.patchSplinter Review
Sorry if this didn't need review.
Attachment #492700 - Flags: review?(catlee)
Attachment #492700 - Flags: review?(catlee) → review+
Attachment #492700 - Flags: checked-in+
Landed on pm01 and pm02, and scheduler-master and tests-scheduler were restarted. Let's see what happens next time some DONTBUILD builds land, eh?
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: