Closed Bug 964071 Opened 8 years ago Closed 8 years ago

Periodic PGO and non-unified builds shouldn't be running again on pushes that already have them

Categories

(Release Engineering :: General, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: catlee)

References

Details

(Keywords: regression)

Attachments

(1 file)

In theory, periodic builds call lastRevFunc with triggerBuildIfNoChanges=False in http://mxr.mozilla.org/build/source/buildbotcustom/misc.py#1081, and lastRevFunc calls getLastBuiltRevisions, finds where we did PGO/non-unified, and only schedules another job if there's a revision newer than that.

In fact, as the 42 PGO and non-unified builds we've scheduled so far on the tip of https://tbpl.mozilla.org/?tree=Birch show, it doesn't work.

Apparently a regression from the addition of non-unified builds (or, by complete coincidence, we broke the db at the same time). https://tbpl.mozilla.org/?tree=Services-Central&fromchange=a2774d4a7b7d&tochange=99a9927e3979 covers the relevant range. It starts, in the days before non-unified, with a push that sat on the tip for four days but only got one PGO build. It ends with the first push which got a non-unified build, which got two PGO and two non-unified builds before another push took over tip and got six.
Blocks: 957502
The other three times a day we do this are normal severity. The 18:00 run is different: we upload builds and logs and trigger tests somewhere around 19:30, just when bug 957502 hits. That makes it some unknown severity between major and blocker, depending on how much it contributes to a four-hour tree closure every night.
Severity: normal → major
The plea of time immemorial cries out, "how did this ever work?"

In our logs we have:
> 2014-01-27 06:02:42-0800 [-] lastChange returned d698d2058646b1e8886fb48eca8deed81591e1aa
> 2014-01-27 06:02:42-0800 [-] lastBuiltRevisions: [u'd698d2058646b1e8886fb48eca8deed81591e1aa', u'd698d2058646b1e8886fb48eca8deed81591e1aa', u'd698d2058646b1e8886fb48eca8deed81591e1aa', u'd698d2058646b1e8886fb48eca8deed81591e1aa', u'd698d2058646b1e8886fb48eca8deed81591e1aa']
> 2014-01-27 06:02:42-0800 [-] birch periodic: Creating buildset with sourcestamp ['d698d2058646b1e8886fb48eca8deed81591e1aa', "in 'projects/birch'"]

which seems strange, because lastGoodRev appears to know that it has already built the latest revision on the branch. I believe the problem lies with this code:
http://hg.mozilla.org/build/buildbotcustom/file/8f4ab71ba7d4/misc_scheduler.py#l213

It's an optimization that tries to avoid doing a DB lookup to find the latest revision of a given set of revisions if all the revisions are the same. Notice, though, that it only uses the first 12 characters of each revision string. If it determines that all the passed-in revisions are the same, it returns that 12-character prefix, 'd698d2058646' in this case, rather than the full 40-character revision that lastChange returned. This causes the test further down to fail:
http://hg.mozilla.org/build/buildbotcustom/file/8f4ab71ba7d4/misc_scheduler.py#l348

Given that, we would have been scheduling extra PGO builds (and now non-unified builds) for any revision that is the latest on its branch and for which all the other PGO builds (and now non-unified builds) have also completed.
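To illustrate the failure mode, here is a minimal sketch. It is not the buildbotcustom code: the function names are hypothetical, and it assumes the later test amounts to an equality comparison between the newest change and the value returned by the shortcut.

```python
def last_revision_shortcut(revisions):
    """Hypothetical stand-in for the shortcut described above.

    If every revision shares the same 12-character prefix, skip the DB
    lookup and return that prefix (this truncation is the suspected bug).
    """
    short = set(r[:12] for r in revisions)
    if len(short) == 1:
        return short.pop()
    return None  # the real code would fall back to a DB lookup here


def should_schedule(last_change, last_built_revisions):
    """Assumed shape of the later test: schedule unless the newest change
    is the one we already built."""
    return last_change != last_revision_shortcut(last_built_revisions)


full = "d698d2058646b1e8886fb48eca8deed81591e1aa"

# The 12-character prefix never equals the 40-character revision, so the
# scheduler concludes there is something new to build.
print(should_schedule(full, [full] * 5))  # True
```

With the values from the log above, the comparison is between 'd698d2058646' and the full hash, so a new buildset gets created on every periodic run.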
Assignee: nobody → catlee
So I think if we return the full revision (or one of the original revisions) here, we'll stop the insanity.
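A rough sketch of that idea follows. The function name is hypothetical and this is not the attached patch; it only shows the shortcut handing back one of the original revisions instead of the 12-character prefix.

```python
def last_revision_shortcut_fixed(revisions):
    """Hypothetical variant of the shortcut: still compare 12-character
    prefixes, but return one of the original, untruncated revisions."""
    short = set(r[:12] for r in revisions)
    if len(short) == 1:
        return revisions[0]
    return None  # the real code would fall back to a DB lookup here


full = "d698d2058646b1e8886fb48eca8deed81591e1aa"
last_built = last_revision_shortcut_fixed([full] * 5)

# The returned value now matches the newest change, so an equality test
# would see nothing new to build.
print(last_built == full)  # True
```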
Attachment #8365995 - Flags: review?(bhearsum)
Attachment #8365995 - Flags: review?(bhearsum) → review+
Attachment #8365995 - Flags: checked-in+
buildbotcustom patch is in production as of ~3pm PT! :)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 973695
Component: General Automation → General