Closed Bug 674057 Opened 13 years ago Closed 13 years ago

Wait time reports show jobs with 90+ min waits

Categories

(Release Engineering :: General, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Details

Attachments

(2 files)

e.g. http://groups.google.com/group/mozilla.dev.tree-management/browse_frm/thread/912ac0b3ce6b250a#

tegra: 347
   0:  340   97.98%
  15:    0    0.00%
  30:    3    0.86%
  45:    0    0.00%
  60:    0    0.00%
  75:    0    0.00%
 90+:    4    1.15%

fedora: 636
   0:  464   72.96%
  15:    4    0.63%
  30:    3    0.47%
  45:    4    0.63%
  60:    1    0.16%
  75:    7    1.10%
 90+:  153   24.06%

I checked all the Tegra jobs for that day and none have a delay between buildrequests.submitted_at and builds.start_time of more than 1 minute.

I suspect what is happening is that http://mxr.mozilla.org/build/source/buildapi/buildapi/model/util.py#180 doesn't have the right patterns to match rebuilds from self-serve, so we aren't actually excluding rebuilds like the reports claim. I haven't managed to track one down to confirm that yet. The report also prefers the timestamp of the original push/tests sendchange over the rebuild timestamp (http://mxr.mozilla.org/build/source/buildapi/buildapi/model/waittimes.py#96), so we get long wait times.
A self-serve rebuild sets a 'reason' like this:

    Rebuilt by <email_address_from_ldap>

which isn't caught by

    WAITTIMES_BUILDSET_REASON_SQL_EXCLUDE = [
        "The web-page 'force build' button was pressed by %",
        "The web-page 'rebuild' button was pressed by %",
    ]

I'm not sure if that takes care of all 150-odd slow requests for fedora, but I suggest we deploy this and see.
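For illustration, a rough sketch of why the self-serve reason slips past the existing patterns. The helper names (like_to_regex, is_excluded), the example address, and the "Rebuilt by %" pattern are assumptions made for this example, not the contents of the attached patch. The match here is case-sensitive, whereas a real SQL LIKE depends on the database collation.

```python
import re

# The two patterns quoted above.
EXCLUDE_PATTERNS = [
    "The web-page 'force build' button was pressed by %",
    "The web-page 'rebuild' button was pressed by %",
]
# Assumption: a pattern of this shape would catch self-serve rebuilds.
PROPOSED_PATTERN = "Rebuilt by %"


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def is_excluded(reason, patterns):
    """True if the buildset reason matches any of the LIKE patterns."""
    return any(like_to_regex(p).match(reason) for p in patterns)


reason = "Rebuilt by someone@example.com"
print(is_excluded(reason, EXCLUDE_PATTERNS))                       # not caught today
print(is_excluded(reason, EXCLUDE_PATTERNS + [PROPOSED_PATTERN]))  # caught with the extra pattern
```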
Attachment #548296 - Flags: review?(catlee)
I'm half inclined to suggest that we always calculate the wait as builds.start_time - buildrequests.submitted_at, since then we can include rebuilds too. The downside is that it misses any delay in hg_push-poll-schedule and sendchange-schedule, and those are still delays from a developer's point of view. What do you think, catlee?
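A minimal sketch of that alternative calculation, assuming both columns hold epoch seconds and that the report's timeboxes are 15-minute floors capped at 90+. The function names and the bucketing rule are assumptions for illustration, not the code in waittimes.py.

```python
from collections import Counter


def wait_bucket(submitted_at, start_time, box_minutes=15, max_minutes=90):
    """Return the timebox (0, 15, ..., 90) for one job; 90 stands for '90+'."""
    # Clamp negative waits to zero in case of clock skew between hosts.
    wait_minutes = max(0, start_time - submitted_at) // 60
    bucket = (wait_minutes // box_minutes) * box_minutes
    return min(bucket, max_minutes)


def tally(jobs):
    """Count jobs per timebox; jobs is an iterable of (submitted_at, start_time)."""
    return Counter(wait_bucket(s, t) for s, t in jobs)


# Hypothetical jobs: one starts within a minute, one after 20 min, one after 3 h.
jobs = [(0, 30), (0, 20 * 60), (0, 180 * 60)]
for box, count in sorted(tally(jobs).items()):
    print(box, count)
```

Because this only looks at the build request and the build, a rebuild is measured from its own submission rather than from the original push.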
Attachment #548296 - Flags: review?(catlee) → review+
We're still getting distributions like comment #0, only fewer builds in 90+ (for a different day, so that's not a super strong statement). I've verified that the late-starting builds have a buildsets.reason of 'scheduler', so we are excluding rebuilds now. Instead we're getting changesets with DONTBUILD in the push comment, and the report thinks they start when they get merged into the next push.
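To illustrate the DONTBUILD case, a small sketch of filtering such changes out before wait times are computed. The dict layout and the "comments" key are hypothetical stand-ins for rows of the changes table; the real report is built from SQL queries.

```python
def drop_dontbuild(changes):
    """Return only the changes whose comment does not mention DONTBUILD."""
    return [c for c in changes if "DONTBUILD" not in (c.get("comments") or "")]


# Hypothetical rows: the second push asked not to be built.
changes = [
    {"revision": "aaa111", "comments": "Fix wait time report"},
    {"revision": "bbb222", "comments": "Docs only (DONTBUILD)"},
    {"revision": "ccc333", "comments": None},
]
print([c["revision"] for c in drop_dontbuild(changes)])
```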
Attachment #549454 - Flags: review?(catlee)
Assignee: nobody → nrthomas
Priority: -- → P2
Attachment #549454 - Flags: review?(catlee) → review+
Which turns this:

Wait time report for buildpool for jobs submitted between Wed, 27 Jul 2011 00:00:00 -0700 (PDT) and Thu, 28 Jul 2011 00:00:00 -0700 (PDT)
Total Jobs: 865
Wait Times
   0:  831   96.07%
  15:    2    0.23%
  30:   12    1.39%
  45:    2    0.23%
  60:    0    0.00%
  75:    0    0.00%
 90+:   18    2.08%

into this:

Total Jobs: 762
Wait Times
   0:  757   99.34%
  15:    1    0.13%
  30:    2    0.26%
  45:    2    0.26%

I'm surprised by the drop in Total Jobs, since I can only find two rows in the changes table where the comment contains DONTBUILD. I'll take a closer look at that in the next few days. The change is deployed in the meantime.
On no-outage days we're no longer seeing counts in the 90+ timebox that are separated by a gap from the nearest other populated timebox.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering