Closed
Bug 674057
Opened 13 years ago
Closed 13 years ago
Wait time reports show jobs with 90+ min waits
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: nthomas)
Details
Attachments
(2 files)
555 bytes, patch | catlee: review+ | nthomas: checked-in+
913 bytes, patch | catlee: review+ | nthomas: checked-in+
eg: http://groups.google.com/group/mozilla.dev.tree-management/browse_frm/thread/912ac0b3ce6b250a#
tegra: 347
0: 340 97.98%
15: 0 0.00%
30: 3 0.86%
45: 0 0.00%
60: 0 0.00%
75: 0 0.00%
90+: 4 1.15%
fedora: 636
0: 464 72.96%
15: 4 0.63%
30: 3 0.47%
45: 4 0.63%
60: 1 0.16%
75: 7 1.10%
90+: 153 24.06%
I checked all the Tegra jobs for that day and none have a delay between buildrequests.submitted_at and builds.start_time of more than 1 minute. I suspect what is happening is that
http://mxr.mozilla.org/build/source/buildapi/buildapi/model/util.py#180
doesn't have the right patterns to match rebuilds from self-serve, so we aren't actually excluding rebuilds like the reports claim. Haven't managed to track one down to confirm that yet.
The report also prefers the original push/tests sendchange instead of the rebuild timestamp
http://mxr.mozilla.org/build/source/buildapi/buildapi/model/waittimes.py#96
so we get long times for the waits.
Assignee | ||
Comment 1•13 years ago
|
||
A self-serve rebuild sets a 'reason' like this
Rebuilt by <email_address_from_ldap>
which isn't caught by
WAITTIMES_BUILDSET_REASON_SQL_EXCLUDE = [
"The web-page 'force build' button was pressed by %",
"The web-page 'rebuild' button was pressed by %",
]
I'm not sure if that takes care of all 150-odd slow requests for fedora, but I suggest we deploy this and see.
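A minimal sketch of why the self-serve reason slips through, assuming the exclude list is applied as SQL LIKE patterns (the `_SQL_EXCLUDE` name suggests so). The `"Rebuilt by %"` entry, the `like_to_regex` and `is_excluded` helpers, and the example address are all hypothetical and are not taken from the attached patch:

```python
import re

# The first two patterns are quoted from the comment above.
# The third is an assumed addition, shown only for illustration.
EXCLUDE_PATTERNS = [
    "The web-page 'force build' button was pressed by %",
    "The web-page 'rebuild' button was pressed by %",
    "Rebuilt by %",  # assumption: not copied from the real patch
]


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$", re.DOTALL)


def is_excluded(reason, patterns=EXCLUDE_PATTERNS):
    """Return True if the buildset reason matches any exclude pattern."""
    return any(like_to_regex(p).match(reason) for p in patterns)


# With only the two original patterns, a self-serve rebuild is not excluded.
print(is_excluded("Rebuilt by someone@example.com", EXCLUDE_PATTERNS[:2]))
# With the assumed extra pattern, it is.
print(is_excluded("Rebuilt by someone@example.com"))
```

This matching is case-sensitive; whether LIKE is case-sensitive in the real database depends on its collation.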
Attachment #548296 -
Flags: review?(catlee)
Assignee | ||
Comment 2•13 years ago
|
||
I'm half inclined to suggest that we always calculate the wait as
builds.start_time - buildrequests.submitted_at
since then we can include rebuilds too. The downside is that it misses any delay in hg_push-poll-schedule and sendchange-schedule, and those are still delays from a developer's point of view. What do you think, catlee?
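For illustration, here is a rough sketch of that alternative: take the wait as the time between request submission and build start, then sort it into the 15-minute timeboxes the report uses. It assumes both timestamps are epoch seconds; `wait_bucket` and `distribution` are hypothetical names, not functions from buildapi:

```python
from collections import Counter


def wait_bucket(submitted_at, start_time, width_min=15, cap_min=90):
    """Return the timebox (0, 15, ..., 90) for one job.

    Both arguments are assumed to be epoch seconds. Waits of cap_min
    minutes or more all land in the final "90+" box.
    """
    wait_min = max(0, start_time - submitted_at) / 60.0
    return min(int(wait_min // width_min) * width_min, cap_min)


def distribution(jobs):
    """Map each populated timebox to (count, percentage of all jobs).

    jobs is an iterable of (submitted_at, start_time) pairs.
    """
    counts = Counter(wait_bucket(s, t) for s, t in jobs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {box: (counts[box], 100.0 * counts[box] / total)
            for box in sorted(counts)}


# Made-up sample data: waits of 30 seconds, about 17 minutes, and 100 minutes.
sample = [(0, 30), (0, 1000), (0, 6000)]
for box, (count, pct) in distribution(sample).items():
    print("%d: %d %.2f%%" % (box, count, pct))
```

Because it only looks at the request and the build, this measure treats rebuilds like any other job, but it cannot see polling or sendchange delays that happen before the request is created.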
Updated•13 years ago
|
Attachment #548296 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 3•13 years ago
|
||
Comment on attachment 548296 [details] [diff] [review]
Extend list of reasons
http://hg.mozilla.org/build/buildapi/rev/12f24be2450d
Attachment #548296 -
Flags: checked-in+
Assignee | ||
Comment 4•13 years ago
|
||
We're still getting distributions like comment #0, only fewer builds in 90+ (for a different day, so that's not a super strong statement). I've verified that the late-starting builds have a buildsets.reason of 'scheduler', so we are excluding rebuilds now. Instead we're getting changesets with DONTBUILD in the push comment, and the report thinks they start when they get merged into the next push.
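A hedged sketch of the effect described above, not the attached patch: if a build's start is measured against the earliest change it contains, a DONTBUILD change that only gets built when it is merged into the next push inflates the wait. Skipping such changes when choosing the reference time avoids that. The `effective_submit_time` helper, the tuple layout, and the sample data are assumptions for illustration:

```python
def effective_submit_time(changes):
    """Return the earliest timestamp among changes that request a build.

    changes is a list of (timestamp, comment) pairs, with timestamps
    assumed to be epoch seconds. Changes whose comment contains
    DONTBUILD are ignored. Returns None if every change is DONTBUILD.
    """
    times = [when for when, comment in changes if "DONTBUILD" not in comment]
    return min(times) if times else None


# Made-up example: a DONTBUILD push at t=100, merged into a push at t=5500,
# with the build starting at t=5560.
changes = [(100, "Update docs, DONTBUILD"), (5500, "Fix the widget")]
start_time = 5560

naive_wait = start_time - min(when for when, _ in changes)
adjusted_wait = start_time - effective_submit_time(changes)
print(naive_wait // 60, "minutes vs", adjusted_wait // 60, "minute")
```

The substring check is case-sensitive here; how the real report matches DONTBUILD is not shown in this bug.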
Attachment #549454 -
Flags: review?(catlee)
Assignee | ||
Updated•13 years ago
|
Assignee: nobody → nrthomas
Priority: -- → P2
Updated•13 years ago
|
Attachment #549454 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 5•13 years ago
|
||
Comment on attachment 549454 [details] [diff] [review]
Exclude DONTBUILD
http://hg.mozilla.org/build/buildapi/rev/fc5ae653d197
Attachment #549454 -
Flags: checked-in+
Assignee | ||
Comment 6•13 years ago
|
||
Which turns this:
Wait time report for buildpool for jobs submitted between Wed, 27 Jul 2011
00:00:00 -0700 (PDT) and Thu, 28 Jul 2011 00:00:00 -0700 (PDT)
Total Jobs: 865
Wait Times
0: 831 96.07%
15: 2 0.23%
30: 12 1.39%
45: 2 0.23%
60: 0 0.00%
75: 0 0.00%
90+: 18 2.08%
into this:
Total Jobs: 762
Wait Times
0: 757 99.34%
15: 1 0.13%
30: 2 0.26%
45: 2 0.26%
I'm surprised by the drop in Total Jobs, since I can only find two rows in the changes table whose comment contains DONTBUILD. I'll take a closer look at that in the next few days. The change is deployed in the meantime.
Assignee | ||
Comment 7•13 years ago
|
||
On no-outage days we're no longer seeing counts in 90+ that are separated by a gap from the nearest other populated timebox.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering