Closed
Bug 846342
Opened 11 years ago
Closed 11 years ago
buildapi does not show all builds
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jhopkins, Assigned: bhearsum)
References
Details
Attachments
(1 file, 1 obsolete file)
3.61 KB, patch | nthomas: review+ | bhearsum: checked-in+
I have run into two cases now where buildapi does not show all builds. The two examples I have are for pending builds, but this may also affect running and recent build display.

The first instance was with pending builds not showing these:

+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                           | priority | claimed_at | claimed_by_name                                                              | claimed_by_incarnation  | complete | results | submitted_at | complete_at |
+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+
| 20943228 |    5739719 | release-mozilla-beta-win32_repack_6/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 | NULL    |   1361409208 | NULL        |
| 20943229 |    5739720 | release-mozilla-beta-win32_repack_5/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 | NULL    |   1361409225 | NULL        |
| 20943230 |    5739721 | release-mozilla-beta-win32_repack_4/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 | NULL    |   1361409234 | NULL        |
| 20943231 |    5739722 | release-mozilla-beta-win32_repack_3/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 | NULL    |   1361409239 | NULL        |
| 20943232 |    5739723 | release-mozilla-beta-win32_repack_2/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 | NULL    |   1361409244 | NULL        |
| 20943233 |    5739725 | release-mozilla-beta-win32_repack_1/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 | NULL    |   1361409251 | NULL        |
+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+

The second instance was with 'Thunderbird comm-aurora win32 l10n nightly' not showing in the pending builds list, even though a query showed those builds had the longest wait time. I can see that is the case today as well:

select id, submitted_at, claimed_at, complete from buildrequests where buildername='Thunderbird comm-aurora win32 l10n nightly' and complete = 0 order by id desc;

+----------+--------------+------------+----------+
| id       | submitted_at | claimed_at | complete |
+----------+--------------+------------+----------+
| 21222627 |   1362059813 |          0 |        0 |
| 21222626 |   1362059813 |          0 |        0 |
| 21222625 |   1362059813 |          0 |        0 |
| 21222624 |   1362059813 |          0 |        0 |
| 21222623 |   1362059813 |          0 |        0 |
| 21222622 |   1362059813 |          0 |        0 |
| 21222621 |   1362059813 |          0 |        0 |
| 21222620 |   1362059813 |          0 |        0 |
| 21222619 |   1362059813 |          0 |        0 |
| 21222618 |   1362059813 |          0 |        0 |
| 21222617 |   1362059813 |          0 |        0 |
| 21222616 |   1362059813 | 1362067304 |        0 |
| 21222615 |   1362059813 |          0 |        0 |
| 21222614 |   1362059813 |          0 |        0 |
| 21222613 |   1362059813 | 1362067241 |        0 |
| 21222612 |   1362059813 |          0 |        0 |
| 21222611 |   1362059813 | 1362067252 |        0 |
| 21222610 |   1362059813 |          0 |        0 |
| 21222609 |   1362059813 |          0 |        0 |
| 21222608 |   1362059813 | 1362067160 |        0 |
| 21222607 |   1362059813 |          0 |        0 |
| 21222606 |   1362059813 | 1362067265 |        0 |
| 21222605 |   1362059813 |          0 |        0 |
| 21222604 |   1362059813 |          0 |        0 |
| 21222601 |   1362059812 | 1362067079 |        0 |
| 21222600 |   1362059812 | 1362067092 |        0 |
| 21222599 |   1362059812 | 1362067079 |        0 |
| 21222592 |   1362059812 | 1362067092 |        0 |
+----------+--------------+------------+----------+
28 rows in set (0.01 sec)

A search at https://secure.pub.build.mozilla.org/buildapi/pending?numbuilds=10000 for 'comm-aurora' does not show any 'Thunderbird comm-aurora win32 l10n nightly' matches.
Assignee | ||
Comment 1•11 years ago
|
||
It appears to me that the 'if ss.c.revision != None' condition is what causes these builds to be hidden (https://github.com/mozilla/build-buildapi/blob/master/buildapi/model/query.py#L102). I looked at a few builds: http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/Firefox%20mozilla-central%20linux%20l10n%20nightly/builds/646 http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/release-mozilla-beta-android_repack_2%2F10/builds/1 http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/release-mozilla-beta-firefox_source/builds/3 And none of them have a revision on their Change or SourceStamp. I think it's about time that we get rid of this condition, and start showing _all_ builds on these displays. Nearly every morning sheriffs mention that builds aren't starting for platform X, and having l10n/release/other builds shown on these displays would make it easier for both them and us to confirm whether this is because of load, or because of some other problem. Catlee, Nick - what say you?
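The effect of that condition can be illustrated with a minimal sketch in plain Python. This is not the SQLAlchemy query in query.py: the row dictionaries, the builder name with a revision, the placeholder revision string, and the `visible_builds` helper are all hypothetical. Only the idea of dropping rows whose revision is None comes from the code linked above.

```python
# Hypothetical rows standing in for joined buildrequest/sourcestamp results.
# The revision value below is a made-up placeholder, not a real changeset.
ROWS = [
    {"buildername": "example builder with a revision", "revision": "0123456789ab"},
    {"buildername": "Firefox mozilla-central linux l10n nightly", "revision": None},
    {"buildername": "release-mozilla-beta-firefox_source", "revision": None},
]


def visible_builds(rows, require_revision=True):
    """Return the builder names that would be displayed.

    With require_revision=True this mimics the effect of a
    'revision != None' condition: rows without a revision are dropped.
    """
    return [
        row["buildername"]
        for row in rows
        if not require_revision or row["revision"] is not None
    ]


if __name__ == "__main__":
    # Only the row that carries a revision survives the filter.
    print(visible_builds(ROWS))
    # Without the filter, the l10n and release rows appear as well.
    print(visible_builds(ROWS, require_revision=False))
```

In this sketch the l10n and release rows vanish as soon as the revision check is applied, which matches the symptom reported in the description.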
Flags: needinfo?(nthomas)
Flags: needinfo?(catlee)
Comment 2•11 years ago
|
||
Re the /pending, /running, and /revision pages: the condition in GetBuilds() does look like the cause. It was probably a shortcut taken when this code was written. Are we talking about making sure a revision is set, or just setting something like 'None'? Note that a little later in the code we do things like revision[:12] and use revision as a dict key, so that would need adjusting. Other builds, like fuzzing and idle-time jobs, will also start showing up, and you may find there is more of a problem with zombie running builds.

Re /recent: I'm not sure what's hiding l10n nightlies, as GetHistoricBuilds() doesn't have any particular limit there. It might be something to do with how builds get inserted into statusdb.

We don't have a buildapi staging instance, but one can build one on cruncher with a little effort.
Flags: needinfo?(nthomas)
Comment 3•11 years ago
|
||
Some docs at https://wiki.mozilla.org/ReleaseEngineering/BuildAPI
Assignee | ||
Comment 4•11 years ago
|
||
The only real thing of note here is the four-line ugliness around getting the revision. Revision will always exist as a key AFAICT, but it will sometimes be None, so we can't use foo.get('revision', 'Unknown')[:12], because you can't slice a NoneType. I also removed the shadow central references because it's long dead.

I've left an instance running at http://cruncher.srv.releng.scl3.mozilla.com:55000/running, but I don't know if there'll be any useful jobs running when you look at this. Here are a couple of screencaps:
https://people.mozilla.com/~bhearsum/sattap/fdc7adb0.png
https://people.mozilla.com/~bhearsum/sattap/51d9e68a.png
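The dict.get() pitfall mentioned here can be shown with a short sketch. The `short_revision` helper and its `length` parameter are hypothetical names used for illustration; this is not the code from the attached patch.

```python
def short_revision(result, length=12):
    """Return a shortened revision for display, or 'Unknown' if there is none."""
    revision = result.get("revision")
    if revision is None:
        # Covers both a missing key and a key whose value is None.
        return "Unknown"
    return revision[:length]


if __name__ == "__main__":
    result = {"revision": None}

    # The default passed to get() is only used when the key is absent.
    # Here the key exists, so the stored None is returned instead.
    assert result.get("revision", "Unknown") is None

    try:
        result.get("revision", "Unknown")[:12]
    except TypeError as exc:
        print("slicing failed:", exc)

    print(short_revision(result))
    print(short_revision({"revision": "0123456789abcdef0123"}))
```

An explicit None check is one way to handle both the missing-key and the None-value cases; `(result.get("revision") or "Unknown")[:12]` would be a more compact alternative, at the cost of also replacing an empty string.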
Comment 5•11 years ago
|
||
Comment on attachment 777902
stop hiding things

Review of attachment 777902:
-----------------------------------------------------------------

There's a bug here, so r-.

::: buildapi/model/query.py
@@ +167,5 @@
> for r in query_results:
>     real_branch = GetBranchName(r['branch'])
> +    if not real_branch:
> +        real_branch = 'Unknown'
> +    revision = this_result.get('revision')

this_result doesn't exist in this else block, you've got a result object r instead. So requests to /pending end up as 500 errors.

btw, paster --reload is your friend when iterating on code.
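A corrected version of that loop might look like the sketch below. It is an illustration under assumptions, not the follow-up patch: `get_branch_name` is a hypothetical stand-in for GetBranchName(), `summarize` is an invented wrapper, and plain dictionaries are used in place of the real query result rows.

```python
def get_branch_name(long_name):
    """Hypothetical stand-in for GetBranchName(); returns None when unknown."""
    if not long_name:
        return None
    return long_name.rstrip("/").split("/")[-1]


def summarize(query_results):
    """Build (branch, revision) pairs, substituting 'Unknown' where needed."""
    summary = []
    for r in query_results:
        real_branch = get_branch_name(r["branch"])
        if not real_branch:
            real_branch = "Unknown"
        # Read from the loop variable r. A name such as this_result is not
        # defined in this scope and would raise NameError at request time.
        revision = r.get("revision")
        if revision is None:
            revision = "Unknown"
        summary.append((real_branch, revision[:12]))
    return summary


if __name__ == "__main__":
    rows = [
        {"branch": "integration/mozilla-inbound", "revision": "3772e15f1b45aaaa"},
        {"branch": None, "revision": None},
    ]
    for branch, revision in summarize(rows):
        print(branch, revision)
```

An undefined name only fails when the line is executed, which is consistent with the error surfacing as a 500 on /pending rather than at import time.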
Attachment #777902 -
Flags: review?(nthomas) → review-
Comment 6•11 years ago
|
||
Please check /revision/mozilla-inbound/3772e15f1b45 works too. I verified the difference in builds shown for /running looked good, and the rest of the code reviewed fine, so not much to do.
Comment 7•11 years ago
|
||
edmorley, will tbpl be OK if the JSON for pending and running builds starts having branches named 'Unknown', and sometimes revision 'Unknown' too? That's l10n, release and probably bundles etc. for the former, and fuzzing jobs for the latter.
Flags: needinfo?(catlee)
Updated•11 years ago
|
Flags: needinfo?(emorley)
Comment 8•11 years ago
|
||
Yup that should be fine. For branch 'Unknown', we'll handle the same as if it was a new tree not yet added to TBPL, and for rev 'Unknown' we cross-reference the pending/running revs against those being shown in the current view, in a bunch of places: https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l261 https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l327 https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l390 ...and TBPL has no concept that the rev is supposed to be a hash, so 'Unknown' will work fine (ie we'll just never match against it).
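The reason 'Unknown' is harmless can be sketched as a plain lookup. TBPL itself is JavaScript; the Python below is only an illustration of the cross-referencing idea, and the function name, job dictionaries, and builder names are hypothetical.

```python
def match_jobs_to_revisions(jobs, displayed_revisions):
    """Group job builder names under the displayed revision they belong to.

    Jobs whose revision is not one of the displayed revisions (including
    the placeholder 'Unknown') are returned separately and never matched.
    """
    matched = {rev: [] for rev in displayed_revisions}
    unmatched = []
    for job in jobs:
        rev = job["revision"]
        if rev in matched:
            matched[rev].append(job["buildername"])
        else:
            unmatched.append(job["buildername"])
    return matched, unmatched


if __name__ == "__main__":
    jobs = [
        {"buildername": "example opt build", "revision": "3772e15f1b45"},
        {"buildername": "example l10n nightly", "revision": "Unknown"},
    ]
    matched, unmatched = match_jobs_to_revisions(jobs, ["3772e15f1b45"])
    print(matched)
    print(unmatched)
```

Because the revision is treated as an opaque string, 'Unknown' simply fails the membership test and the job is left out of the per-push counts.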
Flags: needinfo?(emorley)
Assignee | ||
Comment 9•11 years ago
|
||
I tested a bunch more URLs with this:
http://cruncher.srv.releng.scl3.mozilla.com:55000/recent
http://cruncher.srv.releng.scl3.mozilla.com:55000/running
http://cruncher.srv.releng.scl3.mozilla.com:55000/pending
http://cruncher.srv.releng.scl3.mozilla.com:55000/reports/idlejobs
http://cruncher.srv.releng.scl3.mozilla.com:55000/revision/mozilla-inbound/3772e15f1b45

I also tried browsing around self serve a bit, but that failed because it _requires_ a cache, and memcached support is broken (bug 895916). I didn't want to use redis because I'm afraid of polluting the production cache. I can do more testing around that if required.
Attachment #777902 -
Attachment is obsolete: true
Attachment #778492 -
Flags: review?(nthomas)
Comment 10•11 years ago
|
||
Comment on attachment 778492
fix revision

lgtm, thanks for doing extra testing.
Attachment #778492 -
Flags: review?(nthomas) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #778492 -
Flags: checked-in+
Assignee | ||
Comment 11•11 years ago
|
||
Thanks for the detailed review as always, Nick. Hopefully this patch will give sheriffs better insight when slaves get bogged down with l10n or other previously invisible jobs.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 12•11 years ago
|
||
Hmm, just realized that I don't know how to deploy this change.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 13•11 years ago
|
||
Seems that I don't need to do anything, https://secure.pub.build.mozilla.org/buildapi/running is showing l10n builds already - woot.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 14•11 years ago
|
||
(In reply to Ben Hearsum [:bhearsum] from comment #11) > Thanks for the detailed review as always, Nick. Hopefully this patch will > give sheriffs better insight when slaves get bogged down with l10n or other > previously invisible jobs. Indeed - thank you! :-)
Assignee | ||
Comment 15•11 years ago
|
||
This got backed out in bug 898688.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 16•11 years ago
|
||
Er, it got relanded already.
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Comment 17•11 years ago
|
||
(http://hg.mozilla.org/build/buildapi/rev/c4e7469f95fd)
Updated•11 years ago
|
Product: mozilla.org → Release Engineering