Closed Bug 846342 Opened 11 years ago Closed 11 years ago

buildapi does not show all builds

Categories

(Release Engineering :: General, defect)

x86
All
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhopkins, Assigned: bhearsum)

Attachments

(1 file, 1 obsolete file)

I have run into two cases now where buildapi does not show all builds.  The two examples I have are for pending builds, but this may also affect running and recent build display.

The first instance was with pending builds not showing these:

+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+
| id       | buildsetid | buildername                           | priority | claimed_at | claimed_by_name                                                              | claimed_by_incarnation  | complete | results | submitted_at | complete_at |
+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+
| 20943228 |    5739719 | release-mozilla-beta-win32_repack_6/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 |    NULL |   1361409208 |        NULL | 
| 20943229 |    5739720 | release-mozilla-beta-win32_repack_5/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 |    NULL |   1361409225 |        NULL | 
| 20943230 |    5739721 | release-mozilla-beta-win32_repack_4/6 |        0 | 1361414621 | buildbot-master32.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid15377-boot1354829935 |        0 |    NULL |   1361409234 |        NULL | 
| 20943231 |    5739722 | release-mozilla-beta-win32_repack_3/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 |    NULL |   1361409239 |        NULL | 
| 20943232 |    5739723 | release-mozilla-beta-win32_repack_2/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 |    NULL |   1361409244 |        NULL | 
| 20943233 |    5739725 | release-mozilla-beta-win32_repack_1/6 |        0 | 1361414631 | buildbot-master30.srv.releng.scl3.mozilla.com:/builds/buildbot/build1/master | pid19550-boot1354823282 |        0 |    NULL |   1361409251 |        NULL | 
+----------+------------+---------------------------------------+----------+------------+------------------------------------------------------------------------------+-------------------------+----------+---------+--------------+-------------+

The second instance was with 'Thunderbird comm-aurora win32 l10n nightly' not showing in the pending builds list, even though a query showed those builds had the longest wait time.  I can see that is the case today as well:

select id, submitted_at, claimed_at, complete from buildrequests where buildername='Thunderbird comm-aurora win32 l10n nightly' and complete = 0 order by id desc;
+----------+--------------+------------+----------+
| id       | submitted_at | claimed_at | complete |
+----------+--------------+------------+----------+
| 21222627 |   1362059813 |          0 |        0 | 
| 21222626 |   1362059813 |          0 |        0 | 
| 21222625 |   1362059813 |          0 |        0 | 
| 21222624 |   1362059813 |          0 |        0 | 
| 21222623 |   1362059813 |          0 |        0 | 
| 21222622 |   1362059813 |          0 |        0 | 
| 21222621 |   1362059813 |          0 |        0 | 
| 21222620 |   1362059813 |          0 |        0 | 
| 21222619 |   1362059813 |          0 |        0 | 
| 21222618 |   1362059813 |          0 |        0 | 
| 21222617 |   1362059813 |          0 |        0 | 
| 21222616 |   1362059813 | 1362067304 |        0 | 
| 21222615 |   1362059813 |          0 |        0 | 
| 21222614 |   1362059813 |          0 |        0 | 
| 21222613 |   1362059813 | 1362067241 |        0 | 
| 21222612 |   1362059813 |          0 |        0 | 
| 21222611 |   1362059813 | 1362067252 |        0 | 
| 21222610 |   1362059813 |          0 |        0 | 
| 21222609 |   1362059813 |          0 |        0 | 
| 21222608 |   1362059813 | 1362067160 |        0 | 
| 21222607 |   1362059813 |          0 |        0 | 
| 21222606 |   1362059813 | 1362067265 |        0 | 
| 21222605 |   1362059813 |          0 |        0 | 
| 21222604 |   1362059813 |          0 |        0 | 
| 21222601 |   1362059812 | 1362067079 |        0 | 
| 21222600 |   1362059812 | 1362067092 |        0 | 
| 21222599 |   1362059812 | 1362067079 |        0 | 
| 21222592 |   1362059812 | 1362067092 |        0 | 
+----------+--------------+------------+----------+
28 rows in set (0.01 sec)

A search at https://secure.pub.build.mozilla.org/buildapi/pending?numbuilds=10000 for 'comm-aurora' does not show any 'Thunderbird comm-aurora win32 l10n nightly' matches.
It appears to me that the 'if ss.c.revision != None' condition is what causes these builds to be hidden (https://github.com/mozilla/build-buildapi/blob/master/buildapi/model/query.py#L102).
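To illustrate the suspected effect, here is a minimal plain-Python sketch. It is not the SQLAlchemy query in query.py; the row dicts, the first builder name, and the `visible_builds` helper are all made up for illustration. The point is only that any filter requiring a non-None revision silently drops l10n and release jobs.

```python
# Hypothetical pending-request rows; the l10n and release jobs carry no revision.
pending = [
    {"buildername": "Firefox mozilla-central linux build", "revision": "3772e15f1b45"},
    {"buildername": "Thunderbird comm-aurora win32 l10n nightly", "revision": None},
    {"buildername": "release-mozilla-beta-win32_repack_6/6", "revision": None},
]


def visible_builds(rows, require_revision=True):
    """Return the rows a display would show, optionally filtering on revision."""
    if require_revision:
        # Mirrors the spirit of the 'revision != None' condition.
        return [r for r in rows if r["revision"] is not None]
    return list(rows)


print(len(visible_builds(pending)))                          # 1
print(len(visible_builds(pending, require_revision=False)))  # 3
```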

I looked at a few builds:
http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/Firefox%20mozilla-central%20linux%20l10n%20nightly/builds/646
http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/release-mozilla-beta-android_repack_2%2F10/builds/1
http://buildbot-master61.srv.releng.use1.mozilla.com:8001/builders/release-mozilla-beta-firefox_source/builds/3

And none of them have a revision on their Change or SourceStamp.

I think it's about time that we get rid of this condition, and start showing _all_ builds on these displays. Nearly every morning sheriffs mention that builds aren't starting for platform X, and having l10n/release/other builds shown on these displays would make it easier for both them and us to confirm whether this is because of load, or because of some other problem.

Catlee, Nick - what say you?
Flags: needinfo?(nthomas)
Flags: needinfo?(catlee)
re /pending, /running, and /revision pages, the condition in GetBuilds() does look like the cause. This was probably a shortcut taken when the code was written. Are we talking about making sure a revision is set, or just setting something like 'None'? Note that a little later in the code we do things like revision[:12] and use revision as the key of a dict, so that would need adjusting. Other builds, like fuzzing and idle time jobs, will also start showing up, and you may find there is more of a problem with zombie running builds.

re /recent, I'm not sure what's hiding l10n nightlies as GetHistoricBuilds() doesn't have any particular limit there. Might be something to do with how builds get inserted into statusdb.

We don't have a buildapi staging instance, but one can build one on cruncher with a little effort.
Flags: needinfo?(nthomas)
Attached patch stop hiding things (obsolete)
The only real thing of note here is the four-line ugliness around getting the revision. Revision will always exist as a key AFAICT, but it will sometimes be None, so we can't use foo.get('revision', 'Unknown')[:12]: the default only applies when the key is missing, and you can't slice a NoneType.
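A minimal sketch of that None-safe lookup, assuming the result behaves like a dict. The `short_revision` helper name is hypothetical and this is not the code from the attached patch.

```python
def short_revision(result, length=12):
    """Return a display-safe short revision, or 'Unknown' if missing or None."""
    revision = result.get("revision")
    if revision is None:
        return "Unknown"
    return revision[:length]


row = {"revision": None}
# The key exists, so dict.get() ignores the default and returns None.
print(row.get("revision", "Unknown"))                           # None
print(short_revision(row))                                      # Unknown
print(short_revision({"revision": "3772e15f1b45" + "0" * 28}))  # 3772e15f1b45
```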

I also removed shadow central references because it's long dead.

I've left an instance running at http://cruncher.srv.releng.scl3.mozilla.com:55000/running, but I don't know if there'll be any useful jobs running when you look at this. Here are a couple of screencaps:
https://people.mozilla.com/~bhearsum/sattap/fdc7adb0.png
https://people.mozilla.com/~bhearsum/sattap/51d9e68a.png
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Attachment #777902 - Flags: review?(nthomas)
Comment on attachment 777902 [details] [diff] [review]
stop hiding things

Review of attachment 777902 [details] [diff] [review]:
-----------------------------------------------------------------

There's a bug here, so r-.

::: buildapi/model/query.py
@@ +167,5 @@
>          for r in query_results:
>              real_branch = GetBranchName(r['branch'])
> +            if not real_branch:
> +                real_branch = 'Unknown'
> +            revision = this_result.get('revision')

this_result doesn't exist in this else block; you've got a result object r instead. So requests to /pending end up as 500 errors.
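For illustration, a sketch of what the loop could look like when it reads from r. It assumes each result is dict-like; `get_branch_name` is a hypothetical stand-in for GetBranchName(), and the nested-dict grouping is an assumption rather than the real structure built in query.py.

```python
def get_branch_name(raw_branch):
    """Hypothetical stand-in for GetBranchName(): last path component, or None."""
    if not raw_branch:
        return None
    return raw_branch.rstrip("/").split("/")[-1]


def group_builds(query_results):
    """Group results by branch, then by short revision, with 'Unknown' fallbacks."""
    builds = {}
    for r in query_results:
        real_branch = get_branch_name(r["branch"])
        if not real_branch:
            real_branch = "Unknown"
        # Read from the loop variable r, not an undefined this_result.
        revision = r.get("revision")
        if revision is None:
            revision = "Unknown"
        else:
            revision = revision[:12]
        builds.setdefault(real_branch, {}).setdefault(revision, []).append(r)
    return builds
```

With this shape, a result with no branch and no revision lands under builds['Unknown']['Unknown'] instead of raising.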

btw, paster --reload is your friend when iterating on code.
Attachment #777902 - Flags: review?(nthomas) → review-
Please check /revision/mozilla-inbound/3772e15f1b45 works too.

I verified the difference in builds shown for /running looked good, and the rest of the code reviewed fine, so not much to do.
edmorley, will tbpl be OK if the json for pending and running builds starts having branches named 'Unknown', and sometimes revision 'Unknown' too? That's l10n, release and probably bundles etc. for the former, fuzzing jobs for the latter.
Flags: needinfo?(catlee)
Flags: needinfo?(emorley)
Yup that should be fine.

For branch 'Unknown', we'll handle it the same as if it were a new tree not yet added to TBPL, and for rev 'Unknown' we cross-reference the pending/running revs against those being shown in the current view, in a bunch of places:
https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l261
https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l327
https://hg.mozilla.org/webtools/tbpl/file/0218f0dd1194/js/Data.js#l390
...and TBPL has no concept that the rev is supposed to be a hash, so 'Unknown' will work fine (ie we'll just never match against it).
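The matching behaviour described above can be sketched as follows. This is Python rather than TBPL's JavaScript, and the `builds_for_view` helper and the build dicts are hypothetical; it only shows why a rev of 'Unknown' is harmless when revs are compared as opaque strings.

```python
def builds_for_view(pending_or_running, shown_revs):
    """Keep only builds whose rev matches a push currently shown in the view."""
    shown = set(shown_revs)
    return [b for b in pending_or_running if b["revision"] in shown]


jobs = [
    {"buildername": "some build", "revision": "3772e15f1b45"},
    {"buildername": "some fuzzing job", "revision": "Unknown"},
]
# 'Unknown' is never one of the shown revs, so that job simply never matches.
matched = builds_for_view(jobs, ["3772e15f1b45"])
print([b["buildername"] for b in matched])  # ['some build']
```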
Flags: needinfo?(emorley)
Attached patch fix revision
I tested a bunch more URLs with this:
http://cruncher.srv.releng.scl3.mozilla.com:55000/recent
http://cruncher.srv.releng.scl3.mozilla.com:55000/running
http://cruncher.srv.releng.scl3.mozilla.com:55000/pending
http://cruncher.srv.releng.scl3.mozilla.com:55000/reports/idlejobs
http://cruncher.srv.releng.scl3.mozilla.com:55000/revision/mozilla-inbound/3772e15f1b45

I also tried browsing around self serve a bit, but that failed because it _requires_ a cache, and memcached support is broken (bug 895916). I didn't want to use redis because I'm afraid of polluting the production cache.

I can do more testing around that if required.
Attachment #777902 - Attachment is obsolete: true
Attachment #778492 - Flags: review?(nthomas)
Comment on attachment 778492 [details] [diff] [review]
fix revision

lgtm, thanks for doing extra testing.
Attachment #778492 - Flags: review?(nthomas) → review+
Attachment #778492 - Flags: checked-in+
Thanks for the detailed review as always, Nick. Hopefully this patch will give sheriffs better insight when slaves get bogged down with l10n or other previously invisible jobs.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Hmm, just realized that I don't know how to deploy this change.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Seems that I don't need to do anything, https://secure.pub.build.mozilla.org/buildapi/running is showing l10n builds already - woot.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to Ben Hearsum [:bhearsum] from comment #11)
> Thanks for the detailed review as always, Nick. Hopefully this patch will
> give sheriffs better insight when slaves get bogged down with l10n or other
> previously invisible jobs.

Indeed - thank you! :-)
This got backed out in bug 898688.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Er, it got relanded already.
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering