Closed Bug 905554 Opened 11 years ago Closed 11 years ago

buildapi/scripts/reporter.py hangs generating builds-4hr.js.gz, resulting in tbpl.m.o not showing completed builds

Categories

(Release Engineering :: General, defect)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dbaron, Assigned: nthomas)

References

Details

(Keywords: buildapi)

This may be similar to bug 810049, bug 821232, bug 827443.  Or maybe not.

Currently completed builds are not showing up on https://tbpl.mozilla.org/?tree=Mozilla-Inbound .  Builds that are running just disappear when they finish rather than showing up as completed, which means people aren't seeing results.

Trees will be closed shortly as a result.
Assignee: server-ops → ashish
A clear example is that:

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=31c08ca022b3 shows no completed builds (only gray)

https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-inbound/rev/31c08ca022b3 shows many completed jobs (in the "builds" section, below "pending" and "running")

If the first link shows plenty of green builds, then the problem is fixed (or at least that part of the queued-up problem is fixed).
Looks like the script that generates builds-4hr.js.gz is stuck; its last-modified time is:
15-Aug-2013 05:38

http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz

Note that builds-running.js has a last-modified time of 15-Aug-2013 07:59, so the server is on UTC+0.
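The gap between the two listing timestamps can be worked out as below. This is only a rough Python sketch: it assumes the `DD-Mon-YYYY HH:MM` format seen in the directory index, that both stamps are in the same timezone, and an English locale for the month abbreviation. The helper name `listing_lag` is made up for illustration and is not part of buildapi.

```python
from datetime import datetime, timedelta

# Timestamp format as it appears in the directory index, e.g. "15-Aug-2013 05:38".
# "%b" is locale-dependent; this assumes English month abbreviations.
LISTING_FORMAT = "%d-%b-%Y %H:%M"


def listing_lag(stale_stamp: str, fresh_stamp: str) -> timedelta:
    """Return how far the stale file's timestamp trails the fresh file's."""
    stale = datetime.strptime(stale_stamp, LISTING_FORMAT)
    fresh = datetime.strptime(fresh_stamp, LISTING_FORMAT)
    return fresh - stale


# builds-4hr.js.gz versus builds-running.js, using the values quoted above.
lag = listing_lag("15-Aug-2013 05:38", "15-Aug-2013 07:59")
print(lag)  # 2:21:00
```

So builds-4hr.js.gz was about 2 hours 21 minutes behind a file that was still being refreshed.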
(In reply to David Baron [:dbaron] (don't cc:, use needinfo? instead) from comment #0)
> This may be similar to bug 810049, bug 821232, bug 827443.  Or maybe not.

Those failure modes hopefully will not occur now that bug 801461 is fixed :-)
(Note to self: we have until 09:37 UTC+0 before we'll need to manually import TBPL data, since the 4 hr windows won't overlap)
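For reference, the deadline arithmetic looks roughly like this. It is a sketch that assumes the file covers exactly a 4-hour window ending at its last-modified time; `overlap_deadline` is a hypothetical helper, not something in buildapi.

```python
from datetime import datetime, timedelta

# Assumed span covered by builds-4hr.js.gz.
WINDOW = timedelta(hours=4)


def overlap_deadline(last_generated: datetime, window: timedelta = WINDOW) -> datetime:
    """Latest generation time whose window still touches the last good file."""
    return last_generated + window


# Last-modified time from the listing (minute resolution only).
last_good = datetime(2013, 8, 15, 5, 38)
print(overlap_deadline(last_good))  # 2013-08-15 09:38:00
```

That lands within about a minute of the 09:37 figure above; the exact cutoff depends on the newest build actually contained in the last good file, which the listing does not show.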
Indeed report-4hr.sh cron is stuck running since 00:05.

buildapi 11898  0.0  0.0   8704   996 ?        Ss   00:05   0:00  |   \_ /bin/sh /home/buildapi/bin/report-4hr.sh
buildapi 11901  0.0  1.1 164788 48384 ?        S    00:05   0:00  |       \_ /home/buildapi/bin/python /home/buildapi/src/buildapi/scripts/reporter.py -z -o /var/www/buildapi/buildjson/builds-4hr.js.gz --starttime 1376535901
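The `--starttime` value in that command line looks like Unix epoch seconds (an assumption; the reporter.py option is not documented in this bug). A quick Python sketch to decode it, also assuming the report spans 4 hours from that start; `describe_window` is an illustrative name only:

```python
from datetime import datetime, timedelta, timezone


def describe_window(starttime: int, hours: int = 4):
    """Decode an epoch-seconds start time and return (start, end) in UTC."""
    start = datetime.fromtimestamp(starttime, tz=timezone.utc)
    return start, start + timedelta(hours=hours)


start, end = describe_window(1376535901)
print(start.isoformat())  # 2013-08-15T03:05:01+00:00
print(end.isoformat())    # 2013-08-15T07:05:01+00:00
```

If that reading is right, the window would end at 07:05:01 UTC. That would line up with the 00:05 start shown in the process listing only if that host's clock runs at UTC-7, which is an inference rather than something the listing states.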
Status: NEW → ASSIGNED
Killed the process and removed the lock file. Subsequent run from cron has kicked off.
Not sure how long the script takes to run, but I don't think it's long. I think it may be hung again: builds-4hr.js.gz.tmp was created at 15-Aug-2013 08:27 UTC+0 with a file size of 0 bytes and hasn't changed since.
Ah, there is a log file; I missed that, oops.
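The "empty .tmp file that has stopped changing" symptom could be checked mechanically along these lines. This is a hedged sketch, not anything buildapi ships: the function name `looks_hung` and the 10-minute threshold are arbitrary assumptions, and the real run time of the script is unknown here.

```python
import os
import time
from typing import Optional


def looks_hung(tmp_path: str, max_age_seconds: float = 600.0,
               now: Optional[float] = None) -> bool:
    """True if tmp_path exists, is 0 bytes, and has not been modified recently."""
    try:
        st = os.stat(tmp_path)
    except FileNotFoundError:
        # No temporary file means no run is in progress (or it finished).
        return False
    current = time.time() if now is None else now
    return st.st_size == 0 and (current - st.st_mtime) > max_age_seconds
```

For example, `looks_hung("/var/www/buildapi/buildjson/builds-4hr.js.gz.tmp")` would return True for a zero-byte file last touched more than ten minutes ago (the path is taken from the `-o` argument in the process listing plus the `.tmp` suffix seen in the directory index).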

Could you see what ~/reporter-4hr.log contains?
(From IRC)

The log file hasn't been updated since 00:04.

Running reporter.py manually gave:
{
Thu Aug 15 01:51:13 2013
0.65 get builds
}

Last line of stdout was at:
https://hg.mozilla.org/build/buildapi/file/ce6ca3a6c23e/buildapi/scripts/reporter.py#l108
I've just seen bug 898688, don't know if it's related. That issue was supposed to be fixed by http://hg.mozilla.org/build/buildapi/rev/f988446ed820
Keywords: buildapi
Summary: tbpl.m.o not showing completed builds → buildapi/scripts/reporter.py hangs generating builds-4hr.js.gz, resulting in tbpl.m.o not showing completed builds
Component: Server Operations → Other
Product: mozilla.org → Release Engineering
QA Contact: shyam → joduinn
Flags: needinfo?(bugspam.Callek)
Flags: needinfo?(bhearsum)
Flags: needinfo?(nthomas)
It looks like the daily report generation script has hung too (builds-2013-08-14.js.gz.tmp and builds-2013-08-15.js.gz.tmp are present at http://builddata.pub.build.mozilla.org/buildjson/), but we can regenerate those later (they are less important than builds-4hr.js.gz).
:nthomas from Releng/Build has been engaged and is looking into the issue.
Assignee: ashish → nthomas
redis was hung in the same way as in bug 898739, so I've restarted the service. builds-4hr.js.gz has been generated; working on the others.
builds-2013-08-14.js.gz & builds-2013-08-15.js.gz are up to date (not actually used by tbpl), and edmorley says tbpl is working again.
Severity: blocker → normal
Component: Other → Tools
Flags: needinfo?(nthomas)
Flags: needinfo?(bugspam.Callek)
Flags: needinfo?(bhearsum)
QA Contact: joduinn → hwine
Trees reopened - thank you David, Ashish & Nick! :-)
Filed bug 905587 for the follow-up investigation/fix.
Severity: normal → blocker
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 912428
Blocks: 926246
Component: Tools → General