Closed Bug 936878 Opened 11 years ago Closed 11 years ago

/buildjson/builds-4hr.js.gz is CRITICAL ** not updating

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P1)

x86
All

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cbook, Assigned: nthomas)

References

Details

via nagios:

Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 11-10-2013 00:18:02
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:18:51 ago - 195813 bytes in 0.036 second response time

Not closing the trees at this time because they are already closed due to bug 936827.
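For readers unfamiliar with this kind of alert, the following is a minimal sketch of an "http file age" style check: it compares a Last-Modified header value against warning and critical thresholds. It is not the actual nagios plugin or its configuration. The function name `file_age_state`, the thresholds, and the timestamps in the usage lines are all assumptions chosen only so that the computed age matches the 0:18:51 reported above.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def file_age_state(last_modified, now, warn, crit):
    """Classify a file's age from its HTTP Last-Modified header value.

    last_modified: header string, e.g. "Sun, 10 Nov 2013 07:59:11 GMT"
    now:           timezone-aware datetime to compare against
    warn, crit:    timedelta thresholds (hypothetical; the real check's
                   thresholds are not stated in this bug)
    """
    modified = parsedate_to_datetime(last_modified)
    age = now - modified
    if age >= crit:
        return "CRITICAL", age
    if age >= warn:
        return "WARNING", age
    return "OK", age


# Illustrative values only: timestamps picked to give an age of 0:18:51.
now = datetime(2013, 11, 10, 8, 18, 2, tzinfo=timezone.utc)
state, age = file_age_state(
    "Sun, 10 Nov 2013 07:59:11 GMT",
    now,
    warn=timedelta(minutes=10),
    crit=timedelta(minutes=15),
)
print(state, age)  # CRITICAL 0:18:51
```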
I fixed this up. The weekly restart of the redis server failed:

On 10/11/13 9:00 PM, Cron Daemon wrote:
> Found redis running on pid 2789
> Open files 295 in /proc, 304 via lsof
> Stopping redis-server: [FAILED]
> Starting redis-server: [ OK ]
> cat: /var/run/redis/redis.pid: No such file or directory
> /root/weekly_restart: line 20: test: : integer expression expected
> redis confusion: pid_file=, pgrep=2789
> Redis apparantly not running after restart
>

From redis01:/var/log/redis/redis.log:

[2789] 09 Nov 23:57:18 * Background saving terminated with success
[2789] 10 Nov 00:00:01 # Received SIGTERM, scheduling shutdown...
[2789] 10 Nov 00:00:01 # User requested shutdown...
[2789] 10 Nov 00:00:01 * Saving the final RDB snapshot before exiting.
[575] 10 Nov 00:00:07 # Opening port 6379: bind: Address already in use
[1128] 10 Nov 00:27:09 * Server started, Redis version 2.4.10

I.e. PID 575 failed to start because PID 2789 had not finished shutting down and still held port 6379, which left nothing running once 2789 eventually exited. I restarted redis manually.

Please file these as blockers rather than critical, as it's better not to change the priority after the other blocker is closed, and I'd argue this is more important anyway (global tbpl reporting).
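The race described above (the new server being started while the old PID is still saving its final snapshot) can be avoided by waiting for the old process to exit before starting the replacement. The following is a minimal sketch of that idea, not the contents of /root/weekly_restart, which are not shown in this bug. The helper names `pid_alive` and `wait_for_exit`, the timeout, and the poll interval are all hypothetical, and the code assumes a POSIX system.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID exists (POSIX only).

    Signal 0 performs the existence and permission checks without
    actually delivering a signal.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_for_exit(pid, timeout, poll=0.5, is_alive=pid_alive):
    """Poll until `pid` is gone; return False if `timeout` seconds pass first.

    `is_alive` is injectable so the loop can be exercised without
    touching real processes.
    """
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True
```

A restart wrapper built on this would send the stop request, call `wait_for_exit(old_pid, timeout=...)`, and only start the new server (or raise an alert) once the result is known, rather than starting immediately after a stop that has not completed.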
Assignee: nobody → nthomas
Severity: critical → blocker
Status: NEW → RESOLVED
Closed: 11 years ago
Priority: -- → P1
Resolution: --- → FIXED
It looks like this failed again. Why wasn't the startup script fixed?
There has been discussion about redoing the redis service (moving off of kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593 for proposed future work.
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard