Closed Bug 942545 Opened 6 years ago Closed 6 years ago
.js .gz not updating, all trees closed
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 22.214.171.124
State: CRITICAL
Date/Time: 11-24-2013 00:16:15
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:17:01 ago - 250539 bytes in 0.045 second response time

and the companion,

Sun 00:21:55 PST redis01.build.scl1.mozilla.com:procs - redis-server is CRITICAL: PROCS CRITICAL: 0 processes with regex args redis-server

All trees closed.
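For anyone unfamiliar with the first alert: a file-age check of this kind compares the Last-Modified header of the response with the current time. The sketch below is only an illustration, not the actual Nagios plugin. The function name and both thresholds are assumptions, since the configured limits are not stated in this bug.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical thresholds; the real check's limits are not given in this bug.
WARN_AGE = timedelta(minutes=10)
CRIT_AGE = timedelta(minutes=15)


def file_age_state(last_modified, now):
    """Classify an HTTP Last-Modified header value by how old it is.

    last_modified: header string such as "Sun, 24 Nov 2013 07:59:14 GMT"
    now: timezone-aware datetime to compare against
    Returns a (state, age) tuple.
    """
    modified = parsedate_to_datetime(last_modified)
    age = now - modified
    if age >= CRIT_AGE:
        return "CRITICAL", age
    if age >= WARN_AGE:
        return "WARNING", age
    return "OK", age
```

Under these assumed limits, an age of 0:17:01 like the one in the alert would be classified as CRITICAL.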
Adding release and john to this bug in case someone checks their email today. Just a note that the last outage, bug 936878, two weeks ago also happened on a Sunday, so whatever process is involved (I guess some weekly process) maybe should not run on a Sunday, when no one is around :)
What's going on here? Why is this not being acted on? John?
Severity: critical → blocker
Priority: P1 → --
I've restarted redis on redis01.build.m.o, but have to get on a plane now. Will verify when I can get online next.
Seems to have done the trick. We missed most of the things that died failing to get a signing token, since they were more than four hours ago, but a dying PGO on fx-team said we were successfully rebuilding builds-4hr.js.gz. Retriggered nightlies on m-c and aurora; killed the b2g nightlies on m-c, since they don't care about signing and apparently completed fine the first time. Trees reopened.
Severity: blocker → normal
Ok. The output of the weekly cron was:

Found redis running on pid 26858
Open files 316 in /proc, 325 via lsof
Stopping redis-server: [FAILED]
Starting redis-server: [ OK ]
cat: /var/run/redis/redis.pid: No such file or directory
/root/weekly_restart: line 20: test: : integer expression expected
redis confusion: pid_file=, pgrep=26858
Redis apparantly not running after restart

I believe hwine made some changes last time this happened, so we'll need to look further to see what might be causing this.
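The "integer expression expected" message is what a shell `test` prints when one side of a numeric comparison is an empty string, which fits the missing pid file reported on the line before it. The contents of /root/weekly_restart are not shown in this bug, so the following is only a rough sketch, in Python rather than shell, of a comparison that tolerates a missing pid file. All function names and status strings here are hypothetical.

```python
from pathlib import Path


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if the file is
    missing, unreadable, empty, or does not contain an integer."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None


def redis_status(pid_from_file, running_pids):
    """Compare the pid file value with the pids found in the process table.

    pid_from_file: int or None (as returned by read_pid)
    running_pids: list of ints, e.g. parsed from pgrep output
    """
    if not running_pids:
        return "not running"
    if pid_from_file is None:
        return "running, pid file missing"
    if pid_from_file in running_pids:
        return "running"
    return "running, pid file stale"
```

With the values from the output above (no pid file, pgrep=26858), `redis_status(None, [26858])` yields "running, pid file missing" instead of failing on the comparison.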
There has been discussion about redoing the redis service (moving off of kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593 for proposed future work.
(In reply to Amy Rich [:arich] [:arr] from comment #6)
> There has been discussion about redoing the redis service (moving off of
> kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593
> for proposed future work.

In addition to what Amy listed, I note other "make redis more stable" and "have better monitoring on redis" work by both RelEng and IT is being tracked in bug#905587, bug#905616.
IIRC we have removed the cronjob to do the restart. buildduty will be running the restart on Monday mornings until this is a stable process.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED