Bug 942545
Opened 11 years ago
Closed 11 years ago
builds-4hr.js.gz not updating, all trees closed
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
References
Details
Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 11-24-2013 00:16:15
Additional Info:
HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:17:01 ago - 250539 bytes in 0.045 second response time
and the companion,
Sun 00:21:55 PST [4308] redis01.build.scl1.mozilla.com:procs - redis-server is CRITICAL: PROCS CRITICAL: 0 processes with regex args redis-server
All trees closed.
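As a rough illustration of what an "http file age" check like the one above does, here is a minimal sketch that classifies a file's age from its HTTP Last-Modified header. The function name, the thresholds, and the example timestamps are assumptions made for illustration; the real check's configuration is not stated in this bug.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Tuple

# Hypothetical thresholds; the actual limits of the monitoring check are unknown.
WARN_AFTER = timedelta(minutes=10)
CRIT_AFTER = timedelta(minutes=15)


def file_age_status(last_modified: str, now: datetime) -> Tuple[str, timedelta]:
    """Classify a file's age from the value of its HTTP Last-Modified header."""
    modified = parsedate_to_datetime(last_modified)
    if modified.tzinfo is None:
        # Treat a header without an explicit zone as UTC.
        modified = modified.replace(tzinfo=timezone.utc)
    age = now - modified
    if age >= CRIT_AFTER:
        return "CRITICAL", age
    if age >= WARN_AFTER:
        return "WARNING", age
    return "OK", age


if __name__ == "__main__":
    # Illustrative timestamps only, chosen to give an age of 17 minutes 1 second.
    now = datetime(2013, 11, 24, 8, 16, 15, tzinfo=timezone.utc)
    status, age = file_age_status("Sun, 24 Nov 2013 07:59:14 GMT", now)
    print(f"HTTP {status}: Last modified {age} ago")
```

With builds-4hr.js.gz normally regenerated every few minutes, a file that has not changed for 17 minutes trips the critical threshold in this sketch, which matches the shape of the alert quoted above.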
Comment 1•11 years ago
Adding release and John to this bug in case someone checks email today.
Just a note: the last outage, bug 936878, two weeks ago also happened on a Sunday, so whatever process is involved (I guess some weekly process) maybe should not run on a Sunday, when no one may be around :)
Comment 2•11 years ago
What's going on here? Why is this not being acted on? John?
Flags: needinfo?(joduinn)
Updated•11 years ago
Severity: blocker → critical
Priority: -- → P1
Reporter
Updated•11 years ago
Severity: critical → blocker
Priority: P1 → --
Comment 3•11 years ago
I've restarted redis on redis01.build.m.o, but have to get on a plane now. Will verify when I can get online next.
Reporter
Comment 4•11 years ago
Seems to have done the trick. We missed most of the things that died failing to get a signing token, since they were more than four hours ago, but a dying PGO build on fx-team said we were successfully rebuilding builds-4hr.js.gz. Retriggered nightlies on m-c and aurora; killed the b2g nightlies on m-c, since they don't care about signing and apparently completed fine the first time. Trees reopened.
Severity: blocker → normal
Comment 5•11 years ago
Ok. The output of the weekly cron was:
Found redis running on pid 26858
Open files 316 in /proc, 325 via lsof
Stopping redis-server: [FAILED]
Starting redis-server: [ OK ]
cat: /var/run/redis/redis.pid: No such file or directory
/root/weekly_restart: line 20: test: : integer expression expected
redis confusion: pid_file=, pgrep=26858
Redis apparantly not running after restart
I believe hwine made some changes last time this happened, so we'll need to look further to see what might be causing this.
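The "integer expression expected" error in the output above is what a shell `test` reports when it compares an empty string as a number, here because the pid file was missing. As a minimal sketch of a more defensive post-restart check (not the actual /root/weekly_restart script, whose contents are not shown in this bug), the following treats a missing or empty pid file as "no pid" rather than passing an empty value to a numeric comparison. The function names and status strings are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the pid stored in pid_file, or None if it is missing, empty, or invalid."""
    try:
        text = pid_file.read_text().strip()
    except OSError:
        return None  # e.g. "No such file or directory"
    try:
        pid = int(text)
    except ValueError:
        return None  # empty or non-numeric content
    return pid if pid > 0 else None


def check_restart(old_pid: int, pid_file_pid: Optional[int], running: Iterable[int]) -> str:
    """Describe the state after a restart attempt.

    old_pid      -- pid seen before the restart
    pid_file_pid -- result of read_pid() after the restart
    running      -- pids currently matching the server (e.g. from pgrep)
    """
    running_pids = set(running)
    if not running_pids:
        return "not running after restart"
    if old_pid in running_pids:
        return "old process still running; stop failed"
    if pid_file_pid is None:
        return "running, but pid file is missing or unreadable"
    if pid_file_pid not in running_pids:
        return "pid file does not match any running process"
    return "ok"


if __name__ == "__main__":
    # Values taken from the cron output above: no pid file, pgrep still reports 26858.
    pid = read_pid(Path("/var/run/redis/redis.pid"))
    print(check_restart(old_pid=26858, pid_file_pid=pid, running=[26858]))
```

Applied to the values in the cron output (old pid 26858 still present, pid file absent), this sketch would report that the stop step failed instead of the ambiguous "redis confusion" message.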
Comment 6•11 years ago
There has been discussion about redoing the redis service (moving off of kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593 for proposed future work.
Comment 7•11 years ago
(In reply to Amy Rich [:arich] [:arr] from comment #6)
> There has been discussion about redoing the redis service (moving off of
> kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593
> for proposed future work.
In addition to what Amy listed, other "make redis more stable" and "have better monitoring on redis" work by both RelEng and IT is being tracked in bug 905587 and bug 905616.
Flags: needinfo?(joduinn)
Comment 8•11 years ago
IIRC we have removed the cronjob to do the restart.
buildduty will be running the restart on Monday mornings until this is a stable process.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard