Closed Bug 942545 Opened 12 years ago Closed 12 years ago

builds-4hr.js.gz not updating, all trees closed

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

Phil Ringnalda (:philor)

Reporter

Description

•

12 years ago

Notification Type: PROBLEM
Service: http file age - /buildjson/builds-4hr.js.gz
Host: builddata.pub.build.mozilla.org
Address: 63.245.215.57
State: CRITICAL
Date/Time: 11-24-2013 00:16:15
Additional Info: HTTP CRITICAL: HTTP/1.1 200 OK - Last modified 0:17:01 ago - 250539 bytes in 0.045 second response time

and the companion,

Sun 00:21:55 PST [4308] redis01.build.scl1.mozilla.com:procs - redis-server is CRITICAL: PROCS CRITICAL: 0 processes with regex args redis-server

All trees closed.
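For readers unfamiliar with the "http file age" alert, here is a minimal sketch of the idea: compare the file's Last-Modified header with the current time and escalate once it is too stale. The function name `file_age_state`, the two thresholds, and the sample timestamps are assumptions made for illustration; the limits the real check used are not stated in this bug.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical thresholds; the real check's limits are not given in this bug.
WARN_AFTER = timedelta(minutes=10)
CRIT_AFTER = timedelta(minutes=15)


def file_age_state(last_modified, now, warn=WARN_AFTER, crit=CRIT_AFTER):
    """Classify an HTTP Last-Modified header value by how stale it is."""
    modified = parsedate_to_datetime(last_modified)
    age = now - modified
    if age >= crit:
        return "CRITICAL", age
    if age >= warn:
        return "WARNING", age
    return "OK", age


# Illustrative values chosen so the age matches the 0:17:01 in the alert.
now = datetime(2013, 11, 24, 8, 16, 15, tzinfo=timezone.utc)
state, age = file_age_state("Sun, 24 Nov 2013 07:59:14 GMT", now)
print(state, age)
```

A file that is regenerated every few minutes should stay well under the warning limit, so a stale timestamp is a cheap signal that something upstream (here, apparently redis) has stopped.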

Carsten Book [:Tomcat]

Updated

•

12 years ago

Blocks: 926246

Carsten Book [:Tomcat]

Comment 1

•

12 years ago

Adding release and John to this bug in case someone checks their email today. Just a note that the last outage (bug 936878, 2 weeks ago) also happened on a Sunday, so whatever process is involved (I guess some weekly process) maybe should not run on a Sunday, when no one is around :)

Andreas Gal :gal

Comment 2

•

12 years ago

What's going on here? Why is this not being acted on? John?

Flags: needinfo?(joduinn)

Andreas Gal :gal

Updated

•

12 years ago

Blocks: 942503

Andreas Gal :gal

Updated

•

12 years ago

Severity: blocker → critical

Priority: -- → P1

Phil Ringnalda (:philor)

Reporter

Updated

•

12 years ago

Severity: critical → blocker

Priority: P1 → --

Nick Thomas [:nthomas] (UTC+12)

Comment 3

•

12 years ago

I've restarted redis on redis01.build.m.o, but have to get on a plane now. Will verify when I can get online next.

Phil Ringnalda (:philor)

Reporter

Comment 4

•

12 years ago

Seems to have done the trick. We missed most of the things that died failing to get a signing token, since they were more than four hours ago, but dying PGO on fx-team said we were successfully rebuilding builds-4hr.js.gz. Retriggered nightlies on m-c and aurora, and killed the b2g nightlies on m-c, since they don't care about signing and apparently completed fine the first time. Trees reopened.

Severity: blocker → normal

Nick Thomas [:nthomas] (UTC+12)

Comment 5

•

12 years ago

Ok. The output of the weekly cron was:

Found redis running on pid 26858
Open files 316 in /proc, 325 via lsof
Stopping redis-server: [FAILED]
Starting redis-server: [ OK ]
cat: /var/run/redis/redis.pid: No such file or directory
/root/weekly_restart: line 20: test: : integer expression expected
redis confusion: pid_file=, pgrep=26858
Redis apparantly not running after restart

I believe hwine made some changes last time this happened, so we'll need to look further to see what might be causing this.
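The "integer expression expected" message is what a shell `test` numeric comparison prints when one operand is empty, which fits the empty `pid_file=` value on the following line. The contents of /root/weekly_restart are not shown in this bug, so the following is only a sketch of a more defensive post-restart check: treat a missing or unparsable pid file as its own case instead of feeding it to a numeric comparison. The names `read_pid` and `restart_verdict` are hypothetical.

```python
import os
import tempfile
from typing import Optional


def read_pid(pid_file: str) -> Optional[int]:
    """Return the pid stored in pid_file, or None if it is missing or not a number."""
    try:
        with open(pid_file) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return None


def restart_verdict(pid_file_pid: Optional[int], running_pid: Optional[int]) -> str:
    """Compare the pid file with the pid found by a process search (e.g. pgrep)."""
    if running_pid is None:
        return "not running"
    if pid_file_pid is None:
        return "running, but pid file missing or empty"
    if pid_file_pid != running_pid:
        return "running, but pid file is stale"
    return "running"


# Reproduce the situation from the cron output: no pid file, pgrep found 26858.
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "redis.pid")
    print(restart_verdict(read_pid(missing), 26858))
```

Separating "no pid file" from "no process" would at least make the cron mail say which of the two actually happened.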

Andreas Gal :gal

Updated

•

12 years ago

No longer blocks: 942503

Amy Rich [:arr] [:arich]

Comment 6

•

12 years ago

There has been discussion about redoing the redis service (moving off of kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593 for proposed future work.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 7

•

12 years ago

(In reply to Amy Rich [:arich] [:arr] from comment #6)
> There has been discussion about redoing the redis service (moving off of
> kvm, into scl3, managed by webops). Please see bug 934627 and bug 934593
> for proposed future work.

In addition to what Amy listed, I note other "make redis more stable" and "have better monitoring on redis" work by both RelEng and IT is being tracked in bug#905587, bug#905616.

Flags: needinfo?(joduinn)

Armen [:armenzg]

Comment 8

•

12 years ago

IIRC we have removed the cronjob that did the restart. Buildduty will be running the restart on Monday mornings until this is a stable process.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

7 years ago

Product: Release Engineering → Infrastructure & Operations

BMO Automation

Updated

•

6 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)
