
Stage server appears to be down

VERIFIED WORKSFORME

Status

Product: mozilla.org Graveyard
Component: Server Operations
Priority: P1
Severity: major
Status: VERIFIED WORKSFORME
9 years ago
3 years ago

People

(Reporter: Paul Booker, Unassigned)

Description

9 years ago
Created attachment 374246 [details]
Browser message

Stage server appears to be down
Assignee: nobody → server-ops
Component: spreadfirefox.com → Server Operations
Product: Websites → mozilla.org
QA Contact: spreadfirefox-com → mrz
Version: unspecified → other
Duplicate of this bug: 489794
Severity: normal → critical

Comment 2

9 years ago
out of memory all over the console - first line was:

httpd invoked oom-killer

power cycling, box is coming back online
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
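
For anyone wanting to watch for this condition, here is a minimal sketch of scanning console or kernel log text for OOM-killer messages. It is an illustration only, not what server-ops actually ran: the function name and the sample lines are made up, and only the wording "httpd invoked oom-killer" comes from the comment above.

```python
import re

# Matches the kernel's OOM-killer announcement, e.g. "httpd invoked oom-killer".
OOM_PATTERN = re.compile(r"(?P<process>\S+) invoked oom-killer")


def find_oom_events(log_lines):
    """Return the names of processes that triggered the OOM killer, in order."""
    events = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            events.append(match.group("process"))
    return events


# Hypothetical console output; only the "invoked oom-killer" wording is from the bug.
sample_console = [
    "httpd invoked oom-killer",
    "some unrelated console message",
    "httpd invoked oom-killer",
]
oom_processes = find_oom_events(sample_console)
print(oom_processes)
```

A monitoring check could alert whenever this list is non-empty for a recent window of log lines.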

Comment 3

9 years ago
something tells me this should be monitored too.
That'd be awesome.
We need this back online ASAP for spreadfirefox testing.  Is there an ETA?
BTW - i get pinwheeled and things time out, so I don't think this is fixed yet.  Could someone peek?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This blocks SFX and SUMO, so -> blocker.
Severity: critical → blocker

Updated

9 years ago
Assignee: server-ops → oremj

Comment 8

9 years ago
msnbot was crawling spreadfirefox.authstage.mozilla.com, which was killing the box. I put up http auth to prevent it from happening again.
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
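
As a rough illustration of how a crawler like this can be spotted, the sketch below counts requests per user agent in access-log lines. It assumes the common "combined" log format, where the user agent is the last double-quoted field. The log lines, addresses, and agent strings are invented for the example and are not taken from the staging box.

```python
from collections import Counter


def requests_per_agent(log_lines):
    """Count requests per user agent in combined-log-format lines.

    Assumes the user agent is the last double-quoted field on each line.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.rstrip().rsplit('"', 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts


# Hypothetical access-log lines (placeholder dates, addresses, and agents).
sample_log = [
    '192.0.2.1 - - [01/Jan/2000:00:00:01 +0000] "GET /node/1 HTTP/1.1" 200 512 "-" "example-crawler"',
    '192.0.2.1 - - [01/Jan/2000:00:00:02 +0000] "GET /node/2 HTTP/1.1" 200 512 "-" "example-crawler"',
    '192.0.2.7 - - [01/Jan/2000:00:00:03 +0000] "GET / HTTP/1.1" 200 2048 "-" "example-browser"',
    '192.0.2.1 - - [01/Jan/2000:00:00:04 +0000] "GET /node/3 HTTP/1.1" 200 512 "-" "example-crawler"',
]
agent_counts = requests_per_agent(sample_log)
top_agent, top_hits = agent_counts.most_common(1)[0]
print(top_agent, top_hits)
```

An agent that dominates the counts on a staging host is a candidate for blocking, which is what the http auth accomplishes here.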
woot, thanks.

what can we do to prevent this in production?  or will the production servers just handle this without problems?
i still get connection interrupted.  maybe we didn't find the root cause?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Still down.  reassigning to server-ops so somebody gets paged.
Assignee: oremj → server-ops
Priority: -- → P1
Works for me (https://support-stage.mozilla.org/en-US/kb/).  Code issue?
OK - SUMO works for stage, closing.  SFX team needs to provide instructions on how to fix SFX stage.  Will get more info from them -- until then, downgrading severity but leaving this open.
Severity: blocker → major
SUMO is now serving up blank pages.  I get a blank homepage right now, sometimes it's the login or forum pages.  Something is still broken.
OK, sounds like the problem is stemming from a missing table on SFx.

oremj, could you run this query.  it will disable the offending module for now.

update system set status = 0 where name = 'sfx_stats';

Thanks
mysql> update system set status = 0 where name = 'sfx_stats';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
(In reply to comment #14)
> SUMO is now serving up blank pages.  I get a blank homepage right now,
> sometimes it's the login or forum pages.  Something is still broken.

Restarting memcache fixed it.

Updated

9 years ago
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
https://spreadfirefox.authstage.mozilla.com/ isn't really "down", now, but it's wicked slow -- reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Other sites on the staging server are pretty quick, for example https://support-stage.mozilla.org/en-US/kb/.  The cachegrind for loading spreadfirefox.authstage says there were 31588 invocations of mysql_query, which explains the slowness.
Looks like it is all coming from:
Called From                          Count   Total Call Cost
sfx_stats_count_new_users @ 136      14360   13473
sfx_stats_count_user_logins @ 160    14360   12126
sfx_stats_count_new_nodes @ 90        1861    1760
sfx_stats_count_new_comments @ 113     575     571
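
To put those counts in proportion, here is a quick arithmetic check. It assumes each Count in the table is the number of mysql_query calls made from that caller; the numbers are copied from the table and the total comes from the comment above.

```python
# Counts copied from the cachegrind "Called From" table.
calls_by_caller = {
    "sfx_stats_count_new_users": 14360,
    "sfx_stats_count_user_logins": 14360,
    "sfx_stats_count_new_nodes": 1861,
    "sfx_stats_count_new_comments": 575,
}
total_mysql_queries = 31588  # total mysql_query invocations reported for the page load

sfx_stats_calls = sum(calls_by_caller.values())
share = sfx_stats_calls / total_mysql_queries
print(f"{sfx_stats_calls} of {total_mysql_queries} calls ({share:.1%})")
# prints "31156 of 31588 calls (98.6%)"
```

Under that assumption, the four sfx_stats functions account for nearly all of the queries on the page.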

Updated

9 years ago
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → WORKSFORME
i've turned off the sfx_stats module for now.  I think the way I used Drupal's
hook_menu() (which gets called a ton) in that module was causing the problems.


Fixing now.
Thanks, this is much zippier now -- will file a new bug if we see it slow down again.
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard