Closed Bug 489762 Opened 15 years ago Closed 15 years ago

Stage server appears to be down

Categories

(mozilla.org Graveyard :: Server Operations, task, P1)

Tracking

(Not tracked)

VERIFIED WORKSFORME

People

(Reporter: paul, Unassigned)

Attachments

(1 file)

Attached image Browser message
Stage server appears to be down
Assignee: nobody → server-ops
Component: spreadfirefox.com → Server Operations
Product: Websites → mozilla.org
QA Contact: spreadfirefox-com → mrz
Version: unspecified → other
Severity: normal → critical
out of memory all over the console - first line was:

httpd invoked oom-killer

power cycling, box is coming back online
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
something tells me this should be monitored too.
That'd be awesome.
We need this back online ASAP for spreadfirefox testing.  Is there an ETA?
BTW - i get pinwheeled and things time out, so I don't think this is fixed yet.  Could someone peek?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This blocks SFX and SUMO, so -> blocker.
Severity: critical → blocker
Assignee: server-ops → oremj
msnbot was crawling spreadfirefox.authstage.mozilla.com, which was killing the box. I put up http auth to prevent it from happening again.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
woot, thanks.

what can we do to prevent this in production?  or will the production servers just handle this without problems?
i still get connection interrupted.  maybe we didn't find the root cause?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Still down.  reassigning to server-ops so somebody gets paged.
Assignee: oremj → server-ops
Priority: -- → P1
OK - SUMO works for stage, closing.  SFX team needs to provide instructions on how to fix SFX stage.  Will get more info from them -- until then, downgrading severity but leaving this open.
Severity: blocker → major
SUMO is now serving up blank pages.  I get a blank homepage right now, sometimes it's the login or forum pages.  Something is still broken.
OK, sounds like the problem is stemming from a missing table on SFx.

oremj, could you run this query.  it will disable the offending module for now.

update system set status = 0 where name = 'sfx_stats';

Thanks
mysql> update system set status = 0 where name = 'sfx_stats';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
(In reply to comment #14)
> SUMO is now serving up blank pages.  I get a blank homepage right now,
> sometimes it's the login or forum pages.  Something is still broken.

Restarting memcache fixed it.
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
https://spreadfirefox.authstage.mozilla.com/ isn't really "down", now, but it's wicked slow -- reopening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Other sites on the staging server are pretty quick, for example https://support-stage.mozilla.org/en-US/kb/.  The cachegrind for loading spreadfirefox.authstage says there were 31588 invocations of mysql_query, which explains the slowness.
Looks like it is all coming from:
Called From                          Count   Total Call Cost
sfx_stats_count_new_users @ 136      14360   13473
sfx_stats_count_user_logins @ 160    14360   12126
sfx_stats_count_new_nodes @ 90        1861    1760
sfx_stats_count_new_comments @ 113     575     571
Status: REOPENED → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
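For a sense of scale, here is a quick tally of the four call counts from the cachegrind output quoted above against the reported 31588 mysql_query invocations. It is only arithmetic on the numbers already in this bug. The variable names are made up for illustration.

```python
# Call counts copied from the cachegrind output quoted in this bug.
calls = {
    "sfx_stats_count_new_users": 14360,
    "sfx_stats_count_user_logins": 14360,
    "sfx_stats_count_new_nodes": 1861,
    "sfx_stats_count_new_comments": 575,
}

# Total mysql_query invocations reported for one page load.
total_mysql_queries = 31588

# Sum the sfx_stats callers and express them as a share of all queries.
sfx_stats_queries = sum(calls.values())
share = 100 * sfx_stats_queries / total_mysql_queries

print(f"{sfx_stats_queries} of {total_mysql_queries} queries "
      f"({share:.1f}%) come from sfx_stats")
```

In other words, nearly all of the queries on that page load trace back to the four sfx_stats functions.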
i've turned off the sfx_stats module for now.  I think the way I used Drupal's
hook_menu() (which gets called a ton) in that module was causing the problems.


Fixing now.
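For anyone following along, a minimal sketch of the general idea: if an expensive count query sits on a code path that gets called many times per request, caching the result keeps the query count flat. This is plain Python, not Drupal code and not the actual sfx_stats fix (which is not shown in this bug). Every name below is hypothetical.

```python
from functools import lru_cache

# Stands in for the number of mysql_query invocations (illustrative only).
query_count = 0


def run_count_query(table: str) -> int:
    """Pretend database call; each invocation counts as one query."""
    global query_count
    query_count += 1
    return 0  # placeholder result, no real data


@lru_cache(maxsize=None)
def cached_count(table: str) -> int:
    """Run the query once per table and reuse the result afterwards."""
    return run_count_query(table)


def build_menu() -> dict:
    """Hypothetical hot path that is invoked many times per page load."""
    return {
        "new_users": cached_count("users"),
        "new_nodes": cached_count("node"),
    }


# Simulate the hot path being hit repeatedly during one page load.
for _ in range(1000):
    build_menu()

print(f"queries issued: {query_count}")
```

Without the cache the loop would issue two queries per call. With it, each table is queried once. The other obvious option is to move the queries off the hot path entirely.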
Thanks, this is much zippier now -- will file a new bug if we see it slow down again.
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard