Out of memory errors all over the console; the first line was: httpd invoked oom-killer. Power cycling; box is coming back online.
something tells me this should be monitored too.
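A rough sketch of what such a check could look like: scan kernel log lines for the "invoked oom-killer" message seen on the console. The function name and the sample log lines are made up for illustration; only the "httpd invoked oom-killer" text comes from the console output above. A real check would read dmesg or the syslog file and feed an alerting system.

```python
import re

# Matches kernel messages such as "httpd invoked oom-killer: ..."
OOM_PATTERN = re.compile(r"(\S+) invoked oom-killer")


def find_oom_events(log_lines):
    """Return the names of processes that triggered the OOM killer, in order."""
    events = []
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            events.append(match.group(1))
    return events


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the actual box.
    sample = [
        "kernel: httpd invoked oom-killer: gfp_mask=0x201d2, order=0",
        "kernel: Out of memory: Killed process 1234 (httpd).",
    ]
    print(find_oom_events(sample))
```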
That'd be awesome.
We need this back online ASAP for spreadfirefox testing. Is there an ETA?
BTW - I get pinwheeled and things time out, so I don't think this is fixed yet. Could someone take a look?
This blocks SFX and SUMO, so -> blocker.
msnbot was crawling spreadfirefox.authstage.mozilla.com, which was killing the box. I put up http auth to prevent it from happening again.
Woot, thanks. What can we do to prevent this in production? Or will the production servers just handle this without problems?
I still get "connection interrupted". Maybe we didn't find the root cause?
Still down. Reassigning to server-ops so somebody gets paged.
Works for me (https://support-stage.mozilla.org/en-US/kb/). Code issue?
OK - SUMO works for stage, closing. SFX team needs to provide instructions on how to fix SFX stage. Will get more info from them -- until then, downgrading severity but leaving this open.
SUMO is now serving up blank pages. I get a blank homepage right now, sometimes it's the login or forum pages. Something is still broken.
OK, sounds like the problem stems from a missing table on SFX. oremj, could you run this query? It will disable the offending module for now: update system set status = 0 where name = 'sfx_stats'; Thanks.
mysql> update system set status = 0 where name = 'sfx_stats';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0
(In reply to comment #14)
> SUMO is now serving up blank pages. I get a blank homepage right now,
> sometimes it's the login or forum pages. Something is still broken.
Restarting memcache fixed it.
https://spreadfirefox.authstage.mozilla.com/ isn't really "down" now, but it's wicked slow -- reopening.
Other sites on the staging server are pretty quick, for example https://support-stage.mozilla.org/en-US/kb/. The cachegrind for loading spreadfirefox.authstage says there were 31588 invocations of mysql_query, which explains the slowness.
Looks like it is all coming from:
Called From                           Count   Total Call Cost
sfx_stats_count_new_users @ 136       14360   13473
sfx_stats_count_user_logins @ 160     14360   12126
sfx_stats_count_new_nodes @ 90         1861    1760
sfx_stats_count_new_comments @ 113      575     571
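To put a number on "all", the four call counts in the breakdown can be summed and compared with the 31588 total reported for the page load. This is just arithmetic on the figures quoted above; the variable names are made up.

```python
# Counts copied from the cachegrind breakdown above.
calls_by_caller = {
    "sfx_stats_count_new_users": 14360,
    "sfx_stats_count_user_logins": 14360,
    "sfx_stats_count_new_nodes": 1861,
    "sfx_stats_count_new_comments": 575,
}
TOTAL_MYSQL_QUERY_CALLS = 31588  # total reported for the page load

sfx_stats_calls = sum(calls_by_caller.values())
share = sfx_stats_calls / TOTAL_MYSQL_QUERY_CALLS

# Prints: 31156 of 31588 calls (98.6%)
print(f"{sfx_stats_calls} of {TOTAL_MYSQL_QUERY_CALLS} calls ({share:.1%})")
```

So the four sfx_stats functions account for roughly 98.6% of the mysql_query calls on that page.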
I've turned off the sfx_stats module for now. I think the way I used Drupal's hook_menu() (which gets called a ton) in that module was causing the problems. Fixing now.
Thanks, this is much zippier now -- will file a new bug if we see it slow down again.