Bug 662412
Opened 13 years ago
Closed 13 years ago
CPU usage is crazy on dashboard
Categories
(Input Graveyard :: Dashboard, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
4.1
People
(Reporter: davedash, Assigned: davedash)
References
Details
(Whiteboard: [qa-])
Matt, can you isolate this to one of your changes? git bisect can help you if you can reproduce locally.
Assignee
Updated•13 years ago
Severity: normal → blocker
Comment 1•13 years ago
davedash showed me how to do this... here's the final output.

[root@mrapp-stage02 reporter]# git bisect good
f1765e86729b06dd0fa10a0a56b64497aa322890 is the first bad commit
commit f1765e86729b06dd0fa10a0a56b64497aa322890
Author: Fred Wenzel <fwenzel@mozilla.com>
Date:   Mon Jun 6 12:20:08 2011 -0700

    L10n update (automatic commit)

:160000 160000 83270569cadf961ca4b1eacff3717eb8d49902ef 75f78763b41b23352864bba55e9000393d6f9dce M locale

No idea how a locale update broke it so badly, but that's what git bisect thinks happened. I did a "service httpd restart" on each step. The timing makes sense though, it's around when we started having problems.

The update_staging.sh jobs are all disabled at the moment (in fact, all input_staging.sh cron jobs are disabled). When a fix is in, please let us know to re-enable them. For the record, they're in /etc/cron.d/stage_updates, and I simply block-commented the entire "input" section (lines 107-124) by 1 hash mark.
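For readers unfamiliar with the procedure, here is a minimal sketch of the search that git bisect performs, written in Python rather than taken from git itself. The commit ids and the check function are made up for illustration; in this bug the check was "restart httpd and load the dashboard".

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the first one is known
    good, the last one is known bad, and every commit after the first bad
    one is also bad (the same assumption git bisect relies on).
    """
    good, bad = 0, len(commits) - 1  # indexes of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]


if __name__ == "__main__":
    # Hypothetical history: short made-up ids, not the real Input commits.
    history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
    tested = []

    def check(commit: str) -> bool:
        tested.append(commit)  # stands in for "restart httpd, load the dashboard"
        return history.index(commit) >= 5

    print(first_bad_commit(history, check), "after testing", tested)
```

The search only gives a meaningful answer when the check is deterministic. If the same commit is sometimes fast and sometimes slow, a single wrong good/bad answer can steer the search to an unrelated commit.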
Comment 2•13 years ago
Well, I don't know what's happened, but the whole thing works fine now, right up to upstream/master.

Note that for some reason there are 2 unpushed, locally committed changes on mrapp-stage02. I don't know what they are, and "git reset --hard" doesn't remove them (because they're locally committed already). There is also an untracked local directory: lib/whoosh_index/. Let us know what to do with it.

Our thinking is that this is not a problem with newly-checked-in code, but with user data on that page. Because of this, I have re-enabled the cron jobs and left it on the 'master' branch. However, locally we've also made a 'tmp_1468919' branch that has that revision in it. If this problem recurs in the near future, we should be able to quickly 'git checkout tmp_1468919' and restart Apache. If it's due to user data, one would expect this to have no effect... if it's due to a new checkin causing a problem, this may fix it.
Assignee
Comment 3•13 years ago
You can safely rm -rf lib/whoosh_index; we aren't using whoosh. git fetch && git reset --hard origin/master might help.
Assignee
Comment 4•13 years ago
Jake, ping me on IRC when you are around. I have a suspicion that something else is happening: I feel like on occasion you get a good request, but at other times you don't. I might need direct access to the staging machine, to see if I can put some profiling code in place.
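As an illustration of the kind of profiling code that could separate good requests from bad ones, here is a minimal sketch of a WSGI timing middleware. It is an assumption about one possible approach, not what was actually deployed; the class name, logger name, and threshold are all hypothetical.

```python
import logging
import time

log = logging.getLogger("slow_requests")  # hypothetical logger name


class SlowRequestLogger:
    """WSGI middleware that records requests taking at least `threshold` seconds."""

    def __init__(self, app, threshold=1.0, clock=time.perf_counter):
        self.app = app
        self.threshold = threshold
        self.clock = clock  # injectable so the timing can be tested
        self.slow = []      # (path, seconds) pairs; unbounded, for short debugging sessions only

    def __call__(self, environ, start_response):
        start = self.clock()
        result = self.app(environ, start_response)
        try:
            # Consume the body here so the timing covers response generation.
            return list(result)
        finally:
            if hasattr(result, "close"):
                result.close()
            elapsed = self.clock() - start
            if elapsed >= self.threshold:
                path = environ.get("PATH_INFO", "?")
                self.slow.append((path, elapsed))
                log.warning("slow request %s took %.2fs", path, elapsed)
```

Wrapping the application object (for example `application = SlowRequestLogger(application, threshold=2.0)` in the WSGI entry point) would log only the slow requests. Buffering the whole body defeats streaming responses, which is acceptable for a temporary diagnostic but not for permanent use.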
Assignee
Comment 5•13 years ago
In the meantime, I'll start taking things out of the dashboard and see if it helps.
Assignee: tofumatt → dd
Comment 6•13 years ago
Thanks for the whoosh_index and reset command, that has completely resolved the local commits issue. This didn't have any effect on the problem at hand, but it's nice to have a clean copy. It looks like some regular job ends up making those commits, as later on I was back at "2 commits ahead of origin". Not sure what they are, but it doesn't seem to affect this problem one way or the other.

Here's a summary of what's been tried so far (that I'm aware of):

1) older code (commit 1468919), in case it was a recent checkin causing this. No effect when the problem is happening.

2) cleaned feedback tables, reducing total size: https://bug644302.bugzilla.mozilla.org/attachment.cgi?id=521613. Also did not help, but that's not too terribly surprising (that bug deals with a database problem, not a webserver problem... it would have been "by coincidence only" if it fixed this issue).

3) davedash has made new checkins to remove frequent terms and hide the "trends" section. Neither has solved this.

As long as someone's around to keep an eye on it, we can kill the bad Apache processes quickly, before they build up and kill the server. If it's not resolved by tonight, we'll probably want to disable the site and pick it back up tomorrow.

At present, davedash has disabled parts of the site (frequent terms, at least), but mostly it's operating normally. We are watching it for a few hours, to see if the problem manifests itself again.
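The "kill the bad Apache processes" stopgap was done by hand here. Purely as a sketch, the selection step could look like the following, assuming the input is the text printed by `ps -eo pid,pcpu,comm`. The function name, the 90% limit, and the sample data are hypothetical.

```python
from typing import List


def runaway_pids(ps_output: str, command: str = "httpd", cpu_limit: float = 90.0) -> List[int]:
    """Return PIDs of `command` processes whose %CPU is at or above cpu_limit.

    Expects the text produced by `ps -eo pid,pcpu,comm`, header line included.
    """
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        pid, pcpu, comm = fields
        if comm.strip() == command and float(pcpu) >= cpu_limit:
            pids.append(int(pid))
    return pids


if __name__ == "__main__":
    # Made-up sample, not captured from mrapp-stage02.
    sample = (
        "  PID %CPU COMMAND\n"
        " 101 99.5 httpd\n"
        " 102  0.3 httpd\n"
        " 103 97.0 python\n"
    )
    for pid in runaway_pids(sample):
        print("would kill", pid)  # dry run: nothing is actually signalled
```

The sketch deliberately stops at printing. Sending a signal to the selected PIDs is a separate decision, and the %CPU figure from ps is typically an average over the process lifetime, so a threshold tuned on one machine may not transfer to another.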
Assignee
Comment 7•13 years ago
Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 9•13 years ago
> Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"
Dave, can you file a bug to fix this then? This is removing a piece of functionality that offers insights into our userbase on the dashboard.
Updated•13 years ago
Whiteboard: [qa-]
Updated•7 years ago
Product: Input → Input Graveyard