Closed Bug 662412 Opened 13 years ago Closed 13 years ago

CPU usage is crazy on dashboard

Categories

(Input Graveyard :: Dashboard, defect)

x86
macOS
defect
Not set
blocker

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: davedash, Assigned: davedash)

References

Details

(Whiteboard: [qa-])

Matt, can you isolate this to one of your changes? git bisect can help you if you can reproduce it locally.
Severity: normal → blocker
Blocks: 662423
davedash showed me how to do this... here's the final output.


[root@mrapp-stage02 reporter]# git bisect good
f1765e86729b06dd0fa10a0a56b64497aa322890 is the first bad commit
commit f1765e86729b06dd0fa10a0a56b64497aa322890
Author: Fred Wenzel <fwenzel@mozilla.com>
Date:   Mon Jun 6 12:20:08 2011 -0700

    L10n update (automatic commit)

:160000 160000 83270569cadf961ca4b1eacff3717eb8d49902ef 75f78763b41b23352864bba55e9000393d6f9dce M	locale


No idea how a locale update broke it so badly, but that's what git bisect thinks happened. I did a "service httpd restart" on each step. The timing makes sense though, it's around when we started having problems.
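
For what it's worth, mode 160000 in that raw diff line is how git records a gitlink, so "locale" appears to be a submodule and the blamed commit only moved its pointer from 83270569 to 75f78763. A minimal sketch of pulling that range out so the changes inside locale can be inspected (gitlink_ranges is a made-up helper name, not a git command, and it assumes paths without whitespace):

```shell
# Sketch only: gitlink_ranges is a hypothetical helper, not part of git.
# Reads raw diff lines (as printed by git bisect or git diff-tree) on stdin
# and, for each entry whose old and new mode are both 160000 (a gitlink),
# prints "<path> <old-sha>..<new-sha>".
gitlink_ranges() {
    awk '$1 == ":160000" && $2 == "160000" { print $6, $3 ".." $4 }'
}

# Feed it the line from the bisect output above:
printf ':160000 160000 83270569cadf961ca4b1eacff3717eb8d49902ef 75f78763b41b23352864bba55e9000393d6f9dce M\tlocale\n' | gitlink_ranges

# Against a real checkout (not run here), something like:
#   git diff-tree --no-commit-id -r f1765e86729b06dd0fa10a0a56b64497aa322890 | gitlink_ranges
#   (cd locale && git log --oneline 83270569..75f78763)
```

The last command would list the commits inside locale that the "L10n update" pulled in, which is where a further bisect would have to happen if the locale update really is the cause.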

The update_staging.sh jobs are all disabled at the moment (in fact, all input_staging.sh cron jobs are disabled). When a fix is in, please let us know to re-enable them. For the record, they're in /etc/cron.d/stage_updates, and I simply block-commented the entire "input" section (lines 107-124) by 1 hash mark.
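
For whoever re-enables them, a rough sketch of the block-comment step and its inverse. The helper names (edit_range, comment_lines, uncomment_lines) are made up for illustration; only the file path and line range come from the comment above, and the line numbers will only be right if nobody has edited the file since.

```shell
# Sketch only: hypothetical helpers for commenting a line range in a file.
edit_range() {
    # $1 = file, $2 = first line, $3 = last line, $4 = sed substitution
    tmp=$(mktemp) || return 1
    if sed "$2,$3$4" "$1" > "$tmp"; then
        cat "$tmp" > "$1"   # overwrite contents, keeping the file's owner and mode
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Add exactly one '#' to the start of each line in the range.
comment_lines()   { edit_range "$1" "$2" "$3" 's/^/#/'; }
# Remove exactly one leading '#' from each line in the range.
uncomment_lines() { edit_range "$1" "$2" "$3" 's/^#//'; }

# Disable / re-enable the "input" section (not run here):
#   comment_lines   /etc/cron.d/stage_updates 107 124
#   uncomment_lines /etc/cron.d/stage_updates 107 124
```

Because only one hash mark is added or removed, lines in the range that were already comments survive the round trip unchanged.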
Well I don't know what's happened, but the whole thing works fine now, right up to upstream/master. Note that for some reason there are 2 unpushed, locally committed changes on mrapp-stage02. I don't know what they are, and "git reset --hard" doesn't remove them (because they're locally committed already).

There is also an untracked local directory: lib/whoosh_index/. Let us know what to do with it.


Our thinking is that this is not a problem with newly-checked-in code, but with user data on that page. Because of this, I have re-enabled the cron jobs and left it on the 'master' branch. However, locally we've also made a 'tmp_1468919' branch that has that revision in it. If this problem recurs in the near future, we should be able to quickly 'git checkout tmp_1468919' and restart Apache. If it's due to user data, one would expect this to have no effect... if it's due to a new checkin causing a problem, this may fix it.
You can safely rm -rf lib/whoosh_index; we aren't using whoosh.

git fetch && git reset --hard origin/master

might help.
Jake,

Ping me on IRC when you are around - I have a suspicion that something else is happening.  I feel like on occasion you get a good request, but at other times you don't.

I might need to get access directly to the staging machine.  See if I can put some profiling code in place.
In the meantime, I'll start taking things out of the dashboard and see if it helps.
Assignee: tofumatt → dd
Thanks for the whoosh_index and reset command; that has completely resolved the local commits issue. This didn't have any effect on the problem at hand, but it's nice to have a clean copy. It looks like some regular job ends up making those commits, as later on I was back at "2 commits ahead of origin". Not sure what they are, but it doesn't seem to affect this problem one way or the other.



Here's a summary of what's been tried so far (that I'm aware of):

1) older code (commit 1468919), in case it was a recent checkin causing this. No effect when the problem is happening.

2) cleaned feedback tables, reducing total size: https://bug644302.bugzilla.mozilla.org/attachment.cgi?id=521613. Also did not help, but that's not too terribly surprising (that bug deals with a database problem, not a webserver problem... it would have been "by coincidence only" if it fixed this issue).

3) davedash has made new checkins to remove frequent terms and hide the "trends" section. Neither has solved this.


As long as someone's around to keep an eye on it, we can kill the bad Apache processes quickly, before they build up and kill the server. If it's not resolved by tonight, we'll probably want to disable the site and pick it back up tomorrow.
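
A rough sketch of how the "spot the bad Apache processes" step could be scripted. The helper name hot_httpd_pids and the 90% threshold are assumptions for illustration; the process name httpd is taken from the "service httpd restart" mentioned earlier. Note that on Linux the %CPU column from ps is averaged over the life of the process, so a worker that has only just started spinning may not cross the threshold right away.

```shell
# Sketch only: hot_httpd_pids is a hypothetical helper.
# Reads "PID %CPU COMMAND" lines on stdin and prints the PID of every
# httpd process whose %CPU is at or above the given threshold.
hot_httpd_pids() {
    awk -v t="$1" '$3 ~ /httpd$/ && $2 + 0 >= t + 0 { print $1 }'
}

# List candidates first, then kill them once the list looks right (not run here):
#   ps -eo pid=,pcpu=,comm= | hot_httpd_pids 90
#   ps -eo pid=,pcpu=,comm= | hot_httpd_pids 90 | xargs kill
```

Keeping the filter separate from the kill makes it easy to eyeball the PIDs before anything is terminated.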


At present, davedash has disabled parts of the site (frequent terms, at least), but mostly it's operating normally. We are watching it for a few hours, to see if the problem manifests itself again.
Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
> Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"

Dave, can you file a bug to fix this then? This is removing a piece of functionality that offers insights to our userbase on the dashboard.
Whiteboard: [qa-]
Verified as [qa-].
Status: RESOLVED → VERIFIED
Product: Input → Input Graveyard