662412 - CPU usage is crazy on dashboard

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Description

•

13 years ago

Matt, can you isolate this to one of your changes? git bisect can help if you can reproduce the problem locally.

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Updated

•

13 years ago

Blocks: 662194

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Updated

•

13 years ago

Severity: normal → blocker

Jake Maul [:jakem]

Updated

•

13 years ago

Blocks: 662423

Jake Maul [:jakem]

Comment 1

•

13 years ago

davedash showed me how to do this... here's the final output.


[root@mrapp-stage02 reporter]# git bisect good
f1765e86729b06dd0fa10a0a56b64497aa322890 is the first bad commit
commit f1765e86729b06dd0fa10a0a56b64497aa322890
Author: Fred Wenzel <fwenzel@mozilla.com>
Date:   Mon Jun 6 12:20:08 2011 -0700

    L10n update (automatic commit)

:160000 160000 83270569cadf961ca4b1eacff3717eb8d49902ef 75f78763b41b23352864bba55e9000393d6f9dce M	locale


No idea how a locale update broke it so badly, but that's what git bisect thinks happened. I did a "service httpd restart" on each step. The timing makes sense though, it's around when we started having problems.
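For reference, the manual good/bad marking above can be automated with `git bisect run`. This is a minimal sketch, not what was actually run on mrapp-stage02: the function name `bisect_first_bad` is hypothetical, and it assumes a test command that exits 0 on a good revision and 1 on a bad one (for this bug, that command would have to restart httpd and time a dashboard request).

```shell
# bisect_first_bad GOOD BAD CMD...   (hypothetical helper, sketch only)
# Runs an automated bisect between GOOD and BAD and prints the commit that
# bisect settles on as the first bad one.
# CMD must exit 0 for a good revision and 1 for a bad one.
bisect_first_bad() {
    good=$1
    bad=$2
    shift 2
    # "git bisect start <bad> <good>" marks both endpoints in one step.
    git bisect start "$bad" "$good" >/dev/null || return 1
    # Let git check out each candidate and run the test command on it.
    git bisect run "$@" >/dev/null
    # When the run converges, refs/bisect/bad points at the first bad commit.
    git rev-parse refs/bisect/bad
    # Return the working tree to where it was before bisecting.
    git bisect reset >/dev/null 2>&1
}
```

Two caveats that may matter here. The first bad commit only moves the `locale` submodule pointer (mode 160000), so something like `git -C locale log --oneline 8327056..75f7876` (hashes abbreviated from the output above) should list the locale commits involved, assuming the submodule is checked out. Also, bisect checks out the superproject only; unless git is configured to recurse into submodules, the test command would need its own `git submodule update` for each step to actually exercise the locale change.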

The update_staging.sh jobs are all disabled at the moment (in fact, all input_staging.sh cron jobs are disabled). When a fix is in, please let us know so we can re-enable them. For the record, they're in /etc/cron.d/stage_updates, and I simply block-commented the entire "input" section (lines 107-124) with one hash mark per line.
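The block-commenting described above could be scripted roughly as follows. This is only a sketch: the helper names `comment_lines` and `uncomment_lines` are made up for illustration, and both print the result to stdout rather than editing the cron file in place, so the output can be reviewed first.

```shell
# comment_lines FILE START END   (hypothetical helper)
# Prints FILE with one '#' added to the front of every line in START..END.
comment_lines() {
    sed "$2,$3s/^/#/" "$1"
}

# uncomment_lines FILE START END   (hypothetical helper)
# Prints FILE with one leading '#' removed from every line in START..END.
uncomment_lines() {
    sed "$2,$3s/^#//" "$1"
}

# Example, using the path and line range mentioned in this comment:
#   comment_lines /etc/cron.d/stage_updates 107 124 > /tmp/stage_updates.new
```

Because only a single hash mark is added or removed, lines that were already commented out before the block was disabled stay commented after it is re-enabled.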

Jake Maul [:jakem]

Comment 2

•

13 years ago

Well I don't know what's happened, but the whole thing works fine now, right up to upstream/master. Note that for some reason there are 2 unpushed, locally committed changes on mrapp-stage02. I don't know what they are, and "git reset --hard" doesn't remove them (because they're locally committed already).

There is also an untracked local directory: lib/whoosh_index/. Let us know what to do with it.


Our thinking is that this is not a problem with newly-checked-in code, but with user data on that page. Because of this, I have re-enabled the cron jobs and left it on the 'master' branch. However, locally we've also made a 'tmp_1468919' branch that has that revision in it. If this problem recurs in the near future, we should be able to quickly 'git checkout tmp_1468919' and restart Apache. If it's due to user data, one would expect this to have no effect... if it's due to a new check-in causing a problem, this may fix it.

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Comment 3

•

13 years ago

You can safely rm -rf lib/whoosh_index; we aren't using whoosh.

git fetch && git reset --hard origin/master

might help.
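Since a hard reset to the upstream branch throws the local commits away, it may be worth listing them first. A small sketch, assuming the remote-tracking branch is `origin/master` as in the command above; the function name `unpushed_commits` is hypothetical.

```shell
# unpushed_commits [UPSTREAM]   (hypothetical helper)
# Lists commits reachable from HEAD but not from UPSTREAM, one per line.
# UPSTREAM defaults to origin/master.
unpushed_commits() {
    upstream=${1:-origin/master}
    git log --oneline "$upstream"..HEAD
}

# Typical use on the staging checkout:
#   git fetch && unpushed_commits
```

Adding `--stat` or `-p` to the `git log` call would also show which files those commits touch.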

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Comment 4

•

13 years ago

Jake,

Ping me on IRC when you are around - I have a suspicion that something else is happening.  I feel like on occasion you get a good request, but at other times you don't.

I might need to get access directly to the staging machine, to see if I can put some profiling code in place.

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Comment 5

•

13 years ago

In the meantime, I'll start taking things out of the dashboard and see if it helps.

Assignee: tofumatt → dd

Jake Maul [:jakem]

Comment 6

•

13 years ago

Thanks for the whoosh_index and reset command, that has completely resolved the local commits issue. This didn't have any effect on the problem at hand, but it's nice to have a clean copy. It looks like some regular job ends up making those commits, as later on I was back at "2 commits ahead of origin". Not sure what they are, but it doesn't seem to affect this problem one way or the other.



Here's a summary of what's been tried so far (that I'm aware of):

1) older code (commit 1468919), in case it was a recent checkin causing this. No effect when the problem is happening.

2) cleaned feedback tables, reducing total size: https://bug644302.bugzilla.mozilla.org/attachment.cgi?id=521613. Also did not help, but that's not too terribly surprising (that bug deals with a database problem, not a webserver problem... it would have been "by coincidence only" if it fixed this issue).

3) davedash has made new checkins to remove frequent terms and hide the "trends" section. Neither has solved this.


As long as someone's around to keep an eye on it, we can kill the bad Apache processes quickly, before they build up and kill the server. If it's not resolved by tonight, we'll probably want to disable the site and pick it back up tomorrow.
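Spotting the runaway Apache processes can be partly scripted. The sketch below only lists candidates and leaves the decision to kill them to whoever is watching. The function name `high_cpu_pids` and the 90% threshold are assumptions for illustration; the process name `httpd` comes from the "service httpd restart" mentioned earlier.

```shell
# high_cpu_pids THRESHOLD NAME   (hypothetical helper)
# Reads "PID %CPU COMMAND" lines on stdin and prints the PIDs of processes
# named NAME whose CPU percentage is above THRESHOLD.
high_cpu_pids() {
    awk -v limit="$1" -v name="$2" \
        '$3 == name && $2 + 0 > limit + 0 { print $1 }'
}

# Example: list httpd processes currently above 90% CPU.
# Separate -o options with empty headers suppress the ps header line.
#   ps -e -o pid= -o pcpu= -o comm= | high_cpu_pids 90 httpd
```

Note that the `pcpu` figure reported by `ps` is averaged over the lifetime of the process, so a worker that has only just started spinning may take a while to cross the threshold.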


At present, davedash has disabled parts of the site (frequent terms, at least), but mostly it's operating normally. We are watching it for a few hours, to see if the problem manifests itself again.

Dave Dash [:davedash, :dd] (assign all bugs to mbrandt)

Assignee

Comment 7

•

13 years ago

Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Aakash Desai [:aakashd]

Comment 9

•

13 years ago

> Problem hasn't manifested, and as Jake mentioned in a similar bug, we won't bring back that code, unless we do it "right"... or at least do it "less wrong"

Dave, can you file a bug to fix this then? This is removing a piece of functionality that offers insights to our userbase on the dashboard.

Matt Brandt [:mbrandt]

Updated

•

13 years ago

Whiteboard: [qa-]

Matt Brandt [:mbrandt]

Comment 10

•

13 years ago

Verified as [qa-].

Status: RESOLVED → VERIFIED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Product: Input → Input Graveyard

Categories

(Input Graveyard :: Dashboard, defect)

Tracking

(Not tracked)

People

(Reporter: davedash, Assigned: davedash)

References

Details

(Whiteboard: [qa-])

Security

(public)
