Closed Bug 396617 Opened 13 years ago Closed 12 years ago
Investigate source(s) of Litmus slowdowns
Tomcat: I have noticed of late that Litmus can be very slow. The story we usually get is that whatever else is running on the machine is slowing things down. Can Litmus get upgraded hardware, or can we do something else to improve things here?
Confirmed: surfing to litmus.mozilla.org sometimes takes a long time, and it's a bad user experience. Maybe it's an issue with the shared VM?
OS: Mac OS X → All
Hardware: PC → All
Assignee: ccooper → server-ops
Component: Litmus → Server Operations
Product: Webtools → mozilla.org
QA Contact: litmus → justin
Version: Trunk → other
I'm having a hard time identifying the bottleneck here, and would appreciate any advice people have. Memory usage doesn't seem to be the issue:

Mem:  514420k total, 110684k used, 403736k free,  1892k buffers
Swap: 524280k total,  21408k used, 502872k free, 21392k cached

And top reports CPU usage around 80% idle when I hit it with a bunch of requests at once. The same exact actions are sometimes quite speedy and sometimes dog slow, so I'm inclined to believe that slow MySQL queries aren't the issue here. Could this be related to activity by other users of the VM host or the db server?
When can we reboot and tweak configuration settings on this box?
We can be pretty flexible about downtime, as long as we don't do it during testdays or pre-release testing when everyone needs access to the tests. The easiest thing to do would probably be to ask in #as whether a time is OK for everyone. If you need me to generate some load on the server for diagnostics, let me know and I'd be happy to do so. Thanks for helping out with this.
So, the server that hosts the VM has plenty of RAM. I know you said that RAM probably wasn't the issue, but 512 MB is probably too low for any kind of production load. I doubled the RAM on the box, created a 2 GB local swap file, and gave it another CPU. Let's see if things look any better.
Stephen and I noticed that the server seemed to speed up right after Aravind rebooted it, but Stephen noted that even after that he was having issues with errors. I noticed that it is not quite as snappy with searches as it was right after it was restarted.
(In reply to comment #6)
> Stephen and I noticed that the server seemed to speed up right after Aravind
> rebooted it, but Stephen noted that even after that he was having issues with
> errors. I noticed that it is not quite as snappy with searches as it was right
> after it was restarted.

The errors I saw were all 500 internal server errors, and were most definitely post-upgrade, but I too did indeed see the speed boost, when I was able to log in.
At this point I don't think the problem is hardware-related, since the symptoms seem to be the same regardless of how much hardware we throw at it. Would it be possible to narrow this down to specific instances or usage scenarios? I'm guessing some kind of application memory leak or something like that, but I can't be sure; usage scenarios would help narrow it down.
Haven't heard from anyone yet; please reopen once you have concrete examples to replicate this.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Reopening: a connection to Litmus takes several minutes today (it started on the testday), community members have noticed it too, and Litmus reacts dog slow when I choose test runs or other actions.
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Some of this is *apparent* slowness. We do a lot of lookups to generate the index page, but we don't have to do them before display. I'm going to push all the big summations into AJAX (like the coverage already is), and that should improve the initial load time substantially. The index page and the test runs page are largely identical, so this will improve both.

However, that doesn't mean that nothing is broken at the server level.

Aravind: the slow query log isn't turned on in the db, at least not according to 'show variables'. Can we turn it on for the Litmus db if it's not already? If not permanently, then at least for the next testday. It's possible we're doing dumb things in SQL-land, and I'm not 100% trusting of Class::DBI either.
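For reference, enabling the slow query log on a MySQL server of that era is a my.cnf change followed by a server restart. A minimal sketch, assuming MySQL 5.0-era option names; the log path and the 2-second threshold are hypothetical choices, not values from this bug:

```ini
# my.cnf fragment (sketch; paths and thresholds are assumptions)
[mysqld]
log-slow-queries = /var/log/mysql/litmus-slow.log
long_query_time  = 2
# also log queries that do a full scan because no index matched (optional)
log-queries-not-using-indexes
```

The resulting log can then be summarized with the bundled mysqldumpslow tool to see which statements dominate.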
Priority: -- → P3
I landed the index/test run page display improvement this morning. Both pages display pretty quickly now. We could still use the slow query logs to investigate other parts of the interface for slowdowns.
Based on coop's comments, this is an app issue that needs to be optimized? If new hardware isn't needed, can I close this (or move it to the person who is working on it)?
Justin: no one's been able to nail down a set of reproducible conditions for the slowdown. I'm going to take this off of IT's plate for now, until we can say definitively that this isn't an app problem. Those slow query logs would still help a lot in diagnosing this, if we can get them turned on, please.
Assignee: aravind → ccooper
Status: REOPENED → NEW
Component: Server Operations → Litmus
Product: mozilla.org → Webtools
QA Contact: justin → litmus
Dropping the severity on this since I think the index/test run changes help a lot.
Severity: major → normal
Status: NEW → ASSIGNED
Summary: Litmus needs to run on better hardware (dog slow on some days) → Investigate source(s) of Litmus slowdowns
I used the YSlow extension from Yahoo to identify a few easy performance wins. I've got that code landed and pushed to the staging server already, and will push it to production later tonight once the testday stragglers trail off.
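The bug doesn't spell out which wins were landed, but the classic easy YSlow fixes are far-future Expires headers and on-the-wire compression for static assets. A hedged sketch of what that looks like in Apache 2.x configuration (not the actual Litmus change; requires mod_expires and mod_deflate to be loaded):

```
# Sketch only; cache lifetimes are illustrative assumptions.
ExpiresActive On
ExpiresByType text/css "access plus 1 week"
ExpiresByType application/x-javascript "access plus 1 week"
ExpiresByType image/png "access plus 1 month"

# Compress text responses before sending them to the browser
AddOutputFilterByType DEFLATE text/html text/css application/x-javascript
```

Longer cache lifetimes mean fewer repeat requests, at the cost of needing to version asset URLs when files change.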
Priority: P3 → P2
This code is in production now.
P3-ing this until I have a chance to do some diagnosis in the staging env.
Priority: P2 → P3
Seems to be reasonably peppy for me now. Please reopen or (better) file bugs on specific areas of slowness if necessary. Note: we'll use bug 401139 to track the server error/db disconnect issue which is not really related.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago → 12 years ago
Resolution: --- → FIXED