Closed Bug 718218 Opened 13 years ago Closed 13 years ago

search performance regression in 2.4

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: rhelmer, Unassigned)


Attachments

(1 file)

postgres load average spike 13 years ago Robert Helmer [:rhelmer] 43.09 KB, image/png

Robert Helmer [:rhelmer]

Reporter

Description

•

13 years ago

Attached image: postgres load average spike

We're tracking a performance regression that the Selenium tests are showing in production following the 2.4 push (bug 717055).

What we know so far:

A run we believe to be equivalent on production 2.3.5.1, from Jan 11th:
http://qa-selenium.mv.mozilla.com:8080/job/socorro.prod/1741/testReport/

A run from just now (looks similar to last night):
http://qa-selenium.mv.mozilla.com:8080/view/Socorro/job/socorro.prod/1736/testReport/

We have a number of failures (that we believe to be timeouts), and the following test suites seem to have regressed in performance:

tests.test_smoke_tests
before: 1 min 11 sec
after: 6 min 5 sec

tests.test_search
before: 2 min 11 sec
after: 13 min

It looks like PostgreSQL has big load-average spikes that coincide with the test timeouts (graph attached), and we get nagios alerts from the front-end and mware servers.
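To put the before/after timings above on a common scale, here is a minimal sketch that converts the reported durations to seconds and computes the slowdown factor for each suite. The helper names `to_seconds` and `slowdown` are made up for illustration and are not part of Socorro or Socorro-Tests; the durations are the ones quoted in the description.

```python
import re


def to_seconds(text):
    """Convert a duration such as '1 min 11 sec' or '13 min' to seconds."""
    minutes = re.search(r"(\d+)\s*min", text)
    seconds = re.search(r"(\d+)\s*sec", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


def slowdown(before, after):
    """Ratio of the 'after' duration to the 'before' duration."""
    return to_seconds(after) / to_seconds(before)


# Suite timings as reported in the description (before, after).
timings = {
    "tests.test_smoke_tests": ("1 min 11 sec", "6 min 5 sec"),
    "tests.test_search": ("2 min 11 sec", "13 min"),
}

for suite, (before, after) in timings.items():
    print(f"{suite}: {to_seconds(before)}s -> {to_seconds(after)}s "
          f"({slowdown(before, after):.1f}x)")
```

By this arithmetic the smoke tests went from 71 to 365 seconds (about 5.1x) and the search tests from 131 to 780 seconds (about 6.0x).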

[:jberkus] Josh Berkus

Comment 1

•

13 years ago

Testing individual search queries showed no change in execution time. So something else is going on. Setting up a performance test environment on stagedb.

Robert Helmer [:rhelmer]

Reporter

Comment 2

•

13 years ago

This test seems to be the single worst offender, and it'd make sense since it's doing a much larger query:

https://github.com/mozilla/Socorro-Tests/commit/32d53ffb58d6800db8837bc2e814d1ab41c98978

I see I am implicated in this, I was trying to suggest a version that wouldn't expire as fast as Aurora, but did not ask what it was used for :)

Can someone please try changing this to 11.0a2 instead?

Robert Helmer [:rhelmer]

Reporter

Comment 3

•

13 years ago

(In reply to Robert Helmer [:rhelmer] from comment #2)
> This test seems to be the single worst offender, and it'd make sense since
> it's doing a much larger query:
>
> https://github.com/mozilla/Socorro-Tests/commit/32d53ffb58d6800db8837bc2e814d1ab41c98978
>
> I see I am implicated in this, I was trying to suggest a version that
> wouldn't expire as fast as Aurora, but did not ask what it was used for :)
>
> Can someone please try changing this to 11.0a2 instead?

I just filed a pull request: https://github.com/mozilla/Socorro-Tests/pull/82

Robert Helmer [:rhelmer]

Reporter

Comment 4

•

13 years ago

Pull request merged, we're running a test against prod: http://qa-selenium.mv.mozilla.com:8080/job/socorro.prod/1745/

Robert Helmer [:rhelmer]

Reporter

Comment 5

•

13 years ago

Just FYI I reversed the links here:

(In reply to Robert Helmer [:rhelmer] from comment #0)
> A run we believe to be equivalent on production 2.3.5.1, from Jan 11th:

http://qa-selenium.mv.mozilla.com:8080/view/Socorro/job/socorro.prod/1736/testReport/

> A run from just now (looks similar to last night):
> We have a number of failures (that we believe to be timeouts), and the
> following test suites seem to have regressed in performance:

http://qa-selenium.mv.mozilla.com:8080/job/socorro.prod/1741/testReport/

Stephen Donner [:stephend] Not actively reading bugmail

Comment 6

•

13 years ago

I think we tracked this down to at least one cause: test_that_filter_for_browser_results has a |while| condition that looks for a "browser_icon" present, and then clicks the "Next" button for each set of search results -- the problem is, it never really clicks Next because that condition is never true.

As a stopgap, we're commenting out the offending while/assert, and re-running:

https://github.com/mozilla/Socorro-Tests/commit/36a5f9fcac7c189495fd3c413c99e0bac9e700ee

(We watched that test running in a Sauce Labs video, which led us to this conclusion.)
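For illustration only, here is a rough sketch of one way a pagination loop in a test could be capped so it cannot walk an entire result set when the element it is waiting for never appears. This is not the Socorro-Tests code, and the stopgap actually applied was to comment out the while/assert. `FakeResultsPage`, `pages_visited`, and `max_pages` are hypothetical names invented for this example.

```python
class FakeResultsPage:
    """Hypothetical stand-in for a paginated search results page."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.current = 1
        self.clicks = 0

    def has_next(self):
        return self.current < self.total_pages

    def click_next(self):
        self.current += 1
        self.clicks += 1

    def has_browser_icon(self):
        # Assumption for this sketch: the icon is never present on any page.
        return False


def pages_visited(page, stop_when, max_pages=5):
    """Page forward until stop_when(page) is true, pages run out, or the cap is hit."""
    visited = 1
    while not stop_when(page) and page.has_next() and visited < max_pages:
        page.click_next()
        visited += 1
    return visited


page = FakeResultsPage(total_pages=200)
visited = pages_visited(page, lambda p: p.has_browser_icon(), max_pages=5)
print(visited, page.clicks)
```

With the cap in place, a condition that never becomes true costs at most `max_pages` page loads instead of one per page of results.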

Robert Helmer [:rhelmer]

Reporter

Comment 7

•

13 years ago

The problematic test mentioned in comment 6 did turn out to be the problem; it was cycling through all pages looking for a browser icon (which has never existed for the advanced search results page).

Yesterday the test was switched to use 9.0.1 instead of 10.0a1 (my suggestion, since Aurora changes so frequently). However this causes the DB load to be so high that things start timing out and unrelated tests fail. This has been reverted.

We now have a test run that is comparable to the run on the 11th we were comparing to:

http://qa-selenium.mv.mozilla.com:8080/job/socorro.prod/1747/

We discussed and have some ideas for making more representative tests for perf comparisons, and using different methodology (running the same set of tests right before/after push for instance), we can cover this in the post-mortem.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Stephen Donner [:stephend] Not actively reading bugmail

Comment 8

•

13 years ago

For fun: https://saucelabs.com/jobs/6f38bb2836dcb1db1812e8a532e03de0 is the problematic test running in Sauce Labs -- the video is only archived for ~ 30 days, but if I can, I'll attach it as an FLV, here, too.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

13 years ago

Blocks: 717055

Matt Brandt [:mbrandt]

Comment 9

•

13 years ago

Verified ... we'll need to give some love to our test suite. A bit of house cleaning is in order. Awesome work digging down through this! Woot

Status: RESOLVED → VERIFIED


Categories

(Socorro :: General, task)


Security

(public)
