Closed
Bug 664318
Opened 13 years ago
Closed 12 years ago
[stage] input.allizom.org Test automation hampered by frequent timeouts
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
VERIFIED
WONTFIX
People
(Reporter: mbrandt, Assigned: cturra)
References
()
Details
Attachments
(2 files)
Service Unavailable: This is fairly recent and is eliciting false positives in our test automation. We're seeing quite a few timeouts and "Service Unavailable" errors. We might be able to work around the timeouts by increasing the time window in which the tests run, but we can't bypass the "Service Unavailable" errors. Can you investigate the cause? Please let us know what we can do to help dig into this behavior.
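[Editorial sketch] The timeout side of this could be mitigated in the test harness by retrying requests that come back 503 before failing the run. A minimal sketch in Python; `ServiceUnavailable`, `fetch`, and the retry counts are illustrative stand-ins, not part of the actual suite:

```python
import time

class ServiceUnavailable(Exception):
    """Stand-in for an HTTP 503 "Service Unavailable" response."""

def retry_on_unavailable(fetch, attempts=3, delay=0.01):
    """Call fetch() until it succeeds, retrying only on ServiceUnavailable.

    Other exceptions propagate immediately; after the last attempt the
    ServiceUnavailable is re-raised so the test still fails on a real outage.
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except ServiceUnavailable:
            if attempt == attempts - 1:
                raise
            time.sleep(delay * (2 ** attempt))  # back off between tries
```

This only masks transient 503s; a run that fails on every attempt still fails, so real outages remain visible.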
Reporter
Comment 1•13 years ago
Reporter
Comment 2•13 years ago
Here are a few timestamps to help w/ looking through the logs: http://qa-selenium.mv.mozilla.com:8080/view/Input/job/input.stage/
Failed > Console Output #725 Jun 14, 2011 2:10:09 PM
Failed > Console Output #724 Jun 14, 2011 8:00:32 AM
Updated•13 years ago
Assignee: server-ops → nmaul
Comment 3•13 years ago
The screenshot in 725 is a Zeus 500 error, meaning the backend server did not respond in a timely manner.

It's worth noting that Apache gets reloaded on this server at least every 10 minutes (at 0, 10, 20, 30, 40, 50 minutes past the hour, every hour). This is a shared staging server, serving more than just input.allizom.org, so it may be that one of the other sites is causing a load problem.

I offset this in Selenium... the schedule was:
0 8,16 * * *
it is now:
2 8,16 * * *

Assuming I changed the right setting in the right place, your tests should no longer overlap with the Apache restart, which should give you more consistent results. Of course, manually run tests can still trip up. I recommend paying attention to their timing: failures occurring on a 10-minute mark within the hour are likely bogus. Let us know if it doesn't get any better over the next couple of days.
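[Editorial sketch] The rule of thumb above (failures landing on a 10-minute mark are probably the Apache reload, not a real bug) is easy to check mechanically when triaging Jenkins results. A hypothetical helper; the 60-second grace window is an assumption, not a measured reload duration:

```python
from datetime import datetime

def near_apache_reload(ts, grace_seconds=60):
    """True if ts falls within grace_seconds of a 0/10/20/30/40/50-minute
    Apache reload mark, i.e. a failure at that time is likely bogus."""
    seconds_into_cycle = (ts.minute % 10) * 60 + ts.second
    return (seconds_into_cycle <= grace_seconds
            or seconds_into_cycle >= 600 - grace_seconds)
```

Both failed builds from comment 2 (#725 at 2:10:09 PM, #724 at 8:00:32 AM) land inside that window, which is consistent with the reload theory.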
Status: NEW → ASSIGNED
Comment 4•13 years ago
Going to close this out... if the problem is not solved, please re-open and we'll keep looking. Thanks!
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reporter
Comment 5•13 years ago
Thanks for digging into this. We had a good run with many fewer timeouts, but intermittent timeouts have increased again recently. What can I provide to help diagnose this further?
- Are we still hitting timeouts because Apache is reloading (comment 3)?
- What is the Zeus timeout threshold, and is that setting shared with the other projects hosted on the server?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6•13 years ago
In talking to Corey on IRC, it looks like bug 651148 will either obviate this or make it all the more apparent. After discussing it with him, the plan is to mark this bug as depending on bug 651148, which should land sometime next week or so; then we (WebQA) can comment on what we're seeing in the new setup.
Depends on: 651148
Updated•13 years ago
Component: Server Operations → Server Operations: Web Operations
QA Contact: mrz → cshields
Comment 7•12 years ago
How has this been? The bug in comment 6 was resolved recently, wondering if we can close this out too.
Comment 8•12 years ago
(In reply to Jake Maul [:jakem] from comment #7)
> How has this been? The bug in comment 6 was resolved recently, wondering if
> we can close this out too.

Jake: still seeing quite a few failures (timeouts, as well as "Search Unavailables," the latter of which we've xfailed a few).
Comment 9•12 years ago
(In reply to Stephen Donner [:stephend] from comment #8)
> (In reply to Jake Maul [:jakem] from comment #7)
> > How has this been? The bug in comment 6 was resolved recently, wondering if
> > we can close this out too.
>
> Jake: still seeing quite a few failures (timeouts, as well as "Search
> Unavailables," the latter of which we've xfailed a few).

http://qa-selenium.mv.mozilla.com:8080/job/input.stage/buildTimeTrend
Comment 10•12 years ago
I wonder if we're hitting an issue with the Seamicro dev nodes not being fast enough. Phong: can we get a VM put up and in place on this cluster (input-dev, PHX1)? Once it's puppetized I can put it in the right node class and zeus cluster and all that. RHEL6, 2GB RAM, 1 core, 10GB disk should be sufficient for testing. Thanks!
Assignee: nmaul → phong
Whiteboard: want vm
Comment 11•12 years ago
Is this in phx1 or sjc1?
Comment 12•12 years ago
PHX1. :)
Comment 13•12 years ago
Can you create a VM on the PHX cluster for this?
Assignee: phong → dparsons
Updated•12 years ago
Assignee: dparsons → server-ops
Component: Server Operations: Web Operations → Server Operations: Virtualization
QA Contact: cshields → dparsons
Comment 14•12 years ago
Let's hold off on a VM - we'll have new Xeon gear there in about a week, I'd like to just give a full blade to it.
Comment 15•12 years ago
OK, going to move it back to the server-ops queue then.
Component: Server Operations: Virtualization → Server Operations
QA Contact: dparsons → phong
Whiteboard: want vm
Comment 16•12 years ago
Just wanted to note, since I've been poking at Input a bunch the last couple of days for bug 725782: input.allizom.org is currently hosted on mrapp-stage02, which is an older DL360 G4 that runs a bunch of other things, so this isn't Seamicro slowness, just older shared hardware. Input-dev is on a pair of Seamicro nodes, but stage and prod were never migrated to the new admin node and set up with the current deployment style. As Corey mentioned above, moving staging to a new Xeon Seamicro and the new inputadm deployment style is probably best long term, so once a node is ready, I can take the lead on setting up a new input.allizom.org on it and getting it tested by the developers.
Updated•12 years ago
Assignee: server-ops → bburton
Comment 17•12 years ago
Pending Xeon seamicros and migration to phx1
Reporter
Comment 18•12 years ago
:solarace - thx for the update :)
Updated•12 years ago
Assignee: bburton → cturra
Assignee
Comment 19•12 years ago
Per the "Input: Sequel or Reboot" meeting this morning, we will be doing a reboot of Input, so marking this bug WONTFIX.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 12 years ago
Resolution: --- → WONTFIX
Reporter
Comment 20•12 years ago
QA verified wontfix - thank you for following up and cleaning out the bug queue cturra.
Status: RESOLVED → VERIFIED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard