Closed
Bug 629482
Opened 13 years ago
Closed 13 years ago
try clones failing all over the place again
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: aravind)
Attachments (1 file)
551 bytes, patch; catlee: review+
Same symptoms as the start of https://bugzilla.mozilla.org/show_bug.cgi?id=629268. Looks like an Apache restart may have improved things last night?
Comment 1 (Assignee) • 13 years ago
The server has a load avg of 50. Probably means there are a ton of try servers all cloning at the same time. We have to find out a way to throttle the number of clones that run at the same time. There is a hardware fix for this as well, but that will take some time. In the meantime fixing 629268 might help.
Assignee: server-ops → aravind
Comment 2 (Reporter) • 13 years ago
Attachment #507593 -
Flags: review?(catlee)
Updated•13 years ago
Attachment #507593 -
Flags: review?(catlee) → review+
Comment 3•13 years ago
(In reply to comment #1)
> The server has a load avg of 50. Probably means there are a ton of try servers
> all cloning at the same time. We have to find out a way to throttle the number
> of clones that run at the same time.

We get spikes in load, on different branches, unpredictably, and hg.m.o needs to be able to handle this load.

> There is a hardware fix for this as well,
> but that will take some time.

Please clarify what is needed here, and I'll do my best to make sure you get it.

> In the meantime fixing 629268 might help.

Happy to reset the try repo in bug 629268 and see if that helps. If it does, then we should do this as a routine procedure at the start of every month to avoid hitting these developer-visible failures again. Meanwhile, would another Apache restart help?
Blocks: 626751
Comment 4 (Reporter) • 13 years ago
Comment on attachment 507593: lower lock to 20

Landed this patch and reconfigured the try master.
Comment 5 (Assignee) • 13 years ago
(In reply to comment #3)
> We get spikes in load, on different branches, unpredictably, and hg.m.o needs
> to be able to handle this load.

Handle spikes up to what load? We can throw more hardware at the problem and scale it to, say, double the current capacity. Then you add more try slaves, and that isn't enough anymore.

Maybe the fix is to clone it once per push and replicate that clone internally somehow (BitTorrent, etc.)? I think this has to involve a software solution along with the hardware side of things.

> Please clarify what is needed here, and I'll do my best to make sure you get
> it.

We are discussing hardware options in IT and plan to rebuild/move the entire hg.m.o infrastructure to phx. Those discussions are underway; I'm not sure there is anything for build and release to do there.

> Meanwhile, would another apache restart help?

Not really. Ben pushed out a change to limit the number of clones to 20. That should help.
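
For illustration only, and not the attached patch: the idea of capping how many clones run at once can be sketched with a counting semaphore. The limit of 20 comes from this bug; `make_throttle`, `peak_concurrency`, and the fake clone task are hypothetical names invented for this sketch.

```python
import threading
import time

MAX_CONCURRENT_CLONES = 20  # the limit mentioned in this bug; the rest is hypothetical


def make_throttle(limit=MAX_CONCURRENT_CLONES):
    """Return a runner that lets at most `limit` tasks execute at once."""
    slots = threading.BoundedSemaphore(limit)

    def run(task):
        # Blocks here while all slots are taken, then releases on exit.
        with slots:
            return task()

    return run


def peak_concurrency(limit, workers):
    """Start `workers` fake clones through a throttle and report the peak overlap."""
    run = make_throttle(limit)
    state = {"active": 0, "peak": 0}
    guard = threading.Lock()

    def fake_clone():
        # Stand-in for a real clone: just records how many run together.
        with guard:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)
        with guard:
            state["active"] -= 1

    threads = [threading.Thread(target=run, args=(fake_clone,)) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]
```

Callers beyond the limit simply wait for a free slot, so a burst of try pushes queues up instead of driving the server load higher.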
Comment 6•13 years ago
(In reply to comment #5)
> (In reply to comment #3)
> > We get spikes in load, on different branches, unpredictably, and hg.m.o needs
> > to be able to handle this load.
>
> Handle spikes up to what load.. We can throw more hardware at the problem, and
> scale it to say double the current capacity. And then you add more try slaves
> and then that isn't enough anymore.
>
> Maybe the fix is to clone it once per push and replicate that clone internally
> somehow (bittorrent..etc)? I think this has to involve a software solution
> along with the hardware side of things..

This is already happening in bug 589885.
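
As a rough sketch of the "clone once per push, then reuse" idea quoted above (an assumption about the general shape, not what bug 589885 implements): the first requester for a given revision pays for the expensive fetch, and everyone else reuses the result. `OncePerKey` and the `fetch` callable are hypothetical.

```python
import threading


class OncePerKey:
    """Run an expensive fetch once per key; later callers reuse the result."""

    def __init__(self, fetch):
        self._fetch = fetch          # hypothetical: e.g. clones one pushed revision
        self._results = {}
        self._locks = {}
        self._guard = threading.Lock()

    def get(self, key):
        # One lock per key, so different revisions do not block each other.
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self._results:
                self._results[key] = self._fetch(key)
            return self._results[key]
```

With this shape, the upstream server sees one fetch per push no matter how many slaves ask for it; how the cached copy is then distributed internally is left open here.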
Comment 7 (Reporter) • 13 years ago
I guess this bug is FIXED, since the immediate issue is dealt with.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard