Closed Bug 629482 Opened 13 years ago Closed 13 years ago

try clones failing all over the place again

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86_64
Linux
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: aravind)

Attachments

(1 file)

Same symptoms as the start of https://bugzilla.mozilla.org/show_bug.cgi?id=629268. Looks like an Apache restart may have improved things last night?
The server has a load avg of 50.  Probably means there are a ton of try servers all cloning at the same time.  We have to find out a way to throttle the number of clones that run at the same time.  There is a hardware fix for this as well, but that will take some time.

In the meantime fixing 629268 might help.
Assignee: server-ops → aravind
Attached patch: lower lock to 20
Attachment #507593 - Flags: review?(catlee)
Attachment #507593 - Flags: review?(catlee) → review+
(In reply to comment #1)
> The server has a load avg of 50.  Probably means there are a ton of try servers
> all cloning at the same time.  We have to find out a way to throttle the number
> of clones that run at the same time.  

We get spikes in load, on different branches, unpredictably, and hg.m.o needs to be able to handle this load.


> There is a hardware fix for this as well,
> but that will take some time.
Please clarify what is needed here, and I'll do my best to make sure you get it.




> In the meantime fixing 629268 might help.
Happy to reset try repo in bug#629268, and see if that helps. If yes, then we should do this as a routine procedure at the start of every month to avoid re-hitting these developer-visible failures.

Meanwhile, would another apache restart help?
Blocks: 626751
Comment on attachment 507593 [details] [diff] [review]
lower lock to 20

Landed this patch and reconfigured the try master.
(In reply to comment #3)
> We get spikes in load, on different branches, unpredictably, and hg.m.o needs
> to be able to handle this load.

Handle spikes up to what load?  We can throw more hardware at the problem and scale it to, say, double the current capacity.  Then you add more try slaves and that isn't enough anymore.

Maybe the fix is to clone it once per push and replicate that clone internally somehow (bittorrent..etc)?  I think this has to involve a software solution along with the hardware side of things..

> Please clarify what is needed here, and I'll do my best to make sure you get
> it.

We are discussing hardware options in IT and plan to rebuild/move the entire hg.m.o infrastructure to phx.  Those discussions are underway, not sure if there is anything for build and release to do there.

> Meanwhile, would another apache restart help?

Not really. Ben pushed out a change to limit the number of clones to 20.  That should help.
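
For illustration, a cap on simultaneous clones like the one described above can be sketched with a counting semaphore. This is a minimal Python sketch, not the attached patch (attachment 507593 is not reproduced here); `run_throttled`, `fake_clone`, and `MAX_CONCURRENT_CLONES` are hypothetical names, and `fake_clone` only sleeps in place of a real clone.

```python
import threading
import time

MAX_CONCURRENT_CLONES = 20  # the limit mentioned in this bug


def run_throttled(jobs, limit=MAX_CONCURRENT_CLONES):
    """Run each zero-argument job in its own thread, at most `limit` at once.

    Returns the peak number of jobs observed running at the same time.
    """
    slots = threading.BoundedSemaphore(limit)
    state_lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def worker(job):
        # Block here until one of the `limit` slots is free.
        with slots:
            with state_lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            try:
                job()
            finally:
                with state_lock:
                    state["running"] -= 1

    threads = [threading.Thread(target=worker, args=(job,)) for job in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


def fake_clone():
    # Hypothetical stand-in for a real clone of the try repo.
    time.sleep(0.01)


if __name__ == "__main__":
    peak = run_throttled([fake_clone] * 100)
    print("peak concurrent clones:", peak)
```

Jobs beyond the limit wait for a free slot instead of failing, so a burst of pushes turns into a queue on the client side rather than a load spike on the server.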
(In reply to comment #5)
> (In reply to comment #3)
> > We get spikes in load, on different branches, unpredictably, and hg.m.o needs
> > to be able to handle this load.
> 
> Handle spikes up to what load..  We can throw more hardware at the problem, and
> scale it to say double the current capacity.  And then you add more try slaves
> and then that isn't enough anymore.  
> 
> Maybe the fix is to clone it once per push and replicate that clone internally
> somehow (bittorrent..etc)?  I think this has to involve a software solution
> along with the hardware side of things..

This is already happening in bug 589885.
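
As a rough illustration of the "clone it once per push" idea quoted above, the sketch below lets the first requester for a revision fetch from upstream while later requesters wait and reuse the result. It is an assumption-laden Python sketch and says nothing about how bug 589885 implements it; `CloneCache` and `fetch_upstream` are hypothetical names.

```python
import threading


class CloneCache:
    """Fetch each revision from upstream at most once and share the result."""

    def __init__(self, fetch_upstream):
        # fetch_upstream(revision) is a hypothetical callable that performs
        # the expensive clone and returns, for example, a local path.
        self._fetch_upstream = fetch_upstream
        self._lock = threading.Lock()
        self._entries = {}  # revision -> {"ready", "value", "error"}

    def get(self, revision):
        with self._lock:
            entry = self._entries.get(revision)
            owner = entry is None
            if owner:
                entry = {"ready": threading.Event(), "value": None, "error": None}
                self._entries[revision] = entry

        if owner:
            # Only the first caller for this revision talks to upstream.
            try:
                entry["value"] = self._fetch_upstream(revision)
            except Exception as exc:
                entry["error"] = exc
                with self._lock:
                    # Drop the failed entry so a later call can retry.
                    self._entries.pop(revision, None)
            finally:
                entry["ready"].set()
        else:
            # Everyone else waits for the first caller to finish.
            entry["ready"].wait()

        if entry["error"] is not None:
            raise entry["error"]
        return entry["value"]
```

With a cache like this in front of the build machines, the number of upstream clones scales with the number of pushes rather than with the number of try slaves.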
I guess this bug is FIXED, since the immediate issue is dealt with.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard