Closed
Bug 949674
Opened 11 years ago
Closed 10 years ago
ec2 Builders timing out during mock-install step
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: KWierso, Unassigned)
Details
https://tbpl.mozilla.org/php/getParsedLog.php?id=31892811&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=31892485&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=31891234&tree=Fx-Team
[13:53] <KWierso|sheriffduty> bhearsum|buildduty: ping
<bhearsum|buildduty> KWierso|sheriffduty: pong
<KWierso|sheriffduty> I've been seeing a few things like this today: https://tbpl.mozilla.org/php/getParsedLog.php?id=31892485&tree=Mozilla-Inbound
[13:54] anything someone should be worried about?
https://tbpl.mozilla.org/php/getParsedLog.php?id=31891234&tree=Fx-Team was another
<bhearsum|buildduty> hmm
[13:55] i bet that's related to this nagios alert
16:50 < nagios-releng> Thu 13:50:44 PST [4846] releng-puppet2.srv.releng.usw2.mozilla.com:load is WARNING: WARNING - load average: 11.97, 9.44, 5.31 (http://m.allizom.org/load)
that step downloads files from the puppet server
please file it while i look into it
load seems okay now - so it could've just been a brief spike of instances being started
Comment 1•11 years ago
It looks like this was just a spike in load, but it may have been made worse if we increased the maximum number of instances recently. Rail, I think you touched that earlier this week?
We also deployed foreman recently, which I don't _think_ affects load on the Puppet master, but it's worth checking. Dustin, do you know?
Flags: needinfo?(rail)
Comment 2•11 years ago
That's certainly possible. If load is spiking that high (and I see it did the same about an hour before this event) then we should have more puppetmasters in ec2.
Reporter
Comment 3•11 years ago
Reporter
Comment 4•11 years ago
Comment 5•11 years ago
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #2)
> That's certainly possible. If load is spiking that high (and I see it did
> the same about an hour before this event) then we should have more
> puppetmasters in ec2.
As a temporary solution, we could also bump the instance type from m1.large to m1.xlarge ($0.240 per hour vs. $0.480 per hour).
Flags: needinfo?(rail)
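For illustration only: resizing an existing EC2 instance with boto looks roughly like the sketch below. The instance type can only be changed while the instance is stopped, and the region, instance ID, and target type here are placeholders rather than the actual releng puppetmaster details.

from boto.ec2 import connect_to_region

# Hypothetical values; the real puppetmasters are managed by releng's own tooling.
conn = connect_to_region('us-west-2')            # usw2
instance_id = 'i-00000000'                       # placeholder puppetmaster instance ID

conn.stop_instances(instance_ids=[instance_id])  # type changes require a stopped instance
# ... wait for the instance to reach the 'stopped' state ...
conn.modify_instance_attribute(instance_id, 'instanceType', 'm1.xlarge')
conn.start_instances(instance_ids=[instance_id])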
Comment 6•11 years ago
Looks like we already have releng-puppet1 & 2 in both use1 and usw2, and AFAICT all of them are getting used. Definitely seems like we should add a releng-puppet3, at least for usw2.
Comment 7•11 years ago
Another option is to offload hosting of the yum/deb repos from the puppet masters and put them on S3 or some other file server. This was mentioned several times at the AWS re:Invent conference as a best practice for getting puppet to scale.
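As a rough sketch of that approach (the bucket name and local repo path below are made up), the repo contents could be pushed into an S3 bucket with boto and served to the builders over plain HTTP instead of from the puppet masters:

import os
from boto.s3.connection import S3Connection

conn = S3Connection()                                  # credentials from the environment/boto config
bucket = conn.create_bucket('releng-package-mirror')   # hypothetical bucket name

repo_root = '/data/repos'                              # hypothetical local repo path on a puppet master
for dirpath, dirnames, filenames in os.walk(repo_root):
    for filename in filenames:
        local_path = os.path.join(dirpath, filename)
        key = bucket.new_key(os.path.relpath(local_path, repo_root))
        key.set_contents_from_filename(local_path)
        key.make_public()                              # let yum/apt fetch over HTTP without credentials

The yum/apt configuration on the build instances would then need its baseurl pointed at the bucket instead of the puppet masters, and the repo metadata (e.g. createrepo output) would have to be uploaded along with the packages.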
Comment 8•11 years ago
That's certainly more complicated, but an option if you can figure out how.
Comment 9•11 years ago
Moving this, because it's not an acute buildduty concern.
Component: Buildduty → General Automation
QA Contact: armenzg → catlee
Comment 10•10 years ago
We haven't seen these for a long time.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Assignee
Updated•7 years ago
Component: General Automation → General