Closed Bug 1246397 Opened 8 years ago Closed 8 years ago

puppettize.vbs partial failure (failed to extract 2 of 3 keys from certs file)

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

Hardware: x86_64
OS: Windows Server 2008
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: grenade)

References

Details

(Whiteboard: [aws][windows])

Attachments

(2 files)

This morning's cron-scheduled puppet run on b-2008-ec2-golden failed. Both y-2008 and t-w732 succeeded and ran as normal.

The error was captured in the agent run report as "2016-02-06 02:45:00 -0800 Puppet (err): Could not request certificate: Error 400 on SERVER: this master is not a CA".

This normally indicates a missing cert file on the agent. On the instance, I found that the output from the cscript run of puppettize.vbs (c:\log\puppettize-stderr.log) contained this:

C:\ProgramData\PuppetLabs\puppet\var\puppettize_TEMP.vbs(88, 1) Microsoft VBScript runtime error: Input past end of file

This is new; I've never seen any output in this file before. It's normally empty. It's created when we run puppettize.vbs from userdata PowerShell with a stderr redirect.
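
For reference, "Input past end of file" is the VBScript runtime error raised when ReadLine (or Read) is called on a TextStream that has already been fully consumed. A minimal sketch of that pattern, with the AtEndOfStream guard that avoids it; this is not the real puppettize.vbs, and the filename and parsing comment are placeholders:

Const ForReading = 1
Set fso = CreateObject("Scripting.FileSystemObject")
Set f = fso.OpenTextFile("certs.sh", ForReading)   ' placeholder path, not the real location
Do While Not f.AtEndOfStream   ' without this guard, a truncated or oddly
    line = f.ReadLine          ' terminated file raises "Input past end of file"
    ' ... accumulate lines between BEGIN/END markers into key/cert files ...
Loop
f.Close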

- The certs.sh file (in programdata/puppetlabs/...) contained all of the certs we would expect.
- The private_keys folder contained the newly created, agent-specific .pem file that we expect (file timestamp checked).
- The certs folder was missing both of the .pem files that we expect.

So, for some reason, puppettize.vbs (which always succeeds and never gives us problems) failed today in a new and wonderful way, after partially succeeding by extracting 1 of the 3 keys from the certs file. I have no idea why, but I kept a copy of the certs.sh file it choked on and can forward it on request to anyone who wants to debug what happened.
Blocks: 1246412
The failed puppet run yesterday had unexpected but logical and rational consequences:

- spot instances spawned from the golden ami with the failed puppet run did not start runner/buildbot (I did not expect this; since they were based off an ami that could run builds without puppet, I was sure the failed puppet run would have no impact. I was wrong.)
- check_ami worked beautifully and killed off (good) spot instances spawned from earlier (good) amis, replacing them with (bad) spot instances spawned from the new (bad) ami. Yay for check_ami!
- cloud tools kept seeing demand for b-2008 spot instances because the pending queue stayed full no matter how many b-2008 instances were spawned. We ended up instantiating the full allocation of 200 b-2008 spot instances, which all just sat around twiddling their thumbs and not talking to buildbot.
- when we noticed (because philor pointed out he had closed the trees), I manually de-registered the bad amis (use1, usw2) and terminated all of the idle b-2008 spot instances using the AWS console. Cloud tools immediately tried to replace all the instances, but took a full hour to do so, because it took that long for the IP address leases to expire and become available for re-use by new instances.

Today's cron run (Sunday) had no problem running puppettize.vbs. Nothing changed; it just worked the way it should (and always has, excluding yesterday), puppet ran correctly, and the golden ami was created.
The new amis to watch are ami-47c1e92d (use1) and ami-f5fd1c95 (usw2).
Attachment #8716669 - Flags: review?(mcornmesser) → review+
Assignee: relops → rthijssen
Status: NEW → ASSIGNED
Attachment #8720786 - Flags: review?(mcornmesser)
Attachment #8720786 - Flags: review?(mcornmesser) → review+
Fixed by looping puppet runs, so a failed run is retried until it succeeds or someone manually fixes the underlying problem.
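
For context, here is a minimal sketch of that retry approach; the command line, exit-code handling, and sleep interval are assumptions for illustration, not the contents of the reviewed patch:

Set shell = CreateObject("WScript.Shell")
Do
    ' run a single puppet agent pass and capture its exit code
    ' (assumes puppet is on PATH; --detailed-exitcodes: 0 = clean, 2 = changes applied)
    rc = shell.Run("cmd /c puppet agent --onetime --no-daemonize --detailed-exitcodes", 0, True)
    If rc = 0 Or rc = 2 Then Exit Do
    WScript.Sleep 60000   ' wait a minute, then retry the run
Loop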
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED