Closed Bug 785371 Opened 13 years ago Closed 13 years ago

DNS errors from mountain lion systems

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

Attachments

(3 files)

We're seeing some failures of puppetagain runs due to getaddrinfo failures (clocks are Pacific):

talos-mtnlion-r5-008.test.releng.scl3.mozilla.com:
Fri Aug 24 05:17:50 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:17:50 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known

talos-mtnlion-r5-010.test.releng.scl3.mozilla.com:
Fri Aug 24 00:24:26 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 00:24:26 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known

talos-mtnlion-r5-009.test.releng.scl3.mozilla.com:
Fri Aug 24 00:15:52 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 00:15:52 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:16:31 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:16:31 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known

This particular error comes from failing to look up the puppet server name, which in this case is 'puppet'.

talos-mtnlion-r5-009:~ root# host puppet
puppet.test.releng.scl3.mozilla.com is an alias for releng-puppet1.srv.releng.scl3.mozilla.com.
releng-puppet1.srv.releng.scl3.mozilla.com has address 10.26.48.45

search domain[0] : test.releng.scl3.mozilla.com
...
nameserver[0] : 10.26.75.40
nameserver[1] : 10.26.75.41

I don't see anything in the logs on ns1, other than a failed puppet run (bug 785360) just before each of these failures.
Ah, looks like this is an OS X thing, not a DNS thing.
Summary: DNS errors in releng BU → DNS errors from mountain lion systems
And apparently has been going on for some time. Which suggests bug 734123 will be valuable :/
Assignee: server-ops → dustin
Component: Server Operations → Release Engineering: Machine Management
QA Contact: jdow → armenzg
So, adding scutil --dns to run-puppet.sh shows that the problem is that the first run of puppet (against 'puppet') occurs before lookupd starts. So, I think we need to wait for lookupd to start: basically, loop until scutil --dns | grep 'No DNS configuration available' fails. I'll give that a shot. Otherwise, this seems relatively harmless, since the script just falls back to other puppet masters.
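For readers following along, the wait described above could look roughly like the sketch below. This is not the attached patch. The function name wait_for_dns_config, the DNS_CONFIG_CMD variable, the retry limit, and the delay are all assumptions made for illustration; only scutil --dns and the 'No DNS configuration available' string come from this bug.

```shell
#!/bin/sh
# Sketch only: poll the DNS configuration until the "no configuration"
# marker disappears. DNS_CONFIG_CMD is a hypothetical knob that defaults to
# the real command `scutil --dns`; it exists so the loop can be tried on
# machines without scutil.
DNS_CONFIG_CMD="${DNS_CONFIG_CMD:-scutil --dns}"

wait_for_dns_config() {
    max_tries="${1:-30}"   # assumed upper bound, not taken from the patch
    delay="${2:-1}"        # assumed seconds between polls
    tries=0
    # grep succeeds while the marker is still printed, so the loop keeps
    # going until grep fails (i.e. a DNS configuration is present).
    while $DNS_CONFIG_CMD 2>/dev/null | grep -q 'No DNS configuration available'; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            echo "giving up waiting for DNS" >&2
            return 1
        fi
        echo "No DNS configuration available"
        echo "..waiting for DNS"
        sleep "$delay"
    done
    return 0
}
```

A caller would run wait_for_dns_config before the first puppet run and decide for itself what to do when the function returns non-zero.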
Progress!

talos-mtnlion-r5-018:~ root# less /var/log/puppet/puppet.out
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
Attached patch bug785371.patch
Tested on 018 - I'll see it loop 2-3 times, then successfully puppetize against 'puppet'. I also tested the rebooting part (by just using "while true") and it works fine, too.
Attachment #655015 - Flags: review?(kmoir)
Attachment #655015 - Flags: review?(kmoir) → review+
Attachment #655015 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Seeing this today now that the 10.8 puppet slaves are in production.

Tue Aug 28 04:19:47 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Tue Aug 28 04:19:47 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
talos-mtnlion-r5-049:~ root# cat /var/log/puppet/puppet.*
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
Running puppet agent against server 'releng-puppet1.build.mtv1.mozilla.com'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'

I wonder if there's a race condition here between lookupd starting and DNS queries resolving? I'd like to add an extra bit of output to these logfiles so we know when run-puppet.sh starts. That may help figure out what's going on.

For those playing along at home, these are harmless errors except that they slightly delay startup and pepper our email with error logs.
Attached patch bug785371.patch
Attachment #656033 - Flags: review?(kmoir)
Attachment #656033 - Flags: review?(kmoir) → review+
Comment on attachment 656033 (bug785371.patch): Let's see what this gives us.
Attachment #656033 - Flags: checked-in+
Starting run-puppet.sh at Tue Aug 28 12:41:19 PDT 2012
checking DNS
;; connection timed out; no servers could be reached
Running puppet agent against server 'puppet'

So it successfully ran scutil --dns, but a subsequent 'host' command didn't work. Ugh, OS X!
Attached patch bug785371.patch
With this patch in place:

Starting run-puppet.sh at Tue Aug 28 13:36:20 PDT 2012
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
;; connection timed out; no servers could be reached
releng-puppet1.build.mtv1.mozilla.com has address 10.250.48.247
Running puppet agent against server 'puppet'

So it runs 'host' until it actually works.
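A rough sketch of the "run 'host' until it actually works" idea follows. It is an illustration under stated assumptions, not the checked-in patch. The function name wait_for_lookup, the LOOKUP_CMD variable, the retry limit, and the delay are hypothetical; the only things taken from this bug are the use of the host command and the fact that its output lands in the log.

```shell
#!/bin/sh
# Sketch only: retry a real lookup until the resolver answers, since a
# present DNS configuration does not guarantee that queries succeed yet.
# LOOKUP_CMD is a hypothetical knob defaulting to the real `host` command.
LOOKUP_CMD="${LOOKUP_CMD:-host}"

wait_for_lookup() {
    name="$1"              # name to resolve, chosen by the caller
    max_tries="${2:-30}"   # assumed upper bound, not taken from the patch
    delay="${3:-1}"        # assumed seconds between attempts
    tries=0
    # host exits non-zero when no server could be reached; its output is
    # left visible so timeouts and the final answer both reach the log.
    until $LOOKUP_CMD "$name"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
    return 0
}
```

Chaining this after the scutil --dns check covers the case seen in the earlier log, where the configuration was present but the first query still timed out.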
Attachment #656197 - Flags: review?(kmoir)
Attachment #656197 - Flags: review?(kmoir) → review+
Attachment #656197 - Flags: checked-in+
I haven't seen any failures yet!
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Hm, saw three failures this morning.

talos-mtnlion-r5-038.test.releng.scl3.mozilla.com
talos-mtnlion-r5-036.test.releng.scl3.mozilla.com
talos-mtnlion-r5-077.test.releng.scl3.mozilla.com

I investigated based on /var/log/puppet.out and `last`. For all of them, this was their first time starting up since 8/23, so they got the patch *after* the failure. In fact, all have successfully rebooted several times since this morning without errors.
Product: mozilla.org → Release Engineering
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard