Closed
Bug 785371
Opened 13 years ago
Closed 13 years ago
DNS errors from mountain lion systems
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: dustin)
Details
Attachments
(3 files)
2.08 KB, patch | kmoir: review+ | dustin: checked-in+
889 bytes, patch | kmoir: review+ | dustin: checked-in+
1.29 KB, patch | kmoir: review+ | dustin: checked-in+
We're seeing some puppetagain runs fail due to getaddrinfo errors (timestamps are Pacific time):
talos-mtnlion-r5-008.test.releng.scl3.mozilla.com:
Fri Aug 24 05:17:50 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:17:50 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
talos-mtnlion-r5-010.test.releng.scl3.mozilla.com:
Fri Aug 24 00:24:26 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 00:24:26 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
talos-mtnlion-r5-009.test.releng.scl3.mozilla.com:
Fri Aug 24 00:15:52 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 00:15:52 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:16:31 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Fri Aug 24 05:16:31 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
This particular error comes from failing to look up the puppet server name, which in this case is 'puppet'.
talos-mtnlion-r5-009:~ root# host puppet
puppet.test.releng.scl3.mozilla.com is an alias for releng-puppet1.srv.releng.scl3.mozilla.com.
releng-puppet1.srv.releng.scl3.mozilla.com has address 10.26.48.45
search domain[0] : test.releng.scl3.mozilla.com
...
nameserver[0] : 10.26.75.40
nameserver[1] : 10.26.75.41
I don't see anything in the logs on ns1, other than a failed puppet run (bug 785360) just before each of these failures.
Comment 1 • 13 years ago (Assignee)
Ah, looks like this is an OS X thing, not a DNS thing.
Updated • 13 years ago (Assignee)
Summary: DNS errors in releng BU → DNS errors from mountain lion systems
Comment 2 • 13 years ago (Assignee)
And apparently has been going on for some time. Which suggests bug 734123 will be valuable :/
Updated • 13 years ago (Assignee)
Assignee: server-ops → dustin
Component: Server Operations → Release Engineering: Machine Management
QA Contact: jdow → armenzg
Comment 3 • 13 years ago (Assignee)
So, adding scutil --dns to run-puppet.sh shows that the problem is that the first run of puppet (against 'puppet') occurs before lookupd starts.
So, I think we need to wait for lookupd to start. Basically, waiting until
scutil --dns | grep 'No DNS configuration available'
fails. I'll give that a shot.
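A minimal sketch of that wait, not the checked-in patch: the function name `wait_for_dns_config`, the retry limit, and the one-second delay are assumptions made for illustration. Only the `scutil --dns | grep` test and the "..waiting for DNS" message come from this bug.

```shell
# Sketch: block until `scutil --dns` stops reporting that no DNS
# configuration is available. The function name, retry limit, and
# delay are hypothetical, not taken from run-puppet.sh.
wait_for_dns_config() {
    max_tries=${1:-30}
    tries=0
    # grep prints the matching line, so each retry leaves a trace in the log
    while scutil --dns | grep 'No DNS configuration available'; do
        echo "..waiting for DNS"
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1    # give up; the caller decides what happens next
        fi
        sleep 1
    done
    return 0
}
```

Capping the number of tries keeps a host that never gets a DNS configuration from waiting forever; what to do when the cap is hit is left to the caller.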
Otherwise, this seems relatively harmless, since the script just falls back to other puppet masters.
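The fallback behaviour mentioned here can be pictured roughly as follows. This is a sketch under assumptions: `run_puppet_once` and `run_puppet_with_fallback` are hypothetical names, and the `puppet agent` flags are a plausible one-shot invocation rather than what run-puppet.sh is known to use.

```shell
# Hypothetical wrapper around a single agent run; the real invocation in
# run-puppet.sh is not shown in this bug, so these flags are an assumption.
run_puppet_once() {
    puppet agent --onetime --no-daemonize --server "$1"
}

# Try each server in the order given and stop at the first successful run.
run_puppet_with_fallback() {
    for server in "$@"; do
        echo "Running puppet agent against server '$server'"
        if run_puppet_once "$server"; then
            return 0
        fi
    done
    return 1    # every server failed
}
```

For example, `run_puppet_with_fallback puppet releng-puppet1.build.mtv1.mozilla.com` would try 'puppet' first and move on to the second master only if that run fails.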
Comment 4 • 13 years ago (Assignee)
Progress!
talos-mtnlion-r5-018:~ root# less /var/log/puppet/puppet.out
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
Comment 5 • 13 years ago (Assignee)
Tested on 018 - I'll see it loop 2-3 times, then successfully puppetize against 'puppet'. I also tested the rebooting part (by just using "while true") and it works fine, too.
Attachment #655015 -
Flags: review?(kmoir)
Updated • 13 years ago
Attachment #655015 -
Flags: review?(kmoir) → review+
Updated • 13 years ago (Assignee)
Attachment #655015 -
Flags: checked-in+
Updated • 13 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 6 • 13 years ago
Seeing this today now that the 10.8 puppet slaves are in production.
Tue Aug 28 04:19:47 -0700 2012 /File[/var/lib/puppet/lib] (err): Failed to generate additional resources using 'eval_generate: getaddrinfo: nodename nor servname provided, or not known
Tue Aug 28 04:19:47 -0700 2012 /File[/var/lib/puppet/lib] (err): Could not evaluate: getaddrinfo: nodename nor servname provided, or not known Could not retrieve file metadata for puppet://puppet/plugins: getaddrinfo: nodename nor servname provided, or not known
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 7 • 13 years ago (Assignee)
talos-mtnlion-r5-049:~ root# cat /var/log/puppet/puppet.*
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
Running puppet agent against server 'releng-puppet1.build.mtv1.mozilla.com'
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
Running puppet agent against server 'puppet'
I wonder if there's a race condition here between lookupd starting and DNS queries resolving? I'd like to add an extra bit of output to these logfiles so we know when run-puppet.sh starts. That may help figure out what's going on.
For those playing along at home, these are harmless errors except that they slightly delay startup and pepper our email with error logs.
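The extra start-of-run output proposed above could be as small as the sketch below. The helper name `log_start` is hypothetical; the message format is modelled on the "Starting run-puppet.sh at ..." line that appears later in this bug.

```shell
# Sketch: print a start-of-run marker so each boot's entries in the
# logfile can be told apart. `log_start` is a hypothetical helper name.
log_start() {
    echo "Starting $1 at $(date)"
}

log_start run-puppet.sh
```

With a marker like this at the top of each run, the "waiting for DNS" lines that follow can be attributed to a specific boot.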
Comment 8 • 13 years ago (Assignee)
Attachment #656033 -
Flags: review?(kmoir)
Updated • 13 years ago
Attachment #656033 -
Flags: review?(kmoir) → review+
Comment 9 • 13 years ago (Assignee)
Attachment #656033 -
Flags: checked-in+
Comment 10 • 13 years ago (Assignee)
Starting run-puppet.sh at Tue Aug 28 12:41:19 PDT 2012
checking DNS
;; connection timed out; no servers could be reached
Running puppet agent against server 'puppet'
So it successfully ran scutil --dns, but a subsequent 'host' command didn't work. Ugh, OS X!
Comment 11 • 13 years ago (Assignee)
With this patch in place:
Starting run-puppet.sh at Tue Aug 28 13:36:20 PDT 2012
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
No DNS configuration available
..waiting for DNS
;; connection timed out; no servers could be reached
releng-puppet1.build.mtv1.mozilla.com has address 10.250.48.247
Running puppet agent against server 'puppet'
so it runs 'host' until it actually works.
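Roughly, retrying 'host' until it succeeds could look like the sketch below. It is not the attached patch: the function name `wait_for_host_lookup`, the retry limit, and the delay are assumptions, and the name to look up is passed in by the caller.

```shell
# Sketch: keep running `host` against a name until a lookup succeeds.
# `host` exits non-zero when the lookup fails or times out, so its exit
# status drives the loop. Function name and limits are hypothetical.
wait_for_host_lookup() {
    name=$1
    max_tries=${2:-30}
    tries=0
    until host "$name"; do
        tries=$((tries + 1))
        if [ "$tries" -ge "$max_tries" ]; then
            return 1    # lookups never started working
        fi
        sleep 1
    done
    return 0
}
```

Probing with a real lookup covers the case seen in comment 10, where `scutil --dns` already reported a configuration but queries still timed out.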
Attachment #656197 -
Flags: review?(kmoir)
Updated • 13 years ago
Attachment #656197 -
Flags: review?(kmoir) → review+
Updated • 13 years ago (Assignee)
Attachment #656197 -
Flags: checked-in+
Comment 12 • 13 years ago (Assignee)
I haven't seen any failures yet!
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Comment 13 • 13 years ago (Assignee)
Hm, saw three failures this morning.
talos-mtnlion-r5-038.test.releng.scl3.mozilla.com
talos-mtnlion-r5-036.test.releng.scl3.mozilla.com
talos-mtnlion-r5-077.test.releng.scl3.mozilla.com
I investigated based on /var/log/puppet/puppet.out and `last`. For all of them, this was their first time starting up since 8/23, so they got the patch *after* the failure. In fact, all have successfully rebooted several times since this morning without errors.
Updated • 12 years ago
Product: mozilla.org → Release Engineering
Updated • 7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated • 5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard