Closed
Bug 792079
Opened 13 years ago
Closed 13 years ago
replication failing for ad.mozilla.com
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dustin, Assigned: q)
References
Details
Error log messages on dc3 regarding being unable to communicate with dc1 and dc2.
| Reporter | ||
Updated•13 years ago
|
Assignee: server-ops-releng → mlarrain
| Reporter | ||
Comment 1•13 years ago
|
||
12:17 < MaRu> dcdiag on dc1 everything passes
12:21 < MaRu> from dc3 everything works for replication BUT it doesn't see dc6/7
12:21 < MaRu> just dc1,2,9
12:22 < MaRu> it doesn't see the ones in SCL1
12:22 < MaRu> sounds like it could be a firewall issue
12:24 < MaRu> and I stand on that ground because the rest of the replication is working
| Reporter | ||
Comment 2•13 years ago
|
||
If we get time, let's use this next week as an example of how to diagnose and repair replication. I'll take notes and put 'em in mana, then you can use those notes in an oncall meeting.
Comment 3•13 years ago
|
||
This needs to be fixed and documented early next week.
Severity: normal → critical
| Reporter | ||
Comment 4•13 years ago
|
||
Well, this seems to still be the case, and is even a little bit worse now.
C:\Users\dmitchell>repadmin /replsum
Replication Summary Start Time: 2012-11-26 19:34:02
Beginning data collection for replication summary, this may take awhile:
..........
Source DSA        largest delta   fails/total  %%  error
DC1                     43m:20s     0 / 17      0
DC2                     44m:57s     0 / 14      0
DC3                     44m:57s     0 / 11      0
DC6                     43m:19s     0 / 14      0
DC7                     44m:57s     0 / 14      0
DC8                     44m:57s     0 / 15      0
DC9             17d.00h:49m:12s     4 / 11     36  (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
Destination DSA   largest delta   fails/total  %%  error
DC1                     44m:58s     0 / 17      0
DC2                     43m:21s     0 / 14      0
DC3                     30m:30s     0 / 11      0
DC6             17d.00h:49m:13s     4 / 18     22  (8524) The DSA operation is unable to proceed because of a DNS lookup failure.
DC7                     42m:09s     0 / 14      0
DC8                     42m:14s     0 / 11      0
DC9                     41m:13s     0 / 11      0
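For the oncall notes, a summary like this can be filtered down to just the failing rows. The following is a minimal sketch, not part of the original investigation: it assumes the row layout seen in this bug (DSA name, largest delta, "fails / total", percentage, optional "(code) message"), and the names ROW, failing_rows and SAMPLE are made up for illustration.

```python
import re

# One data row of `repadmin /replsum` as pasted in this bug, e.g.
#   "DC9 17d.00h:49m:12s 4 / 11 36 (8524) The DSA operation is"
# Header lines and wrapped error text do not match and are skipped.
ROW = re.compile(
    r"^\s*(?P<dsa>\S+)\s+(?P<delta>\S+)\s+"
    r"(?P<fails>\d+)\s*/\s*(?P<total>\d+)\s+(?P<pct>\d+)"
    r"(?:\s+\((?P<code>\d+)\)\s*(?P<msg>.*))?$"
)

def failing_rows(text):
    """Return (dsa, fails, total, error_code) for every row with fails > 0."""
    found = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and int(m.group("fails")) > 0:
            code = int(m.group("code")) if m.group("code") else None
            found.append(
                (m.group("dsa"), int(m.group("fails")), int(m.group("total")), code)
            )
    return found

# Rows copied from the output above.
SAMPLE = """\
Source DSA largest delta fails/total %% error
DC8 44m:57s 0 / 15 0
DC9 17d.00h:49m:12s 4 / 11 36 (8524) The DSA operation is
unable to proceed because of a DNS lookup failure.
"""

print(failing_rows(SAMPLE))
```

Here only DC9 is reported, with error code 8524 (the DNS lookup failure).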
| Reporter | ||
Comment 5•13 years ago
|
||
We failed the releng.ad.mozilla.com domain-specific FSMO roles over to DC9 (they were on DC6). Now they won't fail back. I suspect this is related to the above error.
| Reporter | ||
Updated•13 years ago
|
Summary: DFS replication failing for ad.mozilla.com → replication failing for ad.mozilla.com
| Reporter | ||
Comment 6•13 years ago
|
||
Here's the failure to fail back:
fsmo maintenance: transfer pdc
ldap_modify_sW error 0x34(52 (Unavailable).
Ldap extended error message is 000020AF: SvcErr: DSID-03210581, problem 5002 (UNAVAILABLE), data 8524
Win32 error returned is 0x20af(The requested FSMO operation failed. The current FSMO holder could not be contacted.)
)
Depending on the error code this may indicate a connection,
ldap, or role transfer error.
Server "dc6" knows about 5 roles
Schema - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
Naming Master - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
PDC - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
RID - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
Infrastructure - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
fsmo maintenance: c
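A role listing like the one above can be reduced to a role-to-holder map, which makes it easier to see that PDC, RID and Infrastructure are still on DC9. This is a minimal sketch under assumptions: the text may have been hard-wrapped by the console in the middle of a DN, so line breaks are removed before matching, and the names ROLE, role_holders and SAMPLE are hypothetical.

```python
import re

# "<role> - CN=NTDS Settings,CN=<holder>," as printed by the role listing above.
ROLE = re.compile(
    r"(Schema|Naming Master|PDC|RID|Infrastructure) - CN=NTDS Settings,CN=([^,]+),"
)

def role_holders(listing):
    """Map each FSMO role name to the DC named in its NTDS Settings DN."""
    # Console wrapping can split a DN anywhere, so drop the line breaks first.
    joined = listing.replace("\r", "").replace("\n", "")
    return dict(ROLE.findall(joined))

# Role lines as they appeared in the console paste, wrapping included.
SAMPLE = """\
Schema - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN=Sites,
CN=Configuration,DC=ad,DC=mozilla,DC=com
Naming Master - CN=NTDS Settings,CN=DC1,CN=Servers,CN=Default-First-Site-Name,CN
=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
PDC - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=
Configuration,DC=ad,DC=mozilla,DC=com
RID - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=
Configuration,DC=ad,DC=mozilla,DC=com
Infrastructure - CN=NTDS Settings,CN=DC9,CN=Servers,CN=Default-First-Site-Name,C
N=Sites,CN=Configuration,DC=ad,DC=mozilla,DC=com
"""

for role, dc in role_holders(SAMPLE).items():
    print(f"{role}: {dc}")
```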
| Reporter | ||
Comment 7•13 years ago
|
||
The above is after I fixed DNS to point the PDC role toward DC9:
(moz)dustin@euclid ~/code/moz/t/dnsconfig $ svn diff -c 53222
Index: zones/mozilla.com/ad/SOA
===================================================================
--- zones/mozilla.com/ad/SOA (revision 53221)
+++ zones/mozilla.com/ad/SOA (revision 53222)
@@ -1,7 +1,7 @@
$TTL 300
@ IN SOA ns.mozilla.org. noc.mozilla.com. (
- 2012092502
+ 2012112601
10800
3600
604800
Index: zones/mozilla.com/ad/private
===================================================================
--- zones/mozilla.com/ad/private (revision 53221)
+++ zones/mozilla.com/ad/private (revision 53222)
@@ -110,7 +110,6 @@
releng.ad.mozilla.com. 600 IN A 10.22.69.18
_ldap._tcp.releng.ad.mozilla.com. 600 IN SRV 0 100 389 dc6.releng.ad.mozilla.com.
_ldap._tcp.Default-First-Site-Name._sites.releng.ad.mozilla.com. 600 IN SRV 0 100 389 dc6.releng.ad.mozilla.com.
-_ldap._tcp.pdc._msdcs.releng.ad.mozilla.com. 600 IN SRV 0 100 389 dc6.releng.ad.mozilla.com.
_ldap._tcp.c98bae82-3d1c-42d8-b8a7-668e672d88a6.domains._msdcs.ad.mozilla.com. 600 IN SRV 0 100 389 dc6.releng.ad.mozilla.com.
d7b21ce7-b91d-492f-952e-2a34682ed9a8._msdcs.ad.mozilla.com. 600 IN CNAME dc6.releng.ad.mozilla.com.
_kerberos._tcp.dc._msdcs.releng.ad.mozilla.com. 600 IN SRV 0 100 88 dc6.releng.ad.mozilla.com.
@@ -205,6 +204,7 @@
_kerberos._udp.releng.ad.mozilla.com. 600 IN SRV 0 100 88 DC9.releng.ad.mozilla.com.
_kpasswd._tcp.releng.ad.mozilla.com. 600 IN SRV 0 100 464 DC9.releng.ad.mozilla.com.
_kpasswd._udp.releng.ad.mozilla.com. 600 IN SRV 0 100 464 DC9.releng.ad.mozilla.com.
+_ldap._tcp.pdc._msdcs.releng.ad.mozilla.com. 600 IN SRV 0 100 389 DC9.releng.ad.mozilla.com.
;
; non-dc services
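Because the AD zone here is maintained by hand in svn, it can help to check which host the PDC SRV record names after a change like this one. A minimal sketch, assuming each record sits on one line with an explicit TTL and class as in the diff above; srv_targets, ZONE and PDC_OWNER are illustrative names, not part of the dnsconfig repository.

```python
def srv_targets(zone_text, owner):
    """Return the SRV target hostnames recorded for `owner` in zone-file text."""
    targets = []
    for line in zone_text.splitlines():
        fields = line.split(";", 1)[0].split()  # drop zone-file comments
        # Expected layout: owner TTL IN SRV priority weight port target
        if (
            len(fields) == 8
            and fields[0].lower() == owner.lower()
            and fields[2:4] == ["IN", "SRV"]
        ):
            targets.append(fields[7])
    return targets

# Two records as they stand after revision 53222.
ZONE = """\
_ldap._tcp.releng.ad.mozilla.com. 600 IN SRV 0 100 389 dc6.releng.ad.mozilla.com.
_ldap._tcp.pdc._msdcs.releng.ad.mozilla.com. 600 IN SRV 0 100 389 DC9.releng.ad.mozilla.com.
;
; non-dc services
"""

PDC_OWNER = "_ldap._tcp.pdc._msdcs.releng.ad.mozilla.com."
print(srv_targets(ZONE, PDC_OWNER))
```

If the printed host differs from the PDC holder that the FSMO listing reports, the zone and the directory disagree.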
Comment 8•13 years ago
|
||
Matt, can you please start an etherpad with all of the research you've done on this and the solutions you've tried so we can pick up on that without much overlap at a later point?
Updated•13 years ago
|
Assignee: mlarrain → dustin
| Reporter | ||
Comment 9•13 years ago
|
||
I suspect that this is also preventing me from adding a DFS root target on DC6. So, I'll need to do that when the replication is fixed.
| Reporter | ||
Comment 10•13 years ago
|
||
I had similar problems trying to create the 'apps' link in \\ad\data. If DC2 is one of the root targets, creation fails. If I remove DC2, creation works.
| Reporter | ||
Updated•13 years ago
|
Assignee: dustin → server-ops-releng
Updated•13 years ago
|
Assignee: server-ops-releng → qfortier
| Assignee | ||
Comment 11•13 years ago
|
||
All working now. Notes to follow shortly.
C:\Windows\system32>repadmin /replsum
Replication Summary Start Time: 2013-01-24 17:02:47
Beginning data collection for replication summary, this may take awhile:
..........
Source DSA        largest delta   fails/total  %%  error
DC1                     11m:23s     0 / 17      0
DC2                     12m:06s     0 / 14      0
DC3                     11m:21s     0 / 11      0
DC6                     06m:31s     0 / 14      0
DC7                     12m:06s     0 / 14      0
DC8                     11m:21s     0 / 11      0
DC9                     12m:06s     0 / 11      0
Destination DSA   largest delta   fails/total  %%  error
DC1                     11m:21s     0 / 17      0
DC2                     06m:31s     0 / 14      0
DC3                     11m:23s     0 / 11      0
DC6                     12m:06s     0 / 14      0
DC7                     06m:26s     0 / 14      0
DC8                     11m:08s     0 / 11      0
DC9                     05m:37s     0 / 11      0
| Assignee | ||
Comment 12•13 years ago
|
||
Issue:
DFS failing to replicate on DC1 - 3
Root cause: Issues on DC2 DFS setup, time drift on DC2, and DC1 DNS settings
Symptoms: DFS errors in event viewer, dfsdiag (CLI tool), and repadmin (CLI tool); files and accounts not appearing on DC2.
I started on DC2 and was unable to log in with my new AD credentials. The admin account reported that the system time was off; after resetting the time I was able to log in. Once logged in, I noticed DFS errors in the event viewer. I removed the DFS namespace from DC2 and rebuilt it, and the DC started replicating. I checked all three servers in the event viewer and via UNC path (\\dc1.mozilla.com\data\apps, etc). Things looked clean for the ad.mozilla.com domain.
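The comment does not say how far DC2's clock had drifted or which check rejected the login. As a rough sketch only, assuming the rejection was the usual Kerberos clock-skew limit (5 minutes by default on Windows), the comparison looks like this; the timestamps and the names MAX_SKEW and skew_exceeded are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: default Kerberos clock tolerance on Windows domains.
MAX_SKEW = timedelta(minutes=5)

def skew_exceeded(local, reference, max_skew=MAX_SKEW):
    """True if `local` differs from `reference` by more than `max_skew`."""
    return abs(local - reference) > max_skew

# Hypothetical clocks: a reference DC and a DC running nine minutes fast.
reference = datetime(2013, 1, 24, 17, 0, 0, tzinfo=timezone.utc)
drifted = reference + timedelta(minutes=9)

print(skew_exceeded(drifted, reference))
```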
| Assignee | ||
Comment 13•13 years ago
|
||
Issue:
DFS failing to replicate on DC6 - 9
Root cause:
DNS issues on DC6 and, via forwarders, on DC1
Symptoms:
DFS errors in event viewer, dfsdiag (CLI tool), and repadmin (CLI tool); files and accounts not appearing on DC8 or DC9.
I started on DC6, where dfsdiag /TestDcs reported errors on DC8 and DC9. I was unable to ping either DC8 or DC9 from the command line. However, nslookup directly against ns1 and ns2 worked, and other machines resolved fine. After some discussion in IRC with (and helpful troubleshooting from) arr and dustin, I noticed that ipconfig showed the IPv6 settings had a hard-coded DNS server of ::1, i.e. localhost. The local DNS server was forwarding to DC1, which referred to itself for the SOA of releng.ad.mozilla.com, and DC1 had no entries for DC8 or DC9, causing replication to fail. I set the DNS settings to automatic in the IPv6 settings of the "adapter settings" control panel. I then forced a sync on all DCs and reran repadmin and dfsdiag with no errors. After a few hours the event viewer showed no new errors, only informational messages about functioning replication.
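The ::1 entry was easy to miss in ipconfig output. A minimal sketch of spotting it, assuming the adapter's DNS server list has already been collected as strings by some other means; the addresses other than ::1 are documentation-range placeholders, and loopback_dns_servers is a hypothetical helper.

```python
import ipaddress

def loopback_dns_servers(servers):
    """Return the configured DNS server addresses that point at the local host."""
    return [s for s in servers if ipaddress.ip_address(s).is_loopback]

# Hypothetical adapter configuration; only "::1" comes from this bug.
configured = ["::1", "192.0.2.53", "2001:db8::53"]

print(loopback_dns_servers(configured))
```

A loopback entry is not wrong by itself on a DC that runs DNS. It is worth a second look when, as here, the local DNS server forwards to a server that lacks the records in question.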
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations