Closed Bug 833883 Opened 11 years ago Closed 11 years ago

svn1.dmz.phx1 issues

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dumitru, Assigned: bkero)

References

Details

After svn1 crashed (bug 831642), we received two reports of svn commits failing intermittently (see bug 833830 and bug 833001).

Looking at svn1:

[root@svn1.dmz.phx1 ~]# ps aux | grep svnserve
verbatim   658  0.0  0.0  93756  4652 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim  4040  0.0  0.1  93840  6892 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim  4428  0.0  0.1  93828  6724 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
553       5630  0.0  0.0 189592  5416 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim  6776  0.0  0.1  93788  8712 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim  7605  0.0  0.1  93764  6688 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
550       7679  0.0  0.1 189992  6068 ?        Ss   10:19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim  9995  0.0  0.1  93600  6680 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 10104  0.0  0.0  92800  5612 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 11226  0.0  0.0  92832  3380 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
root     12291  0.0  0.0 103248   856 pts/0    S+   10:30   0:00 grep svnserve
verbatim 13688  0.0  0.1  93828  6956 ?        Ss   02:48   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 15035  0.0  0.1  93788  6904 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
1127     16985  0.0  0.0 189936  5768 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 19097  0.0  0.1  93832  6860 ?        Ss   02:59   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 19197  0.0  0.0  92564  5384 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 20378  0.0  0.1  93620  6824 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 20930  0.0  0.1  93648  6772 ?        Ss   Jan20   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 22306  0.0  0.1  93788  6720 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
1727     23461  0.0  0.0  92604  5528 ?        Ss   Jan20   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
1727     24045  0.0  0.0  92632  5632 ?        Ss   Jan20   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 24326  0.0  0.0  92476  5356 ?        Ss   Jan20   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 24993  0.0  0.1  93848  6828 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 25541  0.0  0.1  93808  6920 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 25571  0.0  0.1  94044  6956 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 26041  0.0  0.1  93504  8448 ?        Ss   Jan20   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
553      26686  0.0  0.1 190292  6192 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 30397  0.0  0.1  93508  6540 ?        Ss   Jan21   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 31393  0.0  0.1  93756  6832 ?        Ss   Jan19   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
verbatim 32037  0.0  0.1  93744  6908 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
2139     32685  0.0  0.0  92324  5180 ?        Ss   Jan22   0:00 svnserve -t -r /repo/svn/mozilla --log-file /var/log/svn.log
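`svnserve -t` runs in tunnel mode, serving one ssh connection and then exiting, so processes started days earlier (Jan19 through Jan22 above) are leftovers rather than live sessions. The following is a minimal sketch for flagging such leftovers automatically. It assumes bash and procps-ng `ps`, since older procps releases may lack the `etimes` column, and `stale_procs` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# stale_procs NAME MAX_AGE (hypothetical helper): print the PID of every
# process whose command name is exactly NAME and that has been running for
# more than MAX_AGE seconds. Assumes procps-ng ps, which provides etimes.
stale_procs() {
  local name=$1 max_age=$2 pid age
  # "pid=" and "etimes=" blank the headers; etimes is elapsed time in seconds.
  ps -C "$name" -o pid= -o etimes= | while read -r pid age; do
    if (( age > max_age )); then
      printf '%s\n' "$pid"
    fi
  done
}
```

For example, `stale_procs svnserve 86400` would list the svnserve processes that have been running for more than a day.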

Running lsof on their PIDs shows that they all have this file open:

svnserve 22306 verbatim   15u   REG   0,19 51657728    18693 /repo/svn/mozilla/db/rep-cache.db (10.8.74.10:/vol/svn)
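The same check can also run in the other direction. Instead of running lsof once per PID, you can start from the file and look for every descriptor that points at it. The following is a minimal sketch. It assumes Linux /proc and bash 4 or later, and `pids_holding_file` is a hypothetical helper. When they are installed, `lsof -t FILE` or `fuser FILE` do the same job:

```shell
#!/usr/bin/env bash
# pids_holding_file FILE (hypothetical helper): print, once each and in
# numeric order, the PIDs of processes with an open descriptor on FILE.
# Linux only; run as root to see other users' processes, because fd
# directories we cannot read simply do not match the glob.
pids_holding_file() {
  local fd pid
  local -A seen=()
  [[ -e $1 ]] || return 1
  for fd in /proc/[0-9]*/fd/*; do
    # -ef compares device and inode after following the /proc link, so a
    # file reached through a different path still matches.
    if [[ $fd -ef $1 ]]; then
      pid=${fd#/proc/}
      seen[${pid%%/*}]=1
    fi
  done
  if (( ${#seen[@]} )); then
    printf '%s\n' "${!seen[@]}" | sort -n
  fi
}
```

On svn1 this would be run as `pids_holding_file /repo/svn/mozilla/db/rep-cache.db`.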


I've confirmed that svn1 is the only host causing the issues: I drained svn2 and svn3 in Zeus and asked reed to commit, and the commit hung. Removing svn1 from the pool fixed this.
This was a very hairy yak.

I traced one of the svnserve processes, which yielded:
[root@svn1.dmz.phx1 ~]# strace -f -p 32037
Process 32037 attached - interrupt to quit
read(8,
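strace shows only one process at a time. To see where every svnserve is waiting in the kernel, you can read each process's state and wait channel from /proc in a single pass. The following is a minimal sketch. It assumes Linux and bash, and `wchan_report` is a hypothetical helper. The wchan value depends on the kernel and may simply be `0`:

```shell
#!/usr/bin/env bash
# wchan_report NAME (hypothetical helper): print "PID STATE WCHAN" for every
# process whose /proc/<pid>/comm is exactly NAME. Linux only.
wchan_report() {
  local dir comm state wchan
  for dir in /proc/[0-9]*; do
    # Processes can exit mid-scan; skip any whose /proc entry has vanished.
    { read -r comm < "$dir/comm"; } 2>/dev/null || continue
    [[ $comm == "$1" ]] || continue
    # The status line reads "State:<TAB>S (sleeping)"; keep only the letter.
    state=$(awk '/^State:/ {print $2}' "$dir/status" 2>/dev/null)
    # wchan has no trailing newline, so cat is simpler than read here.
    wchan=$(cat -- "$dir/wchan" 2>/dev/null)
    printf '%s %s %s\n' "${dir#/proc/}" "${state:-?}" "${wchan:-?}"
  done
}
```

For example, `wchan_report svnserve | awk '{print $3}' | sort | uniq -c` counts how many svnserve processes are parked in each wait channel.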

From this I saw it was blocked reading file descriptor 8, which turned out to be:

[root@svn1.dmz.phx1 ~]# lsof -p 32037 | grep 8r
svnserve 32037 verbatim    8r  FIFO    0,8      0t0 99760462 pipe

So it was stuck reading from a pipe (pipe 99760462, no less). Sniffing through the /proc tree to find out what else was using that pipe, I came across:

[root@svn1.dmz.phx1 fd]# (find /proc -type l  | xargs ls -l | fgrep 'pipe:[99760462]') 2>/dev/null
lr-x------ 1 verbatim                users  64 Jan 23 10:03 /proc/32037/fd/8 -> pipe:[99760462]
lr-x------ 1 verbatim                users  64 Jan 23 11:09 /proc/32037/task/32037/fd/8 -> pipe:[99760462]
l-wx------ 1 verbatim                users  64 Jan 23 10:50 /proc/32046/fd/2 -> pipe:[99760462]
l-wx------ 1 verbatim                users  64 Jan 23 11:09 /proc/32046/task/32046/fd/2 -> pipe:[99760462]
l-wx------ 1 verbatim                users  64 Jan 23 10:50 /proc/32058/fd/2 -> pipe:[99760462]
l-wx------ 1 verbatim                users  64 Jan 23 11:09 /proc/32058/task/32058/fd/2 -> pipe:[99760462]
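You can do the same hunt for the other end of a pipe in one pass. The approach compares each /proc descriptor's device and inode against the stuck one, and reads the access mode from /proc/<pid>/fdinfo instead of the lr-x/l-wx bits. The following is a minimal sketch. It assumes Linux and bash, and `fd_peers` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# fd_peers PID FD (hypothetical helper): print "PID FD MODE" for every
# descriptor, in any process, that refers to the same pipe, socket or file
# as /proc/PID/fd/FD. MODE comes from /proc/<pid>/fdinfo/<fd>. Linux only.
fd_peers() {
  local ref=/proc/$1/fd/$2 fd pid num flags mode
  [[ -e $ref ]] || return 1
  for fd in /proc/[0-9]*/fd/*; do
    # Same device and inode as the reference descriptor (both ends of a
    # pipe share one pipefs inode).
    [[ $fd -ef $ref ]] || continue
    pid=${fd#/proc/}
    pid=${pid%%/*}
    num=${fd##*/}
    # fdinfo "flags:" is octal; its low two bits are the O_ACCMODE value.
    flags=$(awk '/^flags:/ {print $2}' "/proc/$pid/fdinfo/$num" 2>/dev/null)
    [[ -n $flags ]] || continue
    case $(( 8#$flags & 3 )) in
      0) mode=read ;;
      1) mode=write ;;
      *) mode=read-write ;;
    esac
    printf '%s %s %s\n' "$pid" "$num" "$mode"
  done
}
```

On svn1 at the time, `fd_peers 32037 8` should have reported the read end in 32037 and write ends at fd 2 of 32046 and 32058, matching the listing above.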

Snooping on the processes holding the write end of that pipe (their fd 2, i.e. stderr), I found:
[root@svn1.dmz.phx1 fd]# strace -f -p 32058
Process 32058 attached - interrupt to quit
connect(4, {sa_family=AF_FILE, path="/var/run/abrt/abrt.socket"}, 27

It was hanging in connect() on /var/run/abrt/abrt.socket, the Unix socket that abrtd listens on.

So I kicked (restarted) the abrtd service and everything unwedged itself.

[root@svn1.dmz.phx1 abrt]# /etc/init.d/abrtd restart
Stopping abrt daemon:                                      [  OK  ]
Starting abrt daemon:                                      [  OK  ]

[root@svn1.dmz.phx1 abrt]# ps aux|grep svn
root      7749  0.0  0.0 103244   840 pts/2    S+   11:17   0:00 grep svn
[root@svn1.dmz.phx1 abrt]#
Assignee: server-ops-devservices → bkero
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services