Closed
Bug 54256
Opened 24 years ago
Closed 24 years ago
runaway cvs processes on cvs.mozilla.org
Categories
(mozilla.org Graveyard :: Server Operations, task, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dmosedale, Assigned: rkotalampi)
Details
10-20 processes per day are spinning and chewing up CPU on thelizard.mozilla.org.
Eventually, the load gets so high that mail stops being processed. This seems
to have started happening when the MN6 branch was cut last week. I killed a
bunch of them yesterday and today, and they show no signs of letting up.
Some preliminary analysis: all the processes I've tried attaching to with gdb
look like this:
#0 0x3e1a0 in translate_symtag (rcs=0xaed60,
tag=0xc4538 "\tNetscape_20000922_BRANCH") at ../../src/rcs.c:3371
#1 0x3d070 in RCS_gettag (rcs=0xaed60,
symtag=0xc4538 "\tNetscape_20000922_BRANCH", force_tag_match=1,
simple_tag=0x0) at ../../src/rcs.c:2486
#2 0x52ef4 in val_fileproc (callerdat=0xeffff868, finfo=0xeffff548)
at ../../src/tag.c:691
#3 0x4786c in do_file_proc (p=0xc1658, closure=0xeffff540)
at ../../src/recurse.c:595
#4 0x29bec in walklist (list=0xcfab0, proc=0x477b8 <do_file_proc>,
closure=0xeffff540) at ../../src/hash.c:370
#5 0x476a8 in do_recursion (frame=0xeffff5e0) at ../../src/recurse.c:517
#6 0x47cf0 in do_dir_proc (p=0x0, closure=0xeffff6e8)
at ../../src/recurse.c:807
#7 0x29bec in walklist (list=0xcf9e8, proc=0x47890 <do_dir_proc>,
closure=0xeffff6e8) at ../../src/hash.c:370
#8 0x47764 in do_recursion (frame=0xeffff790) at ../../src/recurse.c:543
#9 0x47084 in start_recursion (fileproc=0x52ed4 <val_fileproc>,
filesdoneproc=0, direntproc=0x52f14 <val_direntproc>, dirleaveproc=0,
callerdat=0xeffff868, argc=0, argv=0xcee88, local=0, which=6, aflag=1,
readlock=1, update_preload=0x0, dosrcs=1) at ../../src/recurse.c:216
#10 0x53188 in tag_check_valid (name=0xc4538 "\tNetscape_20000922_BRANCH",
argc=0, argv=0xc3cf0, local=0, aflag=1,
repository=0xae5d0 "/cvsroot/mozilla/build/mac") at ../../src/tag.c:835
#11 0x17dbc in checkout_proc (pargc=0xeffffa24, argv=0xc3cec, where_orig=0x0,
mwhere=0x0, mfile=0xc3d30 "XPCOM_BASE", shorten=804352, local_specified=0,
omodule=0xa8178 "mozilla/build/mac", msg=0x7ac00 "Updating")
at ../../src/checkout.c:1005
#12 0x36bec in do_module (db=0xa7610, mname=0xa8178 "mozilla/build/mac",
m_type=CHECKOUT, msg=0x7ac00 "Updating",
callback_proc=0x17450 <checkout_proc>, where=0x0, shorten=0,
local_specified=0, run_module_prog=0, extra_arg=0x0)
at ../../src/modules.c:552
#13 0x171fc in checkout (argc=1, argv=0xc44d4) at ../../src/checkout.c:373
#14 0x4d914 in do_cvs_command (cmd_name=0x8a718 "checkout",
command=0x16934 <checkout>) at ../../src/server.c:2349
#15 0x4eeac in serve_co (arg=0xc0082 "") at ../../src/server.c:3322
#16 0x50a7c in server (argc=662272, argv=0xeffffe98) at ../../src/server.c:4599
#17 0x353fc in main (argc=1, argv=0xeffffe98) at ../../src/main.c:923
(gdb) frame 2
#2 0x52ef4 in val_fileproc (callerdat=0xeffff868, finfo=0xeffff548)
at ../../src/tag.c:691
../../src/tag.c:691: No such file or directory.
(gdb) print *finfo
$1 = {file = 0xc9d70 "BuildList.pm", update_dir = 0xceeb8 "",
fullname = 0xc9de8 "BuildList.pm",
repository = 0xc0820 "/cvsroot/mozilla/build/mac", entries = 0x0,
rcs = 0xaed60}
cvs always appears to be spinning while looking for the branch tag in that
particular file (which happens to be in the Attic, and in fact does not have the
tag).
Perhaps a newer version of cvs fixes this?
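For context, the "symbols" phrase of an RCS ,v file stores each tag as a name:revision pair, with pairs separated by whitespace. The sketch below is a simplified parser that assumes that layout (the revision numbers and the OTHER_TAG name are made up). It only shows that a tab-prefixed name can never equal a stored tag name; it does not explain why translate_symtag spins instead of reporting the tag as missing.

```python
from typing import Dict


def parse_symbols(phrase: str) -> Dict[str, str]:
    """Parse an RCS 'symbols' phrase into a {tag name: revision} mapping."""
    body = phrase.strip()
    if body.startswith("symbols"):
        body = body[len("symbols"):]
    body = body.strip().rstrip(";")
    table = {}
    # str.split() with no argument treats tabs, spaces and newlines alike,
    # so whitespace never becomes part of a stored tag name.
    for entry in body.split():
        name, _, revision = entry.partition(":")
        table[name] = revision
    return table


# Hypothetical phrase; the revision numbers and OTHER_TAG are made up.
PHRASE = "symbols\n\tNetscape_20000922_BRANCH:1.4.0.2\n\tOTHER_TAG:1.3;"

tags = parse_symbols(PHRASE)
print("Netscape_20000922_BRANCH" in tags)    # True
print("\tNetscape_20000922_BRANCH" in tags)  # False
```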
Comment 2 • 24 years ago (Assignee)
Okay, after intensive detective work I have a theory of what is happening.
Clue #1: I looked at where the requests that hork cvs come from, and they all
seem to be from Macs.
Clue #2: I did some snooping. Take a look at this packet:
0: 0800 2095 e8a1 00b0 c285 aca1 0800 4500 .. ...........E.
16: 00a2 3801 4000 f806 33bc d00c 24ee cfc8 ..8.@.ø.3...$...
32: 51d5 c001 0961 3058 0793 cea3 d4d6 5018 Q....a0X......P.
48: 8000 2f44 0000 4469 7265 6374 6f72 7920 ../D..Directory
64: 2e0a 2f63 7673 726f 6f74 0a41 7267 756d ../cvsroot.Argum
80: 656e 7420 2d41 0a41 7267 756d 656e 7420 ent -A.Argument
96: 2d72 0a41 7267 756d 656e 7420 094e 6574 -r.Argument .Net
112: 7363 6170 655f 3230 3030 3039 3232 5f42 scape_20000922_B
128: 5241 4e43 480a 4172 6775 6d65 6e74 202d RANCH.Argument -
144: 6e0a 4172 6775 6d65 6e74 206d 6f7a 696c n.Argument mozil
160: 6c61 2f62 7569 6c64 2f6d 6163 0a63 6f0a la/build/mac.co.
Look for the string "-r.Argument .Net". The dot in front of "Net" is 0x09, i.e.
HT (horizontal tab).
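The same check can be done mechanically. A minimal sketch: feed the hex columns of the dump lines at offsets 96 and 112 to bytes.fromhex and list every control byte other than the LF that terminates each protocol line. The helper name suspicious_bytes is made up for illustration.

```python
def suspicious_bytes(hex_words: str, base_offset: int):
    """Return (offset, byte) pairs for control bytes other than the LF terminator."""
    data = bytes.fromhex(hex_words)  # whitespace between hex pairs is ignored
    return [
        (base_offset + i, b)
        for i, b in enumerate(data)
        if (b < 0x20 or b == 0x7F) and b != 0x0A
    ]


# Hex columns copied from the dump lines at offsets 96 and 112 of the Mac packet.
MAC_HEX = ("2d72 0a41 7267 756d 656e 7420 094e 6574 "
           "7363 6170 655f 3230 3030 3039 3232 5f42")

for offset, b in suspicious_bytes(MAC_HEX, 96):
    print(f"offset {offset}: 0x{b:02x}")  # offset 108: 0x09
```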
Compare it to this pserver packet (captured from my home ADSL line):
0: 0800 2095 e8a1 00b0 c285 aca1 0800 4500 .. ...........E.
16: 00ad bcad 4000 3506 ad47 3fc1 79f7 cfc8 ....@.5..G?.y<F7>..
32: 51d5 052e 0961 3166 b667 db40 41f5 8018 Q....a1f.g.@A<F5>..
48: 7d78 fa9f 0000 0101 080a 010d 8167 1bec }x...........g..
64: 81f1 4172 6775 6d65 6e74 202d 4e0a 4172 ..Argument -N.Ar
80: 6775 6d65 6e74 202d 500a 4172 6775 6d65 gument -P.Argume
96: 6e74 202d 720a 4172 6775 6d65 6e74 204e nt -r.Argument N
112: 6574 7363 6170 655f 3230 3030 3039 3232 etscape_20000922
128: 5f42 5241 4e43 480a 4172 6775 6d65 6e74 _BRANCH.Argument
144: 206d 6f7a 696c 6c61 2f62 7569 6c64 2f6d mozilla/build/m
160: 6163 0a44 6972 6563 746f 7279 202e 0a2f ac.Directory ../
176: 6376 7372 6f6f 740a 636f 0a cvsroot.co.
-> No tabs there.
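At the protocol level, each "Argument" request carries one command-line argument, and a "Directory" request is followed by a line holding the repository path. The hypothetical helpers below split the payloads decoded from the portions of both packets shown above into requests and report arguments with leading or trailing whitespace. They ignore every other request type, including the "Argumentx" continuation form.

```python
from typing import List


def arguments(stream: str) -> List[str]:
    """Collect the values of 'Argument' requests from a client request stream."""
    lines = stream.split("\n")
    args = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith("Directory "):
            i += 2  # the following line holds the repository path
            continue
        if line.startswith("Argument "):
            args.append(line[len("Argument "):])
        i += 1
    return args


def suspicious_arguments(stream: str) -> List[str]:
    """Arguments with leading or trailing whitespace, such as a stray tab."""
    return [a for a in arguments(stream) if a != a.strip()]


# Payloads decoded from the two packets above.
MAC = ("Directory .\n/cvsroot\nArgument -A\nArgument -r\n"
       "Argument \tNetscape_20000922_BRANCH\nArgument -n\n"
       "Argument mozilla/build/mac\nco\n")
HOME = ("Argument -N\nArgument -P\nArgument -r\n"
        "Argument Netscape_20000922_BRANCH\nArgument mozilla/build/mac\n"
        "Directory .\n/cvsroot\nco\n")

print(suspicious_arguments(MAC))   # ['\tNetscape_20000922_BRANCH']
print(suspicious_arguments(HOME))  # []
```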
I talked about this with smfr, and according to him the Mac build machines do
explicitly pull mozilla/build/mac before they run anything else.
So my theory is that, for some reason, the Macs run the command:
cvs co -A -r<tab>Netscape_20000922_BRANCH -n mozilla/build/mac
Apparently the MacCVS client doesn't check for this, which causes it to violate
the pserver protocol.
Comment 3 • 24 years ago (Assignee)
We need someone to look into these Macs and see whether the theory is valid. This
is causing major problems for both the internal (cvs.netscape.com) and external
(cvs.mozilla.org) cvs servers. Cc:ing smfr and leaf.
Comment 4 • 24 years ago (Assignee)
Btw, I have been running a script, "cvs-kill.pl", from cron that finds these
runaway processes and kills them. But someone should still take a look at those
Macs and find out whether the problem is what I've described. Also, the internal
cvs servers don't run the script.
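cvs-kill.pl itself isn't included in this bug, so its logic is unknown. A rough equivalent, assuming a "runaway" is any process named cvs whose accumulated CPU time exceeds a fixed limit (the 10-minute value is arbitrary), could parse the POSIX "ps -eo pid,time,comm" output and send SIGTERM to the matches. All names here are hypothetical.

```python
import os
import signal
import subprocess
from typing import List

CPU_LIMIT_SECONDS = 10 * 60  # hypothetical threshold; tune for the server


def cpu_seconds(field: str) -> int:
    """Convert a POSIX ps TIME value ([dd-]hh:mm:ss) into seconds."""
    days = 0
    if "-" in field:
        d, field = field.split("-", 1)
        days = int(d)
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


def runaway_pids(ps_output: str, limit: int = CPU_LIMIT_SECONDS) -> List[int]:
    """PIDs of cvs processes whose CPU time exceeds limit (header line skipped)."""
    pids = []
    for line in ps_output.splitlines()[1:]:
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, cputime, comm = fields
        if os.path.basename(comm.strip()) == "cvs" and cpu_seconds(cputime) > limit:
            pids.append(int(pid))
    return pids


def main() -> None:
    """Entry point for a cron job: list processes, then terminate runaways."""
    out = subprocess.run(["ps", "-eo", "pid,time,comm"],
                         capture_output=True, text=True, check=True).stdout
    for pid in runaway_pids(out):
        os.kill(pid, signal.SIGTERM)
```

Parsing is kept separate from killing so runaway_pids can be checked against captured ps output without touching live processes. A pure CPU-time limit will also catch a legitimately huge checkout, so the limit has to sit well above normal usage.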
Comment 5 • 24 years ago
I'll take a look at the automation and see what I can see. That would explain
why the Macs are more screwy than usual on the branch. Thanks for the detective
work!
Comment 6 • 24 years ago
granrose fixed the errant tab, so I think this is fixed.
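The actual change to the Mac automation isn't shown here. As an illustrative guard only, a build script could normalize the configured branch tag before constructing the checkout command, keeping -r and the tag as separate argv entries. The function name checkout_argv is hypothetical; the flags are copied from the command reconstructed in comment 2.

```python
from typing import List


def checkout_argv(branch_tag: str, module: str) -> List[str]:
    """Build the Mac checkout command with a cleaned-up branch tag."""
    tag = branch_tag.strip()  # drop stray tabs, spaces or newlines around the tag
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError(f"unusable branch tag: {branch_tag!r}")
    # "-r" and the tag stay separate argv entries, so nothing gets glued
    # onto the tag when the list is handed to subprocess.run().
    return ["cvs", "co", "-A", "-r", tag, "-n", module]


print(checkout_argv("\tNetscape_20000922_BRANCH", "mozilla/build/mac"))
# ['cvs', 'co', '-A', '-r', 'Netscape_20000922_BRANCH', '-n', 'mozilla/build/mac']
```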
Comment 7 • 24 years ago
Yes, it looks like all the Macs are pulling successfully. Now the only question
is: is the server happy again?
Comment 8 • 24 years ago (Assignee)
We should see that soon. I'm keeping the bug open until we can be sure that
cvs.mozilla.org and cvs.mcom.com aren't being hit anymore.
Comment 9 • 24 years ago (Reporter)
Nice detective work, Risto!
Comment 10 • 24 years ago (Assignee)
Thanks Dan! Sniffing is fun.
Things have been calm, so I'm closing the bug. I've also posted to gnu.cvs.bug to
see whether this is a known bug and whether it has been fixed yet (perhaps in
1.11.1). Looking at the code, I don't see it checking anything other than
numeric tags.
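The CVS manual states that tag names must start with a letter and may contain only letters, digits, "-" and "_". A server-side wrapper could apply that rule to symbolic -r arguments before starting a checkout; this is a hypothetical sketch, not code from cvs, and a real check would let numeric revisions through on a separate path. The tab-prefixed name from the backtraces fails it immediately.

```python
import re

# Rule from the CVS manual: start with a letter, then letters, digits, "-" or "_".
TAG_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def is_valid_symbolic_tag(name: str) -> bool:
    """Return True if name is syntactically acceptable as a CVS symbolic tag."""
    # fullmatch() also rejects a trailing newline that a "$" anchor would allow.
    return TAG_NAME.fullmatch(name) is not None


for candidate in ("Netscape_20000922_BRANCH", "\tNetscape_20000922_BRANCH", "1.4.0.2"):
    print(repr(candidate), is_valid_symbolic_tag(candidate))
```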
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Updated • 10 years ago
Product: mozilla.org → mozilla.org Graveyard