Closed
Bug 779159
Opened 12 years ago
Closed 12 years ago
mw32-ix-slave* connections to buildbot master are dying during linking
Categories
(Infrastructure & Operations Graveyard :: NetOps, task)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: emorley, Unassigned)
Details
The first build this occurred on was: https://tbpl.mozilla.org/?tree=Mozilla-Esr10&onlyunstarred=1&rev=d07051929ca3
-> (mw32-ix-slave13) https://tbpl.mozilla.org/php/getParsedLog.php?id=13963505&tree=Mozilla-Esr10
-> (mw32-ix-slave15, free space clobber) https://tbpl.mozilla.org/php/getParsedLog.php?id=13970058&tree=Mozilla-Esr10
-> We then somehow got a green on the next push.

That changeset doesn't seem like it could have broken the build, unless I'm missing something? https://hg.mozilla.org/releases/mozilla-esr10/rev/d07051929ca3

All other builds since then have failed, even after clobbering. Win Nightly also failed: https://tbpl.mozilla.org/php/getParsedLog.php?id=14002091&tree=Mozilla-Esr10

All with:
{
make -C toolkit/library libs
make[6]: Entering directory `/e/builds/moz2_slave/m-esr10-w32/build/obj-firefox/toolkit/library'
d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-esr10-w32/build/config/pythonpath.py -I../../config /e/builds/moz2_slave/m-esr10-w32/build/config/expandlibs_exec.py --uselist -- d:/mozilla-build/python25/python2.5.exe /e/builds/moz2_slave/m-esr10-w32/build/build/link.py /e/builds/moz2_slave/m-esr10-w32/build/obj-firefox/toolkit/library/linker-vsize link -NOLOGO -DLL -OUT:xul.dll -PDB:xul.pdb -SUBSYSTEM:WINDOWS dlldeps-xul.obj nsStaticXULComponents.obj nsDllMain.obj nsGFXDeps.obj dlldeps-zlib.obj nsUnicharUtils.obj nsBidiUtils.obj nsRDFResource.obj ./module.res -LARGEADDRESSAWARE -NXCOMPAT -DYNAMICBASE -SAFESEH -DEBUG -DEBUGTYPE:CV -DEBUG -OPT:REF -LTCG:PGUPDATE -LIBPATH:../../dist/lib -NODEFAULTLIB:msvcrt -NODEFAULTLIB:msvcrtd -NODEFAULTLIB:msvcprt -NODEFAULTLIB:msvcprtd -DEFAULTLIB:mozcrt ../../toolkit/xre/xulapp_s.lib ../../staticlib/components/necko.lib ../../staticlib/components/uconv.lib ../../staticlib/components/i18n.lib ../../staticlib/components/chardet.lib ../../staticlib/components/jar50.lib ../../staticlib/components/startupcache.lib ../../staticlib/components/pref.lib ../../staticlib/components/htmlpars.lib 
../../staticlib/components/imglib2.lib ../../staticlib/components/gkgfx.lib ../../staticlib/components/gklayout.lib ../../staticlib/components/docshell.lib ../../staticlib/components/embedcomponents.lib ../../staticlib/components/webbrwsr.lib ../../staticlib/components/nsappshell.lib ../../staticlib/components/txmgr.lib ../../staticlib/components/commandlines.lib ../../staticlib/components/toolkitcomps.lib ../../staticlib/components/pipboot.lib ../../staticlib/components/pipnss.lib ../../staticlib/components/appcomps.lib ../../staticlib/components/jsreflect.lib ../../staticlib/components/composer.lib ../../staticlib/components/jetpack_s.lib ../../staticlib/components/telemetry.lib ../../staticlib/components/jsdebugger.lib ../../staticlib/components/storagecomps.lib ../../staticlib/components/rdf.lib ../../staticlib/components/windowds.lib ../../staticlib/components/jsctypes.lib ../../staticlib/components/jsperf.lib ../../staticlib/components/gkplugin.lib ../../staticlib/components/windowsproxy.lib ../../staticlib/components/jsd.lib ../../staticlib/components/autoconfig.lib ../../staticlib/components/auth.lib ../../staticlib/components/cookie.lib ../../staticlib/components/permissions.lib ../../staticlib/components/universalchardet.lib ../../staticlib/components/places.lib ../../staticlib/components/tkautocomplete.lib ../../staticlib/components/satchel.lib ../../staticlib/components/pippki.lib ../../staticlib/components/imgicon.lib ../../staticlib/components/gkwidget.lib ../../staticlib/components/accessibility.lib ../../staticlib/components/spellchecker.lib ../../staticlib/components/zipwriter.lib ../../staticlib/components/services-crypto.lib ../../staticlib/jsipc_s.lib ../../staticlib/domipc_s.lib ../../staticlib/domplugins_s.lib ../../staticlib/mozipc_s.lib ../../staticlib/mozipdlgen_s.lib ../../staticlib/ipcshell_s.lib ../../staticlib/gfx2d.lib ../../staticlib/gfxipc_s.lib ../../staticlib/hal_s.lib ../../staticlib/xpcom_core.lib ../../staticlib/ucvutil_s.lib 
../../staticlib/chromium_s.lib ../../staticlib/mozreg_s.lib ../../staticlib/thebes.lib ../../staticlib/ycbcr.lib ../../staticlib/angle.lib ../../media/libjpeg/jpeg3250.lib ../../media/libpng/png.lib ../../gfx/qcms/mozqcms.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/mozjs.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/crmf.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/smime3.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/ssl3.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/nss3.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/nssutil3.lib ../../gfx/cairo/cairo/src/mozcairo.lib ../../gfx/cairo/libpixman/src/mozlibpixman.lib ../../gfx/harfbuzz/src/mozharfbuzz.lib ../../gfx/ots/src/mozots.lib ../../dist/lib/mozsqlite3.lib ../../modules/zlib/src/mozz.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/nspr4.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/plc4.lib e:/builds/moz2_slave/m-esr10-w32/build/obj-firefox/dist/lib/plds4.lib ../../dist/lib/mozalloc.lib kernel32.lib user32.lib gdi32.lib winmm.lib wsock32.lib advapi32.lib shell32.lib ole32.lib uuid.lib version.lib winspool.lib comdlg32.lib imm32.lib winmm.lib wsock32.lib msimg32.lib shlwapi.lib psapi.lib ws2_32.lib dbghelp.lib wininet.lib usp10.lib oleaut32.lib
PGOMGR : warning PG0188: No .PGC files matching 'xul!*.pgc' were found.
warning C4743: 'const std::logic_error::`vftable'' has different size in 'e:\builds\moz2_slave\m-esr10-w32\build\toolkit\crashreporter\google-breakpad\src\common\windows\http_upload.cc' and 'e:\builds\moz2_slave\m-esr10-w32\build\toolkit\xre\nsWindowsDllBlocklist.cpp': 12 and 16 bytes
warning C4743: 'const std::length_error::`vftable'' has different size in 'e:\builds\moz2_slave\m-esr10-w32\build\toolkit\crashreporter\google-breakpad\src\common\windows\http_upload.cc' and 'e:\builds\moz2_slave\m-esr10-w32\build\toolkit\xre\nsWindowsDllBlocklist.cpp': 12 and 16 bytes
Creating library xul.lib and object xul.exp
Generating code
remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion. ]
[Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion. ]
}
Comment 1•12 years ago
"PGOMGR : warning PG0188: No .PGC files matching 'xul!*.pgc' were found." is not an error; it happens in every PGO build. We merge the .pgc files into the .pgd file and delete them before linking, but the linker warns about that anyway. The only thing I can think of is that it's not generating output, so buildbot is killing it? Not sure why we'd only see this on ESR; maybe because it's still using VC 2005?
Reporter
Comment 2•12 years ago
Ah ok, I haven't every noticed that in a log, sorry. https://tbpl.mozilla.org/php/getParsedLog.php?id=14006787&tree=Mozilla-Esr10
Reporter
Comment 3•12 years ago
s/every/ever/
Comment 4•12 years ago
Nothing build-related seems to be an error in that log. That, on the other hand:

remoteFailed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion. ] [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion.

And I don't know of anything in the build system that uses twisted.
Reporter
Comment 5•12 years ago
Hi ho, hi ho, off to releng we go! (Thank you for taking a look :-))
Component: Build Config → Release Engineering
Product: Core → mozilla.org
Summary: Permanent Win7 failure on esr10, with "PGOMGR : warning PG0188: No .PGC files matching 'xul!*.pgc' were found." → Permanent Win7 failure on esr10, with "Connection to the other side was lost in a non-clean fashion"
Version: Trunk → other
Reporter
Comment 6•12 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=14011154&tree=Mozilla-Esr10
Reporter
Comment 7•12 years ago
Tree closed; we've had 5 pushes without Windows 7 coverage.
Severity: critical → blocker
Reporter
Updated•12 years ago
OS: All → Windows 7
Hardware: All → x86
Comment 8•12 years ago
At a glance, it seems highly unlikely that this is a configuration issue, but I'll look into it more deeply.
Assignee: nobody → bhearsum
Comment 9•12 years ago
Based on the fact that:
- Connections are consistently getting killed during idle parts of the job (e.g., during linking), when there's no activity between the master and slave
- We had network maintenance over the weekend
- The first occurrence of this was on Monday

I think that something changed in the network configuration that is more aggressively killing "idle" connections. Can someone from IT look into whether or not that's plausible?
Assignee: bhearsum → server-ops
Component: Release Engineering → Server Operations
QA Contact: jdow
Summary: Permanent Win7 failure on esr10, with "Connection to the other side was lost in a non-clean fashion" → mw32-ix-slave* connections to buildbot master are dieing during linking
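The reasoning in comment 9 is that a session silent for longer than some middlebox's idle timeout gets dropped. That can be sketched as a small check over packet timestamps. This is a minimal illustration and not anything taken from the bug. The function name, the timeline, and both timeout values are hypothetical. They only show how one long silent link step would trip a shorter idle timeout but not a longer one.

```python
from typing import List, Optional


def first_idle_gap(packet_times: List[float], idle_timeout: float) -> Optional[int]:
    """Return the index of the first packet that arrives after the connection
    has been silent for longer than idle_timeout seconds, or None if no gap
    exceeds it. A stateful firewall enforcing that idle timeout would already
    have discarded the session by the time this packet shows up."""
    for i in range(1, len(packet_times)):
        if packet_times[i] - packet_times[i - 1] > idle_timeout:
            return i
    return None


# Hypothetical timeline (seconds): steady traffic while compiling, then a
# 40-minute silent link/PGO step, then traffic resumes.
link_timeline = [0, 30, 60, 90, 90 + 40 * 60, 90 + 40 * 60 + 30]

print(first_idle_gap(link_timeline, idle_timeout=3600))  # None
print(first_idle_gap(link_timeline, idle_timeout=1800))  # 4
```

With the longer (hypothetical) timeout the 2400-second gap is tolerated. With the shorter one, the first packet after the link step lands on a session the firewall has already forgotten. That matches the "lost in a non-clean fashion" symptom.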
Comment 10•12 years ago
Can you specify some slave/master pairs where you're seeing this?
Updated•12 years ago
Assignee: server-ops → network-operations
Severity: blocker → critical
Component: Server Operations → Server Operations: Netops
QA Contact: jdow → ravi
Updated•12 years ago
Summary: mw32-ix-slave* connections to buildbot master are dieing during linking → mw32-ix-slave* connections to buildbot master are dying during linking
Comment 11•12 years ago
One example: buildbot-master30.srv.releng.scl3.mozilla.com and mw32-ix-slave15.build.mtv1.mozilla.com
Comment 12•12 years ago
we haven't changed any service time outs. what port is this communicating on?
Comment 13•12 years ago
(In reply to casey ransom [:casey] from comment #12)
> we haven't changed any service time outs. what port is this communicating on?

The slave connects to the master's port 9001 (TCP).
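One way to test the idle-timeout theory without waiting on a full build is to hold a TCP connection open with no traffic, then check whether it still carries data. The sketch below assumes a simple echo listener on the far side. Buildbot's own protocol on port 9001 will not echo a raw payload, so a stand-in listener is needed, ideally on a port covered by the same firewall policy. The function name, payload, and timeout defaults are illustrative only.

```python
import socket
import time


def survives_idle(host: str, port: int, idle_seconds: float,
                  payload: bytes = b"ping\n", io_timeout: float = 10.0) -> bool:
    """Open a TCP connection, stay silent for idle_seconds, then attempt one
    echo round trip. Returns True only if the payload comes back intact.
    Returns False if the connect fails, the peer closes, or the round trip
    errors out or times out, which is how a session silently dropped by a
    firewall usually shows up."""
    try:
        with socket.create_connection((host, port), timeout=io_timeout) as sock:
            time.sleep(idle_seconds)  # no traffic at all during this window
            sock.sendall(payload)
            received = b""
            while len(received) < len(payload):
                chunk = sock.recv(len(payload) - len(received))
                if not chunk:  # peer closed the connection
                    return False
                received += chunk
            return received == payload
    except OSError:  # covers refused/reset connections and socket timeouts
        return False
```

Run it from a slave against such a listener on the master side, once with a short `idle_seconds` and once with a value longer than the suspected timeout. If the short run returns True and the long run returns False, an idle timeout somewhere on the path is the likely cause.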
Comment 14•12 years ago
The timeout should be extended again now. Traffic from build to internal hosts was moved to a different zone on fw1.mtv1. The policy applied there was an 'any' policy, which didn't include provisions for the longer timeout you require.
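Comment 14 explains that traffic moved into a zone whose catch-all 'any' policy lacked the longer timeout. In effect, a first-match policy lookup no longer finds a rule that grants port 9001 a long idle timeout. The model below is a hypothetical sketch. The `Policy` class, the policy names, and the timeout values are invented for illustration, and real firewall policies match on much more than the destination port.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    name: str
    dst_port: Optional[int]  # None means "any" destination port
    idle_timeout_s: int


def effective_idle_timeout(policies: List[Policy], dst_port: int) -> int:
    """Return the idle timeout of the first policy (in list order) that
    matches dst_port, mirroring first-match policy evaluation."""
    for policy in policies:
        if policy.dst_port is None or policy.dst_port == dst_port:
            return policy.idle_timeout_s
    raise LookupError(f"no policy matches port {dst_port}")


# Hypothetical zones: the old one carried a long-timeout rule for the
# buildbot port ahead of the catch-all; the new one has only the catch-all.
old_zone = [Policy("buildbot-long-timeout", 9001, 4 * 3600),
            Policy("any", None, 30 * 60)]
new_zone = [Policy("any", None, 30 * 60)]

print(effective_idle_timeout(old_zone, 9001))  # 14400
print(effective_idle_timeout(new_zone, 9001))  # 1800
```

Under these assumed numbers, the same master/slave connection keeps a long idle allowance in the old zone. In the new zone it silently falls back to the shorter catch-all value.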
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 15•12 years ago
I've kicked some new jobs that I'll keep an eye on. These jobs take 3-4h to complete, so it'll be a while before I can confirm that this fixed things. Thank you for the very quick response, though!
Comment 16•12 years ago
I rebooted all of the affected slaves last night to make sure they didn't have any stale state. After that, all the builds I kicked worked fine. Thanks very much for the quick response here.
Status: RESOLVED → VERIFIED
Updated•12 years ago
Assignee: network-operations → nobody
Component: Server Operations: Netops → Release Engineering
QA Contact: ravi
Updated•12 years ago
Assignee: nobody → network-operations
tracking-firefox-esr10: ? → ---
Component: Release Engineering → Server Operations: Netops
QA Contact: ravi
Comment 17•12 years ago
Just moving it around to take off the ESR tracking flag.
Updated•11 years ago
Product: mozilla.org → Infrastructure & Operations
Updated•2 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard