Closed Bug 509960 Opened 15 years ago Closed 15 years ago

Occasional link failures on win32 (LINK : fatal error LNK1104: cannot open file '<something>.lib') (usually jsctypes-test.lib)

Categories

(Release Engineering :: General, defect, P3)

x86
Windows Server 2003
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: jrmuizel, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [buildduty])

We seem to be getting occasional link failures on the Windows builds that look like:
LINK : fatal error LNK1104: cannot open file 'AccessibleMarshal.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250066647.1250070149.28096.gz
or
LINK : fatal error LNK1104: cannot open file 'npwinless.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1250067338.1250069723.23446.gz
LINK : fatal error LNK1104: cannot open file 'IA2Marshal.lib'
make[7]: *** [IA2Marshal.dll] Error 80
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1252076099.1252077304.8565.gz&fulltext=1
This may be a race condition where the library doesn't yet exist when the link command is called. Are the dependencies set correctly? Can you test this on the try server with a mozconfig that uses -j1 instead of -j4? This changeset looks like it may correct the first link failure: http://hg.mozilla.org/mozilla-central/rev/cccf8041fc0b
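The dependency question can be made concrete. Below is a minimal Python sketch, not Mozilla's build system: it takes a hypothetical target graph (which targets each target depends on, which files it reads and writes) and reports every file a target reads that is written by a different target outside its transitive prerequisites. Under -j1 such a gap may be masked by make's serial ordering; under -j4 the consumer can run first. The function name and the dictionaries are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def missing_orderings(
    deps: Dict[str, List[str]],     # target -> targets it directly depends on
    inputs: Dict[str, List[str]],   # target -> files it reads
    outputs: Dict[str, List[str]],  # target -> files it writes
) -> List[Tuple[str, str, str]]:
    """Return (consumer, file, producer) for every file a target reads that is
    written by some other target outside its transitive prerequisites."""
    producer = {f: t for t, files in outputs.items() for f in files}

    def prerequisites(target: str) -> Set[str]:
        # Depth-first walk over the dependency edges.
        seen: Set[str] = set()
        stack = list(deps.get(target, []))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(deps.get(t, []))
        return seen

    problems = []
    for target, files in inputs.items():
        before = prerequisites(target)
        for f in files:
            p = producer.get(f)
            # A target that reads a file it writes itself (as link does with
            # its own import library) is not an ordering problem: skip it.
            if p is not None and p != target and p not in before:
                problems.append((target, f, p))
    return problems
```

For example, with `deps = {"app.dll": ["foo.dll"], "test.dll": []}`, `inputs = {"app.dll": ["foo.lib"], "test.dll": ["foo.lib"]}` and `outputs = {"foo.dll": ["foo.dll", "foo.lib"]}`, it reports `[("test.dll", "foo.lib", "foo.dll")]`: test.dll reads foo.lib without depending on the target that produces it.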
I don't think this belongs in releng; where should we move this bug?
(In reply to comment #2)
> This may be a race condition where the library doesn't yet exist when the link
> command is called. Are the dependencies set correctly? Can you test this with
> try server and a mozconfig that has -j1 instead of -j4. This changeset looks
> like it may correct the first link failure
>
> http://hg.mozilla.org/mozilla-central/rev/cccf8041fc0b

That changeset is completely unrelated, FWIW. (It's just a packaging manifest change.)

If you look at the linker command line and output in the build log:

link -NOLOGO -DLL -OUT:IA2Marshal.dll -PDB:IA2Marshal.pdb -SUBSYSTEM:WINDOWS dlldata.obj Accessible2_p.obj AccessibleAction_p.obj AccessibleApplication_p.obj AccessibleComponent_p.obj AccessibleEditableText_p.obj AccessibleHyperlink_p.obj AccessibleHypertext_p.obj AccessibleImage_p.obj AccessibleRelation_p.obj AccessibleTable_p.obj AccessibleText_p.obj AccessibleValue_p.obj Accessible2_i.obj AccessibleAction_i.obj AccessibleApplication_i.obj AccessibleComponent_i.obj AccessibleEditableText_i.obj AccessibleHyperlink_i.obj AccessibleHypertext_i.obj AccessibleImage_i.obj AccessibleRelation_i.obj AccessibleTable_i.obj AccessibleText_i.obj AccessibleValue_i.obj ./module.res -NXCOMPAT -SAFESEH -DYNAMICBASE -MANIFEST:NO -LIBPATH:"e:/builds/moz2_slave/mozilla-central-win32/build/obj-firefox/memory/jemalloc/crtsrc/build/intel" -NODEFAULTLIB:msvcrt -NODEFAULTLIB:msvcrtd -NODEFAULTLIB:msvcprt -NODEFAULTLIB:msvcprtd -DEFAULTLIB:mozcrt19 -DEFAULTLIB:mozcpp19 -DEBUG -OPT:REF -OPT:nowin98 -LTCG:PGINSTRUMENT -DEF:e:/builds/moz2_slave/mozilla-central-win32/build/accessible/public/ia2/IA2Marshal.def kernel32.lib rpcns4.lib rpcrt4.lib ole32.lib oleaut32.lib
Creating library IA2Marshal.lib and object IA2Marshal.exp
LINK : fatal error LNK1104: cannot open file 'IA2Marshal.lib'

You'll note that the linker is supposed to be creating the .lib file (it's the import library for the DLL it's creating). As to why it can't open the file, who knows? Clearly it can write to that directory. If we were out of disk space, I think we'd get a different error. Either something weirder is happening here, or this is a compiler bug.
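This observation suggests a simple triage check for logs like the one above. The sketch below is a hypothetical Python helper, not an existing tool: it scans a build log for the linker's "Creating library X and object Y" line and reports whether the file named in an LNK1104 error is that same import library, i.e. the case where the linker fails to open a file it just announced it would produce. Matching is case-insensitive because Windows file names normally are.

```python
import re
from typing import Optional

# Patterns taken from the log excerpt above.
CREATING = re.compile(r"Creating library (\S+) and object (\S+)")
LNK1104 = re.compile(r"LNK1104: cannot open file '([^']+)'")


def self_produced_lnk1104(log: str) -> Optional[str]:
    """Return the file name from an LNK1104 error if the same log shows the
    linker announcing it would create that import library; otherwise None."""
    created = {m.group(1).lower() for m in CREATING.finditer(log)}
    for m in LNK1104.finditer(log):
        name = m.group(1)
        if name.lower() in created:
            return name
    return None
```

Run over the IA2Marshal excerpt above, it returns `'IA2Marshal.lib'`; a log that only contains the LNK1104 line returns `None`, which separates "the linker can't open its own output" from "an input library is missing".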
We will have to deal with this later on when we are freed up.
Component: Release Engineering → Release Engineering: Future
Ted, thanks for the last comment.
Is this possibly the same as bug 419445?
It looks like it, though I believe it'll increase the scope from just VS2008 to VS2008+VS2005.
This syndrome (intermittent failures with a nonspecific error message) makes me think that the problem is at the operating-system level; for instance, we might be hitting the global limit on the number of open files. I wish the linker printed the GetLastError() message instead of just "cannot open file". Question for RelEng: does this consistently happen on one particular machine?
[ N.B. I don't know if Windows *has* a global limit on the number of open files. ]
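Since the linker only says "cannot open file", one rough approximation of the missing GetLastError() detail is for a build wrapper to retry the open itself after a failed link and print the OS-level reason. This is a hypothetical Python helper; it only sees the state at the moment it runs, so a transient condition (such as another process briefly holding the file) may already have cleared.

```python
import errno
import os
from typing import Optional


def describe_open_failure(path: str) -> Optional[str]:
    """Try to open `path` for reading. Return None on success, otherwise a
    string with the errno symbol, the OS message and, on Windows, the
    winerror code when Python recorded one."""
    try:
        with open(path, "rb"):
            return None
    except OSError as e:
        if e.errno is None:
            return str(e)
        code = errno.errorcode.get(e.errno, str(e.errno))
        detail = f"{code}: {os.strerror(e.errno)}"
        # Windows-only attribute; it may be None even on Windows.
        winerror = getattr(e, "winerror", None)
        if winerror is not None:
            detail += f" (winerror {winerror})"
        return detail
```

For a path that does not exist, this returns a string starting with `ENOENT`, which is already more specific than the linker's message.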
It's certainly possible that this is a compiler or OS bug, or some other weird edge case that we're running into. It's just really hard to figure anything out without being able to consistently reproduce. I agree that VC++ has terrible error messages, FWIW. :-/
No, that's not the same thing. Look at the error message! That appears to be fallout from bug 518107.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1254235707.1254237101.22872.gz&fulltext=1
WINNT 5.2 mozilla-1.9.2 build on 2009/09/29 07:48:27
LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
make[6]: *** [jsctypes-test.dll] Error 80
make[6]: *** Deleting file `jsctypes-test.dll'
make[5]: *** [libs] Error 2
make[4]: *** [libs_tier_gecko] Error 2
make[3]: *** [tier_gecko] Error 2
make[2]: *** [default] Error 2
make[1]: Leaving directory `/e/builds/moz2_slave/mozilla-1.9.2-win32/build'
make[1]: *** [build] Error 2
make: *** [profiledbuild] Error 2
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1256641666.1256650763.4307.gz WINNT 5.2 mozilla-central build on 2009/10/27 04:07:46 LINK : fatal error LNK1104: cannot open file 'npwinless.lib'
OS: Mac OS X → Windows NT
There's a possibility that we're running out of disk space here (see bug 522719). There were 8 and 9 GB free before the build, which would seem like enough, but I'm not sure how much temp space we use. Otherwise, this might be a dupe of bug 419445.
So we could be running out of space on C here. moz2-win32-slave31 just died with a similar error, and had 681 MB free on C.
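To test the disk-space theory, a build slave could log free space on each relevant drive before building, including the drive that holds the temp directory (`tempfile.gettempdir()`). A minimal Python sketch using `shutil.disk_usage`; the function name is hypothetical, and any thresholds passed to it are placeholders, not values established in this bug.

```python
import shutil
from typing import Dict, List


def low_space_drives(min_free: Dict[str, int]) -> List[str]:
    """`min_free` maps a path on each drive to check (for example the build
    drive and the temp drive) to a minimum number of free bytes.
    Return one message per path whose filesystem is below its threshold."""
    low = []
    for path, threshold in min_free.items():
        free = shutil.disk_usage(path).free
        if free < threshold:
            low.append(f"{path}: {free // 2**20} MB free, want {threshold // 2**20} MB")
    return low
```

On a Windows slave this might be called as `low_space_drives({"c:/": 1 * 2**30, "e:/": 10 * 2**30})`, logging the result at the start of the build so it can be correlated with later LNK1104 failures.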
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1257444130.1257448311.29529.gz
WINNT 5.2 mozilla-1.9.2 build on 2009/11/05 10:02:10
LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'

Also, let's not have three open bugs for this. I merged the other two into this one, as it seems to have more analysis.
Blocks: 438871, 474683
Summary: Occasional link failures on win32 (LINK : fatal error LNK1104: cannot open file 'npwinless.lib' or 'AccessibleMarshal.lib') → Occasional link failures on win32 (LINK : fatal error LNK1104: cannot open file '<something>.lib')
Whiteboard: [orange]
OS: Windows NT → Windows Server 2003
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1258316917.1258317506.17115.gz WINNT 5.2 mozilla-central build on 2009/11/15 12:28:37 "s: moz2-win32-slave06"
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1258644534.1258645310.19196.gz
WINNT 5.2 mozilla-central build on 2009/11/19 07:28:54 "s: moz2-win32-slave41"
LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
make[6]: *** [jsctypes-test.dll] Error 80
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1258669091.1258670404.18892.gz

This seems to be happening a bunch in 'jsctypes-test.lib', which really doesn't have much special about it: it's just an ordinary little test library, built as part of the unit tests. This makes me think that there's a race going on, and that our dependencies may be a little messed up.
I don't know what else would be racing with it. Only the linker produces the import library, nothing else does, and I don't see any other suspicious commands when I look at these logs.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1259178814.1259180025.31314.gz WINNT 5.2 mozilla-central build on 2009/11/25 11:53:34 "s: moz2-win32-slave09" LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1260653501.1260654862.24272.gz WINNT 5.2 mozilla-central build on 2009/12/12 13:31:41 LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
Summary: Occasional link failures on win32 (LINK : fatal error LNK1104: cannot open file '<something>.lib') → Occasional link failures on win32 (LINK : fatal error LNK1104: cannot open file '<something>.lib') (usually jsctypes-test.lib)
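The revised summary rests on a tally of which file each failure names. A sketch of that tally in Python, with a hypothetical function name; it keys on the lower-cased basename so that a full Windows path ending in sqlite3.lib groups with a bare 'sqlite3.lib'.

```python
import re
from collections import Counter
from typing import Iterable

LNK1104 = re.compile(r"LNK1104: cannot open file '([^']+)'")


def failing_libs(logs: Iterable[str]) -> Counter:
    """Count the files named in LNK1104 errors across build logs,
    keyed by lower-cased basename."""
    counts: Counter = Counter()
    for log in logs:
        for m in LNK1104.finditer(log):
            path = m.group(1).replace("\\", "/")
            counts[path.rsplit("/", 1)[-1].lower()] += 1
    return counts
```

`failing_libs(logs).most_common(3)` then gives the most frequently failing libraries across whatever set of logs is supplied.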
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1260786098.1260792840.11121.gz s: win32-slave43 LINK : fatal error LNK1104: cannot open file 'crashinjectdll.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=Places/1260817454.1260823020.6663.gz&fulltext=1
s: win32-slave12
builder: WINNT 5.2 places leak test build
LINK : fatal error LNK1104: cannot open file 'e:\builds\moz2_slave\places-win32-debug\build\obj-firefox\dist\lib\sqlite3.lib'
make[6]: Leaving directory `/e/builds/moz2_slave/places-win32-debug/build/security/nss/lib/softoken'
make[6]: *** [e:/builds/moz2_slave/places-win32-debug/build/obj-firefox/nss/softokn/softokn3.dll] Error 80
Whiteboard: [orange] → [orange][buildduty]
Component: Release Engineering: Future → Release Engineering
I am trying to see whether any slaves recur, but the first seven logs were unavailable. From the four logs whose slave names were not mentioned in the comments:
* win32-slave10
* WIN32-SLAVE36
* moz2-win32-slave37
* WIN32-SLAVE40
Restating the slave names mentioned from comment 19 onward:
* moz2-win32-slave06
* moz2-win32-slave09
* win32-slave12
* moz2-win32-slave31
* moz2-win32-slave41
* win32-slave43
I would like to discourage the theory of a machine-specific problem.
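The recurrence check described above amounts to counting normalized slave names. A minimal Python sketch (requires Python 3.9+ for `str.removeprefix`); treating `moz2-win32-slaveNN` and `win32-slaveNN` as the same host is an assumption this bug does not confirm, so that normalization step is easy to drop.

```python
from collections import Counter
from typing import Dict, Iterable


def recurring_slaves(names: Iterable[str]) -> Dict[str, int]:
    """Return slave names that occur more than once, with their counts.
    Names are lower-cased and a leading 'moz2-' is stripped; that prefix
    handling is an assumption, not something this bug establishes."""
    counts = Counter(n.strip().lower().removeprefix("moz2-") for n in names)
    return {name: c for name, c in counts.items() if c > 1}
```

With the ten names listed above, it returns an empty dict.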
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey2.0/1261086893.1261087520.19828.gz WINNT 5.2 comm-1.9.1 build on 2009/12/17 13:54:53 s: cb-seamonkey-win32-02 LINK : fatal error LNK1104: cannot open file 'xpcom.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1261267156.1261267864.18833.gz WINNT 5.2 comm-central-trunk build on 2009/12/19 15:59:16 s: cb-seamonkey-win32-01 LINK : fatal error LNK1104: cannot open file 'xpcom.lib'
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1261440351.1261441593.17997.gz WINNT 5.2 comm-central-trunk build on 2009/12/21 16:05:51 s: cb-seamonkey-win32-01 LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
I posted a question in the MSDN forums about this: http://social.msdn.microsoft.com/Forums/en-US/vcgeneral/thread/1c0fb80c-6227-400d-8470-c51f42bdf7c1 Not really expecting to get an answer there but it can't hurt.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1261522328.1261523506.31957.gz WINNT 5.2 mozilla-central build on 2009/12/22 14:52:08 s: win32-slave06 LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib'
Looking back through the logs here, I found one instance where the build was a clobber: comment 29,
http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unix&logfile=1260786098.1260792840.11121.gz&buildtime=1260786098&buildname=WINNT%205.2%20mozilla-central%20build&fulltext=1

So we can eliminate the possibility that the .lib file is being held open and can't be deleted or modified, given that it's not there to begin with.
(In reply to comment #37)
> Looking back through the logs here, I found one instance where the build was a
> clobber: comment 29,
> http://tinderbox.mozilla.org/showlog.cgi?tree=Firefox&errorparser=unix&logfile=1260786098.1260792840.11121.gz&buildtime=1260786098&buildname=WINNT%205.2%20mozilla-central%20build&fulltext=1
>
> So we can eliminate the possibility that the .lib file is being held open and
> can't be deleted or modified, given that it's not there to begin with.

That depends on whether this was a PGO build, and if so, whether this was the first or second pass.
Ah. You're right: on the clobber, it happened on the second pass. This hints that the pre-existence of the .lib file is relevant, but it's not strong evidence.

On Zack's MSDN forum post, someone suggested adding a rule to remove the .lib file (and friends?) before invoking 'link'. I'm not sure whether this would work if the problem is that the file is being held open, or whether it would interfere with PGO, though I'm guessing that data is stored in the .pdb file. How does this idea sound?
I think removing the import library before linking ought to be safe, but it's not something I've ever tested. I would assume that the linker would just regenerate it, given that it says it's going to in the output.
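The suggested pre-link cleanup could look like the following hypothetical Python wrapper, which deletes the import library and .exp file named after the output DLL and then runs the unchanged link command. It deliberately leaves the .pdb and any other files alone, since the thread doesn't establish where PGO data lives. On Windows, if another process still holds the .lib open without delete sharing, `os.remove` would typically fail with `PermissionError`, which would at least surface that case with a clearer error than LNK1104.

```python
import os
import subprocess
from typing import List


def link_with_fresh_import_lib(cmd: List[str], out_dll: str) -> int:
    """Hypothetical wrapper: delete the import library (.lib) and export
    file (.exp) that the linker will regenerate for `out_dll`, then run
    the link command unchanged and return its exit status."""
    stem, _ = os.path.splitext(out_dll)
    for leftover in (stem + ".lib", stem + ".exp"):
        try:
            os.remove(leftover)
        except FileNotFoundError:
            pass  # clobber build or first PGO pass: nothing to remove
        # PermissionError (e.g. file held open elsewhere) is left to propagate.
    return subprocess.run(cmd).returncode
```

In a makefile rule this would wrap the existing `link -OUT:IA2Marshal.dll ...` invocation, with `out_dll="IA2Marshal.dll"`.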
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.6/1262161104.1262161817.17143.gz&fulltext=1 LINK : fatal error LNK1104: cannot open file 'jsctypes-test.lib' s: win32-slave11
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1262209619.1262220995.6646.gz WINNT 5.2 mozilla-central build on 2009/12/30 13:46:59 s: win32-slave26 LINK : fatal error LNK1104: cannot open file 'npwinless.lib'
This isn't being worked on at this time, off to the Future.
Component: Release Engineering → Release Engineering: Future
Mass move of bugs from Release Engineering:Future -> Release Engineering. See http://coop.deadsquid.com/2010/02/kiss-the-future-goodbye/ for more details.
Component: Release Engineering: Future → Release Engineering
Priority: -- → P3
We could still hit this if a VM gets a build, but the combination of prioritizing ix machines for builds and running with -j1 has made this much less common.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange][buildduty] → [buildduty]
Product: mozilla.org → Release Engineering