Bug 545015 - configure sometimes fails with "rm: cannot lstat `conftest.exe': Permission denied" followed by "C++ compiler cannot create executables"
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Build Config
Version: Trunk
Platform: x86 Windows 7
Importance: -- normal with 1 vote
Target Milestone: mozilla12
Assigned To: :Ehsan Akhgari (busy, don't ask for review please)
Mentors:
Duplicates: 513371 634943
Depends on:
Blocks: 758732 788241
Reported: 2010-02-08 15:20 PST by Justin Dolske [:Dolske]
Modified: 2015-05-26 07:40 PDT (History)


Attachments
sleep for 1 second before attempting to touch conftest.exe the second time (826 bytes, patch)
2012-01-17 13:52 PST, :Ehsan Akhgari (busy, don't ask for review please)
ted: review-
Patch (v2) (421 bytes, patch)
2012-01-18 12:09 PST, :Ehsan Akhgari (busy, don't ask for review please)
ted: review+
conf.tar (10.00 KB, application/x-tar)
2012-01-22 13:56 PST, Bob Clary [:bc:]
conf-2.tar (30.00 KB, application/x-tar)
2012-01-22 14:12 PST, Bob Clary [:bc:]

Description Justin Dolske [:Dolske] 2010-02-08 15:20:21 PST
Seems like about 1/2 the time when I run configure, I get the following:

  ...
  checking whether cl accepts -g... (cached) no
  checking for c++... (cached) cl
  checking whether the C++ compiler (cl  ) works... rm: cannot lstat
    `conftest.exe': Permission denied
  no
  configure: error: installation or configuration problem: C++ compiler cannot
    create executables.

If I just run configure again, it works fine. [In fact, I just ran it a few times in a row, so maybe it's only sometimes failing the first time it's run after something has changed?]

I suspect this might be Windows 7 specific, as I've never seen it happen on my XP builds.
Comment 1 Ted Mielczarek [:ted.mielczarek] 2010-02-08 15:32:54 PST
I've seen this on Win 7 as well (never on XP), but haven't bothered to file a bug. You should check out the tail of config.log next time you hit this, although it may not be any more informative.
Comment 2 Benjamin Smedberg [:bsmedberg] 2010-02-08 16:04:51 PST
I see this a lot too. config.log says that something can't be deleted. A permissions issue, perhaps?
Comment 3 Ted Mielczarek [:ted.mielczarek] 2010-02-09 08:09:00 PST
just hit this in a clean objdir. config.log says:
configure:2522: checking whether the C++ compiler (cl  ) works
configure:2538: cl -o conftest    conftest.C  1>&5
Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
Copyright (C) Microsoft Corporation.  All rights reserved.

cl : Command line warning D9035 : option 'o' has been deprecated and will be removed in a future release
conftest.C
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:conftest.exe
/out:conftest.exe
conftest.obj
LINK : fatal error LNK1104: cannot open file 'conftest.exe'
configure: failed program was:

#line 2533 "configure"
#include "confdefs.h"
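The LNK1104 line in the config.log excerpt above is a usable signature for telling this transient failure apart from a genuinely broken compiler. A minimal sketch, assuming the log sits at `config.log` in the objdir; the function name `is_conftest_lock_failure` is hypothetical and not part of the build system:

```sh
# Hypothetical helper: succeed (exit 0) when the given log contains the
# linker error seen in this bug, fail (exit 1) otherwise.
is_conftest_lock_failure() {
    log=${1:-config.log}          # assumed default location of the log
    [ -f "$log" ] || return 1     # no log, nothing to match
    # -F: match the text literally, -q: no output, only the exit status
    grep -qF "LNK1104: cannot open file 'conftest.exe'" "$log"
}
```

A wrapper script could call this after a failed configure and rerun only when it succeeds, since simply rerunning configure was reported to work.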
Comment 4 Ted Mielczarek [:ted.mielczarek] 2010-08-04 05:28:51 PDT
*** Bug 513371 has been marked as a duplicate of this bug. ***
Comment 5 Ted Mielczarek [:ted.mielczarek] 2011-02-17 10:25:21 PST
*** Bug 634943 has been marked as a duplicate of this bug. ***
Comment 6 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-05-02 13:31:50 PDT
This bug is pretty hurty because I often kick off a build and then go do something else.  If the build needs a reconfigure and this test fails, I might lose hours by not noticing.  Rob has mentioned this happening to him too.

bent says he knows what the underlying problem is.  Can we fix it?
Comment 7 Ben Turner (not reading bugmail, use the needinfo flag!) 2011-05-02 13:33:59 PDT
(In reply to comment #6)
> bent says he knows what the underlying problem is.  Can we fix it?

Sorry, I don't know. I suspect it's just msys locking files for exclusive access when it shouldn't or in a racy way, but I don't know of a good way to verify.
Comment 8 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-05-02 13:48:39 PDT
Is there a way for us to get more information about the error?  Does win32 have something like |lsof|?  Would that info help?
Comment 9 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2011-05-02 13:58:53 PDT
ISTR Kevin looking at this once upon a time.
Comment 10 K. Gadd (:kael) 2011-05-02 15:17:44 PDT
There are a few things at work here, generally speaking:

Windows allows you to open a file in 'shareable' mode, where the file can then later be opened by other applications if they also attempt to open it in a compatible sharing mode. This means that if your application opens a file for read, and allows sharing for Write, someone else can open it up and write it simultaneously.

One of the sharing modes is Delete, and it allows a file to be deleted while you have it open. However, if the file is deleted, your read handle is still valid, and remains valid until you close it - the file is kept alive until all outstanding handles to it are closed.

When a file is deleted while handles to it are opened, it exists in a weird filesystem limbo - sometimes you'll see it when browsing a directory, and the name is reserved until all handles attached to it are closed. Attempts to manipulate the reserved name (open it, move it, delete it) will fail.

As far as I can tell, MS does this to prevent you from ending up in a situation where you have two open handles to two different files with the exact same name. The result is that all sorts of background daemons can end up holding a shareable handle open to one of your files, causing delete operations to succeed but leave the name in use.

Off hand, I can think of a few apps that I've seen do this:

Windows 7 has a built-in service called 'Application Experience' that does some sort of compatibility checks against .exe files. It tends not to leave files open for long, but of particular importance is that if this service stops, the kernel will leave handles open to them for a while as it waits forever for the service to examine the executable.

The Windows Indexing Service will also open files in this manner, but unless the files are huge, it closes the handle pretty quick.

Most windows virus scanners and anti-malware packages will open files in this manner as well, locking them into existence while they're scanned. Worse still, most virus scanners hook various windows IO APIs, so you get misleading error messages and results in some cases because the virus scanners' authors didn't understand file sharing mechanisms.

The most reliable way to identify whether this is happening is to run Sysinternals Procmon and have it filter to the file you care about. If you get hit by a shareable delete problem, you'll see the 'PENDING_DELETE' error code show up in the Procmon log. For whatever reason, a lot of applications (and even Win32 APIs) turn ERROR_PENDING_DELETE into Permission Denied or somesuch, making it look like a permissions issue.
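Since the name stays reserved only until the last outstanding handle is closed, one way to cope with it from a shell script is to retry the delete for a short while. A minimal sketch; `remove_with_retry` and its defaults are hypothetical, and whether a pending-delete file still shows up as existing under msys is an assumption:

```sh
# Hypothetical helper: try to remove a file, waiting between attempts.
# Returns 0 once nothing exists at the path, 1 if it never goes away.
remove_with_retry() {
    file=$1
    attempts=${2:-5}    # assumed default: 5 tries
    delay=${3:-1}       # assumed default: 1 second between tries
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Ignore rm's own exit status; what matters is whether the name is free.
        rm -f -- "$file" 2>/dev/null || true
        if [ ! -e "$file" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, `remove_with_retry conftest.exe 5 1` would give a scanner or indexer up to about five seconds to close its handle.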
Comment 11 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2011-05-02 15:52:11 PDT
I tried running configure under Procmon, but I couldn't reproduce the bug, perhaps because Procmon slows everything down quite a bit. However, the Procmon log does show what Kevin describes. In particular, on my system MsMpEng.exe opens the conftest.exe file that we just built (as do svchost.exe and lsass.exe, by the way). MsMpEng is part of the Microsoft antivirus stuff (which I thought I had disabled ... hmm).

So, I think there are two things we can do here:
1) Make sure everyone has really disabled antivirus stuff -- it's got to hurt performance, if nothing else.
2) In theory we could alter configure so that it uses unique names for each conftest.exe file, so that the kinds of problems Kevin describes can't arise. I have no idea how hard that would be.
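The unique-name idea in item 2 can be sketched in plain shell. This is only an illustration: the `conftest-<pid>-<counter>.exe` pattern and the names `next_conftest_name` and `conftest_exe` are assumptions, and the real work would be in changing the autoconf macros to use such a name:

```sh
# Hypothetical helper: pick a fresh executable name for every compiler test,
# so a lingering handle on an earlier conftest binary cannot block this one.
conftest_counter=0
next_conftest_name() {
    conftest_counter=$((conftest_counter + 1))
    # $$ is the PID of the running script; the counter separates tests
    # within one run. The result is left in $conftest_exe.
    conftest_exe="conftest-$$-$conftest_counter.exe"
}

# Example use (the compiler invocation is left as a comment on purpose):
#   next_conftest_name
#   cl -Fe"$conftest_exe" conftest.C
```

The function sets a variable rather than printing the name, because calling it inside `$(...)` would run it in a subshell and lose the counter update.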
Comment 12 Ryan VanderMeulen [:RyanVM] 2011-05-02 15:53:49 PDT
FWIW, I have my entire MozillaBuild directory whitelisted in MSE, and I still see this all the time.
Comment 13 K. Gadd (:kael) 2011-05-02 15:58:59 PDT
Ultimately I think your best bet here is to use unique names. If unique names for the actual files is not feasible, using a new directory when you would otherwise clean out an existing directory is a more reasonable solution - you can still end up with name collisions, but they're far less likely, and it doesn't cost you as much.

It certainly helps to shut off applications that can cause this problem - in practice, I almost never hit it - but sometimes you may be dealing with a design that just isn't reliable on top of FAT32/NTFS due to the pending delete mechanism.

Also, one option that might be viable: If using a new folder every time isn't viable, and unique names aren't viable, what you could do is, for every build folder that has its contents periodically deleted, instead, have a junction in that location that points to a randomly-generated folder name. When the time comes to empty the folder, create a new randomly-generated folder, adjust the junction to point to that new empty folder, and then delete the contents of the old one. That way, the pending delete state doesn't kill you, and you don't have to update any code. I *think* all of the windows IO infrastructure understands junctions well enough that this shouldn't break anything.
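The indirection described above can be sketched as follows. Note the loud assumption: this uses a POSIX symbolic link as a stand-in for an NTFS junction, so it shows the rotation logic only, not the Windows mechanism. `rotate_build_dir` and the `obj.XXXXXX` naming are hypothetical:

```sh
# Hypothetical helper: point a stable path at a fresh, randomly named
# directory, then clean up the directory it pointed to before.
rotate_build_dir() {
    link=$1      # stable path the build keeps using
    parent=$2    # directory that holds the randomly named real folders
    old=
    if [ -L "$link" ]; then
        old=$(readlink "$link")      # remember the previous target
    elif [ -e "$link" ]; then
        return 1                     # refuse to replace a real file or folder
    fi
    new=$(mktemp -d "$parent/obj.XXXXXX") || return 1
    rm -f -- "$link"                 # removes only the link, not its target
    ln -s "$new" "$link" || return 1
    if [ -n "$old" ] && [ -d "$old" ]; then
        # Best effort: leftovers stuck in pending delete no longer matter,
        # because nothing refers to the old folder any more.
        rm -rf -- "$old" || true
    fi
    return 0
}
```

The build keeps referring to the same path throughout, which is the point of the suggestion: no code that uses the folder has to change.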
Comment 14 Ted Mielczarek [:ted.mielczarek] 2011-05-02 16:48:31 PDT
That's easier said than done, since all of this work is happening deep inside autoconf macros:
http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=blob;f=acgeneral.m4;h=ae971de139977f52d92f424f0890bbab3ede4093;hb=74b21b9b2ce75529e3d93df7367f4f4b0c6b6adf#l1511
http://git.savannah.gnu.org/gitweb/?p=autoconf.git;a=blob;f=acgeneral.m4;h=ae971de139977f52d92f424f0890bbab3ede4093;hb=74b21b9b2ce75529e3d93df7367f4f4b0c6b6adf#l1233

I suppose we can just replace AC_LANG_{C,CPLUSPLUS} and AC_TRY_COMPILER with modified versions of the original. It's not like autoconf 2.13 is changing.
Comment 15 Martijn Wargers [:mwargers] (gone per 2016-05-31 :-( ) 2011-07-20 16:54:38 PDT
I was hitting this every time I tried to build something.
It turned out that once I closed the other shell window, where I had built something previously, I didn't get this problem anymore.
Comment 16 Quentin Raynaud 2011-12-02 02:43:13 PST
I'm currently maintainer for a lot of Instantbird services (the xul based instant messaging client) including the Windows build machine. We decided recently to build a new VM with win7 + VS2010 for our Windows builds. We ran into this issue for nearly all our nightly builds...

I decided to try some of the ideas I found in here. Basically, I deactivated the following services:
- Application Experience
- Windows Defender
- Windows Search

It did not help. In fact, it got a lot worse. Before, we were able to get a build to finish by starting a new one every time it failed; with these changes, the failure happened every time. No builds were able to get past the compiler check anymore. None.

I then found this article: http://www.retrocopy.com/blog/28/cant-delete-exe-files-in-vista--windows-7-solved.aspx.

It's not very clear, but I basically understood it as "deactivating the Application Experience service messes up .exe file deletion and makes things worse". I had nothing to lose, so I reactivated that one service only. Now the build is getting past the compiler check again. At least, it did on the first try. It is now building. I believe that with Defender and Search deactivated, it might be enough to prevent any later issue.

If I run into another similar rm issue, I'll report it here.
Comment 17 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-17 13:52:08 PST
Created attachment 589291
sleep for 1 second before attempting to touch conftest.exe the second time

This is really hackish and all, but I think all we need to do here is to sleep 1 second before touching the conftest.exe file for the second time.  This seems to work for me.  Ted, how do you like this patch?
Comment 18 Ted Mielczarek [:ted.mielczarek] 2012-01-18 11:42:29 PST
Comment on attachment 589291
sleep for 1 second before attempting to touch conftest.exe the second time

Review of attachment 589291:
-----------------------------------------------------------------

Does this actually work around the problem, or will it just crop up at the next test? I agree that it's super annoying, so even a really crappy band-aid like this might be okay. I've certainly wasted way more than 1 second per build on this error.
Comment 19 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-18 11:55:09 PST
In my experience, it fixes the problem.  Note that for some reason, this only ever happens on the second access to conftest.exe.  I have never seen configure breaking after the second conftest.exe has been created with this error message.

Maybe we should land this and see if anybody still sees this problem?
Comment 20 Ben Turner (not reading bugmail, use the needinfo flag!) 2012-01-18 11:58:26 PST
Yes, please! I agree with ted, even a lame band-aid is well worth it.
Comment 21 Ted Mielczarek [:ted.mielczarek] 2012-01-18 12:00:11 PST
Comment on attachment 589291
sleep for 1 second before attempting to touch conftest.exe the second time

Review of attachment 589291:
-----------------------------------------------------------------

Yeah, we might as well land it. We've had no traction on figuring out the root cause, so it's worthwhile. Can you wrap that in a check for target==WINNT though? No sense in adding an extra second on every platform.
Comment 22 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-18 12:09:28 PST
Created attachment 589603
Patch (v2)

I even added some comments!
Comment 23 Ted Mielczarek [:ted.mielczarek] 2012-01-18 12:14:51 PST
Comment on attachment 589603
Patch (v2)

Review of attachment 589603:
-----------------------------------------------------------------

::: configure.in
@@ +567,5 @@
>  else
>      AC_PROG_CC
> +    case "$target" in
> +    *-mingw*)
> +      # Work around the conftest.exe access problem on Windows (bug 545015)

Mentioning the bug number is a bit overkill, that'll be in the hg blame.
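For readers without the attachment, the shape of such a platform-gated delay can be sketched as follows. This is not the landed patch (only its first lines are quoted above); the helper name `needs_conftest_pause` and the placement of the sleep are assumptions:

```sh
# Hypothetical predicate: succeeds only for MinGW (Windows) target triplets,
# mirroring the *-mingw* pattern quoted from configure.in.
needs_conftest_pause() {
    case "$1" in
    *-mingw*) return 0 ;;
    *)        return 1 ;;
    esac
}

# Example use inside a configure-style script, where $target is already set:
#   if needs_conftest_pause "$target"; then
#       sleep 1    # give whatever holds conftest.exe time to let go
#   fi
```

Gating on the target keeps the extra second off every other platform, which is what the review in comment 21 asked for.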
Comment 24 Florian Quèze [:florian] [:flo] 2012-01-18 13:02:37 PST
(In reply to Ted Mielczarek [:ted, :luser] from comment #21)

> Yeah, we might as well land it. We've had no traction on figuring out the
> root cause, so it's worthwhile.

Quentin (see comment 16) has fought this bug for a while on the Windows slave of Instantbird's buildbot. He tells me the problem has completely disappeared for us after doing this:

1. In the local group policy editor:
Local Computer Policy -> Computer Configuration -> Administrative Templates -> Windows Components -> Application Compatibility

Select Turn Off Application Compatibility Engine
Select Enabled under the Settings tab
Select Turn off Program Compatibility Assistant
Select Enabled under the Settings tab

2. Turn off the file indexation and the application experience services.

3. Reboot.

More than 3 weeks after doing this, we have yet to see another build fail with this error.
Comment 25 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-18 13:39:27 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/0fcee05b6e0a

I will post to dev-platform tomorrow when this merges so that people can keep an eye to see if this still happens.
Comment 26 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-18 13:40:38 PST
(In reply to Florian Quèze from comment #24)
> (In reply to Ted Mielczarek [:ted, :luser] from comment #21)
> 
> > Yeah, we might as well land it. We've had no traction on figuring out the
> > root cause, so it's worthwhile.
> 
> Quentin (see comment 16) has fought this bug for a while on the Windows
> slave of Instantbird's buildbot. He tells me the problem has completely
> disappeared for us after doing this:
> 
> 1. In the local group policy editor:
> Local Computer Policy -> Computer Configuration -> Administrative Templates
> -> Windows Components -> Application Compatibility
> 
> Select Turn Off Application Compatibility Engine
> Select Enabled under the Settings tab
> Select Turn off Program Compatibility Assistant
> Select Enabled under the Settings tab
> 
> 2. Turn off the file indexation and the application experience services.
> 
> 3. Reboot.
> 
> More than 3 weeks after doing this, we have yet to see another build fail
> with this error.

It would be interesting if you guys can reset these settings and try to see whether my patch helps when it gets merged with mozilla-central.
Comment 27 Quentin Raynaud 2012-01-19 01:28:38 PST
Well, considering the root cause of the problem (usually the application experience service locking the conftest.exe file, sometimes the indexing service), waiting a little is obviously going to help a lot. But it's still based on luck: the service could well swap and keep its lock, or anything of the like.

So there are 2 cases:
1/ Buildbot slaves. Here, we don't want to rely on "luck": we want those builds! We need a real solution. Now, you can try the settings flo gave you, I believe that it will solve the problem and plus, it might even speed up the process because those services are really slowing things down by analyzing nearly every generated file for no good reason...
2/ The developer that compiles a XUL based application (let's say Firefox because we love this one)... Here, we can advise him to change these settings. He won't miss the application experience service anyway (if he compiles Fx, he knows his stuff enough to find out that some application is not working properly on his new OS without needing MS databases). But in most cases, we want the guy to be able to compile without asking him to play around with those kind of Windows parameters. I want to say "it should work out of the box". In this case, the patch in this bug should be enough. With today's requirements to compile XUL, the guy probably has a good computer, so we don't have to worry too much about the service hanging in background at that time. Even if it does, then it will bug like that once, and the guy will try again. This time it should work...

I would also like to note that on a slave we encountered this bug once in the JS src configure (which I believe contradicts comment #19).

Now, we can also discuss the obvious "right" way to fix this on the build side. Since the cause is the application experience service locking executables, the solution is probably to replace the autoconf macro that tests the compiler on Windows with something that does exactly the same thing but generates a conftest-<random>.exe file instead. This would be the best way to fix this. Another (maybe simpler) solution would be to find a way to run this macro after changing the current directory to a randomly generated subfolder that we can delete afterwards with a simple rm -Rf. This keeps the original macro but solves the locking issue.
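The second option, running the test from a throwaway subfolder, can be sketched in shell. `run_in_scratch_dir` and the `conftest-dir.XXXXXX` naming are hypothetical, and hooking this into the autoconf macro is not shown:

```sh
# Hypothetical helper: run a command inside a randomly named subfolder of the
# current directory, then delete the subfolder and report the command's status.
run_in_scratch_dir() {
    scratch=$(mktemp -d ./conftest-dir.XXXXXX) || return 1
    status=0
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$scratch" && "$@" ) || status=$?
    # Best effort: a file stuck in pending delete may make this fail, but the
    # next test gets a different folder name and is not affected.
    rm -rf -- "$scratch" || true
    return "$status"
}
```

The command runs with the scratch folder as its working directory, so source paths would have to be given relative to it (for example `../conftest.C`).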
Comment 28 Marco Bonardo [::mak] 2012-01-19 02:53:44 PST
https://hg.mozilla.org/mozilla-central/rev/0fcee05b6e0a
Comment 29 Bob Clary [:bc:] 2012-01-20 06:07:26 PST
Thanks for this fix. In my testing building on Windows 7 32 bit, an initial checkout then build failed with the conftest error, but the subsequent 4 builds completed successfully. I then clobbered and did 5 more builds, checking out each time, and they all succeeded. This is much better than before, when it would fail multiple times.
Comment 30 Justin Lebar (not reading bugmail) 2012-01-20 09:50:13 PST
I just saw this error on my Windows 7 VM, hg rev 49936b49.
Comment 31 :Ehsan Akhgari (busy, don't ask for review please) 2012-01-20 13:15:54 PST
Can you guys please submit the output of the configure script when it failed?  Also, the contents of config.log would be really helpful as well.
Comment 32 Justin Lebar (not reading bugmail) 2012-01-20 13:18:02 PST
I'll post it if I see it happen again!
Comment 33 Bob Clary [:bc:] 2012-01-22 13:56:21 PST
Created attachment 590586
conf.tar

Windows 7 64bit doing 32 bit build.

contains config.log and the output of configure in configure.log

LINK : fatal error LNK1104: cannot open file 'conftest.exe'
Comment 34 Bob Clary [:bc:] 2012-01-22 14:12:58 PST
Created attachment 590589
conf-2.tar

different errors on attempt to just repeat the checkout build.
Comment 35 Sergey 2012-06-15 03:57:03 PDT
This happens to me most of the time on Win 7 32-bit. I've only built Nightly two or three times, and then it stopped working.
Comment 36 josh 2013-04-14 02:58:45 PDT
I started the "Application Experience" service on my Windows 7 machine, and the error disappeared. By default, the service is disabled.
Comment 37 Quentin Raynaud 2013-04-14 03:16:08 PDT
(In reply to josh from comment #36)
> I start the "Application Experience" service on my windows7,then the error
> disappear; By default ,the service is forbidden

Yes. To understand this, see my comment 16, then look at comment 24 for a better solution.
Comment 38 Asif 2013-11-06 10:39:42 PST
Status: This failure is reproducible even after disabling the following services in Windows 8:
- Application Experience.
- Program Compatibility Assistant Service.
- Windows Search.

    Host: Ubuntu 12.04
Platform: CYGWIN_NT-6.2-WOW64 windows8 1.7.25(0.270/5/3) inside VirtualBox 4.2.18-88780
 Program: idutils-4.6
   Issue: Doing a configure (autoconf tools) causes Windows 8 to crash (inside VirtualBox). The last thing that configure was doing:

configure:21871: gcc -std=gnu99 -o conftest.exe -g -O2   conftest.c  >&5
configure:21871: $? = 0
configure:21871: result: yes
configure:21880: checking whether getcwd aborts when 4k < cwd_length < 16k
configure:21999: gcc -std=gnu99 -o conftest.exe -g -O2   conftest.c  >&5
configure:21999: $? = 0
configure:21999: ./conftest.exe
Comment 39 neil@parkwaycc.co.uk 2014-07-06 16:54:26 PDT
I've started seeing regular configure failures due to pending deletes, although I haven't tried all of the suggestions in this bug yet.
Comment 40 Kenneth Long 2015-05-24 16:42:49 PDT
Make sure Application Experience is set to Automatic and is Running. The timeout based on whatever this service does accounts for it. This solved the issue for me immediately. I was seeing it all the time, and found no other way around it.
