Closed Bug 1185756 Opened 9 years ago Closed 9 years ago

"WindowsError: [Error 32] The process cannot access the file because it is being used by another process" download_manager.py:167 in _download()

Categories

(Testing :: mozregression, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: parkouss)

References

Details

Attachments

(3 files, 2 obsolete files)

[~/tmp]$ mozregression --good 2014-08-01 --profile ./foo --bits 32

Interrupted.
 0:00.35 LOG: MainThread INFO No 'bad' date specified, using 2015-07-20
 0:03.30 LOG: MainThread Bisector INFO Downloading build from: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015/01/2015
-01-25-03-02-02-mozilla-central/firefox-38.0a1.en-US.win32.zip
===== Downloaded 100% =====
 0:20.27 LOG: MainThread Test Runner INFO Running nightly for 2015-01-25
 0:22.15 LOG: MainThread Test Runner INFO Launching c:\users\ed\appdata\local\temp\tmpxuirnl\firefox\firefox.exe
 0:22.16 LOG: MainThread mozversion INFO application_buildid: 20150125030202
 0:22.16 LOG: MainThread mozversion INFO application_changeset: c18776175a69
 0:22.16 LOG: MainThread mozversion INFO application_display_name: Nightly
 0:22.16 LOG: MainThread mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
 0:22.16 LOG: MainThread mozversion INFO application_name: Firefox
 0:22.16 LOG: MainThread mozversion INFO application_remotingname: firefox
 0:22.16 LOG: MainThread mozversion INFO application_repository: https://hg.mozilla.org/mozilla-central
 0:22.17 LOG: MainThread mozversion INFO application_vendor: Mozilla
 0:22.17 LOG: MainThread mozversion INFO application_version: 38.0a1
 0:22.17 LOG: MainThread mozversion INFO platform_buildid: 20150125030202
 0:22.17 LOG: MainThread mozversion INFO platform_changeset: c18776175a69
 0:22.17 LOG: MainThread mozversion INFO platform_repository: https://hg.mozilla.org/mozilla-central
 0:22.17 LOG: MainThread mozversion INFO platform_version: 38.0a1
Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry' or 'exit' and press Enter): Exception in thread
Thread-10:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "c:\mozilla-build\python\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "c:\mozilla-build\python\lib\site-packages\mozregression\download_manager.py", line 167, in _download
    os.rename(temp_dest, dest)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process

Exception in thread Thread-13:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "c:\mozilla-build\python\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "c:\mozilla-build\python\lib\site-packages\mozregression\download_manager.py", line 167, in _download
    os.rename(temp_dest, dest)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process


Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry' or 'exit' and press Enter): g
 3:31.71 LOG: MainThread Bisector INFO Narrowed nightly regression window from [2014-08-01, 2015-07-20] (353 days) to [2015-01-25,
 2015-07-20] (176 days) (~7 steps left)
 3:31.71 LOG: MainThread Bisector INFO Downloading build from: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015/04/2015
-04-23-03-02-04-mozilla-central/firefox-40.0a1.en-US.win32.zip
===== Downloaded 100% =====
 3:52.58 LOG: MainThread Test Runner INFO Running nightly for 2015-04-23
 3:54.66 LOG: MainThread Test Runner INFO Launching c:\users\ed\appdata\local\temp\tmp6golqe\firefox\firefox.exe
 3:54.68 LOG: MainThread mozversion INFO application_buildid: 20150423030204
 3:54.68 LOG: MainThread mozversion INFO application_changeset: 0b202671c9e2
 3:54.68 LOG: MainThread mozversion INFO application_display_name: Nightly
 3:54.68 LOG: MainThread mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
 3:54.68 LOG: MainThread mozversion INFO application_name: Firefox
 3:54.68 LOG: MainThread mozversion INFO application_remotingname: firefox
 3:54.68 LOG: MainThread mozversion INFO application_repository: https://hg.mozilla.org/mozilla-central
 3:54.68 LOG: MainThread mozversion INFO application_vendor: Mozilla
 3:54.69 LOG: MainThread mozversion INFO application_version: 40.0a1
 3:54.69 LOG: MainThread mozversion INFO platform_buildid: 20150423030204
 3:54.69 LOG: MainThread mozversion INFO platform_changeset: 0b202671c9e2
 3:54.69 LOG: MainThread mozversion INFO platform_repository: https://hg.mozilla.org/mozilla-central
 3:54.69 LOG: MainThread mozversion INFO platform_version: 40.0a1
Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry', 'back' or 'exit' and press Enter): Exception in
 thread Thread-21:
Traceback (most recent call last):
  File "c:\mozilla-build\python\lib\threading.py", line 810, in __bootstrap_inner
    self.run()
  File "c:\mozilla-build\python\lib\threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "c:\mozilla-build\python\lib\site-packages\mozregression\download_manager.py", line 167, in _download
    os.rename(temp_dest, dest)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process
crap!

I suspect this is the same thing handled by mozfile.remove - some windows process (antivirus probably) holding the file handle at the same time...

I don't think shutil.move handle this case, like shutil.rmtree. Time for a mozfile.move function ?
Flags: needinfo?(emorley)
(In reply to Julien Pagès from comment #2)
> I don't think shutil.move handle this case, like shutil.rmtree. Time for a
> mozfile.move function ?

Yeah something like https://hg.mozilla.org/mozilla-central/file/e7e69cc8c07b/testing/mozbase/mozfile/mozfile/mozfile.py#l157 but for rename sounds like a good idea :-)
Flags: needinfo?(emorley)
Depends on: 1185971
Assignee: nobody → j.parkouss
Status: NEW → ASSIGNED
Attachment #8638524 - Flags: review?(wlachance)
Comment on attachment 8638524 [details] [review]
fix intermittent windows error when renaming downloaded file

lgtm, thanks!
Attachment #8638524 - Flags: review?(wlachance) → review+
Merged in:

https://github.com/mozilla/mozregression/commit/6ee1990215c413859ca244643cae6e2e713fcb41
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Unfortunately, still seeing the error:

bug 1088020 comment 12
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
mozfile.move retry 5 times, waiting 0.5 seconds between each attempt - total of 2.5 seconds for the last attempt.

I suspect that some antivirus are too slow. They open the zip file, look into each file in there, etc. - and this can takes more than 2.5 seconds.

So, one way I can see to fix that is to add the arguments retry_max, retry_delay from the mozlog.move/mozlog.remove functions. And here we add more time, for example:

retry_max = 10
retry_delay = 0.5

Will, thoughts ?
Flags: needinfo?(wlachance)
(In reply to Julien Pagès from comment #8)
> mozfile.move retry 5 times, waiting 0.5 seconds between each attempt - total
> of 2.5 seconds for the last attempt.
> 
> I suspect that some antivirus are too slow. They open the zip file, look
> into each file in there, etc. - and this can takes more than 2.5 seconds.
> 
> So, one way I can see to fix that is to add the arguments retry_max,
> retry_delay from the mozlog.move/mozlog.remove functions. And here we add
> more time, for example:
> 
> retry_max = 10
> retry_delay = 0.5
> 
> Will, thoughts ?

Makes sense, though I think we might be able to do even better just by incorporating redo into mozfile:

https://github.com/bhearsum/redo

(just copy it in and start using it IMO)
Flags: needinfo?(wlachance)
Attached image mozreg move filehandle.png (obsolete) —
I don't have any anti-virus-stuff installed, and I had that error a couples of times too.
Windows' Search Index isn't looking into .tmp afaik, so I tried to check which files keep a handle on the file.

Surprisingly it's python.exe and firefox.exe(!). This is while the tested build's firefox is open.

Why's that?
Thanks Elbart for the comment. It looks like this needs more investigation before taking action here.

For reference, the root cause here was the patch introduced in bug 1176699:

https://github.com/mozilla/mozregression/commit/c32bd8f3910da61db78f1383bc97b8fca526e9a9
See Also: → 1176699
Running mozreg with debug-logging doesn't show anything useful.

In the picture above python.exe has a handle because I made the screenshot while the tmp-file was still downloading. Here's a proper screenshot after the download finished.

After the background-download finishes, only the firefox.exe-process of the tested build has a handle on the file.
Attachment #8643022 - Attachment is obsolete: true
I'm using mozregression 0.40 with Python 2.7.10 in Windows 7 x64 and I get a very similar error often repeated many times:

move() failed for "(u'<mozilla zip>')". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
I can confirm, there is a really strange bug that is easily reproducible:

 - start a bisection using background download (the default)
 - wait while the browser is launched the end of the background downloads

When the background download will finish, you will see the error described here. If you close the browser before that (saying good or bad for example), then no error.

I can also confirm - as Elbart mentioned - that with a file locker viewer I can see that this is firefox (the tested process) that is holding a handle on that file (file created in a thread from mozregression).

I must say that this does not make sense to me. I tried to rename the temporary file, nothing changes. there is always a handle hold with this use case on windows.

Well I think I will just backout the patch from bug 1176699. This is a bug that happen when the mozregression python process is killed abruptly while downloading and using the --persist flag. This is quite an advanced use case - I would recommend in that case to just remove the file(s) that are broken manually - maybe we could write that in the doc.

Will, what do you think ?
Flags: needinfo?(wlachance)
In case extraction of a broken file fails, which was the cause for bug 117669, can't mozregression try to download a fresh copy of the build once to replace the broken file?

This wouldn't require the tmp-file and it would ensure to get a proper copy of the archive without user-intervention in the off-chance of the local copy being broken.
(In reply to Elbart from comment #15)
> In case extraction of a broken file fails, which was the cause for bug
> 117669, can't mozregression try to download a fresh copy of the build once
> to replace the broken file?
> 
> This wouldn't require the tmp-file and it would ensure to get a proper copy
> of the archive without user-intervention in the off-chance of the local copy
> being broken.

Ah, I tried one more thing, and I can finally avoid this error!

The fix is to use tempfile.NamedTemporaryFile for the temp file creation instead of our raw open(tempfile, 'wb'). Not sure what NamedTemporaryFile does to open the file - but this works, I don't see the background exception anymore.

This will imply that we won't have .tmp files anymore in the download folder (python will determine a temp name somewhere else), and once downloaded the file will be moved in our download dir (I don't think anybody cares about the .tmp, I say this just to be clear).

Working on a patch soon. Clearing Will NI as this is no more required.
Flags: needinfo?(wlachance)
The tempfile lib uses a windows-specific feature to mark the file as "should be deleted on close", from what I can tell from:
https://github.com/python/cpython/blob/2.7/Lib/tempfile.py#L470

Guess this is what is avoiding the issue?
(In reply to Ed Morley [:emorley] from comment #17)
> The tempfile lib uses a windows-specific feature to mark the file as "should
> be deleted on close", from what I can tell from:
> https://github.com/python/cpython/blob/2.7/Lib/tempfile.py#L470
> 
> Guess this is what is avoiding the issue?

Well no, because I explicitly pass "delete=False" when creating the temp file to be able to move it once downloaded (ie, closed). But there is definitely some code in that file that fix the issue on my laptop.
It would be nice if somebody could give this patch a try and report a feedback to confirm it also works for him.

Thanks!
Attachment #8638524 - Attachment is obsolete: true
Attachment #8645423 - Flags: review?(wlachance)
Attached image padlock.png
(In reply to Julien Pagès from comment #19)
> Created attachment 8645423 [details] [review]
> WindowsError: [Error 32] The process cannot access the ...
> 
> It would be nice if somebody could give this patch a try and report a
> feedback to confirm it also works for him.
> 
> Thanks!

Works here, but the files now have a padlock-icon. Permissions-issue due to the copying/moving?

PS:
Earlier I found two links which may be related to the handle-problem:
https://bugs.python.org/issue19575
http://stackoverflow.com/questions/948119/preventing-file-handle-inheritance-in-multiprocessing-lib
Hmm, I don't have any permission issues with or without the --persist flag here (this is on windows 7).
no persist = no padlock-icon
persist-folder on C = padlock-icon
persist-folder on other volume = no padlock-icon.
Looks like a windows permission issue to me (have you checked the persist folder permissions or something ?). I only have a C volume, and it works well for me.

We do not have specific permission handling in the code. Ultimately, moving the file calls:

https://docs.python.org/2/library/shutil.html#shutil.move

which according to the doc just rename the file or copy it while preserving file metadata.

I'm sorry but I am not a windows expert, I don't think I can help much here...
(In reply to Julien Pagès from comment #23)
> Looks like a windows permission issue to me (have you checked the persist
> folder permissions or something ?).

The only difference is your patch.
No icons with mozregression-master with the same commandlines in- and outside of mozilla-build.
So, Elbart thanks for digging the links in comment 20. It seems to me that this shows our real issue, that is (from what I understand):

 - we open two files in a thread for writing (these are our next builds)
 - then we start the firefox process. It will automatically inherit the opened handles (in that case, opened in write mode)
 - at the end of the download, we try to move the files. If the tested firefox is still running, we still have opened handles on the files and windows will say no to that operation. On python side shutil.move first try to rename, then silently try to copy the dest and remove the origin if that fail. This is what we see (copy is fine, but the OS won't allow to remove the origin as the handle is still open).


I think using stuff from tempfile is the easiest way to go here. Plus it is mentioned in the doc of mkstemp (https://docs.python.org/2/library/tempfile.html#tempfile.mkstemp):

"""
The file descriptor is not inherited by child processes.
"""

So this is exactly what we want. Note that NamedTemporaryFile uses mkstemp. If it was not this issue with permissions (that I can't reproduce), I would have fixed this bug using NamedTemporaryFile.


I looked around and found http://www.howtogeek.com/howto/17117/remove-the-lock-icon-from-a-folder-in-windows-7/.

Elbart, can you look at it ? if the padlock icon is just there to say that only your user can edit the file, this is fine by me! or maybe we could try to resolve that from python later with a follow up bug, but this is not really annoying I think. The important part to me is that the created file must have good permissions for the user (should be able to copy, remove, etc).
Comment on attachment 8645423 [details] [review]
WindowsError: [Error 32] The process cannot access the ...

Patch looks good to me, though I haven't tested it.
Attachment #8645423 - Flags: review?(wlachance) → review+
Elbart, are you OK if I land that patch and close the bug ? You can reopen a bug for the padlock icon after that, but it seems to me that we should fix this bug now.

I don't like that warning and the fact that in persistent mode currently on windows temp files are not removed - the patch here should fix both, and it seems better to me even if we have a padlock icon in some cases (we can address that in a follow up).
Flags: needinfo?(elbart)
(In reply to Julien Pagès from comment #25)
> I looked around and found
> http://www.howtogeek.com/howto/17117/remove-the-lock-icon-from-a-folder-in-
> windows-7/.
> 
> Elbart, can you look at it ? if the padlock icon is just there to say that
> only your user can edit the file, this is fine by me! or maybe we could try
> to resolve that from python later with a follow up bug, but this is not
> really annoying I think. The important part to me is that the created file
> must have good permissions for the user (should be able to copy, remove,
> etc).


(In reply to Julien Pagès from comment #27)
> Elbart, are you OK if I land that patch and close the bug ? You can reopen a
> bug for the padlock icon after that, but it seems to me that we should fix
> this bug now.
> 
> I don't like that warning and the fact that in persistent mode currently on
> windows temp files are not removed - the patch here should fix both, and it
> seems better to me even if we have a padlock icon in some cases (we can
> address that in a follow up).

Yeah, fine by me.

I don't think the padlock is a python-issue but just something with the permissions of the %temp%-folder (usually C:\Users\<username>\AppData\Local\Temp).

I created a new VM of Win7 SP1, installed only MozillaBuild 2.0, installed fresh mozregression via pip, then compiled your winerr32-version within the MozillaBuild-shell and retested with and without persist.
Again, without persist there's no padlock, as the files stay in the %temp% folder, while with persist the files are moved out of that folder and show the padlock-icon.

No idea why you cannot reproduce it.

I can open and access the files, so whatever the issue is with the permissions due to the move, they aren't influencing mozregression as far as I can see.
Flags: needinfo?(elbart)
Ok, thanks a lot for the investigation! Maybe there is something on my system that prevent me to see this padlock icon.

So one thing I can easily try is to define the temp dir in which this temp file is created - I can use the one where builds are moved for example. I'll try that on my box, just to be sure that does not break anything (ie, no Error 32) and will land that instead. Maybe it will fix the padlock icon issue ?

Anyway, I will release a new mozregression version after that. If you still see the padlock-icon bug in next release, please open a bug and I'll look at it.
(In reply to Julien Pagès from comment #29)
> Ok, thanks a lot for the investigation! Maybe there is something on my
> system that prevent me to see this padlock icon.

Probably not.
What's your persist-folder?
When I use one outside (e.g. C:\temp) of my user-folder (C:\Users\<username\*), I get the padlock. And I guess that's how it's supposed to be.

When the folder is on a different volume, the file isn't moved, which preserves permissions, but copied, which isn't preserving permissions, afaik.

Just for fun I did a test by using

>shutil.copy(temp.name, dest)
>mozfile.remove(temp.name)

instead of

>mozfile.move(temp.name, dest)

And it worked, no padlock.

But I guess that's just overcomplicating things for the fringe case of having the persist-folder somewhere it doesn't really belong.
Apparently I did all my tests with an old version of your winerr32-branch, with

>with tempfile.NamedTemporaryFile(delete=False) as temp:

instead of

>with tempfile.NamedTemporaryFile(
>        delete=False,
>        suffix='.tmp',
>        dir=os.path.dirname(dest)) as temp:

No more padlocks with the new version. :)
I just fixed that yesterday. Excellent, Thanks for giving it a try! So you were right, the folder where the temp file is created is important, and that was the cause for the padlocks. And using the same folder as the final file works good. Great!

Merged in https://github.com/mozilla/mozregression/commit/1f2c4962483f2f7f7a71ee0561d1ecc2d5d6930e.

Will release a new version soon (probably today) with that fix, and another good one from bug 1192485.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: