Closed Bug 970846 Opened 10 years ago Closed 10 years ago

[mozfile] mozfile.remove() still fails with "Error 2: File not found" when trying to modify file permissions

Categories

(Testing :: Mozbase, defect)

All
Windows 8.1
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

Attachments

(2 files, 1 obsolete file)

When I run mozmill with the latest version of mozfile I got the following failure because mozprocess returned too early (before the application was shutdown) and tried to remove the profile exactly at the same time when Firefox did:

TEST-START | testPreferences_masterPassword\test3.js | testRemoveMasterPassword
TEST-PASS | testPreferences_masterPassword\test3.js | testRemoveMasterPassword
TEST-START | testPreferences_masterPassword\test3.js | teardownModule
TEST-END | testPreferences_masterPassword\test3.js | finished in 622ms
TEST-SKIPPED | test1.js | run-if: os == 'mac'
TEST-SKIPPED | test2.js | run-if: os == 'mac'
TEST-SKIPPED | test3.js | run-if: os == 'mac'
TEST-SKIPPED | test4.js | run-if: os == 'mac'
TEST-SKIPPED | test5.js | run-if: os == 'mac'
Retrying to remove "c:\users\mozauto\appdata\local\temp\tmpd2ryb_.mozrunner". Re                          ason: Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen P                          rozess verwendet wird (13)
TEST-UNEXPECTED-FAIL | Disconnect Error: Application unexpectedly closed
RESULTS | Passed: 28
RESULTS | Failed: 0
RESULTS | Skipped: 11
Traceback (most recent call last):
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozmill\__init__.py", li                          ne 798, in run
    mozmill.run(tests, self.options.restart)
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozmill\__init__.py", li                          ne 411, in run
    frame = self.run_test_file(frame or self.start_runner(),
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozmill\__init__.py", li                          ne 316, in start_runner
    self.runner.reset()
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozrunner\base.py", line                           155, in reset
    self.profile.reset()
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozprofile\profile.py",                           line 135, in reset
    self.cleanup()
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozprofile\profile.py",                           line 129, in cleanup
    mozfile.remove(self.profile)
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozfile\mozfile.py", lin                          e 199, in remove
    update_permissions(os.path.join(root, entry))
  File "c:\mozilla\mozmill-env\python\lib\site-packages\mozfile\mozfile.py", lin                          e 185, in update_permissions
    os.chmod(path, mode)
WindowsError: [Error 2] Das System kann die angegebene Datei nicht finden: 'c:\\                          users\\mozauto\\appdata\\local\\temp\\tmpd2ryb_.mozrunner\\places.sqlite-shm'

We should make the method more robust by replacing the if not exist check with a try / except.
Assignee: nobody → hskupin
Blocks: 970594
Status: NEW → ASSIGNED
Whiteboard: [mozmill-2.1+]
Summary: [mozfile] mozfile.remove() still fails with Error 2: File not found → [mozfile] mozfile.remove() still fails with "Error 2: File not found" when trying to modify file permissions
Attached patch Patch v1 (obsolete) — Splinter Review
Patch which should fix the problem with the unexpected exception. I will get it tested on a Windows system before asking for review. I have no idea how to test this given that it would need very precise timing with a thread.
Attached patch Patch v1.1Splinter Review
Actually we should also wrap the call to os.chmod into _call_with_windows_retry(). Only that way we can make sure to have at least a fallback, if the file we are trying to modify is still in use. Also I have updated the try/except code in the while loop so that we no longer report a wrong error if the file is not present anymore. This saves us another 0.5s.
Attachment #8373982 - Attachment is obsolete: true
Attachment #8374130 - Flags: review?(ahalberstadt)
Comment on attachment 8374130 [details] [diff] [review]
Patch v1.1

Review of attachment 8374130 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, makes sense. Did you manage to test it to see if it works?
Attachment #8374130 - Flags: review?(ahalberstadt) → review+
In my last patch i missed to change the path placeholder in the error message to args, so I did this now. Also I included the function name which actually get called. Otherwise you will also see 'Retrying to remove...' when doing an os.chmod(). Let me know which way you would like. Personally I prefer this one, even when it includes a bit more techie wording.

The output looks like:

remove() failed for ('c:\\users\\mozauto\\appdata\\local\\temp\\tmp0x7isk\\foo\\bar.txt',). Reason: The process cannot access the file because it is being used by another process (13). Retrying...
Attachment #8374286 - Flags: review?(ahalberstadt)
Attachment #8374286 - Flags: feedback?(wlachance)
Comment on attachment 8374286 [details] [diff] [review]
Patch v1.1 (alternative)

Review of attachment 8374286 [details] [diff] [review]:
-----------------------------------------------------------------

This one looks fine to me.
Attachment #8374286 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8374286 [details] [diff] [review]
Patch v1.1 (alternative)

Review of attachment 8374286 [details] [diff] [review]:
-----------------------------------------------------------------

So this looks ok, but I'm not convinced the unit test is actually testing the right thing. Did you run the test against an unmodified copy of mozfile to make sure it fails?

::: mozfile/tests/test_remove.py
@@ +23,4 @@
>      os.chmod(path, mode & ~stat.S_IWUSR & ~stat.S_IWGRP & ~stat.S_IWOTH)
>  
>  
> +class FileOpenCloseThread(threading.Thread):

I would like to see this launch a new program which opens the file, rather than a seperate thread. That more accurately simulates the error condition here.
Attachment #8374286 - Flags: feedback?(wlachance) → feedback-
(In reply to William Lachance (:wlach) from comment #6)
> So this looks ok, but I'm not convinced the unit test is actually testing
> the right thing. Did you run the test against an unmodified copy of mozfile
> to make sure it fails?

William, as I have mentioned earlier on this bug, I have no idea how to test this. The timing is so crucial that we cannot have another thread or process, which removes the file exactly at the right time. It's just a some milliseconds window. I even haven't seen this problem again on the affected surface tablet. So I extended the tests to make at least sure that our retry logic is working, and that we are not failing. 

> > +class FileOpenCloseThread(threading.Thread):
> 
> I would like to see this launch a new program which opens the file, rather
> than a seperate thread. That more accurately simulates the error condition
> here.

Why? I don't think it matters if a thread or another program opens the file. The thing is that it has to be opened to block remove() from actually deleting the file on Windows. And that's exactly what is happening here. Why do we have to over-complicate the mozfile tests and do that via an external program? As you told on IRC yesterday, we might already spend too much time on writing unit tests for mozbase. Here I don't see a reason for another program. If I should be wrong, please tell me why it would matter.

An option which might help is to use the Process class from the multiprocessing library: http://docs.python.org/dev/library/multiprocessing.html.
Flags: needinfo?(wlachance)
(In reply to Henrik Skupin (:whimboo) from comment #7)
> (In reply to William Lachance (:wlach) from comment #6)
> > So this looks ok, but I'm not convinced the unit test is actually testing
> > the right thing. Did you run the test against an unmodified copy of mozfile
> > to make sure it fails?
> 
> William, as I have mentioned earlier on this bug, I have no idea how to test
> this. The timing is so crucial that we cannot have another thread or
> process, which removes the file exactly at the right time. It's just a some
> milliseconds window. I even haven't seen this problem again on the affected
> surface tablet. So I extended the tests to make at least sure that our retry
> logic is working, and that we are not failing. 

Well, if the test doesn't fail if you remove the patch, then it's not really testing the case we care about (i.e. if we broke this functionality again we wouldn't notice). Under this circumstance, I would rather just not include the test. I haven't looked at the problem in great deal, but it wouldn't surprise me if some of these timing issues are near impossible to reproduce.

> > > +class FileOpenCloseThread(threading.Thread):
> > 
> > I would like to see this launch a new program which opens the file, rather
> > than a seperate thread. That more accurately simulates the error condition
> > here.
> 
> Why? I don't think it matters if a thread or another program opens the file.
> The thing is that it has to be opened to block remove() from actually
> deleting the file on Windows. And that's exactly what is happening here. Why
> do we have to over-complicate the mozfile tests and do that via an external
> program? As you told on IRC yesterday, we might already spend too much time
> on writing unit tests for mozbase. Here I don't see a reason for another
> program. If I should be wrong, please tell me why it would matter.
> 
> An option which might help is to use the Process class from the
> multiprocessing library:
> http://docs.python.org/dev/library/multiprocessing.html.

Well, I don't think it would be that much more complicated... we could indeed use multiprocessing here. But it's a bit of a moot point if we're not going to include the test, which IMO we shouldn't.
Flags: needinfo?(wlachance)
(In reply to William Lachance (:wlach) from comment #8)
> Well, if the test doesn't fail if you remove the patch, then it's not really
> testing the case we care about (i.e. if we broke this functionality again we
> wouldn't notice). Under this circumstance, I would rather just not include
> the test. I haven't looked at the problem in great deal, but it wouldn't
> surprise me if some of these timing issues are near impossible to reproduce.

The test here indeed does not test the chmod() case, but it does for remove() which hasn't gotten a test so far. It means we would test the retry feature in case the files to be deleted are locked by another application. So I would absolutely include the tests.

> Well, I don't think it would be that much more complicated... we could
> indeed use multiprocessing here. But it's a bit of a moot point if we're not
> going to include the test, which IMO we shouldn't.

Please see my above comment. So I can change those lines from Thread to Process and we should be fine here?
(In reply to Henrik Skupin (:whimboo) from comment #9)
> (In reply to William Lachance (:wlach) from comment #8)
> > Well, if the test doesn't fail if you remove the patch, then it's not really
> > testing the case we care about (i.e. if we broke this functionality again we
> > wouldn't notice). Under this circumstance, I would rather just not include
> > the test. I haven't looked at the problem in great deal, but it wouldn't
> > surprise me if some of these timing issues are near impossible to reproduce.
> 
> The test here indeed does not test the chmod() case, but it does for
> remove() which hasn't gotten a test so far. It means we would test the retry
> feature in case the files to be deleted are locked by another application.
> So I would absolutely include the tests.

Ok, well if it tests *some* case that we care about it is still useful. Could make sure the test fails if you disable the code that retries with remove?

> > Well, I don't think it would be that much more complicated... we could
> > indeed use multiprocessing here. But it's a bit of a moot point if we're not
> > going to include the test, which IMO we shouldn't.
> 
> Please see my above comment. So I can change those lines from Thread to
> Process and we should be fine here?

I would have to try to be absolute sure, but I think so.
(In reply to William Lachance (:wlach) from comment #10)
> Ok, well if it tests *some* case that we care about it is still useful.
> Could make sure the test fails if you disable the code that retries with
> remove?

You could simply replace the _call_with_windows_retry() calls with the inner methods like:

_call_with_windows_retry(os.remove, (path,)) => os.remove(path)

And you will see that we fail in removing the file. The above is one example. We have another one for directories which uses shutil.rmtree(path).

> I would have to try to be absolute sure, but I think so.

I will check that now, and have an updated patch up in a bit.
By using the multiprocessing.Process class I get failures like:

test_removing_open_file_with_retry (test_remove.MozfileRemoveTestCase)
Test removing a file in use with retry ... Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\mozilla\mozmill-env\python\Lib\multiprocessing\forking.py", line 374, in main
    self = load(from_parent)
  File "C:\mozilla\mozmill-env\python\Lib\pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "C:\mozilla\mozmill-env\python\Lib\pickle.py", line 858, in load
    dispatch[key](self)
  File "C:\mozilla\mozmill-env\python\Lib\pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "C:\mozilla\mozmill-env\python\Lib\pickle.py", line 1124, in find_class
    __import__(module)
ImportError: No module named test_remove

I don't think that it is worth diving into this problem and get it fixed, when Threading is perfectly doing the job here. I think we should scratch the process idea for now.
As discussed on irc, let's land this as-is for now, and plan to look into this in more detail later (re: windows locking semantics, etc.).
https://github.com/mozilla/mozbase/commit/b948e278fe02506337fa472c74879c017ba20db9

Keeping in-testsuite? so it stays in the requests list, and we can add a test later for this particular issue.
Blocks: 971796
No longer blocks: 970594
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Whiteboard: [mozmill-2.1+] → [mozmill-2.0.6+]
No longer blocks: 971796
Whiteboard: [mozmill-2.0.6+]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: