Closed Bug 944684 Opened 11 years ago Closed 11 years ago

[mozfile] Be more robust in case of WindowsError thrown when trying to remove files

Categories

(Testing :: Mozbase, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

()

Details

(Whiteboard: [qa-automation-blocked])

Attachments

(2 files, 3 obsolete files)

As seen on bug 944366 we fail in removing an add-on because it is still in use. As of now we only catch IOError conditions, which was a suggestion from Jeff. But as it looks like at least on Windows we see another exception which is WindowsError:

04:16:45 WindowsError: [Error 5] Access is denied: u'c:\\jenkins\\workspace\\mozilla-aurora_addons\\data\\profile\\extensions\\{a6fd85ed-e919-4a43-a5af-8da18bda539f}.xpi'

Whatever it is caused by we should catch that and at least write out a warning to the console.

I have no idea if there could be other exceptions. So my question would be, which exceptions we really have to catch? The catch all I implemented first here, was not so welcome. So would like to reach out what's necessary here.
Flags: needinfo?(jhammel)
Flags: needinfo?(ahalberstadt)
I don't know about exceptions, but I think using mozfile.rmtree() instead of shutil.rmtree() should fix the windows problem.
Flags: needinfo?(ahalberstadt)
This will not work because it isn't a directory but a file. So we might want to introduce a new method to mozfile called `remove()` in synonym to `os.remove()`, or we extend `mozfile.rmtree()` to also be able to handle single files.

  File "/mozilla/code/mozbase/mozprofile/mozprofile/addons.py", line 371, in remove_addon
    mozfile.rmtree(path)
  File "/mozilla/code/mozbase/mozfile/mozfile/mozfile.py", line 168, in rmtree
    for name in os.listdir(dir):
OSError: [Errno 20] Not a directory: '/tmp/tmpR2NBYf.mozrunner/extensions/{a6fd85ed-e919-4a43-a5af-8da18bda539f}.xpi'

What do you think?
Flags: needinfo?(jhammel)
(In reply to Andrew Halberstadt [:ahal] from comment #1)
> I don't know about exceptions, but I think using mozfile.rmtree() instead of
> shutil.rmtree() should fix the windows problem.

Agreed. That said, there are numerous deficiencies in mozfile.rmtree that should be fixed
(In reply to Henrik Skupin (:whimboo) from comment #2)
> This will not work because it isn't a directory but a file. So we might want
> to introduce a new method to mozfile called `remove()` in synonym to
> `os.remove()`, or we extend `mozfile.rmtree()` to also be able to handle
> single files.
> 
>   File "/mozilla/code/mozbase/mozprofile/mozprofile/addons.py", line 371, in
> remove_addon
>     mozfile.rmtree(path)
>   File "/mozilla/code/mozbase/mozfile/mozfile/mozfile.py", line 168, in
> rmtree
>     for name in os.listdir(dir):
> OSError: [Errno 20] Not a directory:
> '/tmp/tmpR2NBYf.mozrunner/extensions/{a6fd85ed-e919-4a43-a5af-8da18bda539f}.
> xpi'
> 
> What do you think?

My vote, though it does not have API parity with the stdlib, is to have a function, `mozfile.remove()` that will remove a file or directory (i.e. recursively).  A legacy skeleton mozfile.rmtree could point to this for backwards compatability temporarily.
That sounds perfect. I will implement that then.
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
So before I'm starting to fix that I tried to reproduce this problem. It was kinda hard but finally I'm able to on a single specific machine. Not sure yet why it doesn't fail the same way on the other machines with the same setup. I will dive into and figure out why specifically we fail here in removing this addon.
Ok, so the underlying issue we are facing here is a timing related access problem. When the application gets shutdown it looks like that there is still a process or thread, which holds this add-on XPI file open for a couple more milliseconds before releasing it. 

Jeff, could this be an issue with mozprocess? I mean that we do not really wait until the application process is completely gone? I will try to dig deeper but I would appreciate your feedback here. As it looks like there are two distinct problems to fix.
Flags: needinfo?(jhammel)
I can reproduce this bug with mozrunner plain by running the following command line on this affected Windows 7 64bit machine: 

mozrunner --addon http://release.seleniumhq.org/selenium-ide/editor/2.3.0/selenium-ide-editor-2.3.0.xpi -b %path_to_aurora%
Jeff, when I add a sleep(0.1) after the call to self.process_handler.wait() all is fine:
https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/runner.py#L83

So I might really have to file a new bug for mozprocess here.
This is the log of procmon on that affected machine. It shows that Firefox has been closed the file about 1s earlier but somehow the lock has not been released yet when mozprofile tries to remove the add-on.

Jeff, as suggested here we can and most likely want to add the mozfile.remove() method to retry in those cases. But my question is if we should better also have a small sleep call in mozprocess to ensure that any other lock has been released.
Attached patch Patch v1 (obsolete) — Splinter Review
You might be scared when I change all of those instances now, but I think it is a good time to actually do it. There might be a lot of other cases where shutil.rmtree() or os.remove() can fail.
Attachment #8343768 - Flags: review?(jhammel)
I also put up a PR (https://github.com/mozilla/mozbase/pull/52) just to get the travis results, so I can be sure I don't break anything. And the last run of Travis was successful.
Looks like I still have to do the version bump together with this patch given the API change.
Attached patch Patch v1.1 (obsolete) — Splinter Review
Updated patch with version bumps and dependency updates regarding the API change in mozfile. I hope that I got all of them.
Attachment #8343768 - Attachment is obsolete: true
Attachment #8343768 - Flags: review?(jhammel)
Attachment #8343787 - Flags: review?(jhammel)
Comment on attachment 8343787 [details] [diff] [review]
Patch v1.1

Review of attachment 8343787 [details] [diff] [review]:
-----------------------------------------------------------------

Not sure if Jeff is around today. So also requesting review from Andrew.
Attachment #8343787 - Flags: review?(ahalberstadt)
Flags: needinfo?(jhammel)
Comment on attachment 8343787 [details] [diff] [review]
Patch v1.1

Please clear the old review if you want to change it, otherwise we might both accidentally review it :)
Attachment #8343787 - Flags: review?(jhammel)
Comment on attachment 8343787 [details] [diff] [review]
Patch v1.1

Review of attachment 8343787 [details] [diff] [review]:
-----------------------------------------------------------------

r- because you need to add a mozfile dependency to mozrunner. I also had a question about the Access denied case. But other than that looks good.

::: mozfile/mozfile/mozfile.py
@@ +130,2 @@
>  
> +    remove(dir)

Return this call. I know remove returns None at the moment, but might as well future proof it in case that changes.

@@ +151,5 @@
> +            try:
> +                func(path)
> +                break
> +            except WindowsError as e:
> +                # Error   5 == Access is denied

I don't understand why you added this. Don't we chmod the files before calling this method? Do we ever expect a timeout/retry to fix an Access denied issue?

::: mozprofile/tests/bug758250.py
@@ +16,5 @@
>      https://bugzilla.mozilla.org/show_bug.cgi?id=758250
>      """
>  
> +    def setUp(self):
> +        self.tmpdir = tempfile.mktemp()

mktemp is deprecated in favour of mkstemp or mkdtemp. Might as well fix it if you're in the area.

@@ +20,5 @@
> +        self.tmpdir = tempfile.mktemp()
> +
> +    def tearDown(self):
> +        # remove vestiges
> +        shutil.rmtree(self.tmpdir)

Not mozfile.remove()?

::: mozrunner/setup.py
@@ +15,4 @@
>          'mozinfo >= 0.7',
>          'mozlog >= 1.3',
>          'mozprocess >= 0.13',
> +        'mozprofile >= 0.18',

Technically I don't think it is necessary to bump these as their API's did not change.. but I guess it doesn't hurt either. But you need to add a dependency on mozfile >= 1.0 now.
Attachment #8343787 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8343787 [details] [diff] [review]
Patch v1.1

Review of attachment 8343787 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozfile/mozfile/mozfile.py
@@ +130,2 @@
>  
> +    remove(dir)

Sure. Will add it.

@@ +151,5 @@
> +            try:
> +                func(path)
> +                break
> +            except WindowsError as e:
> +                # Error   5 == Access is denied

If that also gets thrown for permission issues, we have multiple cases which we cannot easily differentiate. In my case the file had a lock handle set by Firefox and we tried to delete the file too early while Firefox still hasn't released the handle. 

Shall I add back the chmod() lines? The comment was kinda old and only talked about Windows NT. I searched the web but didn't found anything for more recent Windows versions. So I assumed shutil.rmtree() will do the right thing. Or is that not the case?

::: mozprofile/tests/bug758250.py
@@ +20,5 @@
> +        self.tmpdir = tempfile.mktemp()
> +
> +    def tearDown(self):
> +        # remove vestiges
> +        shutil.rmtree(self.tmpdir)

We can but I left it out for now in case our method is broken in some unknown way. Shall I do the change? I will happily do so.

::: mozrunner/setup.py
@@ +15,4 @@
>          'mozinfo >= 0.7',
>          'mozlog >= 1.3',
>          'mozprocess >= 0.13',
> +        'mozprofile >= 0.18',

Makes sense. I will add it.
Once i have updated everything I will trigger a tryserver run  on mozilla-central to have a better understanding of possible regression vectors caused by this patch.
(In reply to Henrik Skupin (:whimboo) from comment #18) 
> If that also gets thrown for permission issues, we have multiple cases which
> we cannot easily differentiate. In my case the file had a lock handle set by
> Firefox and we tried to delete the file too early while Firefox still hasn't
> released the handle. 
> 
> Shall I add back the chmod() lines? The comment was kinda old and only
> talked about Windows NT. I searched the web but didn't found anything for
> more recent Windows versions. So I assumed shutil.rmtree() will do the right
> thing. Or is that not the case?

No don't add it back since there's a comment already saying it's probably redundant. I didn't realize that was getting thrown for a lock. I'm not sure what normally gets thrown for permissions, but either way we should make sure that we have permissions beforehand (which I think we are doing).

> We can but I left it out for now in case our method is broken in some
> unknown way. Shall I do the change? I will happily do so.

No that's fine.

> ::: mozrunner/setup.py
> @@ +15,4 @@
> >          'mozinfo >= 0.7',
> >          'mozlog >= 1.3',
> >          'mozprocess >= 0.13',
> > +        'mozprofile >= 0.18',
> 
> Makes sense. I will add it.
Attached patch Patch v2 (obsolete) — Splinter Review
Updated patch which includes the review comments. I will push this now to try to check for further regressions.
Attachment #8343787 - Attachment is obsolete: true
Attachment #8344668 - Flags: review+
Actually I missed to add the new files which were added since the last merge of mozbase to m-c, so I got build errors.  I retriggered try via https://tbpl.mozilla.org/?tree=Try&rev=64a7010b30d7.
Attached patch Patch v2.1Splinter Review
Good that I run the tests on try. I have seen a failure across platforms in one of our tests, which I haven't run locally after making the change from mktemp() to mkdtemp():

https://tbpl.mozilla.org/php/getParsedLog.php?id=31694606&tree=Try#error0

Traceback (most recent call last):
  File "/builds/slave/try-l64_g-00000000000000000000/build/testing/mozbase/mozprofile/tests/bug758250.py", line 35, in test_profile_addon_cleanup
    shutil.copytree(empty, self.tmpdir)
  File "/tools/python27/lib/python2.7/shutil.py", line 175, in copytree
    os.makedirs(dst)
  File "/builds/slave/try-l64_g-00000000000000000000/build/obj-firefox/_virtualenv/lib/python2.7/os.py", line 157, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: '/tmp/tmpUEYQQw'

So while mktemp() is not creating the temporary directory it is done by mkdtemp(). This causes shutil.copytree() to fail because we do not delete the folder before. With this patch all is green now.
Attachment #8344668 - Attachment is obsolete: true
Attachment #8344844 - Flags: review+
Pushed to master as:
https://github.com/mozilla/mozbase/commit/8d7fd63824065728aec53544ade26f0511f6fa8b

I will create tags now and push all affected packages to pypi.
Summary: [mozprofile] Be more robust in case of WindowsError thrown when trying to remove an add-on → [mozfile] Be more robust in case of WindowsError thrown when trying to remove files
I created the appropriate tags and pushed all packages to pypi. So we are good here.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Blocks: 948120
Hmm, that last try run is completely busted! :p

Anyway, I'm seeing the same results here:
https://tbpl.mozilla.org/php/getParsedLog.php?id=31893804&tree=Try

Looks like there are two separate issues, the reftests and the windows builds. The non-windows builds might also be related to this patch.
Blocks: 949600
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I think something is weird in the test of mozinstall:

binary_exe = os.path.join(installdir_exe, 'firefox', 'firefox',
                                      'firefox.exe')

I will have a look at it when I'm back in about 1h.
Status: REOPENED → ASSIGNED
(In reply to Andrew Halberstadt [:ahal] from comment #27)
> Anyway, I'm seeing the same results here:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=31893804&tree=Try
> 
> Looks like there are two separate issues, the reftests and the windows
> builds. The non-windows builds might also be related to this patch.

The regression you are seeing here came in via a patch from yourself. I filed it as bug 950227.

This bug is fixed and didn't cause a regression so far.
No longer blocks: 949600
Status: ASSIGNED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Depends on: 950227
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: