Closed Bug 1001948 Opened 6 years ago Closed 5 years ago

Intermittent test_hosted_launch_no_registry.xul,test_packaged_launch.xul,test_packaged_launch_no_registry.xul,test_packaged_update_from_webapp_runtime.xul | Cleanup function threw exception: Win error 32 during operation remove on file ...

Categories

(Firefox Graveyard :: Web Apps, defect, P1)

x86
Windows 8
defect

Tracking

(firefox31 wontfix, firefox32 fixed, firefox33 fixed, firefox-esr24 unaffected)

RESOLVED FIXED
Firefox 33
Tracking Status
firefox31 --- wontfix
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: RyanVM, Assigned: marco)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=38542591&tree=Mozilla-Inbound

WINNT 6.2 mozilla-inbound opt test mochitest-other on 2014-04-25 22:19:28 PDT for push 169eaed7017b
slave: t-w864-ix-064

22:33:30     INFO -  5960 INFO TEST-START | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul
22:33:30     INFO -  5961 INFO TEST-INFO | dumping last 6 message(s)
22:33:30     INFO -  5962 INFO TEST-INFO | if you need more context, please use SimpleTest.requestCompleteLog() in your test
22:33:30     INFO -  5963 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | NativeApp object created
22:33:30     INFO -  5964 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | Profile directory created
22:33:30     INFO -  5965 INFO TEST-INFO | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | Test installation
22:33:30     INFO -  5966 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | App launchable
22:33:30     INFO -  5967 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | webapprt executable is executable
22:33:30     INFO -  5968 INFO TEST-PASS | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | App was launched and is still running
22:33:30     INFO -  5969 INFO TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | Cleanup function threw exception: Win error 32 during operation remove on file C:\Users\cltbld.T-W864-IX-064\Desktop\test_desktop_packaged_launch.lnk (The process cannot access the file because it is being used by another process.
22:33:30     INFO -  )
22:33:30     INFO -  5970 INFO TEST-INFO | MEMORY STAT vsize after test: 1063559168
22:33:30     INFO -  5971 INFO TEST-INFO | MEMORY STAT vsizeMaxContiguous after test: 1893662720
22:33:30     INFO -  5972 INFO TEST-INFO | MEMORY STAT residentFast after test: 322101248
22:33:30     INFO -  5973 INFO TEST-INFO | MEMORY STAT heapAllocated after test: 106660584
22:33:30     INFO -  5974 INFO TEST-END | chrome://mochitests/content/chrome/toolkit/webapps/tests/test_packaged_launch.xul | finished in 706ms
https://tbpl.mozilla.org/php/getParsedLog.php?id=38545154&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=38545875&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=38558570&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=38560822&tree=Mozilla-Inbound
Summary: Intermittent test_packaged_launch.xul | Cleanup function threw exception: Win error 32 during operation remove on file test_desktop_packaged_launch.lnk (The process cannot access the file because it is being used by another process. → Intermittent test_packaged_launch.xul,test_packaged_launch_no_registry.xul | Cleanup function threw exception: Win error 32 during operation remove on file ...
I think we can ignore these exceptions during the conclusive cleanup, because we cleanup again during test startup.
https://tbpl.mozilla.org/php/getParsedLog.php?id=38653391&tree=Mozilla-Central
Summary: Intermittent test_packaged_launch.xul,test_packaged_launch_no_registry.xul | Cleanup function threw exception: Win error 32 during operation remove on file ... → Intermittent test_hosted_launch_no_registry.xul,test_packaged_launch.xul,test_packaged_launch_no_registry.xul | Cleanup function threw exception: Win error 32 during operation remove on file ...
Attached patch Patch (obsolete) — Splinter Review
As I said in my previous comment, I think we can ignore these exceptions.
Assignee: nobody → mar.castelluccio
Status: NEW → ASSIGNED
Attachment #8416587 - Flags: review?(myk)
Component: DOM: Apps → Web Apps
Product: Core → Firefox
Comment on attachment 8416587 [details] [diff] [review]
Patch

r?Yoric for the two constants added in dom/system/OSFileConstants.cpp
Attachment #8416587 - Flags: review?(dteller)
Comment on attachment 8416587 [details] [diff] [review]
Patch

Review of attachment 8416587 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Perhaps the cleanup function should be put in a head.js?
P.S.: Nice new and shiny e-mail address, Marco.

::: toolkit/webapps/tests/test_hosted.xul
@@ +147,2 @@
>        }
> +    } catch (e) {

Since you're modernizing this code, could you add a comment here explaining:
- that we catch all errors on purpose;
- why this can happen;
- why we do that.

::: toolkit/webapps/tests/test_hosted_launch.xul
@@ +105,2 @@
>        }
> +    } catch (e) {

Here, too.

@@ +119,5 @@
>  
>        yield OS.File.remove(desktopShortcut, { ignoreAbsent: true });
>        yield OS.File.remove(startMenuShortcut, { ignoreAbsent: true });
> +    } catch (ex if ex instanceof OS.File.Error &&
> +             (ex.winLastError == OS.Constants.Win.ERROR_SUCCESS ||

I don't think it can throw ERROR_SUCCESS. Do you have any reason to believe that it could?

::: toolkit/webapps/tests/test_hosted_launch_no_registry.xul
@@ +105,2 @@
>        }
> +    } catch (e) {

As above.

@@ +122,5 @@
> +    } catch (ex if ex instanceof OS.File.Error &&
> +             (ex.winLastError == OS.Constants.Win.ERROR_SUCCESS ||
> +              ex.winLastError == OS.Constants.Win.ERROR_SHARING_VIOLATION)) {
> +      // Ignore errors while removing these files at the end of the test run
> +      // because we will remove them at the startup of the next run.

As above.

::: toolkit/webapps/tests/test_packaged_launch.xul
@@ +115,2 @@
>        }
> +    } catch (e) {

As above

::: toolkit/webapps/tests/test_packaged_launch_no_registry.xul
@@ +115,2 @@
>        }
> +    } catch (e) {

As above.
Attachment #8416587 - Flags: review?(dteller) → feedback+
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on May 8th-12th) from comment #33)
> P.S.: Nice new and shiny e-mail address, Marco.

It's a leftover in my Windows hgrc file :D
But I'll take that address back in a couple of months for my internship ;)

> @@ +119,5 @@
> >  
> >        yield OS.File.remove(desktopShortcut, { ignoreAbsent: true });
> >        yield OS.File.remove(startMenuShortcut, { ignoreAbsent: true });
> > +    } catch (ex if ex instanceof OS.File.Error &&
> > +             (ex.winLastError == OS.Constants.Win.ERROR_SUCCESS ||
> 
> I don't think it can throw ERROR_SUCCESS. Do you have any reason to believe
> that it could?

The tests are failing sometimes with "Win error 0" and sometimes with "Win error 32".
Comment on attachment 8416587 [details] [diff] [review]
Patch

Review of attachment 8416587 [details] [diff] [review]:
-----------------------------------------------------------------

I'm ok with this, since the problem doesn't seem like an actual implementation failure.  I recall similar issues with Windows test file cleanup in the distant past, long before OS.File.  And OS.file.remove uses DeleteFile, which is known to have such issues.  See the responses to this question for some of its causes and workarounds:

http://stackoverflow.com/questions/1753209/deletefile-fails-on-recently-closed-file

Nevertheless, I'd like confirmation from a Windows expert, especially since the DeleteFile docs <http://msdn.microsoft.com/en-us/library/windows/desktop/aa363915%28v=vs.85%29.aspx> claim that it "marks a file for deletion on close" and "the file deletion does not occur until the last handle to the file is closed," so it's hard to see how a file access race condition would cause this problem.

Thus I'm requesting feedback from Tim!

On another note: although I appreciate modernized code, and I considered making Task.spawn -> Task.async changes myself, I eventually decided not to do it, because it clutters the history and obscures the origin (i.e. blame view) of the code being touched.  (It also makes a patch like this harder to review, because it surrounds the relevant changes with a bunch of unrelated ones.)  So I'd rather we not make such changes unless we're actually touching the code in question.
Attachment #8416587 - Flags: review?(myk)
Attachment #8416587 - Flags: review+
Attachment #8416587 - Flags: feedback?(tabraldes)
Comment on attachment 8416587 [details] [diff] [review]
Patch

Please consider making the code cleanup/modernization changes as a separate patch. As Myk mentioned, it's harder to review a patch that has a lot of unrelated changes in it, but it is valuable to keep our code up to date.

It sounds like we're planning to ignore file deletion failures during cleanup because 1) they occur intermittently with no apparent cause and 2) we will attempt to remove the files again on test startup. True deletion errors will still cause test failures because the subsequent test will run into the same deletion failure on startup, but transient deletion errors will not cause test failures because we ignore those errors. It is important to note that the last test that is run does not have the benefit of a subsequent test doing its cleanup for it. I'm OK with the current approach, but I think a better approach would be to make our cleanup code retry deletions for some specified amount of time (500ms?) and then fail instead of just ignore errors.

As for the `ERROR_SUCCESS` errors, it sounds like A or B is happening:
  A) `OS.File.remove` and/or `OS.File.removeDir` successfully removes the file/dir, but then throws an `OS.File.Error` with its `winLastError` property set to `ERROR_SUCCESS`
  B) `OS.File.remove` and/or `OS.File.removeDir` fails to remove the file/dir, and throws an `OS.File.Error` but fails to set the `winLastError` property correctly

We should file a bug so that someone can investigate why this is happening and fix our `OS.File` implementation.
Attachment #8416587 - Flags: feedback?(tabraldes) → feedback+
Attached patch PatchSplinter Review
I've removed the Task.spawn => Task.async transition, I'll refactor a bit the code in another bug (moving in head.js the common parts).

With this new patch we'll try to remove the files more than once (waiting 100 ms each time). The test will timeout if we can't remove the files. Looks like this approach works: https://tbpl.mozilla.org/?tree=Try&rev=13809e4f3d91
Attachment #8416587 - Attachment is obsolete: true
Attachment #8418139 - Flags: review?(tabraldes)
Attachment #8418139 - Flags: review?(dteller)
https://tbpl.mozilla.org/php/getParsedLog.php?id=39174468&tree=Fx-Team
Summary: Intermittent test_hosted_launch_no_registry.xul,test_packaged_launch.xul,test_packaged_launch_no_registry.xul | Cleanup function threw exception: Win error 32 during operation remove on file ... → Intermittent test_hosted_launch_no_registry.xul,test_packaged_launch.xul,test_packaged_launch_no_registry.xul,test_packaged_update_from_webapp_runtime.xul | Cleanup function threw exception: Win error 32 during operation remove on file ...
Comment on attachment 8418139 [details] [diff] [review]
Patch

Review of attachment 8418139 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, but have you tested what happens if the file is truly un-deletable? I'm hoping that the test eventually times out (in a reasonable amount of time). Maybe we should have a maximum number of retry attempts?
Attachment #8418139 - Flags: review?(tabraldes) → review+
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #54)
> Comment on attachment 8418139 [details] [diff] [review]
> Patch
> 
> Review of attachment 8418139 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks good, but have you tested what happens if the file is truly
> un-deletable? I'm hoping that the test eventually times out (in a reasonable
> amount of time). Maybe we should have a maximum number of retry attempts?

Yes, the default mochitest timeout applies here too (so, IIRC, 5 minutes).
Comment on attachment 8418139 [details] [diff] [review]
Patch

Review of attachment 8418139 [details] [diff] [review]:
-----------------------------------------------------------------

Also, can you file a bug for finding out what in OS.File can throw with ERROR_SUCCESS?
Attachment #8418139 - Flags: review?(dteller) → review+
Duplicate of this bug: 1007666
Depends on: 1007797
I'm not sure why this is still happening, given that we're supposed to eat these exceptions now...
Attached patch PatchSplinter Review
The problem is that we should catch exceptions also when we try to remove the profile directory, because the profile root directory basically coincides with the installation directory.
Attachment #8423399 - Flags: review?(tabraldes)
Attachment #8418139 - Flags: checkin+
Attachment #8423399 - Flags: review?(tabraldes) → review+
The last remaining failure is quite weird, looks like a OS.File problem.
The removeDir function should remove all the files contained in the directory, and then remove the directory itself using removeEmptyDir.
In bug 1007797 we removed the code to check for ERROR_SUCCESS because that bug was supposed to make it so we don't get ERROR_SUCCESS as a return value anymore. Unfortunately we still seem to be seeing "Win error 0 [...]" messages which indicates that the fix for bug 1007797 was incomplete. That explains a handful of the above tbpl logs. I'm not yet sure what the "Win error 145 [...]" messages are about.
(In reply to Tim Abraldes [:TimAbraldes] [:tabraldes] from comment #114)
> In bug 1007797 we removed the code to check for ERROR_SUCCESS because that
> bug was supposed to make it so we don't get ERROR_SUCCESS as a return value
> anymore. Unfortunately we still seem to be seeing "Win error 0 [...]"
> messages which indicates that the fix for bug 1007797 was incomplete. That
> explains a handful of the above tbpl logs. I'm not yet sure what the "Win
> error 145 [...]" messages are about.

My mistake; we're seeing the ERROR_SUCCESS failures because the fixes haven't landed in aurora yet. Still not sure about the "error 145" messages
145 means ERROR_DIR_NOT_EMPTY. So OS.File.remove failed to empty the directory content somehow.
(In reply to Masatoshi Kimura [:emk] from comment #116)
> 145 means ERROR_DIR_NOT_EMPTY. So OS.File.remove failed to empty the
> directory content somehow.

Should we file a bug for that? I imagine that ERROR_DIR_NOT_EMPTY is not an error that a consumer should expect from OS.File.remove
My bad. OS.File.removeDir. I guess ERROR_DIR_NOT_EMPTY is an expected error from OS.File.removeDir.
(In reply to Masatoshi Kimura [:emk] from comment #118)
> My bad. OS.File.removeDir. I guess ERROR_DIR_NOT_EMPTY is an expected error
> from OS.File.removeDir.

removeDir should recursively remove the directory (like nsIFile::remove does). So ERROR_DIR_NOT_EMPTY is unexpected.
(In reply to Marco Castelluccio [:marco] from comment #119)
> (In reply to Masatoshi Kimura [:emk] from comment #118)
> > My bad. OS.File.removeDir. I guess ERROR_DIR_NOT_EMPTY is an expected error
> > from OS.File.removeDir.
> 
> removeDir should recursively remove the directory (like nsIFile::remove
> does). So ERROR_DIR_NOT_EMPTY is unexpected.

removeDir failed to empty the directory because it could not delete a file in the directory for some reasons (permission denied, sharing violation, etc.). If the error code is unexpected, what error code should be returned in this case?
(In reply to Masatoshi Kimura [:emk] from comment #120)
> (In reply to Marco Castelluccio [:marco] from comment #119)
> > (In reply to Masatoshi Kimura [:emk] from comment #118)
> > > My bad. OS.File.removeDir. I guess ERROR_DIR_NOT_EMPTY is an expected error
> > > from OS.File.removeDir.
> > 
> > removeDir should recursively remove the directory (like nsIFile::remove
> > does). So ERROR_DIR_NOT_EMPTY is unexpected.
> 
> removeDir failed to empty the directory because it could not delete a file
> in the directory for some reasons (permission denied, sharing violation,
> etc.). If the error code is unexpected, what error code should be returned
> in this case?

I think it should either throw the first error that it runs into or a generic error. ERROR_DIR_NOT_EMPTY is a bit misleading in my opinion.
Priority: -- → P1
What's your opinion on comment 120 and 121?

If we want to keep the current behavior, I'll take for ERROR_DIR_NOT_EMTPY the same approach as the first patch.
Flags: needinfo?(dteller)
I agree that if ERROR_DIR_NOT_EMPTY is raised because we somehow skipped an error, we should actually throw the original error, which will be more useful. However, if we have no better error, we should probably keep throwing ERROR_DIR_NOT_EMPTY.
Flags: needinfo?(dteller)
So, even if we modify the behavior, there will still be cases where ERROR_DIR_NOT_EMPTY is thrown. So, let's try to remove the files again when this exception is thrown.
Attachment #8453489 - Flags: review?(tabraldes)
Attachment #8453489 - Flags: review?(tabraldes) → review+
https://hg.mozilla.org/mozilla-central/rev/225f2cba1982
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 33
https://hg.mozilla.org/releases/mozilla-aurora/rev/8e992de2080c

The first two patches in this bug could uplift to beta, but it doesn't seem worth the hassle.
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.