Closed Bug 757632 Opened 10 years ago Closed 9 years ago

Intermittent "command timed out: 1200 seconds without output" in test_0202_app_launch_apply_update_dirlocked.js, test_0202_app_launch_apply_update_dirlocked_svc.js

Categories

(Toolkit :: Application Update, defect)

x86
Windows XP
defect
Not set
major

Tracking

()

RESOLVED WORKSFORME
mozilla16

People

(Reporter: philor, Assigned: ehsan.akhgari)

References

Details

(Keywords: intermittent-failure, Whiteboard: [test which aborts the suite])

Attachments

(1 file)

I'm trying to reproduce this locally....
Assignee: nobody → ehsan
Duplicate of this bug: 758305
Note to self: the log in bug 758305 might be useful here.
<http://brasstacks.mozilla.com/orangefactor/?display=Bug&tree=all&startday=2012-04-27&endday=2012-05-25&bugid=757632> explains why I have not been able to reproduce this.  It only happened once on Windows 7.  Need to retest on XP.
Depends on: 758704
OK, I have had 0 luck reproducing this locally, or on the slave I own.  So I landed a debugging patch to try to get some insight into what's going on.

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d9daf859330
Whiteboard: [orange][test which aborts the suite] → [orange][test which aborts the suite][leave open]
Depends on: 758801
Hmm, this is bad.  I _think_ that the updater.exe process is crashing.  I've seen it twice once on my VM and once the test slave, but I didn't have debugging symbols on the former and Visual Studio on the latter to debug it.  :/
(In reply to Phil Ringnalda (:philor) from comment #97)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=12081041&tree=Mozilla-
> Inbound (still pre-debugging patch)

Just as a hint, if this is indeed caused by a crash, the debugging patch is not gonna add much value.  :/
(In reply to Ehsan Akhgari [:ehsan] from comment #94)
> OK, I have had 0 luck reproducing this locally, or on the slave I own.  So I
> landed a debugging patch to try to get some insight into what's going on.
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/7d9daf859330

https://hg.mozilla.org/mozilla-central/rev/7d9daf859330
Prettier fail (or perhaps a different fail, can't really know) from try:

https://tbpl.mozilla.org/php/getParsedLog.php?id=12126383&tree=Try

TEST-INFO | c:\talos-slave\test\build\xpcshell\tests\toolkit\mozapps\update\test_svc\unit\test_0202_app_launch_apply_update_dirlocked_svc.js | running test ...
TEST-UNEXPECTED-FAIL | c:\talos-slave\test\build\xpcshell\tests\toolkit\mozapps\update\test_svc\unit\test_0202_app_launch_apply_update_dirlocked_svc.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>
### XPCOM_MEM_LEAK_LOG defined -- logging leaks to c:\docume~1\cltbld\locals~1\temp\tmpq13us5\runxpcshelltests_leaks.log

TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [attemptServiceInstall : 1502] Starting installer process...
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [shouldRunServiceTest : 1425] Checking if the service exists on this machine.
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [shouldRunServiceTest : 1431] Service exists, return value: 0

TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [adjustPathsOnWindows : 252] Using this new bin directory: c:\docume~1\cltbld\locals~1\temp\tmpq13us5\ExecutableDir.tmp\bin

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [lockDirectory : 1279] true == true

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [lockDirectory : 1280] false == false

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [run_test : 189] true == true
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2776] setting the XRE_NO_WINDOWS_CRASH_DIALOG environment variable to 1... previously it didn't exist
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2823] removing the XPCOM_MEM_LEAK_LOG environment variable... previous value c:\docume~1\cltbld\locals~1\temp\tmpq13us5\runxpcshelltests_leaks.log
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2830] setting the XPCOM_DEBUG_BREAK environment variable to warn... previous value stack-and-abort
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2856] setting MOZ_NO_SERVICE_FALLBACK environment variable to 1
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2876] setting the XPCOM_MEM_LEAK_LOG environment variable back to c:\docume~1\cltbld\locals~1\temp\tmpq13us5\runxpcshelltests_leaks.log
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2882] setting the XPCOM_DEBUG_BREAK environment variable back to stack-and-abort
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2916] removing the XRE_NO_WINDOWS_CRASH_DIALOG environment variable
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2938] removing MOZ_NO_SERVICE_FALLBACK environment variable

TEST-INFO | (xpcshell/head.js) | test 2 finished

TEST-INFO | (xpcshell/head.js) | running event loop

TEST-UNEXPECTED-FAIL | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | pending-service == pending - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 440
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: _do_check_eq :: line 534
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_check_eq :: line 555
JS frame :: c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js :: finishTest :: line 313
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 121

TEST-INFO | (xpcshell/head.js) | exiting test

TEST-UNEXPECTED-FAIL | c:\talos-slave\test\build\xpcshell\head.js | exception thrown from do_timeout callback: 2147500036 - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 440
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 123

TEST-INFO | (xpcshell/head.js) | exiting test
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [end_test : 209] removing update test directory c:\docume~1\cltbld\locals~1\temp\tmpq13us5\ExecutableDir.tmp\bin\update_test
Unable to remove files / directories from directory
path: C:\Documents and Settings\cltbld\Local Settings\Application Data\Mozilla\Firefox\firefox\updates
Exception: [Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove]"  nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)"  location: "JS frame :: c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js :: cleanUpdatesDir :: line 1050"  data: no]
WARNING: nsExceptionService ignoring thread destruction after shutdown: file e:/builds/moz2_slave/try-w32-dbg/build/xpcom/base/nsExceptionService.cpp, line 166
WARNING: OOPDeinit() without successful OOPInit(): file e:/builds/moz2_slave/try-w32-dbg/build/toolkit/crashreporter/nsExceptionHandler.cpp, line 2005
nsStringStats
 => mAllocCount:           3423
 => mReallocCount:          299
 => mFreeCount:            3423
 => mShareCount:           9469
 => mAdoptCount:            138
 => mAdoptFreeCount:        138
<<<<<<<
Traceback (most recent call last):
  File "xpcshell/runxpcshelltests.py", line 915, in <module>
    main()
  File "xpcshell/runxpcshelltests.py", line 911, in main
    if not xpcsh.runTests(args[0], testdirs=args[1:], **options.__dict__):
  File "xpcshell/runxpcshelltests.py", line 795, in runTests
    self.removeDir(self.profileDir)
  File "xpcshell/runxpcshelltests.py", line 321, in removeDir
    shutil.rmtree(dirname)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 169, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 169, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 174, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "c:\mozilla-build\python25\Lib\shutil.py", line 172, in rmtree
    os.remove(fullname)
WindowsError: [Error 13] Access is denied: 'c:\\docume~1\\cltbld\\locals~1\\temp\\tmpq13us5\\ExecutableDir.tmp\\bin\\updated.update_in_progress.lock'
program finished with exit code 1
(In reply to Phil Ringnalda (:philor) from comment #155)
> Prettier fail (or perhaps a different fail, can't really know) from try:

Hmm, this is different, and doesn't make a whole lot of sense.  Let's see if it happens somewhere else...
https://tbpl.mozilla.org/php/getParsedLog.php?id=12159645&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=12159119&tree=Mozilla-Inbound

(NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove] on Windows makes a ton of sense to me, having been the cause of other insanely-frequent Windows failures.

By the way, for the week this is the top fail across all trees, more than double the next, though for mozilla-central it's not quite five times the next.
https://tbpl.mozilla.org/php/getParsedLog.php?id=12159302&tree=Mozilla-Inbound

TEST-UNEXPECTED-FAIL | c:\talos-slave\test\build\xpcshell\tests\toolkit\mozapps\update\test_svc\unit\test_0202_app_launch_apply_update_dirlocked_svc.js | test failed (with xpcshell return code: 0), see following log:
>>>>>>>

TEST-INFO | (xpcshell/head.js) | test 1 pending
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [attemptServiceInstall : 1502] Starting installer process...
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [shouldRunServiceTest : 1425] Checking if the service exists on this machine.
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [shouldRunServiceTest : 1431] Service exists, return value: 0

TEST-INFO | (xpcshell/head.js) | test 2 pending
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [adjustPathsOnWindows : 252] Using this new bin directory: c:\docume~1\cltbld\locals~1\temp\tmpbsrp_3\ExecutableDir.tmp\bin

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [lockDirectory : 1279] true == true

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [lockDirectory : 1280] false == false

TEST-PASS | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [run_test : 189] true == true
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2776] setting the XRE_NO_WINDOWS_CRASH_DIALOG environment variable to 1... previously it didn't exist
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2823] removing the XPCOM_MEM_LEAK_LOG environment variable... previous value c:\docume~1\cltbld\locals~1\temp\tmpbsrp_3\runxpcshelltests_leaks.log
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2830] setting the XPCOM_DEBUG_BREAK environment variable to warn... previous value stack-and-abort
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [setEnvironment : 2856] setting MOZ_NO_SERVICE_FALLBACK environment variable to 1
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2876] setting the XPCOM_MEM_LEAK_LOG environment variable back to c:\docume~1\cltbld\locals~1\temp\tmpbsrp_3\runxpcshelltests_leaks.log
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2882] setting the XPCOM_DEBUG_BREAK environment variable back to stack-and-abort
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2916] removing the XRE_NO_WINDOWS_CRASH_DIALOG environment variable
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js | [resetEnvironment : 2938] removing MOZ_NO_SERVICE_FALLBACK environment variable

TEST-INFO | (xpcshell/head.js) | test 2 finished

TEST-INFO | (xpcshell/head.js) | running event loop

TEST-UNEXPECTED-FAIL | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | pending-service == pending - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 440
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: _do_check_eq :: line 534
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_check_eq :: line 555
JS frame :: c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js :: finishTest :: line 313
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 121

TEST-INFO | (xpcshell/head.js) | exiting test

TEST-UNEXPECTED-FAIL | c:\talos-slave\test\build\xpcshell\head.js | exception thrown from do_timeout callback: 2147500036 - See following stack:
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: do_throw :: line 440
JS frame :: c:\talos-slave\test\build\xpcshell\head.js :: <TOP_LEVEL> :: line 123

TEST-INFO | (xpcshell/head.js) | exiting test
TEST-INFO | c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/test_0202_app_launch_apply_update_dirlocked_svc.js | [end_test : 209] removing update test directory c:\docume~1\cltbld\locals~1\temp\tmpbsrp_3\ExecutableDir.tmp\bin\update_test
Unable to remove files / directories from directory
path: C:\Documents and Settings\cltbld\Local Settings\Application Data\Mozilla\Firefox\firefox\updates
Exception: [Exception... "Component returned failure code: 0x80520015 (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove]"  nsresult: "0x80520015 (NS_ERROR_FILE_ACCESS_DENIED)"  location: "JS frame :: c:/talos-slave/test/build/xpcshell/tests/toolkit/mozapps/update/test_svc/unit/head_update.js :: cleanUpdatesDir :: line 1050"  data: no]
<<<<<<<
Traceback (most recent call last):
  File "xpcshell/runxpcshelltests.py", line 915, in <module>
    main()
  File "xpcshell/runxpcshelltests.py", line 911, in main
    if not xpcsh.runTests(args[0], testdirs=args[1:], **options.__dict__):
  File "xpcshell/runxpcshelltests.py", line 795, in runTests
    self.removeDir(self.profileDir)
  File "xpcshell/runxpcshelltests.py", line 321, in removeDir
    shutil.rmtree(dirname)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 169, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 169, in rmtree
    rmtree(fullname, ignore_errors, onerror)
  File "c:\mozilla-build\python25\Lib\shutil.py", line 174, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "c:\mozilla-build\python25\Lib\shutil.py", line 172, in rmtree
    os.remove(fullname)
WindowsError: [Error 13] Access is denied: 'c:\\docume~1\\cltbld\\locals~1\\temp\\tmpbsrp_3\\ExecutableDir.tmp\\bin\\updated.update_in_progress.lock'
program finished with exit code 1
elapsedTime=1622.234000
TinderboxPrint: xpcshell<br/><em class="testfail">T-FAIL</em>
Unknown Error: command finished with exit code: 1
(In reply to Phil Ringnalda (:philor) from comment #179)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=12159645&tree=Mozilla-
> Inbound
> https://tbpl.mozilla.org/php/getParsedLog.php?id=12159119&tree=Mozilla-
> Inbound
> 
> (NS_ERROR_FILE_ACCESS_DENIED) [nsIFile.remove] on Windows makes a ton of
> sense to me, having been the cause of other insanely-frequent Windows
> failures.

That is a red herring, the real failure happens before that in the log.

> By the way, for the week this is the top fail across all trees, more than
> double the next, though for mozilla-central it's not quite five times the
> next.

Yep, I'm aware of this, and I'm actively trying to reproduce it, but that has turned out to be a lot trickier than I expected.
https://tbpl.mozilla.org/php/getParsedLog.php?id=12167105&tree=Fx-Team

But we're learning stuff that will help you reproduce it, by having it fail everywhere all the time and thus essentially not running any other xpcshell tests on WinXP, right?
(In reply to Phil Ringnalda (:philor) from comment #187)
> But we're learning stuff that will help you reproduce it, by having it fail
> everywhere all the time and thus essentially not running any other xpcshell
> tests on WinXP, right?

I guess not... I disabled the test: https://hg.mozilla.org/integration/mozilla-inbound/rev/1ff862485759
Whiteboard: [orange][test which aborts the suite][leave open] → [orange][test which aborts the suite][test disabled][leave open]
I landed a followup to remove the test name because apparently comments don't work properly in xpcshell manifests...

https://hg.mozilla.org/integration/mozilla-inbound/rev/39b384528de3
Yet another push, because our xpcshell manifest format is retarded:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d88590f8a245
OK, seems like my patch in bug 762032 actually fixes this as well!  \o/
Depends on: 762032
No longer depends on: 758704
Whiteboard: [orange][test which aborts the suite][test disabled][leave open] → [orange][test which aborts the suite][test disabled]
Attachment #632685 - Flags: review?(robert.bugzilla)
Attachment #632685 - Flags: review?(robert.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/983b91e5aa17
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla16
Whiteboard: [orange][test which aborts the suite][test disabled] → [orange][test which aborts the suite]
https://tbpl.mozilla.org/php/getParsedLog.php?id=12668461&tree=Mozilla-Inbound

On the bright side, that's six pushes up from reenabling the test; on the not-so-bright side, it's only the third time it ran on WinXP in those six pushes.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Constant "command timed out: 1200 seconds without output" in test_0202_app_launch_apply_update_dirlocked.js → Intermittent "command timed out: 1200 seconds without output" in test_0202_app_launch_apply_update_dirlocked.js
(In reply to Phil Ringnalda (:philor) from comment #198)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=12668461&tree=Mozilla-
> Inbound
> 
> On the bright side, that's six pushes up from reenabling the test; on the
> not-so-bright side, it's only the third time it ran on WinXP in those six
> pushes.

Oh, hrm, I ran ~10 tests on XP on Oak on this and they were all green...  Let's keep this in the tree for a while to see how frequent the test failures are now.
test_0202_app_launch_apply_update_dirlocked.js
Oh, that's why I keep getting confused about some of these, I've been dumping both dirlocked.js and dirlocked_svc.js in here without noticing.

https://tbpl.mozilla.org/php/getParsedLog.php?id=14488949&tree=Mozilla-Aurora
Summary: Intermittent "command timed out: 1200 seconds without output" in test_0202_app_launch_apply_update_dirlocked.js → Intermittent "command timed out: 1200 seconds without output" in test_0202_app_launch_apply_update_dirlocked.js, test_0202_app_launch_apply_update_dirlocked_svc.js
Whiteboard: [orange][test which aborts the suite] → [test which aborts the suite]
It has been a couple of months since this has happened and it was happening several times per month. Resolving -> WFM. Please reopen if this happens again.
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → WORKSFORME
This looks like a timeout when removing the copied bin dir.
or a timeout waiting on 0202_aus_test_app.exe to no longer be in use
You need to log in before you can comment on or make changes to this bug.