Closed Bug 909489 Opened 6 years ago Closed 6 years ago

Intermittent test_0202_app_launch_apply_update_dirlocked.js | Test timed out | test failed (with xpcshell return code: 1)

Categories

(Toolkit :: Application Update, defect)

x86
Windows XP
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla26
Tracking Status
firefox24 --- unaffected
firefox25 --- unaffected
firefox26 --- fixed

People

(Reporter: RyanVM, Assigned: rstrong)

References

Details

(Keywords: intermittent-failure)

Attachments

(5 files, 3 obsolete files)

5.39 KB, patch
bbondy
: review+
rstrong
: checkin+
Details | Diff | Splinter Review
6.23 KB, patch
rstrong
: review+
rstrong
: checkin+
Details | Diff | Splinter Review
1.15 KB, patch
bbondy
: review+
rstrong
: checkin+
Details | Diff | Splinter Review
33.67 KB, patch
rstrong
: review+
rstrong
: checkin+
Details | Diff | Splinter Review
29.83 KB, patch
rstrong
: review+
rstrong
: checkin+
Details | Diff | Splinter Review
https://tbpl.mozilla.org/php/getParsedLog.php?id=27016931&tree=Mozilla-Inbound

Windows XP 32-bit mozilla-inbound opt test xpcshell on 2013-08-26 11:58:50 PDT for push 69a2e380cfc2
slave: t-xp32-ix-043

12:19:31     INFO -  TEST-INFO | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | running test ...
12:24:31  WARNING -  TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | Test timed out
12:24:31     INFO -  Can't trigger Breakpad, just killing process
12:24:31  WARNING -  TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | test failed (with xpcshell return code: 1), see following log:
12:24:31     INFO -  >>>>>>>
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | test MAIN run_test pending (1)
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | test pending (2)
12:24:31     INFO -  TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [adjustPathsOnWindows : 3036] Using this new bin directory: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmptagwpm\ExecutableDir.tmp\bin
12:24:31     INFO -  TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [lockDirectory : 1411] testing the successful creation of the lock file
12:24:31     INFO -  TEST-PASS | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [lockDirectory : 1412] true == true
12:24:31     INFO -  TEST-PASS | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [lockDirectory : 1413] false == false
12:24:31     INFO -  TEST-PASS | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [run_test : 168] true == true
12:24:31     INFO -  TEST-PASS | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [checkUpdateApplied : 266] "pending" == "pending"
12:24:31     INFO -  TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [unlockDirectory : 1424] removing and testing the successful removal of the lock file
12:24:31     INFO -  TEST-PASS | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [unlockDirectory : 1426] false == false
12:24:31     INFO -  TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [removeCallbackCopy : 2508] attempting removal of file: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmptagwpm\ExecutableDir.tmp\bin\0202_aus_test_app.exe
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | test finished (2)
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (1)
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | exiting test
12:24:31     INFO -  TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [end_test : 200] removing update test directory c:\docume~1\cltbld~1.t-x\locals~1\temp\tmptagwpm\ExecutableDir.tmp\bin\update_test
12:24:31     INFO -  TEST-PASS | (xpcshell/head.js) | 5 (+ 0) check(s) passed
12:24:31     INFO -  TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
12:24:31     INFO -  <<<<<<<
The logs show the test run starting and five minutes later all other log entries are reported.

09:23:46     INFO -  TEST-INFO | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | running test ...
09:28:46  WARNING -  TEST-UNEXPECTED-FAIL | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | Test timed out

I'd like to add the time to the internal logging for the tests to verify what is going on with the times.
Since there is an effort to speed up xpcshell tests I included milliseconds.
Assignee: nobody → robert.bugzilla
Status: NEW → ASSIGNED
Attachment #796843 - Flags: review?(netzen)
Comment on attachment 796843 [details] [diff] [review]
additional logging

Review of attachment 796843 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/update/test/shared.js
@@ +599,5 @@
> +             (mm < 10 ? "0" + mm : mm) + ":" +
> +             (ss < 10 ? "0" + ss : ss) + ":" +
> +             (ms < 10 ? "00" + ms : ms < 100 ? "0" + ms : ms);
> +  dump(time + " | TEST-INFO | " + caller.filename + " | [" + caller.name +
> +       " : " + caller.lineNumber + "] " + aText + "\n");

nice :) I've wanted to add something similar in the past
Attachment #796843 - Flags: review?(netzen) → review+
06:55:12     INFO -  TEST-INFO | C:\slave\test\build\tests\xpcshell\tests\toolkit\mozapps\update\test\unit\test_0202_app_launch_apply_update_dirlocked.js | running test ...
...
07:00:12     INFO -  06:55:13:503 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [removeCallbackCopy : 2517] attempting removal of file: c:\docume~1\cltbld~1.t-x\locals~1\temp\tmptcfvp7\ExecutableDir.tmp\bin\0202_aus_test_app.exe

So, the reported time from the test harness for the messages is incorrect and the call to removeCallbackCopy is where it fails.
So, do_test_finished is called below from removeCallbackCopy well before the timeout so for some reason it isn't successfully finishing.
07:00:12     INFO -  06:55:13:504 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [end_test : 204] removing update test directory c:\docume~1\cltbld~1.t-x\locals~1\temp\tmptcfvp7\ExecutableDir.tmp\bin\update_test
This also adds a removeUpdater helper so the test doesn't call do_test_finish until after it is no longer in use. I was consistently unable to remove the updates directory locally and though this doesn't cause the test to fail it was preventing cleanup of the updates directory after the test had completed.
Attachment #797552 - Flags: review?(netzen)
The try run shows that the patch didn't fix it but I'd still to get this patch landed.
Interesting log... at a glance it looks like it shouldn't have timed out.

18:13:28     INFO -  18:08:28:900 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [removeUpdater : 2560] calling do_test_finished with a timeout to allow files that are in use to finish closing
18:13:28     INFO -  TEST-INFO | (xpcshell/head.js) | test MAIN run_test finished (2)
18:13:28     INFO -  TEST-INFO | (xpcshell/head.js) | running event loop
18:13:28     INFO -  TEST-INFO | (xpcshell/head.js) | test finished (1)
18:13:28     INFO -  TEST-INFO | (xpcshell/head.js) | exiting test
18:13:28     INFO -  18:08:29:018 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [APOW_cleanup : 3106] unregistering directory provider
18:13:28     INFO -  18:08:29:018 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [end_test : 202] start - test cleanup
18:13:28     INFO -  18:08:29:018 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [end_test : 206] removing update test directory c:\docume~1\cltbld~1.t-x\locals~1\temp\tmpkbfdfi\ExecutableDir.tmp\bin\update_test
18:13:28     INFO -  18:08:29:020 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [cleanUp : 1286] start - general test cleanup
18:13:28     INFO -  18:08:29:022 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/head_update.js | [cleanUp : 1317] finish - general test cleanup
18:13:28     INFO -  18:08:29:022 | TEST-INFO | C:/slave/test/build/tests/xpcshell/tests/toolkit/mozapps/update/test/unit/test_0202_app_launch_apply_update_dirlocked.js | [end_test : 235] finish - test cleanup
18:13:28     INFO -  TEST-PASS | (xpcshell/head.js) | 5 (+ 0) check(s) passed
18:13:28     INFO -  TEST-INFO | (xpcshell/head.js) | 0 check(s) todo
Comment on attachment 797552 [details] [diff] [review]
patch rev1 restore calling do_test_finish via do_timeout

Review of attachment 797552 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/mozapps/update/test/unit/head_update.js.in
@@ +1593,5 @@
> +      catch (e) {
> +        // It is not unusual for it to take several seconds for the updater
> +        // binary to no longer be in use so this failure is not logged since it
> +        // is trivial to just follow the output from the log log
> +        do_timeout(TEST_HELPER_TIMEOUT, removeUpdater);

Maybe we should log here?
Attachment #797552 - Flags: review?(netzen) → review+
Pushed to fx-team
https://hg.mozilla.org/integration/fx-team/rev/7fce54972453

This also won't fix this so please leave open
Attachment #797552 - Attachment is obsolete: true
Attachment #798051 - Flags: review+
Let's disable this test on Win XP for now so it stops annoying the sheriffs, etc. I can continue to diagnose it on try.
Attachment #798075 - Flags: review?(netzen)
Attachment #798075 - Flags: review?(netzen) → review+
Whiteboard: [leave open] → [test disabled on Windows XP][leave open]
I tried changing the order of the test and it appears it may have an affect though it doesn't fix it

first sequential test - 4 out of 19 failures
https://tbpl.mozilla.org/?tree=Try&rev=55bc31cb1d41

no change - 9 out of 14 failures
https://tbpl.mozilla.org/?tree=Try&rev=1a062fa3f254

last sequential test - 12 out of 14 failures
https://tbpl.mozilla.org/?tree=Try&rev=1a062fa3f254

Forced failure try run. I think the failed runs aren't seeing the updater in use and I want to verify this is the case
https://tbpl.mozilla.org/?tree=Try&rev=35aed63354c7

Try run after making changes to test_0202_app_launch_apply_update_dirlocked.js to make it more like test_0202_app_launch_apply_update_dirlocked_svc.js
https://tbpl.mozilla.org/?tree=Try&rev=e64d3929799c
I suspect that the test is deleting the updater prior to the test running the updater.

Ran xpcshell tests a bunch of times on try without it failing in this test.
https://tbpl.mozilla.org/?tree=Try&rev=e64d3929799c

Ran xpcshell tests a bunch of times on try with it intentionally failing at the end of this test to see if I could catch additional logged info. If it had timed out it would have failed with a time out.
https://tbpl.mozilla.org/?tree=Try&rev=151b78f2f91c

Example of it timing out before the intentional fail.
https://tbpl.mozilla.org/?tree=Try&rev=35aed63354c7

Another try run with the attached patch (same as other patches thrown at try with minor modifications)
https://tbpl.mozilla.org/?tree=Try&rev=47b9996c7550
Attachment #799922 - Flags: review?(netzen)
Also, the test prior to this patch was fairly consistently failing per the try links in comment #77
The last try run looks really good so far with 20 out of 20 passing
https://tbpl.mozilla.org/?tree=Try&rev=47b9996c7550

I just triggered an additional 20 tests
Wouldn't you know it... the first 39 passed and the 40th failed. Must have pushed my luck. ;)

removeUpdater typically is called several times and when it the test hits the timeout it is only called once.

Even hough it still fails I'd still like to get the latest patch reviewed and landed since the failures are much less frequent. After that I'll take another look at how this can be fixed.
Comment on attachment 799922 [details] [diff] [review]
patch - make test as similar as possible to test_0202_app_launch_apply_update_dirlocked_svc.js which passes

Review of attachment 799922 [details] [diff] [review]:
-----------------------------------------------------------------

It failed a couple more times on the try push after your last comment, but everything looks fine in the patch to me if you want to land it anyway.
Attachment #799922 - Flags: review?(netzen) → review+
Comment on attachment 796843 [details] [diff] [review]
additional logging

Setting checkin to denote this has landed on m-c
Attachment #796843 - Flags: checkin+
Comment on attachment 798051 [details] [diff] [review]
2. additional logging and cleanup

Setting checkin to denote this has landed on m-c
Attachment #798051 - Flags: checkin+
Comment on attachment 798075 [details] [diff] [review]
disabled test for now on Win XP

Setting checkin to denote this has landed on m-c
Attachment #798075 - Flags: checkin+
The main differences in the logs that I see

Test passes
../head_update.js | [removeCallbackCopy : 2535] attempting removal of the updater binary
../head_update.js | [removeUpdater : 2557] non-fatal error removing file after test finished (will try again). File: ../updater.exe etc.

followed by MANY more attempts to remove the updater

../head_update.js | [removeUpdater : 2566] calling do_test_finished

Test timed out
../head_update.js | [removeCallbackCopy : 2535] attempting removal of the updater binary
../head_update.js | [removeUpdater : 2566] calling do_test_finished
The main thing this patch does is uses the update-staged notification before continuing so the updater launches before the attempt to delete it which happens now that the timeouts are much shorter. It also limits the app files copied by the test to the minimum which speeds up the test quite a lot since instead of copying around 150 MB it is copying around 40 MB. The majority of the other changes are for consistency between tests.

try run looks good so far with the only failures being for tests outside of app update (crosses fingers that none of the additional retriggers fail for this test)
https://tbpl.mozilla.org/?tree=Try&rev=b0d6c3b7a140
Attachment #800599 - Flags: review?(netzen)
Comment on attachment 800599 [details] [diff] [review]
use update-staged notification and only copy files that are necessary

Review of attachment 800599 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

::: toolkit/mozapps/update/test/unit/head_update.js.in
@@ +522,5 @@
> +  let hasMore;
> +  let line = {};
> +  do {
> +    hasMore = istream.readLine(line);
> +    fileLeafNames.push(line.value); 

nit: trailing whitespace
Attachment #800599 - Flags: review?(netzen) → review+
fixed trailing whitespace throughout head_update.js.in. carrying forward r+
Attachment #800599 - Attachment is obsolete: true
Attachment #800810 - Flags: review+
Thanks for the review!

Pushed to fx-team
https://hg.mozilla.org/integration/fx-team/rev/163308fe32da
Flags: in-testsuite+
Whiteboard: [test disabled on Windows XP][leave open]
Target Milestone: --- → mozilla26
https://hg.mozilla.org/mozilla-central/rev/163308fe32da
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.