Closed Bug 1316309 Opened 3 years ago Closed 3 years ago

Xpcshell WinXP debug logs contain several "test_end for <test> logged while test not in progress"

Categories

(Testing :: XPCShell Harness, defect)

defect
Not set

Tracking

(firefox53 fixed)

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: ahal, Assigned: ahal)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(1 file)

This happens on other windows platforms as well, and infrequently on debug, but seems to happen most frequently on XP debug. Take any xpcshell log from that platform and grep for "test_end". For example:
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-debug/1478640923/mozilla-central_xp_ix-debug_test-xpcshell-bm119-tests1-windows-build38.txt.gz

You'll notice these are logged by mozharness at the INFO level, even though mozlog is emitting them at the ERROR level:
https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structuredlog.py#359

Paul's patch in bug 1261197 got backed out because it is (correctly) making those lines get logged as an error. As a workaround, we could change that to a WARNING, but it seems like there is a legitimately bad problem happening here that warrants a fix.
> and infrequently on debug

Should read "infrequently on opt"
So it seems roughly the following is happening for ~77 different tests in that log:

1) Test times out
2) We log a "test_end" with status=TIMEOUT
3) We attempt to kill the test process, but that fails
4) Test ends eventually, we try to log a second "test_end"

How none of this was turning the job orange is kind of mind boggling to me. There are several things failing to happen here that are combining to cause this issue:

1) Many tests on Win XP debug are exceeding the timeout threshold
2) Killing the test processes after timeout is broken (at least on Windows)
3) All throughout this process, nothing gets logged that is sufficient for mozharness to turn the job orange (this is fixed by bug 1261197 at least)

I'll make fixing this my priority.
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
The situation didn't up being as bad as it originally seemed in comment 2. These tests were being re-tried in the non-parallel sequence at the end of the test run and passing there. This patch is still important though as it makes sure we clean up properly and unblocks bug 1261197.
Comment on attachment 8809954 [details]
Bug 1316309 - Use mozcrash.kill_and_get_minidump() in xpcshell instead of automation.py,

https://reviewboard.mozilla.org/r/92426/#review93528

Apparently I had a half-finished patch for this in bug 1193738 but I never quite got around to getting it into a landable state. Thanks for fixing this!

::: testing/xpcshell/runxpcshelltests.py
(Diff revision 1)
>      import psutil
>      HAVE_PSUTIL = True
>  except Exception:
>      HAVE_PSUTIL = False
>  
> -from automation import Automation

Hooray! If we got the Android harnesses off of remoteautomation we could actually remove automation.py...
Attachment #8809954 - Flags: review?(ted) → review+
Duplicate of this bug: 1193738
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1a9545f0b58e
Use mozcrash.kill_and_get_minidump() in xpcshell instead of automation.py, r=ted
https://hg.mozilla.org/mozilla-central/rev/1a9545f0b58e
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Depends on: 1319472
You need to log in before you can comment on or make changes to this bug.