Closed Bug 1316309 Opened 8 years ago Closed 8 years ago

Xpcshell WinXP debug logs contain several "test_end for <test> logged while test not in progress"

Tracking

(firefox53 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla53

Tracking Flags:

Tracking

Status

firefox53

---

fixed

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1316309 - Use mozcrash.kill_and_get_minidump() in xpcshell instead of automation.py, 8 years ago Andrew Halberstadt [:ahal] 58 bytes, text/x-review-board-request	ted : review+	Details

Andrew Halberstadt [:ahal]

Assignee

Description

•

8 years ago

This happens on other windows platforms as well, and infrequently on debug, but seems to happen most frequently on XP debug. Take any xpcshell log from that platform and grep for "test_end". For example:
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32-debug/1478640923/mozilla-central_xp_ix-debug_test-xpcshell-bm119-tests1-windows-build38.txt.gz

You'll notice these are logged by mozharness at the INFO level, even though mozlog is emitting them at the ERROR level:
https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/structuredlog.py#359

Paul's patch in bug 1261197 got backed out because it is (correctly) making those lines get logged as an error. As a workaround, we could change that to a WARNING, but it seems like there is a legitimately bad problem happening here that warrants a fix.

Andrew Halberstadt [:ahal]

Assignee

Comment 1

•

8 years ago

> and infrequently on debug

Should read "infrequently on opt"

Andrew Halberstadt [:ahal]

Assignee

Comment 2

•

8 years ago

So it seems roughly the following is happening for ~77 different tests in that log:

1) Test times out
2) We log a "test_end" with status=TIMEOUT
3) We attempt to kill the test process, but that fails
4) Test ends eventually, we try to log a second "test_end"

How none of this was turning the job orange is kind of mind boggling to me. There are several things failing to happen here that are combining to cause this issue:

1) Many tests on Win XP debug are exceeding the timeout threshold
2) Killing the test processes after timeout is broken (at least on Windows)
3) All throughout this process, nothing gets logged that is sufficient for mozharness to turn the job orange (this is fixed by bug 1261197 at least)

I'll make fixing this my priority.

Assignee: nobody → ahalberstadt

Status: NEW → ASSIGNED

Comment hidden (mozreview-request)

There are a number of things that were going wrong with XPCShell on windows, this patch address them:

1) We were erroring out in Automation.killAndGetStack(), both failing to produce a minidump and failing
to kill the process. This patch fixes both these issues by using mozcrash instead.

2) Occasionally we were occasionally raising a psutil.NoSuchProcess error when attempting to kill the
process after the test. This appears to be caused by a race condition, but is safe to ignore as if the
process doesn't exist, then we don't need to kill it.

3) Spurious "Unable to remove directory" errors in cleanup. This uses mozfile.remove to remedy this.

Review commit: https://reviewboard.mozilla.org/r/92426/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/92426/

Andrew Halberstadt [:ahal]

Assignee

Comment 4

•

8 years ago

The situation didn't up being as bad as it originally seemed in comment 2. These tests were being re-tried in the non-parallel sequence at the end of the test run and passing there. This patch is still important though as it makes sure we clean up properly and unblocks bug 1261197.

(not currently active) Ted Mielczarek

Comment 5

•

8 years ago

mozreview-review

Comment on attachment 8809954 [details]
Bug 1316309 - Use mozcrash.kill_and_get_minidump() in xpcshell instead of automation.py,

https://reviewboard.mozilla.org/r/92426/#review93528

Apparently I had a half-finished patch for this in bug 1193738 but I never quite got around to getting it into a landable state. Thanks for fixing this!

::: testing/xpcshell/runxpcshelltests.py
(Diff revision 1)
>      import psutil
>      HAVE_PSUTIL = True
>  except Exception:
>      HAVE_PSUTIL = False
>  
> -from automation import Automation

Hooray! If we got the Android harnesses off of remoteautomation we could actually remove automation.py...

Attachment #8809954 - Flags: review?(ted) → review+

(not currently active) Ted Mielczarek

Updated

•

8 years ago

Blocks: 1192307

Pulsebot

Comment 7

•

8 years ago

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1a9545f0b58e
Use mozcrash.kill_and_get_minidump() in xpcshell instead of automation.py, r=ted

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 8

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1a9545f0b58e

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox53: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla53

Andrew Halberstadt [:ahal]

Assignee

Updated

•

8 years ago

Depends on: 1319472

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Xpcshell WinXP debug logs contain several "test_end for <test> logged while test not in progress"

Categories

(Testing :: XPCShell Harness, defect)

Tracking

(firefox53 fixed)

People

(Reporter: ahal, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Updated

Comment 7

Comment 8

Updated

Attachment

General

Description

File Name

Content Type