Closed Bug 1628527 Opened 4 years ago Closed 4 years ago

Enable stack-fixing on the awsy-dmd Windows jobs

Categories

(Core :: DMD, task)

Type: task
Priority: Not set
Severity: normal

RESOLVED WONTFIX

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Attachments

(3 obsolete files)

Stack fixing is currently disabled on Windows for the awsy-dmd job. In bug 1626272 I attempted to get it working, but failed, and so just disabled stack fixing in order to get the job working again.

I will try again to enable stack fixing on that job in this bug.

Currently AWSY-with-DMD doesn't work on Windows. This is because fix-stacks
is initialized lazily, and by the time the initialization happens some file
descriptors for files are open, and that leads to some major Python2-on-Windows
sadness as described in the big comment in the commit.

To fix the problem, this commit adds an init function to fix_stacks.py so
that fix-stacks can be initialized eagerly, hopefully before any file
descriptors for files are open.

For dmd.py, other than fixing the AWSY problems, this has little effect,
because fix-stacks is always initialized.

For utils.py, which is used to process the output of most tests, this has a
more noticeable effect: the fix-stacks process is always spawned, rather than
being spawned only when needed. If no stack traces appear in the test output,
this means that fix-stacks is spawned unnecessarily. But it's cheap to spawn;
the expensive part only happens when stack traces start getting fixed. So I
think this change in behaviour is acceptable.

Furthermore, the commit adds a finish function to fix_stacks.py, so that
the fix-stacks process can be explicitly shut down. This has never been done
for processes spawned for any of the stack fixing scripts. It's never caused
problems on Linux/Mac, but it seems to be necessary on Windows to avoid
similar "this file is locked" problems with the test_dmd.js test.
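
An explicit shutdown along these lines might look like the sketch below. It is an assumption about the general shape only: `spawn_helper` and `finish` are hypothetical names, the child is a stand-in that copies stdin to stdout rather than the real fix-stacks, and the `timeout` argument to `wait` requires Python 3.

```python
import subprocess
import sys

# Stand-in child (not the real fix-stacks): copies stdin to stdout until EOF.
_CHILD_CMD = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line)\n"
    "    sys.stdout.flush()\n",
]


def spawn_helper():
    """Start the stand-in helper with pipes for stdin and stdout."""
    return subprocess.Popen(
        _CHILD_CMD,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )


def finish(proc, timeout=10):
    """Shut the helper down explicitly and return its exit code."""
    proc.stdin.close()  # EOF on stdin lets the child leave its read loop
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # fall back to a hard kill if the child does not exit
        proc.wait()
    proc.stdout.close()
    return proc.returncode
```

Waiting for the child to exit before the caller moves on means any handles the child held are released by the time later steps try to touch the same files.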

The commit also renames some things to more standard Python style, e.g.
json_mode instead of jsonMode.

Note that another Android hostutils update will be necessary for this to land, because the second commit modifies utils.py and fix_stacks.py in tandem, and the former is obtained from the repository while the latter is obtained from the hostutils tarball.

Blocks: 1596292

Bug 1628494 shows an interesting side-benefit of this bug: it will make some erroneous Taskcluster configurations more obvious.

In that bug the "OS X 10.14 debug test-macosx1014-64/debug-test-verify-e10s (TV)" job had an assertion failure, and the stack fixing failed because fix-stacks wasn't installed. This problem doesn't manifest on a green run of that job. With the eager spawning of fix-stacks introduced by this bug, it would.
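
To illustrate why eager spawning would surface this kind of misconfiguration, here is a small sketch. The function name `init` and the binary name in the usage note are hypothetical; the only behaviour relied on is that `subprocess.Popen` raises an `OSError` when the executable cannot be found.

```python
import subprocess


def init(binary):
    """Spawn `binary` immediately; a missing executable fails right here."""
    try:
        return subprocess.Popen(
            [binary],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
    except OSError as e:  # FileNotFoundError is a subclass on Python 3
        raise RuntimeError("could not start %r: %s" % (binary, e))
```

With a lazy scheme the equivalent error would only show up once the first stack trace needed fixing, i.e. only on a failing run. Calling something like `init("fix-stacks")` at startup would make every run fail on a machine where the tool is not installed.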

Blocks: 1629789

This needs an Android host utils update, because it updates fix_stacks.py (obtained from the host utils) and utils.py (obtained from the repository) in tandem.

aerickson, would you be able to generate an update? Here is a try push that looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fdcf02381d7b7dac0f86cdac3a4e81ad60418cc

Is that enough for you to go on? Let me know if you need anything else. Thanks.

Flags: needinfo?(aerickson)

:njn, we don't usually build hostutils from branches (normally we use m-c). This can't be landed to central until the hostutils has been created?

Flags: needinfo?(aerickson) → needinfo?(n.nethercote)

> This can't be landed to central until the hostutils has been created?

Unfortunately, no.

In bug 1623134 comment 28 gbrown said "Usually the host-utils update is straight-forward and the host-utils patch is short -- easy enough to land in the same push as whatever requires it." So I was hoping doing it in this bug would be ok.

Flags: needinfo?(n.nethercote)

OK, I've generated an x86_64 host-utils from the build you linked.

Mac host-utils can only be generated from m-c and we're unable to create 32-bit host-utils because the test archive isn't generated for it any longer.

https://phabricator.services.mozilla.com/D71250 contains the updated manifest file that points at the new host-utils. Do you want to patch my change into your change?

Thank you, aerickson. I tested your changes on try but unfortunately I'm getting a lot of oranges: https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba0d48534846199fecb68d6145cd8706bcad9f8

All the failures have "Failed wait for remote log: /sdcard/tests/reftest/reftest.log missing?" in common. Oh dear.

Hmm, I don't know what's going on there.

Geoff, do you have any idea?

Flags: needinfo?(gbrown)

There were failures like that on autoland recently; fixed by backout of bug 1607984:

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=android%2Creftest&tochange=ea574b47a33281f4b3c65e96b0f8f7211f8bb391&fromchange=d27ee5af31d88915e006fc90ec7d25fb29a78f10

Maybe the try push included that changeset?

"Failed wait for remote log:" is saying that the test log is not being created on the device. From the logcat, it looks like geckoview started but the reftests never ran. There are some errors in the logcat, but I'm not sure what they mean, or if they are normal.

Flags: needinfo?(gbrown)

Thanks for the suggestion, gbrown. I tried looking through autoland commits to see if I had pushed on top of a bad revision, but maybe I didn't look far enough back. I will rebase and try another push on Monday.

(Back when I used hg mq I had a script that took patches from a mozilla-inbound repo and applied them to mozilla-central before doing a try push, in order to avoid pushing on top of bustage. Maybe I should resurrect that and make it work with proper hg commits.)

Pushed by nnethercote@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8235287622a8
Install `fix-stacks` on the `build-win64-fuzzing/debug` job. r=erahm
https://hg.mozilla.org/integration/autoland/rev/d9dfb6439761
Introduce explicit initialization and finalization of `fix-stacks`. r=erahm,perftest-reviewers,sparky,gbrown

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77

Backout by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3f07f2826db1
Backed out 2 changesets for causing wpt failures and a spike in bug 1622119. CLOSED TREE

This was my second attempt at enabling stack-fixing on awsy-dmd Windows jobs, and my second failure. I give up.

Status: REOPENED → RESOLVED
Closed: 4 years ago → 4 years ago
Flags: needinfo?(n.nethercote)
Resolution: --- → WONTFIX
No longer blocks: 1629789
Attachment #9141135 - Attachment is obsolete: true
Attachment #9139362 - Attachment is obsolete: true
Attachment #9139363 - Attachment is obsolete: true