Closed Bug 1489277 Opened 3 years ago Closed 3 years ago

mozrunner unit tests causing crashes in sh.exe during mozbuild regression test suite

Categories

(Firefox Build System :: General, defect)

Unspecified
Windows
defect
Not set
normal

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: bugzilla, Assigned: bugzilla)

References

Details

Attachments

(1 file)

See the failures on "S" and "BR" Windows builds:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a21462dc142ad715926864ac5ccccadcdc2595f7

I added some additional information to the exception being thrown by the test suite, and it turns out that sh.exe's return code is 0xC0000005, which is the Windows exception code for access violations.

I don't know what it is about the config/define changes made by launcher process by default, but something has created an enviornment that causes sh.exe to randomly crash!
Keywords: in-triage
Were you able to reproduce these issues on try?
Flags: needinfo?(aklotz)
Keywords: in-triage
Depends on: 1490843
My working theory was that something was running your new code during this test run somehow and it's breaking the environment in a way that eventually causes msys' sh.exe to crash. I was puzzling over this for a bit, since I didn't think we should be running Firefox itself in non-PGO builds, and your code doesn't seem to get used for xpcshell or anything like that that we *would* run, then I noticed this in your log:

17:04:20     INFO - ..\testing\mozbase\mozrunner\tests\test_interactive.py::test_run_interactive[firefox] PASSED

Those mozrunner tests look like they actually do run Firefox:
https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/test_interactive.py#10

There's code in the test fixtures to locate the binary from the objdir:
https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/conftest.py#26
https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/moztest/moztest/selftest/fixtures.py#87

Given that, I'd be interested to see if disabling those tests changes the output here.
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3)
> Those mozrunner tests look like they actually do run Firefox:
> https://dxr.mozilla.org/mozilla-central/rev/
> 2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/
> test_interactive.py#10

Note that the `mb` python-test jobs are not dependent on the build job, and as such cannot run with the real Firefox binary. See also bug 1065583 for getting those tests enabled.

> There's code in the test fixtures to locate the binary from the objdir:
> https://dxr.mozilla.org/mozilla-central/rev/
> 2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/
> conftest.py#26

Maybe this only works on Windows because the tests are part of the build job on that platform? Eventually the tests will also be moved out to `mb`, but until then it might be related.
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3)
> Given that, I'd be interested to see if disabling those tests changes the
> output here.

Holy cow, Ted, you were right! Setting skip-if = true on those tests cleared up the builds!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d596c1ebaca76ec53808be13e00256d1eb9a58c2
Now my next question is, what do we do about this? Henrik, any suggestions?
Flags: needinfo?(hskupin)
Andrew recently refactored those tests so that we can get coverage for Chrome. I think it would be best if he has a look at this.
Flags: needinfo?(hskupin) → needinfo?(ahal)
aklotz: I think it'd also be good for you to test PGO builds with your patch on Try, since those are definitely going to run Firefox during the build. If we hit the same problem there then we'll have a larger problem. (I think we could disable these tests in the build tasks without it being a big issue.)

Does making those tests run Firefox with your new flags (-wait-for-browser, -no-deelevate) change the behavior here at all? Related: will mozrunner want to use -wait-for-browser by default when running a new-enough Firefox?
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #8)
> I think we could disable these tests in the build tasks without it being a big issue.)

Yeah, disabling the tests such that they are skipped in automation but not locally would be fine. As Henrik mentioned we want to pull that whole suite out of the build anyway (at which point there will be no Firefox binary for them to find).

I'm not sure if it's possible to skip a test if it's running in CI or not via mozinfo/manifest.ini, if not we could use something like:

@pytest.skip(os.environ.get('MOZ_AUTOMATION'))
def test_...

I imagine pytest has a way to skip entire files as well.

> Does making those tests run Firefox with your new flags (-wait-for-browser,
> -no-deelevate) change the behavior here at all? Related: will mozrunner want
> to use -wait-for-browser by default when running a new-enough Firefox?

If we can fix the issue by setting flags, that would be better than skipping the tests. Without knowing any specifics, -wait-for-browser definitely sounds like something mozrunner should be using.
Flags: needinfo?(ahal)
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #8)
> aklotz: I think it'd also be good for you to test PGO builds with your patch
> on Try, since those are definitely going to run Firefox during the build. If
> we hit the same problem there then we'll have a larger problem. (I think we
> could disable these tests in the build tasks without it being a big issue.)

Yeah, I've got other issues on PGO courtesy of bug 1488627. But yeah, I'll keep an eye out.

> 
> Does making those tests run Firefox with your new flags (-wait-for-browser,
> -no-deelevate) change the behavior here at all? Related: will mozrunner want
> to use -wait-for-browser by default when running a new-enough Firefox?


(In reply to Andrew Halberstadt [:ahal] from comment #9)
> If we can fix the issue by setting flags, that would be better than skipping
> the tests. Without knowing any specifics, -wait-for-browser definitely
> sounds like something mozrunner should be using.

-no-deelevate is specific to Talos, but the other flag needs to be set, yes. I already included a patch in that series on the try build that addresses this: https://hg.mozilla.org/try/rev/8d217f130399acda7e7393dd91adc3666c2d219e, but the sh.exe crashes were still being triggered until I disabled these tests outright.
Depends on: 1491489
OK. I think working around this by disabling those tests is totally reasonable. We are using a known ancient msys version of sh.exe and I don't think it's worth anyone's time to actually try to figure out why it's crashing. It's entirely possible that it's only even happening in this way because we have a python.exe that launches Firefox and then the same process goes on to launch sh.exe (in the mozconfig loader in configure tests), which I don't think is a common scenario in the build.
Comment on attachment 9009305 [details]
Bug 1489277: Skip mozrunner tests if running under automation; r=ahal!

Andrew Halberstadt [:ahal] has approved the revision.
Attachment #9009305 - Flags: review+
Comment on attachment 9009305 [details]
Bug 1489277: Skip mozrunner tests if running under automation; r=ahal!

Andrew Halberstadt [:ahal] has requested changes to the revision.
Attachment #9009305 - Flags: review+
Comment on attachment 9009305 [details]
Bug 1489277: Skip mozrunner tests if running under automation; r=ahal!

Andrew Halberstadt [:ahal] has approved the revision.
Attachment #9009305 - Flags: review+
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Summary: Seemingly random crashes in sh.exe during mozbuild regression test suite → mozrunner unit tests causing crashes in sh.exe during mozbuild regression test suite
https://hg.mozilla.org/mozilla-central/rev/2cdf22687fa7
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.