1489277 - mozrunner unit tests causing crashes in sh.exe during mozbuild regression test suite

(No longer employed by Mozilla) Aaron Klotz

Assignee

Description

•

6 years ago

See the failures on "S" and "BR" Windows builds: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=a21462dc142ad715926864ac5ccccadcdc2595f7 I added some additional information to the exception being thrown by the test suite, and it turns out that sh.exe's return code is 0xC0000005, which is the Windows exception code for access violations. I don't know what it is about the config/define changes made by launcher process by default, but something has created an enviornment that causes sh.exe to randomly crash!

Kim Moir [:kmoir] ET

Updated

•

6 years ago

Keywords: in-triage

Kim Moir [:kmoir] ET

Comment 1

•

6 years ago

Were you able to reproduce these issues on try?

Flags: needinfo?(aklotz)

Keywords: in-triage

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 2

•

6 years ago

Yes. e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=bc8edef5f0806caa2163479dce28393b3c9e8067&selectedJob=197642788

Flags: needinfo?(aklotz)

(No longer employed by Mozilla) Aaron Klotz

Assignee

Updated

•

6 years ago

Depends on: 1490843

(not currently active) Ted Mielczarek

Comment 3

•

6 years ago

My working theory was that something was running your new code during this test run somehow and it's breaking the environment in a way that eventually causes msys' sh.exe to crash. I was puzzling over this for a bit, since I didn't think we should be running Firefox itself in non-PGO builds, and your code doesn't seem to get used for xpcshell or anything like that that we *would* run, then I noticed this in your log: 17:04:20 INFO - ..\testing\mozbase\mozrunner\tests\test_interactive.py::test_run_interactive[firefox] PASSED Those mozrunner tests look like they actually do run Firefox: https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/test_interactive.py#10 There's code in the test fixtures to locate the binary from the objdir: https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/conftest.py#26 https://dxr.mozilla.org/mozilla-central/rev/2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/moztest/moztest/selftest/fixtures.py#87 Given that, I'd be interested to see if disabling those tests changes the output here.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 4

•

6 years ago

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3) > Those mozrunner tests look like they actually do run Firefox: > https://dxr.mozilla.org/mozilla-central/rev/ > 2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/ > test_interactive.py#10 Note that the `mb` python-test jobs are not dependent on the build job, and as such cannot run with the real Firefox binary. See also bug 1065583 for getting those tests enabled. > There's code in the test fixtures to locate the binary from the objdir: > https://dxr.mozilla.org/mozilla-central/rev/ > 2a59b432d2bd9b15ceec6b9435f60c785a820ef2/testing/mozbase/mozrunner/tests/ > conftest.py#26 Maybe this only works on Windows because the tests are part of the build job on that platform? Eventually the tests will also be moved out to `mb`, but until then it might be related.

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 5

•

6 years ago

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #3) > Given that, I'd be interested to see if disabling those tests changes the > output here. Holy cow, Ted, you were right! Setting skip-if = true on those tests cleared up the builds! https://treeherder.mozilla.org/#/jobs?repo=try&revision=d596c1ebaca76ec53808be13e00256d1eb9a58c2

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 6

•

6 years ago

Now my next question is, what do we do about this? Henrik, any suggestions?

Flags: needinfo?(hskupin)

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 7

•

6 years ago

Andrew recently refactored those tests so that we can get coverage for Chrome. I think it would be best if he has a look at this.

Flags: needinfo?(hskupin) → needinfo?(ahal)

(not currently active) Ted Mielczarek

Comment 8

•

6 years ago

aklotz: I think it'd also be good for you to test PGO builds with your patch on Try, since those are definitely going to run Firefox during the build. If we hit the same problem there then we'll have a larger problem. (I think we could disable these tests in the build tasks without it being a big issue.) Does making those tests run Firefox with your new flags (-wait-for-browser, -no-deelevate) change the behavior here at all? Related: will mozrunner want to use -wait-for-browser by default when running a new-enough Firefox?

Andrew Halberstadt [:ahal]

Comment 9

•

6 years ago

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #8) > I think we could disable these tests in the build tasks without it being a big issue.) Yeah, disabling the tests such that they are skipped in automation but not locally would be fine. As Henrik mentioned we want to pull that whole suite out of the build anyway (at which point there will be no Firefox binary for them to find). I'm not sure if it's possible to skip a test if it's running in CI or not via mozinfo/manifest.ini, if not we could use something like: @pytest.skip(os.environ.get('MOZ_AUTOMATION')) def test_... I imagine pytest has a way to skip entire files as well. > Does making those tests run Firefox with your new flags (-wait-for-browser, > -no-deelevate) change the behavior here at all? Related: will mozrunner want > to use -wait-for-browser by default when running a new-enough Firefox? If we can fix the issue by setting flags, that would be better than skipping the tests. Without knowing any specifics, -wait-for-browser definitely sounds like something mozrunner should be using.

Flags: needinfo?(ahal)

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 10

•

6 years ago

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #8) > aklotz: I think it'd also be good for you to test PGO builds with your patch > on Try, since those are definitely going to run Firefox during the build. If > we hit the same problem there then we'll have a larger problem. (I think we > could disable these tests in the build tasks without it being a big issue.) Yeah, I've got other issues on PGO courtesy of bug 1488627. But yeah, I'll keep an eye out. > > Does making those tests run Firefox with your new flags (-wait-for-browser, > -no-deelevate) change the behavior here at all? Related: will mozrunner want > to use -wait-for-browser by default when running a new-enough Firefox? (In reply to Andrew Halberstadt [:ahal] from comment #9) > If we can fix the issue by setting flags, that would be better than skipping > the tests. Without knowing any specifics, -wait-for-browser definitely > sounds like something mozrunner should be using. -no-deelevate is specific to Talos, but the other flag needs to be set, yes. I already included a patch in that series on the try build that addresses this: https://hg.mozilla.org/try/rev/8d217f130399acda7e7393dd91adc3666c2d219e, but the sh.exe crashes were still being triggered until I disabled these tests outright.

(No longer employed by Mozilla) Aaron Klotz

Assignee

Updated

•

6 years ago

Depends on: 1491489

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 11

•

6 years ago

Attached file Bug 1489277: Skip mozrunner tests if running under automation; r=ahal! — Details

Depends on D5921

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 12

•

6 years ago

Try build with this patch applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9cfcefe45af78b7dda94f56ff3af7bd3fe3bbafc

(not currently active) Ted Mielczarek

Comment 13

•

6 years ago

OK. I think working around this by disabling those tests is totally reasonable. We are using a known ancient msys version of sh.exe and I don't think it's worth anyone's time to actually try to figure out why it's crashing. It's entirely possible that it's only even happening in this way because we have a python.exe that launches Firefox and then the same process goes on to launch sh.exe (in the mozconfig loader in configure tests), which I don't think is a common scenario in the build.

Andrew Halberstadt [:ahal]

Comment 14

•

6 years ago

Comment on attachment 9009305 [details] Bug 1489277: Skip mozrunner tests if running under automation; r=ahal! Andrew Halberstadt [:ahal] has approved the revision.

Attachment #9009305 - Flags: review+

Andrew Halberstadt [:ahal]

Comment 15

•

6 years ago

Comment on attachment 9009305 [details] Bug 1489277: Skip mozrunner tests if running under automation; r=ahal! Andrew Halberstadt [:ahal] has requested changes to the revision.

Attachment #9009305 - Flags: review+

Andrew Halberstadt [:ahal]

Comment 16

•

6 years ago

Comment on attachment 9009305 [details] Bug 1489277: Skip mozrunner tests if running under automation; r=ahal! Andrew Halberstadt [:ahal] has approved the revision.

Attachment #9009305 - Flags: review+

(No longer employed by Mozilla) Aaron Klotz

Assignee

Updated

•

6 years ago

Assignee: nobody → aklotz

Status: NEW → ASSIGNED

Summary: Seemingly random crashes in sh.exe during mozbuild regression test suite → mozrunner unit tests causing crashes in sh.exe during mozbuild regression test suite

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 17

•

6 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/2cdf22687fa74d1f01f4888837e46c8ae2ad7b5e Bug 1489277: Skip mozrunner tests if running under automation; r=ahal!

Bogdan Tara[:bogdan_tara | bogdant]

Comment 18

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2cdf22687fa7

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox64: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

(No longer employed by Mozilla) Aaron Klotz

Assignee

Comment 19

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=10261f47cd07bac49cb3c1cc1e264f483b46d30e