Closed Bug 1379906 Opened 4 years ago Closed 4 years ago

Assertion and crash during startup when running Marionette tests due do security.sandbox.content.level=3

Categories

(Testing :: Marionette, defect)

Version 3
defect
Not set
critical

Tracking

(firefox56- fixed)

RESOLVED DUPLICATE of bug 1380690
Tracking Status
firefox56 - fixed

People

(Reporter: tnguyen, Assigned: haik)

References

Details

(Keywords: crash, regression)

Attachments

(3 files)

I tried to run ./mach firefox-ui-functional on my Mac machine but I got an error

1:00.19 LOG: MainThread ERROR Failure during harness execution
Traceback (most recent call last):

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/harness/marionette_harness/runtests.py", line 92, in cli
    failed = harness_instance.run()

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/harness/marionette_harness/runtests.py", line 72, in run
    runner.run_tests(tests)

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/harness/marionette_harness/runner/base.py", line 842, in run_tests
    self.marionette = self.driverclass(**self._build_kwargs())

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/client/marionette_driver/marionette.py", line 615, in __init__
    self.raise_for_port(timeout=self.startup_timeout)

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/client/marionette_driver/decorators.py", line 28, in _
    m._handle_socket_failure()

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/client/marionette_driver/decorators.py", line 23, in _
    return func(*args, **kwargs)

  File "/Volumes/B2G/Projects/git-gecko-local/testing/marionette/client/marionette_driver/marionette.py", line 701, in raise_for_port
    self.host, self.port))

IOError: Content process crashed (Reason: Timed out waiting for connection on localhost:2828
And I don't see port 2828 being in use in my machine, seems that marionette failed to register localhost 2828
Please see the IOError as reported, which includes that there is a content crash. As it looks this is a startup crash of Firefox. With the minidump_stackwalk binary correctly setup it should analyze the crash report. Please double check.
Flags: needinfo?(tnguyen)
And maybe use a debug build and the following options to run the tests: "-vv --gecko-log -"
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #3)
> And maybe use a debug build and the following options to run the tests: "-vv
> --gecko-log -"

Ah, I see the crash too. I am using the July 06 build, let me checkout the latest and see.
Flags: needinfo?(tnguyen)
If it's not fixed and you are able to get the stack of the crash it would be kinda helpful to know. So we can move this bug into the correct component.
Attached file crash.log
Attached file gecko.log
It works fine with --disable-10s, only crash for e10s
Attachment #8885613 - Attachment mime type: text/x-log → text/plain
Attachment #8885614 - Attachment mime type: text/x-log → text/plain
(In reply to Thomas Nguyen[:tnguyen] ni plz from comment #7)
> Created attachment 8885614 [details]
> gecko.log
> 
> It works fine with --disable-10s, only crash for e10s

Thread 0 (crashed)
 0  libmozglue.dylib + 0x1b61
    rbx = 0x000000013550cc60   r12 = 0x000000012e1e9d40
    r13 = 0x4700cb538a831106   r14 = 0x000000013550b850
    r15 = 0x0000000000000000   rip = 0x000000012950db61
    rsp = 0x00007fff50df8c30   rbp = 0x00007fff50df8c40
    Found by: given as instruction pointer in context
 1  XUL + 0xcc6c4
    rip = 0x000000010f3556c5   rsp = 0x00007fff50df8c50
    rbp = 0x00007fff50df8c60
    Found by: stack scanning

The stack is not symbolicated so it's not helpful. But maybe try to run in a profile with only a single content process. Might be related to a recent change in Firefox. If that's from a local build please run 'mach buildsymbols', or run via gdb/lldb. Once we have better info we can move it to the correct component.
Flags: needinfo?(tnguyen)
That's what I got from build with mach buildsymbols. I could not run firefox-ui-functional with debugger (gbd/lldb) optional. Not sure minidump would be helpful, but let me try later if I can find some interesting thing.
Flags: needinfo?(tnguyen)
Oh, you could use 'MINIDUMP_SAVE_PATH' and the minidump stackwalk binary to process the minidump afterward. Btw. i just tried and I can see the same crash. I'm using an artifact build and as it looks like `buildsymbols` is not working for that kind of build.
I had a bit of time in parallel and started a regression test. The crash itself started with the following merge into mozilla-central:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e23f55311ecb&tochange=433c379d6e44

Further checks would have to be done here, but I assume it's something av decoder related because I see an assertion and lines like:

#09: av1_temporal_filter_apply_sse2[/Volumes/data/code/gecko/obj/debug/dist/NightlyDebug.app/Contents/MacOS/XUL +0x41cb2ac]
Regression range is:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ba8db0fbc006&tochange=6b101438c684

So it's bug 1332190 which caused this regression, and Firefox to crash during startup. Haik, can you please have a look? Just run the following command to reproduce:

> mach marionette-test -vv --gecko-log -
Blocks: 1332190
Severity: normal → critical
Flags: needinfo?(haftandilian)
Summary: Unable to run firefox-ui-functional test. It says Timed out waiting for connection on localhost:2828 → Assertion and crash during startup when running Marionette tests
For now I'm moving this bug to Marionette. But it might need a better home.
Component: Firefox UI Tests → Marionette
QA Contact: hskupin
Assignee: nobody → haftandilian
Flags: needinfo?(haftandilian)
I did a quick test and can that for these tests we're not setting MOZ_DEVELOPER_REPO_DIR in mach. So we need to fix mach to set this environment variable before starting firefox.

As a workaround, as long as your MOZCONFIG puts your object dir within your repo, manually setting MOZ_DEVELOPER_REPO_DIR should work for now:

  $ MOZ_DEVELOPER_REPO_DIR=/path/to/mozilla-central ./mach firefox-ui-functional
So this is only a problem for self-made builds of Firefox? I mean that I could also run the harness itself with such a build or any other build. So I wonder if the mach command is the right place to set this variable. It looks like that the base code of mach already set it, but only for the run command:

https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/mach_commands.py#1327

And test harnesses have to implement that on their own, like here for mochitest:

https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#1593

Which means we would have to add a new command line argument like `--topsrcdir` for Marionette, or better to mozrunner.
Flags: needinfo?(haftandilian)
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #15)
> So this is only a problem for self-made builds of Firefox?

It should only be a problem on self-made builds. With a "mach build package", the generated .app/ doesn't contain symlinks and we don't need to set MOZ_DEVELOPER_REPO_DIR. This was introduced in bug 1294641 and there's a bit more context there. We modified mach so that the var is set for mach run, mochitest, xpcshelltest, but we missed some.

And we're looking into ways to not depend on MOZ_DEVELOPER_REPO_DIR with bug 1380690 and bug 1380416.

> I mean that I
> could also run the harness itself with such a build or any other build. So I
> wonder if the mach command is the right place to set this variable. It looks
> like that the base code of mach already set it, but only for the run command:
> 
> https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/
> mach_commands.py#1327
> 
> And test harnesses have to implement that on their own, like here for
> mochitest:
> 
> https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.
> py#1593
> 
> Which means we would have to add a new command line argument like
> `--topsrcdir` for Marionette, or better to mozrunner.

I'm not familiar with Marionette testing, but do you mean that you sometimes run it without mach? 

I agree it would be better if we didn't have to set MOZ_DEVELOPER_REPO_DIR in many different places. With bug 1380132, we will also have MOZ_DEVELOPER_OBJ_DIR.
Flags: needinfo?(haftandilian)
Flags: needinfo?(hskupin)
(In reply to Haik Aftandilian [:haik] from comment #16)
> It should only be a problem on self-made builds. With a "mach build
> package", the generated .app/ doesn't contain symlinks and we don't need to
> set MOZ_DEVELOPER_REPO_DIR. This was introduced in bug 1294641 and there's a
>
> And we're looking into ways to not depend on MOZ_DEVELOPER_REPO_DIR with bug
> 1380690 and bug 1380416.

I see. Thank you for that information.

> I'm not familiar with Marionette testing, but do you mean that you sometimes
> run it without mach? 

We run it without mach in test archives, or when you install marionette via pypi. But given your above explanation it shouldn't cause any harm for those people and our automation. I for myself run marionette directly locally because with mach some output gets eaten and pdb doesn't seem to work very well. But for those cases the env variable could be manually set.

Haik, given that you already assigned yourself, can you fix this quickly? Not sure if the fix should go into the mach script for Marionette (testing/marionette/mach_commands.py) or as a general entry into the mach base script. Gregory, do you have a preferred method here?
Flags: needinfo?(hskupin) → needinfo?(gps)
(In reply to Henrik Skupin (:whimboo) [partly available 07/10 -07/14] from comment #17)
> Haik, given that you already assigned yourself, can you fix this quickly?

Yes, I'm working on it now.

> Not sure if the fix should go into the mach script for Marionette
> (testing/marionette/mach_commands.py) or as a general entry into the mach
> base script. Gregory, do you have a preferred method here?

For reference, see the fix for bug 1294641 and posted patches on bug 1380132.
I've posted a patch which allows "./mach firefox-ui-functional" and "./mach marionette-test" to work.

Alex posted about some good progress on bug 1380416. That should eliminate the need for these changes when its ready. I suspect there are other tests that will need similar changes until 1380416 can land.
Comment on attachment 8886703 [details]
Bug 1379906 - Assertion and crash during startup when running Marionette tests.

https://reviewboard.mozilla.org/r/157478/#review162716

Haik, I can perfectly review this. Looking at the code I would still prefer to have the changes to the environment variable in mozrunner, but given that this is shortlived I'm fine with the current solution. There is one thing I would like to see added. Otherwise it looks fine. Thanks.

::: testing/marionette/harness/marionette_harness/runner/base.py:523
(Diff revision 1)
>                   server_root=None, gecko_log=None, result_callbacks=None,
>                   prefs=None, test_tags=None,
>                   socket_timeout=BaseMarionetteArguments.socket_timeout_default,
>                   startup_timeout=None, addons=None, workspace=None,
> -                 verbose=0, e10s=True, emulator=False, headless=False, **kwargs):
> +                 verbose=0, e10s=True, emulator=False, headless=False,
> +                 topsrcdir=None, topobjdir=None, **kwargs):

Given that you touch this file, could you also please add the command line options, so that both arguments can be used when using the CLI of Marionette?
Attachment #8886703 - Flags: review+
(In reply to Henrik Skupin (:whimboo) from comment #21)
> Comment on attachment 8886703 [details]
> Bug 1379906 - Assertion and crash during startup when running Marionette
> tests.
> 
> https://reviewboard.mozilla.org/r/157478/#review162716
> 
> Haik, I can perfectly review this. Looking at the code I would still prefer
> to have the changes to the environment variable in mozrunner, but given that
> this is shortlived I'm fine with the current solution. There is one thing I
> would like to see added. Otherwise it looks fine. Thanks.

Thanks for reviewing. I'm going to put this on hold because I've got something working for bug 1380690 which would solve this problem and some other instances of the same bug. As well as the situation where we run Firefox without mach.
Attachment #8886703 - Flags: review?(gps)
Ok, I think we can survive one or two more days. Thanks.
Depends on: 1380690
Summary: Assertion and crash during startup when running Marionette tests → Assertion and crash during startup when running Marionette tests due do security.sandbox.content.level=3
Duplicate of this bug: 1382029
Duplicate of this bug: 1383214
Where are we with this? Do we have a fix or should we back out the original patch while a proper fix is worked on?
Flags: needinfo?(haftandilian)
(In reply to David Burns :automatedtester from comment #26)
> Where are we with this? Do we have a fix or should we back out the original
> patch while a proper fix is worked on?

I think the best path forward is to land Bug 1380690 which will address this.
Flags: needinfo?(haftandilian)
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(gps)
Resolution: --- → DUPLICATE
Duplicate of bug: 1380690
Mark 56 fixed as bug 1380690 was fixed in 56.
You need to log in before you can comment on or make changes to this bug.