Last Comment Bug 1285582 - Xvfb not running in TASKCLUSTER_INTERACTIVE mode
: Xvfb not running in TASKCLUSTER_INTERACTIVE mode
Status: RESOLVED FIXED
:
Product: Taskcluster
Classification: Other
Component: Task Configuration (show other bugs)
: unspecified
: Unspecified Unspecified
P1 normal
: mozilla50
Assigned To: Andrew Halberstadt [:ahal]
:
:
Mentors:
Depends on:
Blocks: 1262260
  Show dependency treegraph
 
Reported: 2016-07-08 09:11 PDT by Andrew Halberstadt [:ahal]
Modified: 2016-07-26 15:46 PDT (History)
3 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---

MozReview Requests
Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:
Show discarded requests

Attachments
~/artifacts/public/xvfb.log (2.87 KB, text/plain)
2016-07-08 13:21 PDT, Andrew Halberstadt [:ahal]
no flags Details
Interactive shell copy pasta (3.29 KB, text/plain)
2016-07-08 13:35 PDT, Andrew Halberstadt [:ahal]
no flags Details
Bug 1285582 - Make sure Xvfb is running for interactive tasks, (58 bytes, text/x-review-board-request)
2016-07-18 06:48 PDT, Andrew Halberstadt [:ahal]
dustin: review+
Details | Review

Description User image Andrew Halberstadt [:ahal] 2016-07-08 09:11:44 PDT
In test-linux.sh, we start Xvfb:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/scripts/tester/test-linux.sh#74

However, after starting an interactive shell, running ps -ef | grep Xvfb shows that it is not running. And to back this up tests that need to open Firefox fail with "Cannot open display :0". Running Xvfb in the background and then re-running tests causes them to work.

Here is the relevant log, I'm not sure how to interpret it (I kind of wish taskcluster didn't also dump the contents of the script):
> # run XVfb in the background, if necessary
> if $NEED_XVFB; then
> Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \
>  ~/artifacts/public/xvfb.log 2>&1 &
> export DISPLAY=:0
> xvfb_pid=$!
> # Only error code 255 matters, because it signifies that no
> # display could be opened. As long as we can open the display
> # tests should work. We'll retry a few times with a sleep before
> # failing.
> retry_count=0
> max_retries=2
> xvfb_test=0
> until [ $retry_count -gt $max_retries ]; do
> xvinfo || xvfb_test=$?
> if [ $xvfb_test != 255 ]; then
> retry_count=$(($max_retries + 1))
> else
> retry_count=$(($retry_count + 1))
> echo "Failed to start Xvfb, retry: $retry_count"
> sleep 2
> fi
> done
> if [ $xvfb_test == 255 ]; then fail "xvfb did not start properly"; fi
> fi
> + true
> + export DISPLAY=:0
> + DISPLAY=:0
> + xvfb_pid=34
> + retry_count=0
> + max_retries=2
> + xvfb_test=0
> + '[' 0 -gt 2 ']'
> + xvinfo
> + Xvfb :0 -nolisten tcp -screen 0 1600x1200x24
Comment 1 User image Dustin J. Mitchell [:dustin] 2016-07-08 13:13:44 PDT
Is this only in interactive mode?  I wonder if this is related to bug 1263815

There are some weird things around starting Xvfb, for sure.
Comment 2 User image Andrew Halberstadt [:ahal] 2016-07-08 13:21:46 PDT
Created attachment 8769334 [details]
~/artifacts/public/xvfb.log

How would I test if it's working in non-interactive mode? The test jobs are all green if that says anything..

Though from the log above, it looks like the Xvfb call is succeeding? I guess either it's falling over shortly after or else is returning an error other than 255? I'm not familiar with X display servers or how they work. I've attached the contents of ~/artifacts/public/xvfb.log
Comment 3 User image Dustin J. Mitchell [:dustin] 2016-07-08 13:28:18 PDT
Yeah, that's about what I saw in bug 1263815: Xvfb's log just stops, with no actual error message.  I suspect it's segfaulting or some other such joyfulness.
Comment 4 User image Andrew Halberstadt [:ahal] 2016-07-08 13:35:15 PDT
Created attachment 8769337 [details]
Interactive shell copy pasta

The weird(er) thing is that when I run Xvfb manually I see a very similar log, except everything magically works. Here's a copy pasta from an interactive shell.
Comment 5 User image Andrew Halberstadt [:ahal] 2016-07-08 13:36:42 PDT
(The output there is a bit munged because of Xvfb being run as a background process, but the important thing to note is that it doesn't fall over or anything)
Comment 6 User image Andrew Halberstadt [:ahal] 2016-07-08 13:38:46 PDT
Actually, the major difference between those two outputs is this line found in the first attachment but not the second:
_XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
Comment 7 User image Andrew Halberstadt [:ahal] 2016-07-11 07:52:30 PDT
I verified that when running in the context of an interactive shell, we are running as root. Whereas when running in the context of test-linux.sh, we have dropped down to the worker user due to:
https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-test/bin/test.sh#15

So perhaps Xvfb just needs to be run as root. But there are still questions I don't know the answer to:

1) How do we run this as root without running the rest of the script as root?
2) This used to work, what regressed it? That fallback to the worker user looks like it's been there for a long time, I think longer than this has been broken..

Finally, when running Xvfb from an interactive shell tests pass and the display session shows up in the interactive displays section. However when connecting to that session, I only see a black screen (when reftests are running I see the tests flicker in and out a bit, but still mostly just a black screen). I think this might be worth filing a separate bug for once the Xvfb issue is fixed.
Comment 8 User image Andrew Halberstadt [:ahal] 2016-07-11 07:53:44 PDT
Given comment 7, do you have any thoughts on how I should proceed?
Comment 9 User image Andrew Halberstadt [:ahal] 2016-07-11 08:05:25 PDT
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> Finally, when running Xvfb from an interactive shell tests pass and the
> display session shows up in the interactive displays section. However when
> connecting to that session, I only see a black screen (when reftests are
> running I see the tests flicker in and out a bit, but still mostly just a
> black screen). I think this might be worth filing a separate bug for once
> the Xvfb issue is fixed.

Nvm about this issue. It happens because x11vnc and XSession fail due to Xvfb not running.. starting those manually gets this working again. Though would still like feedback on how best to run Xvfb as root.
Comment 10 User image Dustin J. Mitchell [:dustin] 2016-07-11 08:57:14 PDT
That is surprising, and indeed the changes to run as 'worker' landed about 10 months ago (bug 1199379).

Probably the best way to run Xvfb as root would be to use `sudo` to launch it.  If that doesn't work for whatever reason (we had some obscure issues with pulseaudio, bug 1214809), then you may need to run Xvfb from testing/docker/desktop-test/bin/test.sh before dropping privileges.. which could be even more difficult :(
Comment 11 User image Andrew Halberstadt [:ahal] 2016-07-11 13:21:19 PDT
Is there a way to use sudo without hardcoding the password somewhere? Or is that not important.
Comment 12 User image Dustin J. Mitchell [:dustin] 2016-07-11 13:26:07 PDT
Yes, we could just add that command to /etc/sudoers for worker in the desktop-test image.

I could have sworn we had code in place to define sudoers already, but I don't see it, so it would need to be added.
Comment 13 User image Andrew Halberstadt [:ahal] 2016-07-12 08:14:02 PDT
I tried this patch:
https://hg.mozilla.org/try/rev/701e5956d8d6dc10dd07d54107fb2f8a465b6c63

But it still didn't work. All 'sudo' invocations output "sudo: unable to resolve host taskcluster-worker". Apparently that error means /etc/hosts isn't set up properly (why?). But I think this is a red herring anyway as the commands still proceed to work after that error is printed.

So to recap, I was able to run sudo Xvfb without requiring a password, but upon opening an interactive shell Xvfb is still not working. So maybe running as root wasn't the problem after all. I can see no other obvious errors in either stdout or xvfb.log. The only idea I have now is to improve logging in the test-linux.sh script and hope something jumps out.
Comment 14 User image Andrew Halberstadt [:ahal] 2016-07-14 09:14:48 PDT
So I'm dumb..

When using interactive mode, the process running test-linux.sh terminates almost immediately which means so does the Xvfb process it spawns. I was getting tripped up because for some reason I thought this used to work.. but I must have only been testing xpcshell which doesn't require Xvfb, so only thought it used to work.

I guess the solutions are:
1) Run under screen session
2) Start Xvfb in the taskcluster-interactive.sh script

I'm leaning towards 2) because I'm not sure what the implications are (if any) of leaving screen sessions lying around.
Comment 15 User image Dustin J. Mitchell [:dustin] 2016-07-18 06:36:09 PDT
Screen sessions won't hurt (Docker will kill them when cleaning up the container).  But I prefer #2 too.
Comment 16 User image Andrew Halberstadt [:ahal] 2016-07-18 06:42:54 PDT
So I actually changed my mind and ended up going with screen, because I couldn't think of a good way to share code between the two. Not that duplicating is that terrible in this case I guess. But since I have the patch, I'll upload so you can take a look. Feel free to r- and I'll just resubmit with the code duplicated in that case.
Comment 17 User image Andrew Halberstadt [:ahal] 2016-07-18 06:46:18 PDT
Another benefit of screen is that we don't need to check to see if Xvfb is already running (in the event that the developer starts multiple interactive shells).
Comment 18 User image Andrew Halberstadt [:ahal] 2016-07-18 06:48:24 PDT
Created attachment 8771974 [details]
Bug 1285582 - Make sure Xvfb is running for interactive tasks,

Normally we start Xvfb as a background task, then run the tests from the
same script. However, in interactive mode we were starting Xvfb, the script
would exit, and then we would potentially run tests later on from another
script. Unforunately this meant that Xvfb was dying with the first script
and tests would fail.

This patch runs Xvfb in a screen session so that it will still be available
later on when running an interactive shell.

Review commit: https://reviewboard.mozilla.org/r/64932/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/64932/
Comment 19 User image Andrew Halberstadt [:ahal] 2016-07-18 06:50:06 PDT
https://reviewboard.mozilla.org/r/64932/#review61896

::: taskcluster/scripts/tester/test-linux.sh:156
(Diff revision 1)
>    # arguments in from our own invocation
>    ${mozharness_bin};
> +  screen -XS xvfb quit
>  fi
> +
> +ps -ef | grep Xvfb

This is a leftover debug statement.. TODO: Remove
Comment 20 User image Dustin J. Mitchell [:dustin] 2016-07-19 14:14:40 PDT
https://reviewboard.mozilla.org/r/64932/#review62350

::: taskcluster/scripts/tester/test-linux.sh:78
(Diff revision 1)
> -# run XVfb in the background, if necessary
> +# run Xvfb in the background, if necessary
>  if $NEED_XVFB; then
> -    Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \
> -       > ~/artifacts/public/xvfb.log 2>&1 &
> +    screen -dmS xvfb Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \
> +       > ~/artifacts/public/xvfb.log 2>&1
>      export DISPLAY=:0
>      xvfb_pid=$!

Uhoh :(

This is going to get the wrong pid, which means that the cleanup command fails later.  I don't remember why we needed to clean things up (docker's pretty good at killing)

Also, this is going to conflict with jmaher's work on bug 1263815, where I just encouraged him to use the xvfb_pid to see if xvfb started successfully.

I could be convinced that killing xvfb in cleanup isn't necessary, if Joel agrees.

::: taskcluster/scripts/tester/test-linux.sh:153
(Diff revision 1)
>  # In interactive mode, the user will be prompted with options for what to do.
>  if [ "$TASKCLUSTER_INTERACTIVE" != "true" ]; then
>    # run the given mozharness script and configs, but pass the rest of the
>    # arguments in from our own invocation
>    ${mozharness_bin};
> +  screen -XS xvfb quit

This should probably be in cleanup() too?  If this will have the effect of killing xvfb (does it listen to SIGHUP?) then maybe the pid really isn't useful..
Comment 21 User image Andrew Halberstadt [:ahal] 2016-07-20 06:05:52 PDT
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> Uhoh :(
> 
> This is going to get the wrong pid, which means that the cleanup command
> fails later.  I don't remember why we needed to clean things up (docker's
> pretty good at killing)
> 
> Also, this is going to conflict with jmaher's work on bug 1263815, where I
> just encouraged him to use the xvfb_pid to see if xvfb started successfully.
> 
> I could be convinced that killing xvfb in cleanup isn't necessary, if Joel
> agrees.

Oh right! I'll remove the xvfb_pid stuff for this patch so it's easier to see, but the jury can still be out on whether we take it or not. Cc'ing jmaher so he is aware.

 
> This should probably be in cleanup() too?  If this will have the effect of
> killing xvfb (does it listen to SIGHUP?) then maybe the pid really isn't
> useful..

Sure I can move it to cleanup(), we'll just need a second TASKCLUSTER_INTERACTIVE check. I'm *pretty sure* this kills it. I'll do some more testing to make sure though.
Comment 22 User image Dustin J. Mitchell [:dustin] 2016-07-21 07:51:42 PDT
Comment on attachment 8771974 [details]
Bug 1285582 - Make sure Xvfb is running for interactive tasks,

https://reviewboard.mozilla.org/r/64932/#review62910
Comment 23 User image Andrew Halberstadt [:ahal] 2016-07-22 13:13:59 PDT
Comment on attachment 8771974 [details]
Bug 1285582 - Make sure Xvfb is running for interactive tasks,

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64932/diff/1-2/
Comment 24 User image Dustin J. Mitchell [:dustin] 2016-07-25 10:23:55 PDT
Comment on attachment 8771974 [details]
Bug 1285582 - Make sure Xvfb is running for interactive tasks,

https://reviewboard.mozilla.org/r/64932/#review63644
Comment 25 User image Joel Maher ( :jmaher) 2016-07-25 10:25:54 PDT
there will be a conflict as on mozilla-inbound I landed a rename to test-linux.sh.
Comment 26 User image Andrew Halberstadt [:ahal] 2016-07-26 06:36:47 PDT
Mercurial can auto-merge a file rename (was able to rebase my change on top of yours locally without manual intervention). So autoland should be fine in this case.
Comment 27 User image Pulsebot 2016-07-26 06:38:58 PDT
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/70b5b544b245
Make sure Xvfb is running for interactive tasks, r=dustin
Comment 28 User image Wes Kocher (:KWierso) 2016-07-26 15:46:22 PDT
https://hg.mozilla.org/mozilla-central/rev/70b5b544b245

Note You need to log in before you can comment on or make changes to this bug.