Closed
Bug 1285582
Opened 8 years ago
Closed 8 years ago
Xvfb not running in TASKCLUSTER_INTERACTIVE mode
Categories
(Firefox Build System :: Task Configuration, task, P1)
Firefox Build System
Task Configuration
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla50
People
(Reporter: ahal, Assigned: ahal)
References
(Blocks 1 open bug)
Details
Attachments
(3 files)
In test-linux.sh, we start Xvfb: https://dxr.mozilla.org/mozilla-central/source/taskcluster/scripts/tester/test-linux.sh#74 However, after starting an interactive shell, running ps -ef | grep Xvfb shows that it is not running. And to back this up tests that need to open Firefox fail with "Cannot open display :0". Running Xvfb in the background and then re-running tests causes them to work. Here is the relevant log, I'm not sure how to interpret it (I kind of wish taskcluster didn't also dump the contents of the script): > # run XVfb in the background, if necessary > if $NEED_XVFB; then > Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \ > ~/artifacts/public/xvfb.log 2>&1 & > export DISPLAY=:0 > xvfb_pid=$! > # Only error code 255 matters, because it signifies that no > # display could be opened. As long as we can open the display > # tests should work. We'll retry a few times with a sleep before > # failing. > retry_count=0 > max_retries=2 > xvfb_test=0 > until [ $retry_count -gt $max_retries ]; do > xvinfo || xvfb_test=$? > if [ $xvfb_test != 255 ]; then > retry_count=$(($max_retries + 1)) > else > retry_count=$(($retry_count + 1)) > echo "Failed to start Xvfb, retry: $retry_count" > sleep 2 > fi > done > if [ $xvfb_test == 255 ]; then fail "xvfb did not start properly"; fi > fi > + true > + export DISPLAY=:0 > + DISPLAY=:0 > + xvfb_pid=34 > + retry_count=0 > + max_retries=2 > + xvfb_test=0 > + '[' 0 -gt 2 ']' > + xvinfo > + Xvfb :0 -nolisten tcp -screen 0 1600x1200x24
Comment 1•8 years ago
|
||
Is this only in interactive mode? I wonder if this is related to bug 1263815 There are some weird things around starting Xvfb, for sure.
Assignee | ||
Comment 2•8 years ago
|
||
How would I test if it's working in non-interactive mode? The test jobs are all green if that says anything.. Though from the log above, it looks like the Xvfb call is succeeding? I guess either it's falling over shortly after or else is returning an error other than 255? I'm not familiar with X display servers or how they work. I've attached the contents of ~/artifacts/public/xvfb.log
Comment 3•8 years ago
|
||
Yeah, that's about what I saw in bug 1263815: Xvfb's log just stops, with no actual error message. I suspect it's segfaulting or some other such joyfulness.
Assignee | ||
Comment 4•8 years ago
|
||
The weird(er) thing is that when I run Xvfb manually I see a very similar log, except everything magically works. Here's a copy pasta from an interactive shell.
Assignee | ||
Comment 5•8 years ago
|
||
(The output there is a bit munged because of Xvfb being run as a background process, but the important thing to note is that it doesn't fall over or anything)
Assignee | ||
Comment 6•8 years ago
|
||
Actually, the major difference between those two outputs is this line found in the first attachment but not the second: _XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.
Assignee | ||
Comment 7•8 years ago
|
||
I verified that when running in the context of an interactive shell, we are running as root. Whereas when running in the context of test-linux.sh, we have dropped down to the worker user due to: https://dxr.mozilla.org/mozilla-central/source/testing/docker/desktop-test/bin/test.sh#15 So perhaps Xvfb just needs to be run as root. But there are still questions I don't know the answer to: 1) How do we run this as root without running the rest of the script as root? 2) This used to work, what regressed it? That fallback to the worker user looks like it's been there for a long time, I think longer than this has been broken.. Finally, when running Xvfb from an interactive shell tests pass and the display session shows up in the interactive displays section. However when connecting to that session, I only see a black screen (when reftests are running I see the tests flicker in and out a bit, but still mostly just a black screen). I think this might be worth filing a separate bug for once the Xvfb issue is fixed.
Assignee | ||
Comment 8•8 years ago
|
||
Given comment 7, do you have any thoughts on how I should proceed?
Flags: needinfo?(dustin)
Assignee | ||
Comment 9•8 years ago
|
||
(In reply to Andrew Halberstadt [:ahal] from comment #7) > Finally, when running Xvfb from an interactive shell tests pass and the > display session shows up in the interactive displays section. However when > connecting to that session, I only see a black screen (when reftests are > running I see the tests flicker in and out a bit, but still mostly just a > black screen). I think this might be worth filing a separate bug for once > the Xvfb issue is fixed. Nvm about this issue. It happens because x11vnc and XSession fail due to Xvfb not running.. starting those manually gets this working again. Though would still like feedback on how best to run Xvfb as root.
Comment 10•8 years ago
|
||
That is surprising, and indeed the changes to run as 'worker' landed about 10 months ago (bug 1199379). Probably the best way to run Xvfb as root would be to use `sudo` to launch it. If that doesn't work for whatever reason (we had some obscure issues with pulseaudio, bug 1214809), then you may need to run Xvfb from testing/docker/desktop-test/bin/test.sh before dropping privileges.. which could be even more difficult :(
Flags: needinfo?(dustin)
Assignee | ||
Comment 11•8 years ago
|
||
Is there a way to use sudo without hardcoding the password somewhere? Or is that not important.
Comment 12•8 years ago
|
||
Yes, we could just add that command to /etc/sudoers for worker in the desktop-test image. I could have sworn we had code in place to define sudoers already, but I don't see it, so it would need to be added.
Assignee | ||
Comment 13•8 years ago
|
||
I tried this patch: https://hg.mozilla.org/try/rev/701e5956d8d6dc10dd07d54107fb2f8a465b6c63 But it still didn't work. All 'sudo' invocations output "sudo: unable to resolve host taskcluster-worker". Apparently that error means /etc/hosts isn't set up properly (why?). But I think this is a red herring anyway as the commands still proceed to work after that error is printed. So to recap, I was able to run sudo Xvfb without requiring a password, but upon opening an interactive shell Xvfb is still not working. So maybe running as root wasn't the problem after all. I can see no other obvious errors in either stdout or xvfb.log. The only idea I have now is to improve logging in the test-linux.sh script and hope something jumps out.
Assignee | ||
Comment 14•8 years ago
|
||
So I'm dumb.. When using interactive mode, the process running test-linux.sh terminates almost immediately which means so does the Xvfb process it spawns. I was getting tripped up because for some reason I thought this used to work.. but I must have only been testing xpcshell which doesn't require Xvfb, so only thought it used to work. I guess the solutions are: 1) Run under screen session 2) Start Xvfb in the taskcluster-interactive.sh script I'm leaning towards 2) because I'm not sure what the implications are (if any) of leaving screen sessions lying around.
Comment 15•8 years ago
|
||
Screen sessions won't hurt (Docker will kill them when cleaning up the container). But I prefer #2 too.
Assignee | ||
Comment 16•8 years ago
|
||
So I actually changed my mind and ended up going with screen, because I couldn't think of a good way to share code between the two. Not that duplicating is that terrible in this case I guess. But since I have the patch, I'll upload so you can take a look. Feel free to r- and I'll just resubmit with the code duplicated in that case.
Assignee | ||
Comment 17•8 years ago
|
||
Another benefit of screen is that we don't need to check to see if Xvfb is already running (in the event that the developer starts multiple interactive shells).
Assignee | ||
Comment 18•8 years ago
|
||
Normally we start Xvfb as a background task, then run the tests from the same script. However, in interactive mode we were starting Xvfb, the script would exit, and then we would potentially run tests later on from another script. Unforunately this meant that Xvfb was dying with the first script and tests would fail. This patch runs Xvfb in a screen session so that it will still be available later on when running an interactive shell. Review commit: https://reviewboard.mozilla.org/r/64932/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/64932/
Attachment #8771974 -
Flags: review?(dustin)
Assignee | ||
Comment 19•8 years ago
|
||
https://reviewboard.mozilla.org/r/64932/#review61896 ::: taskcluster/scripts/tester/test-linux.sh:156 (Diff revision 1) > # arguments in from our own invocation > ${mozharness_bin}; > + screen -XS xvfb quit > fi > + > +ps -ef | grep Xvfb This is a leftover debug statement.. TODO: Remove
Updated•8 years ago
|
Assignee: nobody → ahalberstadt
Priority: -- → P1
Comment 20•8 years ago
|
||
https://reviewboard.mozilla.org/r/64932/#review62350 ::: taskcluster/scripts/tester/test-linux.sh:78 (Diff revision 1) > -# run XVfb in the background, if necessary > +# run Xvfb in the background, if necessary > if $NEED_XVFB; then > - Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \ > - > ~/artifacts/public/xvfb.log 2>&1 & > + screen -dmS xvfb Xvfb :0 -nolisten tcp -screen 0 1600x1200x24 \ > + > ~/artifacts/public/xvfb.log 2>&1 > export DISPLAY=:0 > xvfb_pid=$! Uhoh :( This is going to get the wrong pid, which means that the cleanup command fails later. I don't remember why we needed to clean things up (docker's pretty good at killing) Also, this is going to conflict with jmaher's work on bug 1263815, where I just encouraged him to use the xvfb_pid to see if xvfb started successfully. I could be convinced that killing xvfb in cleanup isn't necessary, if Joel agrees. ::: taskcluster/scripts/tester/test-linux.sh:153 (Diff revision 1) > # In interactive mode, the user will be prompted with options for what to do. > if [ "$TASKCLUSTER_INTERACTIVE" != "true" ]; then > # run the given mozharness script and configs, but pass the rest of the > # arguments in from our own invocation > ${mozharness_bin}; > + screen -XS xvfb quit This should probably be in cleanup() too? If this will have the effect of killing xvfb (does it listen to SIGHUP?) then maybe the pid really isn't useful..
Assignee | ||
Comment 21•8 years ago
|
||
(In reply to Dustin J. Mitchell [:dustin] from comment #20) > Uhoh :( > > This is going to get the wrong pid, which means that the cleanup command > fails later. I don't remember why we needed to clean things up (docker's > pretty good at killing) > > Also, this is going to conflict with jmaher's work on bug 1263815, where I > just encouraged him to use the xvfb_pid to see if xvfb started successfully. > > I could be convinced that killing xvfb in cleanup isn't necessary, if Joel > agrees. Oh right! I'll remove the xvfb_pid stuff for this patch so it's easier to see, but the jury can still be out on whether we take it or not. Cc'ing jmaher so he is aware. > This should probably be in cleanup() too? If this will have the effect of > killing xvfb (does it listen to SIGHUP?) then maybe the pid really isn't > useful.. Sure I can move it to cleanup(), we'll just need a second TASKCLUSTER_INTERACTIVE check. I'm *pretty sure* this kills it. I'll do some more testing to make sure though.
Comment 22•8 years ago
|
||
Comment on attachment 8771974 [details] Bug 1285582 - Make sure Xvfb is running for interactive tasks, https://reviewboard.mozilla.org/r/64932/#review62910
Attachment #8771974 -
Flags: review?(dustin) → review-
Assignee | ||
Comment 23•8 years ago
|
||
Comment on attachment 8771974 [details] Bug 1285582 - Make sure Xvfb is running for interactive tasks, Review request updated; see interdiff: https://reviewboard.mozilla.org/r/64932/diff/1-2/
Attachment #8771974 -
Flags: review- → review?(dustin)
Comment 24•8 years ago
|
||
Comment on attachment 8771974 [details] Bug 1285582 - Make sure Xvfb is running for interactive tasks, https://reviewboard.mozilla.org/r/64932/#review63644
Attachment #8771974 -
Flags: review?(dustin) → review+
Comment 25•8 years ago
|
||
there will be a conflict as on mozilla-inbound I landed a rename to test-linux.sh.
Assignee | ||
Comment 26•8 years ago
|
||
Mercurial can auto-merge a file rename (was able to rebase my change on top of yours locally without manual intervention). So autoland should be fine in this case.
Comment 27•8 years ago
|
||
Pushed by ahalberstadt@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/70b5b544b245 Make sure Xvfb is running for interactive tasks, r=dustin
Comment 28•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/70b5b544b245
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Updated•6 years ago
|
Product: TaskCluster → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•