Closed Bug 1021230 Opened 10 years ago Closed 10 years ago

Setup mozauto user and its environment for QA slave nodes

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

Attachments

(2 files, 1 obsolete file)

With bug 1020659 we have dedicated node definitions for QA. Now we have to set up our mozauto user and all the relevant services and settings for slave nodes.

What has to be done:
* Add user 'mozauto'
* Enable auto-login for mozauto
* Enable VNC (remote management on OS X)
* Set VNC screensize to 1024x768
* Disable noticeboard (OS X)
* Add smb://fs1.qa.scl3.mozilla.com as share
* Set user settings:
** Turn off screensaver
** Turn off screen locking
** Disable all energy power settings
** Mute sound output
** Disable system software updates

Regarding adding mozauto, Dustin gave the following information on bug 1020659:

Users::people adds people, while users::mozauto will add the role-based "mozauto" user.  They're unrelated.  You can probably just use users::builder, and set $builder_username = 'mozauto' in your config.  I'd recommend starting there, and if it turns out you need something more complex, creating users::mozauto later is always an option.
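For illustration, the suggestion above could look roughly like the following. This is only a sketch: the layout of the config class (a `config` class inheriting from `config::base`) is an assumption, and only `$builder_username` and `users::builder` are taken from the comment itself.

```puppet
# Sketch only: the surrounding class layout is an assumption, not copied
# from the QA configuration.
class config inherits config::base {
    # Name of the role account that users::builder should create
    $builder_username = 'mozauto'
}

class toplevel::slave::qa inherits toplevel::slave {
    # Creates the role-based account configured above
    include users::builder
}
```

If something more complex turns out to be needed, a dedicated users::mozauto class would replace the include.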

Dustin, would anything of that not be possible out of the box?
Nope!  Much of it is already done, and some of the rest would probably be acceptable for releng.  The rest can either be parametrized (e.g., screen size) or done in a different class (the mounted share).
Ok, so adding the mozauto user is absolutely not a problem. What I wonder is how to set up X11 on Ubuntu. I tried the gui class with on_gpu set to true, but that wants to install nvidia drivers, which are not supported by the graphics card we get via vSphere. So can someone please let me know what to do here?

That's what I have at the moment:

class toplevel::slave::qa inherits toplevel::slave {
    include users::builder
    include vnc

    class {
        gui:
            on_gpu => true;
    }
}
When rebooting the VM I see the following syslog entries:

Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Started (4000)
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 112 has read and write permissions for those files.
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Shutdown (4000)

That makes sense given that we do not have an nvidia GPU. So why do we currently force the installation of the nvidia package if on_gpu is set to true? Shouldn't we have an extra parameter for the type of graphics card?
Flags: needinfo?(bugspam.Callek)
Last Friday I talked with Justin on IRC about that problem, and it looks like all the existing machines have an Nvidia GPU. Because of that we always try to install those packages at the moment. It looks like we have to add a new parameter to the gui module that allows setting specific GPU types.

Dustin, I also wonder about the user distribution across slaves. Will those machines only have the root account enabled, but not all the other QA admin users?
Flags: needinfo?(bugspam.Callek)
Isn't on_gpu => false what you want, since these are VMs which don't have a GPU?

There's room to add support for other, non-nvidia, graphics cards, but I don't think you have those :)

And yes, they only end up with the root account and whatever automation account you create.
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> Isn't on_gpu => false what you want, since these are VMs which don't have a
> GPU?

Oh, I thought this was about xvfb versus a graphical user interface. But if it is about a real graphics card, then we would need false here. Please keep in mind that we cannot use xvfb with Mozmill right now, so the normal Unity desktop, as on Ubuntu desktop, should be used.

> And yes, they only end up with the root account and whatever automation
> account you create.

Thanks. That helps, and we will use root for modifications.
Oh, that's interesting -- so you need to run against the graphics hardware provided by vmware, and not an xvfb.  Then you will need to expand the `on_gpu` option to determine what kind of GPU to run on -- nvidia for releng and VMware for QA.
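One way such an expanded option could be shaped is sketched below. Everything here is hypothetical: the class name `gui_sketch`, the `$gpu_type` parameter and the `nvidia-current` package name are made up for the example and are not what the gui module contains.

```puppet
# Hypothetical sketch: 'gui_sketch' and '$gpu_type' do not exist in the
# repository, and the package name is an assumption.
class gui_sketch($on_gpu, $gpu_type = 'nvidia') {
    if ($on_gpu) {
        case $gpu_type {
            'nvidia': {
                # releng hardware: proprietary driver (package name assumed)
                package { 'nvidia-current': ensure => installed }
            }
            'vmware': {
                # QA VMs: nothing nvidia-specific to install; the driver
                # for the emulated adapter is handled elsewhere
            }
            default: {
                fail("Unsupported gpu_type: ${gpu_type}")
            }
        }
    }
}
```

A QA node would then declare it as `class { 'gui_sketch': on_gpu => true, gpu_type => 'vmware' }`, while releng nodes keep the default.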

There were some problems with Unity - bug 838351 comment 10 et seq. - so we're using LXDE instead.
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Oh, that's interesting -- so you need to run against the graphics hardware
> provided by vmware, and not an xvfb.  Then you will need to expand the
> `on_gpu` option to determine what kind of GPU to run on -- nvidia for releng
> and VMware for QA.

Alright, so that clears things up. Let me see how to get this done. Maybe I can determine that with facter.

> There were some problems with Unity - bug 838351 comment 10 et seq. - so
> we're using LXDE instead.

I don't think that this is true. By default the standard Ubuntu desktop is installed, not LXDE:
http://hg.mozilla.org/build/puppet/file/f903eedd3293/modules/packages/manifests/linux_desktop.pp#l11

The package for LXDE will be 'lubuntu-desktop'.
Yes, you're right, we did switch back to Unity.
Hm, so it looks like the graphics driver for the VMware-emulated GPU comes with the VMware Tools. Sadly I don't know how to get those installed, because I cannot find a way in vSphere to mount them; the option is grayed out. Dustin, I assume we could manage the version with puppetagain and put the installation packages onto the data drive? If not, we would have to figure out how to get the tools installed.
Wait, looks like this vmware tools integration is done by the hardware module. So all fine on that end.
(In reply to Henrik Skupin (:whimboo) from comment #11)
> Wait, looks like this vmware tools integration is done by the hardware
> module. So all fine on that end.

However, you do need to specify the version in the config and add the package to the private section of the data volume, since it is non-redistributable (we can't make it public).
The config already contains it, and when running puppet on a slave it looks like it gets installed successfully:

Notice: /Stage[main]/Vmwaretools::Kernel_upgrade/Exec[vmware_config_tools]/returns: executed successfully
Could someone please attach the output of a facter run on a buildbot machine that has an nvidia GPU? That would help me a lot in working out how to determine which drivers to use. Callek or Rail?
Flags: needinfo?(rail)
Flags: needinfo?(bugspam.Callek)
architecture => amd64
augeasversion => 0.10.0
bios_release_date => 06/27/2012
bios_vendor => American Megatrends Inc.
bios_version => 1.2a
blockdevice_sda_model => WDC WD5003ABYX-0
blockdevice_sda_size => 500107862016
blockdevice_sda_vendor => ATA
blockdevices => sda
boardmanufacturer => Supermicro
boardproductname => X8SIT
boardserialnumber => VM131S013219
domain => test.releng.scl3.mozilla.com
facterversion => 1.7.5
filesystems => ext3,ext4
fqdn => talos-linux64-ix-006.test.releng.scl3.mozilla.com
hardwareisa => x86_64
hardwaremodel => x86_64
hostname => talos-linux64-ix-006
id => root
interfaces => eth0,eth1,lo
ipaddress => 10.26.56.235
ipaddress_eth0 => 10.26.56.235
ipaddress_lo => 127.0.0.1
is_virtual => false
kernel => Linux
kernelmajversion => 3.2
kernelrelease => 3.2.0-38-generic
kernelversion => 3.2.0
lsbdistcodename => precise
lsbdistdescription => Ubuntu 12.04 LTS
lsbdistid => Ubuntu
lsbdistrelease => 12.04
lsbmajdistrelease => 12
macaddress => 00:25:90:c0:b3:36
macaddress_eth0 => 00:25:90:c0:b3:36
macaddress_eth1 => 00:25:90:c0:b3:37
manufacturer => iXsystems
memoryfree => 5.56 GB
memoryfree_mb => 5697.39
memorysize => 7.79 GB
memorysize_mb => 7977.89
memorytotal => 7.79 GB
mtu_eth0 => 1500
mtu_eth1 => 1500
mtu_lo => 16436
netmask => 255.255.252.0
netmask_eth0 => 255.255.252.0
netmask_lo => 255.0.0.0
network_eth0 => 10.26.56.0
network_lo => 127.0.0.0
operatingsystem => Ubuntu
operatingsystemrelease => 12.04
osfamily => Debian
path => /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11
physicalprocessorcount => 1
processor0 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor1 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor2 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor3 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor4 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor5 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor6 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processor7 => Intel(R) Xeon(R) CPU           X3450  @ 2.67GHz
processorcount => 8
productname => iX21X4-STIBTRF
ps => ps -ef
puppetversion => 3.6.1
rubysitedir => /usr/local/lib/site_ruby/1.8
rubyversion => 1.8.7
selinux => false
serialnumber => A1-26388
swapfree => 7.99 GB
swapfree_mb => 8181.00
swapsize => 7.99 GB
swapsize_mb => 8181.00
timezone => PDT
type => Sealed-case PC
uniqueid => 007f0101
uptime => 1 day
uptime_days => 1
uptime_hours => 28
uptime_seconds => 103843
uuid => 49434D53-0200-90C0-2500-C090250037B3
virtual => physical
^
Flags: needinfo?(rail)
Depends on: 1026176
I meant rail might be able to help with bug 1026176.
Flags: needinfo?(bugspam.Callek)
This is more complicated than I thought. First, I got the xserver and lightdm working. x11vnc also works, but all of it only with hand-made changes. Here are the things we have to change:

* To check which kind of system we have, we can check $::virtual. If it is set to 'physical' we have a native machine. Otherwise it's running under VMware. So I think that should be enough to skip the installation of the nvidia drivers.

* Right now we place a copy of the xorg.conf file into /etc/X11. This file contains various lines for nvidia driver settings. When I remove that file and let Xorg auto-select the settings it works fine. I will have to check if it would be enough to simply remove the driver section, or make this a template so we can set the right settings.

* x11vnc fails to start because of broken client authentication. We start the xserver as root, but x11vnc as user %builder_user. I'm not sure how this worked on RelEng systems, but I cannot get VNC up unless I specify the xauth file via the -auth parameter.
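The first item in the list above could be sketched as follows. The variable name `$needs_nvidia` is made up for the example, and the `notice` calls only stand in for the real package handling.

```puppet
# Sketch: derive the driver decision from the 'virtual' fact.
# Facter reports 'physical' on bare metal (see the facter output above);
# $needs_nvidia is a made-up name for this example.
$needs_nvidia = $::virtual ? {
    'physical' => true,
    default    => false,
}

if ($needs_nvidia) {
    notice('bare metal: keep installing the nvidia packages')
} else {
    notice("virtual machine (${::virtual}): skip the nvidia packages")
}
```

Note that this treats every non-physical value the same way, which matches the assumption that all virtual QA nodes run under VMware.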

Given that I do not have a test system with a dedicated NVidia GPU, I cannot really verify that my changes do not cause any regressions for those systems. So I think I will keep the current code in place and not do a lot of refactoring.

Dustin or Rail, do you have any feedback on that? I'm not that good at configuring an xserver, so help with that (links etc.) would be appreciated.
No longer depends on: 1026176
Well, I just checked one of our existing Ubuntu machines with ubuntu-desktop installed, and it doesn't have the /etc/X11/xorg.conf file. So maybe we can leave that alone and only copy it there in case of a dedicated GPU?
So the Xserver and lightdm are working, but XSession has issues and fails with errors like:

xrdb: No such file or directory
xrdb: Can't open display ':0'
No protocol specified
xhost:  unable to open display ":0"

The server is running on :0, but I'm not sure yet what prevents us from starting a session here. Maybe we try to start it too early? When I issue the 'startx' command a bit later for the mozauto user, it works.
I'm not convinced that we have a third case here, at least not yet.  Releng runs some of its buildslaves (specifically, talos on hardware) with on_gpu => true, and some with on_gpu => false (lots and lots of EC2 instances).  In the latter configuration, no nvidia drivers are installed and since we run xvfb there's no xorg.conf.  This seems like exactly what you want.  Is there a hard requirement that the X server display its content on the VMware console?

Even if that *is* a hard requirement, I'd recommend getting everything else working with xvfb first, then addressing the issue of running on the VMware console separately, just as a means of limiting the scope of work.

Using xvfb should fix the x11vnc problem, too, as with xvfb the X server runs as the builder user.  I suspect we just never run x11vnc on GPU-backed testers, and so haven't encountered that problem.
(In reply to Dustin J. Mitchell [:dustin] from comment #21)
> I'm not convinced that we have a third case here, at least not yet.  Releng
> runs some of its buildslaves (specifically, talos on hardware) with on_gpu
> => true, and some with on_gpu => false (lots and lots of EC2 instances).  In
> the latter configuration, no nvidia drivers are installed and since we run
> xvfb there's no xorg.conf.  This seems like exactly what you want.  Is there

Even if you run with xvfb, at the moment xorg.conf will always be placed in /etc/X11. There is no ensure line to prevent that for such a configuration:
http://hg.mozilla.org/build/puppet/file/68ec7430ffb1/modules/gui/manifests/init.pp#l62

> a hard requirement that the X server display its content on the VMware
> console?

Well, what we need is not xvfb, given that Mozmill still fails to run our tests there for some reason. We tried that with Travis and we are still failing. See bug 928366. I'm not that familiar with the other Xserver stuff, so I would appreciate further information and feedback from you regarding the VMware console.

> Even if that *is* a hard requirement, I'd recommend getting everything else
> working with xvfb first, then addressing the issue of running on the VMware
> console separately, just as a means of limiting the scope of work.

Here I have the problem of updating the mesa libraries, see bug 1026176. I'm not sure I can solve that adequately in a timely manner. It would block me even more in getting our own goals finished.

> Using xvfb should fix the x11vnc problem, too, as with xvfb the X server
> runs as the builder user.  I suspect we just never run x11vnc on GPU-backed
> testers, and so haven't encountered that problem.

Something I wonder is why we start the Xsession separately and do not set up lightdm to auto-login the builder_user. I think that might also fix our problem here. Was there any reason to explicitly start the Xserver?
(In reply to Henrik Skupin (:whimboo) from comment #22)
> Even if you run with xvfb at the moment the xorg.conf will always be placed
> in /etc/X11. There is no ensure line to disallow that in case of such a
> configuration:
> http://hg.mozilla.org/build/puppet/file/68ec7430ffb1/modules/gui/manifests/
> init.pp#l62

True, but in that case we run xvfb, not xorg, so that config file doesn't hurt anything.  But it wouldn't hurt to make that config file conditional on on_gpu => true.

> Well, what we need is not xvfb given that Mozmill still fails to run our
> tests due to some reason. We tried that with Travis and we are still
> failing. See bug 928366. I'm not that familiar with the other Xserver stuff
> so I would appreciate further information and feedback from you regarding
> the VMware console.

Ah, OK.  We've never done testing on the console, and X was never my specialty.  In that case you may be right that it's not worth the effort to get xvfb working, and instead go straight for making xorg work.  To do that, I think that turning xorg.conf into a template is the right approach.

> Here I have the problem in updating the mesa libraries on bug 1026176. I'm
> not sure I can solve that adequately in a timely manner. It would block me
> even more in getting our own goals finished.

Those are only required for B2G reftests, according to the comments, so it should be possible to add another attribute, want_mesa, and set that to false for your slaves.

> Something I wonder is why we start the Xsession separately and do not setup
> lightdm to auto-login the builder_user. I think that might also fix our
> problem here. Was there any reason why to explicitly start the Xserver?

I think part of it is that we want puppet to run before X starts.  Rail may remember more about that decision.
(In reply to Dustin J. Mitchell [:dustin] from comment #23)
> I think part of it is that we want puppet to run before X starts.  Rail may
> remember more about that decision.

Sure, but shouldn't we rather delay the startup of the XServer then? The XSession can only be started when the XServer is running, right? So we might not want to delay the XSession. I have to add that this is totally unknown territory for me, and I have to dig into all that X11 stuff.
Flags: needinfo?(rail)
We use Ubuntu's upstart to manage the dependencies:

1) puppet is started by /etc/init/puppet.conf: http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/puppet/files/puppet.upstart.conf

2) X related upstart configs are defined at http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/manifests/init.pp#l50

The dependencies are defined like this:
 http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/templates/x11.conf.erb#l8 (starts after puppet)
 http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/templates/Xsession.conf.erb (starts after X)

I hope it helps.
Flags: needinfo?(rail)
So we found something interesting while talking on IRC. Due to the refactoring of slave.pp into qa.pp and releng.pp, I moved too many included modules out of slave.pp. What we missed in qa.pp is the disableservices module. If we don't include that, lightdm will be started and claim the :0 display, and XSession will then fail to start. Sadly, after a puppet run no X server was running at all. I restarted the machine and it works now. I wonder if we need a restart here, and if so, why we don't do it yet. Maybe I am missing some other modules too. I will check that.
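The effect that is needed here, keeping lightdm from claiming :0, can be sketched with a plain service resource. This is not the content of the disableservices module, just a minimal illustration of what it has to achieve for the display manager.

```puppet
# Sketch only, not the contents of the disableservices module.
service { 'lightdm':
    ensure => stopped,  # do not let the display manager claim :0
    enable => false,    # and keep it from starting at boot
}
```

In the real manifests the fix is simply to include the existing module in qa.pp again rather than declaring the service by hand.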

Ok, next is getting VNC to work.
It's possible to signal that a restart is needed pretty easily, but let's be sure we need it before going there.
x11vnc is disabled by default to save CPU powah! It's not used by automation, only by humans when they want to debug things.

to start it manually just run as root:
 start x11vnc

To enable it on boot (manually): 

 sed -i 's/manual/start on started Xsession/' /etc/init/x11vnc.conf

(from https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave)
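If this ever needs to be applied through puppet instead of by hand, the same sed call could be wrapped in an exec. This is a hedged sketch: the resource title and the `grep` guard are assumptions, while the sed expression and the file path are the ones quoted above.

```puppet
# Sketch: the resource title and the guard are assumptions; the sed
# expression and the file path come from the manual instructions above.
exec { 'enable-x11vnc-on-boot':
    command => "sed -i 's/manual/start on started Xsession/' /etc/init/x11vnc.conf",
    # Only rewrite the job while the word 'manual' is still present
    onlyif  => 'grep -qw manual /etc/init/x11vnc.conf',
    path    => ['/bin', '/usr/bin'],
}
```

The guard keeps the exec idempotent, so repeated puppet runs do not touch the file once the stanza has been replaced.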
(In reply to Rail Aliiev [:rail] from comment #28)
> x11vnc is disabled by default to save CPU powah! It's not used by
> automation, only by humans when they want to debug things.

That's interesting. I hadn't thought about this yet. Given that we manage everything via puppetagain now, we won't even need VNC enabled. Let's see how this works out. Thanks Rail!
(In reply to Dustin J. Mitchell [:dustin] from comment #27)
> It's possible to signal that a restart is needed pretty easily, but let's be
> sure we need it before going there.

I have seen it during a puppet agent run via my environment, when a green line appeared which mentioned that the slave has to be restarted. But the restart didn't happen. I will come back with details.
Right, the puppet startup scripts recognize that state and will reboot before allowing startup to proceed.  It won't happen if you just run 'puppet agent --test'.
Attached patch mozauto user + x server (obsolete) — Splinter Review
This patch adds our mozauto user and also sets up the xserver required by Mozmill. x11vnc will be installed, but not started automatically for now. Follow-up patches are necessary to finish all the remaining items.
Attachment #8442425 - Flags: review?(dustin)
It would be good if someone could check this on a system with an Nvidia card. I don't have one around.
Dustin, one question regarding the screen timeout. Right now it is set to 5 minutes, then the screen goes black. I cannot really find a command to turn this off. So before digging into the details, I would like to know if that is a wanted change, or if it could cause trouble with RelEng machines.
Is that a DPMS timeout or xscreensaver?
Most likely DPMS, because it turns off the screen.
I think 'xset' is the way to do that.  Perhaps that could be added to .Xsession?  I don't think releng will have any issue with that.
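A possible shape for this, managed from puppet, is sketched below. The target file is an assumption: stock Debian and Ubuntu session scripts source ~/.xsessionrc, but whether the upstart-managed session used here does so has not been checked, and the home directory path is assumed as well.

```puppet
# Sketch: path, owner and the use of ~/.xsessionrc are assumptions.
file { '/home/mozauto/.xsessionrc':
    ensure  => file,
    owner   => 'mozauto',
    mode    => '0644',
    # 'xset -dpms' disables DPMS power saving,
    # 'xset s off' disables the screen saver timer
    content => "xset -dpms\nxset s off\n",
}
```

Both xset calls need a running X display, which is why they belong in a session startup file rather than in an exec run by the puppet agent.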
Comment on attachment 8442425 [details] [diff] [review]
mozauto user + x server

Review of attachment 8442425 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good - my comments are about things that aren't problematic.

::: modules/gui/manifests/init.pp
@@ +42,5 @@
>              include packages::gnome_settings_daemon
>              # Bug 859972: xrestop is needed for talos data collection
>              include packages::xrestop
>  
> +            if ($on_gpu == false) {

probably if (!$on_gpu) is better

@@ +56,5 @@
> +                    # Auto-detection of settings works fine, but it would be
> +                    # better to have that file generated from a template
> +                    "/etc/X11/xorg.conf":
> +                        content => template("${module_name}/xorg.conf.erb"),
> +                        notify => Service['x11'];

Isn't this using a template?? I'm confused by the comment..

@@ +68,1 @@
>              }

Maybe it'd be good to have an `else` here that ensures xorg.conf is absent?  Explicit is better than implicit :)
Attachment #8442425 - Flags: review?(dustin) → review+
Comment on attachment 8442425 [details] [diff] [review]
mozauto user + x server

Review of attachment 8442425 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/gui/manifests/init.pp
@@ +42,5 @@
>              include packages::gnome_settings_daemon
>              # Bug 859972: xrestop is needed for talos data collection
>              include packages::xrestop
>  
> +            if ($on_gpu == false) {

Indeed! :) I just copied and pasted the entries...

@@ +56,5 @@
> +                    # Auto-detection of settings works fine, but it would be
> +                    # better to have that file generated from a template
> +                    "/etc/X11/xorg.conf":
> +                        content => template("${module_name}/xorg.conf.erb"),
> +                        notify => Service['x11'];

Wrong wording here. It's already a template, but we don't really make use of it. Only for the GPU PCI bus id. I will update the comment.

@@ +68,1 @@
>              }

I'd better move it back and use a conditional for the ensure property. Good idea.
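A conditional ensure of that kind could look roughly like this. It is an illustration, not the code that landed, and it is meant to sit inside the gui class where `$on_gpu` and `Service['x11']` already exist.

```puppet
# Sketch of the reviewed resource with a conditional ensure; the selector
# is an illustration, not the committed change.
file { '/etc/X11/xorg.conf':
    ensure  => $on_gpu ? {
        true    => file,
        default => absent,
    },
    content => template("${module_name}/xorg.conf.erb"),
    notify  => Service['x11'],
}
```

With this form the file is removed explicitly on nodes without a GPU, which covers the "explicit is better than implicit" remark from the review.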
Attachment #8442425 - Attachment is obsolete: true
Attachment #8443342 - Flags: review+
Attached file log.txt
Just for additional information, here is the output of puppet with the patch applied.
(In reply to Dustin J. Mitchell [:dustin] from comment #37)
> I think 'xset' is the way to do that.  Perhaps that could be added to
> .Xsession?  I don't think releng will have any issue with that.

I think I will hold off on that change. Let's see how our tests behave when the screen turns off. If that causes problems I will add it later.
http://hg.mozilla.org/qa/puppet/rev/d896882232e1 (default)
http://hg.mozilla.org/qa/puppet/rev/29a04373a8e9 (production)

I kickstarted one of our new Ubuntu 14.04 nodes and it successfully booted into X with the mozauto user active.

As far as this bug is concerned, that should be everything I want to do right now. If we need special cases for OS X, a new bug will be filed.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1028104