Bug 1021230
Opened 10 years ago
Closed 10 years ago
Setup mozauto user and its environment for QA slave nodes
Categories: Infrastructure & Operations :: RelOps: Puppet (task)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: whimboo, Assigned: whimboo
Attachments (2 files, 1 obsolete file):
patch, 11.09 KB (whimboo: review+)
text/plain, 19.98 KB
With bug 1020659 we have dedicated node definitions for QA. Now we have to set up our mozauto user and all the relevant services and settings for slave nodes. What has to be done:
* Add user 'mozauto'
* Enable auto-login for mozauto
* Enable VNC (remote management on OS X)
* Set VNC screen size to 1024x768
* Disable noticeboard (OS X)
* Add smb://fs1.qa.scl3.mozilla.com as share
* Set user settings:
** Turn off screensaver
** Turn off screen locking
** Disable all power-saving settings
** Mute sound output
** Disable system software updates

Regarding adding mozauto, Dustin gave the following information on bug 1020659:

> Users::people adds people, while users::mozauto will add the role-based "mozauto" user. They're unrelated. You can probably just use users::builder, and set $builder_username = 'mozauto' in your config. I'd recommend starting there, and if it turns out you need something more complex, creating users::mozauto later is always an option.

Dustin, is any of that not possible out of the box?
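Dustin's suggestion could be sketched roughly as follows. This is a sketch under assumptions, not the committed change: it presumes the QA fork keeps PuppetAgain's convention of a site-specific config class inheriting from config::base, and this bug does not say which file that class lives in.

```puppet
# Sketch only: assumes a site config class that inherits from config::base,
# as in PuppetAgain; the QA file holding it is not named in this bug.
class config inherits config::base {
    # users::builder creates the role account named by this variable
    $builder_username = 'mozauto'
}

class toplevel::slave::qa inherits toplevel::slave {
    # Reuse the existing builder user class instead of writing users::mozauto
    include users::builder
}
```

If the role account later needs more than users::builder offers, a dedicated users::mozauto class can replace the include, as Dustin notes.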
Comment 1 • 10 years ago
Nope! Much of it is already done, and some of the rest would probably be acceptable for releng. The rest can either be parametrized (e.g., screen size) or done in a different class (the mounted share).
Comment 2 • 10 years ago (Assignee)
OK, adding the mozauto user is not a problem at all. What puzzles me is how to set up X11 on Ubuntu. I tried the gui class with on_gpu set to true, but that wants to install the NVIDIA drivers, which don't work with the graphics card we have via vSphere. Can someone please let me know what to do here? This is what I have at the moment:

class toplevel::slave::qa inherits toplevel::slave {
    include users::builder
    include vnc
    class { gui:
        on_gpu => true;
    }
}
Comment 3 • 10 years ago (Assignee)
When rebooting the VM I see the following syslog entries:

Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Started (4000)
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 112 has read and write permissions for those files.
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: The daemon no longer has permission to remove its runtime data directory /var/run/nvidia-persistenced
Jun 13 03:50:22 mm-ub-1404-32 nvidia-persistenced: Shutdown (4000)

That makes sense given that we do not have an NVIDIA GPU. So why do we currently force installation of the nvidia package if on_gpu is set to true? Shouldn't we have an extra parameter for the type of graphics card?
Updated • 10 years ago (Assignee)
Flags: needinfo?(bugspam.Callek)
Comment 4 • 10 years ago (Assignee)
Last Friday I talked with Justin on IRC about this problem, and it looks like all the existing machines have an NVIDIA GPU. That is why we currently always try to install those packages. It looks like we have to add a new parameter to the gui module that allows setting specific GPU types. Dustin, I also wonder about how users are distributed across slaves. Will those machines have only the root account enabled, and none of the other QA admin users?
Flags: needinfo?(bugspam.Callek)
Comment 5 • 10 years ago
Isn't on_gpu => false what you want, since these are VMs which don't have a GPU? There's room to add support for other, non-nvidia, graphics cards, but I don't think you have those :) And yes, they only end up with the root account and whatever automation account you create.
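Applied to the class from comment 2, Dustin's suggestion would look roughly like this. It is a sketch in which only the on_gpu value changes; everything else is copied from that comment.

```puppet
class toplevel::slave::qa inherits toplevel::slave {
    include users::builder
    include vnc

    # vSphere VMs have no NVIDIA card, so stay off the GPU (nvidia driver) path
    class { 'gui':
        on_gpu => false;
    }
}
```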
Comment 6 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> Isn't on_gpu => false what you want, since these are VMs which don't have a
> GPU?

Oh, I thought this was about xvfb versus a graphical user interface. But if it is about a real graphics card, then we need false here. Please keep in mind that we cannot use xvfb with Mozmill right now, so the normal Unity desktop, as on Ubuntu Desktop, should be used.

> And yes, they only end up with the root account and whatever automation
> account you create.

Thanks. That helps, and we will use root for modifications.
Comment 7 • 10 years ago
Oh, that's interesting -- so you need to run against the graphics hardware provided by vmware, and not an xvfb. Then you will need to expand the `on_gpu` option to determine what kind of GPU to run on -- nvidia for releng and VMware for QA. There were some problems with Unity - bug 838351 comment 10 et seq. - so we're using LXDE instead.
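One way to expand the option could look like the sketch below. Both gpu_type and packages::nvidia_drivers are hypothetical names, not parts of the existing gui module: the first is an illustrative new parameter, the second a placeholder for whatever currently installs the NVIDIA packages. The class's other parameters are omitted.

```puppet
# Hypothetical sketch: gpu_type and packages::nvidia_drivers are illustrative
# names, not part of the existing gui module. Other parameters are omitted.
class gui($on_gpu = false, $gpu_type = 'nvidia') {
    if $on_gpu {
        case $gpu_type {
            'nvidia': {
                # releng talos hardware
                include packages::nvidia_drivers
            }
            'vmware': {
                # QA VMs: no NVIDIA packages; nothing is installed here in this sketch
            }
            default: {
                fail("gui: unsupported gpu_type '${gpu_type}'")
            }
        }
    }
}
```

Keeping 'nvidia' as the default would leave existing releng node definitions unchanged, while QA nodes would opt in to 'vmware'.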
Comment 8 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> Oh, that's interesting -- so you need to run against the graphics hardware
> provided by vmware, and not an xvfb. Then you will need to expand the
> `on_gpu` option to determine what kind of GPU to run on -- nvidia for releng
> and VMware for QA.

Alright, that clears things up. Let me see how to do this. Maybe I can determine it with facter.

> There were some problems with Unity - bug 838351 comment 10 et seq. - so
> we're using LXDE instead.

I don't think that this is true. By default the standard Ubuntu desktop is installed, not LXDE:
http://hg.mozilla.org/build/puppet/file/f903eedd3293/modules/packages/manifests/linux_desktop.pp#l11
The package for LXDE would be 'lubuntu-desktop'.
Comment 9 • 10 years ago
Yes, you're right, we did switch back to Unity.
Comment 10 • 10 years ago (Assignee)
Hm, it looks like the graphics driver for the VMware emulated GPU comes with the VMware Tools. Sadly I don't know how to get those installed, because I cannot find a way in vSphere to mount them; the option is grayed out. Dustin, I assume we could manage their version with PuppetAgain and put the installation packages onto the data drive? If not, we would have to figure out how to get the tools installed.
Comment 11 • 10 years ago (Assignee)
Wait, it looks like the VMware Tools integration is done by the hardware module. So all fine on that end.
Comment 12 • 10 years ago
(In reply to Henrik Skupin (:whimboo) from comment #11)
> Wait, it looks like the VMware Tools integration is done by the hardware
> module. So all fine on that end.

However, you do need to specify the version in the config and add the package to the private section of the data volume, since it is non-redistributable (we can't make it public).
Comment 13 • 10 years ago (Assignee)
The config already contains it, and when running puppet on a slave it looks like it gets installed successfully:
Notice: /Stage[main]/Vmwaretools::Kernel_upgrade/Exec[vmware_config_tools]/returns: executed successfully
Comment 14 • 10 years ago (Assignee)
Could someone please attach the output of a facter run on a buildbot machine that has an NVIDIA GPU? That would help me a lot in figuring out how to determine which drivers to use. Callek or Rail?
Flags: needinfo?(rail)
Flags: needinfo?(bugspam.Callek)
Comment 15 • 10 years ago
architecture => amd64
augeasversion => 0.10.0
bios_release_date => 06/27/2012
bios_vendor => American Megatrends Inc.
bios_version => 1.2a
blockdevice_sda_model => WDC WD5003ABYX-0
blockdevice_sda_size => 500107862016
blockdevice_sda_vendor => ATA
blockdevices => sda
boardmanufacturer => Supermicro
boardproductname => X8SIT
boardserialnumber => VM131S013219
domain => test.releng.scl3.mozilla.com
facterversion => 1.7.5
filesystems => ext3,ext4
fqdn => talos-linux64-ix-006.test.releng.scl3.mozilla.com
hardwareisa => x86_64
hardwaremodel => x86_64
hostname => talos-linux64-ix-006
id => root
interfaces => eth0,eth1,lo
ipaddress => 10.26.56.235
ipaddress_eth0 => 10.26.56.235
ipaddress_lo => 127.0.0.1
is_virtual => false
kernel => Linux
kernelmajversion => 3.2
kernelrelease => 3.2.0-38-generic
kernelversion => 3.2.0
lsbdistcodename => precise
lsbdistdescription => Ubuntu 12.04 LTS
lsbdistid => Ubuntu
lsbdistrelease => 12.04
lsbmajdistrelease => 12
macaddress => 00:25:90:c0:b3:36
macaddress_eth0 => 00:25:90:c0:b3:36
macaddress_eth1 => 00:25:90:c0:b3:37
manufacturer => iXsystems
memoryfree => 5.56 GB
memoryfree_mb => 5697.39
memorysize => 7.79 GB
memorysize_mb => 7977.89
memorytotal => 7.79 GB
mtu_eth0 => 1500
mtu_eth1 => 1500
mtu_lo => 16436
netmask => 255.255.252.0
netmask_eth0 => 255.255.252.0
netmask_lo => 255.0.0.0
network_eth0 => 10.26.56.0
network_lo => 127.0.0.0
operatingsystem => Ubuntu
operatingsystemrelease => 12.04
osfamily => Debian
path => /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11
physicalprocessorcount => 1
processor0 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor1 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor2 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor3 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor4 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor5 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor6 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processor7 => Intel(R) Xeon(R) CPU X3450 @ 2.67GHz
processorcount => 8
productname => iX21X4-STIBTRF
ps => ps -ef
puppetversion => 3.6.1
rubysitedir => /usr/local/lib/site_ruby/1.8
rubyversion => 1.8.7
selinux => false
serialnumber => A1-26388
swapfree => 7.99 GB
swapfree_mb => 8181.00
swapsize => 7.99 GB
swapsize_mb => 8181.00
timezone => PDT
type => Sealed-case PC
uniqueid => 007f0101
uptime => 1 day
uptime_days => 1
uptime_hours => 28
uptime_seconds => 103843
uuid => 49434D53-0200-90C0-2500-C090250037B3
virtual => physical
Comment 17 • 10 years ago
I meant rail might be able to help with bug 1026176.
Flags: needinfo?(bugspam.Callek)
Comment 18 • 10 years ago (Assignee)
This is more complicated than I thought. First, I got the X server and lightdm working. x11vnc also works, but all with hand-made changes. Here are the things we have to change:

* To check which kind of system we have, we can check $::virtual. If it is set to 'physical', we have a native machine; otherwise it's running under VMware. I think that should be enough to skip the installation of the nvidia drivers.
* Right now we place a copy of the xorg.conf file into /etc/X11. This file contains various lines for nvidia driver settings. When I remove that file and let Xorg auto-select the settings, it works fine. I will have to check whether it would be enough to simply remove the driver section, or whether to make this a template so we can set the right settings.
* x11vnc fails to start because of broken client authentication. We start the X server as root, but x11vnc as the builder_user account. I'm not sure how this worked on RelEng systems, but I cannot get VNC up until I specify the xauth file via the -auth parameter.

Given that I do not have a test system with a dedicated NVIDIA GPU, I cannot really verify that my changes do not cause any regressions on those systems. So I think I will keep the current code in place and avoid a lot of refactoring. Dustin or Rail, do you have any feedback on that? I'm not very experienced at configuring an X server, so any help (links etc.) would be appreciated.
No longer depends on: 1026176
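The first item could be sketched like this inside the gui class. It relies on Facter's virtual fact, which reports physical on the talos hardware shown in comment 15. The packages::nvidia_drivers name is a hypothetical placeholder for whatever installs the NVIDIA packages today.

```puppet
# Sketch, inside the gui class: only bare-metal machines (virtual => physical)
# get the NVIDIA drivers. packages::nvidia_drivers is a hypothetical class name.
if $on_gpu and $::virtual == 'physical' {
    include packages::nvidia_drivers
}
```

Any other value of the fact, such as the one reported on a vSphere guest, falls through and skips the driver installation.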
Comment 19 • 10 years ago (Assignee)
Well, I just checked one of our existing Ubuntu machines with ubuntu-desktop installed, and it doesn't have the /etc/X11/xorg.conf file. So maybe we can leave that alone and only copy it there in case of a dedicated GPU?
Comment 20 • 10 years ago (Assignee)
So the X server and lightdm are working, but XSession has issues and fails with errors like:

xrdb: No such file or directory
xrdb: Can't open display ':0'
No protocol specified
xhost: unable to open display ":0"

The server is running on :0, but I'm not sure yet what prevents us from starting a session here. Maybe we try to start it too early? When I issue the 'startx' command a bit later as the mozauto user, it works.
Comment 21 • 10 years ago
I'm not convinced that we have a third case here, at least not yet. Releng runs some of its buildslaves (specifically, talos on hardware) with on_gpu => true, and some with on_gpu => false (lots and lots of EC2 instances). In the latter configuration, no nvidia drivers are installed and since we run xvfb there's no xorg.conf. This seems like exactly what you want. Is there a hard requirement that the X server display its content on the VMware console?

Even if that *is* a hard requirement, I'd recommend getting everything else working with xvfb first, then addressing the issue of running on the VMware console separately, just as a means of limiting the scope of work.

Using xvfb should fix the x11vnc problem, too, as with xvfb the X server runs as the builder user. I suspect we just never run x11vnc on GPU-backed testers, and so haven't encountered that problem.
Comment 22 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #21)
> I'm not convinced that we have a third case here, at least not yet. Releng
> runs some of its buildslaves (specifically, talos on hardware) with on_gpu
> => true, and some with on_gpu => false (lots and lots of EC2 instances). In
> the latter configuration, no nvidia drivers are installed and since we run
> xvfb there's no xorg.conf. This seems like exactly what you want. Is there

Even if you run with xvfb, at the moment the xorg.conf will always be placed in /etc/X11. There is no ensure line to disallow that in case of such a configuration:
http://hg.mozilla.org/build/puppet/file/68ec7430ffb1/modules/gui/manifests/init.pp#l62

> a hard requirement that the X server display its content on the VMware
> console?

Well, what we need is not xvfb, given that Mozmill still fails to run our tests for some reason. We tried that with Travis and we are still failing. See bug 928366. I'm not that familiar with the other X server stuff, so I would appreciate further information and feedback from you regarding the VMware console.

> Even if that *is* a hard requirement, I'd recommend getting everything else
> working with xvfb first, then addressing the issue of running on the VMware
> console separately, just as a means of limiting the scope of work.

Here I have the problem of updating the mesa libraries on bug 1026176. I'm not sure I can solve that adequately in a timely manner. It would block me even further from getting our own goals finished.

> Using xvfb should fix the x11vnc problem, too, as with xvfb the X server
> runs as the builder user. I suspect we just never run x11vnc on GPU-backed
> testers, and so haven't encountered that problem.

Something I wonder is why we start the Xsession separately and do not set up lightdm to auto-login the builder_user. I think that might also fix our problem here. Was there any reason to start the X server explicitly?
Comment 23 • 10 years ago
(In reply to Henrik Skupin (:whimboo) from comment #22)
> Even if you run with xvfb, at the moment the xorg.conf will always be
> placed in /etc/X11. There is no ensure line to disallow that in case of
> such a configuration:
> http://hg.mozilla.org/build/puppet/file/68ec7430ffb1/modules/gui/manifests/init.pp#l62

True, but in that case we run xvfb, not xorg, so that config file doesn't hurt anything. But it wouldn't hurt to make that config file conditional on on_gpu => true.

> Well, what we need is not xvfb, given that Mozmill still fails to run our
> tests for some reason. We tried that with Travis and we are still
> failing. See bug 928366. I'm not that familiar with the other X server
> stuff, so I would appreciate further information and feedback from you
> regarding the VMware console.

Ah, OK. We've never done testing on the console, and X was never my specialty. In that case you may be right that it's not worth the effort to get xvfb working, and instead go straight for making xorg work. To do that, I think that turning xorg.conf into a template is the right approach.

> Here I have the problem of updating the mesa libraries on bug 1026176. I'm
> not sure I can solve that adequately in a timely manner. It would block me
> even further from getting our own goals finished.

Those are only required for B2G reftests, according to the comments, so it should be possible to add another attribute, want_mesa, and set that to false for your slaves.

> Something I wonder is why we start the Xsession separately and do not set
> up lightdm to auto-login the builder_user. I think that might also fix our
> problem here. Was there any reason to start the X server explicitly?

I think part of it is that we want puppet to run before X starts. Rail may remember more about that decision.
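The want_mesa idea could look roughly like this. The attribute name comes from Dustin's reply. The default value and the packages::mesa class name are assumptions for illustration, and the gui class's other parameters are omitted.

```puppet
# Sketch: want_mesa as proposed above; packages::mesa is a hypothetical name
# for whatever installs the updated mesa libraries. Other parameters omitted.
class gui($on_gpu = false, $want_mesa = true) {
    if $want_mesa {
        # Only needed for B2G reftests
        include packages::mesa
    }
}

# A QA slave would then opt out of the mesa upgrade:
class { 'gui':
    want_mesa => false;
}
```

Defaulting to true keeps the current releng behaviour, so only the QA node definitions would need to change.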
Comment 24 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #23)
> I think part of it is that we want puppet to run before X starts. Rail may
> remember more about that decision.

Sure, but then shouldn't we rather delay the startup of the X server? The Xsession can only be started once the X server is running, right? So we might not need to delay the Xsession. I should add that this is a totally unknown area for me, and I have to dig into all that X11 foo.
Flags: needinfo?(rail)
Comment 25 • 10 years ago
We use Ubuntu's upstart to manage the dependencies:
1) puppet is started by /etc/init/puppet.conf: http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/puppet/files/puppet.upstart.conf
2) X-related upstart configs are defined at http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/manifests/init.pp#l50

Dependencies are defined like these:
http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/templates/x11.conf.erb#l8 (starts after puppet)
http://hg.mozilla.org/build/puppet/file/3c949a7ef18e/modules/gui/templates/Xsession.conf.erb (starts after X)

I hope it helps.
Flags: needinfo?(rail)
Comment 26 • 10 years ago (Assignee)
We found out something interesting while talking on IRC. Due to the refactoring of slave.pp into qa.pp and releng.pp, I moved too many included modules out of slave.pp. What qa.pp was missing is the disableservices module. If we don't include it, lightdm is started and claims the :0 display, and XSession then fails to start. Unfortunately, after a puppet run no X server was running at all. I restarted the machine and it works now. I wonder whether we need a restart here, and if so, why we don't do it yet. I may be missing some other modules too; I will check that. OK, next is getting VNC to work.
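Restoring the missing include might look like the sketch below. The class name disableservices::slave is an assumption about how the disableservices module is organized; the bug only names the module.

```puppet
class toplevel::slave::qa inherits toplevel::slave {
    include users::builder
    # Keep lightdm from starting and claiming display :0, so the
    # Xsession job can use it (disableservices::slave is an assumed name)
    include disableservices::slave
}
```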
Comment 27 • 10 years ago
It's possible to signal that a restart is needed pretty easily, but let's be sure we need it before going there.
Comment 28 • 10 years ago
x11vnc is disabled by default to save CPU powah! It's not used by automation, only by humans when they want to debug things.
To start it manually, just run as root:
start x11vnc
To enable it on boot (manually):
sed -i 's/manual/start on started Xsession/' /etc/init/x11vnc.conf
(from https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave)
Comment 29 • 10 years ago (Assignee)
(In reply to Rail Aliiev [:rail] from comment #28)
> x11vnc is disabled by default to save CPU powah! It's not used by
> automation, only by humans when they want to debug things.

That's interesting. I hadn't thought about this yet. Given that we manage everything via PuppetAgain now, we actually won't even need VNC enabled. Let's see how this works out. Thanks, Rail!
Comment 30 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #27)
> It's possible to signal that a restart is needed pretty easily, but let's be
> sure we need it before going there.

I saw it during a puppet agent run against my environment: a green line appeared saying that the slave has to be restarted. But the restart didn't happen. I will come back with details.
Comment 31 • 10 years ago
Right, the puppet startup scripts recognize that state and will reboot before allowing startup to proceed. It won't happen if you just run 'puppet agent --test'.
Comment 32 • 10 years ago (Assignee)
This patch adds our mozauto user and also sets up the X server needed by Mozmill. x11vnc will be installed, but not started automatically for now. Follow-up patches are necessary to finish all the remaining items.
Attachment #8442425 - Flags: review?(dustin)
Comment 33 • 10 years ago (Assignee)
It would be good if someone could check this on a system with an NVIDIA card. I don't have one around.
Comment 34 • 10 years ago (Assignee)
Dustin, one question regarding the screen timeout. Right now it is set to 5 minutes, after which the screen goes black. I cannot find a command to turn this off. So before digging into details here, I would like to know whether that change is wanted, or whether it could cause trouble on RelEng machines.
Comment 35 • 10 years ago
Is that a DPMS timeout or xscreensaver?
Comment 36 • 10 years ago (Assignee)
Most likely DPMS, because it turns off the screen.
Comment 37 • 10 years ago
I think 'xset' is the way to do that. Perhaps that could be added to .Xsession? I don't think releng will have any issue with that.
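A minimal sketch of that idea follows. In it, xset -dpms disables DPMS power management and xset s off disables the screensaver blanking. It assumes the session sources ~/.xsessionrc, which Debian/Ubuntu's stock Xsession scripts do; whether the Xsession job used here does so is not confirmed in this bug. The /home/mozauto path is likewise an assumed home directory.

```puppet
# Sketch: assumes mozauto's home is /home/mozauto and that the X session
# sources ~/.xsessionrc (unverified for the Xsession upstart job used here).
file { '/home/mozauto/.xsessionrc':
    ensure  => file,
    owner   => 'mozauto',
    mode    => '0644',
    # Disable DPMS (monitor power-off) and X screensaver blanking
    content => "xset -dpms\nxset s off\n";
}
```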
Comment 38 • 10 years ago
Comment on attachment 8442425
mozauto user + x server

Review of attachment 8442425:
-----------------------------------------------------------------

Looks good - my comments are about things that aren't problematic.

::: modules/gui/manifests/init.pp
@@ +42,5 @@
> include packages::gnome_settings_daemon
> # Bug 859972: xrestop is needed for talos data collection
> include packages::xrestop
>
> + if ($on_gpu == false) {

probably if (!$on_gpu) is better

@@ +56,5 @@
> + # Auto-detection of settings works fine, but it would be
> + # better to have that file generated from a template
> + "/etc/X11/xorg.conf":
> + content => template("${module_name}/xorg.conf.erb"),
> + notify => Service['x11'];

Isn't this using a template?? I'm confused by the comment..

@@ +68,1 @@
> }

Maybe it'd be good to have an `else` here that ensures xorg.conf is absent? Explicit is better than implicit :)
Attachment #8442425 - Flags: review?(dustin) → review+
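Dustin's suggestion of an explicit else that removes xorg.conf could look roughly like this. It is a sketch of the gui manifest fragment, not the committed patch, and it assumes the x11 service is declared whenever on_gpu is true.

```puppet
if $on_gpu {
    file { '/etc/X11/xorg.conf':
        content => template("${module_name}/xorg.conf.erb"),
        notify  => Service['x11'];
    }
} else {
    # No dedicated GPU: let Xorg auto-detect its settings, and make sure
    # a stale xorg.conf from an earlier run does not linger
    file { '/etc/X11/xorg.conf':
        ensure => absent;
    }
}
```

Comment 39 instead proposes a single file resource with a conditional ensure value. That would express the same intent with less duplication.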
Comment 39 • 10 years ago (Assignee)
Comment on attachment 8442425
mozauto user + x server

Review of attachment 8442425:
-----------------------------------------------------------------

::: modules/gui/manifests/init.pp
@@ +42,5 @@
> include packages::gnome_settings_daemon
> # Bug 859972: xrestop is needed for talos data collection
> include packages::xrestop
>
> + if ($on_gpu == false) {

Indeed! :) I was just copying and pasting the entries...

@@ +56,5 @@
> + # Auto-detection of settings works fine, but it would be
> + # better to have that file generated from a template
> + "/etc/X11/xorg.conf":
> + content => template("${module_name}/xorg.conf.erb"),
> + notify => Service['x11'];

Wrong wording here. It's already a template, but we don't really make use of it, except for the GPU PCI bus ID. I will update the comment.

@@ +68,1 @@
> }

I'd better move it back and use a conditional for the ensure property. Good idea.
Comment 40 • 10 years ago (Assignee)
Attachment #8442425 - Attachment is obsolete: true
Attachment #8443342 - Flags: review+
Comment 41 • 10 years ago (Assignee)
For additional information, here is the output of puppet with the patch applied.
Comment 42 • 10 years ago (Assignee)
(In reply to Dustin J. Mitchell [:dustin] from comment #37)
> I think 'xset' is the way to do that. Perhaps that could be added to
> .Xsession? I don't think releng will have any issue with that.

I think I will hold off on that change. Let's see how our tests behave when the screen turns off. If that causes problems, I will add it later.
Comment 43 • 10 years ago (Assignee)
http://hg.mozilla.org/qa/puppet/rev/d896882232e1 (default)
http://hg.mozilla.org/qa/puppet/rev/29a04373a8e9 (production)

I kickstarted one of our new Ubuntu 14.04 nodes, and it successfully booted into X with the mozauto user active. As far as this bug is concerned, that should be everything I want to do right now. If we need special cases for OS X, a new bug will be filed.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED