Closed Bug 681726 Opened 13 years ago Closed 13 years ago

ensure that puppet can configure 10.6 testing slave.

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: jhford)

References

Details

Attachments

(1 file, 4 obsolete files)

We need to get a bunch of rev4 machines imaged and into production.  Lets do this with puppet
A couple issues that I have found so far:
1) xcode is installed when it really shouldn't be.  bug 681748
2) cscreen is a PPC only application, requiring rosetta to be installed. I assume this is something for screen resolution?  Either way, rosetta is an optional install in 10.6 and I feel is not something we should be installing on the slaves if at all avoidable.
It looks like there is an open source program that might be able to replace cscreen at http://hints.macworld.com/article.php?story=20090413120929454
I would be fine taking a replacement for cscreen (since we don't have the source code and it is obsolete) as long as we can source code of the replacement is open source and we have very clear instructions on how to build it.

This would help community members set things themselves.

Which ref page will you use?
I just ran through the unittest suite on these machines.  I didn't count passes and failures.  The timestamp is when the suite finished

1314233474 reftest rc: 0
1314236141 mochitest rc: 0
1314236468 mochitest-chrome rc: 0
1314237331 mochitest-browser-chrome rc: 0
1314237355 mochitest-ipcplugins rc: 0
1314237515 crashtest rc: 0
1314237792 jsreftest rc: 0
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #3)
> I would be fine taking a replacement for cscreen (since we don't have the
> source code and it is obsolete) as long as we can source code of the
> replacement is open source and we have very clear instructions on how to
> build it.

With something like this, we might only have close source choices.  I am not going to block on this for now.

> This would help community members set things themselves.
> 
> Which ref page will you use?

Do you mean a ReferencePlatforms/ page?  We don't currently have a useful one.  The complete set of non-puppet steps as of now are:

1) install base OS 10.6.8
2) update to 10.6.8 using http://support.apple.com/kb/DL1399
3) run this
curl -O http://downloads.puppetlabs.com/gems/facter-1.5.6.gem
curl -O http://projects.puppetlabs.com/attachments/download/584/puppet-0.24.8.gem
sudo gem install facter-1.5.6.gem puppet-0.24.8.gem
4) run puppet
sudo puppetd -v --test --server staging-puppet.build.mozilla.org

As so much of this is dependent on puppet to work, I think we should outline these steps in a text document in the puppet-manifests repository.
wget needs to be built+installed.

wget ftp://ftp.gnu.org/pub/gnu/wget/wget-1.10.2.tar.gz
tar zxf wget-1.10.2.tar.gz
cd wget-1.10.2
./configure --prefix=/usr/local --disable-nls #because we don't want to depend on gettext
make
make install DESTDIR=../root
wget http://hg.mozilla.org/build/puppet-manifests/raw-file/aa75d6e91838/create-dmg.sh
sh create-dmg.sh root/usr/local wget-1.10.2 wget /usr/
scp wget-1.10.2.dmg staging-puppet.build.mozilla.org:/N/staging/darwin10-i386/test/DMGs/
not sure why, but when I try to deploy the wget package to the slaves using

            package {
                "wget-1.10.2.dmg":
                    provider => pkgdmg,
                    ensure => installed,
                    source => "${platform_httproot}/DMGs/wget-1.10.2.dmg";
            }

I get the following (line 30 is the closing brace from above)
err: //Node[rev4-testing]/talosslave/talos_osx/Package[wget-1.10.2.dmg]/ensure: change from absent to present failed: Could not set present on ensure: undefined method `[]' for nil:NilClass at /etc/puppet/manifests/os/talos_osx.pp:30
running puppetd again with -d gave me the commands pkgdmg is trying to use.  

When I run these commands on the command line, I don't get a drive mounting.  When I try mounting this disk image with "open /tmp/wget-1.10.2.dmg" I am able to mount this drive.  I suspect that the failure of hdiutil is killing the ruby script.

rev4-testing:~ cltbld$ /usr/bin/curl -o /tmp/wget-1.10.2.dmg -C - -k -s --url http://staging-puppet.build.mozilla.org/staging/darwin10-i386/test/DMGs/wget-1.10.2.dmg
rev4-testing:~ cltbld$ sudo /usr/bin/hdiutil mount -plist -nobrowse -readonly -noidme -mountrandom /tmp /tmp/wget-1.10.2.dmg
Password:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>system-entities</key>
	<array>
		<dict>
			<key>content-hint</key>
			<string>Apple_partition_scheme</string>
			<key>dev-entry</key>
			<string>/dev/disk1</string>
			<key>potentially-mountable</key>
			<integer>0</integer>
			<key>unmapped-content-hint</key>
			<string>Apple_partition_scheme</string>
		</dict>
		<dict>
			<key>content-hint</key>
			<string>Apple_HFS</string>
			<key>dev-entry</key>
			<string>/dev/disk1s2</string>
			<key>mount-point</key>
			<string>/private/tmp/dmg.TxXG6q</string>
			<key>potentially-mountable</key>
			<integer>1</integer>
			<key>unmapped-content-hint</key>
			<string>Apple_HFS</string>
			<key>volume-kind</key>
			<string>hfs</string>
		</dict>
		<dict>
			<key>content-hint</key>
			<string>Apple_partition_map</string>
			<key>dev-entry</key>
			<string>/dev/disk1s1</string>
			<key>potentially-mountable</key>
			<integer>0</integer>
			<key>unmapped-content-hint</key>
			<string>Apple_partition_map</string>
		</dict>
	</array>
</dict>
</plist>
Depends on: 682080
It seems to work on reboot, but I am not able to run puppetd on the slave again successfully.  Requesting another mini in bug 682080 to do clean room verification.

I also found 'pmset' which we can use to adjust the power saving properties on the command line, so we can verify these settings on boot, and remove another manual step.

Regarding our dependency on Xcode, I have found that installing 'install_name_tool' manually by copying from my laptop to /usr/local/bin helps the python install move forward.

The newer issue is that we are creating our virtualenv without using --no-site-packages.  Because we are using site-packages, and distutils expects the .py files to be present, we are failing because the default OS X install only ships the .pyc and .pyo files.

  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.6/distutils/__init__.py", line 16, in <module>
    exec open(os.path.join(distutils_path, '__init__.py')).read()
IOError: [Errno 2] No such file or directory: '/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/distutils/__init__.py'

rev4-testing:~ cltbld$ file /System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/distutils/__init__.py*
/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/distutils/__init__.pyc: python 2.6 byte-compiled
/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/distutils/__init__.pyo: python 2.6 byte-compiled
I wrote a script to automatically unpack all the files in the Xcode dmg to figure out where the Python framwork files are.  

#!/bin/bash
set -x

for i in /Volumes/Xcode\ and\ iOS\ SDK/Packages/*.pkg ; do
    pkgname=$(basename "$i" | sed 's/\(.*\)[.]pkg$/\1/')
    mkdir "$pkgname"
    cd "$pkgname"
    xar -xf "$i" Payload
    mv Payload Payload.gz #i thought there was a way to force gunzip to unzip. -f maybe?
    gunzip Payload.gz
    cat Payload | cpio -id
    cd ..
done

Turns out that they are in /Volumes/Xcode and iOS SDK/Packages/DevSDK.pkg.  Inspection of this package tells me that there are *tons* of other frameworks in there and that we probably don't want to install them as well.  I am going to create a dmg that will install the Python framework files for Python 2.5 and Python 2.6 and see if that gets us a working.  Since all the files that are part of the sdk are .py, .pyc, .pyo or .h files, I think we are fine to package these files and deploy them as is.

create-dmg.sh Python.framework python-sdk-3.2.6 python-sdk /System/Library/Frameworks/

I am creating a test dmg using
create-dmg.sh Python.framework test-python-sdk-3.2.6 test-python-sdk /test/System/Library/Frameworks/ to install locally before I deploy this to the testing slave
Yay!  the dmg created in comment 10 above allows virtualenv to work and means that I am able to run puppet on these machines without error.  Things that I know I need to finish adding to the manifests are:
-cscreen or replacement
-apache starting
-apache config
-buildbot auto starting
Attached patch wip patch (obsolete) — Splinter Review
this patch allows puppet to sync on the rev4 testing slave that I have
Assignee: nobody → jhford
Status: NEW → ASSIGNED
(In reply to John Ford [:jhford] from comment #11)
> -buildbot auto starting

Actually, I am making an educated guess that this is happening properly and the thing lacking here is the entry in slavealloc for 'rev4-testing', and there is no fallback tac file.  I am going to try adding this slave to slavealloc and my personal testing master.
(In reply to John Ford [:jhford] from comment #13)
> (In reply to John Ford [:jhford] from comment #11)
> > -buildbot auto starting
> 
> Actually, I am making an educated guess that this is happening properly and
> the thing lacking here is the entry in slavealloc for 'rev4-testing', and
> there is no fallback tac file.  I am going to try adding this slave to
> slavealloc and my personal testing master.

Starting runslave manually on the command line does result in the slave starting, not sure why it doesn't start on boot
Just finished an XPCShell test run and found that sudo isn't working properly, another thing to fix.
also, need to figure out how to turn off wifi with puppet
So a recap of manual pre-puppet steps as I understand them:

-base install isn't in puppet because that's not what puppet does
-Mac OS X 10.6.8 Combined update V1.1 isn't in puppet because it needs a restart and doesn't reliable run through the installer (i usually need at least 2 attempts, but it does work)
-enabling VNC is manual because my attempts at using the ARD kickstart script require having the cleartext password in the puppet manifests.
-bootstrap puppet. I can write a short script that will download + install puppet and facter


The things I need to do before we are ready for a larger scale test (10 minis):
-set up apache configs to enable localhost for talos
-verify that puppet is able to configure a pristine system (rev4-testing2)
Just failed talos because of a lack of PyYAML.  Will compile that locally

wget http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz
tar zxf PyYAML-3.10.tar.gz
cd PyYAML-3.10
python2.6 setup.py install --root=root
python2.5 setup.py install --root=root
wget http://hg.mozilla.org/build/puppet-manifests/raw-file/aa75d6e91838/create-dmg.sh
sh create-dmg.sh root/Library/Python pyyaml-3.10 pyyaml /Library/
scp pyyaml-3.10.dmg staging-puppet.build.mozilla.org:/N/staging/darwin10-i386/test/DMGs/

A note for the Apache configuration is that a sample URL used is

localhost/page_load_test/sunspider/3d-cube.html

This means that ~/talos-slave/talos-data/ needs to be the root of http://localhost
I noticed that the mac mini launced the screen saver.  I have done some searching and haven't found a way to disable the screen saver using the command line.  Unfortunately, this likely means a manual step as part of the OS preperation

System Preferences -> Desktop and Screen Saver pane -> Screen Saver tab -> Start screen saver: Never
You can probably figure out what plist it is by modifying the preference and checking which plist gets modified:

armenzg-laptop $ ls -lrt ~/Library/Preferences/ByHost/ | grep "13:5"
-rw-------  1 armenzg  staff      89 30 Aug 13:55 com.apple.iCal.helper.3C71DF91-6E36-553D-8C00-B6E04CEB3AB0.plist
-rw-------  1 armenzg  staff     209 30 Aug 13:57 com.apple.screensaver.3C71DF91-6E36-553D-8C00-B6E04CEB3AB0.plist

Then you can set the read/set the values:
armenzg-laptop $ defaults read ByHost/com.apple.screensaver.3C71DF91-6E36-553D-8C00-B6E04CEB3AB0 idleTime
2100
armenzg-laptop $ defaults write ByHost/com.apple.screensaver.3C71DF91-6E36-553D-8C00-B6E04CEB3AB0 idleTime -int 0
armenzg-laptop $ defaults read ByHost/com.apple.screensaver.3C71DF91-6E36-553D-8C00-B6E04CEB3AB0 idleTime
0

Also without having to figure out the hash:
armenzg-laptop $ defaults -currentHost read com.apple.screensaver idleTime
2100
armenzg-laptop $ defaults -currentHost write com.apple.screensaver idleTime 0
armenzg-laptop $ defaults -currentHost read com.apple.screensaver idleTime
0

I remember I fiddled with this some time ago but I don't remember if I figured out how to make the preference permanent after reboot.

Let me know if this helps.


The string of numbers in the plist name is the machine's MAC address
http://archive.macosxlabs.org/documentation/hard_disk_maintenance/byhost_renamer_shell/byhost_renamer_script.html

[1] http://hintsforums.macworld.com/archive/index.php/t-28474.html
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #20)
> Also without having to figure out the hash:
> armenzg-laptop $ defaults -currentHost read com.apple.screensaver idleTime
> 2100
> armenzg-laptop $ defaults -currentHost write com.apple.screensaver idleTime 0
> armenzg-laptop $ defaults -currentHost read com.apple.screensaver idleTime
> 0

I tried that, but ScreenSaverEngine doesn't seem to read it.

> I remember I fiddled with this some time ago but I don't remember if I
> figured out how to make the preference permanent after reboot.
> 
> Let me know if this helps.
> 
> 
> The string of numbers in the plist name is the machine's MAC address
> http://archive.macosxlabs.org/documentation/hard_disk_maintenance/
> byhost_renamer_shell/byhost_renamer_script.html
> 
> [1] http://hintsforums.macworld.com/archive/index.php/t-28474.html

Hmm, that doesn't seem to load.

A friend helped me figure out how to get the hash.  I have written a script that turns off the screensaver.  I believe this would survive reboot, but I am going to run it every boot with puppet to ensure things work


#!/bin/bash
# This script is used to turn off the screen saver on systems that run 
# Mac OS X 10.6
uuid=$(ioreg -rd1 -c IOPlatformExpertDevice | grep -E '(UUID)' | sed -e "s/.*\"\(.*\)\"$/\1/")
echo "UUID for $(hostname): $uuid"
echo "idleTime before setting: " $(defaults read ByHost/com.apple.screensaver.${uuid} idleTime)
echo $(defaults write ByHost/com.apple.screensaver.${uuid} idleTime 0)
echo "idleTime after setting: " $(defaults read ByHost/com.apple.screensaver.${uuid} idleTime)
echo "Starting screensaver to make sure that it picks up the setting change"
/System/Library/Frameworks/ScreenSaver.framework/Resources/ScreenSaverEngine.app/Contents/MacOS/ScreenSaverEngine &
ssepid=$!
sleep 1
kill $ssepid
I am not sure if the start and stop of the screensaverengine is strictly required, I'll test in a bit
Just finished a test with zandr and it looks like 1600x1200x32 is not an available mode with the dongle unplugged at boot.  The best choice is 1680x1050x32 and the one that "screenresolution set 1600x1200x32" selects is 1280x1024x32, which is not acceptable.

On the upside, i did verify that asking for a previously valid resolution does fail as it is expected to with the screenresolution program.
(In reply to John Ford [:jhford] from comment #24)
> Just finished a test with zandr and it looks like 1600x1200x32 is not an
> available mode with the dongle unplugged at boot.  The best choice is
> 1680x1050x32 and the one that "screenresolution set 1600x1200x32" selects is
> 1280x1024x32, which is not acceptable.

actually, a more correct phrasing is "Running screenresoution set 1600x1200x32 fails and the resolution is set to 1280x1024x32 by some other means"
 
> On the upside, i did verify that asking for a previously valid resolution
> does fail as it is expected to with the screenresolution program.
Depends on: 684346
Ok, I think setting the before meta parameter to reference the buildbot::install Class is what is needed here.  I was able to run puppet and get the machine into staging.
Sadly, this cannot happen while existing 10.6 slaves are hooked up to puppet masters.
Attachment #561254 - Flags: review?
Attachment #555887 - Attachment is obsolete: true
This is an interim patch that short circuits classes/talosslave.pp

I understand that classes/talosslave is there for a reason, but part of this project is the removal of all rev3 talos machines.  Once the rev4 machines are in production, I will be removing talos_osx.pp and replacing it with talos_osx2.pp and using talosslave.pp again in the site file.

There will also be more machines (talos-r4-snow-001 through 010) added to the manifests, exactly as rev4-testing and rev4-testing2 are in staging.pp.
Attachment #561254 - Attachment is obsolete: true
Attachment #561254 - Flags: review?
Attachment #561558 - Flags: review?(bhearsum)
Comment on attachment 561558 [details] [diff] [review]
allow both rev3 and rev4 10.6 slaves to sync with same puppet master

Review of attachment 561558 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to John Ford [:jhford] from comment #17)
> -enabling VNC is manual because my attempts at using the ARD kickstart
> script require having the cleartext password in the puppet manifests.

That's fine. We have secrets.pp for this. It's important to have passwords changeable via Puppet, please do this. Let me know if you have any trouble with secrets.pp

(In reply to John Ford [:jhford] from comment #18)
> Just failed talos because of a lack of PyYAML.  Will compile that locally
> 
> wget http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz
> tar zxf PyYAML-3.10.tar.gz
> cd PyYAML-3.10
> python2.6 setup.py install --root=root
> python2.5 setup.py install --root=root

Why do you need to run setup.py through both Python's?

I've made some comments below about exec {}s that may or may not need to run every time. If they don't need to run every time here's a few strategies for fixing them:
* If there's a file whose existence is dependent on the state you're trying to achieve, you can use an onlyif like this to control whether or not the command is run: 'onlyif => "/bin/bash -c '! test -f [file]'"
* If there's a command you can use to decide whether or not the exec needs to run, you can use a similar onlyif

r- specifically for the VNC password issue and the comments about ssh/apache below. I'd like to see the other execs improved where possible, too. However, the vast majority of this is looking good.

::: os/talos_osx2.pp
@@ +1,1 @@
> +# talos_osx.pp

Comment needs updating.

@@ +2,5 @@
> +# This file is divided into a large case statement which
> +# separates Mac OS X 10.6 and 10.5 followed by a section
> +# where common configuration is noted
> +
> +class talos_osx2 {

Minor nit: Please be more descriptive here. Eg: talos_osx_rev4

@@ +98,5 @@
> +                    command => "/usr/sbin/systemsetup -setrestartfreeze on";
> +                "disallow-sleep-button":
> +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> +                "set-time-server":
> +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";

Do all of these need to be run every time?

@@ +100,5 @@
> +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> +                "set-time-server":
> +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";
> +                # Using -w will enable the service for future boots, this command does tick the box
> +                # for remote-login in the Sharing prefpane (survives reboot)

If this turns ssh for future boots, why do we need to run this every time?

@@ +107,5 @@
> +                # apachectl internally loads the org.apache.httpd.plist daemon.  Assuming that it
> +                # uses the launchctl -w flag to persist state. this command does tick the box in
> +                # the sharing prefpane (survives reboot)
> +                "turn-on-httpd":
> +                    command => "/usr/sbin/apachectl restart";

Can't this be configured to start at boot?

@@ +143,5 @@
> +    }
> +
> +    # Mac OS X 10.6 and 10.5 common configuration
> +
> +    file {

I'm not pleased to see all of the duplication of these file checks (and some others above), I would've preferred to see common things factored out to a common class. Not going to r- based on this, but please consider this for future patches.

@@ +213,5 @@
> +    exec {
> +        disable-indexing:
> +            command => "/usr/bin/mdutil -a -i off";
> +        remove-index:
> +            command => "/usr/bin/mdutil -a -E";

Do these need to be run every time?
Attachment #561558 - Flags: review?(bhearsum) → review-
(In reply to Ben Hearsum [:bhearsum] from comment #29)
> Comment on attachment 561558 [details] [diff] [review]
> allow both rev3 and rev4 10.6 slaves to sync with same puppet master
> 
> Review of attachment 561558 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> (In reply to John Ford [:jhford] from comment #17)
> > -enabling VNC is manual because my attempts at using the ARD kickstart
> > script require having the cleartext password in the puppet manifests.
> 
> That's fine. We have secrets.pp for this. It's important to have passwords
> changeable via Puppet, please do this. Let me know if you have any trouble
> with secrets.pp

How were passwords being changed before on 10.6 rev3 slaves?  I don't see anything in talos_osx.pp that suggests that the machines were consuming the information in the secrets.pp file and from what I can see, talos_osx was the only manifest that the slaves had applied.

> (In reply to John Ford [:jhford] from comment #18)
> > Just failed talos because of a lack of PyYAML.  Will compile that locally
> > 
> > wget http://pyyaml.org/download/pyyaml/PyYAML-3.10.tar.gz
> > tar zxf PyYAML-3.10.tar.gz
> > cd PyYAML-3.10
> > python2.6 setup.py install --root=root
> > python2.5 setup.py install --root=root
> 
> Why do you need to run setup.py through both Python's?

Because its trivially easy and uses a tiny amount of disk space to have PyYAML for both 2.5 and 2.6.  As I understand, we are currently using python2.5 for talos.  This removes a roadblock to using python 2.6 should we ever decide to upgrade.

> I've made some comments below about exec {}s that may or may not need to run
> every time. If they don't need to run every time here's a few strategies for
> fixing them:
> * If there's a file whose existence is dependent on the state you're trying
> to achieve, you can use an onlyif like this to control whether or not the
> command is run: 'onlyif => "/bin/bash -c '! test -f [file]'"
> * If there's a command you can use to decide whether or not the exec needs
> to run, you can use a similar onlyif
> 
> r- specifically for the VNC password issue and the comments about ssh/apache
> below. I'd like to see the other execs improved where possible, too.
> However, the vast majority of this is looking good.
> 
> ::: os/talos_osx2.pp
> @@ +1,1 @@
> > +# talos_osx.pp
> 
> Comment needs updating.

Sure, but this is not intended to be the long term solution.  As soon as rev3 10.6 machines are turned off, this file/class will be moving back to being talos_osx

> @@ +2,5 @@
> > +# This file is divided into a large case statement which
> > +# separates Mac OS X 10.6 and 10.5 followed by a section
> > +# where common configuration is noted
> > +
> > +class talos_osx2 {
> 
> Minor nit: Please be more descriptive here. Eg: talos_osx_rev4

Done

> @@ +98,5 @@
> > +                    command => "/usr/sbin/systemsetup -setrestartfreeze on";
> > +                "disallow-sleep-button":
> > +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> > +                "set-time-server":
> > +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";
> 
> Do all of these need to be run every time?

I am not sure, but they run very quickly and the onlyif parameter would be the systemsetup command with -get<command> with some shell script to evaluate the outcome.  I don't think that adding the complexity is worth it, either from a performance or correctness standpoint.

> @@ +100,5 @@
> > +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> > +                "set-time-server":
> > +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";
> > +                # Using -w will enable the service for future boots, this command does tick the box
> > +                # for remote-login in the Sharing prefpane (survives reboot)
> 
> If this turns ssh for future boots, why do we need to run this every time?

It's not destructive, I don't know what to configure as the onlyif parameter, and it actively ensures that it is on.
 
> @@ +107,5 @@
> > +                # apachectl internally loads the org.apache.httpd.plist daemon.  Assuming that it
> > +                # uses the launchctl -w flag to persist state. this command does tick the box in
> > +                # the sharing prefpane (survives reboot)
> > +                "turn-on-httpd":
> > +                    command => "/usr/sbin/apachectl restart";
> 
> Can't this be configured to start at boot?

It is.  As mentioned in the comment, I believe that apachectl internally uses the launchctl -w flag, which configures the daemon to start at boot.  I guess it could be "apachectl start", but the incremental cost to restart over start is 0.04s, hardly worth optimizing.  Even a start is cheap at about 0.02s.

> @@ +143,5 @@
> > +    }
> > +
> > +    # Mac OS X 10.6 and 10.5 common configuration
> > +
> > +    file {
> 
> I'm not pleased to see all of the duplication of these file checks (and some
> others above), I would've preferred to see common things factored out to a
> common class. Not going to r- based on this, but please consider this for
> future patches.

This is a temporary workaround that will last as long as the rev3 10.6 machines.  The plan is for this file to replace the talos_osx.pp file when the 10.6 rev3 machines are turned off.

> @@ +213,5 @@
> > +    exec {
> > +        disable-indexing:
> > +            command => "/usr/bin/mdutil -a -i off";
> > +        remove-index:
> > +            command => "/usr/bin/mdutil -a -E";
> 
> Do these need to be run every time?

No clue, but that's how they have always been.  It's not destructive and the onlyif metaparameter would likely require running mdutil, shell, grep... to figure out whether or not to run mdutil.

As for performance, it is 0.15s - 0.5s to run through turning off indexes, when already off.
(In reply to John Ford [:jhford] from comment #30)
> > That's fine. We have secrets.pp for this. It's important to have passwords
> > changeable via Puppet, please do this. Let me know if you have any trouble
> > with secrets.pp
> 
> How were passwords being changed before on 10.6 rev3 slaves?  I don't see
> anything in talos_osx.pp that suggests that the machines were consuming the
> information in the secrets.pp file and from what I can see, talos_osx was
> the only manifest that the slaves had applied.

The rev3 manifests predate secrets.pp. Now that we have it we should be using it - regardless of what the rev3 manifests do.

> > Why do you need to run setup.py through both Python's?
> 
> Because its trivially easy and uses a tiny amount of disk space to have
> PyYAML for both 2.5 and 2.6.  As I understand, we are currently using
> python2.5 for talos.  This removes a roadblock to using python 2.6 should we
> ever decide to upgrade.

Okay, good to know.

> > @@ +98,5 @@
> > > +                    command => "/usr/sbin/systemsetup -setrestartfreeze on";
> > > +                "disallow-sleep-button":
> > > +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> > > +                "set-time-server":
> > > +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";
> > 
> > Do all of these need to be run every time?
> 
> I am not sure, but they run very quickly and the onlyif parameter would be
> the systemsetup command with -get<command> with some shell script to
> evaluate the outcome.  I don't think that adding the complexity is worth it,
> either from a performance or correctness standpoint.
> > @@ +100,5 @@
> > > +                    command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
> > > +                "set-time-server":
> > > +                    command => "/usr/sbin/systemsetup -setnetworktimeserver ntp1.build.mozilla.org";
> > > +                # Using -w will enable the service for future boots, this command does tick the box
> > > +                # for remote-login in the Sharing prefpane (survives reboot)
> > 
> > If this turns ssh for future boots, why do we need to run this every time?
> 
> It's not destructive, I don't know what to configure as the onlyif
> parameter, and it actively ensures that it is on.

Please have at least a quick try at doing this. This argument doesn't hold in 2 years from now when we have 3 or 4 times the number of commands like this. I don't think it's worth spending half the day on, but it does have value.

> > @@ +107,5 @@
> > > +                # apachectl internally loads the org.apache.httpd.plist daemon.  Assuming that it
> > > +                # uses the launchctl -w flag to persist state. this command does tick the box in
> > > +                # the sharing prefpane (survives reboot)
> > > +                "turn-on-httpd":
> > > +                    command => "/usr/sbin/apachectl restart";
> > 
> > Can't this be configured to start at boot?
> 
> It is.  As mentioned in the comment, I believe that apachectl internally
> uses the launchctl -w flag, which configures the daemon to start at boot.  I
> guess it could be "apachectl start", but the incremental cost to restart
> over start is 0.04s, hardly worth optimizing.  Even a start is cheap at
> about 0.02s.

Sorry, I'm not following. Why do we need to run _anything_ via apachectl at boot if it's already configured to start at boot?

> > @@ +143,5 @@
> > > +    }
> > > +
> > > +    # Mac OS X 10.6 and 10.5 common configuration
> > > +
> > > +    file {
> > 
> > I'm not pleased to see all of the duplication of these file checks (and some
> > others above), I would've preferred to see common things factored out to a
> > common class. Not going to r- based on this, but please consider this for
> > future patches.
> 
> This is a temporary workaround that will last as long as the rev3 10.6
> machines.  The plan is for this file to replace the talos_osx.pp file when
> the 10.6 rev3 machines are turned off.

I'm aware that this is intended to be a temporary workaround, but we don't have a great track record with things like that, so I generally don't like a "do it dirty because it will die soon" approach.

> > @@ +213,5 @@
> > > +    exec {
> > > +        disable-indexing:
> > > +            command => "/usr/bin/mdutil -a -i off";
> > > +        remove-index:
> > > +            command => "/usr/bin/mdutil -a -E";
> > 
> > Do these need to be run every time?
> 
> No clue, but that's how they have always been.  It's not destructive and the
> onlyif metaparameter would likely require running mdutil, shell, grep... to
> figure out whether or not to run mdutil.
> 
> As for performance, it is 0.15s - 0.5s to run through turning off indexes,
> when already off.

I'm more concerned with correctness than performance. It's silly, and clogs up the Puppet output to pointlessly run commands at every boot.
(In reply to Ben Hearsum [:bhearsum] from comment #31)
> (In reply to John Ford [:jhford] from comment #30)
> The rev3 manifests predate secrets.pp. Now that we have it we should be
> using it - regardless of what the rev3 manifests do.

I would like to treat that as a separate project.  Having password management that didn't already exist is out of the scope of this project.  This project is about bringing new rev4 minis up to par with rev3 minis so we can repurpose the rev3 minis.  General puppet enhancements should be handled in a new project, imo.

> > It's not destructive, I don't know what to configure as the onlyif
> > parameter, and it actively ensures that it is on.
> 
> Please have at least a quick try at doing this. This argument doesn't hold
> in 2 years from now when we have 3 or 4 times the number of commands like
> this. I don't think it's worth spending half the day on, but it does have
> value.

I could have the onlyif that does something like "echo | nc localhost 22 | grep SSH-2.0", but keep in mind that it is going to slow down the overall process.  That onlyif is going to take 0.4s, where enabling ssh (redundantly) is going to take 0.02s.

> > It is.  As mentioned in the comment, I believe that apachectl internally
> > uses the launchctl -w flag, which configures the daemon to start at boot.  I
> > guess it could be "apachectl start", but the incremental cost to restart
> > over start is 0.04s, hardly worth optimizing.  Even a start is cheap at
> > about 0.02s.
> 
> Sorry, I'm not following. Why do we need to run _anything_ via apachectl at
> boot if it's already configured to start at boot?

To ensure it's started.  Like ssh, we could use an onlyif ('curl -I localhost | grep "HTTP/1.1 200 OK') for a 0.04s time savings.

> > This is a temporary workaround that will last as long as the rev3 10.6
> > machines.  The plan is for this file to replace the talos_osx.pp file when
> > the 10.6 rev3 machines are turned off.
> 
> I'm aware that this is intended to be a temporary workaround, but we don't
> have a great track record with things like that, so I generally don't like a
> "do it dirty because it will die soon" approach.

It isn't a 'do it dirty' approach, it is a pragmatic approach.  If it allays your concerns, I can prepare the patch for when rev3 minis are taken offline before I deploy this change.

> > As for performance, it is 0.15s - 0.5s to run through turning off indexes,
> > when already off.
> 
> I'm more concerned with correctness than performance. It's silly, and clogs
> up the Puppet output to pointlessly run commands at every boot.

This is existing code that has been in place since May, I don't see any reason to change it.
(new patch coming up, as soon as I can test it)
(In reply to John Ford [:jhford] from comment #32)
> (In reply to Ben Hearsum [:bhearsum] from comment #31)
> > (In reply to John Ford [:jhford] from comment #30)
> > The rev3 manifests predate secrets.pp. Now that we have it we should be
> > using it - regardless of what the rev3 manifests do.
> 
> I would like to treat that as a separate project.  Having password
> management that didn't already exist is out of the scope of this project. 
> This project is about bringing new rev4 minis up to par with rev3 minis so
> we can repurpose the rev3 minis.  General puppet enhancements should be
> handled in a new project, imo.

I don't agree. Bringing up a new platform is an ideal time to make changes like this. I do _not_ want us to have another platform on which we have no password management strategy. Addressing this before we roll out a pool of machines is highly desirable.

> > > It's not destructive, I don't know what to configure as the onlyif
> > > parameter, and it actively ensures that it is on.
> > 
> > Please have at least a quick try at doing this. This argument doesn't hold
> > in 2 years from now when we have 3 or 4 times the number of commands like
> > this. I don't think it's worth spending half the day on, but it does have
> > value.
> 
> I could have the onlyif that does something like "echo | nc localhost 22 |
> grep SSH-2.0", but keep in mind that it is going to slow down the overall
> process.  That onlyif is going to take 0.4s, where enabling ssh
> (redundantly) is going to take 0.02s.
> 
> > > It is.  As mentioned in the comment, I believe that apachectl internally
> > > uses the launchctl -w flag, which configures the daemon to start at boot.  I
> > > guess it could be "apachectl start", but the incremental cost to restart
> > > over start is 0.04s, hardly worth optimizing.  Even a start is cheap at
> > > about 0.02s.
> > 
> > Sorry, I'm not following. Why do we need to run _anything_ via apachectl at
> > boot if it's already configured to start at boot?
> 
> To ensure it's started.  Like ssh, we could use an onlyif ('curl -I
> localhost | grep "HTTP/1.1 200 OK') for a 0.04s time savings.
> 
> > > This is a temporary workaround that will last as long as the rev3 10.6
> > > machines.  The plan is for this file to replace the talos_osx.pp file when
> > > the 10.6 rev3 machines are turned off.
> > 
> > I'm aware that this is intended to be a temporary workaround, but we don't
> > have a great track record with things like that, so I generally don't like a
> > "do it dirty because it will die soon" approach.
> 
> It isn't a 'do it dirty' approach, it is a pragmatic approach.  If it allays
> your concerns, I can prepare the patch for when rev3 minis are taken offline
> before I deploy this change.
> 
> > > As for performance, it is 0.15s - 0.5s to run through turning off indexes,
> > > when already off.
> > 
> > I'm more concerned with correctness than performance. It's silly, and clogs
> > up the Puppet output to pointlessly run commands at every boot.
> 
> This is existing code that has been in place since May, I don't see any
> reason to change it.

I'm not going to argue further on the rest of this.
(In reply to Ben Hearsum [:bhearsum] from comment #34)
> (In reply to John Ford [:jhford] from comment #32)
> > (In reply to Ben Hearsum [:bhearsum] from comment #31)
> > > (In reply to John Ford [:jhford] from comment #30)
> > > The rev3 manifests predate secrets.pp. Now that we have it we should be
> > > using it - regardless of what the rev3 manifests do.
> > 
> > I would like to treat that as a separate project.  Having password
> > management that didn't already exist is out of the scope of this project. 
> > This project is about bringing new rev4 minis up to par with rev3 minis so
> > we can repurpose the rev3 minis.  General puppet enhancements should be
> > handled in a new project, imo.
> 
> I don't agree. Bringing up a new platform is an ideal time to make changes
> like this. I do _not_ want us to have another platform on which we have no
> password management strategy. Addressing this before we roll out a pool of
> machines is highly desirable.

As discussed in mozbuild, we don't currently have a way of doing this at all on Mac.  I asked around in #puppet and the best response I got was that the user/group management on mac was questionable at best, and that was for the latest puppet.  I wrote a POC python script to do this with subprocess and select and no matter what I did, the passwd program seemed to open a new stdin, replacing the one that s.PIPE was supposed to create.
When I try setting the vnc password on the command line, i get into a state where the VNC password doesn't work at all.
Attached patch with onlyif metaparams (obsolete) — Splinter Review
This has onlyif metaparameters for ssh.  I have also added a file resource that ensures that Screen Sharing is ticked in the sharing preferences.
Attachment #561558 - Attachment is obsolete: true
Attachment #561946 - Flags: review?(bhearsum)
Comment on attachment 561946 [details] [diff] [review]
with onlyif metaparams

>+++ b/os/talos_osx_rev4.pp
>@@ -0,0 +1,230 @@
>+# talos_osx_rev4.pp
>+# This file is divided into a large case statement which
>+# separates Mac OS X 10.6 and 10.5 followed by a section
>+# where common configuration is noted
>+
>+class talos_osx_rev4 {

In my opinion it might be simpler and more obvious if you split the 10.6 and 10.5 up into separate imported classes for this. (Since extreme nesting of this stuff is harder to read/skim when you have it the way this patch is... imo)

something like:

class talos_osx_rev4_105 {
  ...
}

class talos_osx_rev4_106 {
  ...
}

Then use imports inside the case, rather than a large case section, and use the common in the talos_osx_rev4 class like current.
Comment on attachment 561946 [details] [diff] [review]
with onlyif metaparams

You can use grep -v instead of switching the return values manually. r+ though.
Attachment #561946 - Flags: review?(bhearsum) → review+
Blocks: 683715
Because i wasn't able to test the bootstrapping code much, I think I found a problem.  Because this problem isn't fatal, the second run of puppet might be fixing it.

This is the interdiff

~/puppet-manifests $ diff before.diff fix.diff 
72c72
< +                    before => Class["buildslave::install"];
---
> +                    before => [Class["buildslave::install"], Class["python::virtualenv"]];
Turns out that comment 40 doesn't work.  This is kind of a hack, but it does work

                "/usr/local/bin/install_name_tool":
                    source => "${platform_fileroot}/usr/local/bin/install_name_tool",
                    owner => "root",
                    group => "staff",
                    mode => 755,
                    before => Package['python-sdk-3.2.6.dmg'];

I say its a hack because really, the install_name_tool should 'before' the virtual env.  Because python::virtualenv is a defined resource type, not a class or a native type, I don't believe I can reference in a metaparam.

Because xcode is installed before any of the buildslave stuff is run, and the buildslave stuff is the only thing creating venvs, this should ensure that install_name_tool is installed before venv.  No matter what, venv fails catastrophically until *both* install_name_tool and the python sdk are installed.
(In reply to John Ford [:jhford] from comment #41)
> Turns out that comment 40 doesn't work.  This is kind of a hack, but it does
> work
> 
>                 "/usr/local/bin/install_name_tool":
>                     source =>
> "${platform_fileroot}/usr/local/bin/install_name_tool",
>                     owner => "root",
>                     group => "staff",
>                     mode => 755,
>                     before => Package['python-sdk-3.2.6.dmg'];
> 
> I say its a hack because really, the install_name_tool should 'before' the
> virtual env.  Because python::virtualenv is a defined resource type, not a
> class or a native type, I don't believe I can reference in a metaparam.
> 
> Because xcode is installed before any of the buildslave stuff is run, and
> the buildslave stuff is the only thing creating venvs, this should ensure
> that install_name_tool is installed before venv.  No matter what, venv fails
> catastrophically until *both* install_name_tool and the python sdk are
> installed.

That is working.

I also fixed another random bug.  Because puppet was implicitly ordering pkgdmgs after installing pkgdmg, setting dependencies on the python-sdk changed when dmgs are installed.  Because of this, puppet was erroring out.  I think adding this helps

"python-sdk-3.2.6.dmg":
                    provider => pkgdmg,
                    ensure => installed,
                    source => "${platform_httproot}/DMGs/python-sdk-3.2.6.dmg",
                    require => File['/Library/Ruby/Gems/1.8/gems/puppet-0.24.8/lib/puppet/provider/package/pkgdmg.rb'],
                    before => Class["buildslave::install"];
Looks like something funky is happening with paths.  I changed install_name_tool from /usr/local/bin/ to /usr/bin/ and that seems to be fixing a lot of my virtualenv issues.
Blocks: 690236
While checking results I noticed that some of the machines had their screensavers running.  Turns out that I wrote and deployed a script to turn off the screensaver, but didn't include an exec resource to actually run it.

Strangely, this script needs to be run as cltbld, not root.  To work around this, I run the command under sudo.

Here is the interdiff

diff -u b/os/talos_osx_rev4.pp b/os/talos_osx_rev4.pp
--- b/os/talos_osx_rev4.pp
+++ b/os/talos_osx_rev4.pp
@@ -99,6 +99,9 @@
                     command => "/usr/sbin/systemsetup -setrestartpowerfailure on";
                 "restart-on-panic":
                     command => "/usr/sbin/systemsetup -setrestartfreeze on";
+                "disable-screensaver":
+                    command => "/usr/bin/sudo -u cltbld /usr/local/bin/disable-screensaver.sh",
+                    require => File["/usr/local/bin/disable-screensaver.sh"];
                 "disallow-sleep-button":
                     command => "/usr/sbin/systemsetup -setallowpowerbuttontosleepcomputer off";
                 "set-time-server":
Attachment #561946 - Attachment is obsolete: true
Attachment #564724 - Flags: review?
Comment on attachment 564724 [details] [diff] [review]
slight update to patch

see comment 44 on this bug for the interdiff between this and the r+'d patch
Attachment #564724 - Flags: review? → review?(bhearsum)
Attachment #564724 - Flags: review?(bhearsum) → review+
Puppet is able to correctly configure talos-r4-snow-000 testing slaves.  There are issues bootstrapping the machines, but that is tracked in other bugs.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: