Closed Bug 820235 Opened 12 years ago Closed 11 years ago

Perform verification of new Linux64 and Linux32 test reference platform on iX node

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: rail)

References

Details

Attachments

(5 files, 3 obsolete files)

Like the other new test reference platforms, we need to get the evaluation node hooked up to a dev buildbot master, run the full suite of tests against it, and document the test failure delta from the current ref platform.
Depends on: 820238
Blocks: 820243
(In reply to Chris Cooper [:coop] from comment #0)
> Like the other new test reference platforms, we need to get the evaluation
> node hooked up to a dev buildbot master, run the full suite of tests against
> it, and document the test failure delta from the current ref platform.

Hey Coop, do we have that test failure delta data from the current ref platform somewhere?
(In reply to Carsten Book [:Tomcat] from comment #1) 
> Hey Coop, do we have that test failure delta data from the current ref
> platform somewhere?

Once the new test machine is ready for verification, it's as simple as pulling down the log from a current test run on mozilla-central and making sure you test the same packaged build on the new machine, i.e. uploading that build to dev-stage so the new machine can grab it. Alternatively, you could run the same build against both the old ref platform and the new ref platform in staging. That would eliminate any production vs. staging differences.

Not really something we can do in advance unless you preserve the logs & build...best to wait until the new machine is set up to minimize result drift.
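
A minimal sketch of the log comparison described above, assuming both logs flag failures on pipe-separated lines containing `TEST-UNEXPECTED-`. The marker format, the function names, and the sample file names in the usage note are assumptions for illustration, not an existing releng tool.

```python
def unexpected_failures(log_text, marker="TEST-UNEXPECTED-"):
    """Return the set of test names found on lines containing the marker.

    Assumes pipe-separated lines such as
    'TEST-UNEXPECTED-FAIL | path/to/test | message' (an assumption).
    """
    failures = set()
    for line in log_text.splitlines():
        if marker not in line:
            continue
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 2:
            # The second field is taken to be the test name.
            failures.add(fields[1])
    return failures


def failure_delta(old_log, new_log):
    """Compare failures from the old and new reference platforms."""
    old = unexpected_failures(old_log)
    new = unexpected_failures(new_log)
    return {
        "new_only": sorted(new - old),  # fail only on the new platform
        "old_only": sorted(old - new),  # fail only on the old platform
        "both": sorted(old & new),      # fail on both platforms
    }
```

Running `failure_delta(open("old.log").read(), open("new.log").read())` on logs from the same packaged build would give the delta to document; the `new_only` list is the part that needs investigation.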
(In reply to Chris Cooper [:coop] from comment #2)

> Not really something we can do in advance unless you preserve the logs &
> build...best to wait until the new machine is setup to minimize result drift.

thanks! machine is up and working on the puppet stuff, will update the bug as i get news/results
Tomcat, are you working on getting this to use X11 on the external graphics card?  I can take a whack at it if you're not.  That should probably be in a sub-bug anyway, since this bug is about evaluating the result.
So bug 838351 is for the graphics and audio, which I and/or someone else from IT will take care of.  Tomcat, do you know of anything else that will need to be done before we put these into production?
Flags: needinfo?(cbook)
Tomcat, I'm going to grab ix-mn-linux64-001 for this purpose.
Attached patch bug820235.patch (obsolete) — Splinter Review
As I had mentioned in irc, the disableservices line is disabling services which aren't even installed in the Base group.  Puppet fails to stop a service it doesn't know about, so instead this just uninstalls them where they're installed.

This also adds support for ipmitools on Ubuntu, since it's required on this hardware.

Graphics work will be in a patch on bug 838351.
Attachment #710451 - Flags: review?(cbook)
Comment on attachment 710451 [details] [diff] [review]
bug820235.patch

s/modem-manager/modemmanager/
Otherwise looks good.
Attachment #710451 - Flags: feedback+
Thanks for checking - I fixed that but I think I forgot to push the diff update.
Comment on attachment 710451 [details] [diff] [review]
bug820235.patch

r+ also see the notes from rail :)
Attachment #710451 - Flags: review?(cbook) → review+
Flags: needinfo?(cbook)
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> So bug 838351 is for the graphics and audio, which I and/or someone else
> from IT will take care of.  Tomcat, do you know of anything else that will
> need to be done before we put these into production?

no, so far i don't know of anything else, thanks for checking, will update when i have something
Comment on attachment 710451 [details] [diff] [review]
bug820235.patch

Per IRC conversations with Rail, this needs a bit more (installing ubuntu-desktop) and some changes to disableservices (install, but disable, anacron, since it's required for ubuntu-desktop)
Attachment #710451 - Flags: review-
This should do the trick..
Attachment #710788 - Flags: review?(rail)
Comment on attachment 710788 [details] [diff] [review]
bug820235-r2.patch

Review of attachment 710788 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #710788 - Flags: review?(rail) → review+
Attachment #710451 - Attachment is obsolete: true
Close!  I landed this to avoid including linux_desktop on Darwin:

diff --git a/modules/toplevel/manifests/slave/test.pp b/modules/toplevel/manifests/slave/test.pp
index 4416dbd..ab1e9aa 100644
--- a/modules/toplevel/manifests/slave/test.pp
+++ b/modules/toplevel/manifests/slave/test.pp
@@ -6,11 +6,15 @@ class toplevel::slave::test inherits toplevel::slave {
     # so we get the GUI for free and just need to ensure VNC is enabled.
     include vnc
     include screenresolution::talos
-    include packages::linux_desktop
     include users::builder::autologin
     include talos
     include ntp::atboot
     include packages::fonts
     include tweaks::fonts
     include tweaks::cleanup
+
+    # this will get fixed in a subsequent patch for bug 838351
+    if ($::operatingsystem == 'Ubuntu') {
+        include packages::linux_desktop
+    }
 }
Blocks: linux64-ix-releng
No longer blocks: 820243
I think this is ready to go, and in fact may already be done.  Rail?
Flags: needinfo?(rail)
We still need to hook up these machines to one of the non-staging branches and run them in parallel with fedora slaves. This would require some changes in buildbot-configs (evil loops, of course) and probably a person from a-team to look at the possible failures.

We attached some of the machines to my staging master, replacing existing fedora slaves, but evaluating results without TBPL is hard.
Flags: needinfo?(rail)
(In reply to Rail Aliiev [:rail] from comment #17)
> We attached some of the machines to my staging master, replacing existing
> fedora slaves, but evaluating results without TBPL is hard.

Is cedar being specifically used for Win8 or could we also hook up the Fedora slaves there and just have the Windows guys ignore the Linux results and vice versa? Seems like the best use of existing resources to me.
Yeah, Cedar is my favorite too. :)
Attached patch sekrets — Splinter Review
Attachment #729206 - Flags: review?(catlee)
Attached patch buildapi — Splinter Review
Attachment #729207 - Flags: review?(catlee)
Attached patch buildbotcustom — Splinter Review
* use talos_slave_platforms by default and slave_platforms as fallback
Attachment #729211 - Flags: review?(catlee)
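
A rough sketch of the fallback described in the buildbotcustom patch note, assuming the platform configuration is a plain dict. Only the key names `talos_slave_platforms` and `slave_platforms` come from the comment; the function name and dict shape are hypothetical and this is not the code from the patch.

```python
def platforms_for_talos(platform_config):
    """Prefer 'talos_slave_platforms'; fall back to 'slave_platforms'."""
    if "talos_slave_platforms" in platform_config:
        return platform_config["talos_slave_platforms"]
    # No talos-specific list is defined, so use the general one.
    return platform_config["slave_platforms"]
```

An explicit membership test is used rather than `dict.get` with a default so that the fallback key is only looked up when it is actually needed.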
Attached patch configs (obsolete) — Splinter Review
It generates sane diffs:

config.py: https://gist.github.com/rail/5240897
builders: https://gist.github.com/rail/5240891
Attachment #729212 - Flags: review?(catlee)
Attached patch configs (obsolete) — Splinter Review
the only difference is number of slaves (100 vs 50) in production_config.py
Attachment #729212 - Attachment is obsolete: true
Attachment #729212 - Flags: review?(catlee)
Attachment #729232 - Flags: review?(catlee)
Attachment #729206 - Flags: review?(catlee) → review+
Attachment #729207 - Flags: review?(catlee) → review+
Attachment #729211 - Flags: review?(catlee) → review+
Comment on attachment 729232 [details] [diff] [review]
configs

Review of attachment 729232 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/BuildSlaves.py.template
@@ +25,5 @@
>      "ubuntu64_vm-b2g": "pass",
>      "ubuntu64_vm": "pass",
> +    "ubuntu32_hw": "pass",
> +    "ubuntu64_hw-b2g": "pass",
> +    "ubuntu64_hw": "pass",

nit: can you sort these platforms? maybe group all the ubuntu32 variants together, and then all the ubuntu64 variants together?
Attachment #729232 - Flags: review?(catlee) → review+
Attachment #729206 - Flags: checked-in+
Back out: http://hg.mozilla.org/build/buildbot-configs/rev/c5631ad322a0


INFO  - created  "bm18-tests1-linux" master, running checkconfig
INFO  - starting to print log file '/builds/buildbot/preproduction/slave/test-masters/buildbot-configs/test-output/bm18-tests1-linux-jp7wI4-checkconfig.log'
INFO  - /builds/buildbot/preproduction/slave/test-masters/sandbox/lib/python2.6/site-packages/twisted/mail/smtp.py:10: DeprecationWarning: the MimeWriter module is deprecated; use the email package instead
INFO  -   import MimeWriter, tempfile, rfc822
INFO  - Traceback (most recent call last):
INFO  -   File "/builds/buildbot/preproduction/slave/test-masters/sandbox/lib/python2.6/site-packages/buildbot-0.8.2_hg_41fc8a9db7a0_production_0.8-py2.6.egg/buildbot/scripts/runner.py", line 1042, in doCheckConfig
INFO  -     ConfigLoader(configFileName=configFileName)
INFO  -   File "/builds/buildbot/preproduction/slave/test-masters/sandbox/lib/python2.6/site-packages/buildbot-0.8.2_hg_41fc8a9db7a0_production_0.8-py2.6.egg/buildbot/scripts/checkconfig.py", line 31, in __init__
INFO  -     self.loadConfig(configFile, check_synchronously_only=True)
INFO  -   File "/builds/buildbot/preproduction/slave/test-masters/sandbox/lib/python2.6/site-packages/buildbot-0.8.2_hg_41fc8a9db7a0_production_0.8-py2.6.egg/buildbot/master.py", line 808, in loadConfig
INFO  -     % (b['name'], n))
INFO  - ValueError: builder Ubuntu HW 12.04 x64 cedar talos svgr uses undefined slave talos-linux64-ix-001
INFO  - finished printing log file '/builds/buildbot/preproduction/slave/test-masters/buildbot-configs/test-output/bm18-tests1-linux-jp7wI4-checkconfig.log'
ERROR - TEST-FAIL bm18-tests1-linux failed to run checkconfig


Hmmm. It worked fine on my <s>laptop</s> dev-master...
Attachment #729232 - Flags: checked-in+ → checked-in-
Attached patch configs v2 — Splinter Review
Hmmm, it turns out that running dump_master.py and builder_list.py doesn't check the configs...

A trivial interdiff: https://gist.github.com/rail/5248201

It passes test-master.sh now and has the same diff of list of builders.
Attachment #729232 - Attachment is obsolete: true
Attachment #729700 - Flags: review?(catlee)
Attachment #729700 - Flags: review?(catlee) → review+
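
The checkconfig failure in the backout above (a builder referencing a slave that is not defined) could be caught earlier with a check along these lines. This is a simplified stand-in, not buildbot's own validation: it assumes builders are dicts with `name` and `slavenames` keys, and the function name is hypothetical.

```python
def undefined_slaves(builders, defined_slaves):
    """Map each builder name to the slave names it uses that are not defined."""
    defined = set(defined_slaves)
    problems = {}
    for builder in builders:
        missing = [name for name in builder["slavenames"] if name not in defined]
        if missing:
            problems[builder["name"]] = missing
    return problems
```

An empty result means every builder only references known slaves; anything else lists the offending builder and slave names, which is the same information the `ValueError` in the log reports.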
random aside - can we make the hostnames consistent between HW and VMs? We currently have

tst-linux32-ec2-XXX
talos-linux32-ix-YYY

Does it still make sense to have 'talos' in the hostname?
b2g emulator tests are enabled on cedar in parallel with the old ones
Depends on: 858214
Depends on: 858587
(In reply to Chris AtLee [:catlee] from comment #33)
> random aside - can we make the hostnames consistent between HW and VMs? We
> currently have
> 
> tst-linux32-ec2-XXX
> talos-linux32-ix-YYY
> 
> Does it still make sense to have 'talos' in the hostname?

Yeah, I think this will be better.

Amy, how long might it take to rename the slaves?

From our side this will require changes in puppet patterns and buildbot configs.
Flags: needinfo?(arich)
Assignee: cbook → rail
So this is a significant couple of days worth of work to reimage all of the linux hosts we already have.  We would also need to change dns, inventory, and nagios.  We also would need to change all of inventory and dns for the windows xp and w7 hosts as well since the hardware is on order and all of these things have already been pre-populated.

I thought that talos was still a useful designator, because the whole reason we have physical hardware (vs the AWS machines) was because we *needed* it for talos (and graphics) tests (eg AWS can't do talos).  So to me it makes sense to still differentiate.
Flags: needinfo?(arich)
Depends on: 859867
Depends on: 861580
Depends on: 862327
Depends on: 863022
(In reply to Amy Rich [:arich] [:arr] from comment #36)
> So this is a significant couple of days worth of work to reimage all of the
> linux hosts we already have.  We would also need to change dns, inventory,
> and nagios.  We also would need to change all of inventory and dns for the
> windows xp and w7 hosts as well since the hardware is on order and all of
> these things have already been pre-populated.
> 
> I thought that talos was still a useful designator, because the whole reason
> we have physical hardware (vs the AWS machines) was because we *needed* it
> for talos (and graphics) tests (eg AWS can't do talos).  So to me it makes
> sense to still differentiate.

This probably got clarified somewhere else but in-house test machines will be doing mainly two types of jobs:
* unit tests that need graphics card support
* talos jobs

We also established a hostname naming convention that is going through review by Relops and DCops.
No longer depends on: 859867
ATM this platform can be used for Talos without any problems. B2G emulator failures are tracked by bug 850105.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
BTW, (almost) everything here applies to the 32-bit platform as well. Updating the summary accordingly.
Summary: Perform verification of new Linux64 test reference platform on iX node → Perform verification of new Linux64 and Linux32 test reference platform on iX node
Product: mozilla.org → Release Engineering
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard