Verify machine ix-mn-w864-00{2...6} suitable for RelEng win8 test use

RESOLVED FIXED

Status

Infrastructure & Operations
CIDuty
RESOLVED FIXED
6 years ago
2 months ago

People

(Reporter: Q, Assigned: Tomcat)

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

6 years ago
Machine ix-mn-w864-002 has a W8 build that needs testing by releng. It logs in under cltbld and starts processes automatically. Relops needs someone to test it and let us know whether the install is up to snuff and works as releng expects.

Comment 1

6 years ago
Tomcat, it looks like this is already hooked up to a staging master -- it's running jobs, anyway.  Can you have a look at the job results, and do any other analysis on the host that you'd like, and let us know here what (if anything) is wrong?

We have a google doc giving specifics on how these machines need to be configured here:
  https://docs.google.com/document/d/1gRzAybpgj4WFi-OO38eskKyT8PjsbopV_6GQMC9hvpI/edit#
which might help you -- or might not.  We think we've implemented everything on that list, although there are two unknowns at the end which we don't have any data on:
 *  Do we need specific sound drivers?
 *  “Mozilla Maintenance Service”?

At any rate, we'll keep that up to date with any modifications that come from your testing.  I expect we'll do a few you-test/we-fix rounds before we have something we're all happy with.  As you probably know, we're a few weeks away from the hardware being ready to run, so time is of the essence.
Blocks: 780050
(Assignee)

Comment 2

6 years ago
Thanks, Dustin. I'll take care of this next week since I'm on buildduty today, but that's definitely my priority for next week!
Status: NEW → ASSIGNED

Comment 3

6 years ago
Tomcat emailed me about being unable to get to the host.

I alerted Q, who power-cycled it to bring it back.

We're looking into what's going wrong.  Tomcat, please check in with Q to see what the status is when you have a chance to look at the host.
(Assignee)

Comment 4

6 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #3)
> Tomcat emailed me about being unable to get to the host.
> 
> I alerted Q, who power-cycled it to bring it back.
> 
> We're looking into what's going wrong.  Tomcat, please check in with Q to
> see what the status is when you have a chance to look at the host.

Hm, it seems it's still down:

PING ix-mn-linux64-001.test.releng.scl3.mozilla.com (10.26.56.129): 56 data bytes
64 bytes from 10.26.56.129: icmp_seq=0 ttl=62 time=183.200 ms
64 bytes from 10.26.56.129: icmp_seq=1 ttl=62 time=234.462 ms
64 bytes from 10.26.56.129: icmp_seq=2 ttl=62 time=322.635 ms
64 bytes from 10.26.56.129: icmp_seq=3 ttl=62 time=181.778 ms
64 bytes from 10.26.56.129: icmp_seq=4 ttl=62 time=180.473 ms
^C
--- ix-mn-linux64-001.test.releng.scl3.mozilla.com ping statistics ---
5 packets transmitted, 5 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 180.473/220.510/322.635/54.990 ms
cbook-1119:releng cbook$ ping ix-mn-w864-002
PING ix-mn-w864-002.wintest.releng.scl3.mozilla.com (10.26.40.25): 56 data bytes
Request timeout for icmp_seq 0
Request timeout for icmp_seq 1
Request timeout for icmp_seq 2
Request timeout for icmp_seq 3
Request timeout for icmp_seq 4
Request timeout for icmp_seq 5

Is there some kind of firewall, or do I need to connect to a special VPN or have particular access bits? The pings above were done while I was connected to the build network VPN.
(Assignee)

Comment 5

6 years ago
Also pinged Q on IRC.

Comment 6

6 years ago
Sorry, it's not pingable (that's on our TODO list - host firewall disallows it).  You should be able to get in with VNC.
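
Since the host firewall drops ICMP, a failed ping says little about whether the machine is actually up. A minimal Python sketch (an illustration, not part of the RelEng tooling) probes a TCP port instead. It assumes VNC listens on its conventional port 5900, which this bug does not confirm.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Unlike ping, this does not depend on ICMP echo being allowed through the
    host firewall; it only needs the target port (e.g. VNC) to accept connections.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass in Python 3) and DNS resolution failures.
        return False
```

For example, `tcp_port_open("ix-mn-w864-002.wintest.releng.scl3.mozilla.com", 5900)` would report whether the VNC port answers, even while ping times out.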
(Assignee)

Comment 7

6 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Sorry, it's not pingable (that's on our TODO list - host firewall disallows
> it).  You should be able to get in with VNC.

Oh cool, that worked, thanks! Working on testing now!

Comment 8

6 years ago
Created attachment 710253 [details] [diff] [review]
[wip] win8 configs

Comment 9

6 years ago
Created attachment 710254 [details] [diff] [review]
wip - win8 buildbotcustom

Here are the patches from the summer that I used in case they are of any use.

Comment 10

6 years ago
FTR, if it was running on a buildbot master, that's because until last week I still had masters running with the changes I attached.

Comment 11

6 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> Sorry, it's not pingable (that's on our TODO list - host firewall disallows
> it).  You should be able to get in with VNC.

1) What else is still on your TODO list? Will any of those impact RelEng evaluation in staging? 

2) From IRC, there were questions about whether the new imaging machines/process were used for this.

Comment 12

6 years ago
(In reply to John O'Duinn [:joduinn] from comment #11)
> 1) What else is still on your TODO list? Will any of those impact RelEng
> evaluation in staging? 

It's in bug 780050.

> 2) from irc, there were questions about whether new imaging machines/process
> was used for this.

It was.

Comment 13

6 years ago
Morphing summary, as a 2nd node is being imaged the same way to speed up testing.
Summary: Test w8 machine ix-mn-w864-002 for releng build use → Verify machine ix-mn-w864-00{1,2} suitable for RelEng win8 test use

Comment 14

6 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> (In reply to John O'Duinn [:joduinn] from comment #11)
> > 1) What else is still on your TODO list? Will any of those impact RelEng
> > evaluation in staging? 
> 
> It's in bug 780050.
...and in a meeting just now, we consolidated it with two other TODO lists. Expect the complete list here once Melissa has finished working through timelines.


Tomcat: from a meeting with IT just now, we believe that despite these remaining TODO items, it's still worthwhile for you to evaluate this machine on cedar ASAP. We'd expect some oranges, which will clear up as the list is resolved, but it would be good to know if you hit other problems too.

Only after cedar is green can we consider enabling this on other branches...

> > 2) from irc, there were questions about whether new imaging machines/process
> > was used for this.
> It was.
gtk, thanks.

Comment 15

6 years ago
Per IRC with arr just now:

1) ix-mn-w864-002 is close to, but not identical to, the final spec of the machines being delivered Friday.

2) Bug 838348 is tracking the imaging of more machines, which are identical to -002.
Depends on: 838348
Summary: Verify machine ix-mn-w864-00{1,2} suitable for RelEng win8 test use → Verify machine ix-mn-w864-00{2...6} suitable for RelEng win8 test use

Comment 16

6 years ago
(In reply to John O'Duinn [:joduinn] from comment #14)
> (In reply to Dustin J. Mitchell [:dustin] from comment #12)
> > (In reply to John O'Duinn [:joduinn] from comment #11)
> > > 1) What else is still on your TODO list? Will any of those impact RelEng
> > > evaluation in staging? 
> > 
> > It's in bug 780050.
> ...and in meeting just now, consolidated with two other TODO lists. Expect
> complete list here, once Melissa has finished working through timelines.

The complete list was instead posted to https://bugzilla.mozilla.org/show_bug.cgi?id=780050#c73

> Tomcat: from meeting with IT just now, we believe that despite these
> remaining todo items, its still worthwhile you doing eval of this machine
> asap on cedar. We'd expect some oranges, which will clear up as the list is
> resolved, but it would be good to know if you hit other problems also.
> 
> Only after cedar is green, can we consider enabling this on other branches...
> 
> 
> 
> > > 2) from irc, there were questions about whether new imaging machines/process
> > > was used for this.
> > It was.
> gtk, thanks.

Comment 17

6 years ago
Created attachment 710979 [details] [diff] [review]
win8 configs
Attachment #710253 - Attachment is obsolete: true

Comment 18

6 years ago
Created attachment 710981 [details] [diff] [review]
win8 buildbotcustom
Attachment #710254 - Attachment is obsolete: true

Comment 19

6 years ago
I have ix-mn-w864-002 running on my buildbot master:
http://dev-master01.build.scl1.mozilla.com:8042/builders/Rev3%20WINNT%206.2%20cedar%20opt%20test%20mochitest-other

You can trigger more sendchanges by running this script:
https://github.com/armenzg/playground/blob/master/mozilla/scripts/sendchanges.py
or rebuild current failed jobs.

We're currently having trouble with mozharness on win8 (I worked on this with Matt earlier). We preferred to make the code more flexible.

We have to figure out why it is not running with the highest privileges:

14:54:48     INFO - Copy/paste: c:\talos-slave\test\build\venv\Scripts\mozinstall -h
Traceback (most recent call last):
  File "scripts/scripts/desktop_unittest.py", line 364, in <module>
    desktop_unittest.run()
  File "C:\slave\test\scripts\mozharness\base\script.py", line 735, in run
    self._possibly_run_method(method_name, error_if_missing=True)
  File "C:\slave\test\scripts\mozharness\base\script.py", line 694, in _possibly_run_method
    return getattr(self, method_name)()
  File "C:\slave\test\scripts\mozharness\mozilla\testing\testbase.py", line 250, in install
    output = self.get_output_from_command(cmd + ['-h'])
  File "C:\slave\test\scripts\mozharness\base\script.py", line 591, in get_output_from_command
    cwd=cwd, stderr=tmp_stderr, env=env)
  File "c:\mozilla-build\python\lib\subprocess.py", line 679, in __init__
    errread, errwrite)
  File "c:\mozilla-build\python\lib\subprocess.py", line 893, in _execute_child
    startupinfo)
WindowsError: [Error 740] The requested operation requires elevation

Or, if we are running with the highest privileges, why mozinstall would not work.
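
Error 740 in the traceback above is the Win32 code ERROR_ELEVATION_REQUIRED: Windows refused to start the child process (`mozinstall -h`) because it needs administrator rights that the calling process does not have. A hypothetical Python 3 helper, not mozharness's actual code, could recognise that case and fail with a more explicit message:

```python
import subprocess

# Win32 system error code ERROR_ELEVATION_REQUIRED.
ERROR_ELEVATION_REQUIRED = 740


def is_elevation_error(exc: BaseException) -> bool:
    """True if exc is the Windows 'requires elevation' error."""
    # On Windows, OSError (WindowsError in Python 2) carries the Win32 code in
    # .winerror; on other platforms the attribute is normally absent.
    return isinstance(exc, OSError) and getattr(exc, "winerror", None) == ERROR_ELEVATION_REQUIRED


def run_checked(cmd):
    """Run cmd and return its stdout, turning error 740 into a clearer message."""
    try:
        return subprocess.check_output(cmd, universal_newlines=True)
    except OSError as exc:
        if is_elevation_error(exc):
            raise RuntimeError(
                "%r needs an elevated (administrator) process; check that the "
                "caller, e.g. the scheduled task, runs with highest privileges" % (cmd,)
            ) from exc
        raise
```

Non-zero exit codes still surface as `subprocess.CalledProcessError`; only failures to start the process are inspected for code 740.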

I've pointed my master to:
http://hg.mozilla.org/users/armenzg_mozilla.com/mozharness
and have landed a couple of suggested fixes.

Comment 20

6 years ago
I suspect that's because the scheduled task is not running with highest privs, as my detective work this morning uncovered.  We'll get that patched up before the next round.

Comment 21

6 years ago
I manually fixed -002:
 - run with highest privileges
 - only run on cltbld login
So your testing can continue.

I'll update the TODOs in bug 780050.
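
The two settings above ("run with highest privileges", "only run on cltbld login") correspond to fields of a Windows Task Scheduler task definition. A minimal Python sketch, assuming the task is defined in Task Scheduler's XML format, builds a LogonTrigger restricted to cltbld plus a Principal with RunLevel HighestAvailable. The command path is a hypothetical placeholder, and a real definition would probably also carry Settings and RegistrationInfo elements.

```python
import xml.etree.ElementTree as ET

# Task Scheduler 1.2 XML namespace.
NS = "http://schemas.microsoft.com/windows/2004/02/mit/task"


def build_task_xml(user: str, command: str, arguments: str = "") -> str:
    """Build a minimal task that starts `command` when `user` logs on,
    running with the highest available privileges."""
    ET.register_namespace("", NS)

    def sub(parent, tag, text=None, **attrs):
        el = ET.SubElement(parent, "{%s}%s" % (NS, tag), attrs)
        if text is not None:
            el.text = text
        return el

    task = ET.Element("{%s}Task" % NS, version="1.2")

    # Fire only on this user's logon, not on every logon.
    trigger = sub(sub(task, "Triggers"), "LogonTrigger")
    sub(trigger, "Enabled", "true")
    sub(trigger, "UserId", user)

    # Run in the user's interactive session with the highest available rights.
    principal = sub(sub(task, "Principals"), "Principal", id="Author")
    sub(principal, "UserId", user)
    sub(principal, "LogonType", "InteractiveToken")
    sub(principal, "RunLevel", "HighestAvailable")

    exec_el = sub(sub(task, "Actions", Context="Author"), "Exec")
    sub(exec_el, "Command", command)
    if arguments:
        sub(exec_el, "Arguments", arguments)

    return ET.tostring(task, encoding="unicode")
```

The generated XML could then be imported with `schtasks /Create /TN <name> /XML <file>`; check the file encoding Task Scheduler expects before relying on it.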
(Assignee)

Comment 22

6 years ago
It seems I also ran into the issue from comment #19: I hit the UAC prompt and it asks for the admin password. When I enter the root passwords I have, or the cltbld password, it fails. Bug 780050 mentions a fixed password; any pointers to this password?
(Assignee)

Comment 23

6 years ago
Just an idea: it seems that Administrator is currently the only user with admin rights on the machine (-002). I'm not sure what impact it would have to give cltbld admin rights as well.
(Assignee)

Comment 24

6 years ago
(In reply to Carsten Book [:Tomcat] from comment #22)
> seems is run also into the issue from comment #19 i run into the UAC and its
> asking for the admin password. When i enter the root passwords i have or
> cltbld password it fails - bug 780050 mentions a fixed password, any
> pointers to this password?

Tried 8 different passwords so far, from kickstart to old root passwords, but none of them worked :(

Comment 25

6 years ago
My understanding of the spec is that cltbld *not* have admin rights.  We can certainly change that if you'd like, but that makes the simulation of a user's machine less realistic.

I'll send the password out of band, but as of your morning the scheduled task should be running with elevated privs, so that shouldn't be necessary.  Can you confirm this is still the case?
(Assignee)

Comment 26

6 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
> My understanding of the spec is that cltbld *not* have admin rights.  We can
> certainly change that if you'd like, but that makes the simulation of a
> user's machine less realistic.
> 
> I'll send the password out of band, but as of your morning the scheduled
> task should be running with elevated privs, so that shouldn't be necessary. 
> Can you confirm this is still the case?

Yeah, confirmed. It's running now too. Thanks for sending the password; I can also confirm I have full access now. Thanks again, Dustin!
(Assignee)

Updated

6 years ago
Depends on: 840458
(Assignee)

Updated

6 years ago
Depends on: 840461
(Assignee)

Updated

6 years ago
Depends on: 839052
(Assignee)

Comment 27

6 years ago
Filed some bugs (see the dependency list) for problems coming up on the dev master; investigating.
(Assignee)

Updated

6 years ago
Depends on: 840919
(Assignee)

Updated

6 years ago
Depends on: 840920
(Assignee)

Updated

6 years ago
Depends on: 840926
(Assignee)

Comment 28

6 years ago
So far bug 840926 seems to be the most important one, since it prevents tests from running successfully, so it really is a blocker.
No longer depends on: 840920
No longer depends on: 840919
(Assignee)

Updated

6 years ago
Depends on: 841362
(Assignee)

Comment 29

6 years ago
It also seems that bug 841362 is somehow a regression on -002.
(Assignee)

Comment 30

6 years ago
Update on this bug: it seems we were able to work around bug 840926, and the patch is now in there.

01:26:40     INFO - Return code: 0
01:26:40     INFO - TinderboxPrint: reftest-crashtest<br/>2225/0/15
01:26:40     INFO - # TBPL SUCCESS #
01:26:40     INFO - The reftest suite: crashtest ran with return status: SUCCESS
01:26:40     INFO - Copying logs to upload dir...
01:26:40     INFO - mkdir: C:\slave\test\build\upload\logs

\o/
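
The TinderboxPrint line in the log carries the suite's summary counts. A small Python sketch can pull them out of a log line; it assumes the three numbers are passed/failed/todo counts, which is how these summaries are usually read but is not stated in this bug.

```python
import re

# Matches lines like "TinderboxPrint: reftest-crashtest<br/>2225/0/15".
SUMMARY_RE = re.compile(r"TinderboxPrint: (?P<suite>[^<]+)<br/>(?P<counts>\d+/\d+/\d+)")


def parse_summary(line: str):
    """Return (suite, passed, failed, todo), or None if the line doesn't match.

    Assumes the three numbers are passed/failed/todo counts.
    """
    m = SUMMARY_RE.search(line)
    if m is None:
        return None
    passed, failed, todo = (int(n) for n in m.group("counts").split("/"))
    return m.group("suite"), passed, failed, todo
```

Under that reading, the crashtest run above had 2225 passes, 0 failures and 15 todo items, consistent with the `# TBPL SUCCESS #` line.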
(Assignee)

Updated

6 years ago
Depends on: 843561

Comment 31

6 years ago
You're not still using these machines, right? Can we close this bug?

Comment 32

6 years ago
WRT the scope of this bug, the machines were suitable for releng.

There were some mozharness issues that needed to be addressed.

Feel free to take back any ix-mn-w864-* machines.

We will evaluate the new set of machines in bug 844130.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering

Updated

2 months ago
Product: Release Engineering → Infrastructure & Operations