Closed Bug 624044 Opened 14 years ago Closed 14 years ago

Install DirectX and upgrade NVIDIA driver to latest version on all Windows 7 test slaves

Categories: Release Engineering :: General, defect, P3
Platform: All / Windows 7
Type: defect
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: bjacob, Assigned: armenzg
Whiteboard: [unittest][win7][manual intervention]
Attachments: 2 files

I had very weird failures on Windows 7 test slaves in the WebGL mochitest I'm trying to land.

Then I got VNC access to talos-r3-w7-048; I first upgraded the NVIDIA GeForce driver but the problems persisted; then I upgraded DirectX to latest version on it, and that fixed the failures.

So for sure, upgrading both NVIDIA GeForce driver and DirectX solves my problem. What I don't know is whether upgrading DirectX alone would be enough. I suspect so, but I can't be sure, and moreover we are soon going to be blacklisting old driver versions (they cause too much trouble) so now would be a great time to upgrade the NVIDIA driver too.

Here's the link I used to upgrade DirectX:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=2da43d38-db71-4c1b-bc6a-9b6652cd92a3&displaylang=en

And to upgrade NVIDIA:
http://www.nvidia.com/object/win7-winvista-32bit-260.99-whql-driver.html

Thank you very much. I really need this for bug 582053 -- land the WebGL mochitest
FYI: doing this could impact any unittest or Talos suite run on those same Windows test machines, for all project branches and supported release branches. We'll need to coordinate the timing of this upgrade. Still figuring out details, but this *may* need a tree closure while we upgrade all the machines manually.

From IRC with bjacob: we'll raise this in the Tuesday platform meeting and in the newsgroups.
Doing this upgrade will require manual intervention (UI interaction over VNC) on more than 50 slaves.

A tree closure is needed.
Whiteboard: [unittest][win7][manual intervention]
Two things:
1) This will probably impact Talos numbers.
2) We can probably partly automate the driver installation with an AutoIt script.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Priority: -- → P2
Benoit, from the bottom of the page you linked I can reach:
http://www.microsoft.com/downloads/en/details.aspx?displaylang=en&FamilyID=3b170b25-abab-4bc3-ae91-50ceb6d8fa8d
which is the "DirectX End-User Runtimes (June 2010)" full installer (directx_Jun2010_redist.exe - 95.6MB 6/7/2010 9.29.1962) rather than the web installer.

I am still trying to verify which files get deployed once this update is installed.
I also read that it cannot be uninstalled.

So far I have been able to verify that the dlls for the drivers go into C:\Windows\System32 and are dated with 10/16/2010.



Benoit, I need the following information:
* what do I need to run on a slave without the DirectX and driver updates to see it fail?
* I assume that if I run the same steps on talos-r3-w7-048, they should pass

I need to do this verification before I can proceed.
> 
> Benoit I need the following information:
> * what do I need to run on a slave without the DirectX update and drivers
> update that it will fail?

This patch that I just pushed to try:
http://hg.mozilla.org/try/rev/1677b25190b9

It's going to give oranges anyway about 'Test expected to fail, but passed'. That's OK. If you get no oranges other than those, the test slave is good. If you get 'Test failed' oranges, that's bad.

The only mochitest that you need to care about here is: mozilla-central/content/canvas/test/webgl/test_webgl_conformance_test_suite.html
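For anyone checking a slave by hand, the pass/fail rule above can be written down as a small script. This is a minimal Python sketch, not part of any Mozilla harness: it assumes the two quoted messages appear verbatim in the test log lines, and `slave_is_good` is a hypothetical name.

```python
# Hypothetical helper. The two message strings are quoted from this bug;
# it is an assumption that they appear verbatim in the log lines.
EXPECTED_FAIL_PASSED = "Test expected to fail, but passed"
REAL_FAILURE = "Test failed"


def slave_is_good(log_lines):
    """Return True if the only oranges are 'expected to fail, but passed'."""
    for line in log_lines:
        if EXPECTED_FAIL_PASSED in line:
            continue  # harmless orange: a known failure now passes
        if REAL_FAILURE in line:
            return False  # a genuine failure: this slave is not good
    return True
```

Feed it the lines of the test_webgl_conformance_test_suite.html run; a False result means at least one 'Test failed' orange showed up.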
I was partly wrong here (the WebGL and DirectX part).

Yesterday Armen showed me that these WebGL tests were actually successful on test slaves without any DirectX or driver upgrade.

The reason why my mochitest was failing seems to be that my changeset was extracting a DirectX runtime DLL (d3dx9_42.dll) from the DirectX CAB archive into dist/bin, and that was causing the failures. Yesterday I did a new tryserver push without this change, and it didn't have these failures. This change was only needed for Win2k3 anyway, which is going away.

So we don't currently seem to need the DirectX upgrade anymore, and this also doesn't seem to block WebGL anymore.

However, we still need the NVIDIA driver upgrade as it blocks hardblocker bug 623338.
Blocks: 623338
No longer blocks: 582053
Summary: Upgrade DirectX and NVIDIA driver to latest version on all Windows 7 test slaves → Upgrade NVIDIA driver to latest version on all Windows 7 test slaves
My comment 6 was itself wrong:

(In reply to comment #6)
> Yesterday Armen showed me that these WebGL tests were actually successful on
> test slaves without any DirectX or driver upgrade.

Actually that's not the case: there had been a DirectX runtime upgrade (DirectX SDK DLLs). What I didn't realize was that the link I put in comment 0 was the DirectX runtime.

Today I logged back into test slaves, here's what I found:
 * test slaves don't have the DirectX runtime installed, so they fail to use ANGLE, and fall back to OpenGL. With that, the WebGL tests succeed, but ideally we want to test ANGLE.
 * installing the DirectX runtime (link from comment 0) makes them succeed at using ANGLE. WebGL tests still succeed.

> The reason why my mochitest was failing seems to be that my changeset was
> extracting a DirectX runtime DLL (d3dx9_42.dll) from the DirectX CAB archive
> into dist/bin, and that was causing the failures. Yesterday I did a new
> tryserver push without this change, and it didn't have these failures. This
> change was only needed anyway for Win2k3 which is going away anyway.

So, those weird failures happen when we ship this d3dx9_42.dll on a machine that doesn't have the DirectX runtime installed. Will continue investigating this locally.

> So we don't currently seem to need the DirectX upgrade anymore, and this also
> doesn't seem to block WebGL anymore.

Need to discuss this with Vlad.

> 
> However, we still need the NVIDIA driver upgrade as it blocks hardblocker bug
> 623338.

That remains equally true, it's independent of the above.
Benoit, I had a little bit of time to poke at this today.
It seems that the second machine already had the DirectX runtime installed (either installed by me by mistake or left over from a previous loan; we don't reimage machines fast enough).

After installation on a clean machine we go from 14 files matching "d3d" in C:\Windows\System32 to 61 files. I will write up the complete details tomorrow.
DirectX has a set of core and optional components. The runtime installer simply adds the optional components and updates. Each new package contains everything in the previous one plus the new releases. We could even deploy our own custom package and use a silent install (if UAC were not in the way), but we are not going to play around with that: I would not want to leave out something now that the GFX team might need later.

For more information you can read this:
* http://msdn.microsoft.com/en-us/library/ee416805%28v=vs.85%29.aspx

There is both a web installer and a redistributable package (at this point directx_Jun2010_redist.exe). The redistributable package just unzips all the CAB files, some DLLs and dxsetup.exe; after that you have to run dxsetup.exe.

Installing through the web installer is cleaner and faster. It requires only a few clicks (uncheck installing the Bing Bar), and the download and installation are fast (1-2 mins).

The NVIDIA driver update is:
* GeForce/ION Driver Release 260
  Version: 260.99; Date: 2010.10.25; Size: 96.3MB
  260.99_desktop_win7_winvista_32bit_english_whql.exe

The installation process is:
* wget http://us.download.nvidia.com/Windows/260.99/260.99_desktop_win7_winvista_32bit_english_whql.exe
* Run 260.99_desktop_win7_winvista_32bit_english_whql.exe (say "Yes" to UAC)
* It unzips its contents to C:\NVIDIA\DisplayDriver\260.99\Vista
* When it finishes, it opens a wizard in which we click "Next" a few times
  * It takes a couple of minutes
* Reboot the machine for the changes to take effect

More on deployment plans later.

Let's also keep track of the changes that happen:

Before DirectX update
#####################
C:\Windows\System32>ls -l d3d*                   
-rw-rw-rw-  2 cltbld 0 1030144 2009-07-13 18:15 d3d10.dll
-rw-rw-rw-  2 cltbld 0  161792 2009-07-13 18:15 d3d10_1.dll
-rw-rw-rw-  2 cltbld 0  217088 2009-07-13 18:15 d3d10_1core.dll
-rw-rw-rw-  2 cltbld 0  190464 2009-07-13 18:15 d3d10core.dll
-rw-rw-rw-  2 cltbld 0  489984 2009-07-13 18:15 d3d10level9.dll
-rw-rw-rw-  2 cltbld 0  825344 2009-07-13 18:15 d3d10warp.dll
-rw-rw-rw-  2 cltbld 0  522752 2009-07-13 18:15 d3d11.dll
-rw-rw-rw-  2 cltbld 0 1036800 2009-07-13 18:15 d3d8.dll
-rw-rw-rw-  2 cltbld 0   11264 2009-07-13 18:15 d3d8thk.dll
-rw-rw-rw-  2 cltbld 0 1826816 2009-07-13 18:15 d3d9.dll
-rw-rw-rw-  2 cltbld 0  386048 2009-07-13 18:15 d3dim.dll
-rw-rw-rw-  2 cltbld 0  817664 2009-07-13 18:15 d3dim700.dll
-rw-rw-rw-  2 cltbld 0  593920 2009-07-13 18:15 d3dramp.dll
-rw-rw-rw-  2 cltbld 0   53760 2009-07-13 18:15 d3dxof.dll

After DirectX update
####################
The 14 files above plus 47 more, for a grand total of 61 files.
(Not listed here, to avoid cluttering the comment.)
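The before/after counts (14 vs. 61 files matching d3d* in C:\Windows\System32) can be reproduced with a short script instead of eyeballing `ls` output. A minimal Python sketch; `count_matching` is a hypothetical helper, and matching case-insensitively is an assumption based on Windows file-name semantics.

```python
import fnmatch
import os


def count_matching(directory, pattern="d3d*"):
    """Count regular files in `directory` whose names match `pattern`.

    Names are lowercased before matching because Windows file names are
    case-insensitive (an assumption of this sketch, not of `ls`).
    """
    pattern = pattern.lower()
    return sum(
        1
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
        and fnmatch.fnmatch(name.lower(), pattern)
    )
```

Running `count_matching(r"C:\Windows\System32")` on a slave before and after the runtime install should show the jump this bug reports (14 before, 61 after on a clean machine).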

Before NVIDIA drivers update
############################
* Open Dxdiag.exe
* Go to the "Display" tab
* Look for the driver's info:
Version 8.17.11.9562
Date 11/20/2009 6:34:54PM

NOTE: only the directory "195.62" exists in C:\NVIDIA\DisplayDriver

After NVIDIA drivers update
###########################
* Open Dxdiag.exe
* Go to the "Display" tab
* Look for the driver's info:
Version 8.17.12.6099
Date 10/16/2010 10:55:00AM

NOTE: now both directories "195.62" and "260.99" exist in C:\NVIDIA\DisplayDriver
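The two Dxdiag version strings line up with the driver directory names: 8.17.11.9562 with 195.62 and 8.17.12.6099 with 260.99. Assuming that pattern holds in general (last digit of the third field followed by the four digits of the fourth), a Python sketch of the conversion looks like this; the function name is hypothetical.

```python
def nvidia_release_from_windows_version(version):
    """Map a Dxdiag driver version such as '8.17.12.6099' to an NVIDIA
    release number such as '260.99'.

    Assumed convention (matches both data points in this bug): take the
    last digit of the third field, append the fourth field zero-padded to
    four digits, then split as NNN.NN.
    """
    parts = version.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dot-separated fields, got {version!r}")
    digits = parts[2][-1] + parts[3].zfill(4)
    return f"{digits[:3]}.{digits[3:]}"
```

This makes it quick to confirm from Dxdiag which NVIDIA release is actually active, independent of which directories exist under C:\NVIDIA\DisplayDriver.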

In C:\Windows\System32 you will see:
-rw-rw-rw-   1 cltbld 0    57960 2010-10-16 11:55 OpenCL.dll               
-rw-rw-rw-   1 cltbld 0  5473896 2010-10-16 11:55 nvwgf2um.dll                
-rw-rw-rw-   1 cltbld 0 14899816 2010-10-16 11:55 nvoglv32.dll     
-rw-rw-rw-   1 cltbld 0     4962 2010-10-16 11:55 nvinfo.pb            
-rw-rw-rw-   1 cltbld 0   813672 2010-10-16 11:55 nvgenco322030.dll           
-rw-rw-rw-   1 cltbld 0   888424 2010-10-16 11:55 nvdispco322050.dll
-rw-rw-rw-   1 cltbld 0   319080 2010-10-16 11:55 nvdecodemft.dll               
-rw-rw-rw-   1 cltbld 0 10023528 2010-10-16 11:55 nvd3dum.dll       
-rw-rw-rw-   1 cltbld 0  2912360 2010-10-16 11:55 nvcuvid.dll                   
-rw-rw-rw-   1 cltbld 0  2666600 2010-10-16 11:55 nvcuvenc.dll      
-rw-rw-rw-   1 cltbld 0  4837480 2010-10-16 11:55 nvcuda.dll                    
-rw-rw-rw-   1 cltbld 0 13019752 2010-10-16 11:55 nvcompiler.dll           
-rw-rw-rw-   1 cltbld 0  1719912 2010-10-16 11:55 nvapi.dll                     
-rw-rw-rw-   1 cltbld 0  2079336 2010-10-16 13:42 nvsvc.dll        
-rw-rw-rw-   1 cltbld 0  3420776 2010-10-16 13:42 nvcpl.dll                     
-rwxrwxrwx   1 cltbld 0   600680 2010-10-16 13:42 nvvsvc.exe                  
-rw-rw-rw-   1 cltbld 0   110696 2010-10-16 13:42 nvmctray.dll
Blocks: 625849
== through command prompt
cd C:\Users\cltbld\Desktop
wget http://pastebin.mozilla.org/?dl=941198 -O directx_nvidia.bat

== through VNC (or remote desktop)
* double-click directx_nvidia.bat on the Desktop
* click through the UI prompts ("Yes", "Next", etc.)
* shutdown -f -r -t 0

Estimated time per slave: 2-3 mins.
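Since the NVIDIA installer leaves one directory per version under C:\NVIDIA\DisplayDriver (see the NOTEs above), a quick post-deployment check could look for the new one. A minimal Python sketch with hypothetical helper names; the directory only proves the package was unpacked, not that the wizard completed, so Dxdiag remains the authoritative check.

```python
import os


def installed_driver_dirs(root=r"C:\NVIDIA\DisplayDriver"):
    """Return the sorted version directory names under NVIDIA's unpack root."""
    if not os.path.isdir(root):
        return []  # nothing unpacked yet (or a non-NVIDIA slave)
    return sorted(
        name for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


def slave_updated(root=r"C:\NVIDIA\DisplayDriver", expected="260.99"):
    """True if the installer for `expected` was unpacked on this slave."""
    return expected in installed_driver_dirs(root)
```

An updated slave should report both "195.62" and "260.99"; a slave that only reports "195.62" still needs the procedure above.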

I will put the slaves in the staging pool and let them run.
I will also see whether I can figure out how to do this with AutoIt (I won't beat my head against it; if I can't, I'll fall back on manual intervention).
Attachment #503940 - Flags: review?(coop)
Joe, as you asked, here is the spreadsheet that compares Talos numbers for a staging slave (changes applied) with a production slave (no changes).

Could you please have a look at the variances in the numbers?
I have highlighted in orange those that changed enough that "my eye" would notice.

I would like to know if you see anything dramatic that should make us investigate further.

Tomorrow/Wednesday I could pick up the numbers again if you wish (the colo failure meant I gave you numbers from an old changeset for svg and tp4).
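The "changed enough that my eye notices" comparison can be made repeatable by computing the relative change per test. A minimal Python sketch, assuming the spreadsheet data has been exported into two dicts keyed by test name; `flag_changes` is hypothetical and the 5% threshold is a stand-in, not a Talos rule.

```python
def flag_changes(staging, production, threshold=0.05):
    """Flag tests whose staging score differs noticeably from production.

    staging, production: dicts mapping test name -> score.
    threshold: relative change (0.05 == 5%) above which a test is flagged;
    a hypothetical value standing in for "enough that my eye notices".
    Returns (test, relative_change) pairs, largest magnitude first.
    """
    flagged = []
    for test in sorted(staging.keys() & production.keys()):
        base = production[test]
        if base == 0:
            continue  # relative change is undefined for a zero baseline
        change = (staging[test] - base) / base
        if abs(change) > threshold:
            flagged.append((test, change))
    flagged.sort(key=lambda item: abs(item[1]), reverse=True)
    return flagged
```

Whether a positive change is a regression depends on the suite (for time-based tests, higher is worse), so the sign is kept rather than folded into a single "worse" flag.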
Attachment #504524 - Flags: review?(joe)
Attachment #504524 - Attachment mime type: text/plain → application/vnd.ms-excel
Comment on attachment 504524 [details]
talos changes after DirectX and Nvidia drivers changes

Some of these are regrettable, but they more accurately reflect what our users will have, and they're absolutely required for future testing.
Attachment #504524 - Flags: review?(joe) → review+
Attachment #503940 - Flags: review?(coop) → review+
NVIDIA has released new GeForce 266.58 WHQL drivers, in case the GFX team needs them for testing.

http://www.nvidia.com/object/win7-winvista-32bit-266.58-whql-driver.html
Summary: Upgrade NVIDIA driver to latest version on all Windows 7 test slaves → Install DirectX and upgrade NVIDIA driver to latest version on all Windows 7 test slaves
It's OK, we don't particularly seem to need the new version, 260.99 is enough.
We need this also on the new WinXP test slaves.

This no longer has to block bug 623338, instead I'm making the tests bypass the blacklist: bug 628377, bug 628384.
No longer blocks: 623338
Summary: Install DirectX and upgrade NVIDIA driver to latest version on all Windows 7 test slaves → Install DirectX and upgrade NVIDIA driver to latest version on all Windows test slaves
Blocks: 629759
I deployed the change to talos-r3-w7-ref machine before the snapshot (see https://bugzilla.mozilla.org/show_bug.cgi?id=617105#c15)

I updated the documentation:
https://wiki.mozilla.org/ReferencePlatforms/Test/Win7#DirectX_runtime_.28June_2010.29
https://wiki.mozilla.org/ReferencePlatforms/Test/Win7#NVidia_drivers_update_.28Version:_260.99.3B_Date:_2010.10.25.29

We still have to deploy the change to slaves w7-020 and w7-036, which are currently down (see bug 629511).

We also need to do the staging slave w7-002, which is down for re-imaging (see bug 627121).

I think we are all done here except for those last 3 slaves.

I will keep it open until I take care of them.
Depends on: 629511
Priority: P2 → P3
Summary: Install DirectX and upgrade NVIDIA driver to latest version on all Windows test slaves → Install DirectX and upgrade NVIDIA driver to latest version on all Windows 7 test slaves
Depends on: 617105
Blocks: 629935
Depends on: 630309
(In reply to comment #16)
> We still have to deploy the change to slaves w7-020 and w7-036 which are
> currently down (see bug 629511).
Bug 630309 now tracks the work of applying the changes and adding the machines back to the pool.
No longer depends on: 629511
No longer blocks: 629935
No longer blocks: 629759
I have fixed slaves 20 & 36 and they will soon be back in the production pool.

Once bug 627121 is fixed, the staging slave talos-r3-w7-002 will be back in the staging pool. I will install the drivers and DirectX once it is fixed.

Nothing left to be done in here.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Depends on: 627121
Resolution: --- → FIXED
Are these driver changes now part of the puppet/opsi configuration?  How can we get machines outside the releng vpn updated to latest?  This would include the auto-tools staging machines.
Hi Alice! welcome back!
This is documented on the ref documentation and can be freely downloaded:
https://wiki.mozilla.org/ReferencePlatforms/Test/Win7#DirectX_runtime_.28June_2010.29

Ping me on irc if you have any questions.
Hi bjacob,
It seems that upgrading the graphics driver regressed our ability to test accelerated reftests: we lost the ability to switch to any screen resolution higher than 1000 pixels (see bug 702504).

I need to switch to one of these drivers: [1]
Name                             Version  Release Date
GeForce 285.79 Driver BETA       285.79   November 10, 2011
GeForce 285.62 Driver WHQL       285.62   October 24, 2011
GeForce 285.38 Driver BETA       285.38   September 26, 2011
GeForce 285.27 Driver BETA       285.27   September 13, 2011
GeForce 280.26 Driver WHQL       280.26   August 9, 2011
GeForce 280.19 Driver BETA       280.19   July 28, 2011
GeForce 275.50 Driver BETA       275.50   June 20, 2011
GeForce 275.33 Driver WHQL       275.33   June 1, 2011
GeForce Driver v275.27 BETA      275.27   May 17, 2011
GeForce/ION Driver v270.61 WHQL  270.61   April 18, 2011
GeForce/ION Driver v270.51 BETA  270.51   March 30, 2011
GeForce/ION Driver v267.24 BETA  267.24   March 1, 2011
GeForce/ION Driver v266.58 WHQL  266.58   January 18, 2011
GeForce/ION Release 265 BETA     266.35   January 4, 2011
GeForce/ION Release 260 WHQL     260.99   October 25, 2010
GeForce/ION Release 260 WHQL     260.89   October 18, 2010
GeForce/ION Release 256 WHQL     258.96   July 19, 2010
GeForce/ION Release 256 BETA     258.69   June 29, 2010
GeForce/ION Release 256 WHQL     257.21   June 15, 2010

So far I have been able to determine that 257.21 does the job (it gives us 1280x1024 32-bit resolution).

Would such a version work for you? If not, what is the minimum version in that list that would work for you?

It is important to determine this sooner rather than later: I would have to test such a driver thoroughly before deploying it to all slaves, and we need to do this because we have lost coverage for 10 months.

Thanks!

[1] http://www.nvidia.com/Download/Find.aspx?lang=en-us
If we can't get a high resolution, that's probably not an issue with the driver itself: obviously NVIDIA wouldn't release a driver that can't do high resolutions. Rather, it could be an issue with how it's installed, a problem in the Windows registry or some other state. In other words, are you sure that the problem is specific to certain driver versions?

We really want drivers as recent as possible, as significant bugs have been fixed for WebGL, so it would be nice if we could get 285.62 WHQL. Otherwise, I suppose that yes, 257.21 is the minimum version that is still useful.
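The selection rule discussed above (the newest WHQL release, but nothing older than 257.21) can be applied to the NVIDIA list mechanically. A minimal Python sketch; the function names are hypothetical and versions are compared as (major, minor) integer pairs so that, for example, 266.35 sorts above 260.99.

```python
def parse_version(text):
    """'285.62' -> (285, 62), so versions compare numerically."""
    major, minor = text.split(".")
    return int(major), int(minor)


def pick_driver(candidates, minimum="257.21", whql_only=True):
    """Return the newest candidate version at or above `minimum`, or None.

    candidates: iterable of (name, version) pairs, e.g. rows of the
    NVIDIA download list above. With whql_only=True, BETA rows are skipped.
    """
    floor = parse_version(minimum)
    eligible = [
        version
        for name, version in candidates
        if parse_version(version) >= floor and (not whql_only or "WHQL" in name)
    ]
    return max(eligible, key=parse_version, default=None)
```

Applied to the list above with the defaults, this picks 285.62, the WHQL driver bjacob asks for; 257.21 remains the fallback floor if a newer driver cannot do the resolution reftests need.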
Product: mozilla.org → Release Engineering