Bug 648804 - Upgrade all windows build slaves to June 2010 or later version of the DirectX SDK
Status: RESOLVED FIXED
[builders][opsi]
Product: Release Engineering
Classification: Other
Component: Other
: other
: x86_64 Windows Server 2003
: P2 normal (vote)
: ---
Assigned To: Chris AtLee [:catlee]
: 634817 (view as bug list)
Depends on: 651295
Blocks: 629759 634817 657748 660745 671184 673559
Reported: 2011-04-09 18:10 PDT by Benoit Jacob [:bjacob] (mostly away)
Modified: 2013-08-12 21:54 PDT
Attachments
prefer June 2010 SDK and be more fool-proof in extracting path (1.58 KB, patch)
2011-05-02 14:29 PDT, Benoit Jacob [:bjacob] (mostly away)
joe: review+
bump dx10 opsi package to the june version (3.14 KB, patch)
2011-05-18 13:59 PDT, Chris AtLee [:catlee]
bhearsum: review+
catlee: checked‑in+
delete old sdk first (722 bytes, patch)
2011-05-19 19:00 PDT, Chris AtLee [:catlee]
coop: review+
catlee: checked‑in+
don't sign new dx10 dll's (941 bytes, patch)
2011-05-31 06:57 PDT, Chris AtLee [:catlee]
bhearsum: review+
catlee: checked‑in+

Description Benoit Jacob [:bjacob] (mostly away) 2011-04-09 18:10:28 PDT
The Windows build slaves have the February 2010 version of the DirectX SDK, carrying the internal version number 42.

For at least two different important reasons, we need this to be upgraded to the June 2010 version (internal version number 43) or later, i.e. whatever the current latest version is.

Reasons:
 * recent ANGLE revisions require this newer DirectX SDK. Currently, in order to benefit from recent ANGLE revisions, we have to use a custom patch to allow using the older DirectX SDK.
 * bug 634817 shows a crash bug that we get because of the DirectX SDK version that we use, and that is not present in the newer version.
Comment 1 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2011-04-10 20:33:46 PDT
I can understand the desire to use a newer DirectX SDK - but first, are there any backward-compatibility issues? Will upgrading like this cause any problems for any code or tests on FF3.5, FF3.6, FF4.0?
Comment 2 Benoit Jacob [:bjacob] (mostly away) 2011-04-10 22:34:11 PDT
There shouldn't be any issues: the newer SDK is actually what most developers building on Windows are already using.
Comment 3 Armen Zambrano [:armenzg] - Engineering productivity 2011-04-11 05:52:36 PDT
Benoit are you sure that we don't already have the June 2010 version?
I deployed this in January *2011* and I took note that version installed was June 2010.
https://bugzilla.mozilla.org/show_bug.cgi?id=624044#c9
Comment 4 Benoit Jacob [:bjacob] (mostly away) 2011-04-11 10:07:24 PDT
(In reply to comment #3)
> Benoit are you sure that we don't already have the June 2010 version?

What I know for sure is that today's nightly build,
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-04-11-03-mozilla-central/firefox-4.2a1pre.en-US.win32.zip

is shipping the February 2010 version of the DirectX SDK DLLs: when I extract this archive, in the firefox/ directory there is
  d3dx9_42.dll
  D3DCompiler_42.dll
This number 42 means February 2010.
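
The check described here can be sketched as a small script. This is an illustration only, not something used in this bug: the names `dx_sdk_releases` and `dx_sdk_releases_in_zip` are hypothetical, and the suffix table contains only the two numbers stated in this bug (42 for February 2010, 43 for June 2010). Any other suffix is reported as unknown.

```python
import re
import zipfile

# Suffix-to-release table; only the two entries stated in this bug are included.
SDK_RELEASES = {42: "February 2010", 43: "June 2010"}

# Matches the two DLL families named above, e.g. d3dx9_42.dll or D3DCompiler_43.dll.
DLL_PATTERN = re.compile(r"(?:d3dx9|d3dcompiler)_(\d+)\.dll$", re.IGNORECASE)


def dx_sdk_releases(file_names):
    """Return {dll_name: release} for the DirectX SDK DLLs found in file_names."""
    found = {}
    for name in file_names:
        match = DLL_PATTERN.search(name)
        if match:
            number = int(match.group(1))
            # Keep only the file name, dropping any directory prefix.
            base = name.replace("\\", "/").rsplit("/", 1)[-1]
            found[base] = SDK_RELEASES.get(number, "unknown")
    return found


def dx_sdk_releases_in_zip(zip_path):
    """Hypothetical helper: inspect a downloaded build archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return dx_sdk_releases(archive.namelist())
```

Calling `dx_sdk_releases_in_zip` on a downloaded nightly archive would list the shipped DLLs together with the SDK release each suffix corresponds to.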

Could it be that the build slaves have *both* versions of the DirectX SDK installed? If yes, then there are two ways that we can fix the problem:
 * I could land a configure.in change making sure that when we have more than one version of the DirectX SDK, we use the most recent one.
 * Or you could uninstall the old February 2010 version on the build slaves.
Comment 5 Armen Zambrano [:armenzg] - Engineering productivity 2011-04-11 10:11:33 PDT
Yes, when you run the web installer it installs everything up to the latest SDK.

Yes please, try to fix it on the build side as it is very expensive for me to make that change.
Comment 6 Nick Thomas [:nthomas] 2011-04-12 17:01:29 PDT
There's some confusion between build and test slaves here. Build slaves have the Feb 2010 SDK, from bug 529938. Windows 7 test slaves have a more recent runtime, from bug 624044.

IIRC, we are currently using VS 2005 as the compiler pretty much everywhere (except maybe win64), and bug 529938 comment #27 states that the Feb 2010 SDK is the last one to support that compiler. I think we have VS2008 installed too, but not VS2010.

Changing the compiler obviously complicates things, CC'ing some build system bigshots.
Comment 7 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-04-12 17:03:04 PDT
We *really* should move to a new compiler ... but that's a lot of work, so it might not be able to block this ...
Comment 8 Armen Zambrano [:armenzg] - Engineering productivity 2011-04-13 08:04:51 PDT
Yes, I got confused.
Comment 9 Lukas Blakk [:lsblakk] use ?needinfo 2011-04-13 10:09:06 PDT
Can someone familiar with this process outline the steps that need to be taken to make this update happen so that slaveduty can track and perform the upgrade over the next week or so?
Comment 10 Lukas Blakk [:lsblakk] use ?needinfo 2011-04-13 10:33:54 PDT
Apparently this will require OPSI work, so adjusting the whiteboard for that.
Comment 11 Benoit Jacob [:bjacob] (mostly away) 2011-04-13 10:46:18 PDT
(In reply to comment #6)
> There's some confusion between build and test slaves here. Build slaves have
> the Feb 2010 SDK, from bug 529938. Windows 7 test slaves have a more recent
> runtime, from bug 624044.

Ah OK. To be clear, since we are now extracting the DLLs from the SDK at build time and shipping them with the build (bug 630628), we now only need the SDK to be installed on the build slaves, not on the test slaves.
Comment 12 Nick Thomas [:nthomas] 2011-04-13 20:29:47 PDT
(In reply to comment #9)

To restate part of comment #6 - I don't think we can proceed with upgrading the SDK because it will cause the compilation to fail with our VS2005 compiler.
Comment 13 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2011-04-14 13:58:54 PDT
(In reply to comment #12)
> (In reply to comment #9)
> 
> To restate part of comment #6 - I don't think we can proceed with upgrading the
> SDK because it will cause the compilation to fail with our VS2005 compiler.

Benoit:

Upgrading the compiler has value, but it's not something we're willing to do at this point in our transition to a faster release cadence. We'll revisit the compiler upgrade topic later this year.

Given the compiler upgrade is a requirement for the new DirectX, and given we are already on the latest DirectX supported by the current compiler, I therefore recommend we stay with the DirectX we have now and close this bug as WONTFIX.

Did I miss anything?
Comment 14 Masatoshi Kimura [:emk] 2011-04-14 15:28:54 PDT
As I said in bug 529938 comment #30, I can build Minefield with VS2005 + DXSDK June 2010 + ANGLE enabled without any errors by installing KB949009.
Comment 15 Benoit Jacob [:bjacob] (mostly away) 2011-04-14 17:29:55 PDT
(In reply to comment #14)
> As I said in bug 529938 comment #30, I can build Minefield with VS2005 + DXSDK
> June 2010 + ANGLE enabled without any errors by installing KB949009.

Ah, interesting.

This link:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=3021d52b-514e-41d3-ad02-438a3ba730ba
says that the June 2010 SDK does not support MSVS 2005, but maybe for the subset we're using it doesn't matter. Will try.
Comment 16 Benoit Jacob [:bjacob] (mostly away) 2011-04-14 17:30:56 PDT
(In reply to comment #13)
> Did I miss anything?

Let's see if we can get the June 2010 SDK to work with MSVS 2005. I will try installing MSVS 2005 myself.
Comment 17 John Ford [:jhford] 2011-04-18 08:50:00 PDT
Benoit, assigning bug to you while you test this.  Please unassign yourself when you have tested this and figured out whether this works.
Comment 18 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2011-04-19 20:30:25 PDT
Benoit: 

I've filed bug#651295 to get you a loaner machine - hope this helps.

Note that these same machines are used for FF3.5, FF3.6, FF4.0, as well as m-c. In order to do this DirectX upgrade, we need to also avoid breaking binary compatibility with other releases, so I believe there are two questions here:

1) Is it possible to have two different versions of DirectX installed on the same machine? If so, how?

2) If we need to replace Feb2010 DirectX install with June2010 DirectX install, can the June2010 DirectX be used with the existing MSVS 2005 compiler, and used for the existing FF3.5.x, FF3.6.x, FF4.0.x releases? If we have to upgrade build compilers, will this cause any problems for binary compatibility for these supported releases?
Comment 19 Armen Zambrano [:armenzg] - Engineering productivity 2011-04-26 10:45:59 PDT
(In reply to comment #18)
> Benoit: 
> 
> I've filed bug#651295 to get you a loaner machine - hope this helps.
> 
> Note that these same machines are used for FF3.5, FF3.6, FF4.0, as well as m-c.
> In order to do this DirectX upgrade, we need to also avoid breaking binary
> compatibility with other releases, so I believe there are two questions here:
> 
> 1) Is it possible to have two different versions of DirectX installed on the
> same machine? If so, how?
> 
> 2) If we need to replace Feb2010 DirectX install with June2010 DirectX install,
> can the June2010 DirectX be used with the existing MSVS 2005 compiler, and used
> for the existing FF3.5.x, FF3.6.x, FF4.0.x releases? If we have to upgrade
> build compilers, will this cause any problems for binary compatibility for
> these supported releases?

IIUC from my messing with the SDK a couple of months ago, the newer SDK always contains the contents of the previous SDK plus the new ones. This helps a program target a specified version of the SDK if needed.
Comment 20 Benoit Jacob [:bjacob] (mostly away) 2011-05-02 14:29:11 PDT
Created attachment 529575
prefer June 2010 SDK and be more fool-proof in extracting path

This patch does 2 things:
 1) when there are multiple DXSDK versions installed, try first to get the June 2010 SDK. Otherwise, take whatever comes first.
 2) more fool-proof sed command to extract the path. Should fix bug 643732.
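
The selection rule in point 1 can be illustrated with a short sketch. The actual patch changes the configure script and is not reproduced in this bug, so the Python below is not that patch. The function name `pick_dxsdk`, the `(name, path)` input shape, and the example SDK names and paths are all assumptions made for illustration.

```python
def pick_dxsdk(installed, preferred="June 2010"):
    """Pick one SDK from a list of (name, path) pairs.

    Returns the first entry whose name mentions the preferred release,
    otherwise the first entry in the list, or None if the list is empty.
    """
    for name, path in installed:
        if preferred.lower() in name.lower():
            return name, path
    # No preferred release found: take whatever comes first.
    return installed[0] if installed else None


# Hypothetical input: both SDKs installed side by side.
sdks = [
    ("Microsoft DirectX SDK (February 2010)", "C:/dxsdk_feb2010"),
    ("Microsoft DirectX SDK (June 2010)", "C:/dxsdk_jun2010"),
]
chosen = pick_dxsdk(sdks)
```

With both SDKs present, `chosen` is the June 2010 entry. With only the February 2010 entry in the list, the function falls back to that one.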
Comment 21 Benoit Jacob [:bjacob] (mostly away) 2011-05-03 12:48:15 PDT
Great news: on the builder I've gotten access to (bug 651295), building with the June 2010 SDK worked without any problem. I've checked the resulting build, it ran the WebGL conformance test suite exactly like current Nightly. I've also triple checked that it had been built with Visual Studio 2005, and that the DLLs it was shipping and using were the June 2010 SDK DLLs.

-> conclusion: as far as we are concerned, Visual Studio 2005 works nicely with the June 2010 DirectX SDK. Please proceed with the upgrade.
Comment 22 Benoit Jacob [:bjacob] (mostly away) 2011-05-09 06:30:37 PDT
Comment on attachment 529575
prefer June 2010 SDK and be more fool-proof in extracting path

Review of attachment 529575:
-----------------------------------------------------------------
Comment 23 Benoit Jacob [:bjacob] (mostly away) 2011-05-09 15:38:11 PDT
Now that the concerns over MSVS 2005 compatibility are cleared, do you think the June 2010 SDK update could happen in time for Firefox 6 branching off m-c? If not, we'll probably do an ANGLE update anyway, but manually removing the changes that require the June 2010 SDK.
Comment 24 Joe Drew (not getting mail) 2011-05-11 15:02:51 PDT
Comment on attachment 529575
prefer June 2010 SDK and be more fool-proof in extracting path

It seems that this does not fix this detection on Windows XP, but there's no reason for that to block this patch being accepted.

Note that you'll want to remove the echos before committing.
Comment 25 Bill Gianopoulos [:WG9s] 2011-05-11 15:04:16 PDT
(In reply to comment #20)
> Created attachment 529575
> prefer June 2010 SDK and be more fool-proof in extracting path
> 
> This patch does 2 things:
>  1) when there are multiple DXSDK versions installed, try first to get the
> June 2010 SDK. Otherwise, take whatever comes first.
>  2) more fool-proof sed command to extract the path. Should fix bug 643732.

This patch has DOS line endings.
Comment 26 Benoit Jacob [:bjacob] (mostly away) 2011-05-18 12:15:33 PDT
Confirming that this DXSDK upgrade is really needed. In bug 657748 I've been trying to update our ANGLE copy and it fails on a bug in the February 2010 SDK we're using, http://code.google.com/p/angleproject/issues/detail?id=158
Comment 27 Benoit Jacob [:bjacob] (mostly away) 2011-05-18 12:16:30 PDT
To be clear, we really need the newer ANGLE for Firefox 6! It buys us stability, security fixes, performance improvements, and important new features. It's important for parity with Chrome.
Comment 28 Benoit Jacob [:bjacob] (mostly away) 2011-05-18 12:44:07 PDT
Landed the DXSDK detection patch:
http://hg.mozilla.org/mozilla-central/rev/962fee06b08e
Comment 29 Chris AtLee [:catlee] 2011-05-18 13:59:16 PDT
Created attachment 533408
bump dx10 opsi package to the june version

Installing over the top of the old version seems to work, as does deleting the old registry key.
Comment 30 Benoit Jacob [:bjacob] (mostly away) 2011-05-18 17:37:22 PDT
Yay!
Note: deleting the old registry key should not be needed. When I played with the builder, I just installed the new SDK alongside the old one. The two were listed in the registry, and ./configure is able to get it right, i.e. it prefers the June 2010 SDK if present.
Comment 31 Bill Gianopoulos [:WG9s] 2011-05-19 07:27:31 PDT
(In reply to comment #28)
> Landed the DXSDK detection patch:
> http://hg.mozilla.org/mozilla-central/rev/962fee06b08e

Something is odd here, I have been doing my daily Windows builds under Windows/XP with the patch attached to this bug included, and they have been building with angle.  Since this patch is now on mozilla-central, I altered my build to not include the patch and today's resultant builds did NOT include angle.  I have no idea why not.  I will look at the logs tonight when I have access to the logs on the build system, and post additional info.
Comment 32 Chris AtLee [:catlee] 2011-05-19 10:52:11 PDT
Deploying to try build machines now.
Comment 33 Bill Gianopoulos [:WG9s] 2011-05-19 15:44:51 PDT
(In reply to comment #31)
> (In reply to comment #28)
> > Landed the DXSDK detection patch:
> > http://hg.mozilla.org/mozilla-central/rev/962fee06b08e
> 
> Something is odd here, I have been doing my daily Windows builds under
> Windows/XP with the patch attached to this bug included, and they have been
> building with angle.  Since this patch is now on mozilla-central, I altered
> my build to not include the patch and today's resultant builds did NOT
> include angle.  I have no idea why not.  I will look at the logs tonight
> when I have access to the logs on the build system, and post additional info.

Please ignore this report.

It would seem something went amiss on my Windows builds only, and somehow my build system checked out the revision matching yesterday's nightly instead of today's.  There is supposed to be code in my script to prevent that.
Comment 34 Chris AtLee [:catlee] 2011-05-19 18:28:08 PDT
(In reply to comment #32)
> Deploying to try build machines now.

Looks like this didn't work; will try again tomorrow
Comment 35 Chris AtLee [:catlee] 2011-05-19 19:00:52 PDT
Created attachment 533872
delete old sdk first

Turns out that the move commands were failing to overwrite the old files, and then the new ones got deleted by the subsequent rmdir. This patch fixes that up.
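
The "delete old first" ordering can be sketched as follows. This is only an illustration of the ordering, written in Python rather than in the language of the actual OPSI package script, which is not shown in this bug. The function name `replace_sdk` and both directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def replace_sdk(new_dir, install_dir):
    """Replace install_dir with new_dir, removing the old copy first.

    Deleting the old tree before the move means the move cannot fail on
    files that already exist, so no later cleanup step can throw away the
    new files by mistake.
    """
    new_dir = Path(new_dir)
    install_dir = Path(install_dir)
    if install_dir.exists():
        shutil.rmtree(install_dir)  # remove the old SDK files first
    shutil.move(str(new_dir), str(install_dir))  # then move the new ones in
```

After the call, `install_dir` holds only the new files and `new_dir` no longer exists.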
Comment 36 Chris AtLee [:catlee] 2011-05-20 11:11:09 PDT
Deploying to try again...
Comment 37 Chris AtLee [:catlee] 2011-05-20 12:53:43 PDT
Looks good on try, deploying to the rest of the build machines.
Comment 38 Daniel Veditz [:dveditz] 2011-05-23 13:52:08 PDT
When you roll this out to "all" build slaves does that mean Firefox 5 "beta" builds will automatically pick up this fix? Or are the release build machines separate?
Comment 39 Nick Thomas [:nthomas] 2011-05-23 15:03:33 PDT
(catlee is out for a few days)

There's no separation of build machines for any particular branch or build type, so all release builds could be affected by this change. Comments #18 and #19 deal with compatibility issues, to some extent. Do you want to continue targeting the Feb 2010 SDK in Fx5 beta builds ?
Comment 40 Benoit Jacob [:bjacob] (mostly away) 2011-05-23 16:34:59 PDT
*** Bug 634817 has been marked as a duplicate of this bug. ***
Comment 41 Benoit Jacob [:bjacob] (mostly away) 2011-05-24 08:32:09 PDT
(In reply to comment #37)
> Looks good on try, deploying to the rest of the build machines.

OK, so this is fixed, right? Bug 634817 comment 20 says that Aurora is shipping the June 2010 DLLs now. Marking as fixed, reopen if needed.
Comment 42 Nick Thomas [:nthomas] 2011-05-24 14:50:00 PDT
Not all hosts have picked it up yet, reopening to chase those down.

# root@production-opsi
cd /var/lib/opsi/config/clients
grep -l '9\.28\.1886' *
mw32-ix-slave19.uib.local.ini
mw32-ix-slave20.uib.local.ini
w32-ix-slave01.uib.local.ini
w32-ix-slave02.uib.local.ini
w32-ix-slave04.uib.local.ini
w32-ix-slave05.uib.local.ini
w32-ix-slave07.uib.local.ini
w32-ix-slave23.uib.local.ini
w32-ix-slave41.uib.local.ini
win32-ix-ref.uib.local.ini
win32-slave01.uib.local.ini
win32-slave02.uib.local.ini
win32-slave05.uib.local.ini
win32-slave06.uib.local.ini
win32-slave08.uib.local.ini
win32-slave09.uib.local.ini
win32-slave12.uib.local.ini
win32-slave13.uib.local.ini
win32-slave14.uib.local.ini
win32-slave15.uib.local.ini
win32-slave16.uib.local.ini
win32-slave17.uib.local.ini
win32-slave18.uib.local.ini
win32-slave19.uib.local.ini
win32-slave20.uib.local.ini
win32-slave22.uib.local.ini
win32-slave23.uib.local.ini
win32-slave24.uib.local.ini
win32-slave25.uib.local.ini
win32-slave27.uib.local.ini
win32-slave28.uib.local.ini
win32-slave29.uib.local.ini
win32-slave31.uib.local.ini
win32-slave33.uib.local.ini
win32-slave34.uib.local.ini
win32-slave41.uib.local.ini
win32-slave46.uib.local.ini

Some of those machines may be gone or defunct.
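
For reference, the same search can be sketched in Python. This is an assumed equivalent of the `grep -l` call above, not a tool used on the OPSI server. The function name `clients_with_version` is hypothetical, and it does a plain substring match, which behaves the same as the escaped-dot pattern for this version string.

```python
from pathlib import Path


def clients_with_version(config_dir, version):
    """Return the sorted names of files in config_dir whose text contains version."""
    matches = []
    for path in sorted(Path(config_dir).iterdir()):
        # Undecodable bytes are replaced so one odd file does not stop the scan.
        if path.is_file() and version in path.read_text(errors="replace"):
            matches.append(path.name)
    return matches
```

Using the directory and version string from this comment, the call would be `clients_with_version("/var/lib/opsi/config/clients", "9.28.1886")`.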
Comment 43 Benoit Jacob [:bjacob] (mostly away) 2011-05-24 15:09:03 PDT
(In reply to comment #42)
> Not all hosts have picked it up yet,

Whoops. A build made by a machine that hasn't picked it up is a build that won't have ANGLE enabled on Windows, i.e. a big regression.

How long will it take for the remaining hosts to pick it up? How can I check if the build of the Aurora that was cut off today has it?
Comment 44 Benoit Jacob [:bjacob] (mostly away) 2011-05-24 15:10:05 PDT
(Because I landed the ANGLE update today and June 2010 is now a requirement)
Comment 45 Nick Thomas [:nthomas] 2011-05-24 15:12:59 PDT
I'm going around them now. Idle slaves will get rebooted immediately and pick up the changes. Anything not idle will need to finish the job first, but still only a couple of hours.
Comment 46 Benoit Jacob [:bjacob] (mostly away) 2011-05-24 15:15:02 PDT
Thanks!

I had checked a few tinderbox builds and they had ANGLE and the updated SDK files.
Comment 47 Nick Thomas [:nthomas] 2011-05-24 16:36:45 PDT
mw32-ix-slave19.uib.local.ini - preproduction - rebooted
mw32-ix-slave20.uib.local.ini - production - rebooted

w32-ix-slave01.uib.local.ini - staging - upgrade wasn't set - set now (false positive here)

w32-ix-slave02.uib.local.ini - production - upgrade wasn't set - fixed & rebooted
w32-ix-slave04.uib.local.ini - production - upgrade wasn't set - fixed, will reboot after current cedar compile
w32-ix-slave05.uib.local.ini - production but waiting for reimage - upgrade wasn't set - set now and will get it from the reimage anyway
w32-ix-slave07.uib.local.ini - staging - upgrade wasn't set - set now
w32-ix-slave23.uib.local.ini - prod but away at iX for hardware fix - is set
w32-ix-slave41.uib.local.ini - prod but away at iX for hardware fix - is set

win32-ix-ref.uib.local.ini - reference image - rebooted to apply upgrade

win32-slave02.uib.local.ini - all VMs that were deleted
win32-slave05.uib.local.ini
win32-slave12.uib.local.ini
win32-slave13.uib.local.ini
win32-slave14.uib.local.ini
win32-slave15.uib.local.ini
win32-slave16.uib.local.ini
win32-slave17.uib.local.ini
win32-slave18.uib.local.ini
win32-slave19.uib.local.ini
win32-slave22.uib.local.ini
win32-slave23.uib.local.ini
win32-slave24.uib.local.ini
win32-slave25.uib.local.ini
win32-slave27.uib.local.ini
win32-slave28.uib.local.ini
win32-slave29.uib.local.ini

win32-slave01.uib.local.ini - production - rebooted
win32-slave06.uib.local.ini - production - rebooted
win32-slave08.uib.local.ini - production - rebooted
win32-slave09.uib.local.ini - production - rebooted
win32-slave20.uib.local.ini - production - rebooted
win32-slave31.uib.local.ini - production - rebooted
win32-slave33.uib.local.ini - production - rebooted
win32-slave34.uib.local.ini - production - rebooted
win32-slave41.uib.local.ini - production - rebooted
win32-slave46.uib.local.ini - production - rebooted

All done here.
Comment 48 Ben Hearsum (:bhearsum) 2011-05-25 09:26:32 PDT
w32-ix-slave03 (a staging slave) doesn't have the new SDK yet, and refuses to install it when I mark it for installation. I'm not sure what the issue is, probably a busted OPSI installation :(
Comment 49 Dustin J. Mitchell [:dustin] 2011-05-25 11:54:02 PDT
That's bug 659186 - sorry, I didn't include that in the tracking spreadsheet and thus forgot about it.  If you can fix that bug without a reimage, great!
Comment 50 Chris AtLee [:catlee] 2011-05-31 06:57:24 PDT
Created attachment 536285
don't sign new dx10 dll's
Comment 51 Asa Dotzler [:asa] 2011-05-31 14:30:39 PDT
If this is solidly fixed for 6, can you please set the status-firefox6 to fixed? thanks.
Comment 52 Arthur K. 2011-07-08 09:12:24 PDT
FYI: I am still seeing D3DCompiler_42.dll on Seamonkey 2.2 when it should be on D3DCompiler_43.dll.
Comment 53 Kyle Huey [:khuey] (khuey@mozilla.com) 2011-07-08 09:13:48 PDT
(In reply to comment #52)
> FYI: I am still seeing D3DCompiler_42.dll on Seamonkey 2.2 when it should be
> on D3DCompiler_43.dll.

You're looking for Bug 660745.
Comment 54 Virgil Dicu [:virgil] [QA] 2011-08-12 08:57:50 PDT
Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0

Can the status of this issue be set to verified fixed?
Comment 55 Arthur K. 2011-08-12 09:07:51 PDT
I can confirm latest 2.3 beta has June 2010 SDK. WFM.
