Bug 648804: Upgrade all Windows build slaves to June 2010 or later version of the DirectX SDK

Status: RESOLVED FIXED (opened 14 years ago, closed 14 years ago)
Product/Component: Release Engineering :: General (defect, P2)
Platform: x86_64, Windows Server 2003
Tracking: firefox5+ fixed, firefox6+ fixed
Reporter: bjacob
Assignee: catlee
Whiteboard: [builders][opsi]
Attachments: 4 files

The Windows build slaves have the February 2010 version of the DirectX SDK, carrying the internal version number 42.

For at least two important reasons, we need this to be upgraded to the June 2010 version (internal version number 43) or later, ideally to whatever the latest version currently is.

Reasons:
 * recent ANGLE revisions require this newer DirectX SDK. Currently, in order to benefit from recent ANGLE revisions, we have to use a custom patch to allow using the older DirectX SDK.
 * bug 634817 shows a crash bug that we get because of the DirectX SDK version that we use, and that is not present in the newer version.
Whiteboard: [unittest][win7]
I can understand the desire to use a newer DirectX SDK, but first: are there any backward-compatibility issues? Will upgrading like this cause any problems for any code or tests on FF3.5, FF3.6, FF4.0?
There shouldn't be any issues: the newer SDK is actually what most developers building on Windows are already using.
Benoit, are you sure that we don't already have the June 2010 version?
I deployed this in January *2011* and I noted that the version installed was June 2010.
https://bugzilla.mozilla.org/show_bug.cgi?id=624044#c9
(In reply to comment #3)
> Benoit, are you sure that we don't already have the June 2010 version?

What I know for sure is that today's nightly build,
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2011-04-11-03-mozilla-central/firefox-4.2a1pre.en-US.win32.zip
is shipping the February 2010 version of the DirectX SDK DLLs: when I extract this archive, the firefox/ directory contains
  d3dx9_42.dll
  D3DCompiler_42.dll
This number 42 means February 2010.
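
The check described above (reading the SDK release off the DLL file name) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the table contains just the two mappings stated in this bug (42 for February 2010, 43 for June 2010); any other suffix is reported as unknown rather than guessed.

```python
import re
from pathlib import Path

# Only the two mappings stated in this bug; other suffixes stay "unknown".
SDK_RELEASE_BY_SUFFIX = {42: "February 2010", 43: "June 2010"}

# Matches the two DLL families named above, e.g. d3dx9_42.dll.
DLL_PATTERN = re.compile(r"^(d3dx9|D3DCompiler)_(\d+)\.dll$", re.IGNORECASE)


def sdk_release_for(filename):
    """Return the SDK release for a DirectX DLL name, or None if it is not one."""
    match = DLL_PATTERN.match(filename)
    if match is None:
        return None
    return SDK_RELEASE_BY_SUFFIX.get(int(match.group(2)), "unknown")


def releases_shipped(directory):
    """Map each DirectX SDK DLL found in an extracted build to its release."""
    shipped = {}
    for entry in Path(directory).iterdir():
        release = sdk_release_for(entry.name)
        if release is not None:
            shipped[entry.name] = release
    return shipped
```

Calling `releases_shipped("firefox")` on an extracted nightly would list which SDK release each shipped DLL belongs to.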

Could it be that the build slaves have *both* versions of the DirectX SDK installed? If yes, then there are two ways that we can fix the problem:
 * I could land a configure.in change making sure that when we have more than one version of the DirectX SDK, we use the most recent one.
 * Or you could uninstall the old February 2010 version on the build slaves.
Yes, when you run the web installer it installs everything up to the latest SDK.

Yes please, try to fix it on the build side as it is very expensive for me to make that change.
There's some confusion between build and test slaves here. Build slaves have the Feb 2010 SDK, from bug 529938. Windows 7 test slaves have a more recent runtime, from bug 624044.

IIRC, we are currently using VS 2005 as the compiler pretty much everywhere (except maybe win64), and bug 529938 comment #27 states that the Feb 2010 SDK is the last one to support that compiler. I think we have VS2008 installed too, but not VS2010.

Changing the compiler obviously complicates things, so I'm CC'ing some build system bigshots.
We *really* should move to a new compiler ... but that's a lot of work, so it might not be able to block this ...
Yes, I got confused.
OS: Linux → Windows Server 2003
Whiteboard: [unittest][win7] → [builders]
Can someone familiar with this process outline the steps that need to be taken to make this update happen so that slaveduty can track and perform the upgrade over the next week or so?
Whiteboard: [builders] → [builders][slaveduty]
Apparently this will require OPSI work, so adjusting the whiteboard for that.
Whiteboard: [builders][slaveduty] → [builders][opsi]
(In reply to comment #6)
> There's some confusion between build and test slaves here. Build slaves have
> the Feb 2010 SDK, from bug 529938. Windows 7 test slaves have a more recent
> runtime, from bug 624044.

Ah OK. To be clear, since we are now extracting the DLLs from the SDK at build time and shipping them with the build (bug 630628), we now only need the SDK to be installed on the build slaves, not on the test slaves.
(In reply to comment #9)

To restate part of comment #6 - I don't think we can proceed with upgrading the SDK because it will cause the compilation to fail with our VS2005 compiler.
(In reply to comment #12)
> (In reply to comment #9)
> 
> To restate part of comment #6 - I don't think we can proceed with upgrading the
> SDK because it will cause the compilation to fail with our VS2005 compiler.

Benoit:

Upgrading the compiler has value, but it's not something we're willing to do at this point in our transition to a faster release cadence. We'll revisit the compiler upgrade topic later this year.

Given the compiler upgrade is a requirement for the new DirectX, and given we are already on the latest DirectX supported by the current compiler, I therefore recommend we stay with the DirectX we have now and close this bug as WONTFIX.

Did I miss anything?
As I said in bug 529938 comment #30, I can build Minefield with VS2005 + DXSDK June 2010 + ANGLE enabled without any errors by installing KB949009.
(In reply to comment #14)
> As I said in bug 529938 comment #30, I can build Minefield with VS2005 + DXSDK
> June 2010 + ANGLE enabled without any errors by installing KB949009.

Ah, interesting.

This link:
http://www.microsoft.com/downloads/en/details.aspx?FamilyID=3021d52b-514e-41d3-ad02-438a3ba730ba
says that the June 2010 SDK does not support MSVS 2005, but maybe for the subset we're using it doesn't matter. Will try.
(In reply to comment #13)
> Did I miss anything?

Let's see if we can get the June 2010 SDK to work with MSVS 2005. I will try installing MSVS 2005 myself.
Benoit, assigning the bug to you while you test this. Please unassign yourself when you have tested this and figured out whether it works.
Assignee: nobody → bjacob
Benoit: 

I've filed bug 651295 to get you a loaner machine; hope this helps.

Note that these same machines are used for FF3.5, FF3.6, FF4.0, as well as m-c. In order to do this DirectX upgrade, we need to also avoid breaking binary compatibility with other releases, so I believe there are two questions here:

1) Is it possible to have two different versions of DirectX installed on the same machine? If so, how?

2) If we need to replace Feb2010 DirectX install with June2010 DirectX install, can the June2010 DirectX be used with the existing MSVS 2005 compiler, and used for the existing FF3.5.x, FF3.6.x, FF4.0.x releases? If we have to upgrade build compilers, will this cause any problems for binary compatibility for these supported releases?
(In reply to comment #18)
> Benoit: 
> 
> I've filed bug#651295 to get you a loaner machine - hope this helps.
> 
> Note that these same machines are used for FF3.5, FF3.6, FF4.0, as well as m-c.
> In order to do this DirectX upgrade, we need to also avoid breaking binary
> compatibility with other releases, so I believe there are two questions here:
> 
> 1) Is it possible to have two different versions of DirectX installed on the
> same machine? If so, how?
> 
> 2) If we need to replace Feb2010 DirectX install with June2010 DirectX install,
> can the June2010 DirectX be used with the existing MSVS 2005 compiler, and used
> for the existing FF3.5.x, FF3.6.x, FF4.0.x releases? If we have to upgrade
> build compilers, will this cause any problems for binary compatibility for
> these supported releases?

IIUC, from my messing with the SDK a couple of months ago, the newer SDK always contains the contents of the previous SDK plus the new additions. This lets a program target a specific version of the SDK if needed.
See Also: → 563317
This patch does 2 things:
 1) when there are multiple DXSDK versions installed, try first to get the June 2010 SDK. Otherwise, take whatever comes first.
 2) use a more fool-proof sed command to extract the path. Should fix bug 643732.
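
For readers who want the selection rule from item 1 spelled out, here is a minimal Python sketch. It is not the configure.in patch itself: the function name, the `(name, path)` tuples and the example paths are hypothetical, and it assumes the installed SDKs have already been enumerated in discovery order.

```python
def pick_dxsdk(installed, preferred="June 2010"):
    """Pick an SDK path from a list of (name, path) pairs.

    `installed` is assumed to be in the order the SDKs were discovered.
    The preferred release wins if present; otherwise the first entry is used.
    """
    if not installed:
        return None
    for name, path in installed:
        if preferred in name:
            return path
    # Fallback described in the patch summary: take whatever comes first.
    return installed[0][1]


# Hypothetical example data, for illustration only.
example = [
    ("Microsoft DirectX SDK (February 2010)", "C:/dxsdk/feb2010"),
    ("Microsoft DirectX SDK (June 2010)", "C:/dxsdk/jun2010"),
]
chosen = pick_dxsdk(example)
```

The fallback keeps the previous behaviour on machines that only have an older SDK, so the change should only matter where both versions are installed.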
Great news: on the builder I've gotten access to (bug 651295), building with the June 2010 SDK worked without any problem. I've checked the resulting build, and it ran the WebGL conformance test suite exactly like the current Nightly. I've also triple-checked that it had been built with Visual Studio 2005, and that the DLLs it was shipping and using were the June 2010 SDK DLLs.

-> conclusion: as far as we are concerned, Visual Studio 2005 works nicely with the June 2010 DirectX SDK. Please proceed with the upgrade.
Comment on attachment 529575
prefer June 2010 SDK and be more fool-proof in extracting path
Attachment #529575 - Flags: review?(joe)
Now that the concerns over MSVS 2005 compatibility are cleared, do you think the June 2010 SDK update could happen in time for Firefox 6 branching off m-c? If not, we'll probably do an ANGLE update anyway, but manually remove the changes that require the June 2010 SDK.
Comment on attachment 529575
prefer June 2010 SDK and be more fool-proof in extracting path

It seems that this does not fix this detection on Windows XP, but there's no reason for that to block this patch being accepted.

Note that you'll want to remove the echos before committing.
Attachment #529575 - Flags: review?(joe) → review+
(In reply to comment #20)
> Created attachment 529575
> prefer June 2010 SDK and be more fool-proof in extracting path
> 
> This patch does 2 things:
>  1) when there are multiple DXSDK versions installed, try first to get the
> June 2010 SDK. Otherwise, take whatever comes first.
>  2) more fool-proof sed command to extract the path. Should fix bug 643732.

This patch has DOS line endings.
Blocks: 657748
Confirming that this DXSDK upgrade is really needed. In bug 657748 I've been trying to update our ANGLE copy and it fails on a bug in the February 2010 SDK we're using, http://code.google.com/p/angleproject/issues/detail?id=158
To be clear, we really need the newer ANGLE for Firefox 6! It buys us stability, security fixes, performance improvements, and important new features. It's important for parity with Chrome.
Assignee: bjacob → catlee
Priority: -- → P2
Installing over the top of the old version seems to work, as does deleting the old registry key.
Attachment #533408 - Flags: review?(bhearsum)
Yay!
Note: deleting the old registry key should not be needed. When I played with the builder, I just installed the new SDK alongside the old one; both were listed in the registry, and ./configure was able to get it right, i.e. it preferred the June 2010 SDK when present.
Attachment #533408 - Flags: review?(bhearsum) → review+
(In reply to comment #28)
> Landed the DXSDK detection patch:
> http://hg.mozilla.org/mozilla-central/rev/962fee06b08e

Something is odd here, I have been doing my daily Windows builds under Windows/XP with the patch attached to this bug included, and they have been building with angle.  Since this patch is now on mozilla-central, I altered my build to not include the patch and today's resultant builds did NOT include angle.  I have no idea why not.  I will look at the logs tonight when I have access to the logs on the build system, and post additional info.
Attachment #533408 - Flags: checked-in+
Deploying to try build machines now.
(In reply to comment #31)
> (In reply to comment #28)
> > Landed the DXSDK detection patch:
> > http://hg.mozilla.org/mozilla-central/rev/962fee06b08e
> 
> Something is odd here, I have been doing my daily Windows builds under
> Windows/XP with the patch attached to this bug included, and they have been
> building with angle.  Since this patch is now on mozilla-central, I altered
> my build to not include the patch and today's resultant builds did NOT
> include angle.  I have no idea why not.  I will look at the logs tonight
> when I have access to the logs on the build system, and post additional info.

Please ignore this report.

It would seem something went amiss on my Windows builds only, and somehow my build system checked out the revision matching yesterday's nightly instead of today's.  There is supposed to be code in my script to prevent that.
(In reply to comment #32)
> Deploying to try build machines now.

Looks like this didn't work; will try again tomorrow.
Turns out that the move commands were failing to overwrite the old files, and then the new ones got deleted by the subsequent rmdir. This patch fixes that up.
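
The failure mode described here (a move that does not overwrite, followed by a directory removal that throws away the new files) can be avoided with a pattern like the following Python sketch. It is not the OPSI script from the attachment; the function name is hypothetical, it handles only regular files, and it assumes source and destination are on the same volume.

```python
import os
from pathlib import Path


def move_overwriting(src_dir, dst_dir):
    """Move every file from src_dir into dst_dir, replacing existing files."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    for src in list(src_dir.iterdir()):
        if src.is_file():
            # os.replace overwrites an existing destination file and raises
            # OSError if the move cannot be completed.
            os.replace(src, dst_dir / src.name)
    # Path.rmdir only removes an empty directory, so anything that was not
    # moved makes this raise instead of being silently deleted.
    src_dir.rmdir()
```

The point of the design is that the cleanup step cannot succeed unless every file was actually moved, so a failed overwrite surfaces as an error rather than as a silently unchanged install.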
Attachment #533872 - Flags: review?(bhearsum)
Attachment #533872 - Flags: review?(bhearsum) → review?(coop)
Attachment #533872 - Flags: review?(coop) → review+
Attachment #533872 - Flags: checked-in+
Deploying to try again...
Looks good on try, deploying to the rest of the build machines.
When you roll this out to "all" build slaves does that mean Firefox 5 "beta" builds will automatically pick up this fix? Or are the release build machines separate?
(catlee is out for a few days)

There's no separation of build machines for any particular branch or build type, so all release builds could be affected by this change. Comments #18 and #19 deal with compatibility issues, to some extent. Do you want to continue targeting the Feb 2010 SDK in Fx5 beta builds?
(In reply to comment #37)
> Looks good on try, deploying to the rest of the build machines.

OK, so this is fixed, right? Bug 634817 comment 20 says that Aurora is shipping the June 2010 DLLs now. Marking as fixed, reopen if needed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Not all hosts have picked it up yet, reopening to chase those down.

# root@production-opsi
cd /var/lib/opsi/config/clients
grep -l '9\.28\.1886' *
mw32-ix-slave19.uib.local.ini
mw32-ix-slave20.uib.local.ini
w32-ix-slave01.uib.local.ini
w32-ix-slave02.uib.local.ini
w32-ix-slave04.uib.local.ini
w32-ix-slave05.uib.local.ini
w32-ix-slave07.uib.local.ini
w32-ix-slave23.uib.local.ini
w32-ix-slave41.uib.local.ini
win32-ix-ref.uib.local.ini
win32-slave01.uib.local.ini
win32-slave02.uib.local.ini
win32-slave05.uib.local.ini
win32-slave06.uib.local.ini
win32-slave08.uib.local.ini
win32-slave09.uib.local.ini
win32-slave12.uib.local.ini
win32-slave13.uib.local.ini
win32-slave14.uib.local.ini
win32-slave15.uib.local.ini
win32-slave16.uib.local.ini
win32-slave17.uib.local.ini
win32-slave18.uib.local.ini
win32-slave19.uib.local.ini
win32-slave20.uib.local.ini
win32-slave22.uib.local.ini
win32-slave23.uib.local.ini
win32-slave24.uib.local.ini
win32-slave25.uib.local.ini
win32-slave27.uib.local.ini
win32-slave28.uib.local.ini
win32-slave29.uib.local.ini
win32-slave31.uib.local.ini
win32-slave33.uib.local.ini
win32-slave34.uib.local.ini
win32-slave41.uib.local.ini
win32-slave46.uib.local.ini

Some of those machines may be gone or defunct.
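
As a rough equivalent of the `grep -l` above, the following Python sketch lists the client config files that still mention a given version string. The function name is hypothetical, and the directory and version string are simply the ones shown in the transcript; nothing here depends on OPSI itself.

```python
from pathlib import Path


def clients_mentioning(config_dir, needle="9.28.1886"):
    """Return the names of files in config_dir whose text contains needle."""
    hits = []
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        if needle in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits
```

On the OPSI server this would be called as `clients_mentioning("/var/lib/opsi/config/clients")`. As with the grep, a match only says the string is present in the client file, so decommissioned hosts still show up.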
(In reply to comment #42)
> Not all hosts have picked it up yet,

Whoops. A build made by a machine that hasn't picked it up is a build that won't have ANGLE enabled on Windows, i.e. a big regression.

How long will it take for the remaining hosts to pick it up? How can I check whether the Aurora build that was cut off today has it?
(I ask because I landed the ANGLE update today and June 2010 is now a requirement.)
I'm going around them now. Idle slaves will get rebooted immediately and pick up the changes. Anything not idle will need to finish its job first, but that should still take only a couple of hours.
Thanks!

I had checked a few tinderbox builds and they had ANGLE and the updated SDK files.
mw32-ix-slave19.uib.local.ini - preproduction - rebooted
mw32-ix-slave20.uib.local.ini - production - rebooted

w32-ix-slave01.uib.local.ini - staging - upgrade wasn't set - set now (false positive here)

w32-ix-slave02.uib.local.ini - production - upgrade wasn't set - fixed & rebooted
w32-ix-slave04.uib.local.ini - production - upgrade wasn't set - fixed, will reboot after current cedar compile
w32-ix-slave05.uib.local.ini - production but waiting for reimage - upgrade wasn't set - set now and will get it from the reimage anyway
w32-ix-slave07.uib.local.ini - staging - upgrade wasn't set - set now
w32-ix-slave23.uib.local.ini - prod but away at iX for hardware fix - is set
w32-ix-slave41.uib.local.ini - prod but away at iX for hardware fix - is set

win32-ix-ref.uib.local.ini - reference image - rebooted to apply upgrade

win32-slave02.uib.local.ini - all VMs that were deleted
win32-slave05.uib.local.ini
win32-slave12.uib.local.ini
win32-slave13.uib.local.ini
win32-slave14.uib.local.ini
win32-slave15.uib.local.ini
win32-slave16.uib.local.ini
win32-slave17.uib.local.ini
win32-slave18.uib.local.ini
win32-slave19.uib.local.ini
win32-slave22.uib.local.ini
win32-slave23.uib.local.ini
win32-slave24.uib.local.ini
win32-slave25.uib.local.ini
win32-slave27.uib.local.ini
win32-slave28.uib.local.ini
win32-slave29.uib.local.ini

win32-slave01.uib.local.ini - production - rebooted
win32-slave06.uib.local.ini - production - rebooted
win32-slave08.uib.local.ini - production - rebooted
win32-slave09.uib.local.ini - production - rebooted
win32-slave20.uib.local.ini - production - rebooted
win32-slave31.uib.local.ini - production - rebooted
win32-slave33.uib.local.ini - production - rebooted
win32-slave34.uib.local.ini - production - rebooted
win32-slave41.uib.local.ini - production - rebooted
win32-slave46.uib.local.ini - production - rebooted

All done here.
w32-ix-slave03 (a staging slave) doesn't have the new SDK yet, and refuses to install it when I mark it for installation. I'm not sure what the issue is, probably a busted OPSI installation :(
That's bug 659186 - sorry, I didn't include that in the tracking spreadsheet and thus forgot about it.  If you can fix that bug without a reimage, great!
Attachment #536285 - Flags: review?(bhearsum)
Attachment #536285 - Flags: review?(bhearsum) → review+
Attachment #536285 - Flags: checked-in+
If this is solidly fixed for 6, can you please set the status-firefox6 to fixed? thanks.
FYI: I am still seeing D3DCompiler_42.dll on Seamonkey 2.2 when it should be on D3DCompiler_43.dll.
(In reply to comment #52)
> FYI: I am still seeing D3DCompiler_42.dll on Seamonkey 2.2 when it should be
> on D3DCompiler_43.dll.

You're looking for Bug 660745.
Blocks: 671184
Blocks: 673559
Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20100101 Firefox/6.0

Can the status of this issue be set to verified fixed?
I can confirm latest 2.3 beta has June 2010 SDK. WFM.
Product: mozilla.org → Release Engineering