Bug 1273286: Upgrade Linux puppet drivers to NVIDIA 361.42
Status: RESOLVED FIXED
Opened 9 years ago, closed 9 years ago
Categories: Infrastructure & Operations :: RelOps: Puppet, task
Tracking: Not tracked
People: Reporter: acomminos, Assigned: acomminos
Attachments (2 files)
58 bytes, text/x-review-board-request | rail: review+
58 bytes, text/x-review-board-request | Callek: review+, rail: checked-in+
The current version of the NVIDIA driver on Linux puppet deployments (310.32) is outdated and is causing us many problems with hangs in the NVIDIA driver, both within Gecko and within Compiz. The latest long-lived driver, 361.42, is available in the graphics-drivers Ubuntu PPA (https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa), and has been verified on a loaner to resolve the hang issues in bug 1197954.
We should switch from xorg-edgers to this PPA and use its nvidia-361 package.
:rail, can you help us move this to the right place?
Flags: needinfo?(rail)
Comment 2•9 years ago
I presume this was in reference to releng CI machines, not IT machines.
Updated•9 years ago
Component: Infrastructure: Puppet → RelOps: Puppet
QA Contact: cshields → dustin
Comment 3•9 years ago (Assignee)
Review commit: https://reviewboard.mozilla.org/r/52990/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/52990/
Attachment #8753077 -
Flags: review?(rail)
Comment 4•9 years ago (Assignee)
Thanks for the move.
I'm not sure how the APT mirrors on puppet work, but this assumes that the graphics-drivers PPA is available to pull from in place of xorg-edgers.
Comment 5•9 years ago
I'll mirror the files.
Would be great to coordinate this change with jmaher (CCed).
Comment 6•9 years ago
Please make it clear when this is deployed, so we can look for Talos changes and any random intermittents.
Comment 7•9 years ago
Comment on attachment 8753077 [details]
MozReview Request: Bug 1273286 - Upgrade NVIDIA drivers to 361.42, switch to graphics-drivers PPA. r?rail
https://reviewboard.mozilla.org/r/52990/#review49830
Thank you for the patch!
Attachment #8753077 -
Flags: review?(rail) → review+
Comment 8•9 years ago
I copied the files to match the patch. We can try to deploy it on a limited set of machines or the whole pool. In both cases reverting may require reimaging.
Comment 9•9 years ago (Assignee)
For the record, this patch assumes a reimage as a variety of packages pulled in incidentally by xorg-edgers will no longer need to be installed.
Keywords: checkin-needed
Comment 10•9 years ago
(In reply to Andrew Comminos [:acomminos] from comment #9)
> For the record, this patch assumes a reimage as a variety of packages pulled
> in incidentally by xorg-edgers will no longer need to be installed.
I think this is the most straightforward and safe way to proceed.
I'd like to reimage N machines and pin them to the new puppet env, so we can ensure everything works as expected.
Andrew, Joel, any ideas about how many machines we should pin?
Comment 11•9 years ago (Assignee)
Joel would probably know better than I do regarding the number of machines needed to get statistically significant data within a reasonable timeframe.
Flags: needinfo?(jmaher)
Comment 12•9 years ago
I like 20 machines. Let's document which ones get this; then we can push to try and look for odd Talos patterns, and likewise for any issues on the trees. We should let the sheriffs know about the change when it happens.
Flags: needinfo?(jmaher)
Comment 13•9 years ago
Review commit: https://reviewboard.mozilla.org/r/53992/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/53992/
Attachment #8754483 -
Flags: review?(bugspam.Callek)
Comment 14•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
https://reviewboard.mozilla.org/r/53992/#review50716
I'm ok with this landing, I do have one downside to comment on, wouldn't this still mean that we run puppet with 'production' puppet first and then *next* run is with your environment?
Attachment #8754483 -
Flags: review?(bugspam.Callek) → review+
Comment 15•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
remote: https://hg.mozilla.org/build/puppet/rev/2ffd69a2223d
remote: https://hg.mozilla.org/build/puppet/rev/7ca08c85aa0e
Attachment #8754483 -
Flags: checked-in+
Comment 16•9 years ago
(In reply to Justin Wood (:Callek) from comment #14)
> I'm ok with this landing, I do have one downside to comment on, wouldn't
> this still mean that we run puppet with 'production' puppet first and then
> *next* run is with your environment?
Not sure. I'll figure this out before enabling this slave.
Comment 17•9 years ago
Unfortunately kickstart won't talk to my environment at the beginning, so it tries to install the xorg-edgers packages first. After the initial puppetization it starts talking to my env and installs the new packages. I purged the packages from that repo with:
apt-get purge nvidia-310 nvidia-settings-310
apt-get install libpixman-1-0=0.24.4-1 libpixman-1-0:i386=0.24.4-1
talos-linux64-ix-024 is enabled now and should start taking jobs. Let's start with just one for now.
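As a rough sketch of how one might confirm that the purge above took effect, the snippet below filters a list of package names for the two 310-series packages named in the commands. The function name `old_nvidia_packages` is hypothetical and is not part of the patch; the sample input is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): given package names on stdin,
# one per line, print the old 310-series packages named in the comment above.
old_nvidia_packages() {
    # "|| true" keeps the exit status zero when nothing matches.
    grep -E '^nvidia-(settings-)?310$' || true
}

# Illustrative input only. On a real machine the list could come from:
#   dpkg-query -W -f='${Package}\n' | old_nvidia_packages
printf 'nvidia-310\nnvidia-361\nnvidia-settings-310\nlibpixman-1-0\n' | old_nvidia_packages
```

Empty output from the `dpkg-query` pipeline would suggest that neither of the two packages is still installed.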
Comment 18•9 years ago
Is there still something that needs to be checked in on mozilla-inbound or similar?
Flags: needinfo?(andrew)
Comment 20•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53992/diff/1-2/
Attachment #8754483 -
Attachment description: MozReview Request: Bug 1273286 - Pin talos-linux64-ix-024 to rail's env r=Callek → MozReview Request: Bug 1273286 - Use NVIDIA 361.42 on ~20 talos machines. r=Callek
Comment 21•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
This should move the first ~20 slaves to my environment and update their drivers to NVIDIA 361.42. Reimaging is not required, because the drivers can coexist without any issues.
Attachment #8754483 -
Flags: review?(bugspam.Callek)
Attachment #8754483 -
Flags: review+
Attachment #8754483 -
Flags: checked-in+
Updated•9 years ago
Attachment #8754483 -
Flags: review?(bugspam.Callek) → review+
Comment 22•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
https://reviewboard.mozilla.org/r/53992/#review51822
r+ on diff based on https://gist.github.com/rail/f9b5a9454031384494c248fc5a9fc5f7 (since reviewboard's interdiff was odd)
Comment 23•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
remote: https://hg.mozilla.org/build/puppet/rev/59401589eb60
remote: https://hg.mozilla.org/build/puppet/rev/d136a5ad0b8d
Attachment #8754483 -
Flags: checked-in+
Comment 24•9 years ago
Comment on attachment 8753077 [details]
MozReview Request: Bug 1273286 - Upgrade NVIDIA drivers to 361.42, switch to graphics-drivers PPA. r?rail
Apparently we need to remove the 310-related packages to make X use the new drivers. I added this block https://gist.github.com/rail/efbe7e78e999358cb60c59f38e9b25a8#file-puppet-diff-L40-L42 to make it work.
Comment 25•9 years ago
It took quite a while to get all the machines to switch to the new driver, because we don't reboot very often anymore. I just verified that talos-linux64-ix-001 to talos-linux64-ix-019 and talos-linux64-ix-024 have the new driver installed.
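For anyone who wants to script a check across the same set of machines, the host range named above can be enumerated with a short shell sketch. The function name `pinned_hosts` is hypothetical; only the hostnames come from this comment.

```shell
#!/bin/sh
# Hypothetical helper: print the hosts named in the comment above
# (talos-linux64-ix-001 through -019, plus -024), one per line.
pinned_hosts() {
    for n in $(seq 1 19) 24; do
        # %03d zero-pads the host number to three digits.
        printf 'talos-linux64-ix-%03d\n' "$n"
    done
}

pinned_hosts
```

This prints 20 hostnames, which lines up with the "~20 machines" discussed earlier in the bug.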
Joel, how long do you want to watch these machines for before we proceed further?
Flags: needinfo?(jmaher)
Comment 26•9 years ago
Can we give them until next Wednesday? With a 3-day weekend in the US, I imagine we won't get as much traffic as normal, so I would rather let this bake for a few days. If others have different desires/goals/plans, please speak up!
Flags: needinfo?(jmaher)
Comment 27•9 years ago
WFM!
Comment 28•9 years ago
removing keyword to get this out of our list of things to checkin :))
Keywords: checkin-needed
Comment 29•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/53992/diff/2-3/
Attachment #8754483 -
Attachment description: MozReview Request: Bug 1273286 - Use NVIDIA 361.42 on ~20 talos machines. r=Callek → Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
Comment 30•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
Updated the patch to reflect recent Windows changes.
Attachment #8754483 -
Flags: review?(bugspam.Callek)
Attachment #8754483 -
Flags: review+
Attachment #8754483 -
Flags: checked-in+
Updated•9 years ago
Attachment #8754483 -
Flags: review?(bugspam.Callek) → review+
Comment 31•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
https://reviewboard.mozilla.org/r/53992/#review55202
Let's give this a shot...
::: modules/packages/manifests/nvidia_drivers.pp:13
(Diff revision 3)
> case $::operatingsystem {
> Ubuntu: {
> + $nvidia_version = "361"
> + $nvidia_full_version = "361.42"
> + # The Ubuntu graphics-drivers repo embeds the version number in the
> + # package name, so we can easily require "latest"
s/can/can't/ ?
Comment 32•9 years ago
Comment on attachment 8754483 [details]
Bug 1273286 - Upgrade Linux drivers to NVIDIA 361.42
remote: https://hg.mozilla.org/build/puppet/rev/3fe3a2a3324e
remote: https://hg.mozilla.org/build/puppet/rev/81c0440bff08
Attachment #8754483 -
Flags: checked-in+
Comment 33•9 years ago
It may take some time to get all the machines to update the packages. So far I checked one of the machines and this is what I see in /var/log/Xorg.0.log:
[ 62.492] (II) NVIDIA GLX Module 361.42 Tue Mar 22 17:25:45 PDT 2016
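The same check can be scripted. The sketch below pulls the version number out of an "NVIDIA GLX Module" line like the one quoted above; the function name `nvidia_glx_version` is hypothetical, and it assumes the log line keeps the format shown here.

```shell
#!/bin/sh
# Hypothetical helper: read Xorg log text on stdin and print the version
# reported by the first "NVIDIA GLX Module" line, if there is one.
nvidia_glx_version() {
    sed -n 's/.*NVIDIA GLX Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example using the line quoted above; prints 361.42.
# On a real machine one would instead run:
#   nvidia_glx_version < /var/log/Xorg.0.log
printf '%s\n' '[ 62.492] (II) NVIDIA GLX Module 361.42 Tue Mar 22 17:25:45 PDT 2016' | nvidia_glx_version
```

Empty output would mean that no such line was found, for example because X is still loading the old driver.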
Comment 34•9 years ago (Assignee)
Just received some successful results with machines talos-linux64-ix-06{7,8}. Looks like the driver is working perfectly.
Thanks!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Comment 35•9 years ago
Wheeee!!!!
Comment 36•9 years ago
As a note, this has fixed a lot of bimodal/noisy data on linux64:
https://bugzilla.mozilla.org/show_bug.cgi?id=1271948#c21
(or it was a real coincidence). Not sure I understand why that is, but it is good to keep in mind.
Comment 37•9 years ago
Wow! Sounds like the current driver is much better! \o/
Another reason to keep the systems up to date!