bm-l10n-dashboard01 is in bad shape infra- and security-wise, see bug 1304413. The best fix is to get a new VM to replace it, and then swap things out. From IRC:

Pike: fox2mike: if we're going for a new VM, I'm pretty sure that we can just get started setting it up. Then passing it over to me to figure out the python env and that the old stuff kinda runs
fox2mike: Pike: we could go that route, new CentOS 7
Pike: And once that's done, we can plan an actual migration downtime
fox2mike: yeah
fox2mike: that sounds like a decent plan
Pike: I'd probably start setting the basics up against the elmo and ES stage, and if that works, the risk should be low
Pike: if things go really smooth, we could meet in Hawaii to do the actual migration in one physical location
Pike: CentOS 7 should be fine, the things I get from centos on a10n are all I need
fox2mike: ok
fox2mike: Pike: can you file a bug with Webops then for a new VM?
fox2mike: and ericz can pass that along to the VM team etc
fox2mike: and get the ball rolling?
Pike: are they gonna understand what I'm asking for, or are you or ericz better at playing jargon bingo?
Pike: in other words, I'm happy to request the VM, but I'm not sure I can be filling in the bootstrap info
Pike: also, I find 4 webops, not sure which to choose. IT-Managed or Other?
ericz: Pike: it-managed is probably best suited but we'd likely find any. I can help round out the bug but high-level enough that I can understand is a good start.
Pike: 'k. filing something
Aspects we need to move forward: some decent CPU/memory to run IO-bound jobs; there's not a ton of parallelism going on. We need to expose storage to a10n.webapp.scl3.mozilla.com, l10n.stage.webapp.scl3.mozilla.com, and l10n.webapp.scl3.mozilla.com, in the 20-30G range, mostly hosting a bunch of local mercurial clones. Right now we're using 18G; with how mozilla-central clones scale, we should have some leeway. We probably don't need the 100G we have right now, but 50G sounds safe.
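For the record, the numbers above come from a quick look at current usage; something like the following checks them (the /data path is an assumption matching the mount point discussed later in this bug):

```shell
# Current footprint of the local mercurial clones (path is an assumption)
du -sh /data
# How much of the existing share is actually left
df -h /data
```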
Just tried to get a head start with this on a local VM, and CentOS 7 comes with python 2.7.5, and the python 2.7.11 version in the mozilla repo is only for CentOS 6.8. Can we enable a 2.7.11 package for 7, too? As I mentioned in bug 1304413 comment 10, 2.7.5 has a bug that breaks our code.
Packaging up a new version of Python is a fairly large task and maintenance burden. Is it possible there is a different workaround for this bug in 2.7.5?
I understand that that's tricky, but OTOH, the problem isn't easy to work around. It's also hard to keep an environment around to verify the results of compare-locales against a potentially security-vulnerable version of python. And do we really need to work around bugs that got fixed in 2013? In related news, in bug 1315977 comment 7 we're bound to get the releng infrastructure off of 2.7.3 (which is yet another year older). The release date of 2.7.6 was 2013-11-10, https://hg.python.org/cpython/file/v2.7.6/Misc/NEWS.
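One cheap mitigation, whatever OS we land on, is to fail fast rather than hit the bug at runtime. A minimal sketch, assuming 2.7.6 (the release linked above) is the minimum that carries the fix:

```shell
# Refuse to proceed on an interpreter that predates the 2.7.6 fix
# (the minimum version here is an assumption based on the NEWS file above)
python - <<'EOF'
import sys
if sys.version_info < (2, 7, 6):
    sys.exit("need Python >= 2.7.6, found %s" % sys.version.split()[0])
print("Python %s is new enough" % sys.version.split()[0])
EOF
```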
If you're using CentOS 7, then Red Hat Software Collections has modern python versions in it that are maintained, receive security updates, and are compatible with the base operating system. You'll get /opt/python33 or similar and some basic guidance on how to fix up PATH to invoke the appropriate version. /usr/bin/python *cannot* be upgraded on any RH/Cent-based system, and Mozilla IT does not currently have FTE allocated for ongoing packaging and maintenance of a custom-built Python for this purpose, nor do we recommend using untrusted third-party builds of Python in any process that's associated with the Firefox builds. https://www.softwarecollections.org/en/scls/user/rhscl/?search=python
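For reference, the usual Software Collections workflow looks roughly like this; the collection name python33 is an assumption taken from the path above, so check the softwarecollections.org listing for current names:

```shell
# Enable the SCL repo and install a python collection
# (collection name "python33" is an assumption; see softwarecollections.org)
sudo yum install -y centos-release-scl
sudo yum install -y python33

# Run one command under the collection's python without touching the shell
scl enable python33 -- python --version

# ...or fix up PATH for the whole session via the collection's enable script
source /opt/rh/python33/enable
python --version   # now resolves under /opt/rh/python33, not /usr/bin/python
```

The key point is that /usr/bin/python stays untouched; the collection only wins via PATH precedence.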
The alternative answer is to use CentOS 6.8, for which we maintain a recent version of python 2.7.
For that scenario, we'd likely build it on RHEL 6.8 - just as compatible, but Actually Works with our current Puppet regime. Note that we'd eventually have to rebuild the server on CentOS 7 once we schedule RHEL deprecation some year, but that's not in the cards for the next few months as far as I know, anyway. Let us know how you prefer to proceed here.
6.8 sounds good for now.
Ok so could we please get a VM with RHEL 6 with 1 CPU, standard disk, and 2GB RAM? Name: l10n-dashboard2.webapp.scl3.mozilla.com Additionally, we'll need a new 50GB NFS share mounted on this new VM as well as on a10n.webapp.scl3.mozilla.com, l10n.stage.webapp.scl3.mozilla.com and l10n.webapp.scl3.mozilla.com. If that should be a separate request just let me know!
Assignee: eziegenhorn → server-ops-virtualization
Component: WebOps: IT-Managed Tools → Virtualization
QA Contact: smani → cshields
Technically the NFS should be a new request - but I'll ask the question here so it's free of cruft: what mount point would you prefer for it? I'll get started on the VM shortly.
I'd say let's use /data for consistency as that's pretty much our standard. Thanks!
VM created, tracked, and not in nagios (as dashboard1 isn't - poke if you need me to fix that). Share created and added to l10n-dashboard2 in change ( 3d59d222689868a49b83e60f23bdf8f6c1d036f3 ). Share added to the other VMs in change ( ef23b78dab1784546036685ea8c28693e146f443 ). Per conversations with :ericz on IRC, it's mounted at /mnt/l10n_shared - /data was causing some puppet indigestion, which we'll be investigating, but we didn't want that to unnecessarily delay this. Given that there's a lot of history in here, I didn't want to close prematurely - but the VM is deployed. Is there anything else you need the virt/stor folks to do?
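For anyone reproducing the mount by hand, the fstab entry would look something like this; the filer hostname and export path below are placeholders, not the real values from the changes above:

```shell
# Hypothetical /etc/fstab line for the new share; hostname and export path
# are placeholders, not the actual ones from the puppet changes above
filer.example.scl3.mozilla.com:/vol/l10n_shared  /mnt/l10n_shared  nfs  defaults,noatime  0 0
```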
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3776] → [kanban:https://webops.kanbanize.com/ctrl_board/2/3776][vm-create:1]
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3776][vm-create:1] → [vm-create:1]
Silence is golden. Closing out to make the blocked bug LOOK unblocked. Reopen if there are issues.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Eric, what are the next steps? I tried to ssh in as axel, but got permission denied.
Access works now. There's more to do, which we'll move into a new bug between this and bug 1304413.
Removing the direct block to bug 1304413, and VERIFIED, thanks.
No longer blocks: 1304413
Status: RESOLVED → VERIFIED