Create replacement VM for bm-l10n-dashboard01 to run l10n jobs and host shared data store

VERIFIED FIXED

Status

Infrastructure & Operations
Virtualization
VERIFIED FIXED
a year ago
a year ago

People

(Reporter: Pike, Unassigned)

Tracking

Details

(Whiteboard: [vm-create:1])

(Reporter)

Description

a year ago
bm-l10n-dashboard01 is in bad shape infra- and security-wise; see bug 1304413.

The best fix is to get a new VM to replace it, and then swap things out.

From IRC:

	Pike	fox2mike: if we're going for a new VM, I'm pretty sure that we can just get started setting it up. Then passing it over to me to figure out the python env and that the old stuff kinda runs
	fox2mike	Pike: we could go that route, new CentOS 7 
	Pike	And once that's done, we can plan an actual migration downtime
	fox2mike	yeah
	fox2mike	that sounds like a decent plan 
	Pike	I'd probably start setting the basics up against the elmo and ES stage, and if that works, the risk should be low
	Pike	if things go really smooth, we could meet in Hawaii to do the actual migration in one physical location
	Pike	CentOS 7 should be fine, the things I get from centos on a10n are all I need
	fox2mike	ok
	fox2mike	Pike: can you file a bug with Webops then for a new VM? 
	fox2mike	and ericz can pass that along to the VM team etc 
	fox2mike	and get the ball rolling? 
	Pike	are they gonna understand what I'm asking for, or are you or ericz better at playing jargon bingo?
	Pike	in other words, I'm happy to request the VM, but I'm not sure I can be filling in the bootstrap info
	Pike	also, I find 4 webops, not sure which to choose. IT-Managed or Other?
	ericz	Pike: it-managed is probably best suited but we'd likely find any. I can help round out the bug but high-level enough that I can understand is a good start.
	|<--	marcia has left moznet (Quit: )
	Pike	'k. filing something

Updated

a year ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3776]
(Reporter)

Comment 1

a year ago
Aspects we need to move forward:

Some good CPU/Mem cycles to run IO-bound jobs. There's not a ton of parallelism going on.

We need to expose storage to a10n.webapp.scl3.mozilla.com, l10n.stage.webapp.scl3.mozilla.com, and l10n.webapp.scl3.mozilla.com, in the 20-30G range. It mostly hosts a bunch of local mercurial clones. Right now we're using 18G; given how mozilla-central clones scale, we should leave some leeway. We probably don't need the 100G we have right now, but 50G sounds safe.
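
As a rough sanity check of the sizing above, here is a minimal Python sketch that computes the headroom for a few candidate share sizes, using the 18G figure from this comment. The function name `headroom` is made up for illustration and isn't part of any existing tooling.

```python
def headroom(used_gb, share_gb):
    """Return (free_gb, used_fraction) for a share of share_gb holding used_gb."""
    if share_gb <= 0:
        raise ValueError("share size must be positive")
    return share_gb - used_gb, used_gb / float(share_gb)


# 18G is the current usage quoted above; 30, 50 and 100 are the sizes discussed.
for size in (30, 50, 100):
    free, frac = headroom(18, size)
    print("%3dG share: %dG free, %.0f%% used" % (size, free, frac * 100))
```

At 50G the current clones would take up a bit over a third of the share, which leaves room for growth without paying for the full 100G.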
(Reporter)

Comment 2

a year ago
Just tried to get a head start on this with a local VM: CentOS 7 comes with Python 2.7.5, and the Python 2.7.11 package in the mozilla repo is only for CentOS 6.8.

Can we enable a 2.7.11 package for 7, too? As I mentioned in bug 1304413 comment 10, 2.7.5 has a bug that breaks our code.
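
For illustration, a minimal sketch of a guard that a job script could run at startup to refuse to run on an affected interpreter. The 2.7.6 threshold is an assumption based on the 2.7.6 NEWS reference in comment 4; `MIN_VERSION` and `interpreter_ok` are hypothetical names, not part of compare-locales or elmo.

```python
import sys

# Assumption: the bug is fixed as of 2.7.6, per the NEWS link in comment 4.
MIN_VERSION = (2, 7, 6)


def interpreter_ok(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    if not interpreter_ok():
        sys.stderr.write("Python %d.%d.%d is too old for l10n jobs\n"
                         % tuple(sys.version_info[:3]))
```

With this check, the stock CentOS 7 interpreter (2.7.5) would be rejected, while the 2.7.11 package from the mozilla repo would pass.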

Updated

a year ago
Assignee: server-ops-webops → eziegenhorn
Packaging up a new version of Python is a fairly large task and maintenance burden.  Is it possible there is a different workaround for this bug in 2.7.5?
(Reporter)

Comment 4

a year ago
I understand that that's tricky, but OTOH, the problem isn't easy to work around. It's also hard to keep an environment around to verify the results of compare-locales against a potentially security-vulnerable version of python. And do we really need to work around bugs that got fixed in 2013 [1]?

In related news, in bug 1315977 comment 7 we're bound to get the releng infrastructure off of 2.7.3 (which is yet another year older).

[1] release date of 2.7.6 is 2013-11-10, https://hg.python.org/cpython/file/v2.7.6/Misc/NEWS.

If you're using CentOS 7, then Red Hat Software Collections has modern python versions in it that are maintained, receive security updates, and are compatible with the base operating system. You'll get /opt/python33 or similar and some basic guidance on how to fix up PATH to invoke the appropriate version. /usr/bin/python *cannot* be upgraded on any RH/Cent-based system, and Mozilla IT does not currently have FTE allocated for ongoing packaging and maintenance of a custom-built Python for this purpose, nor do we recommend using untrusted third-party builds of Python in any process that's associated with the Firefox builds.

https://www.softwarecollections.org/en/scls/user/rhscl/?search=python
(Reporter)

Comment 6

a year ago
The alternative answer is to use CentOS 6.8, for which we maintain a recent version of python 2.7.
For that scenario, we'd likely build it on RHEL 6.8 - just as compatible, but Actually Works with our current Puppet regime. Note that we'd eventually have to rebuild the server on CentOS 7 once we schedule RHEL deprecation at some point, but that's not in the cards for the next few months, as far as I know.

Let us know how you prefer to proceed here.
(Reporter)

Comment 8

a year ago
6.8 sounds good for now.

Ok so could we please get a VM with RHEL 6 with 1 CPU, standard disk, and 2GB RAM?  Name: l10n-dashboard2.webapp.scl3.mozilla.com

Additionally, we'll need a new 50GB NFS share mounted on this new VM as well as on a10n.webapp.scl3.mozilla.com, l10n.stage.webapp.scl3.mozilla.com and l10n.webapp.scl3.mozilla.com.  If that should be a separate request just let me know!
Assignee: eziegenhorn → server-ops-virtualization
Component: WebOps: IT-Managed Tools → Virtualization
QA Contact: smani → cshields
Technically the NFS share should be a new request, but I'll ask the question here so it's free of cruft: what mount point would you prefer for it?

I'll get started on the VM shortly.

I'd say let's use /data for consistency as that's pretty much our standard.  Thanks!

VM created, tracked, and not in nagios (as dashboard1 isn't - poke if you need me to fix that)

Share created and added to l10n-dashboard2 in change ( 3d59d222689868a49b83e60f23bdf8f6c1d036f3  )

Share added to other VMs in change ( ef23b78dab1784546036685ea8c28693e146f443 )

Per conversations with :ericz in IRC, mounted to /mnt/l10n_shared - /data was causing some puppet indigestion - which we'll be investigating - but didn't want that to unnecessarily delay this.
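
A minimal sketch, assuming a POSIX host, of how a job could confirm the share is actually mounted before touching the clones. `share_mounted` is a hypothetical helper; only the /mnt/l10n_shared path comes from this comment.

```python
import os

SHARE = "/mnt/l10n_shared"  # mount point named in this comment


def share_mounted(mount_point=SHARE):
    """Return True if mount_point exists and is a mount point."""
    return os.path.ismount(mount_point)


if __name__ == "__main__":
    if share_mounted():
        print("%s is mounted" % SHARE)
    else:
        print("%s is NOT mounted; refusing to run jobs" % SHARE)
```

Checking for the mount first avoids silently writing clones to the local disk underneath the mount point if the NFS share is ever missing.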

Since there's a lot of history in here, I didn't want to close prematurely, but the VM is deployed. Is there anything else you need the virt/stor folks to do?
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3776] → [kanban:https://webops.kanbanize.com/ctrl_board/2/3776][vm-create:1]
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3776][vm-create:1] → [vm-create:1]

Comment 13

a year ago
Silence is golden.  Closing out to make the blocked bug LOOK unblocked.  Reopen if there's issues.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Reporter)

Comment 14

a year ago
Eric, what are the next steps?

I tried to ssh in as axel, but got permission denied.
Flags: needinfo?(eziegenhorn)
(Reporter)

Comment 15

a year ago
Access works now. There's more to do, which we'll move to a new bug between this one and bug 1304413.
Flags: needinfo?(eziegenhorn)
(Reporter)

Updated

a year ago
Blocks: 1323771
(Reporter)

Comment 16

a year ago
Removing the direct block to bug 1304413, and VERIFIED, thanks.
No longer blocks: 1304413
Status: RESOLVED → VERIFIED