1323771 - Create replacement VM for bm-l10n-dashboard01 to run l10n jobs and host shared data store

Reporter

Description

•

9 years ago

+++ This bug was initially created as a clone of Bug #1319603 +++

bm-l10n-dashboard01 is in bad shape infra- and sec-wise, see bug 1304413.

We have a VM now from bug 1319603, l10n-dashboard2.webapp.scl3.mozilla.com, with the right version of Python, and I have access, too.

Known issues that we still need to track down:

<ericz> Puppet is a bit unhappy about nfs mounts and collectd so I have some work to do still but this is a decent starting point.

Also, the network routes to our mysql servers don't work yet: generic-rw-zeus.db.scl3.mozilla.com and probably stage-rw-vip.db.scl3.mozilla.com.

On my side, I need at least to get the git submodules off of git: and over to https:.
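For the submodule item, a minimal sketch of the URL switch, assuming the submodule URLs live in a standard .gitmodules file. The function name and the sample entry are hypothetical and not taken from the elmo repository.

```python
import re

# Matches the "url = git://..." lines of a .gitmodules file.
_GIT_URL = re.compile(r"^([ \t]*url[ \t]*=[ \t]*)git://", re.MULTILINE)


def rewrite_submodule_urls(gitmodules_text):
    """Return the .gitmodules content with git:// URLs switched to https://."""
    return _GIT_URL.sub(r"\1https://", gitmodules_text)


if __name__ == "__main__":
    # Hypothetical entry, for illustration only.
    sample = (
        '[submodule "vendor/example"]\n'
        "\tpath = vendor/example\n"
        "\turl = git://github.com/example/example.git\n"
    )
    print(rewrite_submodule_urls(sample))
```

After editing .gitmodules, `git submodule sync` followed by `git submodule update --init` should make existing checkouts pick up the new URLs.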

:kanban

Updated

•

9 years ago

Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3876]

:kanban

Updated

•

9 years ago

Assignee: server-ops-webops → eziegenhorn

[github robot]

Comment 1

•

9 years ago

Commit pushed to develop at https://github.com/mozilla/elmo

https://github.com/mozilla/elmo/commit/acb9cf18307300a414b03f7951f285e9b8f2d777
bug 1323771, configure travis, r=mathjazz

Also remove the old hudson helper script, now that we use travis.

Axel Hecht [:Pike]

Reporter

Comment 2

•

9 years ago

Wrong bug, sorry.

Axel Hecht [:Pike]

Reporter

Comment 3

•

9 years ago

Making good progress now. I got the local master and slave set up, they're talking to the stage db and stage ES, and they start up OK with the most recent versions of the libs.

One thing that I notice is that apparently something about the new mounted storage is significantly slower than it is on the old VM. A mere `hg ident -i` takes 2 seconds on the old setup, https://l10n.mozilla.org/builds/builders/compare/634825. On the new one, it takes 25 seconds, https://l10n.allizom.org/builds/builders/compare/634827. Given that the actual comparison tool isn't impacted, it seems it's not file reads, but maybe stats?

Eric, is that something you can ask the storage folks about?

Left-over steps AFAICT right now:
- check IO perf of /mnt/l10n_shared
- clone all needed repos to /mnt/l10n_shared/repos

Once that's running, I'll also need us to set up the cron jobs from the old machine, but we shouldn't run those yet.

I also see that the VM doesn't have a lot of memory free, but while running my test jobs, it didn't start paging, so that seems all good.

I'd like to hold off on the cloning until we know if we can tune IO, as the clones take a very long time right now.

Flags: needinfo?(eziegenhorn)

Greg Cox [:gcox]

Comment 4

•

9 years ago

The volume, export, permissions, IP are exactly the same for all users of /mnt/l10n_shared, so I'm at a loss for what the difference is at first glance. In the dark, my suspicions are 'some software is different between versions' or 'local disk vs NFS'. Could you demo the storage difference with a safe+repeatable command-line between the two boxes?
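One possible shape for such a safe, repeatable check is a read-only script that only lists and stats files, which is the operation suspected of being slow earlier in this bug. This is a sketch, not something that was run here; the function names and the 5000-file limit are arbitrary choices.

```python
import os
import sys
import time
from itertools import islice


def iter_files(root):
    """Yield the path of every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def time_stats(root, limit=5000):
    """lstat up to `limit` files below root and return (count, seconds)."""
    paths = list(islice(iter_files(root), limit))
    start = time.perf_counter()
    for path in paths:
        os.lstat(path)  # metadata only, nothing is read or written
    return len(paths), time.perf_counter() - start


if __name__ == "__main__":
    # Pass the directory to test, e.g. a repo below /mnt/l10n_shared.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    count, seconds = time_stats(root)
    print("%d files stat'ed in %.3f s" % (count, seconds))
```

Running it against the same directory from both boxes, several times each, gives comparable numbers. Client-side attribute caching can make later runs faster than the first, so the first run on each box is the interesting one.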

Axel Hecht [:Pike]

Reporter

Comment 5

•

9 years ago

Doh, you're right, I forgot that the share is local on the old box. That explains a lot.

I did actually run `hg ident -i` on both shares on the production elmo VM (the one that serves l10n.m.o) so that both are remote, and there the new share is actually slower than the old. 30 seconds on the new, 4 seconds on the old.

I'll mull over workarounds, of which I can think of a few:
- run ident -r ., that helps on the one hand.
- possibly using hg share to have the working dir be local. I'll test this and see how much diskspace that takes.

Unblocked for now, though, thanks for the quick follow-up.
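To compare the plain command with the `-r .` workaround, something like the following sketch could be used. The likely reason `-r .` helps is that plain `hg ident -i` also checks the working directory for uncommitted changes, which means a stat per tracked file, while `-r .` only resolves the parent revision. The helper name and the run count are assumptions.

```python
import statistics
import subprocess
import sys
import time

# The two variants discussed in this bug.
IDENT_WORKING_DIR = ["hg", "ident", "-i"]
IDENT_PARENT_ONLY = ["hg", "ident", "-r", ".", "-i"]


def time_command(argv, cwd=None, runs=3):
    """Run argv `runs` times in cwd and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            argv,
            cwd=cwd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    repo = sys.argv[1]  # path to a Mercurial clone on the share under test
    for argv in (IDENT_WORKING_DIR, IDENT_PARENT_ONLY):
        print(" ".join(argv), "%.2f s" % time_command(argv, cwd=repo))
```

Note that the two commands are not equivalent: without the working directory check, the `+` marker for local modifications is never reported.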

Flags: needinfo?(eziegenhorn)

Axel Hecht [:Pike]

Reporter

Comment 6

•

9 years ago

Eric, can we create a user 'a10n:a10n' with uid:gid 1000:1000?

Benefits: We have a 1000:1000 user on a10n handling the interactions with the hg repo, so using the same uid:gid on the buildbot box makes the .hg/hgrc files trusted on both ends. That triggered my thoughts. More so, we don't have to run an old version of buildbot etc as root :-)

Note, I figured as this and a10n are both in puppet, it makes more sense to have the user and group names be consistent among those. We could also call it dashboard:dashboard, as it is on bm-l10n-dashboard01, but that seems the wrong direction to be compatible with.
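As background for the trust argument: Mercurial ignores settings from a .hg/hgrc that is not owned by the current user, unless the owner or group is listed in a `[trusted]` section. A rough way to check a clone on the share against the proposed uid, with a hypothetical helper name, is sketched here.

```python
import os


def hgrc_owner_matches(repo, uid):
    """Return True if repo/.hg/hgrc exists and is owned by the given numeric uid."""
    hgrc = os.path.join(repo, ".hg", "hgrc")
    try:
        return os.stat(hgrc).st_uid == uid
    except FileNotFoundError:
        return False


if __name__ == "__main__":
    import sys

    # 1000 is the uid proposed in this comment.
    print(hgrc_owner_matches(sys.argv[1], 1000))
```

This only covers the ownership half of the rule. A `[trusted]` entry in a user-level or system-level hgrc would also make the file trusted and is not checked here.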

Flags: needinfo?(eziegenhorn)

:Atoll

Comment 7

•

9 years ago

Is UID/GID 1000 within the defined range of system UIDs on both boxes? :jabba, would UID/GID 1000 conflict with LDAP?
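The first question can be answered per box from /etc/login.defs, where `UID_MIN` marks the start of the regular user range and anything below it counts as a system uid. A minimal sketch follows; the fallback of 1000 for `UID_MIN` is an assumption, and older distributions are known to ship a lower value.

```python
import pwd


def parse_login_defs(text):
    """Collect the numeric 'KEY value' settings from login.defs content."""
    values = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2 and parts[1].isdigit():
            values[parts[0]] = int(parts[1])
    return values


def is_system_uid(uid, defs, default_uid_min=1000):
    """True if uid is below UID_MIN, i.e. in the system range."""
    return uid < defs.get("UID_MIN", default_uid_min)


def uid_owner(uid):
    """Return the account name that currently resolves for uid, or None."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


if __name__ == "__main__":
    with open("/etc/login.defs") as fh:
        defs = parse_login_defs(fh.read())
    print("system uid:", is_system_uid(1000, defs))
    print("current owner:", uid_owner(1000))
```

`uid_owner` goes through the system's name service, so on a box that resolves users via LDAP it would also report an LDAP account holding the uid.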

Flags: needinfo?(jdow)

Justin Dow [:jabba]

Comment 8

•

9 years ago

There was a user with uidNumber=1000 in LDAP, however it was a community member that hasn't been active since 2010, so I've re-numbered that user in LDAP, which frees up 1000 for generic use outside of LDAP.

Flags: needinfo?(jdow)

:Atoll

Comment 9

•

9 years ago

Thanks!

Eric Ziegenhorn :ericz

Assignee

Comment 10

•

9 years ago

a10n user and group created with uid and gid 1000.

Flags: needinfo?(eziegenhorn)

Axel Hecht [:Pike]

Reporter

Comment 11

•

9 years ago

Similar to bug 1305973 comment 2, I need libffi-devel and libffi. Eric, can you add those to the install?

Flags: needinfo?(eziegenhorn)

Eric Ziegenhorn :ericz

Assignee

Comment 12

•

9 years ago

Added

Flags: needinfo?(eziegenhorn)

Axel Hecht [:Pike]

Reporter

Comment 13

•

8 years ago

There's some follow-up configuration work like bug 1343898, but as the machine is now in production, I think this is good to mark FIXED. Thanks for the help.

Blocks: 1343898

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Categories

(Infrastructure & Operations :: IT-Managed Tools, task)

Tracking

(Not tracked)

People

(Reporter: Pike, Assigned: ericz)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/3876])

Crash Data

Security

(public)
