Closed Bug 688186 Opened 8 years ago Closed 8 years ago
.mozilla.org with symbols1.dmz.phx1.mozilla.com
After socorro staging moves to phx, it doesn't make sense to keep the entire symbol store in sjc with the 5-minute rsync to sync it to phx. It would simplify things quite a bit if we could set up a new VM in phx and migrate whatever is on dm-symbolpush01 over to it. Is there any reason to keep the symbol store in sjc1?
I actually filed this same bug as bug 628731 a while ago. I am A-OK with this, we just need to figure out a migration plan. We have all of our buildbots as well as Thunderbird/Seamonkey's buildbots and Linux distros etc uploading there. On IRC, jabba suggested that we set up the new box, but keep the rsync mirroring going, and then once all the users have migrated to the new box we can shut off the old one.
Resolving this bug would eliminate another sjc1-scl3 move. Bumping priority.
Severity: minor → major
This machine was created very quietly in 688192 146 days ago, and is running now. Ted needs a symbolfetch account on it like in bug 423413, which I'm working on. I'll add it to LDAP and give it access on that machine via puppet.
Assignee: jthomas → dustin
The LDAP user already existed. And my puppet is rusty. But this user exists now. Ted, can you verify that this scp works from symbolfetch1 now? They're on the same network segment, so they shouldn't need a firewall flow.
SSH as symbolfetch works, thanks!
Ok, I have two issues:
1) The /vol/pio_symbols mount is read-only
2) We need the post-symbol-upload script installed, as per bug 609270.
(In reply to Ted Mielczarek [:ted, :luser] from comment #6)
> Ok, I have two issues:
> 1) The /vol/pio_symbols mount is read-only
Fixed manually and verified, so the perms on the netapp are correct -- I'll figure out how to do this in Puppet (/cc jason).
> 2) We need the post-symbol-upload script installed, as per bug 609270.
I'll do this in puppet, as well.
Both are complete. That should finish bug 722756, right?
I'll run the script there once more to confirm, and if it works properly then we can finish that bug off.
Looks really close, I'm just getting a few permissions errors trying to create files/directories in /mnt/netapp/breakpad/symbols_os:
[firstname.lastname@example.org symbols_os]$ ls -ld .
drwxrwxr-x 9961 root 7766 368640 Feb 15 21:55 .
[email@example.com symbols_os]$ ls -ld wtsapi32.pdb
drwxrwxr-x 51 2307 7766 4096 Feb 13 21:45 wtsapi32.pdb
Most of the other dirs are symbolfetch/users, but I can't write to whatever those are.
Whoops, this kinda fell off my radar. I can certainly chown those files, but I'm hesitant to do so without a sign-off from someone familiar with the contents of the mount. Laura? Or if not, who?
If Ted thinks it's ok then he's the most qualified. I don't think it will break anything.
So 2307 is ted's account, 7766 is the 'breakpad' user, and '702' is stgbld. I've chown'd symbols_os itself (but not its contents) to symbolfetch/users. Let me know if that's not enough, or if it causes problems.
drwxrwxr-x 9964 symbolfetch users 806912 Feb 16 11:19 symbols_os
I did this on the sjc1 and phx1 shares, so rsync shouldn't revert the change.
:ted, where are we at with this? You were verifying some scripts in bug 722756, and assuming that works, I think this bug is closed, too?
I need to circle back around to that bug, but I think for our purposes this bug is probably finished. The bugs it blocks cover moving all the consumers of this service to the new host.
OK, cool. dm-symbolpush01 is a VM, so you've got until we shut off the ESX servers -- a few weeks, at least.
Correction - that host will go away on 4/9. What needs to happen before then?
Assignee: dustin → ted.mielczarek
I'm still getting some permission errors, it looks like:
checkdir error: cannot create zipfldr.pdb/FC44B439CA35485EAD36FA65AF7F2E171
 Permission denied
 unable to process zipfldr.pdb/FC44B439CA35485EAD36FA65AF7F2E171/.
checkdir error: cannot create zipfldr.pdb/FC44B439CA35485EAD36FA65AF7F2E171
 Permission denied
 unable to process zipfldr.pdb/FC44B439CA35485EAD36FA65AF7F2E171/zipfldr.sym.
Those are under symbols_os.
drwxrwxr-x 9971 symbolfetch users 368640 Mar 28 12:25 symbols_os
drwxrwxr-x 39 2307 7766 4096 Feb 12 21:59 symbols_os/zipfldr.pdb/
It's that ted guy again with userid 2307. Maybe we should lock him out ;) I did a recursive chown of symbols_os to symbolfetch:users. Did that help?
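Before (or after) a recursive chown like the one described here, it can help to list which entries are still owned by an unexpected uid. The following is a minimal sketch in Python, not something used in this bug: the function name `find_foreign_owned` is hypothetical, and it assumes the tree is mounted locally and readable by the user running it.

```python
import os


def find_foreign_owned(root, expected_uid):
    """Return every path at or under root whose owner is not expected_uid."""
    mismatched = []
    if os.lstat(root).st_uid != expected_uid:
        mismatched.append(root)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports a symlink's own owner rather than its target's.
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

Run against symbols_os with the uid of symbolfetch, an empty result would suggest the chown reached everything; a non-empty one lists the directories (such as those owned by uid 2307 above) that uploads may still fail on.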
That did the trick, thanks!
OK, so the phx1 host is working as a target -- now we need to get releng and seamonkey systems uploading to the new host. We have a reprieve on the 4/6 deadline, but only one week. I believe this will need a known_hosts update (a la bug 742045, perhaps piggybacked?) as well as a config change to http://hg.mozilla.org/build/buildbot-configs/file/4701b9b1642c/mozilla/production_config.py#l91 for releng - and similar adjustments for seamonkey. We may also need network flows (which I can help with if necessary). Who can make this happen?
I can take bug 722759 for the Firefox side since I'm touching known_hosts anyway. Do any of the infra bugs attached here open the network flow? Do we have all the *bld users set up on symbols1.dmz.phx1.mozilla.com?
Flows got set up in bug 742083 (thanks for filing that). I can add whatever users you need, preferably LDAP users (trybld and ffxbld both are; syncbld isn't). Just let me know.
(In reply to Dustin J. Mitchell [:dustin] from comment #23)
> Flows got set up in bug 742083 (thanks for filing that). I can add whatever
> users you need, preferably LDAP users (trybld and ffxbld both are; syncbld
> isn't). Just let me know.
Thunderbird and SeaMonkey build systems AFAIK also need to be set up in terms of flows and users, I guess - not sure if Thunderbird machines are 100% covered by RelEng flow stuff yet or not, but I guess Nick knows about that. Should Callek file another bug for the currently (due to history and colo moves) somewhat complicated flows for SeaMonkey? The users IIRC are tbirdbld and seabld, I guess those are in LDAP. Also not sure if Calendar/calbld also needs to be added, it's AFAIK handled by the Thunderbird infra, guys there should know about that.
Yes, please, for the flow requests. I'm curious what host those systems are using now?
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
> Yes, please, for the flow requests. I'm curious what host those systems are
> using now?
I'm pretty sure everyone is using dm-symbolpush01.mozilla.org now. And http://hg.mozilla.org/build/buildbot-configs/file/c393d6d725ba/seamonkey/config.py#l147 etc. as well as http://hg.mozilla.org/build/buildbot-configs/file/c393d6d725ba/thunderbird/config.py#l124 etc. confirm that.
Ah, ok, that resolves differently externally. So we'll need to set that up. I'll get a bug filed.
So far I have:
moco releng:
  users: ffxbld, trybld
  source: internal
  flows: bug 742083
seamonkey:
  users: seabld
  source: external
  vip: bug 742563
Thunderbird: ?? jhopkins?
Calendar: ?? jhopkins?
(In reply to Dustin J. Mitchell [:dustin] from comment #28)
> So far I have:
>
> moco releng:
> users: ffxbld, trybld
We don't publish try builds to the main symbol server, so trybld can be dropped. We do publish xulrunner symbols so please add xrbld. In addition to the apps you list there are people who upload to
symbols_camino - calbld
symbols_fedora - bug 562570
symbols_os - ted et al
symbols_opensuse - bug 535947
symbols_penelope - ??
symbols_solaris - ??
symbols_ubuntu - ??
(In reply to Nick Thomas [:nthomas] from comment #29)
> symbols_camino - calbld
Correction: caminobld
I'll make sure bug 688250 gets wrapped up monday (Jason is out today/tmrw). What else do we need to get this going? The dm-symbolpush01 VM will go away after next week.
At least part of what's left is to make sure that moco releng and community members can push symbols to the new machine, and that the location of those pushes changes at the appropriate moment. See comment 28 et seq. I think we can do the users with puppet -- I'm waiting for a full list, and in particular info about thunderbird and calendar.
Thunderbird update: I can't resolve symbols1.dmz.phx1.mozilla.com from the MoMo build machines. I also don't have a route to 10.8.74.48. The user we're currently using to upload symbols is 'tbirdbld'.
The Calendar symbols are pushed as user 'calbld'
thunderbird (and all community projects) will need to use the external VIP, which is named symbolpush.mozilla.org. The users are created, and ssh keys in LDAP can be used to login:
  realize(Ldap_Users::User[
    'ffxbld', # moco releng, non-try
    'seabld', # seamonkey
    'calbld', # calendar
    'caminobld', # camino
    'tbirdbld' # momo
  ])
Adding access for others is easy, although if they're not in LDAP, or UIDs don't line up, we may need to do some fiddling and chown'ing. Ted's in that category. Let's work that out as necessary, particularly since the people not already listed will need to use a new hostname to upload next time, anyway.
This hostname and VIP are now up, but are pending flows in bug 743072.
Nick, can you test this internally? Everyone else, can you test externally when 743072 is closed?
Nick, are you comfortable with the transition process as laid out above? Callek, are you OK for Seamonkey? Anyone else have concerns?
Assignee: ted.mielczarek → dustin
(In reply to Dustin J. Mitchell [:dustin] from comment #35)
> Callek, are you OK for Seamonkey?
Yep ok, I would like ~ a day to verify we can connect, update the config and watch that it does in fact work for uploading symbols before we disable the old host though, in case a problem arises that is not fixable in minutes. All that awaits flows of course.
The flow is open now, and seabld access is in - can you check it out?
(In reply to Nick Thomas [:nthomas] from comment #29)
> symbols_fedora - bug 562570
jhorak is the contact for Fedora.
> symbols_os - ted et al
We already took care of the symbolfetch account that I use for OS symbol uploads. If you can make sure my LDAP account gets me into the new VM then I'll be all set.
> symbols_opensuse - bug 535947
wolfIR is the contact for OpenSusE.
> symbols_penelope - ??
Penelope is the Eudora+Thunderbird project, this was bug 466199. Jeff Beckley was the contact. I'm not sure of the current status.
> symbols_solaris - ??
This was bug 424061. Alfred Peng is the contact. I'm not sure of the current status.
> symbols_ubuntu - ??
This was bug 588078. Chris Coulson (already CCed) is the contact for Ubuntu.
To anyone I just CCed: the symbol upload server is moving from dm-symbolpush01.mozilla.org to symbolpush.mozilla.org Real Soon Now. Sorry for the late notice, we were scrambling to get a bunch of other things fixed first. This should be a relatively painless transition, mostly just changing the hostname, but we need to ensure that everyone's usernames and SSH keys have been migrated.
I can't resolve symbolpush.mozilla.org right now. Is this server still down? I'd also like to ask whether the ssh account stays the same (username and keys) and whether the upload path remains '/mnt/netapp/breakpad/symbols_fedora' (for fedora). Thanks for letting us know.
Administrator@SEA-WIN32-02 /c/Documents and Settings/seabld
$ ssh -l seabld -i .ssh/seabld_dsa symbolpush.mozilla.org
ssh: symbolpush.mozilla.org: no address associated with name
Administrator@SEA-WIN32-02 /c/Documents and Settings/seabld
$ ssh -l seabld -i .ssh/seabld_dsa symbols1.dmz.phx1.mozilla.com
ssh: symbols1.dmz.phx1.mozilla.com: no address associated with name
Administrator@SEA-WIN32-02 /c/Documents and Settings/seabld
$ ssh -l seabld -i .ssh/seabld_dsa symbolpush.mozilla.com
ssh: symbolpush.mozilla.com: no address associated with name
Given that (from the scl3 comm VLAN) and jhorak's comment, I'm not even going to try from sjc1 yet.
What's left to do here before this can be used for production traffic by all users of the old-and-going-away symbolpush? Per irc, we have a hard cutover date of 4/13. If we can switch over to using this new VM before 4/13, that would be.... good.
Ugh, sorry, I posted the "it should work now" update last night seeing the bug was resolved, without actually testing. My DNS change didn't work. I'll have an update in a few. I'll also get the remaining accounts added.
The hostname is resolving now. I missed the trailing dot in the CNAME. Classic error. :(
(In reply to Ted Mielczarek [:ted] from comment #38)
> (In reply to Nick Thomas [:nthomas] from comment #29)
> > symbols_fedora - bug 562570
>
> jhorak is the contact for Fedora.
fedorasymbols was a local account - need LDAP
> > symbols_os - ted et al
>
> We already took care of the symbolfetch account that I use for OS symbol
> uploads. If you can make sure my LDAP account gets me into the new VM then
> I'll be all set.
done.
> > symbols_opensuse - bug 535947
>
> wolfIR is the contact for OpenSusE.
done. (username 'wr')
> > symbols_penelope - ??
>
> Penelope is the Eudora+Thunderbird project, this was bug 466199. Jeff
> Beckley was the contact. I'm not sure of the current status.
penelopebld - need LDAP
> > symbols_solaris - ??
>
> This was bug 424061. Alfred Peng is the contact. I'm not sure of the current
> status.
apeng - need LDAP?
> > symbols_ubuntu - ??
>
> This was bug 588078. Chris Coulson (already CCed) is the contact for Ubuntu.
chrisccoulson - done
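For readers who have not hit the trailing-dot problem: in a standard DNS zone file, a name that does not end in "." is treated as relative and has the zone origin appended. The sketch below illustrates that rule in Python. The function name and the CNAME target `vip.example.net` are hypothetical; the actual record from this bug is not shown in the thread.

```python
def expand_zone_name(name, origin):
    """Expand a zone-file name: absolute if it ends in ".", else relative to origin."""
    if not origin.endswith("."):
        origin += "."
    if name == "@":
        # "@" is zone-file shorthand for the origin itself.
        return origin
    if name.endswith("."):
        return name
    return name + "." + origin


# Hypothetical CNAME target inside the mozilla.org zone.
print(expand_zone_name("vip.example.net.", "mozilla.org."))  # vip.example.net.
print(expand_zone_name("vip.example.net", "mozilla.org."))   # vip.example.net.mozilla.org.
```

The second call shows the classic failure: without the dot, the record points at a name under the zone's own origin, which usually does not exist, so the alias fails to resolve.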
For the accounts above that need LDAP, we'll need to get separate tickets to make sure the proper account-granting stuff gets done. Affected folks (jhorak, Jeff, and Alfred): please file a bug in "Server Operations: Desktop" requesting an LDAP account, copying me, and referencing this bug. I'll make sure that gets done in a timely fashion. Once that's in place, I'll make sure you have access to the machine.
jhorak: yes, path remains the same.
We're just waiting for the releng pieces to fall into place to schedule a time for the cutover -- sometime in the next week. We won't need that to be *terribly* accurate - just within a day. To put a stake in the ground, let's say next Wednesday (April 11).
Here's how I'm proposing that will work:
* In advance, everyone verifies the ability to upload (validating network flows, ssh authz, filesystem perms) and signs-off here
* I remove --delete from the sjc1 -> scl3 rsync
* everyone switches automated uploads to phx1 via the appropriate hostname
* 24h later, I shut down the sjc1 host (to best generate errors for stragglers)
Let me know if that doesn't work.
(In reply to Dustin J. Mitchell [:dustin] from comment #44)
> * In advance, everyone verifies the ability to upload (validating network
> flows, ssh authz, filesystem perms) and signs-off here
Tested one slave in scl3 and from jumphost in sjc1:
* Flows OK
* ssh authz OK
* filesystem perms: LOOKS ok, will be certain when I do the cutover
> * everyone switches automated uploads to phx1 via the appropriate hostname
Sounds good (Sea will need to do known_hosts updates manually, but that is doable)
(In reply to Dustin J. Mitchell [:dustin] from comment #44)
> * In advance, everyone verifies the ability to upload (validating network
> flows, ssh authz, filesystem perms) and signs-off here
I've checked access to symbolpush.mozilla.org (which seems to be the proper alias to use?). Everything works apparently. I haven't uploaded real data yet, but I can create files where I usually need to, so I expect it to work.
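The first step of the plan (verifying the ability to upload) starts with two things that failed earlier in this bug: name resolution and the network flow to the ssh port. A rough Python sketch of such a pre-flight check follows. It is an illustration only: `check_upload_host` is a hypothetical helper, port 22 is an assumption, and it does not test ssh authorization or filesystem permissions, which still need a real login and a test write.

```python
import socket


def check_upload_host(hostname, port=22, timeout=5.0,
                      resolve=socket.getaddrinfo,
                      connect=socket.create_connection):
    """Return (ok, message) for DNS resolution plus a TCP connect to port.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        infos = resolve(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, "DNS lookup failed: %s" % exc
    # Each getaddrinfo entry ends with a sockaddr whose first item is the IP.
    address = infos[0][4][0]
    try:
        conn = connect((address, port), timeout=timeout)
    except OSError as exc:
        return False, "resolved to %s but connect failed: %s" % (address, exc)
    conn.close()
    return True, "resolved to %s, port %d reachable" % (address, port)
```

Calling `check_upload_host("symbolpush.mozilla.org")` from a build machine would distinguish a DNS problem from a missing flow, which are the two failure modes reported in the comments above.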
Yep, that's the correct hostname. Thanks for verifying!
I will need an account on the new machine so that I can continue Flash symbol upload and other maintenance activities.
I think Nick/releng is the deciding factor for timing here. Recall that, per comment #31, this VM needs to go away at the end of the week (really Monday morning, but let's not *plan* to work another weekend, eh?). The day-of migration plan is in comment 44. Nick, what do you think?
Could this happen in the planned hg downtime on Thursday morning?
per IRC conversation with Dustin, this could be done during the downtime Thursday as it is a reconfig change. The driving factor is coordination with nthomas. I'll ping nthomas to find out if this is accurate.
We can almost certainly do bug 722759 in the downtime - the Firefox side is pretty close to ready (bug 742045).
Please fix
* file ownership of everything in symbols_xr should be xrbld
* xrbld can't log on via ssh, it's rejecting the key
Oops, I missed xrbld in the comments above. Added via puppet, and the chown is running now. rsync *may* un-do those chown's, in which case we'll need to either stop the rsync's or exclude this dir. I removed --delete from the symbol rsync on dm-symbolpush01:
rsync -av --whole-file /mnt/netapp/breakpad/* firstname.lastname@example.org:/mnt/socorro/symbols/ >> /tmp/slow_rsync.log 2>&1
The xrbld permissions seem to still be in place.
Did this get fixed up in the downtime? Tomorrow's my drop-dead date for the old VM
No, but I'm going to deploy the firefox/fennec/xulrunner change to buildbot shortly.
Dustin: can you add bsmedberg and myself to the 'users' group? Otherwise I can't write symbols to the symbols_os dir.
Hm, my puppety techniques for accomplishing that don't work - I've called in reinforcements.
(In reply to Ted Mielczarek [:ted] from comment #59)
> Dustin: can you add bsmedberg and myself to the 'users' group? Otherwise I
> can't write symbols to the symbols_os dir.
I've added you both manually - I think puppet chokes on the fact that "users" is a reserved phrase.
http://pastebin.mozilla.org/1568181 - The lack of symbol manifests between April 8 and 12 indicates a problem with syncing from sjc1 to phx1. We should be able to recover the 12.0b5 symbols from the build slaves, but there's a gap in the nightly symbols which might be a problem.
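A gap like this can be spotted mechanically from the manifest filenames, since each carries a 14-digit build ID (for example firefox-14.0a1-WINNT-20120410075652-symbols.txt). The Python sketch below is an illustration, not the method used in the pastebin; the helper names are hypothetical and it assumes every manifest of interest ends in "-symbols.txt".

```python
import re
from datetime import date, timedelta

# A build ID is 14 digits (YYYYMMDDHHMMSS) delimited by hyphens.
BUILD_ID = re.compile(r"-(\d{14})-")


def manifest_dates(filenames):
    """Return the set of build dates found in symbol manifest filenames."""
    days = set()
    for name in filenames:
        match = BUILD_ID.search(name)
        if match and name.endswith("-symbols.txt"):
            stamp = match.group(1)
            days.add(date(int(stamp[0:4]), int(stamp[4:6]), int(stamp[6:8])))
    return days


def missing_days(filenames, first, last):
    """List each day from first to last (inclusive) with no manifest."""
    present = manifest_dates(filenames)
    gaps = []
    day = first
    while day <= last:
        if day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

Feeding it a directory listing of symbols_ffx and the range of interest would return the days with no manifest at all; it says nothing about whether the symbol files themselves arrived.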
Hmm, might just be the txt files, which means it's only a problem for cleaning old symbols up. Here's a spot check for a m-c windows nightly:
sjc1:
nthomas@surf:/mnt/breakpad_symbols/symbols_ffx$ grep 9511ABD675174E37AD9D7589721D95AC2 *txt
firefox-14.0a1-WINNT-20120410075652-symbols.txt:xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.sym
firefox-14.0a1-WINNT-20120410075652-symbols.txt:xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.pd_
nthomas@surf:/mnt/breakpad_symbols/symbols_ffx$ sha1sum xul.pdb/9511ABD675174E37AD9D7589721D95AC2/*
92b67e9ea632d4188c18b0eaab0b64346d373ed8 xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.pd_
f0f1b9367676119e7273626ce975c16586ba32e8 xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.sym
phx1:
[email@example.com symbols_ffx]$ grep 9511ABD675174E37AD9D7589721D95AC2 *txt
[firstname.lastname@example.org symbols_ffx]$ ls xul.pdb/9511ABD675174E37AD9D7589721D95AC2
xul.pd_ xul.sym
[email@example.com symbols_ffx]$ sha1sum xul.pdb/9511ABD675174E37AD9D7589721D95AC2/*
92b67e9ea632d4188c18b0eaab0b64346d373ed8 xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.pd_
f0f1b9367676119e7273626ce975c16586ba32e8 xul.pdb/9511ABD675174E37AD9D7589721D95AC2/xul.sym
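The spot check above compares SHA-1 digests by hand on each host. If both trees happened to be mounted on one machine, the same comparison could be scripted. The following Python sketch assumes exactly that (an assumption; in this bug the two stores were on different hosts) and uses hypothetical function names.

```python
import hashlib
import os


def sha1_tree(root):
    """Map each file's path (relative to root) to its SHA-1 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha1()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large symbol files stay out of memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            digests[os.path.relpath(path, root)] = digest.hexdigest()
    return digests


def compare_trees(source_root, dest_root):
    """Return (missing, differing): files absent from dest, and files whose content differs."""
    src = sha1_tree(source_root)
    dst = sha1_tree(dest_root)
    missing = sorted(set(src) - set(dst))
    differing = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    return missing, differing
```

For the case described here, the expected result would be manifests (the *.txt files) in the missing list and nothing in the differing list.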
I'm not sure of the details, but the old host isn't shut down yet, if you want to do some last-minute rsync'ing..
From bug 745363 it looks like we're OK for the actual symbols, just missing some txt files. Could we do an rsync on that? I could get ffx but thunderbird has a similar problem so it might be global.
And if a complete rsync doesn't take too many hours that might be the most reassuring way to finish this up. Without --delete of course :-)
Curiosity got me running this on dm-symbolpush01
rsync -navOi /mnt/netapp/breakpad/symbols_ffx/ \
  firstname.lastname@example.org:/mnt/netapp/breakpad/symbols_ffx/ \
  2>&1 | tee /tmp/ffx_check.log
It's looking very much like only some missing txt files. Given the sjc1 files are mounted on stage too, we can fix that up next week. If we have the cron jobs transferred to symbols1.dmz.phx1 then we could go ahead with terminating dm-symbolpush. FWIW, I don't see any record of that here.
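A dry-run log produced with rsync's -n and -i options can be summarized to confirm a conclusion like this one. The sketch below is a hypothetical helper in Python, not something run in this bug. It assumes the usual itemize format, in which a regular file that would be newly created on the receiver is shown as "<f" or ">f" followed only by "+" characters; the exact width of that string varies between rsync versions.

```python
def classify_missing(itemized_lines):
    """Split would-be-new files from an rsync -ni log into (manifests, others)."""
    manifests, others = [], []
    for line in itemized_lines:
        parts = line.rstrip("\n").split(" ", 1)
        if len(parts) != 2:
            continue
        flags, path = parts
        # "<f" / ">f" marks a regular file to transfer; all "+" means it is new.
        is_new_file = (flags[:2] in ("<f", ">f")
                       and len(flags) > 2
                       and set(flags[2:]) == {"+"})
        if is_new_file:
            if path.endswith("-symbols.txt"):
                manifests.append(path)
            else:
                others.append(path)
    return manifests, others
```

Reading /tmp/ffx_check.log through this function, an empty second list would support the reading that only manifest txt files are missing on the phx1 side.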
I only see /var/spool/cron/root
# HEADER: This file was autogenerated at Fri Apr 06 08:30:38 -0700 2012 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# This script sync only the newly uploaded symbols to phx
*/5 * * * * /usr/bin/flock -n /tmp/symbol-sync.lock -c /usr/local/bin/sync_symbol_dirs_to_phx.sh
# This script is a full tree rsync - mostly for deleting stuff on the phx side
3 3 * * 0 /usr/bin/flock -n /tmp/symbol-rsync.lock -c /usr/local/bin/sync_symbols_to_phx.sh
1 1 * * 0 /bin/cat /dev/null > /tmp/rsync.log
# Puppet Name: cleanup-breakpad-symbols
MAILTO=root
0 4 * * * /mnt/netapp/breakpad/cleanup-breakpad-symbols.sh > /dev/null
The first two of which aren't needed anymore. I'll replicate the last one.
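The sync entries in that crontab are wrapped in flock -n, which skips a run when the previous one still holds the lock rather than piling up overlapping rsyncs every five minutes. The same non-blocking pattern can be sketched in Python with the standard fcntl module (Unix only). The helper name `run_exclusively` is hypothetical and this is not part of the scripts named above.

```python
import fcntl


def run_exclusively(lock_path, job):
    """Run job() only if lock_path is not locked elsewhere; return True if it ran."""
    with open(lock_path, "w") as lock_file:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        job()
        # Closing the file on leaving the with-block releases the lock.
        return True
```

A False return corresponds to the case where cron fires while an earlier sync is still in progress, and the new invocation simply exits.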
I *think* we're done here - the old host is going away shortly, anyway.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Dustin, could you please unpack this at symbols.dmz.phx1.mozilla.com:/mnt/netapp/breakpad/? It's the manifests for beta/release builds that I grabbed from the sjc1 server before it goes away. I didn't bother with any nightly builds, since it's been 30 days since we resolved this bug and most have been removed anyway. I left the Ubuntu manifests alone because I don't know the naming scheme. I've got a copy of almost all the manifests if required later.
Done:
[email@example.com breakpad]# tar -ztf ~/symbol-manifest-additions.tar.gz | xargs ls -l
-rw-rw-r-- 1 ffxbld users 20857 Apr 11 10:02 symbols_ffx/firefox-12.0-Darwin-20120411064248-macosx64-symbols.txt
-rw-rw-r-- 1 ffxbld users 16552 Apr 11 07:52 symbols_ffx/firefox-12.0-Linux-20120411064248-linux64-symbols.txt
-rw-rw-r-- 1 ffxbld users 16552 Apr 11 07:55 symbols_ffx/firefox-12.0-Linux-20120411064248-symbols.txt
-rw-r--r-- 1 ffxbld users 20566 Apr 11 09:50 symbols_ffx/firefox-12.0-WINNT-20120411064248-symbols.txt
-rw-rw-r-- 1 ffxbld users 14650 Apr 11 07:38 symbols_mob/fennec-12.0-Android-20120411064327-android-xul-mozilla-beta-symbols.txt
-rw-rw-r-- 1 seabld users 16961 Apr 11 21:11 symbols_sea/seamonkey-2.9-Linux-20120411203037-linux64-symbols.txt
-rw-rw-r-- 1 seabld users 16961 Apr 11 22:54 symbols_sea/seamonkey-2.9-Linux-20120411204253-symbols.txt
-rw-r--r-- 1 tbirdbld users 21536 Apr 13 09:33 symbols_tbrd/thunderbird-10.0.2-WINNT-20120413055242-symbols.txt
-rw-rw-r-- 1 tbirdbld users 21741 Apr 10 16:22 symbols_tbrd/thunderbird-12.0-Darwin-20120410125551-symbols.txt
-rw-rw-r-- 1 tbirdbld users 16695 Apr 10 14:43 symbols_tbrd/thunderbird-12.0-Linux-20120410130036-symbols.txt
-rw-rw-r-- 1 tbirdbld users 16695 Apr 10 15:25 symbols_tbrd/thunderbird-12.0-Linux-20120410130832-symbols.txt
-rw-r--r-- 1 tbirdbld users 21212 Apr 10 16:02 symbols_tbrd/thunderbird-12.0-WINNT-20120410125251-symbols.txt
-rw-r--r-- 1 xrbld users 5590 Mar 13 05:46 symbols_xr/xulrunner-12.0a2-WINNT-20120313042010-symbols.txt
-rw-rw-r-- 1 xrbld users 7564 Apr 11 10:03 symbols_xr/xulrunner-12.0-Darwin-20120411064248-macosx64-symbols.txt
-rw-rw-r-- 1 xrbld users 3795 Apr 11 07:22 symbols_xr/xulrunner-12.0-Linux-20120411064248-linux64-symbols.txt
-rw-rw-r-- 1 xrbld users 3795 Apr 11 08:37 symbols_xr/xulrunner-12.0-Linux-20120411064248-symbols.txt
-rw-r--r-- 1 xrbld users 5590 Apr 11 08:06 symbols_xr/xulrunner-12.0-WINNT-20120411064248-symbols.txt
Product: mozilla.org → mozilla.org Graveyard