Closed Bug 344466 Opened 19 years ago Closed 18 years ago

talkback symbol push needs to move to spike

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: coop)

References

Details

Attachments

(1 file, 1 obsolete file)

The tinderboxes currently use an SSH tunnel through stage to reach the talkback server. This is no longer needed, and it causes problems if an ssh tunnel is created elsewhere (since the accepted public hostkey for "localhost" changes).
Also, we should be pushing to spike, not hal. This is configured in the same Makefile.
(In reply to comment #1)
> Also, we should be pushing to spike, not hal. This is configured in the same
> Makefile.
Actually, we should take this opportunity to upload to "talkback-upload.build.mozilla.org," and define that host in our build network's DNS. This will, of course, require ssh key futzing.
TR: if you're working on this, please feel free to re-assign to yourself.
Assignee: rhelmer → tfullhart
Status: NEW → ASSIGNED
This patches Makefile.in so it will not use SSH tunnels to configure the talkback server web app or to upload the symbol data. Also, this uses talkback-upload as the host to use for FullCircle and for uploading the symbol data.
Blocks: 311977
Comment on attachment 239557 [details] [diff] [review]
Patch to talkback/fullsoft/Makefile.in
In general, we still need a way to support SSH tunnels. Tinderboxen that are in the office, for instance, will still need to use stage as a passthru (unless we allow ssh from the office network). I'm also worried about external tinderboxen (SeaMonkey, etc.) that may still need to use this method. Maybe wrap the changes in a LOCAL_TALKBACK define, so the default is to continue to use SSH, but the machines in the build network can use the new method?
>+FC_SERVER = talkback-upload
FQDN?
Whatever solution we end up deciding on, we need to make sure other projects that use Talkback are aware of the move to spike (or know about the new talkback-upload.build.mozilla.org hostname). I am also making this bug critical, since it has sat for long enough and I am blocked on it for retiring HAL. IT no longer supports HAL, so we need to get the symbol uploads moved to spike ASAP. If that means a short term solution that works, we need to do that.
Severity: normal → critical
This incorporates preed's feedback. If LOCAL_TALKBACK is defined, the ssh tunnel won't be used. This version hasn't been extensively tested; I only posted it to get review on it and to have it somewhere until it's been tested to my satisfaction.
Attachment #239557 - Attachment is obsolete: true
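For readers following the LOCAL_TALKBACK proposal, the selection logic can be sketched as below. This is a Python stand-in for illustration, not the Makefile.in change itself: the function name upload_target and the returned keys are hypothetical, the host values come from the comments above ("talkback-upload" from the first patch, "localhost" as the tunnel endpoint), and treating any non-empty value as "defined" is an assumption.

```python
import os

DIRECT_HOST = "talkback-upload"  # FC_SERVER value quoted from the first patch
TUNNEL_HOST = "localhost"        # tunnel endpoint mentioned in the original report


def upload_target(env=None):
    """Pick the upload host based on LOCAL_TALKBACK (hypothetical helper).

    Assumption: any non-empty LOCAL_TALKBACK value counts as "defined".
    """
    env = os.environ if env is None else env
    if env.get("LOCAL_TALKBACK"):
        # Build-network machines talk to the upload host directly.
        return {"host": DIRECT_HOST, "use_ssh_tunnel": False}
    # Default: keep going through the SSH tunnel, as before.
    return {"host": TUNNEL_HOST, "use_ssh_tunnel": True}


print(upload_target({}))
print(upload_target({"LOCAL_TALKBACK": "1"}))
```

Keeping the tunnel as the default means office and external tinderboxen need no configuration change, which is the concern raised in the review.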
Taking TR's bug; I'll retriage these shortly.
Assignee: tfullhart → preed
Status: ASSIGNED → NEW
Reassigning bugs I'm not actively working on back into the triage pool.
Assignee: preed → build
During bug triage, this bug appears to be obsolete, so we are closing it. If you think this bug should be worked on, please reopen it and add comments explaining why. Thank you, The Build Team.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX
Actually, we need to move the symbols push to spike (see comment #2). I have been asking for this for over a year and it has been pushed off due to other build priorities. Now that we have an actual build team, it will be nice to get all symbols pushed directly to spike.mozilla.org. I have already set up the symbols user account on spike and copied over the ssh keys... so please try to get this fixed ASAP! I am having problems with the rsync, and if this can be fixed quicker than me figuring out the cygwin issue on spike... that will save me a lot of pain.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Summary: talkback symbol push does not require SSH tunnel → talkback symbol push needs to move to spike
john asked whether this was relevant due to all the breakpad stuff going on, and the answer is we still need to do this. breakpad is not going to replace talkback completely for a while... so we still need to maintain it. and i have been wanting to kill off the hal machine for a long time, so still would like to get this one last piece off of that machine.
Assignee: build → ccooper
Status: REOPENED → NEW
Priority: -- → P2
Jay: is the new server listed in TR's patch, talkback-upload.build.mozilla.org, still the correct one, i.e. spike==talkback-upload.build.mozilla.org?
Status: NEW → ASSIGNED
The alias wasn't set up yet in the build DNS, so I went ahead and set it up.
How will this work for community builds that need to upload talkback symbols? Should we perhaps use a talkback-upload.mozilla.org alias to spike that would be visible both inside and outside the firewall instead?
(In reply to comment #14)
> How will this work for community builds that need to upload talkback symbols?
> Should we perhaps use a talkback-upload.mozilla.org alias to spike that would
> be visible both inside and outside the firewall instead?
I think that was the original plan... so that others can continue to push symbols to spike. We just need to make sure folks are notified soon so they can update their build automation. I already have ssh keys copied to spike for all current projects pushing to hal, so they should be able to switch anytime.
Sadly, I think the only person who needs to worry about this is me. The community projects are all using pre-built Talkback packages now (bug 373373) but I'll need to update those to point to the new server. I'll need to go accept the new SSH key on all the build machines too.
I had some issues a few weeks ago when I tried to push the symbols to spike from the community servers, so I'm going to work up to that again by first verifying that we can upload directly to spike from inside the firewall using the various build keys.
jay: I tried connecting to talkback-upload.mozilla.org from a tinderbox within the firewall, but the connection just timed out (output below). Are the right ssh public keys (for cltbld, but also for the community build users: calbld, caminobld, and seabld) on this box? Should we still be connecting as the symbols user?
bm-xserve02:~ cltbld$ time ssh -vv -2 symbols@talkback-upload.mozilla.org
OpenSSH_3.8.1p1, OpenSSL 0.9.7i 14 Oct 2005
debug1: Reading configuration data /etc/ssh_config
debug2: ssh_connect: needpriv 0
debug1: Connecting to talkback-upload.mozilla.org [10.2.74.101] port 22.
debug1: connect to address 10.2.74.101 port 22: Operation timed out
ssh: connect to host talkback-upload.mozilla.org port 22: Operation timed out
real 1m15.235s
user 0m0.005s
sys 0m0.011s
Per preed/build's original firewalls, that access is specifically not allowed. build - okay to open?
I would say 'yes.' Bridging through stage isn't exactly more secure. Reminder: the community machines will need the same access. Not sure how much that complicates things.
mrz: any chance of getting this firewall change made this week? I'd like to start making the changes to the talkback source and rebuilding the community talkback packages.
Talked to preed - opened up the firewall from build to 10.2.74.101 (internally) and from community build to 63.245.208.174 (external address).
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
jay: I can connect to spike (talkback-upload) from the internal network now as the symbols user, but I still can't log in from the community machines, i.e. I can connect to spike, but the login fails. How did we want to handle this? We use new product-specific ssh keys on the community build network now. Do we want to add these keys to the authorized_keys file for the symbols user on spike, or should we create individual upload accounts on spike for the community products?
Coop: I'd rather keep just one account "symbols" for all products... it just makes managing the server easier. I only have keys for cltbld and a couple of other projects (Camino and Sunbird), so I guess we need to round up all the product-specific ssh keys and add them to authorized_keys. If you have all of them, you can send them to me via email and I can get them added and report back here.
(In reply to comment #24)
> If you have all of
> them, you can send them to me via email and I can get them added and report
> back here.
Sent.
I have updated the symbols user's authorized_keys with the public keys for: firefox, thunderbird, calendar, camino, seamonkey, and xulrunner. Any servers with access to spike that want to push symbols with their product-specific users should be able to do so now. Coop: Let me know if things look good on your end now. Thanks.
Yep, I can connect using all the new build keys now. Thanks. I'll start updating TR's patch, and then get on with repackaging the community packages.
Comment on attachment 250498 [details] [diff] [review] Patch to talkback/fullsoft/Makefile.in Luckily this code doesn't change very often. I've changed the REAL_FC_SERVER to be talkback-upload.mozilla.org, but otherwise this patch seems fine.
Attachment #250498 - Flags: review+
I checked in the code fix last night, and am updating the community packages right now. jay: can you verify that we have symbols for Fx/Tb from last night on the Talkback server? I don't see any upload errors when looking at tinderbox, but I just want to make sure.
<nagios> spike:symbols age is WARNING: WARNING on /g/symbols/Mozillaorg/Firefox2 - Last updated 101302 seconds ago. It could be a problem with the nagios check, though... jay?
The new symbols and dir on spike are owned by the symbols user rather than Administrator...not sure if that is relevant. e.g. /home/symbols/symbols/Mozillaorg/Firefox2/MacOSX/2007101606
Yeah, the Nagios error is probably due to the check parameters... it recovered in about an hour:
***** Nagios *****
Notification Type: RECOVERY
Service: symbols age
Host: spike
Address: 10.2.74.101
State: OK
Date/Time: 10-16-2007 10:50:47
Additional Info: /g/symbols/Mozillaorg/Firefox2 OK - Last updated 540 seconds ago.
Reed: I think you were the one to set up the Nagios checks, so if this problem continues, can you look into that?
Coop: I see symbols from this morning for all 3 platforms:
Administrator@spike /g/symbols/Mozillaorg/Firefox2
$ ll */ | grep symbols | grep 200710
drwxrwxrwx+ 2 symbols None 0 Oct 16 04:03 2007101603/
drwxrwxrwx+ 2 symbols None 0 Oct 16 08:47 2007101606/
drwxrwxrwx+ 2 symbols None 0 Oct 16 04:15 2007101603/
Looks like we're good. I will keep an eye on the reports to make sure the new "symbols" owned files don't break anything. I think we'll be ok, since the Talkback server just needs to be able to read them. Thanks for getting this going Coop!
Re: Nagios - Once Coop has all the symbols uploads going straight to spike, depending on what the check does right now, we might need to modify it to look in the right places for symbols. I am not sure what the current config is, but if it's checking rsync logs between hal and spike, we need to disable that and create a new check.
(In reply to comment #32)
> Reed: I think you were the one to setup the Nagios checks, so if this problem
> continues, can you look into that?
Nope, I can't. I have no access to any of the servers that I would need to access in order to work on this.
All symbols uploads should now be going straight to spike. All the Talkback CVS branches we use from mofo/ have been updated, and all the community Talkback packages were updated this morning. Symbols are being uploaded to spike in the same hierarchy as was used on hal:
/home/symbols/symbols/Mozillaorg/%PRODUCT%/%PLATFORM%/%BUILDID%
I'll plan to close this tomorrow once I can confirm that community symbols get uploaded correctly.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
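The upload hierarchy described above can be illustrated with a small path builder. This is only a sketch: remote_symbol_dir is a hypothetical helper, not part of the Talkback build scripts, and the example values are taken from a directory quoted earlier in this bug.

```python
import posixpath

# Root of the symbol store on spike, as quoted in the comment above.
SYMBOL_ROOT = "/home/symbols/symbols/Mozillaorg"


def remote_symbol_dir(product, platform, build_id):
    """Return the upload directory for one build (hypothetical helper)."""
    for part in (product, platform, build_id):
        # Reject empty, dot, or slash-containing components so the result
        # always sits exactly three levels below SYMBOL_ROOT.
        if not part or part in (".", "..") or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return posixpath.join(SYMBOL_ROOT, product, platform, build_id)


print(remote_symbol_dir("Firefox2", "MacOSX", "2007101606"))
# prints /home/symbols/symbols/Mozillaorg/Firefox2/MacOSX/2007101606
```

posixpath is used rather than os.path so the separator stays "/" regardless of where the sketch runs.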
Looks like community users don't have permissions on spike:
scp: /home/symbols/symbols/MozillaOrg/Sunbird05/LinuxIntel//2007101704.tar.bz2: Permission denied
(from: http://tinderbox.mozilla.org/showlog.cgi?log=Sunbird-Mozilla1.8/1192619340.1192622194.17269.gz&fulltext=1)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
jay: I think we just need to chown everything under /home/symbols/symbols/Mozillaorg to symbols. Some of the Camino and Sunbird dirs are owned by Administrator.
Assignee: ccooper → jay
Status: REOPENED → NEW
(In reply to comment #37)
> jay: I think we just need to chown everything under
> /home/symbols/symbols/Mozillaorg to symbols. Some of the Camino and Sunbird
> dirs are owned by Administrator.
symbols is now the owner of everything under /home/symbols/symbols. Please let me know if this helps the community projects get their symbols uploaded. Mark fixed if that is the case. Thanks Coop!
Coop: I just looked and we don't have any Firefox2 symbols for 10/18. Did we have a build from this morning that did not get symbols uploaded? I have seen nagios alerts today:
***** Nagios *****
Notification Type: PROBLEM
Service: symbols age
Host: spike
Address: 10.2.74.101
State: CRITICAL
Date/Time: 10-18-2007 12:58:24
Additional Info: /g/symbols/Mozillaorg/Firefox2/Win32 CRITICAL - Last updated 115293 seconds ago.
Please check to make sure things are working on our Firefox2 boxes, since that is what nagios checks to verify that new symbols are coming in. Thanks!
Assignee: jay → ccooper
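The "symbols age" alerts quoted in this bug map the time since the last update to a state. A rough sketch of that kind of classification follows. The thresholds are assumptions chosen only so that the three ages reported in this bug land in the states that were reported; the actual Nagios check parameters are not given here, and classify_symbols_age is a hypothetical name.

```python
# Assumed thresholds for illustration only; the real check
# parameters on the Nagios server are not stated in this bug.
WARN_SECONDS = 24 * 60 * 60  # 86400 seconds: one day without new symbols
CRIT_SECONDS = 30 * 60 * 60  # 108000 seconds: thirty hours without new symbols


def classify_symbols_age(age_seconds, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map the age of the newest symbols to a Nagios-style state."""
    if age_seconds < 0:
        raise ValueError("age cannot be negative")
    if age_seconds >= crit:
        return "CRITICAL"
    if age_seconds >= warn:
        return "WARNING"
    return "OK"


# Ages taken from the notifications quoted in this bug.
for age in (540, 101302, 115293):
    print(age, classify_symbols_age(age))
```

With nightly builds, a warning threshold near one day tends to fire whenever a single upload runs late, which may be what "due to the check parameters" refers to above.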
Spike threw a tar error while unpacking the symbols that I haven't seen before:
Unpacking symbols on remote host...
NEXT ERROR 2 [main] tar 1412 C:\cygwin\bin\tar.exe: *** fatal error - C:\cygwin\bin\tar.exe: *** CreateFileMapping Global\cygwin1S4.cygpid.4620, Win32 error 0. Terminating.
tar: child process: Cannot fork: Resource temporarily unavailable
tar: Error is not recoverable: exiting now
I've since uploaded the symbols manually. This only happened yesterday, and only for the Windows build (i.e. today's symbols were uploaded fine). If this recurs, I might suggest upgrading some cygwin packages on spike. We also get intermittent bash warnings when connecting to spike via ssh, so tar and bash would be good candidates for upgrading.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering