Closed Bug 628731 Opened 9 years ago Closed 9 years ago

Move symbol upload server (dm-symbolpush01) to PHX

Categories

(mozilla.org Graveyard :: Server Operations, task)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ted, Assigned: aravind)

Currently our build machines + a few other groups (Fedora, Ubuntu, etc) are uploading symbol files to dm-symbolpush01, which lives in the MPT colo. Since all the Socorro processing has migrated to PHX, we are rsync'ing the symbols from dm-symbolpush01 over to the symbol store in PHX. Currently this is working okay, except that the rsync takes about 15 hours per run, so there's a window between when symbols get uploaded and when they get transferred that we can process crashes from nightly builds and not get useful crash reports.

In bug 619944 I was the one who suggested the rsync solution, but I was only thinking of it as an interim solution. I believe the right long-term solution is to move the symbol upload server to PHX as well.

The lowest-impact change here would be to set up a box in PHX with the same SSH host keys and accounts as dm-symbolpush01, and just switch the DNS to point to it when we're ready. If we want to change the hostname (since the current one indicates it's in MPT), we can do that, but we'll have to update our buildbot configs as well as coordinate with some third parties that are uploading symbols.
Is this a before-Fx4 or after-Fx4 type task?
That's a good question. This will definitely require a downtime, since even if we just switch the DNS there's the possibility that we get a firewall rule wrong or something like that and disrupt builds. Our current setup is probably having a small negative impact on crash processing for our nightly builds, since it's possible that we can receive crash reports before the symbol rsync has completed. It shouldn't have any impact on beta or release builds, since there's plenty of time from when the build completes to when users actually start using the builds.
Assignee: server-ops → aravind
> so there's a window between when symbols get uploaded and when they get
> transferred that we can process crashes from nightly builds and not get useful
> crash reports.

This is turning out to be a significant problem, resulting in a lot of noise and bad reports for top crash bugs on the trunk.

The risk is that we won't be able to diagnose new crashes and get them fixed early, and we will end up shipping regressions in betas, RCs, and final releases.

We need to move quickly to get this fixed.
Duplicate of this bug: 629959
Talked to Ted about this.  There is a POST_SYMBOL_UPLOAD_CMD option that we could override to give us specific symbol file information that needs to be synced, as stuff is uploaded.  This should help in the interim.  The bug where that work is being done is bug 607951.
Depends on: 607951
After some more discussion, Aravind thinks he can work something up using the existing -symbols.txt index files instead of relying on the post-symbol-upload command. He's going to try to get something working today and see if that's sufficient.
The script to sync just the newly uploaded symbols is now ready and running in production.  In my test runs it seems to be a lot faster than the 15 hours the previous one took.  I will post the numbers here, once I let it work through tonight's uploads.
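For illustration only, the incremental approach discussed above (using the -symbols.txt index files to sync just what was newly uploaded) could be sketched in Python roughly as follows. This is not the production script, which is not shown in this bug. It assumes that each index file sits at the top of the symbol directory and lists one relative symbol path per line; the function names, the timestamp-based "new since last run" check, and the destination argument are all hypothetical. The only external tool it relies on is rsync with its standard `--files-from` option.

```python
import os


def new_index_files(symbol_root, since_epoch):
    """Return the *-symbols.txt files in symbol_root modified after since_epoch."""
    found = []
    for name in sorted(os.listdir(symbol_root)):
        path = os.path.join(symbol_root, name)
        if (name.endswith("-symbols.txt")
                and os.path.isfile(path)
                and os.path.getmtime(path) > since_epoch):
            found.append(path)
    return found


def paths_to_sync(index_files, symbol_root):
    """Collect the relative paths listed in each index, plus the index itself.

    Assumption: one relative path per line; blank lines are ignored and
    duplicates are dropped while preserving first-seen order.
    """
    paths = []
    seen = set()
    for index in index_files:
        rel_index = os.path.relpath(index, symbol_root)
        with open(index) as fh:
            entries = [line.strip() for line in fh if line.strip()]
        for rel in entries + [rel_index]:
            if rel not in seen:
                seen.add(rel)
                paths.append(rel)
    return paths


def build_rsync_command(list_file, symbol_root, destination):
    """Build an rsync invocation that copies only the paths named in list_file."""
    source = symbol_root.rstrip("/") + "/"
    return ["rsync", "-a", "--files-from=" + list_file, source, destination]
```

A caller would write the result of `paths_to_sync` to a temporary file, pass that file to `build_rsync_command`, and run the command with `subprocess.run`. Because only the listed files are transferred, the run time scales with the size of the latest uploads rather than with the whole symbol store.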
I am willing to call this good.  With the new sync scripts, the symbol files now show up in phx within 15 minutes of being uploaded to the mpt server.  Given this rate, I am not sure we need to move the symbol push server to phx or fix the build scripts to upload to multiple locations.
It's your setup to maintain, so I am fine with whatever you want to do.
I am going to close this as WONTFIX for now.  We still need symbols in mpt for all of the Socorro staging infrastructure, so we will have to maintain a copy there anyway.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → mozilla.org Graveyard