Closed Bug 1126369 Opened 11 years ago Closed 10 years ago

move crash-analysis.m.c/rkaiser/ to EC2

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rhelmer, Assigned: rhelmer)

References

Details

We currently have a crash-analysis server that hosts KaiRo's reports. They are written in PHP and live in this repo: http://hg.mozilla.org/users/kairo_kairo.at/crash-report-tools/ Let's move this to GitHub and auto-deploy it to Heroku.
Depends on: 1126626
Status: NEW → ASSIGNED
Out of curiosity, why Heroku specifically?
(In reply to Daniel Maher [:phrawzty] from comment #2)
> Out of curiosity, why Heroku specifically?

Just one less thing for us to manage. It's trivial to auto-deploy to Heroku from Travis, and I've already tested that these PHP scripts (which are intended to run from cron, not to be web-facing) work fine under the Heroku PHP buildpack. Since Heroku runs on AWS, it has fast and free access to S3 and our hosted services (it needs Postgres too).

If we end up having to do this ourselves, it'd require either a crontabber job or a cron job somewhere, PHP installed in our base image, and a way to deploy it. All solvable, just a bit of extra work I'd like to avoid if we can.
Depends on: 1126885
OK, so I got this working on Heroku and converted most of the scripts to use S3 instead of the local filesystem: https://github.com/rhelmer/crash-report-tools/compare/automate?expand=1

However, I think this is too much to test in time. In the short term it'd be better to run this on EC2: sync files from S3 before the scripts run, let them run on local files, and then sync the output back to S3.
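For illustration only, the short-term EC2 flow (sync down from S3, run the scripts against local files, sync the output back up) could be wrapped roughly like this. This is a minimal Python sketch, assuming the AWS CLI (`aws s3 sync`) is installed and configured on the instance; the bucket URI and the PHP script path are hypothetical placeholders, not the actual crash-analysis values.

```python
"""Hypothetical sync -> run -> sync wrapper for the short-term EC2 plan.

Assumes the AWS CLI is installed and configured on the instance. The bucket
URI and script command used below are placeholders, not real values.
"""
import shlex
import subprocess
from typing import List, Sequence


def build_plan(bucket_uri: str, local_dir: str, command: Sequence[str]) -> List[List[str]]:
    """Return the three steps as argv lists: pull from S3, run, push to S3."""
    pull = ["aws", "s3", "sync", bucket_uri, local_dir]  # S3 -> local disk
    run = list(command)                                   # scripts read/write local files
    push = ["aws", "s3", "sync", local_dir, bucket_uri]  # local output -> S3
    return [pull, run, push]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Execute each step in order, stopping at the first failure."""
    for argv in plan:
        if dry_run:
            # Only print the command that would be run.
            print(shlex.join(argv))
        else:
            # check=True raises CalledProcessError, so a failed pull stops the
            # scripts from running on stale files, and a failed script run
            # stops partial output from being pushed back to S3.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    plan = build_plan(
        "s3://example-crash-analysis-bucket/rkaiser/",  # hypothetical bucket URI
        "/mnt/crashanalysis/rkaiser/",
        ["php", "/path/to/report-script.php"],          # hypothetical script path
    )
    run_plan(plan, dry_run=True)
```

Keeping the steps as plain argv lists makes it easy to do a dry run from cron first and only switch `dry_run` off once the printed commands look right.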
Summary: move crash-analysis.m.c/rkaiser/ to Heroku → move crash-analysis.m.c/rkaiser/ to EC2
We're backing off the approach in comment 4 a bit: we're going to have a long-running instance that can run things the old-fashioned way and expose the same mount points and web service as the old server.

I've installed some packages which we'll need to get into the socorro-infra repo, for rebuilding this box in the future: mercurial, php-xml, pgp-pgsql, nano

Also, I am copying /mnt/crashanalysis/rkaiser/ from the old crash-analysis server. /mnt/crashanalysis should be served statically by nginx on the new box.
(In reply to Robert Helmer [:rhelmer] from comment #5)
> I've installed some packages which we'll need to get into the socorro-infra
> repo, for rebuilding this box in the future: mercurial, php-xml, pgp-pgsql,
> nano

FTR, the second-to-last package there is php-pgsql (just a typo Rob made when writing it down here).
(In reply to Robert Helmer [:rhelmer] from comment #5)
> I've installed some packages which we'll need to get into the socorro-infra
> repo, for rebuilding this box in the future: mercurial, php-xml, pgp-pgsql,
> nano

https://github.com/mozilla/socorro-infra/pull/169
I've mounted a 500 GB EBS volume on /mnt/crashanalysis, and sync'd it with the crashstorage S3 bucket (which was populated from the original crash-analysis server).

Kairo, you should find the expected data files in /mnt/crashanalysis/rkaiser (and access should be fast enough now), and you should be able to browse them at https://elb-prod-socorroanalysis-1689424753.us-west-2.elb.amazonaws.com/rkaiser/ Can you confirm that this looks OK and everything is working for you?
(In reply to Robert Helmer [:rhelmer] from comment #8)
> I've mounted a 500 GB EBS volume on /mnt/crashanalysis, and sync'd it with
> the crashstorage S3 bucket (which was populated from the original
> crash-analysis server)

We'll need to get this new volume into Terraform ^

I think we also want to do at least a nightly backup of this volume to the public crash-analysis S3 bucket.
The current setup for this box seems to be working now. I'm waiting until tomorrow to close this bug, as I want to confirm that the cron job works as well.
The cron ran fine today, so I'll mark this bug fixed.

Rob, do we have open bugs on
1) investigating why AWS consistently has ~5% higher crash counts than PHX (I see that across the board with any data on any channel),
2) Switching over to the production URL of https://crash-analysis.mozilla.com/ when we feel ready?
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(rhelmer)
Resolution: --- → FIXED
(In reply to Robert Kaiser (:kairo@mozilla.com) - on vacation or slow to reply until the end of June from comment #11)
> The cron ran fine today, so I'll mark this bug fixed.
>
> Rob, do we have open bugs on
> 1) investigating why AWS consistently has ~5% higher crash counts than PHX
> (I see that across the board with any data on any channel),

Lars and I are investigating this now. In all the cases we've checked so far, PHX wasn't able to find the crash in S3 in time, whereas AWS was. I'd like to understand a bit better why this happens more often in PHX, but it's not a bad thing: we've likely been undercounting by ~5% in PHX due to this error rate. In fact, this problem doesn't seem to happen at all in AWS.

> 2) Switching over to the production URL of
> https://crash-analysis.mozilla.com/ when we feel ready?

We're waiting on the SSL certs (bug 1173928); I'll file DNS bugs as services become ready.
Flags: needinfo?(rhelmer)
Thanks, sounds good. Both of those.