Closed Bug 625965 Opened 13 years ago Closed 13 years ago

Add "export to TSV" cron job to input (prod)

Categories

(mozilla.org Graveyard :: Server Operations, task)

All
Other
task
Not set
blocker

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: wenzel, Assigned: cshields)

input.stage now has a script that'll export its entire opinion database to a .tsv.bz2 file. The goal is to share our data with Metrics, and the community at large, for analyses.

We need to run this job on stage for testing, and after deployment also on prod.

On the stage box, please:
- pick a useful directory that the dumps will go into. Make it writable by the Apache user.
- to Input's settings_local.py, add:
  TSV_EXPORT_DIR = '/the/dump/dir'
- add a cron job like this (adapt paths if they have changed):
  0 2 * * * apache /data/virtualenvs/input/bin/python26 /data/www/input.stage.mozilla.com/reporter/manage.py cron export_tsv &> /dev/null
(this'll run once a day at 2am, let me know if that's not okay)

- run the command once, watch it work (takes perhaps 5 minutes), make sure there are two files available in the dump dir. Make the files writable by Apache so the cron job can replace them.
- in the Apache conf, hook up the following:
  input.stage.mozilla.com/data should point to the dump dir. I believe you can almost duplicate what we do for /media. Please enable directory listings for this and only this dir, if that's possible. If not, let me know.
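On replacing the dump files while Apache is serving them: the following is a minimal sketch, not Input's actual export_tsv code, of writing a .tsv.bz2 file to a temporary name in the dump dir and then renaming it over the old one, so a download never sees a half-written file. The function name `write_tsv_bz2` and the row format are assumptions made for illustration, and the sketch targets Python 3 rather than the python26 in the cron line.

```python
import bz2
import csv
import os
import tempfile


def write_tsv_bz2(rows, dest_path):
    """Write rows as bzip2-compressed TSV, then swap it into place atomically.

    Hypothetical helper for illustration; not part of Input.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    os.close(fd)
    try:
        with bz2.open(tmp_path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
            writer.writerows(rows)
        # mkstemp creates the file as 0600; make it world-readable so the
        # web server can serve it.
        os.chmod(tmp_path, 0o644)
        # Rename over the previous dump in a single step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Do not leave a partial temp file behind in the dump dir.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach the Apache user needs write access to the dump directory itself, since the rename creates and removes directory entries.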

If someone could get to this on Tuesday, that'd be peachy, so that Metrics and QA can test before we deploy Input 3.0.

Thanks!
Depends on: 605539
Assignee: server-ops → justdave
This blocks Tuesday's push, as I don't want to set this up on prod before we've staged it.

If you need help, let me know, or if you are too busy, I can work with IT to get someone else on it, no biggie. Thanks!
Blocks: 626705
Severity: major → blocker
Can we get this done today? This is blocking tomorrow's release, and we need some time to verify that it's working.
Done on stage:

-bash-3.2$ ls -l /data/www/input.stage.mozilla.com/tsv_exports/
total 32828
-rw-r--r-- 1 apache apache 33572292 Jan 24 13:52 opinions.tsv.bz2
-rw-r--r-- 1 apache apache      127 Jan 24 13:52 ratings.tsv.bz2

aliased /data there as well.  Cron will run at 0200.

Will do on prod after the push tomorrow.
Assignee: justdave → cshields
Status: NEW → ASSIGNED
I can verify this for stage. I was able to download the data and ran the opinions through the site's mapreduce script successfully. The ratings also seem good, as far as I can tell just by looking at them.

Will check again (cron) after the nightly export.
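The post-cron check could be scripted along these lines. This is only a sketch: the two file names are taken from the directory listing on stage, while the function name `find_stale_exports` and the 26-hour freshness window (one daily run plus some slack) are assumptions.

```python
import os
import time

# File names as seen in the stage dump directory listing.
EXPECTED_FILES = ("opinions.tsv.bz2", "ratings.tsv.bz2")


def find_stale_exports(dump_dir, max_age_seconds=26 * 60 * 60, now=None):
    """Return (file name, problem) pairs for missing or outdated dumps.

    Hypothetical helper; the default window assumes a once-a-day cron run.
    """
    now = time.time() if now is None else now
    problems = []
    for name in EXPECTED_FILES:
        path = os.path.join(dump_dir, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
        elif now - os.path.getmtime(path) > max_age_seconds:
            # The file exists but was not rewritten by the latest run.
            problems.append((name, "stale"))
    return problems
```

An empty list means both dumps exist and were modified within the window.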
Corey: Can you also look at this part of comment 0: "Please enable directory listings for this and only this dir, if that's possible. If not, let me know."

I am not sure what the feasibility of this is, so if it's hard or even impossible, feel free to suggest an alternative.
(In reply to comment #5)
> Corey: Can you also look at this part of comment 0: "Please enable directory
> listings for this and only this dir, if that's possible. If not, let me know."
> 
> I am not sure what the feasibility of this is, so if it's hard or even
> impossible, feel free to suggest an alternative.

this was my FAIL.  fixed.  sorry!
this needs to be done in prod still - there is a bit of an architectural difference between stage and prod that needs to be solved in this implementation.  We have many options for doing this so I've dropped oremj a note for his opinion.

ETA tomorrow or sooner.
Summary: Add "export to TSV" cron job to input.stage → Add "export to TSV" cron job to input (prod)
Corey (Comment 7): is that architectural difference just about getting the export installed as a cronjob? Or does it prevent us from running it manually as well?

Cause at the moment manual TSV exports do not seem to work (in contrast to stage).
(In reply to comment #8)
> Corey (Comment 7): is that architectural difference just about getting the
> export installed as a cronjob? Or does it prevent us from running it manually
> as well?
> 
> Cause at the moment manual TSV exports do not seem to work (in contrast to
> stage).

No, this has to do with serving the file from multiple webheads that need access to said file versus stage which is one webhead and local storage.

I'm working on this now.  Sorry, had outages elsewhere that preempted this.
I think I see the problem -- did not think about the issues in a load balanced setup. Great to know that you’re on it.

Aakash did not announce any download URL in the post, so I think we are super-good. We are producing a file offline right now and we’ll point anyone interested to it.
This is done in prod.  Got an NFS share from the netapp between the (currently) two webheads.  Apache is set up and /data is live.

One important thing to note on the admin side: this mount is not shared with the admin node (still mradm02), so I've set the cron job to run from pp-app-input02.
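Since the export only reaches the webheads when it is written to the shared mount, a guard like the following could refuse to run on a host where the share is not mounted. This is a rough sketch under assumptions: the function names are hypothetical, the mount path would have to be supplied by whoever runs it, and nothing like this is confirmed to exist in Input or in the prod setup.

```python
import os


def nearest_mount_point(path, ismount=os.path.ismount):
    """Walk up from path and return the first ancestor that is a mount point.

    ismount is injectable so the logic can be exercised without real mounts.
    """
    current = os.path.abspath(path)
    while True:
        if ismount(current):
            return current
        parent = os.path.dirname(current)
        if parent == current:
            # Reached the filesystem root without finding another mount.
            return current
        current = parent


def export_dir_is_on_mount(export_dir, expected_mount, ismount=os.path.ismount):
    """True if export_dir sits on the expected mount, not on local disk."""
    found = nearest_mount_point(export_dir, ismount)
    return found == os.path.abspath(expected_mount)
```

On a host without the share, the walk ends at a different mount point (often the root), so the check returns False and the job can exit before writing to local disk.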
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
It's working. Thanks!
Status: RESOLVED → VERIFIED
Product: mozilla.org → mozilla.org Graveyard