Closed Bug 1266584 Opened 8 years ago Closed 8 years ago

Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: fubar)

References

Details

Attachments

(1 file)

This management command will process messages on a set of pulse exchanges and ingest the data as jobs into treeherder.

The configuration for this command is all in settings.py, so no arguments are needed.

The command would simply be ``./manage.py ingest_from_pulse``
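Since the command takes no arguments, all of its knobs live in Django settings. A hypothetical sketch of how that wiring might look (variable names and the exchange value are illustrative assumptions, not the actual Treeherder settings.py):

```python
# Hypothetical sketch of settings.py wiring for the Pulse ingestion command.
# The env var name matches the export shown later in this bug; the default
# URL and exchange list are illustrative placeholders only.
import os

# Connection string for the Pulse broker, read from the environment.
PULSE_DATA_INGESTION_CONFIG = os.environ.get(
    "PULSE_DATA_INGESTION_CONFIG",
    "amqp://guest:guest@pulse.mozilla.org:5671/?ssl=true",
)

# Exchanges the listener binds to; illustrative value only.
PULSE_DATA_INGESTION_EXCHANGES = [
    "exchange/taskcluster-treeherder/v1/jobs",
]
```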

This process should not be activated, however, until bug 1266229 is closed.
Depends on: 1266229
Assignee: nobody → klibby
Summary: Setup machine to run the ingest_from_pulse management command → Setup supervisord to run the ingest_from_pulse management command
Why not celerybeat or a normal queue, out of curiosity? :-)
Well, this may be (probably is) my own ignorance.  But I listen to the queue with Kombu's ``consumer.run()``, which just keeps listening for anything new that shows up.  That calls a ``consume`` function that processes a limited number of messages at a time.

Looking at it, I think I could just make this a celery beat process that reads N messages every M seconds.  Or, hopefully, there's a way to have it read till it's empty, then let celerybeat restart it after M seconds to look again.  I'll see if I can get it working that way locally.
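The "read until empty, then let the scheduler restart it" idea can be sketched with the standard library alone; here ``queue.Queue`` stands in for the real Kombu/AMQP queue (the actual command keeps a persistent Kombu consumer running instead):

```python
# Sketch of a drain-until-empty worker that a celerybeat-style scheduler
# could re-invoke periodically. queue.Queue is a stand-in for the broker.
import queue

def drain(q, handle, batch_size=100):
    """Process up to batch_size messages, stopping early if the queue empties.

    Returns the number of messages handled.
    """
    handled = 0
    while handled < batch_size:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        handle(msg)
        handled += 1
    return handled

# Example: drain three pending messages.
q = queue.Queue()
for i in range(3):
    q.put(i)
seen = []
count = drain(q, seen.append)
print(count, seen)  # 3 [0, 1, 2]
```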
Ah it's just permanently listening (it's been ages since I looked at the code that landed in bug 1169320, 7 months in fact, wow time flies!). Then celery may not be the way to go anyway :-)
Blocks: 1169320
Once bug 1275717 is resolved, then this should be able to work.
I think we should create a script in the bin/ directory which gets run by supervisord. That way we can tweak it ourselves. (The command will need to source the environment variables, use the New Relic wrapper, etc., like the other scripts.)
You're right, as usual, Ed.  :)  OK, yeah, I'll make that now...
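For reference, a supervisord program stanza for this kind of bin/ script might look roughly like the following; the program name, paths, user, and log location are all placeholders, not the actual SCL3 configuration:

```ini
; Hypothetical supervisord program definition for the Pulse ingestion script.
; Paths and user are illustrative only.
[program:run_read_pulse_jobs]
command = /data/www/treeherder/bin/run_read_pulse_jobs
directory = /data/www/treeherder
user = treeherder
autostart = true
autorestart = true
stopasgroup = true
redirect_stderr = true
stdout_logfile = /var/log/treeherder/run_read_pulse_jobs.log
```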
Attachment #8756576 - Flags: review?(emorley)
Attachment #8756576 - Flags: review?(emorley) → review+
OK, Fubar: so the new command would be to run ``bin/run_pulse_ingestion``
Once bug 1275717 has been addressed, we're ready to start with this process.  But the new command is:

    bin/run_read_pulse_jobs

If you would, please ping me in IRC when you are about to start this so I can monitor new relic and stuff.
Flags: needinfo?(klibby)
No longer blocks: 1169320
which set of hosts should this run on? etl nodes? processor nodes? other?
Flags: needinfo?(klibby) → needinfo?(cdawson)
Fubar:  It would be the etl nodes.  Though I'm not sure we need one on every etl node.  One instance would probably suffice.  However, it would work just fine to run one on every ETL node.  So if it's easier to do one on each, then that'll work.

This also needs to be able to access https://pulseguardian.mozilla.org.  So do you need a network flow for that?  I tried it as a one-off from ssh and it seemed to work without a flow.  Just wanted to mention... :)
Flags: needinfo?(cdawson) → needinfo?(klibby)
Will this work fine if run on multiple nodes? The celery workers (run via the existing buildapi scripts) will be fine, but this will mean multiple consumers of the pulse queue?

We could instead just run on the rabbitmq node, since there's just one, and this is fairly lightweight compared to the store_pulse_jobs celery task.
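On the multiple-consumers question: with a single shared AMQP queue, each message is delivered to exactly one consumer, so running the listener on several ETL nodes would split the stream rather than duplicate it. A small stand-in illustration (``queue.Queue`` plays the role of the Pulse queue, two threads play the role of two nodes):

```python
# Two consumers sharing one queue: every message is handled exactly once
# across the pair, never by both. Stand-in for competing AMQP consumers.
import queue
import threading

q = queue.Queue()
for i in range(100):
    q.put(i)

received = [[], []]

def consume(idx):
    # Pull messages until the shared queue is empty.
    while True:
        try:
            received[idx].append(q.get_nowait())
        except queue.Empty:
            return

threads = [threading.Thread(target=consume, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every message was handled exactly once across both consumers.
all_msgs = sorted(received[0] + received[1])
print(len(all_msgs))  # 100
```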
added to puppet config to run on rabbitmq node, though if you confirm it's safe it's easy to move to the etl nodes.

asked :camd to chmod 755 the script in the repo, since it's 644 and supervisord doesn't like that. :-)
Flags: needinfo?(klibby)
Fubar: awesome, thanks.  Working nicely on stage.  garndt and I are doing work this week to get stage to ONLY use the pulse job ingestion from TaskCluster (and stop TC from submitting jobs via the API; it's doing BOTH at the moment, which is working fine, too).  Once we've tested that everything works in the pulse-only-from-TC scenario, we'll be ready to move to production.

So this is just a heads up that the window to start this supervisord process on production is looking like June 27-30.  I'm on PTO from the 20th to the 24th.

Would that work with your schedule?  

Also: are the need-info's an annoyance to you?  :)  I'll stop doing that, if they are.
Flags: needinfo?(klibby)
adding this to prod should be really simple, so any time that week should be ok. NI's are great! please keep using them! :-D
Flags: needinfo?(klibby)
Adjusting summary for the rename of the script. (The script is enabled on SCL3 stage, just need a few more fixes in bug 1266229 prior to enabling on stage.)
Priority: -- → P2
Summary: Setup supervisord to run the ingest_from_pulse management command → Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod
s/prior to enabling on stage/prior to enabling on prod/
OK, we are a go for turning this on at 10am PT.

However, the current config for prod is using a pulse user owned by mdoglio.  Let's use the one called "treeherder-prod" instead.  It's owned by me.

I can send you PM with the password when you're ready to make the change.  It'll be:

export PULSE_DATA_INGESTION_CONFIG="amqp://treeherder-prod:<pw goes here>@pulse.mozilla.org:5671/?ssl=true"
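A quick sanity check of that connection-string format using only the standard library (a dummy password is substituted for the placeholder; values here just confirm the URL parses the way the broker expects):

```python
# Parse the AMQP connection string to verify host, port, and the ssl flag.
from urllib.parse import parse_qs, urlsplit

url = "amqp://treeherder-prod:PASSWORD@pulse.mozilla.org:5671/?ssl=true"
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.port)  # amqp pulse.mozilla.org 5671
print(parse_qs(parts.query)["ssl"])  # ['true']
```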
Flags: needinfo?(klibby)
Turn-on time is 10am PT July 7, that is.  :)
new pulse credentials added (in addition to existing/old creds), new job enabled on prod, and restart-jobs script updated with new job.
Flags: needinfo?(klibby)
Everything is working smoothly on prod.  We're done here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
See Also: → 1297208