This management command processes messages from a set of pulse exchanges and ingests the data as jobs into Treeherder. The configuration for this command lives entirely in settings.py, so no arguments are needed; the command is simply ``./manage.py ingest_from_pulse``. This process should not be activated, however, until bug 1266229 is closed.
Why not celerybeat or a normal queue, out of curiosity? :-)
Well, this may be (probably is) my own ignorance. But I listen to the queue with Kombu's ``consumer.run()``, which just keeps listening for anything new that shows up. That calls a ``consume`` function which processes a limited number of messages at a time. Looking at it, I think I could make this a celery beat task that reads N messages every N seconds. Or, better, have it read until the queue is empty, then let celerybeat restart it after N seconds to look again. I'll see if I can get that working locally.
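For what it's worth, the "read until empty, then let the scheduler wake it up again" idea can be sketched in plain Python. This is just an illustration of the batch-drain pattern with a hypothetical ``drain`` helper and an in-process queue; the real code uses Kombu consumers against RabbitMQ, not ``queue.Queue``:

```python
import queue

def drain(q, max_messages=100):
    """Process up to max_messages from q, stopping early if it empties.

    Returns the number of messages handled. A celerybeat-style scheduler
    could call this every N seconds instead of listening forever.
    """
    handled = 0
    while handled < max_messages:
        try:
            message = q.get_nowait()
        except queue.Empty:
            break  # queue drained; let the scheduler wake us up later
        # ... ingest the job payload here ...
        handled += 1
    return handled

work = queue.Queue()
for i in range(5):
    work.put({"job": i})
print(drain(work))  # 5
```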
Ah it's just permanently listening (it's been ages since I looked at the code that landed in bug 1169320, 7 months in fact, wow time flies!). Then celery may not be the way to go anyway :-)
I think we should create a script in the bin/ directory which gets run by supervisord. That way we can tweak ourselves. (The command will need to source the environment variables, use the newrelic wrapper etc like the other scripts)
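A wrapper along the lines Ed describes might look like the sketch below. The env-file path is an assumption here, chosen only to mirror the pattern of the other bin/ scripts; adjust to match the actual deployment:

```shell
#!/usr/bin/env bash
# bin/run_pulse_ingestion -- supervisord entry point (sketch).
# Mirrors the other bin/ wrappers: source env vars, then exec under New Relic.
set -euo pipefail

# Assumed location of the environment file; adjust for the real deployment.
source /etc/profile.d/treeherder.sh

exec newrelic-admin run-program ./manage.py ingest_from_pulse
```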
You're right, as usual, Ed. :) OK, yeah, I'll make that now...
Created attachment 8756576 [details] [review] [treeherder] mozilla:pulse-ingest-script > mozilla:master
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/c0af32bf58e8ddb5addcede159f8d39f912be65d Bug 1266584 - Create ingest_from_pulse command script for supervisord
OK, Fubar: so the new command would be to run ``bin/run_pulse_ingestion``
Once bug 1275717 has been addressed, we're ready to start this process. Note the script has been renamed: the command is now ``bin/run_read_pulse_jobs``. If you would, please ping me on IRC when you're about to start this so I can monitor New Relic and such.
which set of hosts should this run on? etl nodes? processor nodes? other?
Fubar: It would be the ETL nodes, though I'm not sure we need one on every node; a single instance would probably suffice. That said, running one on each ETL node would work just fine too, so if it's easier to do one on each, that'll work. This also needs to be able to access https://pulseguardian.mozilla.org, so do you need a network flow for that? I tried it as a one-off over ssh and it seemed to work without a flow; just wanted to mention it. :)
Will this work fine if run on multiple nodes? The celery workers (run via the existing buildapi scripts) will be fine, but this would mean multiple consumers of the pulse queue. We could instead just run it on the rabbitmq node, since there's only one, and this is fairly lightweight compared to the store_pulse_jobs celery task.
added to puppet config to run on rabbitmq node, though if you confirm it's safe it's easy to move to the etl nodes. asked :camd to chmod 755 the script in the repo, since it's 644 and supervisord doesn't like that. :-)
Fubar: awesome, thanks. It's working nicely on stage. garndt and I are doing work this week to get stage to ONLY use the pulse job ingestion from TaskCluster (and stop TC from submitting jobs via the API; it's doing BOTH at the moment, which is also working fine). Once we've tested that everything works in the pulse-only-from-TC scenario, we'll be ready to move to production. So this is just a heads-up that the timeline window to start this supervisord process on production is looking like June 27-30. I'm on PTO from the 20th to the 24th. Would that work with your schedule? Also: are the need-infos an annoyance to you? :) I'll stop doing that, if they are.
adding this to prod should be really simple, so any time that week should be ok. NI's are great! please keep using them! :-D
Adjusting summary for the rename of the script. (The script is enabled on SCL3 stage, just need a few more fixes in bug 1266229 prior to enabling on stage.)
s/prior to enabling on stage/prior to enabling on prod/
OK, we are a go for turning this on at 10am PT. However, the current prod config uses a pulse user owned by mdoglio; let's use the one called "treeherder-prod" instead, which is owned by me. I can send you a PM with the password when you're ready to make the change. It'll be: ``export PULSE_DATA_INGESTION_CONFIG="amqp://treeherder-prod:<pw goes here>@pulse.mozilla.org:5671/?ssl=true"``
Turn on time at 10am PT July 7, that is. :)
new pulse credentials added (in addition to existing/old creds), new job enabled on prod, and restart-jobs script updated with new job.
Everything is working smoothly on prod. We're done here.