Closed Bug 1266584 Opened 8 years ago Closed 8 years ago

Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: camd, Assigned: fubar)

References

Details

Attachments

(1 file)

This management command will process messages on a set of pulse exchanges and ingest the data as jobs into treeherder.

The configuration for this command is all in settings.py, so no arguments are needed.

The command would simply be ``./manage.py ingest_from_pulse``
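Since the command takes no arguments, all of its knobs live in Django settings. A hypothetical sketch of how that wiring might look (variable names and the exchange value are illustrative assumptions, not the actual Treeherder settings.py):

```python
# Hypothetical sketch of settings.py wiring for the Pulse ingestion command.
# The env var name matches the export shown later in this bug; the default
# URL and exchange list are illustrative placeholders only.
import os

# Connection string for the Pulse broker, read from the environment.
PULSE_DATA_INGESTION_CONFIG = os.environ.get(
    "PULSE_DATA_INGESTION_CONFIG",
    "amqp://guest:guest@pulse.mozilla.org:5671/?ssl=true",
)

# Exchanges the listener binds to; illustrative value only.
PULSE_DATA_INGESTION_EXCHANGES = [
    "exchange/taskcluster-treeherder/v1/jobs",
]
```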

This process should not be activated, however, until bug 1266229 is closed.
Depends on: 1266229
Assignee: nobody → klibby
Summary: Setup machine to run the ingest_from_pulse management command → Setup supervisord to run the ingest_from_pulse management command
Why not celerybeat or a normal queue, out of curiosity? :-)
Well, this may be (probably is) my own ignorance.  But I listen to the queue with Kombu's ``consumer.run()``, which just keeps listening for anything new that shows up.  That calls a ``consume`` function that processes a limited number of messages at a time.

Looking at it, I think I could just make this a celery beat process that reads N messages every M seconds.  Or, hopefully, there's a way to have it read till it's empty, then let celerybeat restart it after M seconds to look again.  I'll see if I can get it working that way locally.
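The "read until empty, then let the scheduler restart it" idea can be sketched with the standard library alone; here ``queue.Queue`` stands in for the real Kombu/AMQP queue (the actual command keeps a persistent Kombu consumer running instead):

```python
# Sketch of a drain-until-empty worker that a celerybeat-style scheduler
# could re-invoke periodically. queue.Queue is a stand-in for the broker.
import queue

def drain(q, handle, batch_size=100):
    """Process up to batch_size messages, stopping early if the queue empties.

    Returns the number of messages handled.
    """
    handled = 0
    while handled < batch_size:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        handle(msg)
        handled += 1
    return handled

# Example: drain three pending messages.
q = queue.Queue()
for i in range(3):
    q.put(i)
seen = []
count = drain(q, seen.append)
print(count, seen)  # 3 [0, 1, 2]
```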
Ah it's just permanently listening (it's been ages since I looked at the code that landed in bug 1169320, 7 months in fact, wow time flies!). Then celery may not be the way to go anyway :-)
Blocks: 1169320
Once bug 1275717 is resolved, then this should be able to work.
I think we should create a script in the bin/ directory which gets run by supervisord. That way we can tweak it ourselves. (The command will need to source the environment variables, use the New Relic wrapper, etc., like the other scripts.)
You're right, as usual, Ed.  :)  OK, yeah, I'll make that now...
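For reference, a supervisord program stanza for this kind of bin/ script might look roughly like the following; the program name, paths, user, and log location are all placeholders, not the actual SCL3 configuration:

```ini
; Hypothetical supervisord program definition for the Pulse ingestion script.
; Paths and user are illustrative only.
[program:run_read_pulse_jobs]
command = /data/www/treeherder/bin/run_read_pulse_jobs
directory = /data/www/treeherder
user = treeherder
autostart = true
autorestart = true
stopasgroup = true
redirect_stderr = true
stdout_logfile = /var/log/treeherder/run_read_pulse_jobs.log
```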
Attachment #8756576 - Flags: review?(emorley)
Attachment #8756576 - Flags: review?(emorley) → review+
OK, Fubar: so the new command would be to run ``bin/run_pulse_ingestion``
Once bug 1275717 has been addressed, we're ready to start with this process.  But the new command is:

    bin/run_read_pulse_jobs

If you would, please ping me in IRC when you are about to start this so I can monitor new relic and stuff.
Flags: needinfo?(klibby)
No longer blocks: 1169320
which set of hosts should this run on? etl nodes? processor nodes? other?
Flags: needinfo?(klibby) → needinfo?(cdawson)
Fubar:  It would be the etl nodes.  Though I'm not sure we need one on every etl node.  One instance would probably suffice.  However, it would work just fine to run one on every ETL node.  So if it's easier to do one on each, then that'll work.

This also needs to be able to access https://pulseguardian.mozilla.org.  So do you need a network flow for that?  I tried it as a one-off from ssh and it seemed to work without a flow.  Just wanted to mention... :)
Flags: needinfo?(cdawson) → needinfo?(klibby)
Will this work fine if run on multiple nodes? The celery workers (run via the existing buildapi scripts) will be fine, but this will mean multiple consumers of the pulse queue?

We could instead just run on the rabbitmq node, since there's just one, and this is fairly lightweight compared to the store_pulse_jobs celery task.
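On the multiple-consumers question: with a single shared AMQP queue, each message is delivered to exactly one consumer, so running the listener on several ETL nodes would split the stream rather than duplicate it. A small stand-in illustration (``queue.Queue`` plays the role of the Pulse queue, two threads play the role of two nodes):

```python
# Two consumers sharing one queue: every message is handled exactly once
# across the pair, never by both. Stand-in for competing AMQP consumers.
import queue
import threading

q = queue.Queue()
for i in range(100):
    q.put(i)

received = [[], []]

def consume(idx):
    # Pull messages until the shared queue is empty.
    while True:
        try:
            received[idx].append(q.get_nowait())
        except queue.Empty:
            return

threads = [threading.Thread(target=consume, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every message was handled exactly once across both consumers.
all_msgs = sorted(received[0] + received[1])
print(len(all_msgs))  # 100
```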
added to puppet config to run on rabbitmq node, though if you confirm it's safe it's easy to move to the etl nodes.

asked :camd to chmod 755 the script in the repo, since it's 644 and supervisord doesn't like that. :-)
Flags: needinfo?(klibby)
Fubar: awesome, thanks.  Working nicely on stage.  garndt and I are doing work this week to get stage to ONLY use the pulse job ingestion from TaskCluster (and stop TC from submitting jobs via the API; it's doing BOTH at the moment, which is working fine, too).  Once we've tested that everything works in the pulse-only-from-TC scenario, we'll be ready to move to production.

So this is just a heads up that the window to start this supervisord process on production is looking like June 27-30.  I'm on PTO from the 20th to the 24th.

Would that work with your schedule?  

Also: are the need-info's an annoyance to you?  :)  I'll stop doing that, if they are.
Flags: needinfo?(klibby)
adding this to prod should be really simple, so any time that week should be ok. NI's are great! please keep using them! :-D
Flags: needinfo?(klibby)
Adjusting summary for the rename of the script. (The script is enabled on SCL3 stage, just need a few more fixes in bug 1266229 prior to enabling on stage.)
Priority: -- → P2
Summary: Setup supervisord to run the ingest_from_pulse management command → Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod
s/prior to enabling on stage/prior to enabling on prod/
OK, we are a go for turning this on at 10am PT.

However, the current config for prod is using a pulse user owned by mdoglio.  Let's use the one called "treeherder-prod" instead.  It's owned by me.

I can send you PM with the password when you're ready to make the change.  It'll be:

export PULSE_DATA_INGESTION_CONFIG="amqp://treeherder-prod:<pw goes here>@pulse.mozilla.org:5671/?ssl=true"
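A quick sanity check of that connection-string format using only the standard library (a dummy password is substituted for the placeholder; values here just confirm the URL parses the way the broker expects):

```python
# Parse the AMQP connection string to verify host, port, and the ssl flag.
from urllib.parse import parse_qs, urlsplit

url = "amqp://treeherder-prod:PASSWORD@pulse.mozilla.org:5671/?ssl=true"
parts = urlsplit(url)
print(parts.scheme, parts.hostname, parts.port)  # amqp pulse.mozilla.org 5671
print(parse_qs(parts.query)["ssl"])  # ['true']
```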
Flags: needinfo?(klibby)
Turn-on time is 10am PT July 7, that is.  :)
new pulse credentials added (in addition to existing/old creds), new job enabled on prod, and restart-jobs script updated with new job.
Flags: needinfo?(klibby)
Everything is working smoothly on prod.  We're done here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
See Also: → 1297208