Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod

RESOLVED FIXED

Status

Tree Management
Treeherder: Data Ingestion
P2
normal
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: camd, Assigned: fubar)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
This management command will process messages on a set of pulse exchanges and ingest the data as jobs into treeherder.

The configuration for this command is all in settings.py, so no arguments are needed.

The command would simply be ``./manage.py ingest_from_pulse``

This process should not be activated, however, until bug 1266229 is closed.
(Reporter)

Updated

2 years ago
Depends on: 1266229
(Reporter)

Updated

2 years ago
Assignee: nobody → klibby
(Reporter)

Updated

2 years ago
Summary: Setup machine to run the ingest_from_pulse management command → Setup supervisord to run the ingest_from_pulse management command

Comment 1

2 years ago
Why not celerybeat or a normal queue, out of curiosity? :-)
(Reporter)

Comment 2

2 years ago
Well, this may be (probably is) my own ignorance.  But I listen to the queue with Kombu's ``consumer.run()``, which just keeps listening for anything new that shows up.  That in turn calls a ``consume`` function, which consumes a limited number of messages to process at one time.

Looking at it, I think I could just make this a celery beat process that reads N messages every M seconds.  Or, hopefully, there's a way to have it read until it's empty, then let celerybeat restart it after M seconds to look again.  I'll see if I can get it working that way locally.

Comment 3

2 years ago
Ah it's just permanently listening (it's been ages since I looked at the code that landed in bug 1169320, 7 months in fact, wow time flies!). Then celery may not be the way to go anyway :-)
Blocks: 1169320
(Reporter)

Comment 4

a year ago
Once bug 1275717 is resolved, then this should be able to work.
Depends on: 1275717
Comment 5

a year ago
I think we should create a script in the bin/ directory that gets run by supervisord.  That way we can tweak it ourselves.  (The command will need to source the environment variables, use the newrelic wrapper, etc., like the other scripts.)
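A bin/ wrapper of the kind described above might look roughly like this sketch. The env-file path, New Relic config path, and underlying management-command name are assumptions for illustration, not the actual treeherder deployment layout:

```bash
#!/usr/bin/env bash
# Hypothetical sketch only -- paths and the management-command name
# are assumptions, not the real treeherder deployment.
set -euo pipefail

# Pull in the deployment's environment variables
# (e.g. DATABASE_URL, PULSE_DATA_INGESTION_CONFIG).
source /data/treeherder/deployment/env.sh

# Run the management command under the New Relic agent wrapper,
# exec-ing so supervisord's signals reach the command directly.
export NEW_RELIC_CONFIG_FILE=/data/treeherder/newrelic.ini
exec newrelic-admin run-program python manage.py read_pulse_jobs
```

Using ``exec`` matters here: without it, supervisord would manage the shell rather than the long-running command itself.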
(Reporter)

Comment 6

a year ago
You're right, as usual, Ed.  :)  OK, yeah, I'll make that now...

Comment 7

a year ago
Created attachment 8756576 [details] [review]
[treeherder] mozilla:pulse-ingest-script > mozilla:master
(Reporter)

Updated

a year ago
Attachment #8756576 - Flags: review?(emorley)

Updated

a year ago
Attachment #8756576 - Flags: review?(emorley) → review+

Comment 8

a year ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/c0af32bf58e8ddb5addcede159f8d39f912be65d
Bug 1266584 - Create ingest_from_pulse command script for supervisord
(Reporter)

Comment 9

a year ago
OK, Fubar: so the new command would be to run ``bin/run_pulse_ingestion``
(Reporter)

Comment 10

a year ago
Once bug 1275717 has been addressed, we're ready to start with this process.  But the new command is:

    bin/run_read_pulse_jobs

If you would, please ping me in IRC when you are about to start this so I can monitor new relic and stuff.
Flags: needinfo?(klibby)

Updated

a year ago
No longer blocks: 1169320
(Assignee)

Comment 11

a year ago
which set of hosts should this run on? etl nodes? processor nodes? other?
Flags: needinfo?(klibby) → needinfo?(cdawson)
(Reporter)

Comment 12

a year ago
Fubar:  It would be the ETL nodes, though I'm not sure we need one on every ETL node; one instance would probably suffice.  That said, it would work just fine to run one on each, so if that's easier, it'll work.

This also needs to be able to access https://pulseguardian.mozilla.org, so do you need a network flow for that?  I tried it as a one-off from ssh and it seemed to work without a flow.  Just wanted to mention it. :)
Flags: needinfo?(cdawson) → needinfo?(klibby)
Comment 13

a year ago
Will this work fine if run on multiple nodes?  The celery workers (run via the existing buildapi scripts) will be fine, but this would mean multiple consumers of the Pulse queue?

We could instead just run it on the rabbitmq node, since there's only one, and this is fairly lightweight compared to the store_pulse_jobs celery task.
(Assignee)

Comment 14

a year ago
added to puppet config to run on rabbitmq node, though if you confirm it's safe it's easy to move to the etl nodes.

asked :camd to chmod 755 the script in the repo, since it's 644 and supervisord doesn't like that. :-)
Flags: needinfo?(klibby)
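For reference, a supervisord program entry for a wrapper script like this might look roughly as follows; the program name, paths, and user here are illustrative assumptions, not the actual SCL3 puppet configuration:

```ini
[program:run_read_pulse_jobs]
; Illustrative sketch only -- paths and user are assumptions.
command=/data/treeherder/src/bin/run_read_pulse_jobs
directory=/data/treeherder/src
user=treeherder
autostart=true
autorestart=true
stopasgroup=true
redirect_stderr=true
stdout_logfile=/var/log/treeherder/run_read_pulse_jobs.log
```

Note that supervisord spawns the ``command`` target directly, so it must be executable, which is why the chmod 755 mentioned above was needed.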
(Reporter)

Comment 15

a year ago
Fubar: awesome, thanks.  It's working nicely on stage.  garndt and I are doing work this week to get stage to ONLY use the pulse job ingestion from TaskCluster (and stop TC from submitting them via the API; it's doing BOTH at the moment, which is working fine, too).  Once we've tested that everything is working fine in the pulse-only-from-TC scenario, we'll be ready to move to production.

So this is just a heads up that the timeline window to start this supervisord process on production is looking like June 27-30th.  I'm on PTO from 20-24.

Would that work with your schedule?  

Also: are the need-infos an annoyance to you?  :)  I'll stop doing that if they are.
Flags: needinfo?(klibby)
(Assignee)

Comment 16

a year ago
adding this to prod should be really simple, so any time that week should be ok. NI's are great! please keep using them! :-D
Flags: needinfo?(klibby)
Comment 17

a year ago
Adjusting summary for the rename of the script. (The script is enabled on SCL3 stage; just need a few more fixes in bug 1266229 prior to enabling on stage.)
Priority: -- → P2
Summary: Setup supervisord to run the ingest_from_pulse management command → Setup supervisord to run Pulse ingestion listening (bin/run_read_pulse_jobs) on SCL3 stage/prod
Comment 18

a year ago
s/prior to enabling on stage/prior to enabling on prod/
(Reporter)

Comment 19

a year ago
OK, we are a go for turning this on at 10am PT.

However, the current config for prod is using a pulse user owned by mdoglio.  Let's use the one called "treeherder-prod" instead.  It's owned by me.

I can send you a PM with the password when you're ready to make the change.  It'll be:

export PULSE_DATA_INGESTION_CONFIG="amqp://treeherder-prod:<pw goes here>@pulse.mozilla.org:5671/?ssl=true"
Flags: needinfo?(klibby)
(Reporter)

Comment 20

a year ago
Turn on time at 10am PT July 7, that is.  :)
(Assignee)

Comment 21

a year ago
new pulse credentials added (in addition to existing/old creds), new job enabled on prod, and restart-jobs script updated with new job.
Flags: needinfo?(klibby)
(Reporter)

Comment 22

a year ago
Everything is working smoothly on prod.  We're done here.
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Reporter)

Updated

a year ago
See Also: → bug 1297208