Closed Bug 1314396 Opened 8 years ago Closed 8 years ago

pulse_actions - Performance analysis of pulse_actions

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(1 file)

It seems that we can't keep up processing messages.
This changed in the last few days.

I've increased the number of workers (right now I tripled our cost), however, I would like to analyze why this is happening and address it.
2 minutes and 39 seconds is unnaceptable:
> Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	 ==> We want to trigger 'Windows 7 32-bit mozilla-inbound talos other-e10s' a total of 1 time(s). 
> Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	  
> Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	 We found a job that finished, however, it did not produced files. 


[1]
Nov 01 13:42:57 pulse-actions app/worker1.1: root	 INFO:	 #### New user request ####. 
Nov 01 13:42:57 pulse-actions app/worker1.1: root	 INFO:	 This log was produced by https://github.com/mozilla/pulse_actions in case you want to help us out! :D 
Nov 01 13:42:58 pulse-actions app/worker1.1: treeherder_job_action	 INFO:	 Backfill action requested by jmaher@mozilla.com for 'Windows 7 32-bit mozilla-inbound talos other-e10s' 
Nov 01 13:42:58 pulse-actions app/worker1.1: treeherder_job_action	 INFO:	 Request for https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b555a8f33edb7ac95555ab341d5b8281ae5b234f&selectedJob=38409929 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	 We're *aiming* to backfill; note that we ignore the revision that you request to backfill from (b555a8f33edb) up to 12.0 pushes (seta skip: 8; factor: 1.5) and we backfill up to the last green if found. 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	 https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=Windows 7 32-bit mozilla-inbound talos other-e10s&tochange=54a28d6fbed4db85e0f40ebeaf52d563c8226509&fromchange=0a1486cf11b228e41d27499b79b1e3ea6be3e28f 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	 We want to have 1 job(s) of Windows 7 32-bit mozilla-inbound talos other-e10s on the following revisions:  
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 0a1486cf11b228e41d27499b79b1e3ea6be3e28f 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 33c2241696c07a203f5355771a40a8c28a0767b5 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 7b5270bac41b49133cb44af4851f120c1de7cdd4 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - b62b6568d65e63bd2f514f7a5099135065431123 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 5c834894e4b46cf628237789b900996e6525a729 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - c6587fbc2dbe99f0fabc25454dbe8a02e1eeccf4 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 72764ba31b81562e3069f6935da33cae06fe6b00 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 8c08e1aca9ea4953d610ead9b7d1cd4eb0d822d3 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 2777fabbe62a3a55a3514d724303977bac7260fb 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - b171726c465e305411429b29012f48e890598b3f 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - ac4628bf99126783da833e1641d673c92ee4b1a7 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  - 54a28d6fbed4db85e0f40ebeaf52d563c8226509 
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:43:00 pulse-actions app/worker1.1: mozci	 INFO:	 === 0a1486cf11b228e41d27499b79b1e3ea6be3e28f === 
Nov 01 13:43:01 pulse-actions app/worker1.1: mozci	 INFO:	 We have found 0 potential job(s) matching 'Windows 7 32-bit mozilla-inbound talos other-e10s' on 0a1486cf11b228e41d27499b79b1e3ea6be3e28f. We need to trigger more. 
Nov 01 13:43:01 pulse-actions app/worker1.1: buildapi	 INFO:	 We're going to re-trigger an existing completed job with request_id: 129750737 1 time(s). 
Nov 01 13:43:01 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:43:01 pulse-actions app/worker1.1: mozci	 INFO:	 === 33c2241696c07a203f5355771a40a8c28a0767b5 === 
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	 We have found 0 potential job(s) matching 'Windows 7 32-bit mozilla-inbound talos other-e10s' on 33c2241696c07a203f5355771a40a8c28a0767b5. We need to trigger more. 
Nov 01 13:43:03 pulse-actions app/worker1.1: buildapi	 INFO:	 We're going to re-trigger an existing completed job with request_id: 129750707 1 time(s). 
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	 === 7b5270bac41b49133cb44af4851f120c1de7cdd4 === 
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	 We have found 0 potential job(s) matching 'Windows 7 32-bit mozilla-inbound talos other-e10s' on 7b5270bac41b49133cb44af4851f120c1de7cdd4. We need to trigger more. 
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	 ==> We want to trigger 'Windows 7 32-bit mozilla-inbound talos other-e10s' a total of 1 time(s). 
Nov 01 13:43:03 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	 We found a job that finished, however, it did not produced files. 
Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	 Status of job: failure 
Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	 The build job WINNT 5.2 mozilla-inbound build failed on revision 7b5270bac41b49133cb44af4851f120c1de7cdd4 without generating the necessary files. We will not trigger anything. 
Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:42 pulse-actions app/worker1.1: mozci	 INFO:	 === b62b6568d65e63bd2f514f7a5099135065431123 === 
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	 === 5c834894e4b46cf628237789b900996e6525a729 === 
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:43 pulse-actions app/worker1.1: mozci	 INFO:	 === c6587fbc2dbe99f0fabc25454dbe8a02e1eeccf4 === 
Nov 01 13:45:44 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:44 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:44 pulse-actions app/worker1.1: mozci	 INFO:	 === 72764ba31b81562e3069f6935da33cae06fe6b00 === 
Nov 01 13:45:45 pulse-actions app/worker1.1: mozci	 INFO:	 We have found 0 potential job(s) matching 'Windows 7 32-bit mozilla-inbound talos other-e10s' on 72764ba31b81562e3069f6935da33cae06fe6b00. We need to trigger more. 
Nov 01 13:45:45 pulse-actions app/worker1.1: mozci	 INFO:	 ==> We want to trigger 'Windows 7 32-bit mozilla-inbound talos other-e10s' a total of 1 time(s). 
Nov 01 13:45:45 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:46 pulse-actions app/worker1.1: mozci	 INFO:	 We found a job that finished, however, it did not produced files. 
Nov 01 13:45:46 pulse-actions app/worker1.1: mozci	 INFO:	 Status of job: failure 
Nov 01 13:45:46 pulse-actions app/worker1.1: mozci	 INFO:	 The build job WINNT 5.2 mozilla-inbound build failed on revision 72764ba31b81562e3069f6935da33cae06fe6b00 without generating the necessary files. We will not trigger anything. 
Nov 01 13:45:46 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:46 pulse-actions app/worker1.1: mozci	 INFO:	 === 8c08e1aca9ea4953d610ead9b7d1cd4eb0d822d3 === 
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	 === 2777fabbe62a3a55a3514d724303977bac7260fb === 
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:47 pulse-actions app/worker1.1: mozci	 INFO:	 === b171726c465e305411429b29012f48e890598b3f === 
Nov 01 13:45:48 pulse-actions app/worker1.1: mozci	 INFO:	 We have 6 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:48 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:48 pulse-actions app/worker1.1: mozci	 INFO:	 === ac4628bf99126783da833e1641d673c92ee4b1a7 === 
Nov 01 13:45:48 pulse-actions app/worker1.1: mozci	 INFO:	 We have 2 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:49 pulse-actions app/worker1.1: mozci	 INFO:	  
Nov 01 13:45:49 pulse-actions app/worker1.1: mozci	 INFO:	 === 54a28d6fbed4db85e0f40ebeaf52d563c8226509 === 
Nov 01 13:45:49 pulse-actions app/worker1.1: mozci	 INFO:	 We have 1 job(s) for 'Windows 7 32-bit mozilla-inbound talos other-e10s' which is enough for the 1 job(s) we want. 
Nov 01 13:45:49 pulse-actions app/worker1.1: root	 WARNING:	 The handler did not give us an exit_code 
Nov 01 13:45:49 pulse-actions app/worker1.1: root	 INFO:	 Message {u'action': u'backfill', u'job_id': 38409929, u'requester': u'jmaher@mozilla.com', u'version': 1, u'project': u'mozilla-inbound', u'build_system_type': u'buildbot', u'job_guid': u'c74adb69274fad5d88b93f673ff25ea3b317c7da'}, took 172 seconds to execute 
Nov 01 13:45:50 pulse-actions app/worker1.1: root	 INFO:	 Log uploaded to https://tc-gp-public-31d.s3-us-west-2.amazonaws.com/ateam/pulse-action-dev/196ac91e-c0cf-4a78-a06e-d13d82e04d67 
Nov 01 13:45:50 pulse-actions app/worker1.1: root	 INFO:	 Created Treeherder 'Sch' job. 
Nov 01 13:45:50 pulse-actions app/worker1.1: root	 INFO:	 #### End of user request ####.
The number of workers processing the requests seems to be irrelevant. I brought the number down to 2 workers since we can have workers without processing pulse messages.
See worker 1.3 [1], it did not process any messages for 6 hours.

I thought that we can have many workers listening to the same queue.

rail, have you faced something similar before?
Have you noticed any issues with Pulse in the last week?

[1]
Nov 01 11:12:04 pulse-actions heroku/worker1.3: Restarting 
Nov 01 11:12:04 pulse-actions heroku/worker1.3: State changed from up to starting 
Nov 01 11:12:06 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 11:12:07 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 11:12:26 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 11:12:27 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 11:12:30 pulse-actions app/worker1.3: root	 INFO:	 Console output logs INFO level messages. 
Nov 01 11:12:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 11:12:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 11:12:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 11:14:06 pulse-actions app/worker1.3: root	 INFO:	 Message {u'_meta': {u'sent': u'2016-11-01T11:14:02+01:00', u'exchange': u'exchange/build/normalized', u'routing_key': u'build.mo 
Nov 01 11:16:07 pulse-actions heroku/worker1.3: State changed from up to down 
Nov 01 11:16:10 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 11:16:10 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 14:00:12 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 14:00:13 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 14:00:17 pulse-actions app/worker1.3: root	 INFO:	 Console output logs INFO level messages. 
Nov 01 14:00:18 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 14:00:18 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 14:00:18 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 14:24:18 pulse-actions heroku/worker1.3: Restarting 
Nov 01 14:24:18 pulse-actions heroku/worker1.3: State changed from up to starting 
Nov 01 14:24:20 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 14:24:21 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 14:24:22 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 14:24:22 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 14:24:25 pulse-actions app/worker1.3: root	 INFO:	 Console output logs INFO level messages. 
Nov 01 14:24:25 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 14:24:25 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 14:24:25 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 14:32:09 pulse-actions app/worker1.3: root	 INFO:	 Message {u'times': 1, u'version': 1, u'action': u'cancel_all', u'requester': u'jorgk@jorgk.com', u'resultset_id': u'2223', u'pro 
Nov 01 14:35:38 pulse-actions heroku/worker1.3: Restarting 
Nov 01 14:35:38 pulse-actions heroku/worker1.3: State changed from up to starting 
Nov 01 14:35:42 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 14:35:42 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 14:35:44 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 14:35:45 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 14:35:47 pulse-actions heroku/worker1.3: State changed from up to down 
Nov 01 14:35:48 pulse-actions app/worker1.3: root	 INFO:	 Console output logs INFO level messages. 
Nov 01 14:35:48 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 14:35:48 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 14:35:48 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 14:35:49 pulse-actions app/worker1.3: root	 INFO:	 #### New user request ####. 
Nov 01 14:35:49 pulse-actions app/worker1.3: root	 INFO:	 This log was produced by https://github.com/mozilla/pulse_actions in case you want to help us out! :D 
Nov 01 14:35:50 pulse-actions app/worker1.3: treeherder_job_action	 INFO:	 Backfill action requested by jmaher@mozilla.com for 'Windows 7 32-bit autoland talos other-e10s' 
Nov 01 14:35:50 pulse-actions app/worker1.3: treeherder_job_action	 INFO:	 Request for https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=aadb57a9ddf83910a604f6be3b787fee661249db&selectedJob=5887512 
Nov 01 14:35:51 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 14:35:51 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 14:35:51 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 14:35:52 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 14:35:53 pulse-actions app/worker1.3: root	 INFO:	 Console output logs INFO level messages. 
Nov 01 14:35:53 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 14:35:53 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 14:35:53 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 16:19:19 pulse-actions heroku/worker1.3: Restarting 
Nov 01 16:19:19 pulse-actions heroku/worker1.3: State changed from up to starting 
Nov 01 16:19:23 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 16:19:23 pulse-actions app/worker1.3: root	 INFO:	 #### New user request ####. 
Nov 01 16:19:23 pulse-actions app/worker1.3: root	 INFO:	 This log was produced by https://github.com/mozilla/pulse_actions in case you want to help us out! :D 
Nov 01 16:19:23 pulse-actions app/worker1.3: treeherder_push_action	 INFO:	 trigger_missing_jobs action requested by philringnalda@gmail.com on repo_name autoland with resultset_id: 3753 
Nov 01 16:19:24 pulse-actions heroku/worker1.3: Process exited with status 143 
Nov 01 16:19:25 pulse-actions heroku/worker1.3: Starting process with command `python pulse_actions/worker.py --config-file configs/manual_requests.json --memory-saving` 
Nov 01 16:19:26 pulse-actions heroku/worker1.3: State changed from starting to up 
Nov 01 16:19:30 pulse-actions app/worker1.3: root	 INFO:	 Console output logs DEBUG level messages. 
Nov 01 16:19:30 pulse-actions app/worker1.3: mozci	 DEBUG:	 Disabling validations. 
Nov 01 16:19:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/job-actions, buildbot.#.backfill) 
Nov 01 16:19:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-actions, #.#) 
Nov 01 16:19:30 pulse-actions app/worker1.3: replay.replay	 INFO:	 Listening to (exchange/treeherder/v1/resultset-runnable-job-actions, #) 
Nov 01 17:39:40 pulse-actions heroku/worker1.3: State changed from up to down 
Nov 01 17:39:44 pulse-actions heroku/worker1.3: Stopping all processes with SIGTERM 
Nov 01 17:39:45 pulse-actions heroku/worker1.3: Process exited with status 143
Keep in mind that people might re-trigger the pulse_actions job also multiple times because they do not appear on Treeherder now. With submitting the pending status we might gain a lot. I filed bug 1314623 for that.
See Also: → 1314623
Depends on: 1314930
This problem is multi dimensional. All steaming from either bugs that were hard to figure out without spending a lot of time or a not good enough design (increasing number of workers instead of dispatching to TaskCluster - dynamic scaling).

* The load and usage might have been increasing for a while and we've reached a tipping point
* Heroku workers could have become slower
** I don't see any metrics being fed into New Relic even though the app seems to be connected
* Having multiple consumers is not working as expected
** Increasing the number of workers does not mean that all will be consuming
** It could be that I'm doing something silly and only work well when there's not much in the queue
** I will look into using what rail recommends [1]
* Beefing up the Heroku workers barely has any impact because:
** We're currently using Buildapi/buildjson for job status (TH lacks some data) which is network and disk bound
** See attachment to see that half the time of a backfill request is spent on network and json manipulation
* Change mozci to do operation concurrently
* Schedule some actions via TaskCluster to reduce calls on Buildapi (reduce tip risk when increasing number of workers)
** e.g. talos jobs for build

Ideas to try:
* Try rail's suggestion:
> The client fetches only one message from the queue, works on it, acks or rejects, then fetches another one.
> By default it may fetch multiple messages and steal them from other clients. If this client is slow, it will
> slow down the whole process
> I don't know how to set the QoS in mozillapulse, https://wiki.mozilla.org/Auto-tools/Projects/Pulse talks
> about prefetch count, but it's not clear how to set it using mozillapulse 
> I use plain kombu http://kombu.readthedocs.io/en/latest/reference/kombu.mixins.html#kombu.mixins.ConsumerMixin this one
> It gives you access to the channel object, where you can set the QoS
* Instead of writing buildjson files to disk and then reading back; uncompress on the fly
** Unfortunately we have memory restrictions on the Heroku instances
* Switch completely away from Buildapi/buildjson (memory and performance gain)
** We need to fix the dependent bug
* Improve the performance of each type of request
** Take one by one all

[1] https://github.com/rail/funsize/blob/master/funsize/worker.py#L101
Summary: Performance analysis of pulse_actions → pulse_actions - Performance analysis of pulse_actions
So far I've determined that pulse_actions is processing the Pulse messages in the queue but it is *not* acknowledging the messages.
I've also noticed that we can't have two consumers.
Until I kill one the other one will *not* see any messages to be consumed.
It seems I inadvertedly introduced a regression [1] that caused the acknowledge global flag to be set to False (*facepalm*)
I fixed this in [2] It's a brittle and complicated piece of code that needs to be redone.

While the backlog clears, I will check if using multiple consumers works as expected.

[1] https://github.com/mozilla/pulse_actions/commit/f5098052892f90ff15bd94d394327ab905448983
[2] https://github.com/mozilla/pulse_actions/commit/2022354f8bd6661835e80b20caf1ea1387fd9996
This is now fixed and queue has cleared.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: