We are currently using the gevent scheduler for log parsing: https://github.com/mozilla/treeherder-service/blob/master/deployment/supervisord/worker_node.conf#L14 Since log parsing is mostly a CPU-bound task (I think?), it might be worth investigating prefork scheduling for it, which should let multiple Python processes work in parallel. According to :fubar, we're currently saturating one of the CPUs on our log-parsing VMs while the other sits idle, which strongly suggests to me that the prefork model would work better. At the moment our infrastructure seems mostly capable of keeping up with what's coming at us, but this may be something to keep in mind for the future.
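To illustrate the reasoning above, here is a rough sketch (not the Treeherder code) comparing CPU-bound work run in one process against the same work spread across forked processes. Everything in it is an assumption for illustration: parse_synthetic_log is a made-up stand-in for log parsing, and a thread pool stands in for gevent, since on a standard CPython build both keep the CPU-bound work inside a single process. The "fork" start method is Unix-only.

```python
import multiprocessing
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def parse_synthetic_log(n_lines):
    """Hypothetical CPU-bound stand-in for log parsing.

    Generates n_lines fake log lines (every 10th is an error line)
    and counts the error lines.
    """
    errors = 0
    for i in range(n_lines):
        line = "error: step failed" if i % 10 == 0 else "info: step ok"
        if line.startswith("error"):
            errors += 1
    return errors


def timed_map(executor, sizes):
    """Run parse_synthetic_log over sizes; return (results, seconds)."""
    start = time.perf_counter()
    with executor:
        results = list(executor.map(parse_synthetic_log, sizes))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    workers = os.cpu_count() or 1
    sizes = [200_000] * (workers * 4)
    fork = multiprocessing.get_context("fork")  # Unix only
    pools = [
        # Single process: stand-in for the gevent-style setup.
        ("one process, threads", ThreadPoolExecutor(max_workers=workers)),
        # One forked process per CPU: stand-in for prefork.
        ("forked processes", ProcessPoolExecutor(max_workers=workers, mp_context=fork)),
    ]
    for name, executor in pools:
        results, seconds = timed_map(executor, sizes)
        print(f"{name}: {sum(results)} error lines in {seconds:.2f}s")
```

On a multi-core machine I would expect the forked variant to finish sooner for this kind of workload, but the timings depend on the host, so treat it as something to measure rather than a guarantee.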
We're struggling to catch up with a backlog at the moment (caused by bug 1125124), whilst processor[1-3] remain at < 50% CPU usage :-/
The change required here is really minimal, so I would give it a try on stage. Basically, we need to remove the -P gevent parameter and tune the concurrency parameter here: https://github.com/mozilla/treeherder-service/blob/master/bin/run_celery_worker_gevent#L29
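As a rough sketch of the change described above, the snippet below builds a Celery worker command line with and without the -P option. The helper build_worker_args and the queue name "log_parser" are hypothetical and not taken from the actual startup script; only the -P, -Q and --concurrency worker options are real Celery flags. It assumes that leaving out -P gives Celery's default prefork pool, and that one process per CPU is a sensible starting point for tuning.

```python
import os


def build_worker_args(pool=None, concurrency=None, queue="log_parser"):
    """Return an argv list for a Celery worker (illustrative helper only).

    pool=None omits -P, so Celery falls back to its default prefork pool.
    concurrency=None picks one worker process per CPU as a starting point.
    """
    args = ["celery", "worker", "-Q", queue]
    if pool is not None:
        args += ["-P", pool]
    if concurrency is None:
        concurrency = os.cpu_count() or 1
    args.append(f"--concurrency={concurrency}")
    return args


if __name__ == "__main__":
    # gevent-style invocation: many greenlets inside a single process.
    print(" ".join(build_worker_args(pool="gevent", concurrency=50)))
    # prefork-style invocation: no -P, concurrency sized to the CPU count.
    print(" ".join(build_worker_args()))
```

With gevent the concurrency value counts greenlets in one process, whereas with prefork it counts worker processes, so the number almost certainly needs to come down when switching.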
Comment on attachment 8557896: GitHub PR #364 on treeherder-service. Some minor comments/thoughts, but overall I'd be happy to see this go in as-is.
Commit pushed to master at https://github.com/mozilla/treeherder-service https://github.com/mozilla/treeherder-service/commit/fd9eb1760325faf913ab07ccdffcdd9210f7fb69 Bug 1123479 - add startup script for a prefork-based log parser
:fubar, can you please update the supervisord configuration on staging to point to the new script for the log processor worker? It should change from bin/run_celery_worker_gevent to bin/run_celery_worker_log_parser. Once this is verified working on staging, we can apply the change on production as well. After that, we can remove run_celery_worker_gevent from the repository.
Committed in r99952; pushed changes to puppet and staging log processors.
[treeherder-processor1.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser RUNNING pid 23984, uptime 0:00:31
[treeherder-processor2.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser RUNNING pid 26391, uptime 0:00:32
[treeherder-processor3.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser RUNNING pid 23726, uptime 0:00:31
I've just done a prod push, so we should be ready to make the change to prod now too. We'll also need to update the restart-jobs script to reference the new name. Many thanks :-)
Cancel that request for now, we've had to roll back the prod push for unrelated reasons (bug 1097090 comment 19).
fubar, we're ready to make the change in comment 6 to production too now. We'll also need to update the restart-jobs script to reference the new name. After that we can remove bin/run_celery_worker_gevent from the repo. Thanks! :-)
Changed job name on prod and updated restart-jobs script in puppet and deployed. I've left the old logs around for the moment in case we want them, but feel free to remove if you like (or pester me later).
(In reply to Kendall Libby [:fubar] from comment #11)
> Changed job name on prod and updated restart-jobs script in puppet and
> deployed. I've left the old logs around for the moment in case we want them,
> but feel free to remove if you like (or pester me later).
That's great - thank you :-) I've filed bug 1140850 for removing the file from the repo / checking dev is also updated etc.