Closed Bug 1123479 Opened 9 years ago Closed 9 years ago

Investigate using prefork scheduling instead of gevent scheduling for log parsing

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: mdoglio)

Attachments

(1 file)

We are currently using the gevent scheduler for log parsing:

https://github.com/mozilla/treeherder-service/blob/master/deployment/supervisord/worker_node.conf#L14

Since log parsing is a mostly CPU-bound task (I think?), it might be worth investigating using prefork scheduling for that (which should ensure multiple Python processes can work in parallel).

According to :fubar we're currently saturating one of the CPUs on our log parsing VMs while the other sits idle, which is a strong indication to me that the prefork model would work better.
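
To illustrate the reasoning above: gevent greenlets all run inside one Python process, so CPU-bound work stays on a single core, whereas a pool of forked processes can use every core. The following is only a minimal sketch of that idea using the standard library, not Treeherder's actual parser; `parse_log` and `parse_logs_prefork` are hypothetical names, and the "count ERROR lines" logic is a stand-in for the real parsing work.

```python
import multiprocessing


def parse_log(text):
    """Hypothetical stand-in for a CPU-bound parser: count lines mentioning ERROR."""
    return sum(1 for line in text.splitlines() if "ERROR" in line)


def parse_logs_prefork(logs, workers=2):
    """Distribute the logs across a pool of forked worker processes."""
    # Assumption: a Unix host, since the "fork" start method is not available on Windows.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers) as pool:
        # Each worker is a separate process, so the OS can schedule them on different CPUs.
        return pool.map(parse_log, logs)


if __name__ == "__main__":
    sample_logs = ["ok\nERROR: timeout\n", "ERROR a\nERROR b\n", ""]
    print(parse_logs_prefork(sample_logs))
```

With two workers on a two-CPU VM, both cores should be able to stay busy, which is the behaviour we'd hope to get from Celery's prefork pool.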

At the moment it seems like our infrastructure is mostly capable of keeping up with what's coming at us, but maybe something to keep in mind for the future.
Blocks: 1074927
OS: Linux → All
Priority: -- → P3
Hardware: x86_64 → All
We're struggling to catch up with a backlog at the moment (caused by bug 1125124), whilst processor[1-3] remain at < 50% cpu usage :-/
Priority: P3 → P2
The change required here is really minimal; I would give it a try on stage.
Basically, we need to remove the -P gevent parameter here and tune the concurrency parameter:
https://github.com/mozilla/treeherder-service/blob/master/bin/run_celery_worker_gevent#L29
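
As a rough sketch of what that change amounts to: when `-P` is omitted, the Celery worker falls back to its default prefork pool, and when no concurrency is given it defaults to the number of CPUs. The helper below just builds the pool-related worker options to make the difference visible; `worker_pool_options` is a hypothetical name and is not taken from the script linked above, whose full command line isn't reproduced here.

```python
import multiprocessing


def worker_pool_options(pool=None, concurrency=None):
    """Build the pool-related options for a Celery worker command line.

    Hypothetical helper for illustration only. Leaving `pool` as None omits
    -P entirely, so the worker uses its default (prefork) pool.
    """
    options = []
    if pool is not None:
        options += ["-P", pool]
    if concurrency is None:
        # For CPU-bound prefork workers, one process per CPU is a reasonable starting point.
        concurrency = multiprocessing.cpu_count()
    options += ["--concurrency", str(concurrency)]
    return options


if __name__ == "__main__":
    # A gevent-style invocation with many greenlets (the value 100 is made up).
    print(worker_pool_options(pool="gevent", concurrency=100))
    # The proposed change: no -P, concurrency sized to the CPU count.
    print(worker_pool_options())
```

The concurrency value means different things in the two cases: a count of greenlets under gevent, and a count of worker processes under prefork, so a number tuned for one is unlikely to suit the other.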
Assignee: nobody → mdoglio
Status: NEW → ASSIGNED
Attachment #8557896 - Flags: review?(wlachance)
Comment on attachment 8557896 [details] [review]
Github PR #364 on treeherder-service

Some minor comments/thoughts but overall I'd be happy to see this go in as-is.
Attachment #8557896 - Flags: review?(wlachance) → review+
:fubar can you please update the supervisord configuration on staging to point to the new script for the log processor worker?
It should change from bin/run_celery_worker_gevent to bin/run_celery_worker_log_parser
Once this is verified working on staging we can apply the change on production as well.
After that we can remove run_celery_worker_gevent from the repository.
Flags: needinfo?(klibby)
Committed in r99952; pushed changes to puppet and staging log processors.

[treeherder-processor1.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 23984, uptime 0:00:31
[treeherder-processor2.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 26391, uptime 0:00:32
[treeherder-processor3.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 23726, uptime 0:00:31
Flags: needinfo?(klibby)
I've just done a prod push, so we should be ready to make the change to prod now too.
We'll also need to update the restart-jobs script to reference the new name.

Many thanks :-)
Flags: needinfo?(klibby)
Cancel that request for now, we've had to roll back the prod push for unrelated reasons (bug 1097090 comment 19).
Flags: needinfo?(klibby)
fubar, we're ready to make the change in comment 6 to production too now.
We'll also need to update the restart-jobs script to reference the new name.
After that we can remove bin/run_celery_worker_gevent from the repo.
Thanks! :-)
Flags: needinfo?(klibby)
Changed job name on prod and updated restart-jobs script in puppet and deployed. I've left the old logs around for the moment in case we want them, but feel free to remove if you like (or pester me later).
Flags: needinfo?(klibby)
Depends on: 1140850
(In reply to Kendall Libby [:fubar] from comment #11)
> Changed job name on prod and updated restart-jobs script in puppet and
> deployed. I've left the old logs around for the moment in case we want them,
> but feel free to remove if you like (or pester me later).

That's great - thank you :-)
I've filed bug 1140850 for removing the file from the repo / checking dev is also updated etc.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 1140882