
Investigate using prefork scheduling instead of gevent scheduling for log parsing

RESOLVED FIXED

Status

Tree Management
Treeherder: Data Ingestion
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: wlach, Assigned: mdoglio)

Attachments

(1 attachment)

We are currently using the gevent scheduler for log parsing:

https://github.com/mozilla/treeherder-service/blob/master/deployment/supervisord/worker_node.conf#L14

Since log parsing is a mostly CPU-bound task (I think?), it might be worth investigating prefork scheduling for it, which should ensure multiple Python processes can work in parallel.

According to :fubar we're currently saturating one of the CPUs on our log parsing VMs while the other sits idle, which is a strong indication to me that the prefork model would work better.
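
To illustrate the reasoning: gevent greenlets all run inside a single OS thread of one process, so CPU-bound work stays on one core, whereas a prefork-style pool spreads it over several processes. The following is only a minimal sketch of that idea using the standard library, not the Treeherder log parser; `count_error_lines`, `ERROR_RE` and `parse_logs_prefork` are hypothetical names, and the "fork" start method is assumed (POSIX only).

```python
import multiprocessing
import re

# Hypothetical pattern standing in for real log parsing rules.
ERROR_RE = re.compile(r"\b(ERROR|FAIL)\b")


def count_error_lines(log_text):
    """CPU-bound stand-in for log parsing: count lines matching ERROR_RE."""
    return sum(1 for line in log_text.splitlines() if ERROR_RE.search(line))


def parse_logs_prefork(logs, processes=2):
    """Parse each log in a pool of forked worker processes.

    Each worker is a separate OS process, so the kernel can schedule them
    on different CPUs. Assumes the "fork" start method is available.
    """
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(count_error_lines, logs)
```

With two worker processes on a two-CPU VM, both cores can be busy at once, which is the behaviour the prefork model is expected to give us.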

At the moment it seems like our infrastructure is mostly capable of keeping up with what's coming at us, but maybe something to keep in mind for the future.

Updated

3 years ago
Blocks: 1074927
OS: Linux → All
Priority: -- → P3
Hardware: x86_64 → All

Comment 1

3 years ago
We're struggling to catch up with a backlog at the moment (caused by bug 1125124), whilst processor[1-3] remain at < 50% cpu usage :-/
Priority: P3 → P2
(Assignee)

Comment 2

3 years ago
The change required here is really minimal; I would give it a try on stage.
Basically we need to remove the -P gevent parameter here and tune the concurrency parameter:
https://github.com/mozilla/treeherder-service/blob/master/bin/run_celery_worker_gevent#L29
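
As a rough sketch of what that change amounts to (this does not reproduce the linked script): the worker is started with the prefork pool instead of gevent, and the concurrency is sized to the number of CPUs. `-P`/`--pool`, `-c`/`--concurrency` and `-Q`/`--queues` are standard Celery worker options; the helper `build_worker_args`, the queue name and the one-process-per-CPU default are assumptions for illustration only.

```python
import multiprocessing


def build_worker_args(queue, pool="prefork", concurrency=None):
    """Build an argument list for a `celery worker` invocation.

    The queue name and the default sizing rule are illustrative
    assumptions; the right concurrency should be tuned on stage.
    """
    if concurrency is None:
        # Assumed starting point: one worker process per available CPU.
        concurrency = multiprocessing.cpu_count()
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return ["celery", "worker", "-Q", queue, "-P", pool, "-c", str(concurrency)]
```

For example, `build_worker_args("log_parser", concurrency=2)` (with a hypothetical queue name) gives a command line that uses two forked processes rather than a single gevent process.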
(Assignee)

Updated

3 years ago
Assignee: nobody → mdoglio
Status: NEW → ASSIGNED
(Assignee)

Comment 3

3 years ago
Created attachment 8557896
Github PR #364 on treeherder-service
(Assignee)

Updated

3 years ago
Attachment #8557896 - Flags: review?(wlachance)
Comment on attachment 8557896
Github PR #364 on treeherder-service

Some minor comments/thoughts but overall I'd be happy to see this go in as-is.
Attachment #8557896 - Flags: review?(wlachance) → review+

Comment 5

3 years ago
Commit pushed to master at https://github.com/mozilla/treeherder-service

https://github.com/mozilla/treeherder-service/commit/fd9eb1760325faf913ab07ccdffcdd9210f7fb69
Bug 1123479 - add startup script for a prefork-based log parser
(Assignee)

Comment 6

3 years ago
:fubar can you please update the supervisord configuration on staging to point to the new script for the log processor worker?
It should change from bin/run_celery_worker_gevent to bin/run_celery_worker_log_parser
Once this is verified working on staging we can apply the change on production as well.
After that we can remove run_celery_worker_gevent from the repository.
Flags: needinfo?(klibby)
Committed in r99952; pushed changes to puppet and staging log processors.

[treeherder-processor1.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 23984, uptime 0:00:31
[treeherder-processor2.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 26391, uptime 0:00:32
[treeherder-processor3.stage.private.scl3.mozilla.com] out: run_celery_worker_log_parser     RUNNING    pid 23726, uptime 0:00:31
Flags: needinfo?(klibby)

Comment 8

3 years ago
I've just done a prod push, so we should be ready to make the change to prod now too.
We'll also need to update the restart-jobs script to reference the new name.

Many thanks :-)
Flags: needinfo?(klibby)

Comment 9

3 years ago
Cancel that request for now; we've had to roll back the prod push for unrelated reasons (bug 1097090 comment 19).
Flags: needinfo?(klibby)
fubar, we're ready to make the change in comment 6 to production too now.
We'll also need to update the restart-jobs script to reference the new name.
After that we can remove bin/run_celery_worker_gevent from the repo.
Thanks! :-)
Flags: needinfo?(klibby)
Changed job name on prod and updated restart-jobs script in puppet and deployed. I've left the old logs around for the moment in case we want them, but feel free to remove if you like (or pester me later).
Flags: needinfo?(klibby)

Updated

3 years ago
Depends on: 1140850
(In reply to Kendall Libby [:fubar] from comment #11)
> Changed job name on prod and updated restart-jobs script in puppet and
> deployed. I've left the old logs around for the moment in case we want them,
> but feel free to remove if you like (or pester me later).

That's great - thank you :-)
I've filed bug 1140850 for removing the file from the repo / checking dev is also updated etc.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED

Updated

3 years ago
Blocks: 1140882

Updated

3 years ago
Duplicate of this bug: 1125124