Closed
Bug 1310972
Opened 8 years ago
Closed 8 years ago
Remove as much use of datasource as possible from the log parsing tasks and autoclassification code
Categories
(Tree Management :: Treeherder, defect)
Tree Management
Treeherder
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jgraham, Assigned: jgraham)
References
Details
Attachments
(1 file)
Now that we have information about jobs in the main database we can start using this directly for orchestrating the log parsing tasks, rather than using the repository / project specific id / job guid information.
Comment 1•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8802060 -
Flags: review?(wlachance)
Comment 2•8 years ago
|
||
Comment on attachment 8802060 [details] [review] [treeherder] mozilla:log_tasks_job_id > mozilla:master Sorry, meant to r+ this earlier.
Attachment #8802060 -
Flags: review?(wlachance) → review+
Comment 3•8 years ago
|
||
Commits pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/d38acb67ab7e1685ef3404859a59683cbca075f0 Bug 1310972 - Remove parse_job_logs function. We now parse logs one at a time, so this isn't used. https://github.com/mozilla/treeherder/commit/3a7b5dc651c643accdae7319890bef6b24b679c9 Bug 1310972 - Use job id as input to log parsing tasks Now that log information is in the main treeherder database use the id information there as the input to the log parsing and autoclassification tasks. As part of the same change there is a mild refactor of the autoclassification related tasks to not go through the call_command infrastructure as this is just unnecessary overhead.
Comment 4•8 years ago
|
||
Before this is deployed to prod we need to fix it so already-queued tasks aren't burnt: crossreference-error-lines exceptions:TypeError crossreference_error_lines() takes exactly 1 argument (2 given) store-failure-lines exceptions:TypeError inner() takes exactly 2 arguments (4 given) log-parser exceptions:TypeError inner() takes exactly 2 arguments (4 given) autoclassify exceptions:TypeError autoclassify() takes exactly 1 argument (2 given) eg: https://rpm.newrelic.com/accounts/677903/applications/14179733/filterable_errors#/show/aba08b5b-9616-11e6-9c7c-b82a72d22a14_15754_19345/stack_trace?top_facet=transactionUiName&primary_facet=error.class&barchart=barchart&_k=p9c0zd As a one-off workaround, I guess we could temporarily disable the celerybeat+pulse-listener dynos to let the queues empty out whilst this deploys. We need to be extra careful in the future when making changes like this (and same for schema changes) to make sure that the changes are not only correct, but are backwards-forwards compatible for the deploy itself.
Updated•8 years ago
|
Assignee: nobody → james
Assignee | ||
Comment 5•8 years ago
|
||
> As a one-off workaround, I guess we could temporarily disable the celerybeat+pulse-listener dynos to let the queues empty out whilst this deploys. If we think that will be acceptable it's the easiest way to get this landed > We need to be extra careful in the future when making changes like this (and same for schema changes) to make sure that the changes are not only correct, but are backwards-forwards compatible for the deploy itself. Yeah, I think the right thing for breaking changes is to add version numbers to the queues so that we add new tasks with an incremented version number in one patch and remove the old queues in a second patch once the tasks are drained. I can rewrite this patch to be more like that if you think it's a better idea.
Comment 6•8 years ago
|
||
For today's prod deploy I: * reduced the worker_beat and worker_read_pulse_jobs dynos to zero * waited until the rabbitmq admin panel queue counts reached zero * started the deploy * waited for the release phase tasks to run and new code to be pushed out * returned the worker_beat and worker_read_pulse_jobs dyno counts to 1 * checked NR for errors * checked queues caught up using rabbitmq admin panel All looks fine, no exceptions shown in NR.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•