Enable classification of all job failures through a single panel

RESOLVED FIXED

Status

RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: jgraham, Assigned: jgraham)

Tracking

(Blocks: 1 bug)

Details

Attachments

(2 attachments)

(Assignee)

Description

3 years ago
Based on sheriff feedback, the main problem with the existing job panel was that it did not show all the error lines. To fix this, the plan is to allow all error lines to be shown, and assigned a bug number, under the autoclassification panel. Implmentation wise this involves working out which structured lines correspond to which unstructured lines to enable deduplication.

This change will not enable autoclassification of unstructured lines, since that would require moving away from the artifact storage, which would be a lot of yak to shave.
(Assignee)

Comment 1

3 years ago
Created attachment 8725726 [details] [review]
PR

camd: can you look at the frontend and log ingestion parts? emorley, mdoglio, can you look at the backend? I guess this will need a few iterations, but it's already a fairly large change.
Attachment #8725726 - Flags: review?(mdoglio)
Attachment #8725726 - Flags: review?(emorley)
Attachment #8725726 - Flags: review?(cdawson)
Comment on attachment 8725726 [details] [review]
PR

Unfortunately I don't even know where to start with beginning this review. I've lost track of what was reviewed before (since we never landed it on master, so now have an even larger PR) and the new work has drastically increased the complexity of our celery tasks, such that I'm going to need to spend several days digging through this next week (I'm mostly on PTO yesterday/today).

I think the best way forwards might be for us to have a meeting where we can discuss the bigger picture for the way we split of the log parse tasks and see if there is a simpler way we can work avoid the current complex dependency chains which have required the use of the celery results backend for the first time, along with the custom sync/locking solution (https://github.com/mozilla/treeherder/commit/7edf602975d8c9334a709daa74d9bb01d188a52a).
Attachment #8725726 - Flags: review?(emorley)
Comment on attachment 8725726 [details] [review]
PR

It looks like we are going to have a meeting to discuss these changes. I'll hold on this review until then.
Attachment #8725726 - Flags: review?(mdoglio)
(Assignee)

Updated

3 years ago
Attachment #8725726 - Flags: review?(mdoglio)
Attachment #8725726 - Flags: review?(emorley)
Attachment #8725726 - Flags: review?(emorley) → feedback+
Comment on attachment 8725726 [details] [review]
PR

I only reviewed the UI code.
Attachment #8725726 - Flags: review?(cdawson) → review+
(Assignee)

Updated

2 years ago
Attachment #8725726 - Flags: review?(emorley)
Attachment #8725726 - Flags: review?(emorley) → review+
I don't think I can review attachment 1 [details] [diff] [review] anytime before next week. Given that both :emorley and :camd reviewed it I would be tempted to clear the r? flag on me. :jgraham are you ok with that?
(Assignee)

Comment 6

2 years ago
I would prefer if you did the (hopefully trivial) reviews for the webapi bits and db changes next week. I think it's more important to do this right than do it by March 31st.
Comment on attachment 8725726 [details] [review]
PR

Sorry I won't be able to review this patch anytime soon. Is there anything specific you want some feedback on?
Attachment #8725726 - Flags: review?(mdoglio)
(Assignee)

Comment 8

2 years ago
Comment on attachment 8725726 [details] [review]
PR

mdoglio: I think it should be fine; emorley and camd should know all the unreviewed code.
Attachment #8725726 - Flags: review?(emorley)
Comment on attachment 8725726 [details] [review]
PR

Looks good, thank you.

As we discussed yesterday, I'll be away until 19th April.

I saw another few CloudAMQP alerts this morning (even after the queues were reset yesterday) - I'll leave it to you to check whether that blocks testing on stage.

For the landing of this I imagine that it may be worth landing the migrations first (presuming everything is currently switched off on stage/prod that uses the affected schema) and then the rest separately. The only gotcha might be how long it takes for the migrations to run on prod, but I guess all we can do it try :-) (And if they time out, comment out the migration step in the Chief update.py script and run them manually after deployment from one of the nodes).

Prior to landing on prod, it's probably worth keeping an eye on the following for stage to see if there are any performance regressions/issues:
* New Relic APM exceptions/error pages (https://rpm.newrelic.com/accounts/677903/applications/5585473/filterable_errors)
* New Relic APM transaction page: web transactions (https://rpm.newrelic.com/accounts/677903/applications/5585473/transactions?type=app)
* New Relic APM transaction page: non-web transactions (https://rpm.newrelic.com/accounts/677903/applications/5585473/transactions?type=other&show_browser=false)
* New Relic APM database page (https://rpm.newrelic.com/accounts/677903/applications/5585473/datastores#/overview/All?value=total_call_time_per_minute)
* New Relic Plugin page for rabbitmq (ie https://rpm.newrelic.com/accounts/677903/plugins/16138)

Good luck! :-)
Attachment #8725726 - Flags: review?(emorley) → review+
(Assignee)

Comment 10

2 years ago
ni? KWierso at his own request to try this out on stage.
Flags: needinfo?(wkocher)
(Assignee)

Updated

2 years ago
Blocks: 1265095
Created attachment 8742943 [details] [review]
[treeherder] mozilla:autoclassify_integration > mozilla:master

Comment 12

2 years ago
Commits pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/7e4084d24fcf6d543fde9d9a2515be1e4072ee70
Bug 1252854 - Backend work for matching unstructured and structured log summary lines.

https://github.com/mozilla/treeherder/commit/af1db114519b14033548dc3f35826e3a4d924caf
Bug 1252854 -  Front end changes to allow structured and unstructured lines to both be displayed in the autoclassify UI.

https://github.com/mozilla/treeherder/commit/daf296dc30c030e8292b4915d5766b86c8b7dacf
Merge pull request #1414 from mozilla/autoclassify_integration

Bug 1252854 - Integrate classification of structured and unstructured error lines.
I never had a chance to get to this over the weekend. I'll try to look at an updated stage sometime this week, I guess.
Flags: needinfo?(wkocher)
Blocks: 1177519
No longer depends on: 1177519
(Assignee)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Assignee: nobody → james
You need to log in before you can comment on or make changes to this bug.