Closed Bug 1307093 Opened 8 years ago Closed 7 years ago

Refactor autoclassification data storage

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jgraham, Unassigned)

James Graham [:jgraham]

Reporter

Description

•

8 years ago

So I'm considering a large refactor of the way that autoclassification data is stored. The main motivation is that, now that we are free of datasource artifacts, we can store autoclassification data on text_log_error rather than on failure_line. That allows for the possibility of writing matchers that work on the unstructured data for cases where there is no corresponding failure_line; these likely won't be as good, but they will be better than nothing. It also means the UI layer can be considerably simplified, as much of the distinction between structured and unstructured lines will go away. The only disadvantage is that we will ignore lines found in the structured log that aren't in the unstructured log. I think that's acceptable, as those lines are probably duplicated in the UI right now anyway.

So the data model changes I think I need to make here are:
Add a failure_line foreign key to text_log_error
Add a best_classification foreign key to text_log_error
Add a boolean best_is_verified to text_log_error
(later the corresponding columns on failure_line will be removed)
Add an error_match link table between classified_failure and text_log_error
(later delete the failure_match table)
Add a column to the job table to indicate the ingestion status (i.e. whether we have matched error lines, or run autoclassification)
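
The schema changes listed above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not Treeherder's actual models or migrations. The `_id` suffix on the foreign key columns, the `autoclassify_status` column name, and its three values are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema sketch. Only the columns named in the description are
# taken from the bug; everything else is an assumption for illustration.
SCHEMA = """
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    -- Assumed name and values for the ingestion status column.
    autoclassify_status TEXT NOT NULL DEFAULT 'pending'
        CHECK (autoclassify_status IN ('pending', 'crossreferenced', 'autoclassified'))
);
CREATE TABLE failure_line (
    id INTEGER PRIMARY KEY,
    message TEXT
);
CREATE TABLE classified_failure (
    id INTEGER PRIMARY KEY
);
CREATE TABLE text_log_error (
    id INTEGER PRIMARY KEY,
    line TEXT NOT NULL,
    -- NULL when no structured failure_line corresponds to this error.
    failure_line_id INTEGER REFERENCES failure_line (id),
    best_classification_id INTEGER REFERENCES classified_failure (id),
    best_is_verified INTEGER NOT NULL DEFAULT 0
);
-- Link table between classified_failure and text_log_error.
CREATE TABLE error_match (
    id INTEGER PRIMARY KEY,
    text_log_error_id INTEGER NOT NULL REFERENCES text_log_error (id),
    classified_failure_id INTEGER NOT NULL REFERENCES classified_failure (id),
    UNIQUE (text_log_error_id, classified_failure_id)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the sketched tables on the given connection."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
```

Keeping failure_line_id nullable is what would let a matcher work on a text_log_error that has no structured counterpart.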

There will also need to be some additional indices:
On text_log_step.finished, on text_log_error.line (prefix), and probably some others that I haven't thought of yet.
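
As a rough sketch of the two indices named here: MySQL expresses a prefix index as `line(N)` in the index definition. The example below uses Python's built-in sqlite3 module instead, where an index on the expression `substr(line, 1, N)` is the closest substitute. The prefix length of 50, the index names, and the helper function are assumptions for illustration only.

```python
import sqlite3

PREFIX_LENGTH = 50  # assumed prefix length; the bug does not specify one


def create_tables_and_indices(conn: sqlite3.Connection) -> None:
    """Create minimal tables plus the two indices mentioned in the description."""
    conn.executescript(f"""
    CREATE TABLE text_log_step (id INTEGER PRIMARY KEY, finished TEXT);
    CREATE TABLE text_log_error (id INTEGER PRIMARY KEY, line TEXT NOT NULL);

    CREATE INDEX idx_text_log_step_finished
        ON text_log_step (finished);
    -- SQLite stand-in for a MySQL prefix index on text_log_error.line.
    CREATE INDEX idx_text_log_error_line_prefix
        ON text_log_error (substr(line, 1, {PREFIX_LENGTH}));
    """)


def find_errors_by_line(conn: sqlite3.Connection, line: str) -> list:
    """Return ids of errors whose line equals `line`.

    The substr() condition matches the indexed expression so the prefix
    index can narrow the candidates; the full comparison removes lines
    that only share the prefix.
    """
    rows = conn.execute(
        f"SELECT id FROM text_log_error "
        f"WHERE substr(line, 1, {PREFIX_LENGTH}) = ? AND line = ? ORDER BY id",
        (line[:PREFIX_LENGTH], line),
    )
    return [row[0] for row in rows]
```

A prefix index keeps the index small for long log lines, at the cost of needing the second, full comparison.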

In terms of endpoints, I imagine that a text_log_error endpoint will be used to get all the data for the autoclassification panel, and for setting most/all of the data (I will try to reduce it to a single API call for get and a single call for set, I think, but I haven't clearly decided how to achieve that).
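
One possible shape for the "single get, single set" idea is sketched below in plain Python, with an in-memory dict standing in for the database. The design is explicitly undecided in this bug, so every name here (`TextLogError`, `get_errors`, `set_errors`, the `job_id` and `match_ids` fields, the payload keys) is hypothetical.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class TextLogError:
    # Hypothetical in-memory stand-in for a text_log_error row.
    id: int
    job_id: int  # assumed direct link to the job, for simplicity
    line: str
    failure_line_id: Optional[int] = None
    best_classification_id: Optional[int] = None
    best_is_verified: bool = False
    # classified_failure ids that would come from the error_match link table
    match_ids: List[int] = field(default_factory=list)


def get_errors(store: Dict[int, TextLogError], job_id: int) -> List[dict]:
    """Single 'get': everything the panel needs for one job, in one response."""
    return [asdict(e) for e in store.values() if e.job_id == job_id]


def set_errors(store: Dict[int, TextLogError], updates: List[dict]) -> List[dict]:
    """Single 'set': verify classifications for many errors in one call.

    All ids are validated before anything is changed, so a bad payload
    leaves the store untouched.
    """
    unknown = [u["id"] for u in updates if u["id"] not in store]
    if unknown:
        raise KeyError(f"unknown text_log_error ids: {unknown}")
    for update in updates:
        error = store[update["id"]]
        error.best_classification_id = update["best_classification_id"]
        error.best_is_verified = True
    return [asdict(store[u["id"]]) for u in updates]
```

Batching the updates into one call is what would let the UI save a whole panel of classifications at once rather than issuing one request per line.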

Obviously this is quite a bit of work, but I think it will be worthwhile. I haven't yet decided how best to break it into a series of steps.

James Graham [:jgraham]

Reporter

Updated

•

8 years ago

Depends on: 1310972

James Graham [:jgraham]

Reporter

Updated

•

8 years ago

Depends on: 1310974

James Graham [:jgraham]

Reporter

Updated

•

8 years ago

Depends on: 1312575

Ed Morley [:emorley]

Updated

•

7 years ago

Component: Treeherder → Treeherder: Log Parsing & Classification

James Graham [:jgraham]

Reporter

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Assignee

Updated

•

2 years ago

Component: Treeherder: Log Parsing & Classification → TreeHerder

Categories

(Tree Management :: Treeherder, defect)