So I'm considering a large refactor of the way that autoclassification data is stored. The main motivation is that, now that we are free of datasource artifacts, we can store autoclassification data on text_log_error rather than on failure_line. That allows for the possibility of writing matchers that work on the unstructured data in cases where there is no corresponding failure_line; these likely won't be as good, but they will be better than nothing. It also means the UI layer can be considerably simplified, since much of the distinction between structured and unstructured lines will go away. The only disadvantage is that we will ignore lines found in the structured log that aren't in the unstructured log. I think that's acceptable, since those lines are probably duplicated in the UI right now anyway.

The data model changes I think I need to make are:

* Add a failure_line foreign key to text_log_error
* Add a best_classification foreign key to text_log_error
* Add a boolean best_is_verified to text_log_error (later the corresponding columns on failure_line will be removed)
* Add an error_match link table between classified_failure and text_log_error (later delete the failure_match table)
* Add a column to the job table to indicate the ingestion status (i.e. whether we have matched error lines, or run autoclassification)

There will also need to be some additional indices: on text_log_step.finished, on text_log_error.line (prefix), and probably others that I haven't thought of yet.

In terms of endpoints, I imagine that a text_log_error endpoint will be used to get all the data for the autoclassification panel, and for setting most/all of the data (I will try to reduce it to a single API call for get and a single call for set, I think, but I haven't decided exactly how to achieve that).

Obviously this is quite a bit of work, but I think it will be worthwhile. I haven't entirely decided how best to structure it into a series of steps.
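To make the proposed relationships concrete, here's a minimal sketch in plain Python dataclasses rather than real Django models. The field names mirror the plan above, but everything else (the status enum values, class names, the flattened link-table list) is hypothetical and just for illustration:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class AutoclassifyStatus(Enum):
    """Hypothetical values for the proposed ingestion-status column on job."""
    PENDING = 0          # log ingested, error lines not yet matched
    CROSSREFERENCED = 1  # failure_lines matched to text_log_errors
    AUTOCLASSIFIED = 2   # matchers have run

@dataclass
class FailureLine:
    id: int
    line: str

@dataclass
class ClassifiedFailure:
    id: int

@dataclass
class TextLogError:
    id: int
    line: str
    # Proposed new foreign keys / flags:
    failure_line: Optional[FailureLine] = None          # may be absent for unstructured-only lines
    best_classification: Optional[ClassifiedFailure] = None
    best_is_verified: bool = False
    # Proposed error_match link table, flattened here as a plain list:
    classified_failures: List[ClassifiedFailure] = field(default_factory=list)

@dataclass
class Job:
    id: int
    autoclassify_status: AutoclassifyStatus = AutoclassifyStatus.PENDING
```

In real Django terms, failure_line and best_classification would be nullable ForeignKey fields and the error_match table a ManyToManyField with a through model; the sketch only shows the intended shape of the data.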