Closed Bug 1757222 Opened 3 years ago Closed 3 years ago

adjust Treeherder FailureSummaryTab bug SuggestionList to only show simplified intermittent bugs instead of all related bugs

Categories

(Tree Management :: Treeherder: Frontend, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: jmaher)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Attached image suggestions_hacked.png

We are working to simplify the bugs that we have to track intermittent failures to be per test case, not per error.

As we will be starting this with a subset of tests, we could change this at any point of time. In an ideal situation we would need a simple change to treeherder and it would work with the existing code and then new bugs as they are available.

Keep in mind, there is a cycle to ingest the bugs (I think hourly, but lets say 24 hours), and then there is a bugscache that treeherder has which lasts for 24 hours. So any change made to bugs will take from an hour to 48 hours to be seen by the sheriffs.

I did a hack locally:

diff --git a/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx b/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
index 7df2187df..df8538a95 100644
--- a/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
+++ b/ui/shared/tabs/failureSummary/FailureSummaryTab.jsx
@@ -74,6 +74,35 @@ class FailureSummaryTab extends React.Component {
     }
     BugSuggestionsModel.get(selectedJob.id).then(async (suggestions) => {
       suggestions.forEach((suggestion) => {
+        let simpleCase = [];
+        if (suggestion.bugs.open_recent.length > 0) {
+          suggestion.bugs.open_recent.forEach((bug) => {
+            if (bug.summary.endsWith('single tracking bug')) {
+              simpleCase = [bug];
+            }
+          });
+        }
+        if (suggestion.bugs.all_others.length > 0) {
+          suggestion.bugs.all_others.forEach((bug) => {
+            if (bug.summary.endsWith('single tracking bug')) {
+              simpleCase = [bug];
+            }
+          });
+        }
+
+        // HACK: use the simple case if found.
+        if (simpleCase.length > 0) {
+          suggestion.bugs.open_recent = simpleCase;
+
+          // HACK: remove the error message from the error line to avoid confusion
+          suggestion.search =
+            suggestion.search.split(suggestion.path_end)[0] +
+            suggestion.path_end;
+
+          // HACK: remove any other bugs, keep this simple.
+          suggestion.bugs.all_others = [];
+        }
+
         suggestion.bugs.too_many_open_recent =
           suggestion.bugs.open_recent.length > thBugSuggestionLimit;
         suggestion.bugs.too_many_all_others =

this is a bit hacky, but I imagine it would be better to make this a bit more robust. I am not sure if this is the right place to edit the hack. All the queries and matching is done in the backend in python which I had trouble setting up locally.

I am attaching a screenshot of what this looks like when running on localhost.

:aryx and :marco, do you have thoughts on this?

Flags: needinfo?(mcastelluccio)
Flags: needinfo?(aryx.bugmail)

The hack sounds reasonable to me, this way we can easily remove it if the experiment doesn't bear the expected fruit.

A question: does this exclude timeout failures?

Flags: needinfo?(mcastelluccio)

this hack only looks for bugs with the summary ending with single tracking bug and of course matching the test name. If those conditions are not met, then nothing changes within treeherder. To avoid that, we would need to adjust the hack to only affect error lines starting with TEST-UNEXPECTED-FAIL

  • to do this in the front end code, we would need to examine the suggestion.search field and flag for keywords like crash, leak, timeout.

If we file a bug for a test case and that test case has TEST-UNEXPECTED-FAIL and TEST-TIMEOUT conditions, we probably would match the new failure. Is that so bad? I see a few scenarios as end results:

  1. we get too many TEST-UNEXPECTED-FAIL but slightly higher numbers due to extra TEST-TIMEOUT instances and disable the test - this is what would normally happen
  2. we get too many TEST-TIMEOUT instances and disable the test, but it would be disabled faster due to the higher numbers by adding in TEST-UNEXPECTED-FAIL.

In summary, we will have more tests be flagged for attention or disabling due to aggregating all failures. Ideally we will have tools/data for developers to see the types of failures and where they occur so debugging can take place.

Code looks reasonable. Please be aware it's TEST-UNEXPECTED-TIMEOUT and let me know if the backend shall be changed to handle test failure and timeout differently.

Flags: needinfo?(aryx.bugmail)

I don't see a need to change the backend, longer term we will want to do that, but for this I don't see a need to do it.

Let me see if I can update this to do the old way for TIMEOUT|CRASH|LEAK

deployed to treeherder production

Assignee: nobody → jmaher
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: