Closed Bug 1355940 Opened 7 years ago Closed 3 years ago

The "GECKO(1613) | " and "PID 3288 " log line prefix break error summary whole line matches

Categories

(Tree Management :: Treeherder, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kats, Assigned: aryx)

Details

Attachments

(1 file)

Failure at [1]. First line that TH detects as an error is this one:

INFO - GECKO(1613) | Assertion failure: rc != 0 (destroyed timer off its target thread!), at /home/worker/workspace/build/src/xpcom/threads/TimerThread.cpp:490 

There is already bug 1355315 on file for this, but TH doesn't show this bug in the list of matching bugs. It should, because this assertion is the root cause and this is the thing that needs the bug, rather than the less-useful UNEXPECTED-FAIL lines after it.

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=90983583&repo=graphics&lineNumber=11880
A couple things are conspiring to make this difficult to suggest.

Here's what that job looks at for bug suggestions: https://treeherder.mozilla.org/api/project/graphics/jobs/90983583/bug_suggestions/

That first object is for the failure line you're looking at.

To get those results, this function is called: https://github.com/mozilla/treeherder/blob/ac04d5ab8c314d046016b501720d1be564b62568/treeherder/model/error_summary.py#L44-L78

That function takes the failure line and tries to clean it up into the search terms used to search the Bugscache. 

It first strips out mozharness prefixes: https://github.com/mozilla/treeherder/blob/ac04d5ab8c314d046016b501720d1be564b62568/treeherder/model/error_summary.py#L15

After that, we move on to https://github.com/mozilla/treeherder/blob/ac04d5ab8c314d046016b501720d1be564b62568/treeherder/model/error_summary.py#L85-L140 where the cleaned up error line is stripped down even further. 

The whole section starting with | if len(tokens) >= 3: | is being skipped since there was only one '|' in the error line. If it *had* been run, we would have stripped off the "GECKO(1613) | " part of the error string, leaving just the "Assertion failure" portion.

Either way, whatever the failure line ended up being at this point would have been capped at 100 characters due to https://github.com/mozilla/treeherder/blob/ac04d5ab8c314d046016b501720d1be564b62568/treeherder/model/error_summary.py#L138

Since it doesn't have an obvious test name or crash signature to search for, it tries to match the complete string, which fails since the failure line is cut off mid-assertion.




Things that might help:
1. Make sure we strip out the "GECKO(1613) | " part even if we don't have three tokens at that point. This would buy the complete string match another 14 characters to use.
2. Maybe strip out the path leading up to "TimerThread.cpp:490"? This would free up a lot of extra characters for the complete string match, though might risk stripping out desired paths elsewhere if we aren't careful.
3. Increase the cap on the search term length from 100 to something somewhat larger.


Doing 1 and 2 together would make the search term be "Assertion failure: rc != 0 (destroyed timer off its target thread!), at TimerThread.cpp:490", which should find the correct bug when searching bugzilla: https://bugzilla.mozilla.org/buglist.cgi?list_id=13534776&short_desc_type=allwordssubstr&short_desc=Assertion%20failure%3A%20rc%20%21%3D%200%20%28destroyed%20timer%20off%20its%20target%20thread%21%29%2C%20at%20TimerThread.cpp%3A490&resolution=---&query_format=advanced
I think doing 1 is a good idea. For 2, to mitigate the risk of stripping out desired paths, can we just have a whitelist of path prefixes that show up in automation (e.g. "/home/worker/workspace/build/src") and strip those out unconditionally using a regex match? Clearly there was some code that stripped this prefix when I filed bug 1355315 via bugfiler, so presumably we should be doing the same thing here.

More generally, I think that one invariant we should strive for is that any bug filed with bugfiler should be findable if the same error message shows up again. So whatever bugfiler does to the error message, the search should as well.
Ed commented similarly in the PR for Bug 1355339. 
I'd be willing to add some of the stripping stuff from the bug filer into error_summary.py's flow if that's what we want to do.
Flags: needinfo?(emorley)
I'm not sure what question you're asking?

The rough ideas above sound good - will be easier to judge via a PR.
Flags: needinfo?(emorley)
Component: Treeherder → Treeherder: Data Ingestion
Summary: TreeHerder doesn't find matching intermittents → The "GECKO(1613) | " log line prefix break error summary whole line matches
Component: Treeherder: Data Ingestion → Treeherder: Log Parsing & Classification
Summary: The "GECKO(1613) | " log line prefix break error summary whole line matches → The "GECKO(1613) | " and "PID 3288 " log line prefix break error summary whole line matches

Triage: Sebastian: Is this still an issue?

Flags: needinfo?(aryx.bugmail)

Yes.

Flags: needinfo?(aryx.bugmail)
Assignee: nobody → aryx.bugmail
Status: NEW → ASSIGNED
Comment on attachment 9142352 [details] [review]
Link to GitHub pull-request: https://github.com/mozilla/treeherder/pull/6339

Just need to fix the tests and this should be good to merge.
Attachment #9142352 - Flags: review?(cdawson) → review+
Component: Treeherder: Log Parsing & Classification → TreeHerder
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: