Closed
Bug 1465420
Opened 7 years ago
Closed 7 years ago
Some intermittent bugs not suggested any more for failures
Categories
(Tree Management :: Treeherder, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: aryx, Assigned: emorley)
Details
Trees are closed for this.
e.g. https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6be2c9f129c9966e7ee1cd489a534754362ac662&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Should suggest https://bugzilla.mozilla.org/show_bug.cgi?id=1415911 and did so in the past.
https://treeherder.mozilla.org/#/jobs?repo=mozilla-esr60&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=pending&filter-resultStatus=running&selectedJob=180863247 suggests the wrong bug.
All those browser-chrome failures without a suggestion https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&revision=904b53e0ca0c480d907b71a86f46d3fe721294ab&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable also likely have existing bugs.
ghickman made some matcher changes in today's TH production push: https://github.com/mozilla/treeherder/compare/c80e34213340...bc3e14d6f8ea#files_bucket
Flags: needinfo?(ghickman)
Comment 1 • 7 years ago (Reporter)
The first URL should be https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6be2c9f129c9966e7ee1cd489a534754362ac662&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=180877513
This is with the default view, "Failure Summary".
Comment 2 • 7 years ago (Assignee)
For the first example, here's what was shown previously:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=3eccd139667d491ca259e8ff96f740ecfa36a781&selectedJob=180377254
Comment 3 • 7 years ago (Assignee)
I've just cleared the production Redis cache (which is where the bug suggestions are cached), and the link in comment 1 now returns the correct suggestions for me. (And the other links look improved now too)
Do things look better for you?
Comment 4 • 7 years ago (Assignee)
(Clearing the Redis cache also logs out all users, in case you were thinking bug 1465355 was back)
Comment 5 • 7 years ago (Reporter)
Thank you, suggestions are back to expected ones.
Severity: blocker → normal
Comment 6 • 7 years ago (Assignee)
I don't understand how the cache entries could have been affected, since there were no New Relic errors during the deployment, and the cache result should only be stored in case of success:
https://github.com/mozilla/treeherder/blob/df5cb3fcf400c0da06098e6cac88e5f8c02a06e4/treeherder/model/error_summary.py#L22-L39
Comment 7 • 7 years ago (Assignee)
To give more context on comment 6: I was initially thinking that if logs were parsed whilst the migration was running, the queries might have failed. However:
* the job in comment 1 completed at 12:12:40 UTC, which is well after the 08:25 UTC deploy
* the cache shouldn't be saved in the case of a failed query
* if the query had failed, we should have seen it in New Relic
Comment 8 • 7 years ago (Assignee)
The timing of this error seems suspiciously close to the finish time of that job (I've adjusted the timestamp to show UTC):
May 30 12:15:30 treeherder-prod app/worker_default.2: [2018-05-30 12:15:30,110: ERROR/MainProcess] Hard time limit (930s) exceeded for fetch-bugs[b1aa8125-d0fc-4b2b-95eb-cb55dd23c8cf]
Looking at the bugscache population code, it seems at first glance that this shouldn't be a problem, since only old bugs are removed from the table (rather than truncating the table and then re-populating, like it used to do a few years ago, which is bad for race conditions and if the whole task fails).
However perhaps this is buggy and not doing what we think?
https://github.com/mozilla/treeherder/blob/df5cb3fcf400c0da06098e6cac88e5f8c02a06e4/treeherder/etl/bugzilla.py#L47-L49
i.e.:
* instead of deleting just the old bugs, all bugs are deleted
* normally the bugs table is then re-populated soon after (though still would leave a window where some bugs were missing)
* in this case, since the fetch-bugs task timed out, the table was left with lots of bugs missing, until it ran again the next hour
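The hypothesis above can be sketched as follows. This is an illustration only, not the code in `treeherder/etl/bugzilla.py`: the table is modelled as a plain dict keyed by bug ID, and `TaskTimeout`, `refresh_delete_old_only`, `refresh_delete_all_first` and the `timeout_after` parameter are all hypothetical names made up for this example.

```python
class TaskTimeout(Exception):
    """Hypothetical stand-in for the task's hard time limit being exceeded."""


def refresh_delete_old_only(table, fetched_bugs):
    """Intended behaviour: remove only bugs absent from the new fetch, then upsert."""
    fetched_ids = {bug["id"] for bug in fetched_bugs}
    for bug_id in set(table) - fetched_ids:
        del table[bug_id]
    for bug in fetched_bugs:
        table[bug["id"]] = bug


def refresh_delete_all_first(table, fetched_bugs, timeout_after=None):
    """Suspected behaviour: empty the table, then re-insert the bugs one by one."""
    table.clear()
    for count, bug in enumerate(fetched_bugs):
        # Simulate the task being killed part-way through re-population.
        if timeout_after is not None and count >= timeout_after:
            raise TaskTimeout("fetch-bugs exceeded its time limit")
        table[bug["id"]] = bug
```

With the first function, a bug that is still present upstream never disappears from the table. With the second, a timeout part-way through leaves the table missing every bug that had not yet been re-inserted, which would match the symptoms if this were what the real code did.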
Comment 9 • 7 years ago (Assignee)
Hmm seems to work fine:
"""
$ ths run ./manage.py shell
Running ./manage.py shell on ⬢ treeherder-stage... up, run.7771 (Standard-1X)
...
>>> from treeherder.model.models import Bugscache
>>> bugs_stored = set(Bugscache.objects.values_list('id', flat=True))
>>> len(bugs_stored)
22080
>>> Bugscache.objects.first().id
473680L
>>> bug_list = [{'id': 1}, {'id': 2}, {'id': 473680L}]
>>> old_bugs = bugs_stored.difference(set(bug['id'] for bug in bug_list))
>>> len(old_bugs)
22079
"""
Comment 10 • 7 years ago (Assignee)
Clearing the Redis cache fixed this, and unless it occurs again I'm out of low-hanging fruit to investigate.
Assignee: nobody → emorley
Status: NEW → RESOLVED
Closed: 7 years ago
Component: Treeherder → Treeherder: Log Parsing & Classification
Flags: needinfo?(ghickman)
Priority: -- → P1
Resolution: --- → FIXED
Summary: some intermittent bugs not suggested anymore for failures → Some intermittent bugs not suggested any more for failures
Updated•4 years ago
Component: Treeherder: Log Parsing & Classification → TreeHerder