Closed Bug 1503172 Opened 6 years ago Closed 5 years ago

update_bugscache intermittently purges cached bugs that are still valid

Categories

(Tree Management :: Treeherder, enhancement, P1)

enhancement

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: dluca, Assigned: emorley)

References

Details

Bug suggestions are not loading, even for perma or high freq intermittent bugs.
See Also: → 1503182
Thank you for filing.

Looking at the bugscache table, it had 6000 rows, which is strange since:
* unusual that it's rounded to the nearest 1000 rows
* https://bugzilla.mozilla.org/bzapi/count?keywords=intermittent-failure&chfieldfrom=-1y says there are 20668

This would explain why most suggestions are missing.

There are no errors in New Relic, however the command being run for the task (now that it's being run in the Heroku scheduler; bug 1176492):
* doesn't include the `newrelic-admin run-program` prefix
* needs to be listed here too for New Relic to pick it up:
  https://github.com/mozilla/treeherder/blob/f761c7280308b2443f8ddc649ed7eed8e12408b0/newrelic.ini#L28-L31

Looking in Papertrail:
* there are no errors
* however the most recent bugscache run only look 4 minutes before exiting with code 0, rather than the 10 minutes it usually takes
(https://papertrailapp.com/systems/treeherder-prod/events?focus=993831949187579906&q=program%3Ascheduler&selected=993831949187579906)

This makes me suspect that the pagination logic here is not working as expected, and as such the loop is bailing before all bugs are downloaded, which causes all of the other bugs to be treated as old and get deleted from the bugscache table:
https://github.com/mozilla/treeherder/blob/f761c7280308b2443f8ddc649ed7eed8e12408b0/treeherder/etl/bugzilla.py#L37-L49

Whilst looking into this I've been running a one-off manual `heroku run -- newrelic-admin run-program ./manage.py update_bugscache`, and there are now 20668 rows in the bugscache table, which is the correct number.

So the tl;dr is:

"Sometimes the bugscache updating task intermittently fails in a way where only some of the bugs are fetched from bugzilla, and anything that wasn't fetched correctly is deleted from the bugscache table, meaning missing bugs suggestions until the next time it runs (which is hourly)"
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Priority: -- → P1
Whilst the bugscache was re-populated as of ~08:40 UTC, I've had to flush the Redis cache to force old bug suggestions to be regenerated (connected to the Redis instance using the Heroku CLI addon, then ran the `flushall` command). This will mean everyone will need to log in again.

The bug suggestions for the link in comment 0 now works.
Summary: Treeherder is not loading bug suggestions → update_bugscache intermittently purges cached bugs that are still valid
Depends on: 1503215
Depends on: 1503576
Waiting on finding out more in bug 1503215.

This is still blocked on having the BMO logs in bug 1503215.

Unfortunately we never got the information we needed from the logs in bug 1503215, so it's hard to know what to do here. The comments there indicate at least one bug on BMO's side has been fixed, which might mean this doesn't occur again.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE
Component: Treeherder: Log Parsing & Classification → TreeHerder
You need to log in before you can comment on or make changes to this bug.