Bug 1338570 (Closed). Opened 8 years ago, closed 8 years ago.

The seta-analyze-failures task times out and makes at least 140,000 MySQL requests

Categories

(Tree Management Graveyard :: Treeherder: SETA, defect, P1)

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1330354

People

(Reporter: emorley, Unassigned)

As part of the recent SETA issue debugging on IRC, I noticed that the seta-analyze-failures task wasn't appearing on New Relic under the "transactions" section. Searching the logs showed that the task was hitting the Celery TimeLimitExceeded, which is why it wasn't showing up. (The reporting of this to New Relic is now fixed, see bug 1308549.)

That New Relic reporting fix isn't yet on prod, but in the meantime I manually scheduled the task on prod using:

    from treeherder.seta.tasks import seta_analyze_failures
    seta_analyze_failures.apply_async(time_limit=15*60, soft_time_limit=10*60)

Even though the time limit is double that currently used for the task, it still didn't complete in time. However, because the soft time limit was set, we at least now have a trace:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1486744165&tw%5Bstart%5D=1486742365#id=5b224f746865725472616e73616374696f6e2f43656c6572792f736574612d616e616c797a652d6661696c75726573222c22225d

The task is making 170,000 MySQL requests in the part captured before the timeout. It looks like a `.select_related()` is missing here:
https://github.com/mozilla/treeherder/blob/2b7b0e62f27826324add96743b03247fc1b952c9/treeherder/seta/analyze_failures.py#L61
Blocks: 1330728
The cause of this is actually the same missing `.select_related()` as bug 1330354. That bug is about timeouts on the API endpoint rather than during ingestion, but both use this code path.
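To illustrate why a missing `.select_related()` multiplies the query count, here is a minimal sketch that uses plain sqlite3 rather than Django or MySQL. The `job` and `job_type` tables, the `QueryCounter` helper and both functions are hypothetical and are not Treeherder's actual schema or code. The first function issues one extra query per row, which is roughly what an ORM does when a foreign key is followed lazily inside a loop. The second fetches the same data with a single JOIN, which is the kind of SQL that `.select_related()` asks Django to generate.

```python
import sqlite3


def build_db():
    """Create an in-memory database with 100 jobs spread over 2 job types.

    The schema is hypothetical and only stands in for a foreign-key relation.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE job_type (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE job (
            id INTEGER PRIMARY KEY,
            job_type_id INTEGER REFERENCES job_type(id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO job_type (id, name) VALUES (?, ?)",
        [(1, "mochitest"), (2, "reftest")],
    )
    conn.executemany(
        "INSERT INTO job (id, job_type_id) VALUES (?, ?)",
        [(i, 1 + i % 2) for i in range(1, 101)],
    )
    return conn


class QueryCounter:
    """Thin wrapper that counts how many statements are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.count = 0

    def execute(self, sql, params=()):
        self.count += 1
        return self.conn.execute(sql, params).fetchall()


def names_n_plus_one(db):
    """One query for the jobs, then one more query per job (the N+1 pattern)."""
    names = []
    for _job_id, job_type_id in db.execute(
        "SELECT id, job_type_id FROM job ORDER BY id"
    ):
        rows = db.execute("SELECT name FROM job_type WHERE id = ?", (job_type_id,))
        names.append(rows[0][0])
    return names


def names_joined(db):
    """Same result from a single JOIN query."""
    rows = db.execute(
        "SELECT job.id, job_type.name FROM job "
        "JOIN job_type ON job_type.id = job.job_type_id ORDER BY job.id"
    )
    return [name for _job_id, name in rows]


if __name__ == "__main__":
    lazy = QueryCounter(build_db())
    joined = QueryCounter(build_db())
    assert names_n_plus_one(lazy) == names_joined(joined)
    print("per-row lookups:", lazy.count, "queries")
    print("single join:", joined.count, "query")
```

With 100 rows the per-row version runs 101 queries against 1 for the join. The same ratio applied to a large job table would account for a request count in the hundreds of thousands.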
Blocks: 1326102
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Product: Tree Management → Tree Management Graveyard