Closed Bug 1407365 Opened 7 years ago Closed 4 years ago

autoclassifier: log with failure summary takes very long to load if output is big (compared to old "Failure Summary")

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: aryx, Unassigned)

References

(Blocks 1 open bug)

Details

https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=6bec02f9a646f11777dbc12ceeb90af76198242d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
has browser-chrome failures which start with
> TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_tabfocus.js

Clicking on such a failure takes a very long time to load (>6s), which slows sheriffs down. The devtools' Network tab shows that the summary is ~9KB compressed and >300KB uncompressed. The old "Failure Summary" view loads its content faster.

Suggestion: The processing of the log (adding of bug suggestions) should end after its size has reached a threshold and just add a line like "followed by more failures".
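
A minimal sketch of that suggestion, assuming the error lines are available as a list of strings. The function name, the `max_bytes` default and the way size is measured are illustrative assumptions, not existing Treeherder code:

```python
def truncate_error_lines(lines, max_bytes=50_000, marker="followed by more failures"):
    """Keep error lines until their combined UTF-8 size would exceed max_bytes.

    Hypothetical helper: once the threshold is hit, processing stops and a
    single marker line is appended instead of the remaining lines.
    """
    kept = []
    used = 0
    for line in lines:
        size = len(line.encode("utf-8"))
        if used + size > max_bytes:
            kept.append(marker)
            break
        kept.append(line)
        used += size
    return kept
```

Bug suggestions would then only be looked up for the lines that survive the cut, so the number of lookups is bounded by the threshold rather than by the size of the log.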
Wow, that is slow. The API call being made is to:
https://treeherder.mozilla.org/api/project/autoland/jobs/135968747/text_log_errors/

Sample slow transaction trace from New Relic:
https://rpm.newrelic.com/accounts/677903/applications/14179757/transactions?tw%5Bend%5D=1507661177&tw%5Bstart%5D=1507650377#id=5b225765625472616e73616374696f6e2f46756e6374696f6e2f747265656865726465722e7765626170702e6170692e6a6f62733a4a6f6273566965775365742e746578745f6c6f675f6572726f7273222c22225d&app_trace_id=18cda09b-ade8-11e7-b246-0242ac11000e_37199_47920

Breakdown:
* 70%: 200x MySQL bugscache select 
* 29%: In treeherder.webapp.api.jobs:JobsViewSet.text_log_errors 
*  1%: everything else

The 200 selects are what is destroying performance. Each query is pretty quick (e.g. 20-80ms), but in aggregate :-(
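
As a rough back-of-the-envelope check, assuming the selects run one after another and using only the figures quoted above:

```python
# Figures from the trace breakdown: 200 selects at roughly 20-80 ms each.
query_count = 200
per_query_ms = (20, 80)

# Total time spent in the selects if they run sequentially, in seconds.
low_s = query_count * per_query_ms[0] / 1000   # 4.0
high_s = query_count * per_query_ms[1] / 1000  # 16.0

print(f"sequential selects: {low_s:.0f}-{high_s:.0f} s")
```

Even at the low end of that range, the selects alone would account for several seconds per request.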

The individual queries are of form:

SELECT id, summary, crash_signature, keywords, os, resolution, status, MATCH (`summary`) AGAINST (%s IN BOOLEAN MODE) AS relevance FROM bugscache WHERE 1 AND `summary` LIKE CONCAT ('%%%%', %s, '%%%%') ESCAPE '=' AND (modified < %s OR resolution <> '') ORDER BY relevance DESC LIMIT 0, %s 

or

SELECT id, summary, crash_signature, keywords, os, resolution, status, MATCH (`summary`) AGAINST (%s IN BOOLEAN MODE) AS relevance FROM bugscache WHERE 1 AND resolution = '' AND `summary` LIKE CONCAT ('%%%%', %s, '%%%%') ESCAPE '=' AND modified >= %s ORDER BY relevance DESC LIMIT 0, %s 

Solutions include:
* lowering the cap of error lines returned (presumably 100 lines x 2 recent/older query buckets at the moment)
* batching the queries
* pre-generating the results rather than generating dynamically (would prefer to avoid this)
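
The first two options could be sketched roughly as follows. This is only an illustration: `lookup` is a hypothetical stand-in for whatever runs the bugscache query for one search term, `max_lines` is an arbitrary example cap, and "batching" is approximated here by querying each distinct search term once, which is one possible reading of that option:

```python
def suggestions_for_lines(search_terms, lookup, max_lines=20):
    """Return (term, suggestions) pairs for at most max_lines error lines.

    lookup is a hypothetical callable standing in for the bugscache query.
    Repeated search terms reuse the first result instead of issuing
    another query.
    """
    cache = {}
    results = []
    for term in search_terms[:max_lines]:
        if term not in cache:
            cache[term] = lookup(term)
        results.append((term, cache[term]))
    return results
```

For a log where many error lines reduce to the same search term, this would issue one query per distinct term rather than one per line.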
Priority: -- → P1
Blocks: 1407377

Autoclassifier has been removed.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Component: Treeherder: Log Parsing & Classification → TreeHerder