New production tbpl frequently has problems fetching the summary for a particular job

RESOLVED FIXED

Status

Product: Tree Management Graveyard
Component: TBPL
Priority: --
Severity: major
Status: RESOLVED FIXED
Reported: 7 years ago
Modified: 3 years ago

People

(Reporter: philor, Assigned: rhelmer)

Attachments

(1 attachment)

(Reporter)

Description

7 years ago
A number of possibilities:

* PHP's max execution time is too short

* somehow, something's different in the code that's now on tbpl.m.o. If you look at bug 614146, you can see that we haven't been getting suggestions for failures of the form "test filename" | Shutdown | since March/April, when it switched from tbplbot starring to manual copy-pasting. Now tbpl.m.o does get those, which makes me suspicious that it's somehow different from what we've been running, and also different in what it tries to search Bugzilla for on the troublesome logs, all of which are failure modes that include non-filename stuff like Shutdown, unknown test file, or plugin process

* it just has a slower connection to BzAPI than anything we've run before (seems very unlikely, since I never had this much trouble while I was running tbpl locally, over a fringe 3G connection)

* something I'm not thinking of

STR:
1. The hard part: find a failure that won't load a summary. Instances of bug 666092 are good, but you'll only find those on Beta now; instances of bug 484123 are also good, and much thicker on the ground.
2. Click the letter, watch the throbber in the summary panel spin until it gives up. Do it again, and again, and again, and again.
3. Get bored, open Tools - Web Developer - Web Console, do it again a few times, noting the ~29980ms timeouts, then copy the https://tbpl.mozilla.org/php/getLogExcerpt.php?id=1234567&type=annotated URL, load that in a separate tab.
4. Once it has loaded in that separate tab, click the letter again, the summary will load instantly, and you're back to starring.
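
Step 3 can also be scripted rather than done by hand in a second tab. The following is a minimal sketch in Python, assuming only the `getLogExcerpt.php` URL shape quoted in step 3. The names `excerpt_url` and `time_fetch` and the 120-second default are hypothetical choices for illustration, and the run id is the placeholder from the report, not a real job.

```python
import time
import urllib.parse
import urllib.request

# URL shape taken from step 3 of the report.
BASE = "https://tbpl.mozilla.org/php/getLogExcerpt.php"


def excerpt_url(run_id, kind="annotated"):
    """Build the log-excerpt URL in the shape quoted in step 3."""
    query = urllib.parse.urlencode({"id": run_id, "type": kind})
    return f"{BASE}?{query}"


def time_fetch(url, timeout=120.0, opener=urllib.request.urlopen):
    """Fetch url once and return (seconds_elapsed, bytes_received).

    `opener` is injectable (an assumption of this sketch) so the timing
    logic can be exercised without touching the network.
    """
    start = time.monotonic()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.monotonic() - start, len(body)
```

Calling `time_fetch(excerpt_url(1234567))` twice on a troublesome log and comparing the two timings would show whether the second request is served much faster than the first, which is what step 4 suggests.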

In the olden days, we had some of this same sort of problem: with bug 484123, you would typically have to wait for the first timeout, and then it would load on the second try. Now it doesn't load after dozens of tries.
(Assignee)

Comment 1

7 years ago
Created attachment 557623
bump timeout from 30s to 120s

I think the short-term solution here is:

* bump timeout on the PHP side; what it is doing is:
** downloading logfile from FTP
** gunzipping log file
** possibly grepping
** caching result

* bump timeout on frontend to match whatever we do on PHP side (120s)
** this is hardcoded to 30s right now, patch attached to bring it to 120s
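
The PHP-side steps listed above can be sketched roughly as follows. This is Python, not the actual TBPL PHP code, and it is only an illustration of the download / gunzip / grep / cache sequence: `summarize`, `get_excerpt`, the caller-supplied `fetch_log` and `needle`, and the cache file layout are all assumptions.

```python
import gzip
import hashlib
from pathlib import Path


def summarize(raw_gz, needle):
    """Gunzip a log and keep only the lines containing `needle` (the "grep" step)."""
    text = gzip.decompress(raw_gz)
    return b"\n".join(line for line in text.splitlines() if needle in line)


def get_excerpt(log_url, cache_dir, fetch_log, needle):
    """Return the summary for log_url, doing the slow work at most once.

    `fetch_log` is a hypothetical caller-supplied function that downloads
    the gzipped log and returns its bytes.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical cache layout: one file per log URL, named by its hash.
    key = hashlib.sha256(log_url.encode("utf-8")).hexdigest()
    cached = cache_dir / f"{key}.txt"
    if cached.exists():
        return cached.read_bytes()  # fast path: an earlier request paid the cost
    summary = summarize(fetch_log(log_url), needle)  # slow path: download, gunzip, grep
    cached.write_bytes(summary)
    return summary
```

In a design like this, only the first request for a given log pays for the download and decompression, so a server-side limit shorter than that first run would prevent the cache from ever being filled.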
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
Attachment #557623 - Flags: review?(peterbe)
Comment on attachment 557623
bump timeout from 30s to 120s

Review of attachment 557623:
-----------------------------------------------------------------

Looks good.
(Assignee)

Comment 3

7 years ago
Comment on attachment 557623
bump timeout from 30s to 120s

Taking Arpad's r+ for this, nm peterbe :)
Attachment #557623 - Flags: review?(peterbe) (cleared)
(Assignee)

Comment 4

7 years ago
Pushed 0e085b64a113

I think I see a way to optimize the log downloading/gunzipping/parsing, I'll file a new bug about that though.
Status: ASSIGNED → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard