Closed Bug 1027421 Opened 10 years ago Closed 10 years ago

Delay between actual end of build and when that appears on tbpl

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1096863

People

(Reporter: glandium, Unassigned)

Details

(Probably not the right component)

I was following several of my try builds directly on build masters yesterday, and saw them turn red in real time, and it took several minutes for tbpl to catch up (even when refreshing manually). I don't think that helps sheriffs when there is incoming bustage.
We discussed this with Chris, he said there are some workarounds we might be able to do to improve this.
Flags: needinfo?(catlee)
Last time we talked about it, builds-4hr was generated on a 1 minute cron, tbpl polled it on a 5 minute cron, and the browser polled tbpl on a 2 minute timeout. The releng part of that would be "tell tbpl to use pulse, be told it won't change in its final months of life, but treeherder..." (Well, and make pulse so supported and failure-proof and backed up that it would be reasonable to change.)

I guess there are cooperative workarounds possible, like turning it into builds-2hr and dropping the tbpl polling to 2 minutes (or whichever combination of minutes and hours wouldn't result in tbpl tripping over its previous import's feet), but tbpl is very much legacy at this point, and the current delays have been known and unchanged (other than when cruncher was breaking) since I filed bug 677004 in the summer of 2011.
Treeherder is sadly having to use builds-4hr for now, due to pulse not being adequate yet (bug 862595 should hopefully improve this). I'm not sure how often treeherder polls builds-4hr, but it would be good to use something less than 5 mins. On the plus side, treeherder uses push notifications to send updates to the UI, so the current additional 2 min UI refresh lag will be avoided.
I've filed bug 1031238 for the longer term goal of switching treeherder to pulse.
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> Treeherder is sadly having to use builds-4hr for now, due to pulse not being
> adequate yet (bug 862595 should hopefully improve this).

Please stop repeating this FUD. pulse data and builds-4hr should be equivalent in terms of the contents of data. Treeherder is already dealing with having to guess platform, etc. from the builder names from builds-4hr without this more structured information, so it already has the required logic to parse it out of pulse as well.

Improving the data structures is a great thing to aim for, but I don't agree that treeherder is blocked on it.
Flags: needinfo?(catlee)
As to how we can actually shorten the gap here, it appears like most of the delay is on the TBPL side.

Worst case we're ~8 minutes from build finished to showing up in the browser? We could reasonably shave 3-4 minutes off that by changing some TBPL polling frequencies.

Is that 3-4 minutes worth the effort for a legacy application at this point?
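
For reference, the ~8 minute worst case is just the sum of the polling intervals quoted earlier in this bug (1 minute builds-4hr cron, 5 minute tbpl import cron, 2 minute browser refresh). A rough sketch of that arithmetic follows. The function names and the "tuned" interval combinations are hypothetical and only for illustration, and the model ignores processing time at each stage, so treat the numbers as approximate.

```python
# Polling intervals in minutes, as quoted in this bug (not measured here).
CURRENT_INTERVALS = {
    "builds-4hr generation cron": 1,
    "tbpl import cron": 5,
    "browser refresh timer": 2,
}


def worst_case_delay(intervals):
    """Worst case: the result just misses every poll, so each stage
    adds one full interval before the data moves on."""
    return sum(intervals)


def average_delay(intervals):
    """Average case, assuming the stages are not synchronized with each
    other: each stage adds half an interval on average."""
    return sum(i / 2 for i in intervals)


current = list(CURRENT_INTERVALS.values())
print("current worst case:", worst_case_delay(current))  # 8
print("current average:", average_delay(current))        # 4.0

# Hypothetical tunings (assumptions, not agreed values):
# tbpl import every 2 minutes, browser refresh unchanged.
print("tuned A worst case:", worst_case_delay([1, 2, 2]))  # 5
# tbpl import every 2 minutes, browser refresh every 1 minute.
print("tuned B worst case:", worst_case_delay([1, 2, 1]))  # 4
```

Under these assumptions the two hypothetical tunings save 3 and 4 minutes respectively in the worst case, which lines up with the 3-4 minute estimate above.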
(In reply to Chris AtLee [:catlee] from comment #5)
> Please stop repeating this FUD. pulse data and builds-4hr should be
> equivalent in terms of the contents of data. 

I object to that - it's not FUD - the previous concerns (or misunderstandings, if that's what they were) still haven't been addressed, at least not in a format that I've seen. According to jeads, they weren't identical (and I'm not talking about buildername). In addition, it proved less reliable for them back then - and aiui pulse still doesn't have the same level of support as the rest of our infra (from an SLA/can page POV). Has this been resolved by speaking to jeads since?
(In reply to Chris AtLee [:catlee] from comment #6)
> As to how we can actually shorten the gap here, it appears like most of the
> delay is on the TBPL side.

Agreed; the pitfall of a polling vs push based approach, combined with conservative polling times.

> Worst case we're ~8 minutes from build finished to showing up in the
> browser? We could reasonably shave 3-4 minutes off that by changing some
> TBPL polling frequencies.
> 
> Is that 3-4 minutes worth the effort for a legacy application at this point?

We could increase the frequency at which the TBPL import-buildbot-data.py cron fires (currently every 5 mins) & also increase the frequency of the client side refresh (currently every 2 mins) - though increasing the latter beyond the former seems pointless (and the latter has greater load implications).

Either way I agree, the cost:benefit is debatable - given that I hope that in 1-2 months we should be using Treeherder as the primary tool.
Are we happy to mark this as resolved now that Treeherder is the primary CI interface? Is Treeherder reporting build results more quickly than tbpl was?
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(emorley)
Not really, comment 3 stands tbh.
Flags: needinfo?(emorley)
Indeed, builds still show up earlier in tbpl than they do in th.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #11)
> Indeed, builds still show up earlier in tbpl than they do in th.

I believe this is just "time for the push/first pending job to show in Treeherder", since completed jobs should be appearing at roughly the same time as in TBPL.

Either way, we still want to reduce the time for jobs to appear, but many of the pieces don't involve releng (other than switching to builds-2hr/pulse) and so are probably easier to coordinate via a Treeherder bug. I've filed bug 1096863 with a more up to date comment 0, and broken out various issues into new dep bugs.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Wonderful, thanks Ed!