Closed Bug 1053678 Opened 10 years ago Closed 10 years ago

New Try pushes are not appearing on Treeherder since the try repo reset

Categories

(Tree Management :: Treeherder, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mdoglio)

Bug 1053558 reset the try repo.

New pushes since the reset are no longer appearing - ie compare the latest push on:
https://treeherder.mozilla.org/ui/#/jobs?repo=try

vs:
https://tbpl.mozilla.org/?tree=Try

This is presumably because Treeherder imports the pushlog and combines it with the results internally. I imagine the ingestion process is stuck: the pushlog DB has to be reset when the repo itself is reset, so the push IDs have been reset too.

Resetting try is something that happens periodically (as we discussed in the work week previously), so we need to handle this gracefully.

In addition, TBPL currently allows accessing old results data when using the direct URL (&rev=SHA) - it would be good if we could preserve that with treeherder and not just reset the data for try entirely.
Treeherder caches the last push ID fetched from the json-pushes service for each repo. When that value is available in the cache, subsequent requests are incremental.
When a repo is reset, the cached push ID is ahead of the latest push ID in the repo, so json-pushes returns empty results.
I reset the cache key to solve this issue.
One way to mitigate this in the long term would be to set a low cache expiration time on those keys (15 minutes or so), but that still seems suboptimal to me. I would much prefer to base the pushlog ingestion on pulse notifications.
:camd :jeads do you have an opinion on that?
Flags: needinfo?(jeads)
Flags: needinfo?(cdawson)
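To illustrate the failure mode described in the comment above, here is a minimal sketch. It is not Treeherder's actual code: the function names, the cache key format, and the in-memory stand-ins for the cache and the json-pushes service are all assumptions made for the example.

```python
from typing import Callable, Dict, Optional

Pushes = Dict[int, dict]
Fetch = Callable[[Optional[int]], Pushes]


def make_fake_json_pushes(repo_pushes: Pushes) -> Fetch:
    """Hypothetical stand-in for a json-pushes request.

    Returns the pushes whose ID is greater than start_id,
    or every push when start_id is None.
    """
    def fetch(start_id: Optional[int]) -> Pushes:
        return {
            push_id: data
            for push_id, data in repo_pushes.items()
            if start_id is None or push_id > start_id
        }
    return fetch


def ingest_pushes(repo: str, cache: Dict[str, int], fetch: Fetch) -> Pushes:
    """Fetch pushes incrementally, remembering the last push ID seen."""
    key = f"{repo}:last_push_id"  # assumed key format
    new_pushes = fetch(cache.get(key))
    if new_pushes:
        cache[key] = max(new_pushes)
    return new_pushes


cache: Dict[str, int] = {}

# Before the reset: three pushes are ingested and push ID 3 is cached.
before_reset = make_fake_json_pushes({1: {}, 2: {}, 3: {}})
first_run = ingest_pushes("try", cache, before_reset)

# After the reset: push IDs restart at 1, but the cache still says 3,
# so the incremental request comes back empty.
after_reset = make_fake_json_pushes({1: {"user": "someone"}})
stuck_run = ingest_pushes("try", cache, after_reset)

# Clearing the cache key lets ingestion start from scratch again.
del cache["try:last_push_id"]
fixed_run = ingest_pushes("try", cache, after_reset)
```

In this sketch `stuck_run` is empty even though the repo has a new push, and `fixed_run` picks it up once the stale key is gone, which mirrors the manual cache reset described above.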
mdoglio: yeah, that sounds nice. As discussed in the meeting, we would want an incremental "double-check" against the json-pushes endpoint to verify we didn't miss anything from pulse. But this would be great!
Flags: needinfo?(cdawson)
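The "double-check" idea above could look roughly like the following sketch. It assumes a set of push IDs already ingested from pulse notifications and a periodic json-pushes poll result. Both helper names are hypothetical, and the pulse and json-pushes plumbing is deliberately left out.

```python
from typing import Dict, Set


def record_notification(seen_ids: Set[int], push_id: int) -> bool:
    """Record a push ID received via a notification.

    Returns True if the push was new, False if it was a duplicate.
    """
    if push_id in seen_ids:
        return False
    seen_ids.add(push_id)
    return True


def find_missed_pushes(seen_ids: Set[int], polled: Dict[int, dict]) -> Dict[int, dict]:
    """Return pushes present in the poll result but never seen via notifications."""
    return {
        push_id: polled[push_id]
        for push_id in sorted(polled)
        if push_id not in seen_ids
    }


seen: Set[int] = set()
record_notification(seen, 1)
record_notification(seen, 3)  # the notification for push 2 was lost

# Periodic poll of the pushlog (hypothetical data).
polled_pushes = {1: {}, 2: {"user": "someone"}, 3: {}}
missed = find_missed_pushes(seen, polled_pushes)

# Ingest whatever the notifications missed.
for push_id in missed:
    record_notification(seen, push_id)
```

Here the poll acts only as a safety net: notifications drive normal ingestion, and the poll fills in anything that was dropped.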
mdoglio: That sounds like a good fail-safe. I think optimally we would be able to synchronize the cache key reset with try getting reset. Not sure how to do this. From edmorley's comment:

"Resetting try is something that happens periodically (as we discussed in the work week previously), so we need to handle this gracefully."

What is the procedure carried out to reset try? Is there some way that could be broadcast as an event on pulse? or tied into a treeherder webhook? or if it's manual, we could add a utility to the Sheriff tab in treeherder to trigger the reset.

If so we could trigger the cache key reset dynamically at the same time removing the possibility of getting out of sync with try altogether.
Flags: needinfo?(jeads) → needinfo?(emorley)
Manually performing the cache reset via the admin panel sounds reasonable - this could either be done by the sheriffs when they here of a try reset having taken place - or better, instructions could be added to the document that IT follows to reset the try repo (last I heard it was at https://mana.mozilla.org/wiki/display/SYSADMIN/Mercurial , but non IT don't have permission to access pages under SYSADMIN helpfully). A message could also be added to the reset_try.sh script that the instructions get them to use, to remind them to message someone on IRC (or if we gave a few people from IT permissions to the cache reset, they could do it themselves).
Flags: needinfo?(emorley)
s/when they here/when they hear/
(Too early on a Saturday morning without caffeine lol)
Gum will soon need a cache reset too:
Bug 1055756
Indeed, gum has been reset and needs its cache reset.

Try doesn't get reset often (six months since the previous reset), but projects are reset more frequently. If it's easy to reset the cache via the admin panel, I'm happy to add it to our docs if you're willing to give us access. And by "us" I mean Dev Services (no longer a subsidiary of IT): bkero, myself, and probably hwine. At this point, no one else should really be doing resets of anything except under extreme duress.
(In reply to Kendall Libby [:fubar] from comment #7)
> Dev Services (no longer a subsidiary of IT)

Ah sorry I'd misremembered :-)
Depends on: 1058808
(In reply to Ed Morley [:edmorley] from comment #6)
> Gum will soon need a cache reset too:
> Bug 1055756

Broken this out to bug 1058802.

Have also filed bug 1058808 for the long term solution.

This bug as filed is fixed.
Assignee: nobody → mdoglio
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED