Closed Bug 853697 Opened 11 years ago Closed 11 years ago

Reset /try

Categories

(Developer Services :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rbryce, Assigned: rbryce)

References

Details

(Whiteboard: [reit-ops])

Received an alert and confirmed that the /try repo was serving 500 Internal Server Errors.  There were no pushes in progress and no defunct processes.  Ran the reset_try script to re-sync the repo.  /try is back online now.
Whiteboard: [reit-ops]
Group: infra
This was completed.
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Closed: 11 years ago
Component: Server Operations → Server Operations: Developer Services
Resolution: --- → FIXED
Blocks: 853761
(In reply to Rick Bryce [:rbryce] from comment #0)
> Received an alert and confirmed that the /try repo was serving 500 Internal
> Server Errors.  There were no pushes in progress and no defunct
> processes.  Ran the reset_try script to re-sync the repo.  /try is back
> online now.

Was the reset needed for perf reasons, or was the repo actually corrupt?

(Resetting Try loses all prior try results, so whilst expected when we have to reset for perf, it is something that would be good to avoid in the future if the repo wasn't actually corrupted :-))

Thank you for getting it back online anyway :-)
Try was reset for performance reasons. The head count at reset was 21811 dating back to January 2011. This was so large that it was causing requests for json-pushes to time out and create HTTP 500s, making the service unavailable for everybody.
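For anyone wanting to check the head count before it gets this bad, here is a minimal sketch, assuming `hg` is on the PATH and the repository is readable locally. The function names and the `repo_path` argument are hypothetical, and this is not the reset_try script or any existing monitoring check; it only counts heads and reports the oldest head date.

```python
import subprocess
from datetime import date


def summarize_heads(lines):
    """Return (head_count, oldest_head_date) from 'node YYYY-MM-DD' lines."""
    dates = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the command output
        _node, day = line.split()
        dates.append(date.fromisoformat(day))
    return len(dates), (min(dates) if dates else None)


def repo_head_summary(repo_path):
    """Run `hg heads` on repo_path (hypothetical helper) and summarize the result."""
    result = subprocess.run(
        ["hg", "-R", repo_path, "heads",
         "--template", "{node|short} {date|shortdate}\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return summarize_heads(result.stdout.splitlines())
```

What count should trigger a reset discussion is not stated in this bug, so the sketch deliberately leaves out any threshold.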
Ah, I didn't realise it was back up that high again - thanks :-)
The head dates don't match with try resets. (I believe that could happen if someone pushed a very old commit from their local repo to try.)

I believe our prior try reset was July 2012, per bug 778062.
(In reply to Ben Kero [:bkero] from comment #3)
> Try was reset for performance reasons. The head count at reset was 21811
> dating back to January 2011. This was so large that it was causing requests
> for json-pushes to time out and create HTTP 500s, making the service
> unavailable for everybody.

That, and the docs we had weren't clear on a course of action that would have led to discussing things with people _before_ the reset was carried out.

I'll work on fixing that for the future and making sure we have better clarity in our documentation.

Sorry for all the trouble this has caused.
(In reply to Shyam Mani [:fox2mike] from comment #6)
> Sorry for all the trouble this has caused.

No problem - keep up the awesome work :-)
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services