Status

Developer Services
General
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: rbryce, Assigned: rbryce)

Tracking

Details

(Whiteboard: [reit-ops])

Description

5 years ago
Received an alert and confirmed that the /try repo was serving 500 Internal Server Errors.  There were no pushes in progress, and no defunct processes.  Ran the reset_try script to re-sync the repo.  /try is back online now.

Updated

5 years ago
Group: infra
This was completed.
Assignee: server-ops → rbryce
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Component: Server Operations → Server Operations: Developer Services
Resolution: --- → FIXED
Blocks: 853761

Comment 2

5 years ago
(In reply to Rick Bryce [:rbryce] from comment #0)
> Received an alert and confirmed that the /try repo was serving 500 Internal
> Server Errors.  There were no pushes in progress, and no defunct
> processes.  Ran the reset_try script to re-sync the repo.  /try is back
> online now.

Was the reset needed for perf reasons, or was the repo actually corrupt?

(Resetting Try loses all prior try results, so whilst that is expected when we have to reset for perf, it is something that would be good to avoid in the future if the repo wasn't actually corrupted :-))

Thank you for getting it back online anyway :-)

Comment 3

5 years ago
Try was reset for performance reasons. The head count at reset was 21811, dating back to January 2011. This was so large that it was causing requests for json-pushes to time out and create HTTP 500s, resulting in service unavailability for everybody.
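
For anyone wanting to keep an eye on this before it reaches the point of timeouts, here is a minimal sketch of a head-count check. It assumes Mercurial's `hg heads --template` is available on the host; the function names and the example threshold of 10000 are hypothetical and are not taken from the reset_try script or any existing monitoring.

```python
import subprocess


def count_heads(hg_heads_output: str) -> int:
    """Count heads from `hg heads` output that has one node hash per line."""
    return sum(1 for line in hg_heads_output.splitlines() if line.strip())


def heads_exceed_threshold(head_count: int, threshold: int) -> bool:
    """Return True when the head count is above the (hypothetical) threshold."""
    return head_count > threshold


def repo_head_count(repo_path: str) -> int:
    """Ask Mercurial for the heads of the repo at repo_path and count them."""
    result = subprocess.run(
        ["hg", "-R", repo_path, "heads", "--template", "{node}\n"],
        capture_output=True,
        text=True,
        check=True,
    )
    return count_heads(result.stdout)


if __name__ == "__main__":
    # 21811 is the head count reported in this bug; 10000 is an assumed limit.
    if heads_exceed_threshold(21811, 10000):
        print("head count is high; discuss a reset before json-pushes times out")
```

A check like this would only flag the condition; whether and when to reset would still be a decision to discuss with the people who depend on Try results.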

Comment 4

5 years ago
Ah didn't realise it was back up that high again - thanks :-)
The head dates don't match with try resets. (I believe that could happen if someone pushed a very old commit from their local repo to try.)

I believe our prior try reset was July 2012 per bug 778062

Comment 6

5 years ago
(In reply to Ben Kero [:bkero] from comment #3)
> Try was reset for performance reasons. The head count at reset was 21811,
> dating back to January 2011. This was so large that it was causing requests
> for json-pushes to time out and create HTTP 500s, resulting in service
> unavailability for everybody.

That and the docs we had weren't clear on a course of action that would have led to discussing things with people first _before_ the reset was carried out. 

I'll work on fixing that for the future and making sure we have better clarity in our documentation.

Sorry for all the trouble this has caused.

Comment 7

5 years ago
(In reply to Shyam Mani [:fox2mike] from comment #6)
> Sorry for all the trouble this has caused.

No problem - keep up the awesome work :-)
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services