Closed Bug 778062 Opened 12 years ago Closed 12 years ago

Try server appears to have been reset entirely

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mattwoodrow, Assigned: cshields)

Details

https://hg.mozilla.org/try/ is showing no changesets at all, and hg outgoing (locally) shows the entire m-c history. This probably needs to be recloned from m-c.
cshields was doing some more experimenting a few hours ago, probably using a single node with upgraded hg and a tweaked setup again. Last time that was done, there were some rules on the Zeus load balancer to make sure all try requests went to that node. Perhaps not all of that was undone when he finished? Bug 777521 was to reset try at the weekend.
Assignee: server-ops-infra → server-ops-devservices
Component: Server Operations: Infrastructure → Server Operations: Developer Services
QA Contact: jdow → shyam
Try is unusable due to this (and has been since at least the time of comment 0); bumping severity.
Severity: normal → blocker
OS: Mac OS X → All
Hardware: x86 → All
Assignee: server-ops-devservices → ashish
Assignee: ashish → cshields
Pushlog on /try is empty. Corey is resetting try to bring it back to a consistent state.
I've already restarted the buildbot scheduler, it should pick up new changesets as soon as Corey is finished and someone does a push. If not ping catlee/bhearsum/rail.
Severity: blocker → normal
Component: Server Operations: Developer Services → Server Operations: Infrastructure
OS: All → Mac OS X
Hardware: All → x86
(In reply to Nick Thomas [:nthomas] from comment #4)
> I've already restarted the buildbot scheduler, it should pick up new
> changesets as soon as Corey is finished and someone does a push. If not ping
> catlee/bhearsum/rail.
This unfortunately meant that builds were scheduled on 20 or so pushes cloned from the tip of m-c. I manually cancelled them using buildapi, but is there a way we can avoid this?
Try is reset and back to a usable state (we still have the issue of the growing heads, which will cause problems shortly down the road).
What happened: while prepping for bug 777521 last night, it appears some of the steps were accidentally triggered by an admin but were thought to have been cancelled in time. This was one of the admins who has been working around the clock on the try issues, so I attribute this mistake to admin fatigue. These scripts are only used for /try (not a threat to any other repo), and we will add a confirmation stop to prevent this in the future.
While this bug came in about 4 hours ago, the repo has been broken for more like 6 hours. I apologize for the inconvenience. Try should be fixed for now, but we still have bigger architectural issues to address in fixing bug 770811.
For the admins: for some reason the pushlog that was copied over did not match the source (even though nothing changed mid-flight) and had to be copied by hand. While this is no problem in itself, I forgot to fix the g+w perm on the new pushlog, and with the new ordering of the hooks the first commit was able to go through without inserting into the pushlog, which screwed things up. I reset the repo again after this. mattwoodrow confirmed it working on IRC.
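As an illustration of the two safeguards mentioned in the comment above (a confirmation stop before the reset steps run, and a check that the new pushlog is group-writable), here is a minimal Python sketch. It is not the actual reset tooling: the function names and the idea of typing the repo name back are assumptions made for this example, and the real scripts and pushlog location are not shown in this bug.

```python
import os
import stat


def confirm_reset(repo_name, ask=input):
    """Confirmation stop: proceed only if the operator types the repo name back.

    `ask` defaults to the built-in input(); it is a parameter so the prompt
    can be replaced in tests. (Hypothetical helper, not from the real scripts.)
    """
    answer = ask(f"Type '{repo_name}' to confirm resetting it: ")
    return answer.strip() == repo_name


def is_group_writable(path):
    """Return True if the file at `path` has the group-write (g+w) bit set."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


def ensure_group_writable(path):
    """Add g+w to `path` while leaving its other permission bits unchanged."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWGRP)
```

A reset script could call `confirm_reset("try")` before any destructive step and abort unless it returns True, then call `ensure_group_writable()` on the new pushlog file before the repo is reopened for pushes.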
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Ed Morley [:edmorley] from comment #5)
> (In reply to Nick Thomas [:nthomas] from comment #4)
> > I've already restarted the buildbot scheduler, it should pick up new
> > changesets as soon as Corey is finished and someone does a push. If not ping
> > catlee/bhearsum/rail.
>
> This unfortunately meant that build were scheduled on 20 or so pushes cloned
> from the tip of m-c. Manually cancelled using buildapi; but is there a way
> we can avoid this?
I'm surprised about this, it would indicate the pushlog was empty/smaller prior to Nick's scheduler restart, and then had entries added to it later. Maybe Corey did a pull or something that caused the pushlog hook to fire? Hard to be sure.
(In reply to Ben Hearsum [:bhearsum] from comment #7)
> I'm surprised about this, it would indicate the pushlog was empty/smaller
> prior to Nick's scheduler restart, and then had entries added to it later.
> Maybe Corey did a pull or something that caused the pushlog hook to fire?
> Hard to be sure.
An empty pushlog is what started this problem. In addition, for some reason our try reset scripts resulted in a corrupt pushlog (which would have shown up empty after the repo itself looked "good") until it was fixed by hand.
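One way to catch a pushlog copy that does not match its source before the repo is reopened is to compare checksums of the two files. The sketch below is a generic illustration in Python, assuming the pushlog is a single file on disk; the helper names are hypothetical and this is not how the try reset scripts actually verify the copy.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches_source(source_path, copy_path):
    """Return True if both files have identical contents (by SHA-256)."""
    return file_sha256(source_path) == file_sha256(copy_path)
```

A matching checksum only shows that the copy is byte-identical to the source; it would not detect a source pushlog that was already empty or corrupt.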
Component: Server Operations: Infrastructure → Infrastructure: Other
Product: mozilla.org → Infrastructure & Operations