778062 - Try server appears to have been reset entirely

Reporter

Description

•

12 years ago

https://hg.mozilla.org/try/ is showing no changesets at all, and hg outgoing (run locally) shows the entire m-c history. Try probably needs to be recloned from m-c.

Nick Thomas [:nthomas] (UTC+12)

Comment 1

•

12 years ago

cshields was doing some more experimenting a few hours ago, probably using a single node with an upgraded hg and a tweaked setup again. Last time that was done, there were some rules on the Zeus load balancer to make sure all try requests went to that node. Perhaps not all of that was undone when he finished? Bug 777521 was to reset try at the weekend.

Ashish Vijayaram [:ashish]

Updated

•

12 years ago

Assignee: server-ops-infra → server-ops-devservices

Component: Server Operations: Infrastructure → Server Operations: Developer Services

QA Contact: jdow → shyam

Ed Morley [:emorley]

Comment 2

•

12 years ago

Try is unusable due to this (and has been since at least the time of comment 0); bumping severity.

Severity: normal → blocker

OS: Mac OS X → All

Hardware: x86 → All

Ashish Vijayaram [:ashish]

Updated

•

12 years ago

Assignee: server-ops-devservices → ashish

Ashish Vijayaram [:ashish]

Updated

•

12 years ago

Assignee: ashish → cshields

Ashish Vijayaram [:ashish]

Comment 3

•

12 years ago

Pushlog on /try is empty. Corey is resetting try to bring it back to a consistent state.

Nick Thomas [:nthomas] (UTC+12)

Comment 4

•

12 years ago

I've already restarted the buildbot scheduler; it should pick up new changesets as soon as Corey is finished and someone does a push. If not, ping catlee/bhearsum/rail.

Severity: blocker → normal

Component: Server Operations: Developer Services → Server Operations: Infrastructure

OS: All → Mac OS X

Hardware: All → x86

Ed Morley [:emorley]

Comment 5

•

12 years ago

(In reply to Nick Thomas [:nthomas] from comment #4)
> I've already restarted the buildbot scheduler, it should pick up new
> changesets as soon as Corey is finished and someone does a push. If not ping
> catlee/bhearsum/rail.

This unfortunately meant that builds were scheduled on 20 or so pushes cloned from the tip of m-c. I manually cancelled them using buildapi, but is there a way we can avoid this?

Corey Shields [:cshields]

Assignee

Comment 6

•

12 years ago

Try is reset and back to a usable state (we still have the issue of the growing heads, which will cause problems shortly down the road).

What happened: while prepping for 777521 last night, it appears some of the steps were accidentally triggered by an admin but were thought to have been cancelled in time. This was one of the admins who has been working around the clock on the try issues, so I attribute this mistake to admin fatigue. These scripts are only used for /try (not a threat to any other repo), and we will add a confirmation stop to prevent this in the future.

While this bug came in about 4 hours ago, the repo has been broken for more like 6 hours. I apologize for the inconvenience. Try should be fixed for now, but we still have bigger architectural issues to address in fixing 770811.

For the admins: for some reason the pushlog that was copied over did not match the source (even though nothing changed mid-flight) and had to be copied by hand. While this is no problem, I forgot to fix the g+w perm on the new pushlog, and with the new ordering of the hooks, the first commit was able to go through without inserting into the pushlog and screwed things up. I reset the repo again after this.

mattwoodrow confirmed it working on IRC.
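The two safeguards mentioned in this comment (a confirmation stop before the reset runs, and checking the g+w bit on the new pushlog) could be sketched roughly as follows. This is only an illustration and not the actual try reset scripts: the function names `confirm_reset` and `has_group_write` and the prompt wording are hypothetical.

```python
import os
import stat


def has_group_write(path):
    """Return True if the file at `path` has the group-write (g+w) bit set."""
    return bool(os.stat(path).st_mode & stat.S_IWGRP)


def confirm_reset(repo_name, read_input=input):
    """Confirmation stop: the operator must retype the repo name to proceed.

    `read_input` is injectable so the prompt can be exercised without a
    terminal; by default it is the built-in input().
    """
    prompt = "About to reset '%s'. Type the repo name to continue: " % repo_name
    return read_input(prompt).strip() == repo_name
```

A reset script could call `confirm_reset("try")` before its first destructive step and refuse to reopen the repo for pushes until `has_group_write` returns True for the new pushlog file.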

Corey Shields [:cshields]

Assignee

Updated

•

12 years ago

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

bhearsum@mozilla.com (:bhearsum)

Comment 7

•

12 years ago

(In reply to Ed Morley [:edmorley] from comment #5)
> (In reply to Nick Thomas [:nthomas] from comment #4)
> > I've already restarted the buildbot scheduler, it should pick up new
> > changesets as soon as Corey is finished and someone does a push. If not ping
> > catlee/bhearsum/rail.
>
> This unfortunately meant that build were scheduled on 20 or so pushes cloned
> from the tip of m-c. Manually cancelled using buildapi; but is there a way
> we can avoid this?

I'm surprised about this; it would indicate the pushlog was empty/smaller prior to Nick's scheduler restart, and then had entries added to it later. Maybe Corey did a pull or something that caused the pushlog hook to fire? Hard to be sure.
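The reasoning in this comment can be illustrated with a toy model. It is not buildbot's actual poller: the `id` field and both function names are hypothetical. It only shows why a scheduler that takes its baseline from an empty pushlog would treat every entry that appears later as new.

```python
def latest_id(pushlog):
    """Highest push ID in the pushlog, or 0 if the pushlog is empty."""
    return max((push["id"] for push in pushlog), default=0)


def new_pushes(last_seen_id, pushlog):
    """Pushes the scheduler has not seen yet, judged by push ID alone."""
    return [push for push in pushlog if push["id"] > last_seen_id]


# Hypothetical pushlog contents after 20 entries are added.
later_pushlog = [{"id": n} for n in range(1, 21)]

# Case 1: the pushlog was empty when the scheduler restarted,
# so its baseline is 0 and all 20 later entries look new.
baseline_empty = latest_id([])
scheduled_from_empty = new_pushes(baseline_empty, later_pushlog)

# Case 2: the pushlog was already populated at restart,
# so the baseline covers those entries and nothing is scheduled.
baseline_full = latest_id(later_pushlog)
scheduled_from_full = new_pushes(baseline_full, later_pushlog)
```

Under these assumptions the first case yields 20 pushes to build and the second yields none, which matches the idea that the pushlog was empty or smaller at the time of the restart.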

Corey Shields [:cshields]

Assignee

Comment 8

•

12 years ago

(In reply to Ben Hearsum [:bhearsum] from comment #7)
> I'm surprised about this, it would indicate the pushlog was empty/smaller
> prior to Nick's scheduler restart, and then had entries added to it later.
> Maybe Corey did a pull or something that caused the pushlog hook to fire?
> Hard to be sure.

An empty pushlog is what started this problem. In addition, for some reason our try reset scripts resulted in a corrupt pushlog (which would have shown up empty after the repo itself looked "good") until it was fixed by hand.

Nobody; OK to take it and work on it

Updated

•

11 years ago

Component: Server Operations: Infrastructure → Infrastructure: Other

Product: mozilla.org → Infrastructure & Operations

Categories

(Infrastructure & Operations :: Infrastructure: Other, task)

Tracking

(Not tracked)

People

(Reporter: mattwoodrow, Assigned: cshields)

Security

(public)
