Closed Bug 994028 Opened 10 years ago Closed 9 years ago

Timeouts on try while searching for changes ("remote: abort: repository /repo/hg/mozilla/try: timed out waiting for lock held by hgssh1.dmz.scl3.mozilla.com:NNNNN")

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: cbook, Assigned: fubar)

References

(Blocks 1 open bug)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1024] see comment #11 if you run into this problem)

Attachments

(1 file)

A couple of devs complained that pushing to try results in "searching for changes" for more than 5 minutes and then times out.

Not sure what the issue is, but this seems to affect several people in different locations.
> $ try
> pushing to ssh://hg.mozilla.org/try
> searching for changes
> ^Cinterrupted!
> remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:1850'
$ hg -v push -f ssh://jseward@mozilla.com@hg.mozilla.org/try/
pushing to ssh://jseward%40mozilla.com@hg.mozilla.org/try/
running ssh jseward@mozilla.com@hg.mozilla.org 'hg -R try/ serve --stdio'
searching for changes
1 changesets found
^Cinterrupted!
remote: waiting for lock on repository /repo/hg/mozilla/try/ held by 'hgssh1.dmz.scl3.mozilla.com:25220'
remote: Killed by signal 2.
Duplicate of: 994647
Seems to be working again. At least for me.
It has been working on and off for a couple of days now, but it's down for a good couple of hours when it fails. Occasionally a push does go through, but it is really slow (10-15 minutes).

I keep getting:
remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:25220'

So the same server for everyone in this bug.
No longer blocks: 821809
Hardware: x86 → All
Summary: timeouts on try while searching for changes → Timeouts on try while searching for changes ("remote: abort: repository /repo/hg/mozilla/try: timed out waiting for lock held by hgssh1.dmz.scl3.mozilla.com:NNNNN")
This is happening consistently for me. Is there a workaround we can do locally?
Raising severity since devs are again complaining about this issue and it is blocking them from using try.
Severity: normal → major
Taking until I can find someone from webops to work on this.
Assignee: server-ops-webops → rwatson
Assignee: rwatson → klibby
Try is currently up to 7700 heads, which is around when things start to get ugly. OTOH, we've managed to get up to over 21,000...
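
For reference, counting heads on a local clone of try is a one-liner; this is only a sketch for anyone curious, not part of any official process, and it assumes you have an up-to-date clone:

$ hg heads --template '{node|short}\n' | wc -l   # prints one short hash per head, then counts them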

It would be helpful to know what the parent revs are to changesets that are failing. The older they are, the longer it takes for hg to process, which can cause things to timeout and fall over.
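
One way to gather that locally, sketched here and assuming the standard try push URL (note that the outgoing() revset does its own discovery against the remote, so it can stall for the same reason the push does):

$ hg log -r 'parents(outgoing("ssh://hg.mozilla.org/try"))' --template '{node|short} {date|shortdate}\n'   # parents of the csets you would push, with their dates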

I'd like to also note that we had planned to do regular resets of try during the 6-week tree closing windows, but were told by "devs" that doing so was painful and to not reset try. And yet, here we are, all unhappy. :-(
Note: 5 minutes (comment 1) is not considered excessive. The guidelines at https://wiki.mozilla.org/ReleaseEngineering/TryServer#Pushes_to_try_take_a_very_long_time suggest that 15 minutes is about when things are "bad".

Please confirm you are requesting a reset of try -- that process is disruptive to other devs.
Whiteboard: see comment #11 if you run into this problem
In comment 1 and 2, it seems like people are being blocked by a lock which was not properly released.  That is a different issue than the traditional "push to try takes long because too many heads" (the distinction being the push taking long versus timing out).  Which problem do you see, Joel?

(FWIW I have not noticed pushing to try to take an unusually long amount of time as of Sunday.)
Flags: needinfo?(jmaher)
When I was trying to push last week, I was unable to push to try at all for about 4 hours; then it started working.  While I couldn't push, I did see new pushes to try.  My case was a timeout on April 16th.
Flags: needinfo?(jmaher)
(In reply to comment #14)
> When I was trying to push last week, I was unable to push to try at all for
> about 4 hours; then it started working.  While I couldn't push, I did see new
> pushes to try.  My case was a timeout on April 16th.

Yeah, seems like this was not the usual problem of pushes to try being slow because of the number of heads then.  Thanks!
This has struck again; today I got the following twice:

pushing to ssh://hg.mozilla.org/try
searching for changes
remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:15332'
remote: abort: repository /repo/hg/mozilla/try: timed out waiting for lock held by hgssh1.dmz.scl3.mozilla.com:15332
I'm unable to push, no matter what I do. It gets stuck on "searching for changes", and after ~10 minutes it times out.
Same logs for the past 2 hours (I tried to push many times):

pushing to ssh://hg.mozilla.org/try
searching for changes
remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:28056'
remote: abort: repository /repo/hg/mozilla/try: timed out waiting for lock held by hgssh1.dmz.scl3.mozilla.com:28056
abort: unexpected response: empty string
I have just been hit by this:

(marionette)☁  marionette  hg try
pushing to ssh://hg.mozilla.org/try
searching for changes
remote: waiting for lock on repository /repo/hg/mozilla/try held by 'hgssh1.dmz.scl3.mozilla.com:18064'
remote: abort: repository /repo/hg/mozilla/try: timed out waiting for lock held by hgssh1.dmz.scl3.mozilla.com:28994
abort: unexpected response: empty string
:hwine asked me on IRC to post this. It is a try push with the --debug option from hg. I actually only have 1 changeset, so I'm not sure where the other 38 came from.
(The other 38 are just csets that you've pulled from upstream [m-c or m-i or wherever] & you're the first person to push those csets to try. That's normal.)
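
If you want to see which changesets a push would send before actually pushing, something like the following works; it is only a sketch, and its discovery step can be slow while the repo is locked:

$ hg outgoing -r . ssh://hg.mozilla.org/try --template '{node|short} {desc|firstline}\n'   # lists every cset that would go along with your current head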

There have been no successful pushes to Try by anyone since 11:55 PDT today (so, approaching 4 hours). hwine, not sure if you're actively looking into this -- if you are, awesome. If not, do you know who should look at it & how we can escalate it?
Flags: needinfo?(hwine)
It unhorked itself, after four hours and ten minutes.

If you don't want to do this again, there are two things to do:

Right now, reply to the thread in https://groups.google.com/d/msg/mozilla.dev.platform/Hb2EKXZmY70/Ijzo3Jo2WxcJ saying that you disagree that it is better to let try get to the point of completely sucking and then do an emergency reset (and that both of those things are better than just sitting around for four hours and ten minutes).

Next time it happens: https://wiki.mozilla.org/ReleaseEngineering/TryServer#Pushes_to_try_take_a_very_long_time says to file a bug if you are unable to push for more than 15 minutes (and implies that your timeouts should keep coming up with the same PID). Judging by the outcome of bug 1001735, where the eventual response, five days later and at a time when it was known by all that nobody had been able to push for three hours, was to ask the reporter whether or not he personally was still unable to push, that means *everyone* should file *their own* bug asking for a reset.
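
A trivial way to keep track of that PID across attempts (a sketch; the log file name is made up for illustration) is to tee your push output to a file and grep it afterwards:

$ hg push ssh://hg.mozilla.org/try 2>&1 | tee -a try-push.log
$ grep 'held by' try-push.log   # if the same hgssh1 PID shows up on every attempt, the lock is stuck
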
(In reply to Phil Ringnalda (:philor) from comment #22)
> It unhorked itself, after four hours and ten minutes.
> 
> If you don't want to do this again, there are two things to do:
> 
> Right now, reply to the thread in
> https://groups.google.com/d/msg/mozilla.dev.platform/Hb2EKXZmY70/
> Ijzo3Jo2WxcJ 

Excellent suggestion. The major impact of resets is on developers. We need their help in coming up with criteria to make the call of "enough pain - reset!".


> Next time it happens, Judging by the outcome of bug 1001735,
> *everyone* file *their own* bug asking for a reset.

Please don't -- what we need is a process that works for all devs. The dev.platform thread is the right place to do this.
Flags: needinfo?(hwine)
Blocks: 770811
Blocks: try-tracker
No longer blocks: 770811
Component: WebOps: Source Control → Mercurial: hg.mozilla.org
Product: Infrastructure & Operations → Developer Services
Whiteboard: see comment #11 if you run into this problem → [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1014] [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1014] [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1023] [kanban:engops:https://kanb… → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1023] [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1024] [kanban:engops:https://kanb…
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1024] [kanban:engops:https://kanbanize.com/ctrl_board/6/143] see comment #11 if you run into this problem → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1024] see comment #11 if you run into this problem
Don't see much value in keeping this bug open.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME