Closed Bug 656757 Opened 14 years ago Closed 14 years ago

Cannot push to try

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: stechz, Assigned: nmeyerhans)

Details

I get a "waiting for lock on repository /repo/hg/mozilla/try/ held by 'dm-svn02.mozilla.org:7802'.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Severity: normal → blocker
Looks like there is just a whole lot of activity on try today. The locks are legit and go away when the push completes.
Nobody has pushed to try in over 4 hours, and I've been trying to push consistently for the past 3 hours. Several people on #developers are reporting the same problem.
There is an hg process owned by you currently holding a lock on the try repo. Did you abort your push?
Not that I'm aware of. I don't cancel, I just wait until it times out.
Not sure who's actually driving this, but I see discussion on IRC among people who look like they're attempting to solve it, and it's paging me. I'll make sure it gets taken care of at least.
Assignee: server-ops → justdave
Just wanted to give an update from Noah over IRC. This doesn't seem to be a lock issue, as CPU is getting pegged when someone tries to push.
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 3 changesets with 8 changes to 9 files (+1 heads)
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: error: pretxnchangegroup.z_loghistory hook raised an exception: column rev is not unique
remote: transaction abort!
remote: rollback completed
remote: ** unknown exception encountered, details follow
remote: ** report bug details to http://mercurial.selenic.com/bts/
remote: ** or mercurial@selenic.com
remote: ** Python 2.4.3 (#1, Jun 11 2009, 14:09:58) [GCC 4.1.2 20080704 (Red Hat 4.1.2-44)]
remote: ** Mercurial Distributed SCM (version 1.5.4)
remote: ** Extensions loaded: hgwebjson, pushlog-feed, buglink
remote: Traceback (most recent call last):
remote:   File "/usr/bin/hg", line 27, in ?
remote:     mercurial.dispatch.run()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 16, in run
remote:     sys.exit(dispatch(sys.argv[1:]))
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 30, in dispatch
remote:     return _runcatch(u, args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 50, in _runcatch
remote:     return _dispatch(ui, args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 471, in _dispatch
remote:     return runcommand(lui, repo, cmd, fullargs, ui, options, d)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 341, in runcommand
remote:     ret = _runcommand(ui, options, cmd, d)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 522, in _runcommand
remote:     return checkargs()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 476, in checkargs
remote:     return cmdfunc()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 470, in <lambda>
remote:     d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/util.py", line 401, in check
remote:     return func(*args, **kwargs)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/commands.py", line 2904, in serve
remote:     s.serve_forever()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 45, in serve_forever
remote:     while self.serve_one():
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 57, in serve_one
remote:     impl()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 208, in do_unbundle
remote:     r = self.repo.addchangegroup(fp, 'serve', self.client_url())
remote:   File "/usr/lib/python2.4/site-packages/mercurial/localrepo.py", line 2120, in addchangegroup
remote:     url=url, pending=p)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/localrepo.py", line 152, in hook
remote:     return hook.hook(self.ui, self, name, throw, **args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/hook.py", line 142, in hook
remote:     r = _pythonhook(ui, repo, name, hname, hookfn, args, throw) or r
remote:   File "/usr/lib/python2.4/site-packages/mercurial/hook.py", line 68, in _pythonhook
remote:     r = obj(ui=ui, repo=repo, hooktype=name, **args)
remote:   File "/usr/lib/python2.4/site-packages/mozhghooks/pushlog.py", line 79, in log
remote:     (pushid, ctx.rev(), hex(ctx.node())))
remote: pysqlite2.dbapi2.IntegrityError: column rev is not unique
abort: unexpected response: empty string
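[Editor's note: a minimal sketch of the failure mode shown in the traceback above. This is not the actual mozhghooks/pushlog.py code; the table layout and the UNIQUE constraint on rev are assumptions inferred from the error message and from the pipe-separated query output in comment 12.]

# Sketch only: illustrates why a retried push can raise
# "column rev is not unique" when an earlier attempt was already logged.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pushlog (id INTEGER PRIMARY KEY, user TEXT, date INTEGER);
    CREATE TABLE changesets (pushid INTEGER, rev INTEGER UNIQUE, node TEXT);
""")

def log_push(user, date, revs):
    # Record one push: a pushlog row plus one changesets row per revision.
    cur = conn.execute("INSERT INTO pushlog (user, date) VALUES (?, ?)", (user, date))
    pushid = cur.lastrowid
    for rev, node in revs:
        conn.execute("INSERT INTO changesets (pushid, rev, node) VALUES (?, ?, ?)",
                     (pushid, rev, node))
    conn.commit()

# First attempt: the pushlog rows land even if (as in this incident) the push
# itself later times out before the changesets make it into the repository.
log_push("someone@example.com", 1305222363, [(84832, "de10fad6cb7a")])

# Retrying the same revision trips the UNIQUE constraint on rev, which is the
# condition the pretxnchangegroup.z_loghistory hook reported above (newer
# sqlite versions word the error as "UNIQUE constraint failed").
try:
    log_push("someone@example.com", 1305222400, [(84832, "de10fad6cb7a")])
except sqlite3.IntegrityError as e:
    print("pushlog insert failed:", e)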
Assignee: justdave → nmeyerhans
After meeting with NoahM and lsblakk, we think the fastest way to get try working again is to move this repo aside and create a fresh try repo. We've marked the Try tree closed and have started this. Email has been sent to dev.planning and dev.tree-management, and developers have been notified.
That isn't a new error, and it's an easy fix.

Can somebody provide the output for the following commands (using `sqlite3 pushlog2.db`)?

select * from changesets order by pushid desc limit 5;
select * from pushlog order by id desc limit 5;
I just pushed twice to try, successfully.
(In reply to comment #9)
> That isn't a new error, and it's an easy fix.
>
> Can somebody provide the output for the following commands (using `sqlite3 pushlog2.db`)?
> select * from changesets order by pushid desc limit 5;
> select * from pushlog order by id desc limit 5;

Actually, this did appear to be new, and I did check pushlog:

sqlite> select * from pushlog order by id desc limit 6;
22042|respindola@mozilla.com|1305222363
22041|eakhgari@mozilla.com|1305222306
22040|romaxa@gmail.com|1305221823
22039|respindola@mozilla.com|1305221805
22038|dtownsend@mozilla.com|1305221206
22037|dtownsend@mozilla.com|1305219691

sqlite> select * from changesets order by pushid desc limit 6;
22042|84832|de10fad6cb7a4db141043688535eead7c0fe09df
22041|84831|d8078fc9279ef9ee0e34607dc405757ba86abfe8
22040|84830|3c46bc426fac6d6c12994e78268361f486b735f6
22040|84829|03c3ba8e36d66ef5fd98fe0b2dadfc7b1677f2d2
22040|84828|9a7b966ab0b60e52f868e5f54e1d231a08f6e7b8
22040|84827|b135939df49e12b7dd2df9d7c92c79de8e188781

And the last commit in the repo was:

changeset:   84832:de10fad6cb7a
tag:         tip
parent:      84811:ed867467d35b
user:        Rafael Ávila de Espíndola <respindola@mozilla.com>
date:        Thu May 12 13:43:57 2011 -0400
summary:     try: -b do -p macosx,macosx64 -u all -t all
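[Editor's note: a hedged sketch of the cross-check being done above, i.e. comparing the newest pushlog entry against the actual repository tip. The database path and the exact hg invocation are assumptions for illustration, not the commands actually run on the server.]

import sqlite3
import subprocess

REPO = "/repo/hg/mozilla/try"                      # path from the lock message above
db = sqlite3.connect(REPO + "/.hg/pushlog2.db")    # assumed location of pushlog2.db

# Newest row recorded by the pushlog hook.
pushid, rev, node = db.execute(
    "SELECT pushid, rev, node FROM changesets ORDER BY pushid DESC, rev DESC LIMIT 1"
).fetchone()

# Actual tip of the repository, for comparison.
tip = subprocess.check_output(
    ["hg", "-R", REPO, "log", "-r", "tip", "--template", "{rev}:{node}"]
).decode().strip()

print("newest pushlog entry: push %d, %d:%s" % (pushid, rev, node))
print("repository tip:       %s" % tip)
# If the newest pushlog entry names a revision that is not in the repo, a push
# was logged but never committed, and retrying it will hit the rev constraint.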
Appears to be working now, I just successfully pushed as well.
(In reply to comment #10)
> Note that this has happened many times before
> (https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20prod%3Amozilla.org%20%22column%20rev%20is%20not%20unique%22).

The bit about "column rev is not unique" was actually a secondary issue. The primary symptom was that push attempts would spin for a long time and eventually give up. However, before failing, an entry would successfully be written to pushlog, so a second attempt to push the same change would then fail with "column rev is not unique". My update in comment 12 shows the state of pushlog after I had cleaned out a push attempt that had failed to make it into the repo. Unfortunately, you'll have to take my word for it that pushes continued to fail after fixing pushlog. (I'd have happily stopped right there if they didn't!)
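[Editor's note: a hedged sketch of the kind of pushlog cleanup described above, removing rows recorded for a push whose changesets never landed in the repository so a retry won't hit the rev constraint. This is illustrative only, under the assumed schema from comment 12; it is not the exact procedure used during the incident.]

import sqlite3

db = sqlite3.connect("pushlog2.db")
last_rev_in_repo = 84832   # taken beforehand from something like `hg log -r tip`

with db:  # commits on success, rolls back on error
    # Drop changesets rows that claim revisions beyond the real tip...
    db.execute("DELETE FROM changesets WHERE rev > ?", (last_rev_in_repo,))
    # ...and any pushlog rows left with no changesets attached to them.
    db.execute("DELETE FROM pushlog WHERE id NOT IN (SELECT pushid FROM changesets)")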
(In reply to comment #13)
> Appears to be working now, I just successfully pushed as well.

(In reply to comment #11)
> I just pushed twice to try, successfully.

From these comments, and others on IRC, everything is working again, so the tree has been reopened. Leaving this bug open while we try to figure out what went wrong, and whether we have to worry about this happening to other repos.
Did this happen before we started any of the work in bug 633161? If so, I wonder if we just finally got to a state where the repo was too slow to work with, so pushes would time out before completing.
I wondered that too, but I don't think it's the case. According to one of the people attempting to push, performance didn't steadily degrade, but got suddenly worse. From the sound of things, push operations don't see the performance degradation from having lots of heads. Prior to yesterday's incident, pushes were still completing in 10-20 seconds.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard