Closed Bug 656757 Opened 13 years ago Closed 13 years ago

Cannot push to try

Categories: mozilla.org Graveyard :: Server Operations
Type: task
Priority: Not set
Severity: blocker
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: stechz; Assigned: nmeyerhans

Details

I get a "waiting for lock on repository /repo/hg/mozilla/try/ held by 'dm-svn02.mozilla.org:7802'" message when I try to push.
Assignee: nobody → server-ops
Component: Release Engineering → Server Operations
QA Contact: release → mrz
Severity: normal → blocker
Looks like there is just a whole lot of activity on try today.  The locks are legit and go away when the push completes.
Nobody has pushed to try in over 4 hours, and I've been trying to push consistently for the past 3 hours. Several people are reporting the same problem on #developers.
There is an hg process owned by you currently holding a lock on the try repo. Did you abort your push?
Not that I'm aware of. I don't cancel, I just wait until it times out.
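For context on the lock message in comment 0: Mercurial records the lock holder as host:pid (here dm-svn02.mozilla.org, process 7802) in the repository's store lock for the duration of a transaction. Below is a minimal sketch of how one might tell a stale lock from a live one; it assumes the usual .hg/store/lock location and only gives a meaningful answer when run on the host named in the lock.

import errno
import os
import socket

def lock_holder(repo_path):
    """Return (host, pid) from the store lock, or None if no lock is held."""
    lock_path = os.path.join(repo_path, ".hg", "store", "lock")
    try:
        # Mercurial usually creates the lock as a symlink whose target
        # is "host:pid"; fall back to reading it as a plain file.
        data = os.readlink(lock_path)
    except OSError:
        try:
            data = open(lock_path).read().strip()
        except (IOError, OSError):
            return None
    host, pid = data.rsplit(":", 1)
    return host, int(pid)

def lock_is_stale(repo_path):
    """True if the lock names a process on this host that no longer exists."""
    holder = lock_holder(repo_path)
    if holder is None:
        return False                     # nothing is locked
    host, pid = holder
    if host != socket.gethostname():
        return False                     # can't tell from this machine
    try:
        os.kill(pid, 0)                  # signal 0 only probes for existence
    except OSError as err:
        return err.errno == errno.ESRCH  # no such process: the lock is stale
    return False                         # holder still running: lock is legit

# e.g. lock_is_stale("/repo/hg/mozilla/try")

In this case the locks were legit (comment 1): the holding processes were still alive, and the locks went away once each push finished or timed out.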
Not sure who's actually driving this, but I see discussion on IRC among people who look like they're attempting to solve it, and it's paging me.  I'll make sure it gets taken care of at least.
Assignee: server-ops → justdave
Just wanted to relay an update from Noah over IRC: this doesn't seem to be a lock issue, as the CPU gets pegged whenever someone tries to push.
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 3 changesets with 8 changes to 9 files (+1 heads)
remote: Trying to insert into pushlog.
remote: Please do not interrupt...
remote: error: pretxnchangegroup.z_loghistory hook raised an exception: column rev is not unique
remote: transaction abort!
remote: rollback completed
remote: ** unknown exception encountered, details follow
remote: ** report bug details to http://mercurial.selenic.com/bts/
remote: ** or mercurial@selenic.com
remote: ** Python 2.4.3 (#1, Jun 11 2009, 14:09:58) [GCC 4.1.2 20080704 (Red Hat 4.1.2-44)]
remote: ** Mercurial Distributed SCM (version 1.5.4)
remote: ** Extensions loaded: hgwebjson, pushlog-feed, buglink
remote: Traceback (most recent call last):
remote:   File "/usr/bin/hg", line 27, in ?
remote:     mercurial.dispatch.run()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 16, in run
remote:     sys.exit(dispatch(sys.argv[1:]))
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 30, in dispatch
remote:     return _runcatch(u, args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 50, in _runcatch
remote:     return _dispatch(ui, args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 471, in _dispatch
remote:     return runcommand(lui, repo, cmd, fullargs, ui, options, d)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 341, in runcommand
remote:     ret = _runcommand(ui, options, cmd, d)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 522, in _runcommand
remote:     return checkargs()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 476, in checkargs
remote:     return cmdfunc()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/dispatch.py", line 470, in <lambda>
remote:     d = lambda: util.checksignature(func)(ui, *args, **cmdoptions)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/util.py", line 401, in check
remote:     return func(*args, **kwargs)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/commands.py", line 2904, in serve
remote:     s.serve_forever()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 45, in serve_forever
remote:     while self.serve_one():
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 57, in serve_one
remote:     impl()
remote:   File "/usr/lib/python2.4/site-packages/mercurial/sshserver.py", line 208, in do_unbundle
remote:     r = self.repo.addchangegroup(fp, 'serve', self.client_url())
remote:   File "/usr/lib/python2.4/site-packages/mercurial/localrepo.py", line 2120, in addchangegroup
remote:     url=url, pending=p)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/localrepo.py", line 152, in hook
remote:     return hook.hook(self.ui, self, name, throw, **args)
remote:   File "/usr/lib/python2.4/site-packages/mercurial/hook.py", line 142, in hook
remote:     r = _pythonhook(ui, repo, name, hname, hookfn, args, throw) or r
remote:   File "/usr/lib/python2.4/site-packages/mercurial/hook.py", line 68, in _pythonhook
remote:     r = obj(ui=ui, repo=repo, hooktype=name, **args)
remote:   File "/usr/lib/python2.4/site-packages/mozhghooks/pushlog.py", line 79, in log
remote:     (pushid, ctx.rev(), hex(ctx.node())))
remote: pysqlite2.dbapi2.IntegrityError: column rev is not unique
abort: unexpected response: empty string
Assignee: justdave → nmeyerhans
After meeting with NoahM and lsblakk, we think the fastest way to get try working again is to move this repo aside and create a fresh try repo. We've marked the Try tree closed and have started this.

Also, email sent to dev.planning and dev.tree-management, and developers have been notified.
That isn't a new error, and it's an easy fix.

Can somebody provide the output for the following commands (using `sqlite3 pushlog2.db`)?
select * from changesets order by pushid desc limit 5;
select * from pushlog order by id desc limit 5;
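For anyone unfamiliar with the hook in the traceback above: pretxnchangegroup.z_loghistory is the pushlog hook, which records each push in a per-repository SQLite database (pushlog2.db). Judging from the query output a few comments below, the database has a pushlog table (id, user, date) and a changesets table (pushid, rev, node), with rev required to be unique. The sketch below is only a rough illustration of the insert pattern implied by the traceback, not the actual mozhghooks/pushlog.py code, and the exact schema is an assumption.

import sqlite3
import time

def record_push(db_path, user, changesets):
    """Roughly what the pushlog hook does: insert one pushlog row for
    the push, then one changesets row per changeset in the push."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.cursor()
        cur.execute("INSERT INTO pushlog (user, date) VALUES (?, ?)",
                    (user, int(time.time())))
        pushid = cur.lastrowid
        for rev, node in changesets:  # (ctx.rev(), hex(ctx.node())) pairs
            # If rev is already present, e.g. left behind by an earlier
            # failed push, the uniqueness requirement on rev raises
            # "IntegrityError: column rev is not unique" and the
            # pretxnchangegroup hook aborts the whole transaction.
            cur.execute("INSERT INTO changesets (pushid, rev, node) "
                        "VALUES (?, ?, ?)", (pushid, rev, node))
        conn.commit()
    except sqlite3.IntegrityError:
        conn.rollback()
        raise
    finally:
        conn.close()

Under that reading, a single stale changesets row is enough to reject every later push of the same revision number, which is why inspecting the two tables above is the first step.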
I just pushed twice to try, successfully.
(In reply to comment #9)
> That isn't a new error, and it's an easy fix.
> 
> Can somebody provide the output for the following commands (using `sqlite3
> pushlog2.db`)?
> select * from changesets order by pushid desc limit 5;
> select * from pushlog order by id desc limit 5;

Actually, this did appear to be new, and I did check pushlog:

sqlite> select * from pushlog order by id desc limit 6;
22042|respindola@mozilla.com|1305222363
22041|eakhgari@mozilla.com|1305222306
22040|romaxa@gmail.com|1305221823
22039|respindola@mozilla.com|1305221805
22038|dtownsend@mozilla.com|1305221206
22037|dtownsend@mozilla.com|1305219691
sqlite> select * from changesets order by pushid desc limit 6;
22042|84832|de10fad6cb7a4db141043688535eead7c0fe09df
22041|84831|d8078fc9279ef9ee0e34607dc405757ba86abfe8
22040|84830|3c46bc426fac6d6c12994e78268361f486b735f6
22040|84829|03c3ba8e36d66ef5fd98fe0b2dadfc7b1677f2d2
22040|84828|9a7b966ab0b60e52f868e5f54e1d231a08f6e7b8
22040|84827|b135939df49e12b7dd2df9d7c92c79de8e188781

And the last commit in the repo was:
changeset:   84832:de10fad6cb7a
tag:         tip
parent:      84811:ed867467d35b
user:        Rafael Ávila de Espíndola <respindola@mozilla.com>
date:        Thu May 12 13:43:57 2011 -0400
summary:     try: -b do -p macosx,macosx64 -u all -t all
Appears to be working now, I just successfully pushed as well.
(In reply to comment #10)
> Note that this has happened many times before
> (https://bugzilla.mozilla.org/buglist.cgi?quicksearch=ALL%20prod%3Amozilla.org%20%22column%20rev%20is%20not%20unique%22).

The bit about "column rev is not unique" was actually a secondary issue.

The primary symptom was that push attempts would spin for a long time and eventually give up. Before failing, however, an entry would already have been written to pushlog, so a second attempt to push the same change would fail with "column rev is not unique".

My update in Comment 12 shows the state of pushlog after I had cleaned out a push attempt that had failed to make it into the repo.  Unfortunately, you'll have to take my word for it that pushes continued to fail after fixing pushlog. (I'd have happily stopped right there if they didn't!)
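To make "cleaned out a push attempt" concrete: under the schema assumed in the sketch after comment 9, cleanup just means deleting the changesets and pushlog rows whose pushid belongs to the push that was logged but never landed in the repository. A hypothetical example follows; both the database path and the pushid are illustrative, since the values actually involved aren't recorded in this bug.

import sqlite3

def remove_failed_push(db_path, pushid):
    """Delete the pushlog rows for a push that was logged but never
    actually made it into the repository."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM changesets WHERE pushid = ?", (pushid,))
        conn.execute("DELETE FROM pushlog WHERE id = ?", (pushid,))
        conn.commit()
    finally:
        conn.close()

# e.g. remove_failed_push("/repo/hg/mozilla/try/.hg/pushlog2.db", 22043)
# (example path and pushid only; not the actual values from this incident)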
(In reply to comment #13)
> Appears to be working now, I just successfully pushed as well.

(In reply to comment #11)
> I just pushed twice to try, successfully.

Based on these comments, and others on IRC, everything is working again, so the tree has been reopened.


Leaving this bug open while we try to figure out what went wrong, and whether we have to worry about this happening to other repos.
Did this happen before we started any of the work in bug 633161? If so, I wonder if we just finally got to a state where the repo was too slow to work with, so pushes would time out before completing.
I wondered that too, but I don't think it's the case.  According to one of the people attempting to push, performance didn't steadily degrade, but got suddenly worse.  From the sound of things, push operations don't see the performance degradation from having lots of heads.  Prior to yesterday's incident, pushes were still completing in 10-20 seconds.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard