Closed Bug 695467 Opened 8 years ago Closed 8 years ago

Bm builders are hitting mercurial bugs & failing (turning blue)


(Release Engineering :: General, defect, P3)



(Not tracked)



(Reporter: dholbert, Assigned: bkero)



(Whiteboard: [buildduty][hg])

We seem to be hitting a mercurial bug on Mozilla-Beta and Mozilla-Aurora, triggering tons of (automatically-retriggered) Bm builds.

OS X 10.5.2 Mobile Desktop mozilla-beta build on 2011-10-18 12:19:07 PDT for push 72be1d924c35
WINNT 5.2 Mobile Desktop mozilla-beta build on 2011-10-18 11:36:38 PDT for push 52c9c801be77

The error looks like this:
 argv: ['/usr/local/bin/hg', 'clone', '--verbose', '--noupdate', u'', 'build']
 using PTY: False
transaction abort!
requesting all changes
adding changesets
rollback completed
** unknown exception encountered, please report by visiting
** Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17) [GCC 4.0.1 (Apple Inc. build 5465)]
** Mercurial Distributed SCM (version 1.7.5)
** Extensions loaded: share, rebase, mq, purge
Traceback (most recent call last):
  File "/usr/local/bin/hg", line 38, in <module>
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 16, in run
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 36, in dispatch
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 58, in _runcatch
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 593, in _dispatch
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 401, in runcommand
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 644, in _runcommand
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 598, in checkargs
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 591, in <lambda>
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 426, in check
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/", line 736, in clone
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 337, in clone
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 1886, in clone
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 1295, in pull
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 1692, in addchangegroup
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 1381, in addgroup
  File "tools/mercurial-1.7.5/lib/python2.5/site-packages/mercurial/", line 1220, in _addrevision
mpatch.mpatchError: patch cannot be decoded
program finished with exit code 1
cc-ing bkero because he upgraded varnish in bug 693202 this morning
Priority: -- → P3
Whiteboard: [buildduty][hg]
The problem appears to have started today.

The last-good push to Mozilla-Beta was last Thursday:
The first-bad push was this morning at 11:00 AM:
(nothing else was pushed between Thursday and today)

On Mozilla-Aurora, the last-good push appears to be this morning at 11:11 AM:
and the first-bad push was today at 12:16 PM:
bkero: are there scripts involved here that need to be updated, like those for comm-beta this morning?
coop: I'm wondering if the scripts I updated were the same as the comm-beta ones.  Is it possible to rerun this job to see if this problem was fixed with the scripts that I updated for comm-beta?
So, these mobile desktop builders (Bm) do a regular 'hg clone .../releases/mozilla-beta', instead of using our which uses hg share where it can. Consequently they will cause more traffic than the other builds (B), and the constant retrying could lead to a situation where you never get out of a broken state. The slaves having issues are located in SJC1, so the traffic is intra-colo.
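To illustrate the retry concern above, here is a minimal sketch of a clone wrapper that caps the number of attempts and backs off between them, rather than retrying immediately. It is not the buildbot code in use here; the function name, the attempt limit, and the delays are all assumptions chosen for illustration.

```python
import subprocess
import time


def clone_with_backoff(cmd, max_attempts=5, base_delay=30.0,
                       run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits 0, waiting longer after each failure.

    Returns the number of attempts used, or None if every attempt failed.
    `run` and `sleep` are injectable so the logic can be exercised
    without touching a real server.
    """
    for attempt in range(1, max_attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # 30s, 60s, 120s, ... so a struggling server gets time to recover.
            sleep(base_delay * 2 ** (attempt - 1))
    return None
```

A caller would pass the usual argument list, for example `clone_with_backoff(['hg', 'clone', '--noupdate', repo_url, 'build'])`, where `repo_url` is a placeholder for the repository being cloned. Giving up after a fixed number of attempts keeps one bad push from generating dozens of back-to-back full clones.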

We know we changed varnish this morning, and we're consistently getting 
  mpatch.mpatchError: patch cannot be decoded

Can we try dumping all the pages for mozilla-beta and mozilla-aurora in the varnish cache?
Assignee: nobody → server-ops-releng
Component: Release Engineering → Server Operations: RelEng
QA Contact: release → zandr
Assignee: server-ops-releng → bkero
This is happening on the main mozilla-central and mozilla-inbound trees too.
Removing specific mention of Mozilla-Beta from summary.
Summary: Bm builders on Mozilla-Beta are hitting mercurial bugs & failing (turning blue) → Bm builders are hitting mercurial bugs & failing (turning blue)
I've dumped all of the varnish cache to see if that helps resolve this problem.

I have been attempting to replicate the issue.  I've done a duplicate clone on a separate varnish instance (on an unrelated VM) and did not observe the issue.

At this point I think the check might be related to how the cache is expired.  I'll be investigating that.
Component: Server Operations: RelEng → Release Engineering
Ok, thanks. It would be good to know that the cache eviction on a push is working with the new version of varnish.
This is happening still.  See

The first build that failed on this branch showed
/usr/local/bin/hg clone --verbose --noupdate build
requesting all changes
abort: HTTP Error 503: Service Unavailable
program finished with exit code 255

The following builds show the traceback in comment 0.

(minus some of the noise)
/usr/local/bin/hg clone --verbose --noupdate build
requesting all changes
adding changesets
adding manifests
adding file changes
added 58555 changesets with 0 changes to 0 files (+27 heads)

/usr/local/bin/hg identify --num --branch
-1 default

/usr/local/bin/hg update --clean --repository build --rev 0cb1870e32d2d63b380f48bd30e1f8e281dbd5ec
abort: unknown revision '0cb1870e32d2d63b380f48bd30e1f8e281dbd5ec'!
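The log above shows a clone that exits cleanly but reports "0 changes to 0 files", and the failure only surfaces later at the update step. As a rough sketch, a build step could flag that case straight from the clone output. This is a heuristic under stated assumptions: the helper name is hypothetical, it only makes sense for a fresh clone of a non-empty repository, and it relies on the English summary line in the form seen in this log.

```python
import re

# Matches the summary line hg prints after pulling, as seen in the log above.
SUMMARY_RE = re.compile(
    r"added (\d+) changesets with (\d+) changes to (\d+) files")


def clone_looks_truncated(hg_output):
    """Return True if hg reported changesets but no file changes."""
    match = SUMMARY_RE.search(hg_output)
    if match is None:
        # No summary line at all: treat as suspect rather than guess.
        return True
    changesets, changes, files = (int(g) for g in match.groups())
    return changesets > 0 and (changes == 0 or files == 0)
```

Applied to the summary line in this comment, the check reports the clone as suspect, so the job could stop before `hg update` fails on an unknown revision.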
... and that was the 49th attempt at building for that push :)
(In reply to Phil Ringnalda (:philor) from comment #12)
> ... and that was the 49th attempt at building for that push :)

The darwin9 clones are succeeding now after downgrading varnish. This build should finally persevere.
I'm going to dup this, since it's clearly related to / a symptom of changes in bug 693202.
Closed: 8 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 693202
Product: → Release Engineering