Closed Bug 976436 Opened 10 years ago Closed 10 years ago

bld-lion-r5-037 debug failures may be showing a problem with bug 969689

Categories

(Release Engineering :: General, defect)

x86_64
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: mshal)

Attachments

(1 file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=35156233&tree=Try and https://tbpl.mozilla.org/php/getParsedLog.php?id=35169171&tree=Try both show the same slave failing a check test for the eclipse project stuff. But the latter is a push of aurora, which doesn't have any of the eclipse project stuff on it at all.

glandium said to disable the slave to preserve the evidence, and file a bug blocking bug 969689, so I did.
Note that the error suggests a stale pyc file survived that hg purge should have removed. Also note that the same slave failed to pull in an earlier build and started afresh from an unbundle, which may or may not have triggered something.
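
If stale pyc files are the suspect, one way to look for them on a preserved slave is sketched below. This is only a sketch: it assumes the Python 2 layout where foo.pyc sits next to foo.py, and find_orphaned_pyc is a made-up helper name, not something from the build tools.

```python
import os


def find_orphaned_pyc(root):
    """Return sorted paths of .pyc files under root that have no sibling .py.

    Hypothetical helper: assumes the Python 2 convention of writing
    foo.pyc into the same directory as foo.py.
    """
    orphans = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            # "foo.pyc"[:-1] == "foo.py", the source file we expect to find.
            if name.endswith(".pyc") and name[:-1] not in names:
                orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)
```

Running it against the build directory would list byte-compiled files whose source is gone. It would not catch a stale .py file that is itself left behind, so an empty result does not prove the tree is clean.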
There's also a jsreftest failure from my build on that slave, https://tbpl.mozilla.org/php/getParsedLog.php?id=35171106&tree=Try, which went away on a build on a different slave; dunno what random stray thing in the repo would have caused that, but I no longer trust any try build, and am just advising people who don't believe the results of their try push to retrigger the build and see if they get more reasonable results.
Mike, Chris,

can one of you look into this and determine if we need to back out the hg purge again?
Flags: needinfo?(mshal)
Flags: needinfo?(catlee)
So indeed this looks like a problem with hg purge.

The slave fails with:
Traceback (most recent call last):
  File "/builds/slave/try-osx64-d-000000000000000000/build/python/mozbuild/mozbuild/test/backend/test_android_eclipse.py", line 11, in <module>
    from mozbuild.backend.android_eclipse import AndroidEclipseBackend
  File "/builds/slave/try-osx64-d-000000000000000000/build/python/mozbuild/mozbuild/backend/android_eclipse.py", line 18, in <module>
    from ..frontend.data import (
ImportError: cannot import name AndroidEclipseProjectData

hg sez:
[cltbld@bld-lion-r5-037.try.releng.scl3.mozilla.com backend]$ pwd
/builds/slave/try-osx64-d-000000000000000000/build/python/mozbuild/mozbuild/backend

[cltbld@bld-lion-r5-037.try.releng.scl3.mozilla.com backend]$ hg parent
changeset:   172806:0cfad7eb4c81
tag:         tip
user:        Rik Cabanier <cabanier@adobe.com>
date:        Mon Feb 24 16:49:51 2014 -0800
summary:     try: -b d -p linux,macosx64 -u mochitest-1 -t none

[cltbld@bld-lion-r5-037.try.releng.scl3.mozilla.com backend]$ hg status android_eclipse.py 
[cltbld@bld-lion-r5-037.try.releng.scl3.mozilla.com backend]$ 

But that file doesn't exist at that rev according to hg:
https://hg.mozilla.org/try/file/0cfad7eb4c81/python/mozbuild/mozbuild/backend
Flags: needinfo?(mshal)
Flags: needinfo?(catlee)
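
The comparison above (file on disk, but not in the revision hg claims is checked out) can be sketched as a set difference. This is a rough illustration, not part of any existing tool: tracked_files and untracked_on_disk are hypothetical names, and the only hg command assumed is `hg manifest -r REV`, which lists the files recorded at a revision.

```python
import subprocess


def tracked_files(repo, rev):
    """Return the set of paths hg records at rev, via `hg manifest -r REV`."""
    out = subprocess.check_output(["hg", "manifest", "-r", rev], cwd=repo)
    return set(out.decode("utf-8").splitlines())


def untracked_on_disk(tracked, on_disk):
    """Return sorted paths that exist on disk but are absent from the manifest."""
    return sorted(set(on_disk) - set(tracked))
```

For example, untracked_on_disk(tracked_files(build_dir, "0cfad7eb4c81"), paths_found_on_disk) would be expected to include python/mozbuild/mozbuild/backend/android_eclipse.py on this slave. Ignored files such as object directories would show up as well, so the result needs filtering before anyone draws conclusions from it.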
From the log of the build that broke it:


command: START
command: hg pull -r e0d305689e96eaf17d363158bcf6ac38a9122686 https://hg.mozilla.org/try
command: cwd: /builds/hg-shared/try
command: output:
warning: hg.mozilla.org certificate with fingerprint af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27 not verified (check hostfingerprints or web.cacerts config setting)
pulling from https://hg.mozilla.org/try
abort: [Errno 116] Stale file handle: '/repo/hg/mozilla/try/.hg/store/phaseroots'!
command: ERROR
Traceback (most recent call last):
  File "/builds/slave/try-osx64-d-000000000000000000/tools/buildfarm/utils/../../lib/python/util/commands.py", line 47, in run_cmd
    return subprocess.check_call(cmd, **kwargs)
  File "/tools/python27/lib/python2.7/subprocess.py", line 511, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['hg', 'pull', '-r', u'e0d305689e96eaf17d363158bcf6ac38a9122686', 'https://hg.mozilla.org/try']' returned non-zero exit status 255
command: END (0.33s elapsed)

Error pulling changes into /builds/hg-shared/try from https://hg.mozilla.org/try; clobbering
Attempting to initialize clone with bundles
command: START
command: hg init /builds/hg-shared/try
command: cwd: /builds/slave/try-osx64-d-000000000000000000
command: output:
command: END (0.67s elapsed)

Trying to use bundle https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/bundles/try.hg
command: START
command: hg unbundle https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/bundles/try.hg
command: cwd: /builds/hg-shared/try
command: output:
warning: ftp-ssl.mozilla.org certificate with fingerprint 87:da:ae:ce:33:43:96:18:48:ee:1f:0c:00:db:17:24:85:d3:5a:b3 not verified (check hostfingerprints or web.cacerts config setting)
adding changesets
adding manifests
adding file changes
added 170016 changesets with 958391 changes to 151074 files
(run 'hg update' to get a working copy)

command: END (616.21 elapsed)

command: START
command: hg pull -r e0d305689e96eaf17d363158bcf6ac38a9122686 https://hg.mozilla.org/try
command: cwd: /builds/hg-shared/try
command: output:
warning: hg.mozilla.org certificate with fingerprint af:27:b9:34:47:4e:e5:98:01:f6:83:2b:51:c9:aa:d8:df:fb:1a:27 not verified (check hostfingerprints or web.cacerts config setting)
pulling from https://hg.mozilla.org/try
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 8 changes to 8 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
command: END (4.93s elapsed)

command: START
command: hg --config extensions.purge= purge -a --all /builds/slave/try-osx64-d-000000000000000000/build
command: cwd: /builds/slave/try-osx64-d-000000000000000000/build
command: output:
warning: ignoring unknown working parent 83a4e267597b!
command: END (73.97s elapsed)

command: START
command: hg update -C -r e0d305689e96eaf17d363158bcf6ac38a9122686
command: cwd: /builds/slave/try-osx64-d-000000000000000000/build
command: output:
warning: ignoring unknown working parent 83a4e267597b!
90065 files updated, 0 files merged, 3 files removed, 0 files unresolved
command: END (143.78s elapsed)

command: START
command: hg parent --template {node|short}
command: cwd: /builds/slave/try-osx64-d-000000000000000000/build
command: output:
e0d305689e96
command: END (0.82 elapsed)

https://tbpl.mozilla.org/php/getParsedLog.php?id=35156233&full=1&branch=try#error0
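
The telltale line in that log is the "ignoring unknown working parent" warning printed by both purge and update: the working copy still points at a changeset that the freshly recreated shared repo doesn't have. A small sketch for picking that out of captured hg output follows; the function name is made up, and the regex is based only on the warning text as it appears in the log above.

```python
import re

# Matches e.g. "warning: ignoring unknown working parent 83a4e267597b!"
UNKNOWN_PARENT = re.compile(r"ignoring unknown working parent ([0-9a-f]+)")


def unknown_working_parents(hg_output):
    """Return the sorted, de-duplicated changeset ids named in the warning."""
    return sorted(set(UNKNOWN_PARENT.findall(hg_output)))
```

A non-empty result could be treated as a signal that the working copy's state no longer matches its store and that purge results shouldn't be trusted.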

Simply put, there's another code path that resets the hg share (the clobber after "Error pulling changes" in the log above), and it's not handled properly.
The check added in bug 969689 happens before that reset.
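
To make the ordering problem concrete, here is a rough sketch of one way the flow could account for the reset. It is not the actual code from the tools repo or from any patch on this bug: every callable passed in is a hypothetical stand-in, and the choice to clobber the working copy after a reset is an assumption made for illustration.

```python
import subprocess


def update_working_copy(pull_shared, reclone_shared, clobber_working_copy,
                        purge, update, rev):
    """Update to rev, remembering whether the shared repo had to be reset.

    All callables are hypothetical stand-ins for the real hg steps.
    Returns True if the shared repo was recreated along the way.
    """
    shared_was_reset = False
    try:
        pull_shared(rev)
    except subprocess.CalledProcessError:
        # The "Error pulling changes ...; clobbering" path from the log.
        reclone_shared()
        pull_shared(rev)
        shared_was_reset = True

    if shared_was_reset:
        # The working copy may reference a parent the new store lacks,
        # so a purge can't be relied on; start from a clean directory.
        clobber_working_copy()
    else:
        purge()
    update(rev)
    return shared_was_reset
```

The point of the sketch is only that the decision about how to clean the working copy has to be made after the pull attempt, since that is where the reset can happen.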
(In reply to Mike Hommey [:glandium] from comment #5)
> abort: [Errno 116] Stale file handle:
> '/repo/hg/mozilla/try/.hg/store/phaseroots'!

[FWIW: I just filed bug 979560 on another instance of this same error, on Try; not sure if it's the same as this bug or not.]
Assignee: nobody → mshal
Attached patch: bug976436
Thanks for the info, glandium - that was helpful.

It looks like testShareReset() goes through the same path as the failed build (where the mercurial() call after "Updating shared repo" hits the "Error pulling changes" message). I added a new test based on this that does the same newfile trick as in testShareExtraFilesReset().

I also removed the old_revs line in testShareExtraFiles, since it was left over from when I copied things from another test.
Attachment #8385795 - Flags: review?(bhearsum)
(In reply to Daniel Holbert [:dholbert] from comment #7)
> (In reply to Mike Hommey [:glandium] from comment #5)
> > abort: [Errno 116] Stale file handle:
> > '/repo/hg/mozilla/try/.hg/store/phaseroots'!
> 
> [FWIW: I just filed bug 979560 on another instance of this same error, on
> Try; not sure if it's the same as this bug or not.]

Just a heads up that my patch doesn't fix the "Stale file handle" issue - it just tries to make sure the working repo doesn't end up in a corrupt state if the shared repo does get a stale file handle (or any other issue) when updating. I'm not sure why we are getting stale file handles, but that should probably also get addressed.
Comment on attachment 8385795
bug976436

Review of attachment 8385795:
-----------------------------------------------------------------

Thanks for the good comments, very helpful!
Attachment #8385795 - Flags: review?(bhearsum) → review+
I believe this specific hg purge issue has been fixed. If it resurfaces, feel free to reopen or create a new bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General