Closed
Bug 1025801
Opened 10 years ago
Closed 10 years ago
Problem with cannot create debug link section
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: cbook, Unassigned)
References
Details
See bug https://bugzilla.mozilla.org/show_bug.cgi?id=1025800 about a slave that is having problems with builds.
01:21 < pmoore|projectduty> Tomcat|sheriffduty: in those logs with the corrupted zips, i also see lines like:
01:22 < pmoore|projectduty> /usr/bin/objcopy:dist/bin/stSlDtG3: cannot create debug link section `dist/bin/test_unlock_notify.dbg': Invalid operation
01:23 < pmoore|projectduty> i wonder if we need to update our version of objcopy - i see a lot of e.g. debian bugs with this problem, and the solution was to upgrade binutils package
01:23 < Tomcat|sheriffduty> pmoore|projectduty: yeah i will file a bug for this slave
01:24 < pmoore|projectduty> i wonder if we are just using a buggy version of code, and the problem is occasionally exhibited on a slave, but maybe is not a slave problem but a tools problem that happens sporadically
01:24 < pmoore|projectduty> or maybe once it happens, it leaves the machine in a bad state, so it looks like a slave problem
01:24 < pmoore|projectduty> just a thought, could be way off base
01:24 < Tomcat|sheriffduty> yeah
01:25 < pmoore|projectduty> might be worth raising a bug for the slave and also a separate bug for the root cause? or maybe we already have one…
Comment 1•10 years ago
Pmoore's guess looks probable.
Looking at an EC2 instance (non-spot, but it should be the same), I am getting:
[cltbld@dev-linux64-ec2-iconnolly.dev.releng.use1.mozilla.com ~]$ /usr/bin/objcopy --version
GNU objcopy version 2.20.51.0.2-5.28.el6 20091009
Version 2.20 was reported to have a bug that may be related to this situation: https://sourceware.org/bugzilla/show_bug.cgi?id=11072
Based on the comments in the sourceware bug, an updated version resolved that issue. I'm guessing we need to play with puppet to do this. /me dives in deeper.
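To compare slaves, the version string above can be parsed and checked against a cutoff. This is a minimal sketch: the function names are made up for illustration, and the cutoff passed to `is_older_than` is an assumption, since the thread does not say which binutils release carries the fix.

```python
import re


def parse_objcopy_version(version_output):
    """Return the dotted version from `objcopy --version` output as a tuple of ints.

    Returns None when no version number is found.
    """
    match = re.search(r"GNU objcopy\D*(\d+(?:\.\d+)+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(version, threshold):
    """Compare two version tuples element by element."""
    return version < threshold


# The line reported by the EC2 instance in this comment.
reported = "GNU objcopy version 2.20.51.0.2-5.28.el6 20091009"
version = parse_objcopy_version(reported)

# (2, 21) is a hypothetical cutoff, not a confirmed fix version.
needs_attention = is_older_than(version, (2, 21))
```

The distribution suffix (`-5.28.el6`) is ignored on purpose; only the upstream dotted version is compared.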
Comment 2•10 years ago
So I am not sure these add up. The issue reported
here - https://sourceware.org/bugzilla/show_bug.cgi?id=11072
and here - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556951
refers to using --add-gnu-debuglink. That option doesn't seem to appear in the output of 'make buildsymbols'.
We do use the 'gold' binutils, and that seems to be a culprit for output like:
"""
/usr/bin/objcopy:dist/bin/stBmHhWA: cannot create debug link section `dist/bin/libxul.so.dbg': Invalid operation
"""
It looks like we install binutils here: http://mxr.mozilla.org/build/source/puppet-manifests/modules/packages/manifests/devtools.pp#48
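To check more logs for the same pattern, something like the following could pull out the failing objcopy lines and tell us whether --add-gnu-debuglink shows up at all. It is a rough sketch: the regular expression is derived only from the two error lines quoted in this bug, and the function names are made up for illustration.

```python
import re

# Pattern based on the error lines quoted in this bug, e.g.
# /usr/bin/objcopy:dist/bin/stBmHhWA: cannot create debug link section
# `dist/bin/libxul.so.dbg': Invalid operation
DEBUG_LINK_ERROR = re.compile(
    r"objcopy:\s*(?P<temp>\S+): cannot create debug link section "
    r"`(?P<debug_file>[^']+)': (?P<reason>.+)"
)


def find_debug_link_errors(log_text):
    """Return (temp_file, debug_file, reason) for each matching log line."""
    errors = []
    for line in log_text.splitlines():
        match = DEBUG_LINK_ERROR.search(line)
        if match:
            errors.append(
                (
                    match.group("temp"),
                    match.group("debug_file"),
                    match.group("reason").strip(),
                )
            )
    return errors


def mentions_add_gnu_debuglink(log_text):
    """Report whether the option named in the upstream bugs appears in the log."""
    return "--add-gnu-debuglink" in log_text
```

Running both over a full 'make buildsymbols' log would show whether the failures line up with the option discussed in the sourceware and Debian reports.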
Comment 3•10 years ago
If objcopy is not a red herring, we can just set OBJCOPY in build/unix/mozconfig.linux. That said, why are we using gold? We shouldn't be.
Comment 4•10 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #3)
> If objcopy is not a red herring, we can just set OBJCOPY in
> build/unix/mozconfig.linux
To use the one that comes alongside gcc, which is from a recent binutils.
Comment 5•10 years ago
(In reply to Mike Hommey [:glandium] from comment #3)
> If objcopy is not a red herring, we can just set OBJCOPY in
> build/unix/mozconfig.linux. That said, why are we using gold? we shouldn't
> be.
/me finds `Bug 633269 - Use gold for linking on linux`. Looks like you were against this too back then :)
Not sure if this is how things are today; I may be reading things incorrectly from the puppet repo.
Comment 6•10 years ago
So far bld-linux64-spot-136 is the only slave I could find that was having this issue. There is the possibility that the zip was actually bad, or the instance created was an anomaly.
Maybe related: it seems that https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=bld-linux64-spot-494 was also having issues unpacking, not with zip but with tar. It had four failures in a row:
"""
ERROR: Command failed. See logs for output.
# ['tar', '--use-compress-program', 'pigz', '-xf', '/builds/mock_mozilla/cache/mozilla-centos6-x86_64/root_cache/cache.tar.gz', '-C', '/builds/mock_mozilla/mozilla-centos6-x86_64/root/']
program finished with exit code 2
"""
It appears that bld-linux64-spot-494 got 'un-stuck' and is running green again.
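To rule the zip itself in or out, the archive could be tested on a known-good machine. A minimal sketch using Python's standard zipfile module follows; the function name is made up for illustration, and this is not something the build automation is known to do today.

```python
import zipfile


def zip_is_intact(path):
    """Return True if the archive opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        # Not a zip file at all, or the file could not be read.
        return False
```

If the same zip passes here but fails to unpack on the slave, that points at the slave rather than the artifact.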
Comment 7•10 years ago
So there are a few ideas here:
1) The slave is to blame
a) this was due to bad instances (bad AMI) -> Bug 1025842 certainly suggests that to be the case for bld-linux64-spot-494
b) maybe it ran out of disk space; this would explain how bld-linux64-spot-136 got stuck in a rut on the same builder
2) The zip was corrupt: unlikely, as other slaves built fine for the rev at the time of the incident and for the surrounding revs.
3) There is a bug in binutils: I want to say that this is a red herring, as other slaves seem to be able to build just fine.
I am tempted to suggest this is number (1). Let's see what the result of Bug 1025842 is first as it might be related.
I am going to re-enable bld-linux64-spot-136. It looks like it has been terminated since being disabled so its state is lost and it should have a fresh start. Won't help with debugging what actually happened though.
If this is a 'disk space' issue, we will have to act quickly on the spot(s) in question.
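For the disk space theory, a quick check on a suspect spot instance before it gets terminated might look like this. It is only a sketch: the function name is made up, and the path and threshold in the usage note are placeholders rather than values taken from the slaves.

```python
import shutil


def has_free_space(path, min_free_bytes):
    """Return True if the filesystem holding `path` has at least min_free_bytes free."""
    return shutil.disk_usage(path).free >= min_free_bytes
```

For example, `has_free_space("/builds", 10 * 1024**3)` would ask for 10 GiB free under /builds; both values are illustrative only.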
Comment 8•10 years ago
This *looks* solved, or if it's not, it is no longer a buildduty issue.
Please either file new bugs for followup issues, or re-open and move to a different component if my assessment is wrong.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard