Problem with cannot create debug link section

RESOLVED FIXED

Status

Release Engineering
Buildduty
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: Tomcat, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Description

3 years ago
See bug https://bugzilla.mozilla.org/show_bug.cgi?id=1025800 about a slave that was having problems with builds.

01:21 < pmoore|projectduty> Tomcat|sheriffduty: in those logs with the corrupted zips, i also see lines like:
01:22 < pmoore|projectduty>  /usr/bin/objcopy:dist/bin/stSlDtG3: cannot create debug link section `dist/bin/test_unlock_notify.dbg': Invalid operation
01:23 < pmoore|projectduty> i wonder if we need to update our version of objcopy - i see a lot of e.g. debian bugs with this problem, and the solution was to upgrade binutils package
01:23 < Tomcat|sheriffduty> pmoore|projectduty: yeah i will file a bug for this slave
01:24 < pmoore|projectduty> i wonder if we are just using a buggy version of code, and the problem is occasionally exhibited on a slave, but maybe is not a slave problem but a tools problem that happens sporadically
01:24 < pmoore|projectduty> or maybe once it happens, it leaves the machine in a bad state, so it looks like a slave problem
01:24 < pmoore|projectduty> just a thought, could be way off base
01:24 < Tomcat|sheriffduty> yeah
01:25 < pmoore|projectduty> might be worth raising a bug for the slave and also a separate bug for the root cause? or maybe we already have one…
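
To check how often this error shows up across build logs, a small scanner could pull out the affected files. This is only a sketch: the regex is modelled on the single message format quoted above, and the function and group names are made up for illustration, not taken from any releng tooling.

```python
import re

# Modelled on the objcopy failure quoted in the chat log above.
# Group names are illustrative only.
DEBUGLINK_ERROR = re.compile(
    r"^\s*(?P<tool>\S*objcopy):(?P<temp>[^:]+): "
    r"cannot create debug link section `(?P<target>[^']+)': "
    r"(?P<reason>.+)$"
)


def find_debuglink_errors(log_text):
    """Return (temp_file, debug_file, reason) for each matching log line."""
    hits = []
    for line in log_text.splitlines():
        match = DEBUGLINK_ERROR.match(line)
        if match:
            hits.append(
                (
                    match.group("temp"),
                    match.group("target"),
                    match.group("reason").strip(),
                )
            )
    return hits
```

Running it over the logs of several slaves would show whether the failure is confined to one machine or spread across the pool.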

Updated

3 years ago
Blocks: 1025863

Comment 1

3 years ago
Pmoore's guess looks probable:

looking at an ec2 instance (non spot but should be the same), I am getting:

[cltbld@dev-linux64-ec2-iconnolly.dev.releng.use1.mozilla.com ~]$ /usr/bin/objcopy --version
GNU objcopy version 2.20.51.0.2-5.28.el6 20091009

and 2.20 was reported to have a bug that may be related to this situation. Here is that bug: https://sourceware.org/bugzilla/show_bug.cgi?id=11072

Based on the comments in the sourceware bug, an updated version resolved that issue. I'm guessing we need to play with puppet to do this. /me dives in deeper.
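
A quick way to compare the installed objcopy against a candidate minimum version is to parse the `--version` banner shown above. This is a rough sketch: `ASSUMED_MINIMUM` is a placeholder, since the sourceware bug only says that an updated version fixed the problem, and the helper names are hypothetical.

```python
import re

# Placeholder threshold: the upstream report only says "an updated version"
# resolved the issue, so this value is an assumption, not a known fix version.
ASSUMED_MINIMUM = (2, 21)


def parse_objcopy_version(version_output):
    """Extract the leading dotted version from `objcopy --version` output."""
    match = re.search(r"GNU objcopy\D*?(\d+(?:\.\d+)+)", version_output)
    if match is None:
        raise ValueError("no objcopy version found")
    return tuple(int(part) for part in match.group(1).split("."))


def is_older_than(version, minimum=ASSUMED_MINIMUM):
    """Compare only as many components as the minimum specifies."""
    return version[: len(minimum)] < minimum
```

Fed the banner from the ec2 instance above, `parse_objcopy_version` yields the 2.20.51.0.2 components, which `is_older_than` flags as below the assumed threshold.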

Comment 2

3 years ago
So I am not sure if these add up. The issues reported
here - https://sourceware.org/bugzilla/show_bug.cgi?id=11072
and here - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=556951

refer to using --add-gnu-debuglink. That option doesn't seem to be in the output of 'make buildsymbols'.
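
For reference, the usual GNU binutils split-debug sequence that involves --add-gnu-debuglink looks roughly like the commands built below. This is a generic sketch, not a claim about what 'make buildsymbols' actually runs; the function name and the example paths are illustrative assumptions.

```python
def split_debug_commands(binary, debug_file, objcopy="objcopy"):
    """Build the three objcopy invocations of a typical split-debug workflow.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        # 1. Copy only the debug information into a separate file.
        [objcopy, "--only-keep-debug", binary, debug_file],
        # 2. Strip the debug information from the original binary.
        [objcopy, "--strip-debug", binary],
        # 3. Record a link to the debug file inside the stripped binary.
        [objcopy, "--add-gnu-debuglink=" + debug_file, binary],
    ]


if __name__ == "__main__":
    # Illustrative paths only.
    for cmd in split_debug_commands(
        "dist/bin/test_unlock_notify", "dist/bin/test_unlock_notify.dbg"
    ):
        print(" ".join(cmd))
```

Searching the build log for the third form is one way to confirm whether the option is really absent.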

We do use the 'gold' binutils, and that seems to be a culprit for output like:
"""
/usr/bin/objcopy:dist/bin/stBmHhWA: cannot create debug link section `dist/bin/libxul.so.dbg': Invalid operation
"""

It looks like we install binutils via: http://mxr.mozilla.org/build/source/puppet-manifests/modules/packages/manifests/devtools.pp#48

Comment 3

If objcopy is not a red herring, we can just set OBJCOPY in build/unix/mozconfig.linux. That said, why are we using gold? We shouldn't be.

Comment 4

(In reply to Mike Hommey [:glandium] from comment #3)
> If objcopy is not a red herring, we can just set OBJCOPY in
> build/unix/mozconfig.linux

To use the one that comes alongside gcc, which is from a recent binutils.

Comment 5

3 years ago
(In reply to Mike Hommey [:glandium] from comment #3)
> If objcopy is not a red herring, we can just set OBJCOPY in
> build/unix/mozconfig.linux. That said, why are we using gold? we shouldn't
> be.

/me finds `Bug 633269 - Use gold for linking on linux`. Looks like you were against this too back then :)

Not sure if this is how things are today; I may be reading things incorrectly from the puppet repo.

Comment 6

3 years ago
So far bld-linux64-spot-136 is the only slave I could find that was having this issue. There is the possibility that the zip was actually bad, or the instance created was an anomaly.

Maybe related: it seems that https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?name=bld-linux64-spot-494 was also having issues unpacking, not with zip but with tar. It had four failures in a row:

"""
ERROR: Command failed. See logs for output.
 # ['tar', '--use-compress-program', 'pigz', '-xf', '/builds/mock_mozilla/cache/mozilla-centos6-x86_64/root_cache/cache.tar.gz', '-C', '/builds/mock_mozilla/mozilla-centos6-x86_64/root/']
program finished with exit code 2
"""

It appears that bld-linux64-spot-494 got 'un-stuck' and is running green again.

Comment 7

3 years ago
So there are a few ideas here:

1) The slave is to blame
    a) this was due to bad instances (bad AMI) -> Bug 1025842 certainly suggests that to be the case for bld-linux64-spot-494
    b) the slave may have run out of disk space; this would explain how bld-linux64-spot-136 got stuck in a rut on the same builder

2) The zip was corrupt: unlikely, as other slaves built fine for the rev at the time of the incident and for the surrounding revs.

3) There is a bug with binutils: I want to say that this is a red herring, as other slaves seem to be able to build just fine.

I am tempted to suggest this is number (1). Let's see what the result of Bug 1025842 is first as it might be related.

I am going to re-enable bld-linux64-spot-136. It looks like it has been terminated since being disabled so its state is lost and it should have a fresh start. Won't help with debugging what actually happened though.
 If this is a 'disk space' issue, we will have to act quickly on the spot(s) in question.
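
If disk space turns out to be the suspect, a check along these lines could be run on a spot instance before it is terminated. It is only a sketch: the 10 GiB threshold and the function name are assumptions, and the path to inspect would need to match wherever the builds actually write.

```python
import shutil

# Hypothetical threshold; the real requirement for a build is not stated here.
ASSUMED_MIN_FREE_BYTES = 10 * 1024**3


def has_enough_space(path, min_free_bytes=ASSUMED_MIN_FREE_BYTES):
    """Return True if the filesystem holding `path` has enough free bytes."""
    usage = shutil.disk_usage(path)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # "." is a stand-in for the build directory on the slave.
    print("enough free space:", has_enough_space("."))
```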

Updated

3 years ago
Depends on: 1025842

Comment 8

This *looks* solved, or if it's not, it is not a buildduty issue anymore.

Please either file new bugs for followup issues, or re-open and move to a different component if my assessment is wrong.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED