Closed Bug 1324492 Opened 3 years ago Closed 3 years ago

CMake needs to be upgraded for desktop-build workers

Categories

(Taskcluster :: Workers, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla53

People

(Reporter: ehsan, Assigned: ehsan)

References

Details

Attachments

(1 file, 3 obsolete files)

For bug 1324488, we need to upgrade CMake to the latest version.

I did what I think used to work before, which I will attach to this bug shortly but it seems that doesn't work any more.  I get errors such as this one:

https://public-artifacts.taskcluster.net/W5X8vJupTJmqpT0cQkrGUg/0/public/logs/live_backing.log
Step 0 : FROM taskcluster/centos6-build-upd:0.1.9.20161218115200
Pulling repository taskcluster/centos6-build-upd
Tag 0.1.9.20161218115200 not found in repository taskcluster/centos6-build-upd

(I used both 0.1.9.xxx and 0.1.6.xxx version numbers as 0.1.8.xxx was previously used but backed out, and neither work.)

Dustin, any ideas what I'm doing wrong this time?
Flags: needinfo?(dustin)
Newer versions of LLVM require CMake 3.4, so we may as well
upgrade to the latest version.
Assignee: nobody → ehsan
The docker-image-building support does not support chaining multiple docker images automatically.  A modification to the centos6-build image requires pushing the new versions to the docker hub, which only someone with write access to that repo can do.  If you want to experiment in try, you can change REGISTRY to point to your own docker hub registry (so I would change it to `djmitche`, for example).

If you're confident that those are correct, then I can build and push the images for centos6-build and centos6-build-upd, if you'd like.

This situation will likely get better when bug 1289812 is fixed -- but that's still up in the air.
Flags: needinfo?(dustin)
I published centos6-build and centos6-build-upd to my own docker hub account (ehsanakhgari) and even though they haven't shown up in the docker hub webpage, it seems the image building job is green now on try: <https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d0aad7da6250308a49267df2b78cb270957e9d3>.  I also tested a new desktop-build image locally based on the above tags and verified that it indeed has the upgraded cmake, so I think I can make a version of this patch that is ready for review, and you can build the images and upload them to the taskcluster account.
Newer versions of LLVM require CMake 3.4, so we may as well
upgrade to the latest version.
Attachment #8822592 - Flags: review?(dustin)
Attachment #8819942 - Attachment is obsolete: true
Comment on attachment 8822592 [details] [diff] [review]
Upgrade CMake to 3.7.1 in the desktop-build image

Review of attachment 8822592 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!  I'll pull and build and push and post back here when that's set.
Attachment #8822592 - Flags: review?(dustin) → review+
Ok, built and pushed.
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fd09b9651dc6
Upgrade CMake to 3.7.1 in the desktop-build image; r=dustin
Hmm, I'm pretty sure that this is caused by the "yum update" that runs when building the centos6-build-upd image.

Dustin, is there any way to upgrade the CMake version *without* this "yum update" in the process?  If not, I'd have to install this cmake as part of the tasks that need it, which can get pretty ugly.
Flags: needinfo?(ehsan) → needinfo?(dustin)
Julian, is this something we need to add valgrind suppressions for?  If yes, can you please help a little bit on how one is supposed to do that?  I noticed that in these stacks nothing is symbolicated, and I'm also not familiar with how the suppression format works, and which leaks are OK to ignore... :/

This is currently blocking a whole bunch of completely unrelated work from landing.
Flags: needinfo?(jseward)
Let me try to fill in the blanks here.. The patch that landed installs a newer version of CMake directly from a tarball.  Then the docker build of centos6-build-upd then runs `yum update` which installs a different CMake over top of that?

I don't see a cmake installed as an rpm, but it is installed (presumably from the tarball) in centos6-build:0.1.7

[root@c91183e6635e ~]# rpm -qa | grep cmake
[root@c91183e6635e ~]# cmake --version
cmake version 3.7.1

and the `yum update` when building centos6-build-upd doesn't update it:

Updated:
  nodejs.x86_64 0:0.10.48-3.el6      nodejs-devel.x86_64 0:0.10.48-3.el6        
  tzdata.noarch 0:2016j-1.el6        vim-minimal.x86_64 2:7.4.629-5.el6_8.1     
  yasm.x86_64 0:1.2.0-1.el6       

So if you can explain a little more of what's wrong, that might help.  There's a chance that this is similar to bug 1289812, and something in the `yum install` occurring in centos6-build is causing the issues.  But to date we've only seen that with Ubuntu.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #12)
> Let me try to fill in the blanks here.. The patch that landed installs a
> newer version of CMake directly from a tarball.  Then the docker build of
> centos6-build-upd then runs `yum update` which installs a different CMake
> over top of that?

No, this patch makes us not use yum to install CMake in the first place.  I think the issue here is completely irrelevant to cmake, i.e., when we yum update as part of rebuilding centos6-build-upd, something else on the system changes in a way that breaks this test.

> I don't see a cmake installed as an rpm, but it is installed (presumably
> from the tarball) in centos6-build:0.1.7
> 
> [root@c91183e6635e ~]# rpm -qa | grep cmake
> [root@c91183e6635e ~]# cmake --version
> cmake version 3.7.1
> 
> and the `yum update` when building centos6-build-upd doesn't update it:
> 
> Updated:
>   nodejs.x86_64 0:0.10.48-3.el6      nodejs-devel.x86_64 0:0.10.48-3.el6    
> 
>   tzdata.noarch 0:2016j-1.el6        vim-minimal.x86_64 2:7.4.629-5.el6_8.1 
> 
>   yasm.x86_64 0:1.2.0-1.el6       

Yeah, this is all expected.

> So if you can explain a little more of what's wrong, that might help. 
> There's a chance that this is similar to bug 1289812, and something in the
> `yum install` occurring in centos6-build is causing the issues.  But to date
> we've only seen that with Ubuntu.

I think that's exactly what's happening.  Note that that bug is mostly talking about test machines, but this is a test that we run on a *build* machine which is why it's affected in a similar way but on the desktop-build image instead.
(In reply to :Ehsan Akhgari from comment #11)
> Julian, is this something we need to add valgrind suppressions for?  If yes,
> can you please help a little bit on how one is supposed to do that?  I
> noticed that in these stacks nothing is symbolicated, and I'm also not
> familiar with how the suppression format works, and which leaks are OK to
> ignore... :/

I did some investigation on this leak.  If you look at the log <https://public-artifacts.taskcluster.net/X33GvFf_QtSbC6b8LT4DCQ/0/public/logs/live_backing.log>, specifically this line before the leak report:

[task 2016-12-30T17:20:31.591949Z] 17:20:31     INFO -   0:16.32 --66000-- Discarding syms at 0x1534dce0-0x1600bbb6 in /usr/lib64/libLLVM-3.6-mesa.so due to munmap()

The address range includes the final symbols in the leak allocation stack trace, so this is presumably coming from some JIT'ed code from llvmpipe.  I tried dlopen'ing this library in the beginning of XRE_main() but it'd still get munmap'ed before valgrind generates the backtrace.

Adding a suppression rule based on what valgrind suggests makes the build go green <https://treeherder.mozilla.org/#/jobs?repo=try&revision=55d7321b40084499ec4b43586691d2c23b53d217> but I'm not sure how much sense it makes...
Attachment #8823515 - Flags: review?(jseward)
Comment on attachment 8823515 [details] [diff] [review]
Suprress the valgrind leak coming from libLLVM-3.6-mesa.so; jseward

Review of attachment 8823515 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately that will cause suppression of the leak of any block
allocated by operator new(unsigned long), so we'll need to find a
different solution.  The underlying problem is, as you observe, that
Valgrind discards the debug info for libLLVM-3.6-mesa.so and friends
when it is unmapped, leaving it unable to subsequently symbolicate
the stacks for leaked blocks allocated by libLLVM-3.6-mesa.so.
Attachment #8823515 - Flags: review?(jseward)
(In reply to Julian Seward [:jseward] from comment #16)
> Comment on attachment 8823515 [details] [diff] [review]
> Suprress the valgrind leak coming from libLLVM-3.6-mesa.so; jseward
> 
> Review of attachment 8823515 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Unfortunately that will cause suppression of the leak of any block
> allocated by operator new(unsigned long), so we'll need to find a
> different solution.

That's what I thought.  :(

What other solution do you have in mind?  I'm absolutely out of ideas here.

> The underlying problem is, as you observe, that
> Valgrind discards the debug info for libLLVM-3.6-mesa.so and friends
> when it is unmapped, leaving it unable to subsequently symbolicate
> the stacks for leaked blocks allocated by libLLVM-3.6-mesa.so.

I even went as far as dlopening libLLVM-3.6-mesa.so on startup manually and never dlclosing the return value but that made no difference, libLLVM-3.6-mesa.so would still get munmap'ed in the same place.  But even if through some magic we got a symbolicated backtrace here, fact remains that valgrind wouldn't be able to get one when it wants to match the leak it detects against the suppression list...  So I can't imagine that we can use a suppression to get around this in the first place.
Dustin, back to my original question: Can we somehow *only* upgrade CMake and not run yum update in the process?
Flags: needinfo?(dustin)
I think we've established that `yum install` is actually the problem, and it's an awful lot harder to build an image without that.

Even if we do figure out how to build docker images deterministically, we'll need to solve whatever bug was introduced between generation of the 0.1.6 image and now -- we're very unlikely to be able to reproduce the 0.1.6 image's package contents.

That said, you could add the CMake install in taskcluster/docker/desktop-build, as that is generated with the yum install/update as a base.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #19)
> That said, you could add the CMake install in
> taskcluster/docker/desktop-build, as that is generated with the yum
> install/update as a base.

This works perfectly!
Attachment #8823799 - Flags: review?(dustin)
Attachment #8822592 - Attachment is obsolete: true
Attachment #8823515 - Attachment is obsolete: true
Attachment #8823799 - Flags: review?(dustin) → review+
Pushed by eakhgari@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/73dae2d9869e
Install CMake 3.7.1 in desktop-build images; r=dustin
https://hg.mozilla.org/mozilla-central/rev/73dae2d9869e
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Flags: needinfo?(jseward)
Component: Worker → Workers
You need to log in before you can comment on or make changes to this bug.