Closed Bug 731256 Opened 10 years ago Closed 9 years ago

Releng build support for DXR

Categories

(Release Engineering :: General, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdas, Assigned: catlee)

References

Details

(Whiteboard: [dxr])

Attachments

(6 files, 1 obsolete file)

Hi there!

dxr.mozilla.org currently makes its own builds, but we would like to have the builds done through releng. 

The packages needed to build DXR are: 
* python >= 2.6 with python-sqlite.
* sqlite3 >= 3.7.9 
* cxx-clang
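
A quick way to sanity-check these prerequisites on a build machine (just a sketch against the versions listed above):

$ python -c 'import sys; print sys.version'                  # want >= 2.6
$ python -c 'import sqlite3; print sqlite3.sqlite_version'   # sqlite linked into python-sqlite, want >= 3.7.9
$ sqlite3 --version                                          # want >= 3.7.9
$ clang --version                                            # needed to build the cxx-clang plugin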

The build script used is located here: https://raw.github.com/garnacho/dxr/lanedo-changes/update.sh

This build script reads in configuration information from here: https://github.com/garnacho/dxr/blob/lanedo-changes/dxr.config#L15

To activate the script:
make -f client.mk build

For clobber builds (we haven't decided on a frequency yet):
buildcommand=make -f client.mk clean && make -f client.mk build

Repository:
https://github.com/mozilla/dxr

This is a GitHub repository; I'm not sure if you have support for it. If not, we can mirror it in an hg repo.

Please let us know of any other questions you have / information you need. Thanks!
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Whiteboard: [project]
The best thing to do in the short term would be to get an hg mirror of this up and running.
Would this be better suited to ci.mozilla.org?  Shyam?  I don't know the details, but I think that a number of websites go through CI, and some even do deployments on that basis.
just to clarify, the builds will produce indices that the production machine will pull down and serve, so we need to publish them to a well defined location, correct?

some other questions:

* how often should this be run?

* which mozilla repositories should it be run against? (e.g. mozilla-central, aurora, ...)

* you mention for clobber builds you have special instructions. what happens if every build is a clobber?
(In reply to Chris AtLee [:catlee] from comment #3)
> just to clarify, the builds will produce indices that the production machine
> will pull down and serve, so we need to publish them to a well defined
> location, correct?

Yes, that's correct; the location is specified in the dxr.config file that the update script takes as its first parameter. There is some support for rsync'ing the produced info to a remote location if the build and server machines aren't the same, but that could use more testing.

> 
> some other questions:
> 
> * how often should this be run?

It depends on how up-to-date the indexed tree is meant to be; once a day sounds like a sane minimum threshold.

> 
> * which mozilla repositories should it be run against? (e.g.
> mozilla-central, aurora, ...)

So far, testing has happened against mozilla-central; I can't say myself how worthwhile it would be to include aurora or other repos.

> 
> * you mention for clobber builds you have special instructions. what happens
> if every build is a clobber?

It could perhaps take a bit longer to generate the information; I don't think that's actually the most time-consuming step.

It is also worth mentioning that the dxr-index.py script is currently quite memory-intensive when creating the cross-reference maps for a project as big as mozilla-central; this should probably be taken into consideration if the indexing is going to happen on a machine that provides other services.
(In reply to Carlos Garnacho from comment #4)
> (In reply to Chris AtLee [:catlee] from comment #3)
> > just to clarify, the builds will produce indices that the production machine
> > will pull down and serve, so we need to publish them to a well defined
> > location, correct?
> 
> yes, that's correct, such location is specified in the dxr.config file that
> the update script takes as first parameter. There is some support for
> rsync'ing the produced info to a remote location if the build and server
> machines aren't the same, but that could deserve more testing.
> 

releng can push the builds to an ftp service where the production machine can pull them down.

> > 
> > some other questions:
> > 
> > * how often should this be run?
> 
> It depends on how up-to-date is the indexed tree meant to be, once a day
> sounds like a sane minimal threshold
> 
> > 
> > * which mozilla repositories should it be run against? (e.g.
> > mozilla-central, aurora, ...)
> 
> So far testing has happened against mozilla-central, I can't tell myself how
> worth is it to include aurora or other repos
> 
> > 
> > * you mention for clobber builds you have special instructions. what happens
> > if every build is a clobber?
> 
> It could perhaps take a bit longer to generate the information, I don't
> think that's actually the most time consuming step.
> 
> It is also worth mentioning that the dxr-index.py script is currently quite
> memory intensive at the time of creating the cross-reference maps in a big
> project like mozilla-central is, this should probably be taken in
> consideration if the indexing is going to happen in a machine that provides
> other services.

One question that came up previously: how should the update.sh script be called for both clobber and regular builds?
could you give us specific instructions for where to get / how to build cxx-clang?
Priority: -- → P3
(In reply to Chris AtLee [:catlee] from comment #1)
> best thing to do in the short term would be to get an hg mirror of this up
> and running

hg mirror is up here:

http://hg.mozilla.org/users/mdas_mozilla.com/dxr-mirror/
(In reply to Malini Das [:mdas] from comment #7)
> (In reply to Chris AtLee [:catlee] from comment #1)
> > best thing to do in the short term would be to get an hg mirror of this up
> > and running
> 
> hg mirror is up here:
> 
> http://hg.mozilla.org/users/mdas_mozilla.com/dxr-mirror/

*sigh*, scratch that. I'll need to get it hosted in a repo that can be pushed to by a non-Mozilla user, and I didn't know there was this restriction. I'll be automating the git->hg mirror on a separate machine that won't have my keys on it. Will update the bug accordingly.
(In reply to Malini Das [:mdas] from comment #5)
> 
> One question that came up previous is how should the update.sh script be
> called for both clobber and regular builds?

I've attached sample configurations for both builds; the paths within these could change depending on the server setup. The update script would be run like:

$ ./update.sh ./dxr.config mozilla-central
$ ./update.sh ./dxr.clobber.config mozilla-central
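
For the once-a-day cadence discussed above, a crontab entry along these lines should do; the path and time here are purely illustrative:

# hypothetical nightly (non-clobber) run at 01:00
0 1 * * * cd /builds/dxr && ./update.sh ./dxr.config mozilla-central >> update.log 2>&1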

(In reply to Chris AtLee [:catlee] from comment #6)
> could you give us specific instructions for where to get / how to build
> cxx-clang?

The cxx-clang plugin is automatically built by the setup-env.sh script, which is called by update.sh. Another option is doing:

$ cd xref-tools/cxx-clang
$ make

clang 3.0 and its development headers are necessary to compile this module.
I'm trying to get this running on my machine, and I hit this:
Building SQL...
Traceback (most recent call last):
  File "/home/catlee/mozilla/dxr/dxr-index.py", line 462, in <module>
    main(sys.argv[1:])
  File "/home/catlee/mozilla/dxr/dxr-index.py", line 459, in main
    parseconfig(configfile, doxref, dohtml, tree, debugfile)
  File "/home/catlee/mozilla/dxr/dxr-index.py", line 398, in parseconfig
    indextree(treecfg, doxref, dohtml, debugfile)
  File "/home/catlee/mozilla/dxr/dxr-index.py", line 268, in indextree
    builddb(treecfg, dbdir, tmproot)
  File "/home/catlee/mozilla/dxr/dxr-index.py", line 228, in builddb
    plugin.build_database(conn, srcdir, objdir, cache)
AttributeError: 'module' object has no attribute 'build_database'
Error: unable to open database "//home/catlee/public_html/dxr/dxr/mozilla-central/.dxr_xref/mozilla-central.sqlite": unable to open database file

any ideas?
cc'ing David Humphrey, one of the dxr devs
Malini (:mdas) can probably help out here, but she's busy with the B2G work week all this week, so her response time will likely be a little slow.
(In reply to Clint Talbert ( :ctalbert ) from comment #14)
> Malini (:mdas) can probably help out here, but she's busy with the B2G work
> week all this week, so her response time will likely be a little slow.

I haven't built DXR indices lately, but I haven't seen this issue before. Unfortunately, I won't have time to build and test during the work week, so your best bet is to ask on #static; garnacho (Carlos) should be around to help with the build.
Okay! Now that we have the user, repo, and automation machine all in order, we have daily mirroring from git->hg up at http://hg.mozilla.org/projects/dxr. The mirroring occurs at midnight PDT.
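
(For the curious, a git->hg mirror job of this kind can be set up roughly as follows; this is a sketch rather than the actual script, it assumes the hg convert extension with git support, and all paths are hypothetical. It would run from a midnight-PDT cron entry on the automation machine:

$ cd /home/dxrmirror/dxr-git && git fetch origin && git reset --hard origin/master
$ hg convert /home/dxrmirror/dxr-git /home/dxrmirror/dxr-hg
$ cd /home/dxrmirror/dxr-hg && hg push ssh://hg.mozilla.org/projects/dxr/
)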
This bug is raised to P1 per Ehsan's and mdas' request (on bug #759499).
Priority: P3 → P1
Oops, sorry, I didn't see that this was not a DXR bug.
Priority: P1 → P3
Depends on: 760158
Hey there, now that Bug 760158 is resolved, is there anything else blocking this bug?
(In reply to Malini Das [:mdas] from comment #19)
> Hey there, now that Bug 760158 is resolved, is there anything else blocking
> this bug?

Not as far as we know.
In that case, :catlee, do we have an estimate on when we can get these builds?
FWIW, I'm setting up a local DXR instance myself, and I fixed a number of issues I encountered and pushed to the github repo.  If you see any problems while working on this, please ping me and I'll try to fix them as soon as possible.
Assignee: nobody → catlee
Whiteboard: [project] → [dxr]
So we had an impromptu meeting and we came up with a list of action items:

* We need a build script that works within a single directory. This will be worked on by catlee and the DXR team.
* We will build against both the tip DXR code and a stable DXR release. We need to create a branch and hg repo for the stable code. I'll be doing this; related to Bug 771614.
* We need better tests/verification of DXR builds. The DXR team needs to address this, but I'll need to help with the test framework so that tbpl/autolog can process error messages correctly.
Okay, so I've been trying to get DXR running on a CentOS 6.0 VM. Anyways, to sum up:
1) I needed to build llvm and clang for the specific machine; otherwise it wouldn't find libstdc++.
2) CentOS 6.0 comes with sqlite 3.6.20; we need 3.7.4 if not newer (I've only tested 3.7.9).
You can test the python-sqlite version with "python -c 'import sqlite3; print sqlite3.sqlite_version'", but it's probably dynamically linked against the system sqlite.

I've already messed around with compiling pysqlite and statically linking against the newest sqlite, but that introduced other problems. They might be fixable, but I'm not sure this is a road we want to go down; we'll just end up maintaining a horrible mess of binary tarballs.

So I wonder if we have infrastructure for running VMs?
That's probably the easiest and safest solution.
Do we have to use CentOS 6 in order to be able to use the existing builder pool?
Okay, so I finally got sqlite, pysqlite, llvm and clang to install in a prefix and use libraries from prefix.
I've packed sources, config files, and build scripts with prebuilt binary dependencies into a tarball.
And if you ask me nicely I'll give you a copy of this tarball too, or just send me your IP and password and I'll scp it to you :)
(I assume someone will be less happy if I try attaching a 3GB tarball to bugzilla)

The "deploy-instructions.mkd" should be all you need to know to build mozilla-central with DXR.

I've used the package to build mozilla-central on a CentOS 6.0 VM with 5GB of memory; it took about 24 hours, so I sincerely hope my quick-n'-dirty VM setup wasn't optimal :)

With regard to the webserver: if it has python-sqlite with sqlite >= 3.7.4, that should work; if not, I hope mdas will let me know and I'll set up a slightly smaller tarball for the webserver.
Please note that mod_rewrite will be needed on the webserver and that the tarball the build server produces contains a .htaccess file.

My apache-site config for dxr looks like this:
<VirtualHost *:80>
	DocumentRoot /var/www
	AddHandler cgi-script .cgi
	<Directory /var/www>
		Options Indexes FollowSymLinks MultiViews
		Options +ExecCGI
		AllowOverride None
		Order allow,deny
		allow from all
	</Directory>
</VirtualHost>
Quick update, Lanedo is taking their servers offline by the end of the month, so we should get something online soon.

I've deleted object files from the build env, so it's only 1GB... 
It can be downloaded from my S3 account, here:
https://s3.amazonaws.com/mozilla-stuff/dxr-build-env-no-objs.tar.gz
(See deploy-instructions.mkd for more information).
The version of sqlite on the server is 3.3.6, but I'd like to get confirmation that no other scripts running on the dxr machine require sqlite 3.3.6 before it gets upgraded.

Ehsan, do you know if the dxr host machine is being used for any other purpose, or know anyone else who could be using the machine?
(In reply to comment #28)
> Ehsan, do you know if the dxr host machine is being used for any other purpose,
> or know anyone else who could be using the machine?

As mentioned on IRC, it seems that I no longer have access to this machine.  IT should be consulted here.
The most critical thing is actually python: if, as mdas said on IRC, it's running RHEL 5.5, it has python 2.4, which is quite old. It might work, but it might also be a pain in the ... to maintain 2.4 compatibility.

Installing sqlite and pysqlite in a prefix bypassing system sqlite is something we can do. The only minor hurdle here is to ensure that apache provides the correct PYTHONPATH variable for cgi scripts.
Other than that, I've found the magic parameters that make pysqlite load the sqlite library from the prefix install.
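
For illustration, one way to do that is a couple of SetEnv lines (mod_env) in the vhost/Directory config shown earlier; the prefix path below is hypothetical and depends on where the build-env is installed:

	SetEnv PYTHONPATH /opt/dxr/prefix/lib/python2.6/site-packages
	SetEnv LD_LIBRARY_PATH /opt/dxr/prefix/lib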
Taras suggests that it's not obvious, in which case I'll sum up:
https://s3.amazonaws.com/mozilla-stuff/dxr-build-env-no-objs.tar.gz
is ready for deployment on the build servers. So pull it in, and replace the dxr and mozilla-central checkouts with something available from the build servers.
Oh, the hg mirror of dxr is:
http://hg.mozilla.org/projects/dxr/

I'm not familiar with the dark arts required to access this from the build servers, but have been told that should be possible.

@catlee, please let me know if anything is blocking this bug. We need to get automated builds online.
Nothing blocking except man-hours :\

I've taken the liberty of stripping the symbols from the binaries and libraries, as well as removing the source tarballs, from the build env tarball above. This brings it down to ~60MB compressed.

How long should this be expected to run for? A test run I did didn't finish overnight, so I'm wondering if I have it misconfigured somehow.
My test run in a CentOS 6 VM took about 24 hours... they're usually a little faster outside the VM, but it depends on the hardware...

Makes sense to remove symbols and libs if you care about size; I kept the sources in there because the build servers don't have internet access, and when moving between different machines it can be necessary to rebuild as system libs change.

Anyways, glad to hear that the test didn't crash :)

I hope to bring down the build time in the future, but no promises.
(For the time being I'm mainly messing around in the frontend)
@Chris, can we access people.mozilla.com from build servers?

If so, I'll push the source tars there and remove them from the build env.
By the way, if the build servers have multiple cores (they probably do) and aren't using them for anything else while doing DXR builds, the DXR indexing build can apparently run in parallel.
(I just tested it with the build env last night; anyways, I'll update the makefile to take a parameter for the number of jobs (it won't be -j though, minor technicality).)

Also, I noticed that the git checkout of llvm/clang defaults to no optimizations; I think enabling that configure flag might help too :)
(In reply to comment #35)
> Also I noticed that the git checkout of llvm/clang defaults to no
> optimizations, I think enabling that configure flag might help too :)

If you do your own clang builds, you want to configure llvm with --enable-optimized --disable-assertions.  That basically gives you a build equivalent to their release builds.
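
For reference, a release-style llvm/clang build with those flags looks roughly like this (the prefix path is just an example; the build-env uses its own prefix):

$ cd llvm
$ ./configure --prefix=/opt/dxr/prefix --enable-optimized --disable-assertions
$ make && make install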
New build env without sources, symbols, asserts and optimizations enabled...
http://people.mozilla.com/~jojensen/dxr-build-env-r2.tar.gz (79M)

I've activated parallel builds; the makefile will use 12 jobs by default, and you can change this using the NB_JOB variable. This time it only took some 3 hours to build in my 4-core VM.

I've also merged all the prefix installs into one directory, so the whole thing's a little cleaner :)

Please test this, and let me know if you run into any issues. I'm just about to have access to the webserver, so I'll need an ftp location to fetch automated builds from.
(In reply to comment #37)
> I've activated parallel builds, makefile will use 12 jobs by default, you can
> change this using NB_JOB variable. This time it only took some 3 hours to build
> in my 4 core VM.

I usually use 2*(number of cores) as a starting value for the number of jobs to parallelize, and I sometimes end up cutting the factor down to 1.5.  If you have way more jobs than you have cores, you'll end up wasting CPU cycles on context switching, etc., and if you have too few jobs, you'll end up with idle CPU cores.  If you have time, it would be interesting to play with the number of jobs and try to pick an optimum value (but it's OK if you want to focus on other stuff; consider this an insider tip ;-)
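
For concreteness, one way to derive that starting value on the build machine itself (using the job-count variable from comment 37; sketch only):

$ CORES=$(grep -c ^processor /proc/cpuinfo)
$ make NB_JOB=$((2 * CORES))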
Considering that my hardware is different from the build servers, I seriously doubt it makes sense to try and guess a number that is likely very dependent on hardware and OS configuration.

But yes, could be fun to let my PC build central with 10 different values.
Anyways, I think refactoring the plugin architecture and running HTML generation in parallel will give a lot more.
(but there are some hacks to clean up before that happens).
(In reply to comment #39)
> Considering that my hardware is different from the build servers, I seriously
> doubt it makes sense to try and guess a number that is likely very dependent on
> hardware and OS configuration.

Oh, sorry, I meant testing on the actual machines that the build is supposed to run on.  :-)  Other people can adjust NB_JOBS to their heart's content.

> But yes, could be fun to let my PC build central with 10 different values.
> Anyways, I think refactoring the plugin architecture and running HTML
> generation in parallel will give a lot more.
> (but there's some hacks to clean up before that happens).

Agreed.
Okay, so catlee said on IRC that he had some problems with an sqlite version mismatch... I suppose this could be an issue with moving the build-env around when I use an absolute rpath for pysqlite.

I would strongly recommend that we delete prefix/ (make dist-clean) and build/install the dependencies (make) whenever we move the build-env around, as this is probably the most robust thing. I had problems getting clang to find libstdc++ if it wasn't built on the specific machine (i.e. moving from VM to desktop).
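
In practice that means something like this after unpacking the tarball on a new machine (directory name illustrative):

$ tar xzf dxr-build-env-r3.tar.gz && cd dxr-build-env
$ make dist-clean   # drop the prebuilt prefix/
$ make              # rebuild sqlite, pysqlite, llvm/clang against this machine's system libs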

Anyways, I've built a new build-env using a relative rpath for pysqlite; you can find it here:
http://people.mozilla.com/~jojensen/dxr-build-env-r3.tar.gz
According to ldd, it should find the right sqlite library.
You can test this by doing "ldd prefix/lib/python*/site-packages/pysqlite2/_sqlite.so" and checking whether it finds sqlite3.so somewhere in /usr/..., which would be bad :)
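
For example (only the libsqlite3 line matters):

$ ldd prefix/lib/python*/site-packages/pysqlite2/_sqlite.so | grep sqlite3
# good: libsqlite3 resolves to something under prefix/lib/
# bad:  libsqlite3 resolves to /usr/lib or /usr/lib64 (the system sqlite)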

@catlee: If you made any changes to ./build.sh, PKG_CONFIG_PATH, or anything else that I need to put back into dxr-build-env, please put them somewhere and drop me a link...
Okay, so in the interest of keeping this thread up to date: I pushed another revision of dxr-build-env containing the PKG_CONFIG hack yesterday; get it here:
http://people.mozilla.com/~jojensen/dxr-build-env-r4.tar.gz
Again, I recommend rebuilding the dependencies on the build machine; the overhead is fairly small (6 min on 8 cores). At least, I need to do this when moving between different machines.
(deployment-instructions.mkd and )

@catlee, you mentioned problems with sqlite, please post here if they're still present in r4.
Also you are using CentOS 6.0, right? That's what I've tested them on.
By all means let me know if there's anything I can do to speed things up; Lanedo servers are going offline soon, Tuesday or so!
(And I do need at least a day, or at least a night, to set up a cron job and test that the deployment works properly.)
s/(deployment-instructions.mkd and )/(deployment-instructions.mkd and makefile, contained in the dxr-build-env should ease the pain of rebuild dependencies considerably).
Depends on: 778763
Okay, so I got access to a build server; I guess we can close #778763.

I hacked a little more; apparently LD_LIBRARY_PATH, an old python version with development headers, and a rebuild of all dependencies in place on the target machine did the trick.

I strongly recommend that we do a clean build every time, ie. not reuse prebuilt dependencies. This way, we're sure that what I just did can be reproduced!

Let's worry about optimizing this later, I think I'm coming up with significant improvements to the backend, which should make deployment easier and more stable. So let's not waste time on details now.

I've attached the build script as I've hacked it, all sources are included in:
http://people.mozilla.com/~jojensen/dxr-build-env-r7.tar.gz
which is the revision I used to do a successful build on the server.

Note I briefly tested the build on dxr.allizom.org (it's no longer there).
Anyways, to sum up, I need the contents of www/ packaged into a tarball and uploaded to a place where I can fetch it from dxr.allizom.org.
If we can give each tarball a sequence number, or whatever, that would be great :)
(Note, there's no need to cache multiple versions of build on the intermediate server).
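
For illustration, the packaging step on the build side could be as simple as this (the file name and sequence-number scheme are hypothetical, e.g. a date stamp or the buildbot build number):

$ N=$(date +%Y%m%d)
$ tar czf dxr-mozilla-central-$N.tar.gz -C www .
# then upload dxr-mozilla-central-$N.tar.gz to the agreed ftp/stage location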

Anyways, catlee, please test this under buildbot. It would be great to have automated builds by sunset tomorrow :)
Attachment #647561 - Flags: review?(rail)
Attachment #647561 - Flags: review?(jhopkins)
@catlee, if there are any issues here, ping me or hook me into a server again, and I'll fix them - I hope :)
Attachment #647561 - Attachment is obsolete: true
Attachment #648011 - Flags: review?(rail)
Attachment #648011 - Flags: review?(jhopkins)
Attachment #648016 - Flags: review?(rail)
Attachment #648017 - Flags: review?(rail) → review+
Attachment #648016 - Flags: review?(rail) → review+
Attachment #648011 - Flags: review?(rail) → review+
Attachment #648011 - Flags: review?(jhopkins) → checked-in+
Attachment #648016 - Flags: checked-in+
Attachment #648017 - Flags: checked-in+
This made it into production yesterday afternoon.
As we discussed last week, we try to upload dxr logs to /home/ftp/pub/dxr on stage but this fails because that directory doesn't exist. We end up with nagios alerts like
 buildbot-master12.build.scl1:Command Queue is CRITICAL: 1 dead item
and don't get to add statusdb records or send pulse messages.

The actual indexes go into /home/ftp/pub/firefox/dxr. Did we decide to pick pub/dxr for both, or something else?
I think the best solution is to use /home/ftp/pub/dxr for both indexes and logs.
Depends on: 781412
I adjusted the upload path slightly. Starting tonight the indexes should be available at
http://ftp.mozilla.org/pub/mozilla.org/dxr/dxr-mozilla-central.tar.gz
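
So the webserver side can pull the nightly index with something like this (target paths on the webserver are illustrative):

$ wget -q http://ftp.mozilla.org/pub/mozilla.org/dxr/dxr-mozilla-central.tar.gz -O /tmp/dxr-mozilla-central.tar.gz
$ tar xzf /tmp/dxr-mozilla-central.tar.gz -C /var/www/dxr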
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Depends on: 803357
Depends on: 803530
Product: mozilla.org → Release Engineering
Component: General Automation → General