Closed Bug 648676 Opened 13 years ago Closed 8 years ago

[Tracking bug] build and test NSS, NSPR on supported infrastructure

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: KaiE, Unassigned)

References

(Blocks 2 open bugs)

Details

Gerv, when we met in February, you made the kind proposal that Mozilla could look into providing computers that run Tinderbox builds for the NSPR/NSS libraries.

We'd be very glad to accept such an offer.

It would be helpful if Mozilla provided one machine for each platform it cares about: Linux, Mac, and Windows. Maybe a mobile machine would be reasonable, too?

At any given time, we maintain at most two branches of NSS, the trunk and the stable branch.

http://tinderbox.mozilla.org/showbuilds.cgi?tree=NSS
http://tinderbox.mozilla.org/showbuilds.cgi?tree=NSS-Stable-Branch

IMHO we can easily build both branches on a single machine by alternating between them, so one machine per platform should be sufficient.
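A minimal sketch of the alternating schedule described above, assuming a single build machine and two branches labelled `trunk` and `stable`. These labels are hypothetical placeholders, not the actual NSS CVS branch tags, and the build step is only printed, not executed:

```python
from itertools import cycle, islice

# Hypothetical branch labels for illustration; the real NSS CVS branch
# tags are not specified in this bug.
BRANCHES = ["trunk", "stable"]


def build_order(branches, cycles):
    """Return the branches one machine would build, in order,
    alternating round-robin for the given number of build cycles."""
    return list(islice(cycle(branches), cycles))


if __name__ == "__main__":
    for n, branch in enumerate(build_order(BRANCHES, 4), start=1):
        # A real build client would check out `branch` here and run the
        # NSS build and test scripts; this sketch only prints the schedule.
        print(f"cycle {n}: build {branch}")
```

Because each branch simply gets every other slot, adding a third entry to `BRANCHES` would still give each branch an equal share of one machine.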


I think it would probably be best if Mozilla engineering provided machines that use the same operating system environment as, e.g., the Firefox build machines.

If Mozilla gave the NSS team access to an ordinary user account on each machine, that ought to be sufficient.

(We could discuss separately whether we should have root permissions, in order to set up automatic startup after reboots.)


I'm currently the sheriff for the NSS tinderbox, and I have already managed tinderbox builds for several platforms, so I would be able to set up the NSS builds.

Thanks
Assignee: gerv → nobody
Component: Tinderbox Configuration → Release Engineering
QA Contact: coop → release
1) From comment#0 and the two pages on tinderbox server:
http://tinderbox.mozilla.org/showbuilds.cgi?tree=NSS
http://tinderbox.mozilla.org/showbuilds.cgi?tree=NSS-Stable-Branch
...it sounds like you already have tinderbox servers, and we no longer have any tinderbox clients, so I assume you are asking for build slaves. I've updated the summary.

2) Depending on what hardware Kai/Gerv buys, RelEng+IT can image the machines like the production RelEng machines. Once we hand the imaged machines over to you, and you start running them somewhere, then you would need to keep them in sync with any future changes we make to RelEng production systems. This approach means you would have root+user login privs on the machines. There are a few approaches to do this, more details if this is the path you intended.

3) An alternative approach is to modify the NSS and NSS-Stable-Branch trees to behave like Mozilla project branches. This would mean those builds+tests are run on the shared pool of RelEng machines, which are all maintained by RelEng. RelEng would continue to be responsible for maintaining and updating the tools installed on all these machines. Note: you would not have any access to these machines, except under limited per-request basis asking us to pull machines from production. If this approach sounds interesting, we'll need to meet to discuss security and permissions prerequisites first.
Summary: Could mozilla.org provide managed Tinderbox servers for NSS + NSPR ? → Could mozilla.org provide managed build slaves for NSS + NSPR ?
My view is that, as NSS is a proper Mozilla project, we should aim to be running NSS tinderboxen on the shared Mozilla infrastructure (i.e. John's option 3). However, I don't know if the (reasonable, IMO) restrictions required by RelEng, such as the NSS team not having accounts on the box, cause problems for the NSS team for any reason. Kai?

Gerv
Assignee: nobody → joduinn
Thanks for your proposals. Regarding (1), I'm not sure I understand the difference between tinderbox clients and build slaves.

In my understanding, there is one central tinderbox server, and that's sufficient; we're not asking for another one.

We are asking for machines that run builds and send the results by email to Mozilla's single tinderbox server.



> 2) Depending on what hardware Kai/Gerv buys

It was the hope that Mozilla would buy/donate :)


> RelEng+IT can image the machines
> like the production RelEng machines. 

This sounds good to me, assuming that "production RelEng" is equivalent to "tinderbox build machine/slave".


> Once we hand the imaged machines over to
> you, and you start running them somewhere

Do you mean "hand over physically"?

We aren't asking to have hardware given to us.
We'd like to ask that Mozilla.org runs them in a Mozilla owned building.


> then you would need to keep them in
> sync with any future changes we make to RelEng production systems.

This seems tricky.

I'd say it's not strictly necessary that we're always in perfect sync.

Would the following work?

- Mozilla provides the machines
- Mozilla is administrator
- Mozilla provides a user account with non-administrator rights
- a NSS developer remotely sets up what's necessary to run the NSS builds

Maybe once a year (or even less frequently), we might ask to be upgraded to the latest RelEng images.

The NSS team would be responsible for backing up everything on that non-privileged user account.

Mozilla could simply erase the images and recreate the user account, and we would recreate the build setup.


> 3) An alternative approach is to modify the NSS and NSS-Stable-Branch trees to
> behave like Mozilla project branches.

I don't know if this can work. We probably have different scripts, so we'd have to come up with something to make it compatible. But this might be worth looking into.

Note that NSPR/NSS still uses CVS.



> This would mean those builds+tests are
> run on the shared pool of RelEng machines, which are all maintained by RelEng.

This would be fine, if we can manage to make that work.

What would our interface look like?
You check out from a CVS branch and start a script checked out from there?


> RelEng would continue to be responsible for maintaining and updating the tools
> installed on all these machines. Note: you would not have any access to these
> machines, except under limited per-request basis asking us to pull machines
> from production. If this approach sounds interesting, we'll need to meet to
> discuss security and permissions prerequisites first.

If things are fully automated, and we are able to influence the build script, that would be fine, I think.
Note that any meeting would have to be a phone meeting, because I live in Europe.
Would a setup similar to the nanojit project work? https://bugzilla.mozilla.org/show_bug.cgi?id=506404
Kai, we don't use tinderbox at all anymore - that may be a big part of the confusion here.  We use a system called Buildbot now, which feeds results through the empty husk of a tinderbox server and into the tinderbox push log (tbpl.mozilla.org).  Rather than devote slaves to particular purposes, all of our slaves do all sorts of builds right now.

Option (3) would be a lot of work - we don't use CVS, and it would take a while to make your scripts work in buildbot.

I can't tell from comment 2, but it sounds like you already have a tinderbox server, or at least you're able to set one up?

If so, it sounds like (2) is the best option, perhaps summarized best as "Mozilla buys, hosts, and puts a basic image on N machines, and you can do what you like with them".  We can of course do remote-hands kinds of things, reimage, etc. as necessary.  I don't know that any/all of that is possible -- I'm just trying to disambiguate :)
(In reply to comment #6)
> 
> I can't tell from comment 2, but it sounds like you already have a tinderbox
> server, or at least you're able to set one up?

I'm referring to http://tinderbox.mozilla.org
In my understanding, that is a tinderbox server, and that's what we currently use.
> I can't tell from comment 2, but it sounds like you already have a tinderbox
> server, or at least you're able to set one up?

We are using the mozilla tinderbox 'server' to push result logs from our builds. The physical machines that do the builds and push the result logs are located at Oracle and Red Hat. They are driven by scripts which are checked into mozilla/security/tinderlight.

In general, most of the nss team do not mess with the scripts themselves (actually even the release team seldom needs to make changes to them). Those scripts simply pull NSS and use the NSS makefiles to build it and the NSS test scripts to test it.

bob
I think there is rather a lot of talking past each other going on in this bug, because the NSS/NSPR teams are using vocabulary which was current, and making assumptions about the system which were true 5 years ago, but are not true now, and the RelEng team 

NSS and NSPR are both core dependencies of Firefox, and fully-fledged Mozilla projects. It benefits us greatly for them to have decent nightly build and cross-OS test coverage. We don't, as far as I know, ask the developers of other Mozilla projects to admin their own build slaves, and so we should be working on a solution which doesn't require that here.

Can Bob, Kai, and someone from RelEng get on the phone and first of all, make sure you are all using the same language and each side has a good understanding of how the other side is working in 2011, and then use that shared understanding to work out a path forward?

Gerv
(In reply to comment #9)
> NSS and NSPR are both core dependencies of Firefox, and fully-fledged Mozilla
> projects. It benefits us greatly for them to have decent nightly build and
> cross-OS test coverage. We don't, as far as I know, ask the developers of other
> Mozilla projects to admin their own build slaves, and so we should be working
> on a solution which doesn't require that here.

Yes, this is more akin to the recent SpiderMonkey-only builds we got going than anything else.
I agree with Gerv that it would be good to get on the phone.

Sorry, I didn't have time yet to move forward on this, I had hoped for a simple solution :)

Bug 654468 is a good example of why this bug makes sense: we have a build failure only on OS X 64-bit with the latest NSPR/NSS, a platform we don't cover yet.
What we're doing for nanojit is:
* Polling a separate repo (http://hg.mozilla.org/projects/nanojit-central/) for changes
* Running builds and tests during idle time on our regular build infrastructure. The build is a simple script: http://hg.mozilla.org/build/tools/file/10deea32888a/scripts/nanojit/nanojit.sh

Kai, would something like this work for you?
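To make the "poll a repo, build during idle time" flow concrete, here is a small simulation. It is a hypothetical sketch, not the actual Mozilla buildbot or `nanojit.sh` logic: the `schedule` function and its `(tip_revision, machine_idle)` poll records are invented for illustration, and it assumes that when several changes land while the machine is busy, only the newest one is built (coalescing).

```python
def schedule(observations):
    """Simulate a poller feeding a single build machine.

    observations: iterable of (tip_revision, machine_idle) pairs, one per poll.
    Returns the revisions that actually get built, in order.
    """
    last_seen = None  # tip revision seen on the previous poll
    pending = None    # newest unbuilt revision, if any
    built = []
    for tip, idle in observations:
        if tip != last_seen:
            # New change detected; it replaces any older unbuilt revision
            # (coalescing), so only the newest change waits for a build.
            pending = tip
            last_seen = tip
        if idle and pending is not None:
            # Machine has spare capacity: build the pending revision.
            built.append(pending)
            pending = None
    return built


if __name__ == "__main__":
    polls = [("a", False), ("a", False), ("b", False),
             ("b", True), ("b", True), ("c", True)]
    print(schedule(polls))  # ['b', 'c']
```

In this run revision `a` is never built because `b` arrives before the machine becomes idle. Coalescing like this keeps idle-time builds from piling up behind busy periods, at the cost of not testing every intermediate revision.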
(In reply to comment #11)
> I agree with Gerv that it would be good to get on the phone.

I'm happy to set up a meeting, if people think it would help. If so, who specifically should be at this meeting to describe the requirements and agree to decisions?

Drop me an email with your phone/contact info, and timezones, I'll set it up.

I'll loop back here, and summarize after we meet.
found during triage; setting up meeting w/kai, bsmith, myself. Anyone else interested in attending, please email me offline.
Hi. I apologize that I haven't updated this in quite a while. I wonder if we could have the meeting next Wednesday or Thursday while I'll be in MV? Thx.
Summary: Could mozilla.org provide managed build slaves for NSS + NSPR ? → Could mozilla.org provide managed (tinderbox) build slaves for NSS + NSPR ?
(In reply to Kai Engert (:kaie) from comment #15)
> Hi. I apologize that I haven't updated this in quite a while. I wonder if we
> could have the meeting next Wednesday or Thursday while I'll be in MV? Thx.

mtg booked for 5-6pm tmrw w/kai, bsmith, myself.
Assignee: joduinn → nobody
A short-term request, based on my understanding of the requirements:

- NSPR/NSS developers continue to use CVS

- keep tinderbox.mozilla.org and bonsai.mozilla.org usable

- one (ideally two) computers per operating system that Mozilla supports, e.g.:
  - Android
  - Mac
  - Windows 7
  - older Windows?
  - Linux (lower priority because Red Hat can contribute Linux machines,
           but they are behind a firewall)

  If we have 64 bit machines, we can test 32/64 in alternating cycles.

- it would be great if we had at least one machine that uses a "big endian" CPU

- all machines should be equipped with OS and basic development environment
  (compiler, bash)

- remote access for the NSS team; ssh is sufficient for most platforms,
  only Windows probably requires GUI access (rdesktop or VNC)


Based on the above, we (maybe mostly I) will set up the tests on all those machines and document how to set them up on a public wiki (wiki.mozilla.org).

Setting up Android will probably be a challenge. We might want to have a chat to discuss what can be done here.

The good news is, NSS tests have no GUI, they are limited to console scripts and networking.
 

Near term future:

Once the above is set up, we can work to expand the testing by running additional (and odd) test jobs, which are currently disabled due to resource constraints.


Future (this should be covered in a future, separate bug):

Consider a transition project (such as migrating to the RelEng test infrastructure), which would require a major rewrite of the NSS testing scripts.

The NSS developers have a strong preference for staying with CVS for primary development.

However, NSS usually maintains only two branches (stable branch and development trunk).
I believe this should make it possible to set up some sort of automatic mirroring from CVS to HG, in order to make things easier for transition projects.
John, it looks like you're driving this for the time being.
Assignee: nobody → joduinn
What is the next step?
(In reply to matthew zeier [:mrz] from comment #19)
> What is the next step?

Let's start with the first step, and let's postpone all the additional proposals mentioned in this bug.

The first step is what I said in the initial comment in this bug,
but I'll summarize again:


- Mozilla should provide 2 machines
  for each primary platform that the Mozilla project supports:

  - Windows (multiple variants?)
  - Mac
  - Linux
  - Android

- Windows and Mac are highest priority, because our current coverage is bad.

- Linux has a lower priority, because Red Hat already provides Linux systems.

  However, it will be good to get Linux machines, too,
  because they will allow us to run additional sets of tests
  (such as memory leak tests, currently run on only one Sun Legacy machine)

- Android would be nice, if you can find a way to give me a remote ssh login
  with a compiler installed, but I understand that's trickier.

- Mozilla should run these machines somewhere in a Mozilla facility

- Mozilla IT should install base operating system, 
  and install the usual development environment (compiler).

- Mozilla should give remote access to these machines to the NSS team
  (let's start with myself)

- I will set up the tests on those machines, and while I do so,
  I will write wiki documentation of what I did.


The above would solve our initial needs,
and furthermore,
all of the above is a precondition for any future migration work,
and for any potential future community administration
(I would be the initial community administrator).
Depends on: 748476
Depends on: 754908
Depends on: 754974
Depends on: 748478
Depends on: 826751
found in triage of all RelEng tinderbox+NSS/NSPR bugs. morphing to match current reality.
Summary: Could mozilla.org provide managed (tinderbox) build slaves for NSS + NSPR ? → [Tracking bug] build and test NSS, NSPR on supported infrastructure
Depends on: 843380
Depends on: 843381
(In reply to John O'Duinn [:joduinn] from comment #21)
> found in triage of all RelEng tinderbox+NSS/NSPR bugs. morphing to match
> current reality.

As part of the bug-cleanup, this was supposed to be assigned to coop, who is driving this.
Assignee: joduinn → coop
No longer depends on: 843380, 843381
per email on 28mar from kai and bsmith, NSS no longer use tinderbox.m.o. 

Leaving this bug open to track remaining cleanup work.
Can someone help me and summarize the ask of Mozilla IT?

Is this something that can be handled with an EC2 instance?
(In reply to matthew zeier [:mrz] from comment #24)
> Can someone help me and summarize the ask of Mozilla IT?
> 
> Is this something that can be handled with an EC2 instance?

NSS would like to hand off management of their buildbot master to Mozilla, in the same way that we used to manage the tinderbox server for them. We (releng) have just started getting our own masters set up in AWS, so yes, that's what makes the most sense to me.

I'm uncertain how the current NSS buildbot master is set up in terms of whether they are using a separate scheduler and database. Kai: can you provide details here on what a Mozilla-(or AWS-)hosted master needs in terms of parity?

There's probably nothing to do here for IT if we go with a master in AWS. Releng can set up a new master in AWS outside of the build network and work with Kai to get it running.
(In reply to Chris Cooper [:coop] from comment #25)
> 
> I'm uncertain how the current NSS buildbot master is set up in terms of
> whether they are using a separate scheduler and database. Kai: can you
> provide details here on what a Mozilla-(or AWS-)hosted master needs in terms
> of parity?

I don't understand the question regarding databases. I'm currently using a standard buildbot 0.8.7p1 with a configuration file, and I adjusted the waterfall html template to show some additional static information. My config file has a few schedulers, and uses HgPoller, but I don't remember having set up any separate database.
(In reply to Kai Engert (:kaie) from comment #26) 
> I don't understand the question regarding databases. I'm currently using a
> standard buildbot 0.8.7p1 with a configuration file, and I adjusted the
> waterfall html template to show some additional static information. My
> config file has a few schedulers, and uses HgPoller, but I don't remember
> having set up any separate database.

Mozilla uses a buildbot variant that stores scheduling info and status in separate dbs. The scheduler component also runs on a separate VM. 

If everything for your NSS master is contained on a single machine, that makes it quite simple on our end. I'll talk with the releng folks who know more about setting up new AWS instance types and get back to you.
(In reply to Chris Cooper [:coop] from comment #27)
> 
> If everything for your NSS master is contained on a single machine,

Yes it is. In addition to buildbot I run stunnel, to enable secure connections from the outside. I assume an "AWS instance" is a Linux VM?
Product: mozilla.org → Release Engineering
Kai: Sorry for the delay. I finally have some info for you about Mozilla sponsorship of NSS in AWS.

We now have the ability to link (and pay for) individual user accounts under a larger umbrella Amazon account for Release Engineering. If you still want to, you can setup a user account on AWS and create some instances for NSS use. Just provide the email you use to sign up with AWS to joduinn and he'll get you added to the umbrella account. We'll get itemized reports of usage for each billing period on a per-user basis.

In terms of numbers of instances for NSS, Mozilla can easily absorb the cost of up to 10 instances. If you need more than that, we'll need to talk more. No stipulation on what you use the instances for, could be anything that NSS needs: buildbot master, buildslave, etc.

I should note that (other than the billing) these instances would be completely separate from Mozilla's in terms of networking and such.
Flags: needinfo?(kaie)
Chris: Thanks a lot for following up. I've presented your offer to the NSS developers, and the option of running our buildbot master and potentially additional services on infrastructure provided by Mozilla was appreciated.

I've set up an account and I'll send an email to joduinn and cc you.
Flags: needinfo?(kaie)
The releng part of this is done. Un-assigning, but leaving open to track the dependent bugs.
Assignee: coop → nobody
Priority: -- → P3
coop tells me this should not block tinderbox-death!
No longer blocks: tinderbox-death
Kai: I'm not certain what the current status is here. Has this bug in fact been resolved to the satisfaction of the NSS team? Is NSS still using Tinderbox?

Gerv
NSS no longer uses tinderbox.

We have received a few VMs that we have been using for more than a year already.

The very latest update is, since last week, the only Windows VM that we received from Mozilla has started to crash with system errors (haven't filed a bug yet).
(In reply to Kai Engert (:kaie) from comment #34)
> The very latest update is, since last week, the only Windows VM that we
> received from Mozilla has started to crash with system errors (haven't filed
> a bug yet).

now reported as bug 1006136
We have separate tracking bugs for changes to NSS builds, problems with Amazon VMs, etc. I think the scope of this bug was to provide machinery for building, which I think has now been done.

Kai, Gervase: are you happy for me to close this bug, and remove dependencies on bugs 799855 and 843372 (since I don't think these are true dependencies - but downstream issues).

Also moving to "Platform Support" component since it is about providing a build platform for NSS.
Component: Other → Platform Support
Flags: needinfo?(kaie)
Flags: needinfo?(gerv)
I have no opinion; it's up to Kai.

Gerv
Flags: needinfo?(gerv)
Pete, when you say "downstream", I assume that means the Firefox level (not NSS).

Bug 799855 suggests an enhancement at the Firefox build-process level, to get similar NSS test coverage during Firefox builds. I agree that's independent of this bug, which is about NSS testing during development (while bug 799855 is kind of "after" NSS development). I'm removing the dependency.

I'm reading bug 843372 for the first time. I cannot see why we would need to work on it at this time. I suggest resolving bug 843372 as invalid or similar. I'm removing the dependency.

Regarding this bug: yes, the original task of getting this work started is done. I'll leave it to you whether to keep it open for tracking links to current and future issues (such as bug 1006136) or to close it.
No longer depends on: 799855, 843372
Flags: needinfo?(kaie)
I think NSS is using Taskcluster now?
Flags: needinfo?(kaie)
QA Contact: coop
Yes, indeed.  2 years since the last comment & empty dependencies suggests there's no longer anything to track here :)
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(kaie)
Resolution: --- → FIXED
Well, Mozilla isn't testing NSS on Mac yet.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard