Closed Bug 900663 Opened 9 years ago Closed 9 years ago

Figure out how to deal with gaia-integration-test npm packages

Categories: Release Engineering :: General, defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: jgriffin; Assigned: wlach
Whiteboard: [b2g]

James Lal has gaia-integration-tests almost functional, although without many tests yet.  We should start planning how to get these running in TBPL.

The biggest unknown at this point is how to deal with the npm packages that these tests need.  Some of these packages are ours, in github, and we should probably clone these on the test slaves.

Others will be more-or-less static dependencies.  How have we dealt with npm packages in the past on our test slaves, or haven't we?
Being ignorant of this process (for the time being), isn't there a way to run some code to update the slaves outside of running tests or building gecko (so we can access the internet)? Once the base packages are downloaded, we probably don't need external access for anything else.

I would like to experiment with this on the test VM I have [now that the gaia piece has landed].
In discussion with lightsofapollo and gaye today, it looks like we'd like to use a separate directory on http://puppetagain.pub.build.mozilla.org/ (alongside /data/python/packages/) to store mirrors of npm packages.

Dustin, is some work needed to create a /data/npm/packages dir, or can this just be done when we're ready to upload there?
Use of that directory for more than the python packages used by PuppetAgain sort of grew organically, and has some serious flaws that I'd like to not reproduce for Node.  Chief among those is that lots of internal stuff looks to http://repos/data/python/packages for Python packages, which is not a URL that has any degree of reliability attached to it.  So when a host goes down for maintenance, all of the tests fail - bug 885780.

We also shouldn't have tests accessing something in *.pub.build.mozilla.org.  Like http://repos, it's single-hosted and sometimes goes down.  Its purpose is to provide a convenient way for external users to get access to the binary files associated with our puppet installation, so a half-hour's downtime here and there isn't considered critical.

Organically, that site has grown to also provide developers a way to know that they're using the same packages as the releng systems.

Internally, though, the correct solution is to train clients (build/test machines) to try multiple URLs when one is down, just like they normally do for yum or apt repos.  For Python, that's done with pip.conf.
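For reference, one way the pip.conf side can express multiple package sources looks roughly like this (the hostnames below are placeholders, not real mirrors):

  [global]
  index-url = http://pypi-mirror-a.example.mozilla.com/simple/
  extra-index-url = http://pypi-mirror-b.example.mozilla.com/simple/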

As such, rather than pattern this on the Python repo, I'd like to:
 (a) set this up in a fashion that we can serve it reliably to build/test hosts
 (b) set this up in a fashion that external users can expect it to work all of the time
 (c) have a plan to transition the Python stuff to something parallel

So, can you tell me more about how npm finds packages?  Is there an npm equivalent to pip.conf?  Does a pile of Node packages need to be anything more than a directory index?  Do we need to have some sort of node-specific index file?
Gaye and James are the npm experts here.
Flags: needinfo?(jlal)
Product: mozilla.org → Release Engineering
pinging James and Gareth for npm details needed per comment #3
So node has options to specify the server used for its package registry, which means we can easily point it at our own internal server. The tricky part is setting up the replica for our npm mirror. There is a good blog post overview here: http://www.idimmu.net/2013/06/20/how-to-create-an-npm-repository-mirror/
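For example, something like this (the hostname is just a placeholder):

  npm config set registry http://npm-mirror.example.mozilla.com/

or equivalently in ~/.npmrc:

  registry = http://npm-mirror.example.mozilla.com/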
Flags: needinfo?(jlal)
So in theory we can manually push things to our clone (that should be easy), but I think a much better option would be to automatically update a set of whitelisted packages when they are pushed to npm OR when gaia pushes a new package.json. I think this concept applies well to our pip situation, where we currently need to manually push packages.
> but I think a much better option would be to automatically update a set of 
> whitelisted packages when they are pushed to npm OR when gaia pushes a new 
> package.json.

I don't think we can use package.json as a mechanism for triggering this.  The reason is that changes to package.json will also kick off a test run, and the uploading of new packages could occur after the test run (since the two are not coupled), resulting in broken tests.

But, I think polling npm for new package versions and posting them on our mirror is probably not too difficult.  In the meantime, we can go with manual updates.
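A rough sketch of what that polling could look like (the mirror URL and package list here are hypothetical, and it assumes the mirror serves the same metadata JSON as the upstream registry):

  for pkg in marionette-js-runner mocha; do
    # compare the latest version upstream against what our mirror has
    upstream=`curl -s http://registry.npmjs.org/$pkg | python -c 'import json,sys; sys.stdout.write(json.load(sys.stdin)["dist-tags"]["latest"])'`
    mirrored=`curl -s http://<our-mirror>/$pkg | python -c 'import json,sys; sys.stdout.write(json.load(sys.stdin)["dist-tags"]["latest"])'`
    [ "$upstream" != "$mirrored" ] && echo "$pkg needs re-mirroring ($mirrored -> $upstream)"
  done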

Dustin, do you have enough information to set up the npm mirror?  It would be great to have this in place prior to the B2G work week on Sept 9.
Flags: needinfo?(dustin)
Not really - this doesn't address any of the issues in comment 3, and is mostly concerned with how to get updates, which is really secondary.

From the blog entry in comment 6, it looks like an NPM repository is, fundamentally, a public couchdb instance, which could be tricky (requiring a security review, at least, unless someone's done this here before).

And I don't see any suggestions about mirroring this in such a way that it is not a single point of failure.  I'm not familiar enough with couchdb to know how its clustering works, nor how or if the npm client can be configured to try multiple servers.

I'm not sure what you're suggesting in terms of giving access both to buildslaves (internally) and developers (externally), either.  Perhaps clustering could fix that, too, with some cluster nodes serving internal clients and some serving external clients?

To put this another way, I'd like to configure buildslaves with

  registry =
    http://someserver.scl3.mozilla.com:5984/blah
    http://someserver.scl1.mozilla.com:5984/blah
    http://someserver.usw2.mozilla.com:5984/blah
    http://someserver.use1.mozilla.com:5984/blah

so that a client can reach its nearest mirror, but fall back to other mirrors.  And I'd like developers' systems to be configured with

  registry =
    http://npmjs-a.pub.build.mozilla.org:5984/blah
    http://npmjs-b.pub.build.mozilla.org:5984/blah

with a similar find-the-working-server behavior.  Barring that functionality, we can put a load balancer in front of the public-facing registry without too much loss.  However, doing so for the buildslaves poses a significant risk since the load balancers are in one datacenter and loss or congestion on the links to that datacenter will then cause build failures and closed trees.  No good.

So that's the kind of functionality I need more info on.  If the npm client doesn't support this kind of behavior, can we hack it in?  Would round-robin DNS help (this still requires support in the client to be effective)?  How does couchdb's clustering work?  Can it operate across sites, and continue operating with one or more sites down?
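(The crudest client-side hack, sketched with the placeholder hostnames above, would be a retry loop around npm itself:

  # try each registry in turn until one of them works
  for reg in http://someserver.scl3.mozilla.com:5984/blah \
             http://someserver.usw2.mozilla.com:5984/blah; do
    npm --registry $reg install && break
  done

but that's a workaround, not a substitute for knowing what the client actually supports.)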
Flags: needinfo?(dustin)
I'll take a look at this next week, unless someone else wants to volunteer. It looks like the npm server protocol is rather simple; some people have already created their own servers which don't depend on couchdb:

http://blog.strongloop.com/deploy-a-private-npm-registry-without-couchdb-or-redis/
Assignee: nobody → wlachance
Ok, so I did a bunch of research on this topic today (put my working notes in https://etherpad.mozilla.org/npm-mirror). So a few things:

1. No, npm does not currently support multiple repositories directly. It looks like the maintainer initially told people that he would not allow that (https://github.com/isaacs/npm/issues/100) and then changed his mind (https://github.com/isaacs/npm/issues/1401). However, there is no progress on this issue, so there's not much we can really do. I suppose we could try calling "npm" multiple times with a different server configured if it failed because one of them died.

2. CouchDB is indeed a bit of a beast, and mirroring everything on npm.org (which is non-optional AFAICT) seems like serious overkill for what we're trying to do. To be honest I didn't look that closely at it, because I was excited to find a very simple replacement server called "reggie" (https://github.com/mbrevoort/node-reggie) which seems to be all that we'd really need for the purposes of running gaia-integration-test. The procedure would look something like this:

* With a fresh install of node in a clean directory, install all the dependencies needed to run the gaia integration tests.
* Then run this command (broken across lines here for readability) to upload each cached package tarball to the reggie server:

    for f in `find ~/.npm -name 'package.tgz'`; do
      # pull <name>/<version> out of ~/.npm/<name>/<version>/package.tgz
      NAME_VER=`echo $f | sed 's/^.*\/\([^\/]*\)\/\([^\/]*\)\/package.tgz/\1\/\2/'`
      echo $NAME_VER
      curl -T $f http://<reggie-server>/package/$NAME_VER
    done

After you've done that, you should be good to re-install all the required dependencies just using <reggie-server>.
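For example (assuming reggie serves the registry read API at its root):

  npm --registry http://<reggie-server> install marionette-js-runner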

If we wanted to make a public mirror of the content on the private server, I imagine we could probably just copy the files over to a public-facing machine.

The main problem is that "publishing" a package currently has no authentication associated with it, so anyone with access to the reggie server could publish a new version. We probably need to fix that; I'm not sure how difficult that would be, but I'd guess not that hard. The source code to reggie is pretty trivial:

https://github.com/mbrevoort/node-reggie/blob/master/server.js

(heck, we could even rewrite the thing in python without that much trouble if we really wanted to)

What do you think Dustin? Does using a modified version of reggie seem like a viable way forward, or do we need to look more closely at couchdb, or ... ?
Flags: needinfo?(dustin)
Ok, I thought about this a bit more. There are two things we could potentially do to streamline this as well as make it way more secure:

1. For the deployed version of the reggie server, we could just turn off all functionality that lets users upload new packages by setting a boolean configuration parameter instead of going to the trouble of setting up "real" authentication. Not sure why I didn't think of this earlier. :)

2. Perhaps even better, I just did an analysis of the read-only API that reggie provides, and I see no reason we could not just mirror it with a static directory structure served by something like nginx or apache (sketched at the end of this comment). We'd just need to write a quick script in node or python or whatever to write all the key JSON information and files out in the appropriate format. In that case, we could easily provide the load-balancing architecture that Dustin asked for.

I am tempted to go forward with (2), if that sounds acceptable to people.
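To make (2) concrete, the static tree could look roughly like this (the layout below is a guess; the only firm requirements so far are a per-package metadata document plus the tarballs it points at, and the version number shown is made up):

  mirror/
    marionette-js-runner/
      index.json                        <- package metadata (versions, dist-tags, tarball URLs)
      marionette-js-runner-1.2.3.tgz    <- tarballs referenced from index.json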
(2) sounds very attractive to me given the ease of automating that... we could then build a second service on top of this in the future that polls the list of packages we care about and updates the dir. This is assuming that the static dir structure can handle multiple versions of the same package.
#2 sounds great.  We can set up an admin system that can either run a full version of reggie with LDAP and HTTP auth in front of it, or at least allow users to upload.  That admin system can also do the polling.

With a static service like that, we can build enough resiliency on the server side (multiple webservers, multiple HA load balancers) that I'm comfortable with a client that does not retry.
Flags: needinfo?(dustin)
Ok, starting on a python script (will probably eventually rewrite as a nodejs script?) which crawls through an npm server and mirrors data locally. It's mostly just a straightforward wget-like process, with two differences:

1. Some paths returned in the server json are absolute, so we need to rewrite those (fortunately the format is extremely regular).
2. We will need to rewrite urls like /marionette-js-runner/ to return a file called (e.g.) index.json instead of the directory contents or a forbidden message. Should be trivial to do with nginx or apache.

I will hopefully post a prototype here tomorrow.
Ok, so I finished up a basic prototype which mirrors a set of packages directly from npm.org (after thinking about it a bit, I figured that uploading packages to reggie as an intermediate step was both unnecessary and annoying):

https://github.com/wlach/mirror-npm

It's written in python because I wanted to get something working quickly and it's what I know. It probably makes more sense to write this in JavaScript/node given the intended audience of the utility, so I'll probably do that at some point in the near future. Anyway, to get it working do something like this:

git clone https://github.com/wlach/mirror-npm.git
cd mirror-npm
virtualenv .
./bin/pip install mozhttpd httplib2
mkdir mirror
./bin/python mirror-npm.py http://registry.npmjs.org mirror http://<public ip or dns of machine which will be hosting stuff> marionette-js-runner

This will install a set of packages and metadata in the "mirror" directory. Unfortunately the mirror directory is not quite usable as-is -- some rewrite rules are necessary. This should be trivial to set up for nginx or apache, but for now I've written a simple webserver using mozhttpd which does that for you. So use that for testing purposes:

./bin/python npmserver.py mirror

You should now be able to reinstall marionette-js-runner from any machine with access to yours. Just run:

npm --registry http://<ip of machine running npmserver.py>:8888 install marionette-js-runner

Dustin, does this approach seem basically sound to you?
Flags: needinfo?(dustin)
Yep, looks nice.  Can you describe what rewrite rules are required?  I'll open a webops bug to work on this - I'll do the work, but it's in their domain.
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #17)
> Yep, looks nice.  Can you describe what rewrite rules are required?  I'll
> open a webops bug to work on this - I'll do the work, but it's in their
> domain.

Actually I just realized it's really only one (the index thing is unnecessary). Essentially you need to make sure that URLs like:

http://server/marionette-js-runner

get mapped to this file in the mirror: marionette-js-runner/index.json

You can see how we do it in our test server here: https://github.com/wlach/mirror-npm/blob/master/npmserver.py#L24
Oh, so probably
  DirectoryIndex index.json
would do the trick.  Very cool.
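If the mirror ends up behind nginx instead of Apache, a rough (untested) equivalent would be:

  location / {
      try_files $uri $uri/index.json =404;
  }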
I'd like to have some idea of the bandwidth this will require, so we can plan capacity.  Will clients download MB's of packages on every run?  Will those be cached locally?  How many runs do you expect?  How do you expect these numbers to change over time?  I know there aren't firm answers to most of this, but give me as much guesswork as you can :)
(In reply to Dustin J. Mitchell [:dustin] from comment #20)
> I'd like to have some idea of the bandwidth this will require, so we can
> plan capacity.  Will clients download MB's of packages on every run?  Will
> those be cached locally?  How many runs do you expect?  How do you expect
> these numbers to change over time?  I know there aren't firm answers to most
> of this, but give me as much guesswork as you can :)

It depends on whether this information is cached between runs on the clients running the tests. Assuming it is, I would expect the bandwidth requirements to be quite minimal. Even for clients downloading new data, my .npm directory is only 21 megs big after installing marionette-js-runner.
Is there any explicit provision in the system design for that caching?  For example, tooltool was specifically designed to cache, and in practice does not.
(In reply to Dustin J. Mitchell [:dustin] from comment #22)
> Is there any explicit provision in the system design for that caching?  For
> example, tooltool was specifically designed to cache, and in practice does
> not.

That really depends how the environment on the slaves is configured/managed. Assuming we're running tests on Linux hosts with a persistent home directory between runs, we should be good. This is probably something to ask about when we actually configure test slaves to start running this stuff. :)
Does the client cache in the home directory, or in the build directory?  The build directory is, I think, obliterated between runs.
This probably depends on the runner and/or mozharness script, which isn't written yet.  It also depends on whether the linux AWS VM's that this will be running on are recycled between each test run, which I also don't know.  :catlee, can you tell us the answer to the latter question?
Flags: needinfo?(catlee)
Flags: needinfo?(catlee)
Whiteboard: [b2g]
We do reuse the AWS VMs rather than creating them from scratch per test run. So some kind of cache outside of the build directory would be preserved between runs.
In that case it should be easy to cache them in some directory that doesn't get clobbered, as long as that isn't prevented by the integration harness.
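If we go that route, pointing npm's cache at a persistent location is a one-liner (the path below is just an example):

  npm config set cache /builds/npm-cache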
wlach: can you make mirror-npm pip-installable (by adding a setup.py)?  It'd be great if that had the proper install_requires, too.
(let's move that to bug 912973, where I have some other changes to request)
So I have a little mozharness script that should run our integration tests now. However, it runs |npm install| and then shells out to a node script. How shall I make sure npm grabs the packages from our mirror instead of npmjs.org?
Flags: needinfo?(wlachance)
(In reply to Gareth Aye [:gaye] from comment #30)
> So I have a little mozharness script that should run our integration tests
> now. However, it runs |npm install| and then shells out to a node script.
> How shall I make sure npm grabs the packages from our mirror instead of
> npmjs.org?

Pass "--registry http://<server>" to the npm command.
Flags: needinfo?(wlachance)
Sweet! And where will our server live?
Flags: needinfo?(wlachance)
That's still TBD in bug 912973, but most likely http://npm-mirror.pub.build.mozilla.org
Flags: needinfo?(wlachance)
Is there any reason not to close this?
I think it makes sense to close this; there still are issues with npm packages, but those are being tracked in e.g., bug 953309.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: General Automation → General