Closed Bug 937437 Opened 11 years ago Closed 11 years ago

Make npm-mirror script run regularly (perhaps on crontab)

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gaye, Unassigned)


Gaia's nodejs dependencies are regularly updated and I have spent too much time in #releng asking humans with better things to do to re-run my npm sync script. Can we make it run every hour or so on crontab?
CC dustin, since this would likely be implemented in cron on relengwebadmin
CC sheriffs for their input on how to make these changes sufficiently visible
Another option is to have a little server that polls https://raw.github.com/mozilla-b2g/gaia/master/package.json, notices when there is a change with diff, and then runs the sync script. The one issue here is that since we allow "loose" versions in gaia, a dependency could push an update that we want without our manifest changing. This is probably not a big deal though.
I would prefer we be explicit when it comes to dependencies.

I didn't even know Gaia used nodejs or that we sometimes update it invisibly. Please can we use the package.json approach? (Or else some other kind of explicit in-tree manifest, e.g. the manifest in mozilla-central that lists the gaia hash being pulled, which once bug 899969 is fixed will also list other repos.)
I'm happy to do the crontab, once the bigger issues of visibility are addressed.

Ed, to be clear, this is about syncing the mirror containing the packages pulled down during the Gaia test process.  Gaia *is* using the package.json approach, and the request here is that we periodically make sure that the mirror contains the packages specified in that package.json and the latest versions of all of their dependencies.

It's the dependencies that Gareth is indicating could be problematic.  We would have the same problem if we were hitting npmjs.org directly.  I submit that the fix is to include all known dependencies in the package.json manifest as well, with exact (or, at least, loose) versions.  Then everyone, whether building against our mirror or against npmjs.org, can expect the same packages to be installed.

The other issue is, when the manifest changes, builds will fail until the mirror is synchronized to include the packages it requires.  When we first set this up, my impression was that we would avoid this problem by only making infrequent updates to the manifest, and by requesting that the packages in the manifest be mirrored before landing the manifest update.  That does not appear to be the case.

We could run a crontask that polls hg for changes to package.json, and only re-mirrors when the file contents change.  Polling hg every 3 minutes is not uncommon.  Gareth, do you think you could work that up?  It could either be a shell script, or an extension to the existing mirror script.  I expect we will quickly want to poll multiple package.json URLs, so it's probably worth supporting that from the start.
Ah sorry I understand now - sounds fine, thank you :-)
Found in triage:

1) I note that nodejs is being used by SPDY test automation, so visibility is important in case of accidental SPDY bustage because of nodejs update.

2) We mirror all repos used in production from github to git.m.o... Given the volume of machines doing pulls, we were seeing too many intermittent failures pulling from github every time. Are these nodejs dependencies on git.m.o? (if not, connect with hwine + aki for help with that, if needed.)
Component: Other → Platform Support
QA Contact: joduinn → coop
Depends on: 937780
Depends on: 937791
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #4)

> It's the dependencies that Gareth is indicating could be problematic.  We
> would have the same problem if we were hitting npmjs.org directly.  I submit
> that the fix is to include all known dependencies in the package.json
> manifest as well, with exact (or, at least, loose) versions.  Then everyone,
> whether building against our mirror or against npmjs.org, can expect the
> same packages to be installed.

https://npmjs.org/doc/shrinkwrap.html is a nice tool that can help us here. I don't think this is immediately necessary since the build is still on cedar, but once we get *some* green builds, I think it would be great to lock down dependencies' dependency versions.
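
As a rough illustration of the "loose versions" concern discussed above, the sketch below lists package.json lines whose version spec starts with a range operator or wildcard. The helper name `find_loose_versions` and the package names in the demo manifest are made up for this example, and the pattern is a simple grep heuristic rather than a real semver parser.

```shell
#!/bin/sh
# Rough check, not a real semver parser: print package.json lines whose
# value starts with a range operator (~, ^, <, >) or a wildcard (*).
# Specs such as "1.2.x" or "latest" are not detected.
find_loose_versions() {
    grep -E ':[[:space:]]*"([~^<>]|\*)' "$1"
}

# Demo against a throwaway manifest (package names are made up).
demo=$(mktemp)
cat > "$demo" <<'EOF'
{
  "dependencies": {
    "example-a": "~1.2.0",
    "example-b": "1.0.0",
    "example-c": "*"
  }
}
EOF
find_loose_versions "$demo"
rm -f "$demo"
```

Pinning only covers the direct dependencies listed in the manifest. Running `npm shrinkwrap` in the project after `npm install` writes an npm-shrinkwrap.json that records the exact versions of the whole installed tree, which is the lock-down step proposed here.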

> We could run a crontask that polls hg for changes to package.json, and only
> re-mirrors when the file contents change.  Polling hg every 3 minutes is not
> uncommon.  Gareth, do you think you could work that up?  It could either be
> a shell script, or an extension to the existing mirror script.  I expect we
> will quickly want to poll multiple package.json URLs, so it's probably worth
> supporting that from the start.

I've filed 937780 and 937791 to add support for multiple module manifests and for fetching a module manifest over http. I'll probably fix these later today and publish a new version of npm-mirror.
(In reply to John O'Duinn [:joduinn] from comment #6)

> 1) I note that nodejs is being used by SPDY test automation, so visibility
> is important in case of accidental SPDY bustage because of nodejs update.

Can you elaborate a bit here? What needs to be visible and what is updating?

> 2) We mirror all repos used in production from github to git.m.o... Given
> the volume of machines doing pulls, we were seeing too many intermittent
> failures pulling from github every time. Are these nodejs dependencies on
> git.m.o? (if not, connect with hwine + aki for help with that, if needed.)

We're using npm instead of git to manage our nodejs deps.
I've fixed 937780 and 937791 and published the fixes in npm-mirror@1.1.0. Now we can do

npm-mirror --master http://registry.npmjs.org --manifests http://hg.mozilla.org/integration/gaia-central/raw-file/default/package.json --hostname http://npm-mirror.pub.build.mozilla.org --root packages/

Notably, the "manifest" arg is replaced by "manifests", which takes a comma-separated list of manifests, each specified either as a local filesystem path or as a URL.
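
Since the manifests option takes a single comma-separated value, a wrapper that tracks several manifests needs to join them first. The following is a minimal sketch of one way to do that in plain sh. The helper name `join_by_comma` and the second manifest path are assumptions for illustration, and the final command is only printed, not executed.

```shell
#!/bin/sh
# Join all arguments with commas. The function body runs in a subshell,
# so the IFS change does not leak into the caller.
join_by_comma() (
    IFS=,
    echo "$*"
)

# The first URL is from this bug; the second entry is a hypothetical local path.
manifest_arg=$(join_by_comma \
    "http://hg.mozilla.org/integration/gaia-central/raw-file/default/package.json" \
    "/path/to/other/package.json")

# Dry run: print the command instead of executing it.
echo npm-mirror \
    --master http://registry.npmjs.org \
    --manifests "$manifest_arg" \
    --hostname http://npm-mirror.pub.build.mozilla.org \
    --root packages/
```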

Dustin - will you go ahead and

1. upgrade npm-mirror to v1.1.0
2. add something like the command above to crontab to run every few minutes

Thanks!
I can, but let's make sure that's OK first.

I had two other suggestions in comment 4: updating gaia's manifest to include at least loose versions of all dependencies (fixing the visibility issue); and only checking for new upstream packages when the manifest file's contents change (avoiding banging npmjs.org with a few hundred requests every few minutes).

Without the first, I don't think we have the visibility that Ed would like, so I don't think this should get applied just yet.  Ed, what do you think?
Ah, I missed Gareth's comment 7 about shrinkwrap.  The question about only updating when manifests change remains, though.
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #10)
> I had two other suggestions in comment 4: updating gaia's manifest to
> include at least loose versions of all dependencies (fixing the visibility
> issue); and only checking for new upstream packages when the manifest file's
> contents change (avoiding banging npmjs.org with a few hundred requests
> every few minutes).

As we discussed on IRC, I replied to the first suggestion in c7. The second suggestion is good, but we shouldn't do it until we go for shrinkwrap since the packages served by the mirror should match the ones served by npmjs.org.

> Without the first, I don't think we have the visibility that Ed would like,
> so I don't think this should get applied just yet.  Ed, what do you think?

I'm the only one looking right now and it's permared :)
Another option instead of running something on crontab is for me to add a post-commit hook to github that pings a little server that I can write to fetch the appropriate manifest(s) from GitHub and run the mirror script.
I was hoping I could use curl's -z and --write-out options to rely on Github and If-Modified-Since to determine modification times -- but it seems that doesn't work very well.  I get a lot of 200 OK's with a few 304's interspersed.

A github hook is tricky, because it means writing code to handle a POST on the webheads, and finding some way to communicate that back to the admin host which actually runs the mirrors.  I think that just polling github will work out better.  To that end, here's the script, which I will configure to run every 10 minutes:

#! /bin/bash

SITE_DIR=`cd $(dirname $0); pwd`
cd $SITE_DIR
MANIFEST_DIR="$SITE_DIR/manifests"

need_remirror=false
manifests=
check_manifest() {
    local url=$1
    local manifest=$MANIFEST_DIR/$2
    manifests="$manifests $manifest"
   
    if [ ! -f $manifest ]; then
        curl --silent -o $manifest $url
        return 0  # need remirror
    fi
   
    rm -f $manifest.new  # -f: the file may not exist from a previous run
    if [ "$(curl --silent -z $manifest -o $manifest.new --write-out %{http_code} $url)" == "304" ]; then
        return 1  # 304 Not Modified: don't need remirror
    fi
   
    # github doesn't often return 304's, so compare manually..
    if cmp -s $manifest $manifest.new; then
        rm $manifest.new
        return 1  # don't need remirror
    fi
   
    mv $manifest.new $manifest  # keep the new copy so the mirror syncs the updated manifest
    return 0  # need remirror
}

# to add extra monitored manifests, just add a line similar to this (including the &&.. bit at the end)
check_manifest https://raw.github.com/mozilla-b2g/gaia/master/package.json gaia_package.json && need_remirror=true
#check_manifest https://raw.github.com/mozilla-b2g/marionette-js-runner/master/package.json marionette_js_runner_package.json && need_remirror=true

if $need_remirror; then
    manifests_commas=$(echo $manifests | sed -e 's/ /,/g')
    DEBUG=* ./node_modules/npm-mirror/bin/npm-mirror \
        --manifests "${manifests_commas}" \
        --hostname http://npm-mirror.pub.build.mozilla.org \
        --master http://registry.npmjs.org \
        --root /mnt/netapp/relengweb/npm-mirror
    # if that failed, kill the manifests so that we try again next time
    if [ $? != 0 ]; then
        for manifest in $manifests; do
            rm $manifest
        done
    fi
fi
I'm going to close this out of hope, but please re-open if you see problems.
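
For reference, a 10-minute schedule like the one described for the polling script could be expressed as a crontab entry along these lines. This is only a sketch: the script path and log file are hypothetical placeholders, and the snippet prints the entry for review rather than installing it.

```shell
#!/bin/sh
# Hypothetical install location and log file; adjust to the real host.
MIRROR_SCRIPT=/path/to/npm-mirror/run-mirror.sh
LOG_FILE=/path/to/npm-mirror/mirror.log

# Five cron fields: every 10th minute, every hour/day/month/weekday.
CRON_LINE="*/10 * * * * $MIRROR_SCRIPT >> $LOG_FILE 2>&1"

# Print the entry for review instead of installing it.
echo "$CRON_LINE"

# To install for the current user (not run here):
#   ( crontab -l 2>/dev/null; echo "$CRON_LINE" ) | crontab -
```

Redirecting both stdout and stderr to a log keeps the verbose DEBUG output from being mailed by cron on every run.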
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
If we're polling, then we should use hg.mozilla.org instead of GitHub. The only advantage in subscribing to the GitHub hooks was that we wouldn't have to poll. The gaia that mozharness fetches is a mirror that lives in hg at http://hg.mozilla.org/integration/gaia-central so the manifest we should use is http://hg.mozilla.org/integration/gaia-central/raw-file/default/package.json.

Also, I don't mean to discourage, but there should be no need for any abstractions on top of my mirroring script. As of v1.1.0 npm-mirror natively supports the ability to download a package.json file over http/https and sync its dependencies. The sync script does not download anything it has already synced so there is literally no advantage in a caching, manifest-comparing wrapper.

To be verbose, what I need to run on crontab (if you do feel that we can't have a server that listens for GitHub hooks) is:

```
DEBUG=* npm-mirror \
  --manifests http://hg.mozilla.org/integration/gaia-central/raw-file/default/package.json \
  --hostname http://npm-mirror.pub.build.mozilla.org \
  --master http://registry.npmjs.org \
  --root /path/to/packages
```

Please help me get this done. The gaia team is continuing to hurt since this is blocking us catching gecko patches that break us.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
QA Contact: coop → gaye
I switched it to use the hg mirror.  I can remove the only-if-changed logic, but (a) is that really worthwhile given working code and (b) does that end up hitting the npmjs.org registry's index.json for every package, and is that going to cause a problem for npmjs.org?
Updated shell script:

#! /bin/bash

## this script is run from a crontask as configured by puppet

SITE_DIR=`cd $(dirname $0); pwd`
cd $SITE_DIR

# comma-separated list of URLs
manifests=http://hg.mozilla.org/integration/gaia-central/raw-file/default/package.json

DEBUG=* ./node_modules/npm-mirror/bin/npm-mirror \
    --manifests "${manifests}" \
    --hostname http://npm-mirror.pub.build.mozilla.org \
    --master http://registry.npmjs.org \
    --root /mnt/netapp/relengweb/npm-mirror
Thanks Dustin! I think we should be good now
Status: REOPENED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard