Add node_modules directly into gaia repo

RESOLVED FIXED

Status

RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: mikehenrty, Unassigned)

Tracking

unspecified
Dependency tree / graph

Firefox Tracking Flags

(b2g-v1.3T fixed)

Details

Attachments

(1 attachment, 1 obsolete attachment)

46 bytes, text/x-github-pull-request
yurenju
: review+
Details | Review | Splinter Review
Periodically, gaia tests break due to problems with the NPM registry. This can cause tree closures, or worse, breaking code to get merged into master. In addition to this, we maintain a mirror of NPM for TBPL, which is a drain on engineering resources. Therefore I suggest we check node_modules directly into the gaia repo, and bypass NPM altogether for our test environments.
Assignee: nobody → gaye
Proposal: Use an external repository for gaia node modules. This has the benefit of keeping the source tree clean and small, and not depending on npm. We depend on github for gaia, so I think we can depend on them for another folder from the same porject. My recommendation would be to create a project: mozilla-b2g/gaia-node-modules which would then be pulled into gaia with the Makefile. I think we would just need to change the npm install to npm remake, and add a command before that to fetch the library code.

I would personally recommend on a git subproject, as those are fairly easy to implement, and it should just be a single subproject (without subproject dependencies, where things get messy). I'm sure there are other solutions though.

I'm open to doing a POC here if it would be helpful.
Is the idea of using a separate project to keep gaia source small? I just did a fresh rm -rf node_modules && npm install, and it looks like node_modules is only about 1K. My preference would be to not have a separate repo, since I couldn't image gaia node modules being useful anywhere else, nor is it an optional plugin like the marionette-js helper repos.
The current size of node_modules is reading 88M on my mac. Although this may grow significantly, it's not only about size. Lots of people have workflows that this may disrupt (e.g., autocomplete filenames in github, or other IDE plugins which do this).

I am mainly worried about potential developer workflow impact, as well as unknown repository performance issues in the future. I'm assuming that our npm list will start to grow quite a bit as we are just starting to embrace node and npm.
(In reply to Kevin Grandon :kgrandon from comment #3)
> I am mainly worried about potential developer workflow impact, as well as
> unknown repository performance issues in the future. I'm assuming that our
> npm list will start to grow quite a bit as we are just starting to embrace
> node and npm.

Makes sense.
People told about size doesn't or does matter seems ignored one point is that node_modules should not be a part of repo, not only because of the size, but in the role they play. These are definitely not a part of our code base, thus we shouldn't manage they as other parts of our codebase (by git).

We should use other tools to maintain these dependencies, like Ruby's bundle. I don't think put these released code under git's management is a good idea, especially some of them may be native (binary) modules (forbid them just because of the convenience is strange, too).

I think the main problem is our repo just plays multiple roles: for developers, it should be a source repo, managed by git and includes only those parts we developed; for testers, it should be a semi-built repo, includes everything ready to be build and test. The later is more like the Linux distro's "repo". For example, Debian's apt-get packages, which is not the same with those package's original source repo. If we can successfully manage our repo in such way (one source repo, one test semi-built repo), it may be a good solution to solve the conflicts between these two different requirements. Although I don't think we can jump to that way easily.

I know testers work very hard on fighting NPM modules and other issues. But I really worry our repo would become a huge (not for size) and complex thing just because we use wrong tool or wrong way to manage it. In fact, some problem of them can be solved by adopting some new tools, like separating modules with Gitslaves which I'm studying these days. But if we're restricted in the current development and management model, we may never end conflicts like this.
(And I said bundle not means we should use it or learn it, I just remembered it has .gemlock, which would lock the git version.)
I don't see the benefit of including some ruby bundle scripts and new tools when we already have all of the tooling there that we need. We have git and npm already - both are more than capable of handling this in an elegant way.
(In reply to Kevin Grandon :kgrandon from comment #7)
> I don't see the benefit of including some ruby bundle scripts and new tools
> when we already have all of the tooling there that we need. We have git and
> npm already - both are more than capable of handling this in an elegant way.

If we can find a good solution, yes, we can use existing tools, and that would be perfect (but check-in whole node_modules won't). I also mentioned that I don't think we need to introduce that but the concept maybe we can use.

(BTW: we've found that git seems can't manage super-sub project architecture well (people already asked similar questions on internet), thus someone invented tools just like Gitslave to solve this extending problem. I mention tool issues just because maybe some problems are caused by the tool itself, or the way we use).
+1 for Kevin's proposal to use an external repository for gaia node modules only for TBPL.
see https://npmjs.org/doc/faq.html#Should-I-check-my-node_modules-folder-into-git for some external thoughts.

I've still not set my own advice yet. I think I'm more leaning into checking it in, but then I think we need clear rules about how to update it, and who can update it.
Also don't forget we have binary modules like "sock-it" and the upcoming "exec-sync" for bug 917717. This complexify the issue as I don't trust checking binaries in and also cause issues about binary compat on our VM.
+1 for a mirror in an external repo.
although I +1 for kevein's solution

but the best solution for fixing root problem is make TBPL support npm (or internet access) just like travis-ci, I know we have some security concern for accessing internet, but travis-ci, drone.io, codeship.io and other CI service supported internet access, so we should consider to support it.
And go down like we had to close gaia's tree today when npm was having trouble? That doesn't seem like a win to me.
package.json also supports naming dependencies in ways other than just a version that needs to be looked up on the flakey npm servers.  Quoting https://npmjs.org/doc/json.html#dependencies

===
URLs as Dependencies

You may specify a tarball URL in place of a version range.

This tarball will be downloaded and installed locally to your package at install time.
Git URLs as Dependencies

Git urls can be of the form:

git://github.com/user/project.git#commit-ish
git+ssh://user@hostname:project.git#commit-ish
git+ssh://user@hostname/project.git#commit-ish
git+http://user@hostname/project/blah.git#commit-ish
git+https://user@hostname/project/blah.git#commit-ish

The commit-ish can be any tag, sha, or branch which can be supplied as an argument to git checkout. The default is master.
GitHub URLs

As of version 1.1.65, you can refer to GitHub urls as just "foo": "user/foo-project". For example:

json { "name": "foo", "version": "0.0.0", "dependencies": { "express": "visionmedia/express" } } 
===


Instead of checking everything in, we could use a variant on shrinkwrap and use infrastructure that we already absolutely depend on.  For example, mirror our dependencies on our git server, depend on github, etc. etc.
Andrew - nice! Currently experimenting, as I think that could be a solid intermediate step. I would suggest giving it a shot, and if we still face any difficulties, we can try another full-fledged approach.

Experiment here: https://github.com/mozilla-b2g/gaia/pull/15338
(In reply to Kevin Grandon :kgrandon from comment #16)
> Andrew - nice! Currently experimenting, as I think that could be a solid
> intermediate step. I would suggest giving it a shot, and if we still face
> any difficulties, we can try another full-fledged approach.
> 
> Experiment here: https://github.com/mozilla-b2g/gaia/pull/15338

Argh - I thought this would work, but I guess it would still have to reach out to NPM to get dependencies of dependencies. I think it's not going to work =/
Oh right - shrinkwrap *all the things*, then have a tarball for each one.. That should work.
Comment spam, sorry. 

So it looks like the best option is to leave package.json as it is today. If we run a npm shrinkwrap --dev, we will generate an npm-shrinkwrap.json, with *all* of the dependencies laid out. We would then need to have a step where we change version numbers for github URLs. I don't see us being able to manually do this, but it may be possible with a bit of clever scripting. I don't really have time to take the prototype further ATM, but may be able to look into this in the next few weeks.
Sorry for this naive question, is it really difficult to have a local (as in "local to mozilla") npm proxy somewhere? I can only imagine the use of npm will grow up in Mozilla over time.

There are at least 2 caching npm proxy I found using a simple google search:
* https://npmjs.org/package/npm-proxy-cache
* https://npmjs.org/package/nopar

"nopar" says it caches metadata as well, not sure for "npm-proxy-cache".
(In reply to Julien Wajsberg [:julienw] from comment #20)
> Sorry for this naive question, is it really difficult to have a local (as in
> "local to mozilla") npm proxy somewhere? I can only imagine the use of npm
> will grow up in Mozilla over time.
> 
> There are at least 2 caching npm proxy I found using a simple google search:
> * https://npmjs.org/package/npm-proxy-cache
> * https://npmjs.org/package/nopar
> 
> "nopar" says it caches metadata as well, not sure for "npm-proxy-cache".

We do use an npm-mirror in TBPL (due to restricted network access of those machines), bug 937437 for example. Gareth has also deployed his own NPM mirror to AWS one day when NPM was having issues, and it worked like a charm. But in my opinion, this adds infrastructure we have to maintain, and better would be to rely on github (ie. node_modules in a separate repo, or tarball urls).

On a side note, there was talk from jgriffin today in #ateam about allowing access to github in TBPL. If this happens, I think we can kill two birds with one stone here, and eliminate our NPM mirror for that too.
That's great! do we have a bug for it?
(In reply to Michael Henretty [:mhenretty] from comment #21)
> (In reply to Julien Wajsberg [:julienw] from comment #20)
> > Sorry for this naive question, is it really difficult to have a local (as in
> > "local to mozilla") npm proxy somewhere? I can only imagine the use of npm
> > will grow up in Mozilla over time.
> > 
> > There are at least 2 caching npm proxy I found using a simple google search:
> > * https://npmjs.org/package/npm-proxy-cache
> > * https://npmjs.org/package/nopar
> > 
> > "nopar" says it caches metadata as well, not sure for "npm-proxy-cache".
> 
> We do use an npm-mirror in TBPL (due to restricted network access of those
> machines), bug 937437 for example. Gareth has also deployed his own NPM
> mirror to AWS one day when NPM was having issues, and it worked like a
> charm. But in my opinion, this adds infrastructure we have to maintain, and
> better would be to rely on github (ie. node_modules in a separate repo, or
> tarball urls).

But you'll have to maintain your node_modules directory as well.

A npm proxy is something very nice: you basically don't need to do regular jobs. It behaves like a mirror except it's fetching packages from the real thing if it's not present yet. This removes all the burden of mirroring the packages yourself (which you need to do with npm-mirror or checking in node_modules). As soon as someone runs npm install using that proxy, the proxy will have the packages locally.

I believe this actually _removes_ maintenance work over checking in node_modules to github.

To be clearer: this would be a maintenance work that our IT team would do, because that's something they know how to do, as opposed to maintenance work that developers would need to do. I believe this is better.
(In reply to Yuren Ju [:yurenju] from comment #22)
> That's great! do we have a bug for it?

we do! bug 960201
(In reply to Julien Wajsberg [:julienw] from comment #23)
> A npm proxy is something very nice: you basically don't need to do regular
> jobs. It behaves like a mirror except it's fetching packages from the real
> thing if it's not present yet. This removes all the burden of mirroring the
> packages yourself (which you need to do with npm-mirror or checking in
> node_modules). As soon as someone runs npm install using that proxy, the
> proxy will have the packages locally.

Ah, I didn't realize the difference between mirror and proxy (implicit vs. explicit update). Proxy seems like a better solution than mirror in general, even for the tbpl use case.

In any case, my thinking on this bug is that we want to put more of our destiny in our hands (meaning gaia developers). Relying on NPM, or even a different team who maintains a proxy adds potential down time. Sure, we rely on github if we use a submodule, but without github we are dead in the water anyway.
> Ah, I didn't realize the difference between mirror and proxy (implicit vs.
> explicit update). Proxy seems like a better solution than mirror in general,
> even for the tbpl use case.

But remember that we can't go to the master registry when we're missing a package so we have to load the dependencies from the master eagerly (in the tbpl case).
(In reply to Gareth Aye [:gaye] from comment #26)
> > Ah, I didn't realize the difference between mirror and proxy (implicit vs.
> > explicit update). Proxy seems like a better solution than mirror in general,
> > even for the tbpl use case.
> 
> But remember that we can't go to the master registry when we're missing a
> package so we have to load the dependencies from the master eagerly (in the
> tbpl case).

Which would be just automatically done as soon as a developer (you, Michael, me) uses the proxy to run npm install.

(In reply to Michael Henretty [:mhenretty] from comment #25)
> 
> In any case, my thinking on this bug is that we want to put more of our
> destiny in our hands (meaning gaia developers). Relying on NPM, or even a
> different team who maintains a proxy adds potential down time. Sure, we rely
> on github if we use a submodule, but without github we are dead in the water
> anyway.

Relying on github was not my issue here, clearly we already rely on github :) But I wanted to avoid giving more maintenance burden to the gaia developers :) Our IT team's reason of existence is to help us.

That said, I'm not against putting the node_modules in github or any other solution that was already said, and this is also something we can rethink later if we see it doesn't work. I only wanted to give another idea which is IMO better but I won't push for it :)
(In reply to Gareth Aye [:gaye] from comment #26)
> > Ah, I didn't realize the difference between mirror and proxy (implicit vs.
> > explicit update). Proxy seems like a better solution than mirror in general,
> > even for the tbpl use case.
> 
> But remember that we can't go to the master registry when we're missing a
> package so we have to load the dependencies from the master eagerly (in the
> tbpl case).

If TBPL can access the local (local as in "same network") proxy, and the local proxy can access the master, then also it would work.
(In reply to Julien Wajsberg [:julienw] from comment #28)
> (In reply to Gareth Aye [:gaye] from comment #26)
> > > Ah, I didn't realize the difference between mirror and proxy (implicit vs.
> > > explicit update). Proxy seems like a better solution than mirror in general,
> > > even for the tbpl use case.
> > 
> > But remember that we can't go to the master registry when we're missing a
> > package so we have to load the dependencies from the master eagerly (in the
> > tbpl case).
> 
> If TBPL can access the local (local as in "same network") proxy, and the
> local proxy can access the master, then also it would work.

I believe that the local proxy cannot access the master during test runs.
It depends what we call "local proxy". If it's the same proxy that _we_ would use, then I don't see why it would be different.
But maybe test run VM can't access the same internal networks than we do.
That being said, the engineering effort is [at this point] somewhat of a sunk cost. npm-mirror is *very* stable nowadays. We had longstanding issues with https://bugzilla.mozilla.org/show_bug.cgi?id=960192 up until last week, but they are now resolved and Gi is very green. We should consider whether or not to do this irrespective of the dev effort for npm-mirror.
Created attachment 8377858 [details] [review]
Potential solution

This may be a potential solution to the problem.
see also bug 932946

If we had a private repository we could use Travis' caching capability. Don't know how to do this though.
Comment on attachment 8377858 [details] [review]
Potential solution

Gareth - if this works should we land it?
Attachment #8377858 - Flags: review?(gaye)
Depends on: 952566
Assignee: gaye → nobody
Attachment #8377858 - Attachment is obsolete: true
Attachment #8377858 - Flags: review?(gaye)
Created attachment 8378679 [details] [review]
Github pull request

Holy green build. Not ready to land yet - if this looks good we'll create a mozilla-b2g/gaia-node-modules repo instead. (Let's get the R+ before doing that)
Attachment #8378679 - Flags: review?(gaye)
Left comments on github.

I still think the maintenance burden will be very high but as long as I don't do it and it's done by someone I don't mind.

Also, what about the node_modules for the test-agent?
(In reply to Julien Wajsberg [:julienw] from comment #36)
> Left comments on github.
> 
> I still think the maintenance burden will be very high but as long as I
> don't do it and it's done by someone I don't mind.

It is a little bit more work - instead of bumping package.json in gaia, we do it in gaia-node-modules. After that, you commit as normal, but you need to run npm install in the external repo. I also don't mind signing up to manage this for a while if needed.

> Also, what about the node_modules for the test-agent?

Good point - my preference would be to migrate them to the standard node_modules folder, and handle it like the other libraries. I would like to do this in a follow-up.
Comment on attachment 8378679 [details] [review]
Github pull request

Hey Julien, Yuren - If you have some time please take a look at this. This can resolve quite a bit of the grey that we've been seeing lately. Thanks!
Attachment #8378679 - Flags: review?(yurenju.mozilla)
Attachment #8378679 - Flags: review?(felash)
Comment on attachment 8378679 [details] [review]
Github pull request

only one comment on github for less commands in Makefile, r=yurenju

we can land it and fix the minor issue later.
Attachment #8378679 - Flags: review?(yurenju.mozilla)
Attachment #8378679 - Flags: review?(gaye)
Attachment #8378679 - Flags: review?(felash)
Attachment #8378679 - Flags: review+
Keywords: checkin-needed
Give me one minute to migrate the node_modules to mozilla-b2g. This way they're not in my own repo :) Thanks!
Keywords: checkin-needed
Landed: https://github.com/mozilla-b2g/gaia/commit/e542540cfa64c31ac8bb71b1a9cf16879fc9d58e

This may need some follow-ups, but should get the job done for now. Thanks all!
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.