Closed Bug 1280178 Opened 8 years ago Closed 8 years ago

Update desktop-test images to node 5.0.0 (or newer)

Categories

(Taskcluster Graveyard :: Docker Images, defect)
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla50

People

(Reporter: emk, Assigned: emk)

Attachments

(2 files)

Currently node 0.10.36 is used, which is very old.
https://hg.mozilla.org/mozilla-central/annotate/f3f2fa1d7eed/testing/docker/ubuntu1204-test/system-setup.sh#l170

We can't drop NPN support without this.
Please feel free to upgrade this and test it out in try.  I don't know what else might break with a newer node.

It's OK to download the node tarball from the internet instead of tooltool, as long as you are careful to check the hash in the shell script.
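For illustration, a minimal sketch of what a hash-checked download could look like in system-setup.sh; the version and sha256 below are placeholders, not the values actually used:

# Placeholder values -- substitute the real version and its published sha256.
NODE_VERSION=v5.0.0
NODE_TARBALL=node-${NODE_VERSION}-linux-x64.tar.gz
NODE_SHA256=0000000000000000000000000000000000000000000000000000000000000000  # placeholder

wget https://nodejs.org/dist/${NODE_VERSION}/${NODE_TARBALL}
echo "${NODE_SHA256}  ${NODE_TARBALL}" | sha256sum -c - || exit 1
tar -C /usr/local --strip-components=1 -zxf ${NODE_TARBALL}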
Sorry, I'm new to this. How do I test a docker image on try? Are there any tutorials?
Flags: needinfo?(dustin)
I tried cargo-culting from bug 1227637, but the airplane did not fly.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8a3af34064d372446163696fb1b8d7185a126877

I have no idea what is wrong.
Ah, sorry, your patch is good but needs to be run by someone with the ability to push to the docker hub repos.  The `desktop-test` image can be modified in try, but not the images it depends on.  I'll take care of that for you.
Flags: needinfo?(dustin)
OK, I built and pushed those two docker images, and re-triggered the "I" job in your try push.  That should do the trick, then.  Consider this an r+ on rev 96a276c81631 too.  Let me know if you have further problems?
(In reply to Dustin J. Mitchell [:dustin] from comment #5)
> OK, I built and pushed those two docker images, and re-triggered the "I" job
> in your try push.  That should do the trick, then.  Consider this an r+ on
> rev 96a276c81631 too.  Let me know if you have further problems?

Looks like your re-trigger is failing again:
https://treeherder.mozilla.org/logviewer.html#?job_id=22589834&repo=try#L0-L315
Flags: needinfo?(dustin)
> EXDEV: cross-device link not permitted, rename '/usr/local/lib/node_modules/npm' -> '/usr/local/lib/node_modules/.npm.DELETE'

It seems to be a long-standing known issue since node 5.0.0 :(
https://github.com/npm/npm/issues/9863
Maybe

RUN cd $(npm root -g)/npm \
 && npm install fs-extra \
 && sed -i -e s/graceful-fs/fs-extra/ -e s/fs\.rename/fs.move/ ./lib/utils/rename.js

will help?  Of course, that's installing fs-extra without verifying it, but we can fix that later.

If you modify that in the desktop-test image, and push to try, it will re-run the image task for you.
Flags: needinfo?(dustin)
https://public-artifacts.taskcluster.net/VdYLKffzTKaWGhy_4k7IIg/0/public/logs/live_backing.log
> sed: can't read ./lib/utils/rename.js: No such file or directory

:(
I tried to skip reinstalling npm because the bundled npm is already new enough (3.3.6).
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6ca8a6037fa&selectedJob=22664278
The next error:
> ../../nan/nan.h:43:3: error: #error This version of node/NAN/v8 requires a C++11 compiler
I finally succeeded in building the Docker image. Thank you very much!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d762ad7a091bb5e336ee190caaf5c948359dc34
Attached patch: patch (Splinter Review)
Could you review additional changes since rev 96a276c81631?
* Stop re-installing npm because npm is already new enough.
* Install g++-4.8 because node 5 requires a C++11 compiler.
Assignee: nobody → VYV03354
Status: NEW → ASSIGNED
Attachment #8764062 - Flags: review?(dustin)
Comment on attachment 8764062 [details] [diff] [review]
patch

Review of attachment 8764062 [details] [diff] [review]:
-----------------------------------------------------------------

I'm glad you got it working!!

Does this cause any other in-tree issues?  For example, is tc-vcs able to run with node 5?  Do the eslint tasks pass?  If all of that is in order, then I think all that remains is to do the gcc install in ubuntu1204-test.

::: testing/docker/desktop-test/Dockerfile
@@ +7,5 @@
> + && apt-get -y install gcc-4.8 g++-4.8 \
> + && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 20 --slave /usr/bin/g++ g++ /usr/bin/g++-4.8 \
> + && update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 10 --slave /usr/bin/g++ g++ /usr/bin/g++-4.6 \
> + && gcc --version \
> + && g++ --version

I'm wary of this -- it means we're shipping around a docker image with not one but two compilers installed, for tests which theoretically do not require a compiler (in fact, at one point we deliberately did not install a compiler, to avoid accidental dependencies on compiler libraries).

That said, I don't see a better option, and as you've noticed we do already have 4.6 installed.

I think we would want to move this to the ubuntu1204-test image, but we can do that later (and I can regenerate the images with version 0.1.11)
Attachment #8764062 - Flags: review?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #13)
> Does this cause any other in-tree issues?  For example, is tc-vcs able to
> run with node 5?  Do the eslint tasks pass?

How do I test them? (Sorry for the dumb question.)
Flags: needinfo?(dustin)
Just a push to try, I assume.  If you can update the patch to install the compilers in the ubuntu1204-test image, with a new version number, I can build them and push them to the repo.

I feel like we should get a super-review of the wisdom of putting a second compiler in the image.  I just have no idea who to ask -- do you have any thoughts?
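For reference, a rough sketch of what the move could look like in testing/docker/ubuntu1204-test/system-setup.sh, mirroring the desktop-test Dockerfile commands above. The exact layout of that script may differ, and on stock Ubuntu 12.04 gcc-4.8 comes from the ubuntu-toolchain-r/test PPA, which the image would also need if it doesn't have it already:

# Hedged sketch only; adapt to how the setup script actually installs packages.
add-apt-repository -y ppa:ubuntu-toolchain-r/test   # assumption: PPA not already configured
apt-get update
apt-get -y install gcc-4.8 g++-4.8
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 20 \
  --slave /usr/bin/g++ g++ /usr/bin/g++-4.8
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 10 \
  --slave /usr/bin/g++ g++ /usr/bin/g++-4.6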
Flags: needinfo?(dustin)
I pushed a new commit, but the Docker-images build task has not been triggered.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=326a94f640c1
What is wrong?
Flags: needinfo?(dustin)
We were experiencing issues with some things this morning and I'm not sure if that caused a problem with reporting jobs to Treeherder, but I did find the cause of the tests not getting scheduled: the image creation task failed:
https://public-artifacts.taskcluster.net/egjk-oEeQVuWHRAITS47Lw/0/public/logs/live_backing.log

Specifically:
Step 0 : FROM taskcluster/ubuntu1204-test-upd:0.1.11.20160628204600
Pulling repository taskcluster/ubuntu1204-test-upd
INFO[0003] Tag 0.1.11.20160628204600 not found in repository taskcluster/ubuntu1204-test-upd

Was this new base image pushed to docker hub?
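(For anyone following along, the manual build-and-push for a base image is roughly the following, assuming the usual in-tree layout under testing/docker/ and docker hub push rights; the tag has to match the FROM line exactly. This is just a generic sketch, not the exact procedure used here:)

docker build -t taskcluster/ubuntu1204-test-upd:0.1.11.20160628204600 testing/docker/ubuntu1204-test-upd
docker push taskcluster/ubuntu1204-test-upd:0.1.11.20160628204600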
Flags: needinfo?(dustin)
(In reply to Greg Arndt [:garndt] from comment #17)
> We were experiencing issues with some things this morning and I'm not sure
> if that caused a problem with reporting jobs to Treeherder, but I did find
> the cause of the tests not getting scheduled: the image creation task failed:
> https://public-artifacts.taskcluster.net/egjk-oEeQVuWHRAITS47Lw/0/public/
> logs/live_backing.log
> 
> Specifically:
> Step 0 : FROM taskcluster/ubuntu1204-test-upd:0.1.11.20160628204600
> Pulling repository taskcluster/ubuntu1204-test-upd
> INFO[0003] Tag 0.1.11.20160628204600 not found in repository
> taskcluster/ubuntu1204-test-upd
> 
> Was this new base image pushed to docker hub?

Yes. Could you push this image to docker hub?
No, but I'll do that now.
Done!

0.1.11.20160628204600: digest: sha256:e5ba129edf0e03effcd939675afaed25675bc1d52175bb7a59cf2ea968aadd9c size: 1994
Thank you!

I re-triggered the job and some jobs failed with "no space left on device" errors. Any idea?
Flags: needinfo?(dustin)
Can you link the job?  I'd like to figure out which job failed.
Flags: needinfo?(dustin)
I am looking at this now.  It seems the desktop-test-xlarge instances are plagued with this lately.

I believe this should be the fix:
https://github.com/taskcluster/docker-worker/pull/233

I created a temp AMI to use on that worker type to see how it fares.  I plan on killing off older instances.
Oops, test_spdy.js fails even without bug 1248198 changes...
Depends on: 1283862
Try is green. What is the next step to land the patch?
Flags: needinfo?(dustin)
Just land it, as far as I'm concerned (that is, relating to the docker stuff, not the actual Gecko changes)
Flags: needinfo?(dustin)
Comment on attachment 8769410 [details]
Bug 1280178 - Update desktop-test images to node 5.0.0.

I didn't mean to request a review.
Attachment #8769410 - Flags: review?(dustin) → review+
It looks like npm bombed out?  My usual response to this on my desktop is "rm -rf node_modules; npm install", but that's sort of implicit here since it's building from scratch.  :emk, are your npm-debugging skills better than mine?
No, I know little about node/npm.
Oy. Those two logs are from when the task was triggered by your landing and when I retriggered it because it looked sort of network-infra-like. What I failed to notice was that in between those two failed runs it ran on the push after yours and was green: https://tools.taskcluster.net/task-inspector/#fY-ZW8FZQ_epJ0LJR02r_g/0

So I guess that means it's safe to reland, because the image building is already done, and relanding it will just cause that new image to be used? I guess?
Well, ish -- apparently we have an intermittent (and 33% successful at N=3) build process and that's going to cause heartache and misery later, even if it succeeds for now.  Brian is my local npm expert so maybe he can take a look and figure out what happened?
Flags: needinfo?(bstack)
Note: I'm also pretty much just guessing when it comes to node/npm. With that said:

I've been poking at this for a bit and can't really figure out anything from the debug info we have now. I think the best option might be to try something newer than node v5.0.0 and npm@2.0.0 (both of which are pretty old at this point, afaik). As of this writing, v4.4.7 is the most recent LTS version of node according to https://nodejs.org/en/ and v6.3.0 is the most recent "current" release. v4.4.7 would be the best choice except that this bug seems to require v5.0.0 at the oldest, so perhaps go with v5.11.1 or v5.12.0. I can look more into what patches each version carries if we'd like. v5.0.0 is from 2015-10-29, so I expect some bugs have been squashed since then.

On the same line of reasoning. npm@^2.0.0 is pretty old as well. The version of npm that was packaged with node v5.0.0 was 3.3.6 and v5.12.0 comes with 3.8.6. Are we sticking with old versions because we rely on the way npm 2 functioned vs npm 3? Perhaps we should see if we can make the changes to allow for an upgrade. I expect that a lot of bugs have been squashed in npm in the intervening time.

The last thought I have is that maybe we should make npm-debug.log available as an artifact if it exists. Another thing we can do is to add a `-ddd` to the npm command.
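A rough sketch of that idea follows; the package name and artifact directory are placeholders, and the real image build may wire up artifacts differently:

# Illustrative only -- PACKAGE and the artifact path are placeholders.
set -o pipefail
PACKAGE=some-module
npm install -g -ddd "$PACKAGE" 2>&1 | tee npm-install.log || {
  # On failure npm leaves npm-debug.log in the working directory;
  # copy it somewhere the task can export as an artifact.
  cp npm-debug.log /home/worker/artifacts/ || true
  exit 1
}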
Flags: needinfo?(bstack)
(In reply to Brian Stack [:bstack] from comment #39)
> On the same line of reasoning. npm@^2.0.0 is pretty old as well. The version
> of npm that was packaged with node v5.0.0 was 3.3.6 and v5.12.0 comes with
> 3.8.6. Are we sticking with old versions because we rely on the way npm 2
> functioned vs npm 3? Perhaps we should see if we can make the changes to
> allow for an upgrade. I expect that a lot of bugs have been squashed in npm
> in the intervening time.

This patch uses the npm version bundled with node (that is, npm 3.3.6 for node 5.0.0) because clobbering the npm directory fails on node >= 5. See the earlier comments in this bug for details.

But it would certainly be worth trying newer node versions.
Jobs failed even without the patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a51a0a6aa227&selectedJob=23730813
I'll reland because this failure was not a regression.
I filed bug 1286279 for the docker-build issues.  Sorry about that!
https://hg.mozilla.org/mozilla-central/rev/5290d874c8c5
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Blocks: 1303843
Product: Taskcluster → Taskcluster Graveyard