Closed Bug 1141792 Opened 9 years ago Closed 9 years ago

[fxos-automation] Add support for taskcluster npm-cache in gaia

Categories

(Taskcluster :: General, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aus, Assigned: aus)

References

Details

(Whiteboard: [fxos-automation])

Attachments

(9 files)

We're trying to get rid of our tarball node modules trick in favor of a pre-populated cache as part of the docker image tasks.
Blocks: 1141793
Actively working on this.
Status: NEW → ASSIGNED
Component: Marionette → TaskCluster
Attachment #8585821 - Flags: review?(jlal)
Attachment #8585823 - Flags: review?(jlal)
Attachment #8585821 - Flags: review?(jlal) → review+
Comment on attachment 8585823 [details] [review]
Pull Request - Gaia taskcluster magic foo.

the basics lgtm but please note my comment about splitting out your changes that touch different modules into different commits... I did not look closely at the debugging code either so please remember what to strip out (or again ideally do that stuff in separate commits).
Attachment #8585823 - Flags: review?(jlal) → review+
npm-cache module published: + taskcluster-npm-cache@1.1.11
npm-cache module published: + taskcluster-npm-cache@1.1.12 [final]
docker worker image gaia-taskenv:0.8.9 published [final]
docker worker image npm-cache:0.1.7 published [final]

https://github.com/mozilla-b2g/gaia/tree/auswerk-makes-stuff-better updated with latest changes addressing review comments.
Commit (master): https://github.com/mozilla-b2g/gaia/commit/84b9296e0f50eeb2811ab81d5788ef20a7cf1e0d

Keeping an eye on things... if all seems will, this will get resolved as fixed.
Things look good. Backward compatibility is making things run as expected on b2g-inbound and mozilla-central. Going to file some follow up tasks to convert those to using npm-cache as well for gaia tests.

Note that we no longer hide failed tasks that required a retry to pass. This should help us diagnose where our bad tests live and fix them a lot more easily.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Turns out this is really starting to cause problems, as the 'bootstrap' tests are failing on gaia-master and try.

Backed out: https://github.com/mozilla-b2g/gaia/commit/505056d3b1fb616f8735a31591481a0fa76cb7b3

Example of failure: https://treeherder.mozilla.org/#/jobs?repo=gaia-master&revision=c30b840b671cf8e8657b5561861945918eaa590c
https://s3-us-west-2.amazonaws.com/taskcluster-public-artifacts/cZ4IQw-RQp-ILzlQGsdRwg/2/public/logs/live_backing.log
Status: RESOLVED → REOPENED
Flags: needinfo?(aus)
Resolution: FIXED → ---
So, somehow, that was a bad package.json, I'm guessing there was a merge failure that wasn't addressed properly. I'm going to re-update and re-push shortly.
Flags: needinfo?(aus)
Gij13 is just too failure prone for this to get re-pushed without fixing the test first. I'll fix it first thing in the morning.
Changes to Gaia Makefile should be reviewed by the build peers: Ricky preferably, or ochameau.
Updated and re-pushed: https://github.com/mozilla-b2g/gaia/commit/766080a5c260696c16afec7ac71f84a8a156a4a1

Tentatively marking fixed once again. :) I'll be keeping an eye on it...

I'll have ricky look at the Makefile change and follow up if there are any changes requested.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Comment on attachment 8585823 [details] [review]
Pull Request - Gaia taskcluster magic foo.

Ricky, if you have a moment to look at the Makefile changes that'd be appreciated. Ultimately, we're hoping to get rid of all of the node_modules related goop in there. :)
Attachment #8585823 - Flags: feedback?(ricky060709)
I'm sorry Aus, but this is pretty clearly increasing our failure rate. There were several failures on b2g-inbound, and some on gaia-master as well. I'm afraid I had to back this out until we have a mitigation plan in place.

I don't think we can remove per-file retries until we have the infamous "socket timeout" crash solved. Is it possible to split out the per-file retry change from your patch?

Backout: https://github.com/mozilla-b2g/gaia/commit/29c2ecb90453d6cead8800dd5ef5357dde0a73b7
Status: RESOLVED → REOPENED
Flags: needinfo?(aus)
Resolution: FIXED → ---
I'm still going to have to mention that the rate of failure is the same, it's just not hidden anymore. I can't stress that enough.

Anyway, I'm going to re-commit this with the per-file retry stuff still in place for the time being, but beware that this per-file retry has been known to fail to show *errors* on test runs. Even after the 5 per-file retries have been completed.
Flags: needinfo?(aus)
(In reply to Ghislain Aus Lacroix [:aus] from comment #26)
> I'm still going to have to mention that the rate of failure is the same,
> it's just not hidden anymore. I can't stress that enough.

Right, I agree with you, the only difference is what we surface. If we don't mask them, we end up surfacing way too many errors. It becomes very difficult to sheriff. So much so that it is likely that GIJ would end up getting hidden again.
Comment on attachment 8585823 [details] [review]
Pull Request - Gaia taskcluster magic foo.

Makefile changes seems good to me. 

However, I feel confused by this bug title with the patch. I still see gaia_node_modules located in Makefile and not get rid of it. Could you take a time to describe why do we have such changes in Makefile? 

thanks!
Attachment #8585823 - Flags: feedback?(ricky060709) → feedback+
Summary: [fxos-automation] Get rid of gaia_node_modules in favor of using npm cache → [fxos-automation] Add support for taskcluster npm-cache in gaia
(In reply to Kevin Grandon :kgrandon from comment #27)
> (In reply to Ghislain Aus Lacroix [:aus] from comment #26)
> > I'm still going to have to mention that the rate of failure is the same,
> > it's just not hidden anymore. I can't stress that enough.
> 
> Right, I agree with you, the only difference is what we surface. If we don't
> mask them, we end up surfacing way too many errors. It becomes very
> difficult to sheriff. So much so that it is likely that GIJ would end up
> getting hidden again.

Totally understandable, just wanted our IRL conversation to be reflected here. :) I have a pull request minus the removal gaia_marionette_retry.js that will land shortly.

I'll file a separate bug to get rid of gaia_marionette_retry.js when I've stabilized the rest of the tests.
Commit (master): https://github.com/mozilla-b2g/gaia/commit/e5c1905b0144a855537fd857d62ec7a3393bb334

That's... a lot of green greeness. We're using the same retry settings as previously.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
(In reply to Ricky Chien [:rickychien] from comment #28)
> Comment on attachment 8585823 [details] [review]
> Pull Request - Gaia taskcluster magic foo.
> 
> Makefile changes seems good to me. 
> 
> However, I feel confused by this bug title with the patch. I still see
> gaia_node_modules located in Makefile and not get rid of it. Could you take
> a time to describe why do we have such changes in Makefile? 
> 
> thanks!

This was not answered, but I think npm-cache is used in taskcluster only (for now ?)
(In reply to Julien Wajsberg [:julienw] (PTO -> Apr 27) from comment #32)
> (In reply to Ricky Chien [:rickychien] from comment #28)
> > Comment on attachment 8585823 [details] [review]
> > Pull Request - Gaia taskcluster magic foo.
> > 
> > Makefile changes seems good to me. 
> > 
> > However, I feel confused by this bug title with the patch. I still see
> > gaia_node_modules located in Makefile and not get rid of it. Could you take
> > a time to describe why do we have such changes in Makefile? 
> > 
> > thanks!
> 
> This was not answered, but I think npm-cache is used in taskcluster only
> (for now ?)

Yes, this is correct. npm-cache is only used when run on taskcluster by settings NODE_MODULES_SRC to 'npm-cache' in the environment. We are planning to add support for local clients in the near future.
Hi Aus,

It seems like 'npm-cache' server has been enabled on task-cluster. Is that mean we can begin the work to remove modules.tar from Gaia build process and pull npm packages from internet? and treeherder will pull any npm packages from 'npm-cahce' server properly without any network failures?
Flags: needinfo?(aus)
(In reply to Ricky Chien [:rickychien] from comment #34)
> Hi Aus,
> 
> It seems like 'npm-cache' server has been enabled on task-cluster. Is that
> mean we can begin the work to remove modules.tar from Gaia build process and
> pull npm packages from internet? and treeherder will pull any npm packages
> from 'npm-cahce' server properly without any network failures?

We're almost at the point where we can get rid of modules.tar. I'm landing changes that enable use of npm-cache on gecko builds which is the last CI hurdle. After that, we want to have local developer support which will require adding a local cache of downloaded node_modules tarballs enabling switching between branches without network connectivity (given that the package must have been downloaded already). After that last change, we can kill gaia-git-modules / modules.tar and all related makefile bits. :) I have bugs filed for this already.
Flags: needinfo?(aus)
Component: TaskCluster → General
Product: Testing → Taskcluster
Target Milestone: --- → mozilla41
Version: unspecified → Trunk
Resetting Version and Target Milestone that accidentally got changed...
Target Milestone: mozilla41 → ---
Version: Trunk → unspecified
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: