Run mock against repos other than puppetagain

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: dustin, Assigned: dustin)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 7 obsolete attachments)

(Assignee)

Description

3 years ago
We currently run 'mock' against the PuppetAgain repos.  This has been a source of pain:
 - bug 899548 - mock only looks on one puppetmaster, and if that's down builds burn
 - bug 1022761 - just uploading new, unused packages to the repos can cause trees to close if mock doesn't like them due to some inscrutable yum error
 - system management requires a different cadence of upgrades than builds

Where can we point mock, aside from puppetagain?
(Assignee)

Comment 1

3 years ago
Talking to :jakem, it sounds like mrepo is a good option here.  It's currently not "bulletproof", but is better than what we're getting with bug 899548, and can be improved as necessary.

If that's the option we select, then getting started is as simple as rsync'ing the repos from puppetagain to mrepo and changing the repo URLs in the mock builds.
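
A one-time seed could be a plain rsync of the existing tree; as a rough sketch (the destination host and path are placeholders, since I don't know the mrepo layout):

  rsync -avz /data/repos/yum/ mrepo.example.mozilla.com:/srv/mrepo/releng-yum/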

Jabba, what do you think?
Flags: needinfo?(jdow)

Comment 2

3 years ago
If it's tree-closing when down, our current mrepo infrastructure probably is not ideal. We currently have a single mrepo host in scl3 and a single mrepo host in phx1, and we use DNS to fail over between them, but there is definitely not much high availability built into the system, and we can't necessarily guarantee consistency between scl3 and phx1; sometimes they are different from each other until someone notices something weird and then it kinda gets manually poked until things work again. What's the scope of the usage? Is it a high-demand thing? (lots of builders hitting this constantly, or once per day, or every now and then one host uses it, etc.?)
Flags: needinfo?(jdow)
Would the following help?

1) use dated repos in mock instead of "latest"
2) don't use the same releng repo (http://puppetagain.pub.build.mozilla.org/data/repos/yum/releng/public/CentOS/6/). Deploying packages to puppet and mock would be different procedures.
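
For example, instead of the mock yum config following the moving "latest" tree, it would pin a dated snapshot; the difference is just the baseurl (the directory names here are illustrative, following the snapshot layout on the puppet masters):

  # today: follows whatever "latest" points at
  baseurl=http://puppetagain.pub.build.mozilla.org/data/repos/yum/mirrors/centos/6/latest/os/x86_64
  # pinned: only changes when we deliberately repoint it
  baseurl=http://puppetagain.pub.build.mozilla.org/data/repos/yum/mirrors/centos/6/2012-03-07/os/x86_64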
(Assignee)

Comment 4

3 years ago
Jabba, it's "lots of builders hitting this constantly".  I suppose that means that if it's hosted in phx1 at any particular time then we're slinging nontrivial traffic over that link.  I don't have hard figures on traffic, but I might be able to get them if that's important to the decision.

Is mrepo built in such a way that you could add more nodes in either DC, or is that a redesign?  The inconsistency seems less problematic, as these repos would change fairly rarely.

Rail, that would work, but at that point it shouldn't be hosted on the puppetagain servers, particularly if we don't have the client-side resiliency (bug 899548) that those expect.

Comment 5

3 years ago
Adding more nodes would be a redesign. Each one has a storage blade attached, which is where all the repos are hosted.
:dustin

A few questions for you:

- How much storage do you anticipate needing?
- Do you have any traffic estimates?
- Is the repo data secret? i.e. needs to be protected
- Do you need self-service access to add packages?
- Plans on signing packages?
(Assignee)

Comment 7

3 years ago
(In reply to Brian Hourigan [:digi] from comment #6)
> - How much storage do you anticipate needing?

20G     /data/repos/yum/mirrors/centos/6/2012-03-07

> - Do you have any traffic estimates?

CentOS mirror:

About 3 hits per second, with an average size of 740k, or 2.25 MB/s.  Some small fraction of that will stay on the puppet repos (that is, hits generated by puppet on the base system, rather than by mock), but I don't have an analytical way to determine the fraction exactly.

Releng repo:
About 0.3 hits per second, with an average size of 9MB, or 2.7 MB/s.  Very little of this comes from puppet.  The size is skewed by installing big Android SDKs.

> - Is the repo data secret? i.e. needs to be protected

Nope, it's already public.

> - Do you need self-service access to add packages?

Yes

> - Plans on signing packages?

The centos mirror has signatures, but the releng repo is not signed.
(Assignee)

Comment 8

3 years ago
I got that data from a week's worth of access logs (6/{2..8}/2014) on the five currently-working puppet masters.  However, during that time there were two additional masters, so the numbers are likely to be 40% higher than this.  I'm getting more accurate logs now.
(Assignee)

Comment 9

3 years ago
Yikes, the centos mirror is more like 4.2MB/s and the releng repo is more like 5.5MB/s.
I had a brief discussion with :cshields about this last week - I think our best bet is to move this all to S3. We would still need a 'controller' box to rebuild repodata and to be the source of truth when syncing.

What's your timeframe?
(Assignee)

Comment 11

3 years ago
Soon - this is blocking our remediation of the latest six OpenSSL vulns.
Ok - I don't think we can support this using the existing mrepo infrastructure. :cshields just went on pto but I reached out to him for approval to proceed with S3.
(Assignee)

Comment 13

3 years ago
That sounds good.  I had no idea the data rates were that high!
(Assignee)

Comment 14

3 years ago
10 MB/s is 864000 MB/day = 843.75 GB/day

Per http://calculator.s3.amazonaws.com/index.html that's about $3800/mo!
(Assignee)

Comment 15

3 years ago
Sorry, typo, $2881.39.
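
For the record, the arithmetic behind that figure (the per-GB rate is an assumption, not a quoted price):

  10 MB/s * 86,400 s/day = 864,000 MB/day ≈ 843.75 GB/day
  843.75 GB/day * 30 days ≈ 25,300 GB/month
  25,300 GB/month * ~$0.09-0.12/GB transfer-out ≈ $2,300-$3,000/month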
So to pose the Question (primarily for :digi, but also :dustin):

Will this mrepo (in s3?) be accessible from outside the Releng VLAN/Network, how about from outside the Mozilla Corp machine VLANs, how about public internet?

The QBTQ is that SeaMonkey currently has a puppet server setup ala PuppetAgain, and needs Mock support as well. Currently we have no direct access to infra in AWS or within the releng BU, however we do have access to rsync from the public puppetagain repos, which we do.

We will need the ability to instantiate mock in a similar/identical way to MoCo, especially because we're using the same puppet manifests, just with a different config. So building into this design a need for SeaMonkey/Community/External to use the repo we use for mock would greatly benefit me, since this use-case was built into our design plans for the past few years, (including me buying hardware out of pocket to support this current implementation).
To elaborate slightly: SeaMonkey's use is ~0 today (buildbot work isn't complete to use mock, and in fact caused us to not ship a Gecko 30 release); however, when it's up we'll be doing continuous builds across ~10-15 machines, vs. MoCo's hundreds (or thousands if you include AWS). So our usage would be relatively infinitesimal.
(In reply to Dustin J. Mitchell [:dustin] from comment #14)
> 10 MB/s is 864000 MB/day = 843.75 GB/day
> 
> Per http://calculator.s3.amazonaws.com/index.html that's about $3800/mo!

How much of those hits would come from Releng in AWS though?  I believe that traffic would be excluded from the bill?
(Assignee)

Comment 19

3 years ago
A substantial portion, true.  However, that's something we need to monitor.  The routing tables we have configured in our VPC send all traffic via scl3 by default, with netblocks known to contain S3 endpoints excepted.  However, we don't know about all S3 endpoints, so it's possible that this service could move to a netblock not in our list, at which point we would ship all traffic over the VPC (bad for total VPC bandwidth) and then NAT it back out to the 'net and hit the S3 endpoint from a non-Amazon IP (incurring transfer costs).

Also, worth noting that Releng uses two regions, so traffic from the region this bucket is in would be free, but traffic from the other region would not.

I'm no expert at calculating these things, so don't take my word as final.
Ok... might be worth designing mrepo to scale out and replicate across regions then?  But even then it adds more complexity (and possibly more cost).

Adding dtorre to this as he'll have some AWS experience to add in costing this out.
Flags: needinfo?(dtorre)
I'll take a look at this on Friday. 

In the mean time, if someone could provide more info on mock, PuppetAgain, and the overall process here (what are we doing??) that would be nice.
Flags: needinfo?(dtorre)
(Assignee)

Comment 22

3 years ago
Mock basically builds software in a chroot, and builds that chroot by installing packages.  We use it for all of our Linux builds.  So, it's rebuilding the chroot on every build, which means pulling a lot of packages.  I have no idea why it doesn't cache them.

PuppetAgain is currently hosting the repos from which these packages are downloaded, and that's causing problems because PuppetAgain's availability and update models are different from those required for production builds.  PuppetAgain uses the repos to install full systems, and when we started using mock we just used the same repos because they were there.
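
To make the moving parts concrete: each mock_mozilla config embeds a yum configuration, and the baseurl lines in it decide where packages come from, roughly like this (a trimmed sketch; the real configs in /etc/mock_mozilla have more repos and options):

  config_opts['yum.conf'] = """
  [centos6]
  name=centos6
  baseurl=http://puppetagain.pub.build.mozilla.org/data/repos/yum/mirrors/centos/6/latest/os/x86_64
  failovermethod=priority
  """

So "pointing mock somewhere else" mostly means rewriting those baseurls.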
Got it. (For some reason the potential of a Docker-based solution comes to mind here... but I digress.)

Regarding above:
> ...I have no idea why it doesn't cache them.

Is the mock source hosted on Github or similar?
(Assignee)

Comment 24

3 years ago
https://github.com/jhford/mock_mozilla
(Assignee)

Updated

3 years ago
Duplicate of this bug: 899548
Blocks: 1022761
Blocks: 1021152
No longer blocks: 1022761
When I run the numbers, I get an S3 figure that's more in the ballpark of $545/mo. This assumes a bunch of things, like your builders aren't crankin' away 24/7/365 but instead are busy for about a typical 10-hour work day. Another assumption is that your Linux builders are spread across two AWS zones (Oregon and Virginia) plus SCL3. 


Check out the diagram below

https://mana.mozilla.org/wiki/display/EA/MREPO


:digi had an interesting suggestion about utilizing an S3-only repository. In such a scenario, nodes would get updates directly from S3 without a web front-end. This could save a few bucks on EC2 instances, load balancers, etc but the "catch" is you'd need an S3-aware Yum client. If it were my call I'd leave yum as-is and just front-end S3 with an auto-scale group of instances. If this is something you'd like to entertain, let me know and I'll add more views to the diagram.
(In reply to dtorre from comment #26)
> When I run the numbers, I get an S3 figure that's more in the ballpark of
> about $545/mo. This assumes a bunch of things, like you're builders aren't
> crankin' away 24/7/365 but instead are busy for about a typical 10-hour work
> day. Another assumption is that your Linux builders are spread across two
> AWS zones (Oregon and Virginia) plus SLC3. 
> 
> 
> Check out the diagram below
> 
> https://mana.mozilla.org/wiki/display/EA/MREPO
> 
> 
> :digi had an interesting suggestion about utilizing an S3-only repository.
> In such a scenario, nodes would get updates directly from S3 without a web
> front-end. This could save a few bucks on EC2 instances, load balancers, etc
> but the "catch" is you'd need an S3-aware Yum client. If it were my call I'd
> leave yum as-is and just front-end S3 with an auto-scale group of instances.
> If this is something you'd like to entertain, let me know and I'll add more
> views to the diagram.

PS - doc should be open to all. Feel free to play with the diagrams if you like.
> :digi had an interesting suggestion about utilizing an S3-only repository.
> In such a scenario, nodes would get updates directly from S3 without a web
> front-end. This could save a few bucks on EC2 instances, load balancers, etc
> but the "catch" is you'd need an S3-aware Yum client. If it were my call I'd
> leave yum as-is and just front-end S3 with an auto-scale group of instances.
> If this is something you'd like to entertain, let me know and I'll add more
> views to the diagram.

The s3-aware yum client is only needed when your data needs to be protected. In
this particular case I think all data is public, so no special client side
consideration would need to be made.

I'm still learning about AWS - I'm unclear on how S3 would provide shared storage
across multiple instances. Would this be via a FUSE module like s3fs? Or would
we use large EBS volumes and synchronize the data at the application layer?
>I'm still learning about AWS 
That's true for a lot of us, so no worries. And there's no one single answer here. There are lots of ways to do this.


As for a pure S3 access model, here are some other reasons to use an Apache front-end versus going straight to S3:

* Eventually we'll likely need security, so my thought is build it with security capabilities from the get-go
* Server-side modules (like s3fs) have caching capabilities
* S3 (to my knowledge) doesn't support partial/chunked downloads. Apache does, assuming caching is happening.



As you noted above, you could also implement a shared-nothing architecture whereby each EC2 node has a mirrored 1TB EBS store. My gut tells me heavy reserved instances backed by 1TB EBS will be cost-prohibitive, but it wouldn't hurt to look.
> As for a pure S3 access model, here are some other reasons to use an Apache
> front-end versus going straight to S3:
> 
> * Eventually we'll likely need security, so my thought is build it with
> security capabilities from the get-go
> * Server-side modules (like s3fs) have cahcing capabilities
> * S3 (to my knowledge) doesn't support partial/chunked downloads. Apache
> does, assuming caching is happening.

Makes sense - thanks for explaining.

> 
> 
> 
> As you noted above, you could also implement a shared-nothing architecture
> whereby each EC2 node has a mirrored 1TB EBS store. My gut tells me heavy
> reserved instances backed by 1TB EBS will be cost-prohibitive, but it
> wouldn't hurt to look.

That's how the system is set up today between our two bare-metal hosts - and keeping data synchronized is currently an operational burden. It would be nice to remove this layer.
(Assignee)

Comment 31

3 years ago
Our builders run 24/7, although load varies by day and time.  But the load figures I quoted were over an entire week, so already take that variance into account.

S3 does support partial/chunked downloads.

S3 also supports ACLs.  Depending on what security model you need, this might be adequate -- limit access to Mozilla's public IPs and whatever EC2 VPCs need access.  I think that'd be fine for protecting RHEL repos, for example, but probably not for protecting packages with baked-in secrets or the like.

Even if that's not sufficient for security, setting up ELB and an instance pool seems like premature optimization.
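
As a sketch of what "limit access to Mozilla's public IPs" could look like as an S3 bucket policy (the CIDR blocks below are placeholders, not our real netblocks):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GetObjectFromKnownNetblocks",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mozilla-releng-mock-centos-us-east-1/*",
      "Condition": {
        "IpAddress": { "aws:SourceIp": [ "203.0.113.0/24", "198.51.100.0/24" ] }
      }
    }
  ]
}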
Brian, Dave, this has been a great discussion, and it sounds like you guys are on the path to implement something really interesting and complex in AWS/S3. I'm wondering if relops should try to implement something a bit simpler and more short term so we can fix bug 1021152 in a timely fashion, though, because I'm guessing the timeframe on your possible plan isn't all that soon?
Hey, Amy. 

Hopefully the suggested architecture didn't come across as me being on your critical path. 

I'll leave it up to you guys how to implement something relops-specific. 

Thanks,
dtorre: there were a lot of great ideas in here, so I'm happy about the conversation.  I'd love to use the service you guys are describing, but I'm guessing it's a larger architectural project.  So, cool, we'll try to figure something out for the short term to move along on that bug and then perhaps we can revisit the mrepo solution once you guys get something set up in production.  Thanks again to you and digi for the ideas!
(Assignee)

Updated

3 years ago
Depends on: 1032491
(Assignee)

Comment 35

3 years ago
I'm syncing the repos with:

s3cmd sync /data/repos/yum/mirrors/centos/6/latest/ s3://mozilla-releng-mock-centos-us-east-1/centos/6/latest/
s3cmd sync /data/repos/yum/mirrors/centos/6/latest/ s3://mozilla-releng-mock-centos-us-west-2/centos/6/latest/

Once that's in place, this will be over to releng to actually point to these repos in some regionally-appropriate way.  I can sync this to a releng cluster host, too, if that's helpful for onsite buildslaves.
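
Side note: since these repos are already public anyway, the sync can mark objects world-readable as it uploads them, which saves fixing per-object ACLs afterwards, e.g.:

  s3cmd sync --acl-public /data/repos/yum/mirrors/centos/6/latest/ s3://mozilla-releng-mock-centos-us-east-1/centos/6/latest/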
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1032491
(Assignee)

Comment 37

3 years ago
OK:

s3cmd ls s3://mozilla-releng-mock-centos-us-east-1/
                       DIR   s3://mozilla-releng-mock-centos-us-east-1/centos/
                       DIR   s3://mozilla-releng-mock-centos-us-east-1/releng/

s3cmd ls s3://mozilla-releng-mock-centos-us-west-2/
                       DIR   s3://mozilla-releng-mock-centos-us-west-2/centos/
                       DIR   s3://mozilla-releng-mock-centos-us-west-2/releng/

Now, I suspect the next step is to change the mock configs via puppet to point to the appropriate one of these buckets per location.  Chris, anything funny to watch out for there?
Flags: needinfo?(catlee)
Two things come to mind: permissions, and lack of index.html files.

Permissions should be simple to check, and I don't think yum or apt rely on directory scraping to work, do they?
Flags: needinfo?(catlee)
(Assignee)

Comment 39

3 years ago
Correction to the above: I'm moving /centos/ to /mirrors/centos/ to match puppet.

As far as permissions, yeah - it looks like nobody but me can read these buckets, so I'll need to add ACLs.  What should those be based on?  IP range?  Something IAM flavored?

Yum doesn't need indexes, correct.

We need to make these repos publicly inspectable *somehow*, and we need support for on-site builds, so I'll also set up http://mockbuild-repos.pub.build.mozilla.org on the releng cluster containing the same data, and configure the push scripts to sync to S3.  I'll need some help setting up IAM credentials for that, I think.
(Assignee)

Comment 40

3 years ago
Created attachment 8453095 [details] [diff] [review]
bug1022763-p1.patch
Assignee: nobody → dustin
Attachment #8453095 - Flags: review?(catlee)
(Assignee)

Comment 41

3 years ago
Created attachment 8453097 [details] [diff] [review]
bug1022763-p2.patch
Attachment #8453097 - Flags: review?(catlee)
(Assignee)

Comment 42

3 years ago
Created attachment 8453098 [details] [diff] [review]
bug1022763-p3.patch

This one can't land until mockbuild-repos.pub.b.m.o is set up and the ACLs are fixed.
Attachment #8453098 - Flags: review?(catlee)
These look basically OK, except that they're going to break when we generate new AMIs. Currently we generate an AMI in one region and then copy it to the other region(s). This is nice because we get identical AMIs across regions. However, this also means we're going to have AMIs in one region using the S3 buckets from another region.

I'm not sure where the right place to fix that up is.
(Assignee)

Comment 44

3 years ago
Well, that leaves two options that I see:

1. Build different AMIs per location
2. Configure mock with something other than puppet

Maybe that "something" is runner?

While I have your attention, some advice on the ACLs would be helpful too :)
Are the only clients of these going to be AWS machines? If so, we could use IAM roles for permissions.

The instances are configured with IAM roles matching their moz-type, e.g. "bld-linux64". In the IAM console you can attach permissions for each IAM role. We already use this to give S3 access to certain buckets for sccache.
Fixing it up / generating it in runner seems like a good option.
(Assignee)

Comment 47

3 years ago
http://mockbuild-repos.pub.build.mozilla.org is up and running now.

I think we'll need the following policy on all slave roles:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1405009790000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mozilla-releng-mock-centos-*",
        "arn:aws:s3:::mozilla-releng-mock-centos-*/*"
      ]
    }
  ]
}

I don't have permission for that -- in fact, I only have permission to view about half of the roles, it seems.  Can someone either fix my perms or add that policy?
applied to {dev,bld,try}-linux64 and buildbot-master roles
(Assignee)

Comment 49

3 years ago
Created attachment 8453919 [details] [diff] [review]
bug1022763-p1-r2.patch

OK, how does this look, conceptually (untested)?

I'm not sure how best to test this, really.  Hints appreciated!
Attachment #8453095 - Attachment is obsolete: true
Attachment #8453097 - Attachment is obsolete: true
Attachment #8453098 - Attachment is obsolete: true
Attachment #8453095 - Flags: review?(catlee)
Attachment #8453097 - Flags: review?(catlee)
Attachment #8453098 - Flags: review?(catlee)
Attachment #8453919 - Flags: feedback?(catlee)
Comment on attachment 8453919 [details] [diff] [review]
bug1022763-p1-r2.patch

Driveby ...

>+case "$availability_zone" in
>+    us-east-1*) REPOROOT="https://s3.amazonaws.com/mozilla-releng-mock-centos-use-east-1" ;;
>+    us-west-2*) REPOROOT="https://s3-us-west-2.amazonaws.com/mozilla-releng-mock-centos-use-west-2" ;;
>+    *)  REPOROOT='http://mockbuild-repos.pub.build.mozilla.org' ;;
>+esac

Needs s/-use-/-us-/ to avoid a NoSuchBucket response.
(Assignee)

Comment 51

3 years ago
Thanks, Nick - updated in my git repo.
(Assignee)

Comment 52

3 years ago
Comment on attachment 8453919 [details] [diff] [review]
bug1022763-p1-r2.patch

Rail, any chance you could take a look and let me know what you think, and how I might test and deploy this?
Attachment #8453919 - Flags: feedback?(catlee) → feedback?(rail)
I think we can easily test this using a dev (loaner) instance talking to a puppet user environment. If we don't see any issues we can land this, see how the on-demand build instances behave, and force AMI regeneration (or just wait until they are automatically rebuilt at night).

Is there anything else that I'm missing?
Comment on attachment 8453919 [details] [diff] [review]
bug1022763-p1-r2.patch

Review of attachment 8453919 [details] [diff] [review]:
-----------------------------------------------------------------

I like the idea!

::: modules/runner/files/config_mockbuild
@@ +13,5 @@
> +    *)  REPOROOT='http://mockbuild-repos.pub.build.mozilla.org' ;;
> +esac
> +
> +for ARCH in i386 x86_64; do
> +    TARGET=/etc/mock_mozilla/mozilla-centos6-i386.cfg

s/i386/$ARCH/ maybe?

::: modules/runner/manifests/tasks/config_mockbuild.pp
@@ +11,5 @@
> +    }
> +
> +    runner::task {
> +        "${runlevel}-config_mockbuild":
> +            source  => 'puppet:///modules/runner/config_mockbuild';

Maybe add a dependency on the file above?
Attachment #8453919 - Flags: feedback?(rail) → feedback+
Comment on attachment 8453919 [details] [diff] [review]
bug1022763-p1-r2.patch

Review of attachment 8453919 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/runner/files/config_mockbuild
@@ +9,5 @@
> +availability_zone=$(python -c 'import json; print json.load(open("/etc/instance_data.json")).get("placement/availability_zone", "")')
> +case "$availability_zone" in
> +    us-east-1*) REPOROOT="https://s3.amazonaws.com/mozilla-releng-mock-centos-use-east-1" ;;
> +    us-west-2*) REPOROOT="https://s3-us-west-2.amazonaws.com/mozilla-releng-mock-centos-use-west-2" ;;
> +    *)  REPOROOT='http://mockbuild-repos.pub.build.mozilla.org' ;;

as I've asked elsewhere, but have been unable to find an answer thus far: is this accessible from outside the MoCo VLAN? (e.g. servo, or QA, or SeaMonkey) [I just get an Apache landing page at mockbuild-repos while on VPN, and no indication of what's underneath]

@@ +18,5 @@
> +    TMP="${TARGET}~"
> +    sed "s#%ARCH%#${ARCH}#g" "s#%REPOROOT%#${REPOROOT}#g" ${TEMPLATE} > ${TMP}
> +    if diff -q ${TMP} ${TARGET}; then
> +        mv ${TMP} ${TARGET}
> +        sudo -u cltbld /usr/bin/mock_mozilla -v -r mozilla-centos6-$ARCH --scrub=all

needs to use builder_username not cltbld
Attachment #8453919 - Flags: feedback+
(Assignee)

Comment 56

3 years ago
Simone and I will land and test something like this on Monday, including both these changes and bug 1043023.  I'll get a patch put together today.
(Assignee)

Comment 57

3 years ago
Created attachment 8462836 [details] [diff] [review]
bug1022763-p1-r3.patch

This accomplishes the necessary stuff for *this* bug.

Still to be addressed:
 * build a system to synchronize repository changes out to S3 automatically
 * document the repositories, their contents, and how to update them

the first of which will be required to deploy JDK1.4.

Simone, once the repo is in place (it will probably be /custom/jdk), we'll just update modules/runner/files/mockbuild-config-template to point to that repository as well, and perhaps include the package in chroot_setup_cmd.  Once that's done, the new JDK should be installed in every mock env.
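
For reference, that template change should just be an extra repo section, roughly (the repo id is a guess; the path matches the /custom/jdk layout above):

[custom-jdk]
name=custom-jdk
baseurl=%REPOROOT%/custom/jdk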
Attachment #8453919 - Attachment is obsolete: true
Attachment #8462836 - Flags: review?(rail)
Attachment #8462836 - Flags: feedback?(sbruno)
Comment on attachment 8462836 [details] [diff] [review]
bug1022763-p1-r3.patch

Hi Dustin, ready to test changes as soon as /custom/jdk repo is created. Let's use Bug 1043023 to track progress.
Attachment #8462836 - Flags: feedback?(sbruno) → feedback+
(Assignee)

Comment 59

3 years ago
Simone, we'll need to deploy this change -- carefully -- first.  We had talked about testing it in user puppet environments.  I can use my environment, but I'll need to know which hosts I should pin -- preferably some in AWS and some on site.
(Assignee)

Comment 60

3 years ago
Comment on attachment 8462836 [details] [diff] [review]
bug1022763-p1-r3.patch

(since rail's on PTO)
Attachment #8462836 - Flags: review?(rail) → review?(sbruno)
(Assignee)

Updated

3 years ago
Depends on: 1045641
Dustin: I recently used b-linux64-hp-0025 to create a pinned environment, I guess we can use that box to run tests for this and Bug 1043023. It's still not enabled in slavealloc.
(Assignee)

Comment 62

3 years ago
https://wiki.mozilla.org/ReleaseEngineering/How_To/Update_Mock_Build_Repositories

I'm sync'ing the JDK changes now.
Comment on attachment 8462836 [details] [diff] [review]
bug1022763-p1-r3.patch

After Dustin pinned b-linux64-hp-0025 to his own puppet environment (which includes this patch + the pinning changes on top of current production), I enabled the slave on my dev environment and was able to correctly build mock environments for builds.
Attachment #8462836 - Flags: review?(sbruno) → review+
Still have to run tests which use the AWS mock repos.
(Assignee)

Comment 65

3 years ago
Exactly what I was going to say!  Once we're confident that this works in AWS, we can ship this change, and then the change to add JDK is relatively straightforward.
Dustin: can you add the jdk repo (the non-AWS one) to modules/runner/files/mockbuild-config-template in your puppet environment (to which b-linux64-hp-0025 is still pinned)? I would like to test that jdk packages are correctly picked up from there.
(Assignee)

Comment 67

3 years ago
I had the wrong patch (bug1022763-p1-r2.patch instead of bug1022763-p1-r3.patch) in my environment.  That had the wrong URLs for the releng repos.

I've added the JDK repo for bug 1043023.
So, I checked on the linux 25 box I am using for these tests, and none of the config files in /etc/mock_mozilla includes the new repo I would expect. I was tricked into giving you an r+ for the patch because of https://bugzilla.mozilla.org/show_bug.cgi?id=753291: I assumed (wrongly) that errors in installing packages would cause build failures and, not seeing any errors, I assumed the jdk-1.7 package was installed (which exists only in the new repo http://mockbuild-repos.pub.build.mozilla.org/custom/jdk/).
It seems to me there's something wrong with the new config_mockbuild task...
Flags: needinfo?(dustin)
Comment on attachment 8462836 [details] [diff] [review]
bug1022763-p1-r3.patch

Review of attachment 8462836 [details] [diff] [review]:
-----------------------------------------------------------------

r- since tests on a pinned environment did not work as expected.
Attachment #8462836 - Flags: review+ → review-
(Assignee)

Comment 70

3 years ago
Ah, it failed in runner:

/opt/runner/tasks.d/3-config_mockbuild: line 20: syntax error near unexpected token `newline'
/opt/runner/tasks.d/3-config_mockbuild: line 20: `    TARGET=/etc/mock_mozilla/mozilla-centos6-${ARCH}.cfg'

The error is a missing 'esac'.  Oops.

ian, shouldn't a failure in runner result in Buildbot not starting?  In fact, there are two other runner failures in that same logfile.  Also, runner.log doesn't really name the tasks as it runs them.  I guess those should be different bugs :)
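
For the record, the intended shape of that part of the task (a sketch, not the full script):

case "$availability_zone" in
    us-east-1*) REPOROOT="https://s3.amazonaws.com/mozilla-releng-mock-centos-us-east-1" ;;
    us-west-2*) REPOROOT="https://s3-us-west-2.amazonaws.com/mozilla-releng-mock-centos-us-west-2" ;;
    *)          REPOROOT="http://mockbuild-repos.pub.build.mozilla.org" ;;
esac

for ARCH in i386 x86_64; do
    TARGET=/etc/mock_mozilla/mozilla-centos6-${ARCH}.cfg
    # ...template substitution and mock scrub, as in the patch...
done

Without the 'esac', bash chokes on the first line inside the loop, which is exactly the error above.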
Flags: needinfo?(iconnolly)
(Assignee)

Comment 71

3 years ago
Created attachment 8465416 [details] [diff] [review]
bug1022763-p2-r4.patch

This is in place in my environment now.
Attachment #8465416 - Flags: review?(sbruno)
Thanks Dustin! Testing it now.
Flags: needinfo?(dustin)
Test successful on non-AWS boxes.
Let's see the AWS part now.
(In reply to Dustin J. Mitchell [:dustin] from comment #70)
> Ah, it failed in runner:
> 
> /opt/runner/tasks.d/3-config_mockbuild: line 20: syntax error near
> unexpected token `newline'
> /opt/runner/tasks.d/3-config_mockbuild: line 20: `   
> TARGET=/etc/mock_mozilla/mozilla-centos6-${ARCH}.cfg'
> 
> The error is a missing 'esac'.  Oops.
> 
> ian, shouldn't a failure in runner result in Buildbot not starting?  In
> fact, there are two other runner failures in that same logfile.  Also,
> runner.log doesn't really name the tasks as it runs them.  I guess those
> should be different bugs :)

Depending on the error, it will, once buildbot is a runner task. If a runner task exits with `2` then runner will not retry and will exit (https://github.com/IanConnolly/runner/blob/master/README.md). Once Bug 1045730 lands, you'll also be able to set task-specific max retry attempts.

So once Bug 1042340 lands the above will be true for CentOS and as discussed previously the rest will follow after.

That's a good point re: logging. Let me take a look at that when I get some free time.
Flags: needinfo?(iconnolly)
(Assignee)

Comment 75

3 years ago
p2-r4 has "instance_data" where it should say "instance_metadata".  Fixed in my environment.
I am testing on dev-linux64-ec2-sbruno.dev.releng.use1.mozilla.com, which is pinned to my puppet environment (containing dustin's patch).

After the change mentioned in comment 75, the repo  https://s3.amazonaws.com/mozilla-releng-mock-centos-us-east-1 is correctly listed in mock_mozilla/*.cfg.

However, I am having some permissions issues:

wget  https://s3.amazonaws.com/mozilla-releng-mock-centos-us-east-1/mirrors/centos/6/latest/os/i386/Packages/librelp-0.1.1-4.1.el6.i686.rpm
--2014-07-31 11:04:27--  https://s3.amazonaws.com/mozilla-releng-mock-centos-us-east-1/mirrors/centos/6/latest/os/i386/Packages/librelp-0.1.1-4.1.el6.i686.rpm
ERROR: Failed to open cert /etc/ssl/certs/make-dummy-cert: (-34).
ERROR: Failed to open cert /etc/ssl/certs/Makefile: (-34).
ERROR: Failed to open cert /etc/ssl/certs/ca-bundle.trust.crt: (-34).
Resolving s3.amazonaws.com... 207.171.185.200
Connecting to s3.amazonaws.com|207.171.185.200|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2014-07-31 11:04:28 ERROR 403: Forbidden.

Can that depend on the security group of the box I am using (which has been created as a loaner, as per https://wiki.mozilla.org/ReleaseEngineering/How_To/Loan_a_Slave#AWS_machines)?
Flags: needinfo?(catlee)
Dustin: apart from the permission issue in the previous comment, are there any other blocking issues (maybe related to the order of execution of tasks in runner, which you were discussing with ian in #mozbuild)?
Flags: needinfo?(dustin)
(Assignee)

Comment 78

3 years ago
Yes, sort of -- currently the 'instance_metadata' service runs after runner, which means that the instance_metadata.json file is out of date.  Not a problem for reserved/ondemand, as the metadata should be static from run to run.  And maybe not even a problem for spot, if we're sure that AMIs are always generated in the same region as they're used.  At any rate, that's bug 1046926 and sounds easy to fix.
(Assignee)

Updated

3 years ago
Depends on: 1046926
Flags: needinfo?(dustin)
(Assignee)

Comment 79

3 years ago
The permissions for the bld-linux64 role include:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1405009790000",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mozilla-releng-mock-centos-*",
        "arn:aws:s3:::mozilla-releng-mock-centos-*/*"
      ]
    }
  ]
}

Catlee suggests that maybe we're using the wrong endpoint URLs - maybe they need to be bucket hostnames, e.g., mozilla-releng-mock-centos-us-east-1.s3.amazonaws.com.

The other possibility would be that the machine used to test this didn't have the proper role, but IIRC we checked that already.

Simone, can you point me to an AWS host to use to test this, and I'll fiddle around?
Flags: needinfo?(catlee)
(Assignee)

Comment 80

3 years ago
We may also need additional permissions, such as List*
Dustin - the aws box I am using for tests is dev-linux64-ec2-sbruno.dev.releng.use1.mozilla.com, which is still active.
(Assignee)

Comment 82

3 years ago
Thanks!  From catlee:

12:20 <catlee> could try adding ListBucket to get better error messages?
(Assignee)

Comment 83

3 years ago
That host is in the 'try-linux64' role, which has the same permissions.

  curl -v >/dev/null https://s3.amazonaws.com/mozilla-releng-mock-centos-us-east-1/custom/jdk/repodata/repomd.xml

fails with 403, as does

  curl -v >/dev/null https://mozilla-releng-mock-centos-us-east-1.s3.amazonaws.com/custom/jdk/repodata/repomd.xml

The policy simulator says it should be allowed, though (assuming that arn:aws:s3:::mozilla-releng-mock-centos-us-east-1/custom/jdk/repodata/repomd.xml is the correct arn)
(Assignee)

Comment 84

3 years ago
I was under the impression that EC2 instances "magically" got the access granted by their IAM roles.  That seems to be false.  From my read of the docs:

http://docs.aws.amazon.com/IAM/latest/UserGuide/WorkingWithRoles.html
--
If an application runs on an Amazon EC2 instance and needs to make requests for AWS resources such as Amazon S3 buckets or an DynamoDB table, it must have security credentials. It isn't a good practice to embed or pass IAM user credentials to each instance—distributing long-term credentials to each instance is challenging to manage and a potential security risk. A better strategy is to create a role that is used when the Amazon EC2 instance is launched. An application can then get temporary security credentials from the Amazon EC2 instance. For more information, see Granting Applications that Run on Amazon EC2 Instances Access to AWS Resources. 
--

IAM doesn't magically grant access to an EC2 instance -- instead, it provides credentials for that access, based on the role.

Concurring:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
---
    Create an IAM role.

    Define which accounts or AWS services can assume the role.

    Define which API actions and resources the application can use after assuming the role.

    Specify the role when you launch your instances.

    Have the application retrieve a set of temporary credentials and use them.
---

This is done via the AWS STS:
http://docs.aws.amazon.com/STS/latest/UsingSTS/Welcome.html

http://stackoverflow.com/a/11130701/2737366 indicates that boto reads these creds, too.  That's how the S3 caching stuff is working.

Indeed,
  curl http://169.254.169.254/latest/meta-data/iam/security-credentials/try-linux64/ 
returns a set of credentials, but with those credentials:

[root@dev-linux64-ec2-sbruno.dev.releng.use1.mozilla.com awscli-bundle]# /root/.local/lib/aws/bin/aws s3 cp s3://mozilla-releng-mock-centos-us-east-1//custom/jdk/repodata/repomd.xml .
A client error (Forbidden) occurred when calling the HeadObject operation: Forbidden
Completed 1 part(s) with ... file(s) remaining

Per http://docs.aws.amazon.com/AmazonS3/latest/dev/using-with-s3-actions.html, the Amazon S3 Operation "HEAD Object" is governed by the permission "s3:GetObject", so it's weird that this doesn't work.

Anyway, assuming that permission problem could be fixed, Yum can probably be made to use these credentials, with a plugin:
  http://www.carrollops.com/blog/2012/09/11/s3-yum-repos-with-iam-authorization/
but that'd have to be plugged into mock_mozilla somehow.
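
To make the credentials flow concrete, here's a sketch of pulling the role's temporary credentials from the metadata service and handing them to the aws CLI explicitly (the JSON field names are the standard instance-metadata ones; this is the same mechanism boto uses automatically):

creds=$(curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/try-linux64)
export AWS_ACCESS_KEY_ID=$(echo "$creds" | python -c 'import json,sys; print json.load(sys.stdin)["AccessKeyId"]')
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | python -c 'import json,sys; print json.load(sys.stdin)["SecretAccessKey"]')
export AWS_SESSION_TOKEN=$(echo "$creds" | python -c 'import json,sys; print json.load(sys.stdin)["Token"]')
aws s3 cp s3://mozilla-releng-mock-centos-us-east-1/custom/jdk/repodata/repomd.xml .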

The next two things to try:
 * using a bucket policy
 * just opening read access to everyone and hoping nobody uses too much traffic
(Assignee)

Comment 85

3 years ago
(the double-slash in the 'aws s3 cp' command was a copy/paste error.. same effect with one slash)
(Assignee)

Comment 86

3 years ago
OK, permissions problem solved: access to that bucket is being denied by the bucket policy, regardless of the policy attached to the IAM role.
(Assignee)

Comment 87

3 years ago
I removed the mock-repos policy from try-linux64, dev-linux64, and bld-linux64.  I also marked custom/jdk/repodata/repomd.xml as readable for "everyone", and saw success with both aws and plain-old curl:
  curl -v >/dev/null https://mozilla-releng-mock-centos-us-east-1.s3.amazonaws.com/custom/jdk/repodata/repomd.xml
(Assignee)

Comment 88

3 years ago
http://blogs.aws.amazon.com/security/post/TxPOJBY6FE360K/IAM-policies-and-Bucket-Policies-and-ACLs-Oh-My-Controlling-Access-to-S3-Resourc
"One of the neat things about AWS is that you can actually apply both IAM policies and S3 bucket policies simultaneously, with the ultimate authorization being the least-privilege union of all the permissions"

So no amount of futzing with IAM role policies was going to allow access to that bucket.
(Assignee)

Comment 89

3 years ago
I added the following bucket policy:

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "EveryoneGetsToGet",
			"Effect": "Allow",
			"Principal": {
				"AWS": "arn:aws:iam::314336048151:root"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::mozilla-releng-mock-centos-us-east-1/*"
		}
	]
}

which according to http://serverfault.com/questions/569046/anonymous-access-to-s3-bucket-only-from-my-ec2-instances should allow "anonymous" access to this bucket from appropriately owned EC2 instances (although I don't really see how that would work).

Sure enough, with the other S3 ACLs removed, this gets me 403s, either with or without the EC2 instance's API credentials, as well as from a non-EC2 host.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "EveryoneGetsToGet",
			"Effect": "Allow",
			"Principal": "*",
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::mozilla-releng-mock-centos-us-east-1/*"
		}
	]
}

does allow access without creds.

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "EveryoneGetsToGet",
			"Effect": "Allow",
			"Principal": {
				"Service": "ec2.amazonaws.com"
			},
			"Action": "s3:GetObject",
			"Resource": "arn:aws:s3:::mozilla-releng-mock-centos-us-east-1/*"
		}
	]
}

allows access for nobody.

I've settled on the "Principal": "*" option, which corresponds to "just opening read access to everyone and hoping nobody uses too much traffic".  I'm convinced that's as good as it gets.
(Assignee)

Comment 90

3 years ago
http://serverfault.com/questions/569046/anonymous-access-to-s3-bucket-only-from-my-ec2-instances
(Assignee)

Comment 91

3 years ago
OK, simone, in trying to test this, I get:

[cltbld@dev-linux64-ec2-sbruno.dev.releng.use1.mozilla.com ~]$ mock_mozilla -r mozilla-centos6-x86_64 --init
ERROR: Could not find required config file: /etc/mock_mozilla/site-defaults.cfg

any chance you know a quick way around that, or a better way to test?

Also, we talked in irc today about finding a way to specify mock configurations in the build process, rather than in puppet.  Would you like to pursue that (and how feasible is it?), or should I try to get bug 1046926 fixed so that we can roll this out with runner?
Flags: needinfo?(sbruno)
Dustin.

I don't know why site-defaults.cfg is missing... maybe it's better to provide you with a clean, newly created loaner box if you need to play around some more.

Anyway, I will work on moving the mock config to the builders; if this turns out to take too long, we can roll back in a few days to the original strategy.
So hold on for now, I'll provide further feedback.
Flags: needinfo?(sbruno)
(Assignee)

Comment 93

3 years ago
Sounds great -- thanks!
hi Dustin,
since apparently there is no way to specify the mock configuration in files which are not under /etc/mock_mozilla (so we would need to change system-level config files in the context of a specific build), I am no longer so convinced of the "mock config to builders" move; to deploy this I would recommend getting it out with runner.
(Assignee)

Updated

3 years ago
No longer depends on: 1046926
(Assignee)

Comment 95

3 years ago
Comment on attachment 8465416 [details] [diff] [review]
bug1022763-p2-r4.patch

I need to rework this to support the multiple configs for JDK
Attachment #8465416 - Attachment is obsolete: true
Attachment #8465416 - Flags: review?(sbruno)
(Assignee)

Comment 96

3 years ago
Created attachment 8475273 [details] [diff] [review]
bug1022763-p4.patch

Untested, but reworked to handle the android config (and easily add others if necessary)
Attachment #8462836 - Attachment is obsolete: true
Attachment #8475273 - Flags: review?(sbruno)
(Assignee)

Comment 97

3 years ago
Created attachment 8478350 [details] [diff] [review]
bug1022763-p5.patch

Tested this time -- interdiff:

diff --git a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-i386.cfg b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-i386.cfg
index 24f9283..ff2f011 100644
--- a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-i386.cfg
+++ b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-i386.cfg
@@ -45,19 +45,19 @@ syslog_device=

 [centos6]
 name=centos6
-baseurl=%REPOROOT%/centos/6/latest/os/i386
+baseurl=%REPOROOT%/mirrors/centos/6/latest/os/i386
 failovermethod=priority

 [centos6-updates]
 name=updates
-baseurl=%REPOROOT%/centos/6/latest/updates/i386
+baseurl=%REPOROOT%/mirrors/centos/6/latest/updates/i386
 failovermethod=priority

 [releng-centos6-i386]
 name=releng-centos6-i386
-baseurl=%REPOROOT%/public/CentOS/6/i386
+baseurl=%REPOROOT%/releng/public/CentOS/6/i386

 [releng-centos6-noarch]
 name=releng-centos6-noarch
-baseurl=%REPOROOT%/public/CentOS/6/noarch
+baseurl=%REPOROOT%/releng/public/CentOS/6/noarch
 """
diff --git a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64-android.cfg b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64-android.cfg
index b8303ab..7b30678 100644
--- a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64-android.cfg
+++ b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64-android.cfg
@@ -40,23 +40,23 @@ syslog_device=

 [centos6]
 name=centos6
-baseurl=%REPOROOT%/centos/6/latest/os/x86_64
+baseurl=%REPOROOT%/mirrors/centos/6/latest/os/x86_64
 failovermethod=priority

 [centos6-updates]
 name=updates
-baseurl=%REPOROOT%/centos/6/latest/updates/x86_64
+baseurl=%REPOROOT%/mirrors/centos/6/latest/updates/x86_64
 failovermethod=priority

 [releng-centos6-x86_64]
 name=releng-centos6-x86_64
-baseurl=%REPOROOT%/public/CentOS/6/x86_64
+baseurl=%REPOROOT%/releng/public/CentOS/6/x86_64

 [releng-centos6-noarch]
 name=releng-centos6-noarch
-baseurl=%REPOROOT%/public/CentOS/6/noarch
+baseurl=%REPOROOT%/releng/public/CentOS/6/noarch

 [custom-jdk]
 name=custom-jdk
-baseurl=%REPOROOT%/jdk
+baseurl=%REPOROOT%/custom/jdk
 """
diff --git a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg
index 9ef38b5..6a12205 100644
--- a/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg
+++ b/modules/runner/files/mockbuild-config-templates/mozilla-centos6-x86_64.cfg
@@ -45,19 +45,19 @@ syslog_device=

 [centos6]
 name=centos6
-baseurl=%REPOROOT%/centos/6/latest/os/x86_64
+baseurl=%REPOROOT%/mirrors/centos/6/latest/os/x86_64
 failovermethod=priority

 [centos6-updates]
 name=updates
-baseurl=%REPOROOT%/centos/6/latest/updates/x86_64
+baseurl=%REPOROOT%/mirrors/centos/6/latest/updates/x86_64
 failovermethod=priority

 [releng-centos6-x86_64]
 name=releng-centos6-x86_64
-baseurl=%REPOROOT%/public/CentOS/6/x86_64
+baseurl=%REPOROOT%/releng/public/CentOS/6/x86_64

 [releng-centos6-noarch]
 name=releng-centos6-noarch
-baseurl=%REPOROOT%/public/CentOS/6/noarch
+baseurl=%REPOROOT%/releng/public/CentOS/6/noarch
 """
diff --git a/modules/runner/templates/tasks/config_mockbuild.erb b/modules/runner/templates/tasks/config_mockbuild.erb
index 711c522..689b067 100755
--- a/modules/runner/templates/tasks/config_mockbuild.erb
+++ b/modules/runner/templates/tasks/config_mockbuild.erb
@@ -15,8 +15,8 @@ esac
 for tpl in $TEMPLATES/*.cfg; do
     TARGET=/etc/mock_mozilla/$(basename $tpl)
     TMP="${TARGET}~"
-    sed "s#%REPOROOT%#${REPOROOT}#g" ${TEMPLATE} > ${TMP}
-    if diff -q ${TMP} ${TARGET}; then
+    sed "s#%REPOROOT%#${REPOROOT}#g" ${TEMPLATE} < ${tpl} > ${TMP}
+    if [ ! -f ${TARGET} ] || ! diff -q ${TMP} ${TARGET}; then
         mv ${TMP} ${TARGET}
         sudo -u <%= scope.lookupvar('::config::builder_username') %> /usr/bin/mock_mozilla -v -r $(basename $tpl .cfg) --scrub=all
     else
Attachment #8475273 - Attachment is obsolete: true
Attachment #8475273 - Flags: review?(sbruno)
Attachment #8478350 - Flags: review?(sbruno)
Comment on attachment 8478350 [details] [diff] [review]
bug1022763-p5.patch

Review of attachment 8478350 [details] [diff] [review]:
-----------------------------------------------------------------

Tested on a physical Linux box pinned to my test environment; it worked like a charm.
Attachment #8478350 - Flags: review?(sbruno) → review+
(Assignee)

Updated

3 years ago
Attachment #8478350 - Flags: checked-in+
(Assignee)

Comment 99

3 years ago
This has been out for 30 minutes and I've seen mock installs run without failure, so I'm going to call this fixed.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
The runner task is looking for /etc/instance_data.json instead of /etc/instance_metadata.json, which is failing each time in AWS and also resulting in mock configs that point to the repositories in SCL3.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Created attachment 8520234 [details] [diff] [review]
look at the right file
Attachment #8520234 - Flags: review?(dustin)
(Assignee)

Updated

2 years ago
Attachment #8520234 - Flags: review?(dustin) → review+
Comment on attachment 8520234 [details] [diff] [review]
look at the right file

Review of attachment 8520234 [details] [diff] [review]:
-----------------------------------------------------------------

https://hg.mozilla.org/build/puppet/rev/618af0bea0bc

fixed up a comment in modules/instance_metadata/files/instance_metadata.py too.
Attachment #8520234 - Flags: checked-in+
Looks like this is working now.
Status: REOPENED → RESOLVED
Last Resolved: 3 years ago → 2 years ago
Resolution: --- → FIXED
We should probably use s3-external-1.amazonaws.com instead of s3.amazonaws.com in us-east-1. Writing patch/testing now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Created attachment 8524618 [details] [diff] [review]
Use the correct URL for us-east-1

When using the DNS resolvers in SCL3 (as all our build machines do), you get responses for servers on the west coast for s3.amazonaws.com. The network performance from machines in us-east-1 to these S3 servers is pretty poor.

We should use s3-external-1.amazonaws.com for US Standard buckets being accessed by machines in us-east-1.
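
For reference, in the runner task's region switch that means the us-east-1 arm becomes roughly:

    us-east-1*) REPOROOT="https://s3-external-1.amazonaws.com/mozilla-releng-mock-centos-us-east-1" ;;

with us-west-2 and the non-AWS fallback unchanged.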
Attachment #8524618 - Flags: review?(rail)
Attachment #8524618 - Flags: review?(rail) → review+

Updated

2 years ago
Attachment #8524618 - Flags: checked-in+
(Assignee)

Comment 106

2 years ago
Let's open new bugs for further changes to this system.
Status: REOPENED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED