Closed Bug 928147 Opened 11 years ago Closed 11 years ago

git.mozilla.org was down

Categories: Developer Services :: General (task)
Platform: All
OS: Other
Type: task
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: Reporter: fox2mike; Assignee: Unassigned

Group: infra
Assignee: server-ops → server-ops-webops
Severity: minor → blocker
Component: Server Operations → WebOps: Source Control
Product: mozilla.org → Infrastructure & Operations
QA Contact: shyam → nmaul
Assignee: server-ops-webops → shyam
Service is back up. What I saw when I logged in was a massive bunch of gitolite processes, mostly reads; the machine was out of RAM and had used almost all of its swap, which led to the high load average on the box. This quickly came back down, but the Zeus health check didn't recover quickly enough.

I kicked apache (since Zeus had marked the service offline anyway and wouldn't have sent any new connections) and things seem to have come back fine.

WebOps can investigate further if needed and see whether we need to spread the load on the machine. This looked like a classic "burst" scenario with a lot of activity hitting the machine at once.
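For the record, roughly the kind of spot checks involved (illustrative, not the exact commands I ran):

  uptime                                           # load average
  free -m                                          # RAM vs swap usage
  ps aux --sort=-rss | grep '[g]itolite' | head    # largest gitolite-spawned processes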
Assignee: shyam → server-ops-webops
Severity: blocker → normal
And by down, I mean the http/https bits were probably the only ones affected. I was able to ssh into the box fine and poke around, etc.
Summary: git.mozilla.org is down → git.mozilla.org was down
Both aki and I checked our conversion systems -- no unusual activity there. And we push via ssh, not pull over http.

I believe it would be useful to see the pending and rejected URL requests, as well as which repositories the requests were for, to understand the reason behind the burst. (Firefox development is such that certain request patterns imply certain workflows.) A rough log-analysis sketch follows the list below.

Of particular interest:
 - number of attempted connections
 - number from "inside" vs "outside"
 - repositories requested
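Something along these lines should pull those numbers out of the HTTP access logs (the log path and the assumption of Apache combined log format are guesses; adjust to whatever the servers actually write, or use the copy atoll stashed aside):

  LOG=/var/log/httpd/git.mozilla.org-access.log    # hypothetical path
  # total requests (filter to the incident window first if needed)
  wc -l < "$LOG"
  # requests per client IP -- split "inside" vs "outside" by netblock from here
  awk '{print $1}' "$LOG" | sort | uniq -c | sort -rn | head -20
  # which repositories were being asked for (first path component of the request URL)
  awk '{print $7}' "$LOG" | cut -d/ -f2 | sort | uniq -c | sort -rn | head -20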
The last time it ran out of RAM was when Ehsan's repo tried to gc (it's enormous); it's still set to not auto-gc, though, so either something forced it or it's unrelated.
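For reference, auto-gc on a repo is normally disabled with something like this (I'd have to double-check exactly how it's configured on git.m.o):

  # run inside the bare repository
  git config gc.auto 0            # don't trigger gc from the loose-object threshold
  git config gc.autoPackLimit 0   # don't trigger gc from the pack-count threshold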
Depends on: 928170
At this point, we suspect some bad timing led to an unintentional DoS from the three RelEng networks. It would be helpful to get the HTTP logs available so we can check whether there was an event on our side that caused everything to ask at once.

From #sysadmins:
16:04 < hwine> atoll: can we get the logs stuck somewhere for some more analysis?
16:06 < atoll> i'd like to defer to fox2mike for that, since this isn't my field
16:06 < atoll> and there's no active attack underway or anything
16:08 < hwine> np -- any wild idea how long before they would be purged? days? weeks?
16:09 < atoll> stored a copy aside. also i think i was only looking at one zeus, but we
               don't multihome iirc, so that's probably fine.
16:10 < atoll> i *think* logs persist forever, but if needed, zlb1:/root/ has a copy for
               fox2mike.
16:10 < atoll> s/forever/something longer than a day/
16:10 < hwine> okay -- great for now -- we'll do more later
16:10  * atoll bows
16:10 < hwine> thanks for your help!
Flags: needinfo?(shyam)
Someone from webops can help with the logs, doesn't need me :)
Flags: needinfo?(shyam)
gitmo just alerted again. I logged in and one proc was using 32G of virtual memory, same as before. I couldn't tell offhand which repo it was, though. OTOH, a few minutes earlier there was a push to users/eakhgari@mozilla.com/mozilla-history-tools, which is the massive repo that caused problems last time.
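Next time, something like this might help map the process back to a repo (off the top of my head, untested):

  PID=$(ps -eo pid,vsz --sort=-vsz | awk 'NR==2 {print $1}')   # biggest process by virtual size
  readlink /proc/$PID/cwd                                      # git usually runs with the repo as its cwd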
(In reply to Shyam Mani [:fox2mike] from comment #6)
> Someone from webops can help with the logs, doesn't need me :)

atoll -> fox2mike -> webops a triple play!
Flags: needinfo?(server-ops-webops)
I've blocked write access to users/eakhgari@mozilla.com/mozilla-history-tools. It's currently using 196G on disk, which is about 100G more than when I looked at it a couple of weeks ago. Clearly, disabling auto-gc has side effects. ;-) 

Hal will consult with TPTB to see if we can shut the repo down; if not, we'll need to move it off (to git2, probably) and run gc there to shrink it back down to a reasonable size. If the repo needs to exist for the long term, we'll need to figure out a way to host it such that it does not affect the rest of git.m.o.
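For context, the size numbers above come from the usual sort of checks (the repository path below is from memory and may not be exact):

  cd /var/lib/gitolite/repositories/users/eakhgari@mozilla.com/mozilla-history-tools.git
  du -sh .                 # total size on disk
  git count-objects -v     # loose vs packed objects, pack sizes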

21:23 < hwine> fubar: thanks for catching the git.m.o thing -- any way we can see if 
               there was a similar event push to ehsan's prior to first event?
21:23 < fubar> yeah, I'll go look
21:23  * hwine notes that github had issues with this repo as well
21:25 < hwine> otoh, this may make splitting real easy. that repo and all other repos ;)
21:25 < fubar> sonofa... just kill ANOTHER one
21:26 < hwine> there's an automated process behind that what will push every time 
               there's activity on the key gecko repos (most of releases/ & projects/ 
               on hg.m.o)
21:26 < bkero> oh that thing
21:27 < hwine> the community thing, not the releng things <- note plural now :)
21:27 < bkero> There's a very good reason github told it to gtfo :P
21:28 < fubar> there have been a bunch of updates to it this evening/afternoon, 
               including a bunch right around the time git fell over earlier
21:28 < fubar> oh look... there it is again
21:29 < bkero> fubar: sorry I'm not going to be much help. It's 0330 here and I'm 
               really tired. and have a train to catch in 2.25 hours
21:29 < fubar> dude. gtfts. :-)
21:30 < bkero> hwine: I can take a look at the performance bottlenecks sometime, but 
               I'm not sure we'll be able to host that repo on the same hardware as 
               other things.
21:31 < hwine> bkero: fubar fyi, we're getting "real close" to having a supported 
               solution, so shortish term options may be reasonable
21:31 < fubar> I thought we had one with disabling auto-gc, but it still appears to 
               wanna do that :-(
21:31 < hwine> ah - Paris - no wonder bkero looked so awake at CAB this morning :)
21:32 < fubar> so it looks like it's not always eating ALL of the RAM, just some times
21:32 < hwine> fubar: it wouldn't do any harm to turn off gc, as long as there is disk
21:32 < fubar> currently at 17G and holding
21:35 < bkero> 196G users/eakhgari@mozilla.com/
21:36 < fubar> wow. so that's grown by 100G since last I looked.
21:38 < bkero> fubar: we should be sure this doesn't fill the disk up
21:38 < bkero> or else we're going to have a Bad Time (TM)
21:38 < fubar> yeah, I was just thinking that
21:38 < fubar> since we only have 99G left
21:38 < fubar> mfer
21:42 < fubar> running git gc on it seems like the obvious thing, but that'll just end 
               up thrashing the box :-(
21:42 < hwine> fubar: let me send email to various powers that be -- it may be 
               reasonable to consider shutting this down in the near term (like 
               tomorrow)
21:43 < hwine> politics though, so I can't make the call
21:43 < fubar> sure
21:43 < fubar> otoh, gitmo falling over...
21:44 < fubar> if nothing else, I WILL have to shut it down, move it, and gc it on a 
               less-used machine
21:44 < hwine> oh, is git.m.o continually falling over now?
21:44 < fubar> it's jsut about out of memory
21:45 < fubar> auto gc is off, but something is still causing it to pack things which 
               eats all the RAM
21:45 < fubar> it's been at 28G for the last 20 minutes
21:45 < hwine> Okay, that call I can make -- let's shut down write access to that repo
21:48 < fubar> ok, removing write access now
Whoa guys.  That repo backs a service that lots of our developers rely on.  I'm sorry that this repo is too big, but can you please not disable it without syncing up with me first?  This broke the git mirror last night.

And can you please re-enable that account ASAP?  If you can't support this repo, it's fine to give me advance notice so that I can take it elsewhere.  It's not cool to turn it off without any prior notice.

Thanks!
We recognise that it is bad that this was disabled and that it might have affected services which depend on it; however, it was becoming an availability issue for the repositories that are required to build FirefoxOS, so this would have been a tree-closing problem.

If we enable this repository again, it is highly likely that FirefoxOS builds that require repositories from this service will cease to work. That is something that neither of us wants to be responsible for.

Besides the HTTPD availability, the repository was growing without bound and would have run the box out of disk within a few more hours.

I propose that we meet to discuss whether (and how) we can properly host it, now that we understand the resource requirements.
Flags: needinfo?(server-ops-webops)
(In reply to Ben Kero [:bkero] from comment #11)
> We recognise that it is bad that this was disabled and that it might have
> affected services which depend on it, however it was becoming an issue of
> availability of repositories that are required to build FirefoxOS

This repository, by being required by the github m-c mirror, *is* required for Firefox OS.  OEM partners have been, and still are, depending on that.

We need to find a solution for this repository, and we need it fast.
I'm currently attempting to repack this repo on a separate machine. I believe I've sorted the memory usage issue, but disk usage remains a significant concern. If, once the repack is done, disk usage is down to a reasonable amount, then I'll copy it back to git.m.o and re-enable it. But leaving it active and *guaranteeing* git.m.o outages is a non-starter; at least without that repo, other developers and processes can continue.

OTOH, if all FFOS development has ceased because of that repo (which according to RelEng is about to be replaced), then RelEng and IT need to work together to purchase a system capable of handling a repo of this size and growth.
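For the curious, the repack is roughly of this shape (illustrative options with the memory cap, not the exact invocation):

  cd /path/to/mozilla-history-tools.git      # wherever the copy lives on the separate machine
  git repack -a -d -f --window-memory=256m   # rewrite everything into fresh packs, recomputing deltas
  git prune                                  # drop unreachable loose objects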
Yes, it is also used by an unknown number of our developers on a daily basis.  GitHub currently says it has 344 forks, and that's only GitHub forks; the actual number of forks that people use is probably bigger.

I'm fine with having a meeting to discuss this.  Who's going to be in the meeting and what are we going to discuss?

Also, please turn this account back on in the meantime.  Thanks.
(In reply to Kendall Libby [:fubar] from comment #13)
> OTOH, if all FFOS development has ceased because of that repo (which
> according to RelEng is about to be replaced), then RelEng and IT need to
> work together to purchase a system capable of handling a repo of this size
> and growth.

This repository is used by our developers across all of our products, and at least by one of our partners (not sure if their name is public or not -- I'd be happy to name them in a private email if you want.)

The RelEng repo may or may not be affected by the same issue, that is a separate discussion.  I only care about the repo which I am responsible for here.
Also, to give some context: I have been maintaining this repository for 2.5+ years in my free time.  I am going on vacation starting on Oct 28 and was hoping to be able to stop reading my email while on vacation for the first time in that period.  My only concern here is that this kind of stuff not happen while I am gone.
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)

> Also, please turn this account back on in the mean time.  Thanks.

I want to be very clear about this: you want us to enable this repo and put the entire git.mozilla.org service in jeopardy? It has already caused at least two outages...
(In reply to comment #17)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #14)
> 
> > Also, please turn this account back on in the mean time.  Thanks.
> 
> I want to be very clear about this: you want us to enable this repo and put the
> entire git.mozilla.org service in jeopardy? It has already caused at least two
> outages...

No, I do not want to put the git.mozilla.org service in jeopardy; those are not our only two options here.

The first time, IIRC, the problem was caused by git gc starving everything else running on that machine.  Is that correct?  If my memory serves, we "fixed" that by disabling git gc by default on that repo.  Can we enable periodic git gc runs on the repository, and perhaps have some kind of monitoring on the disk space so that somebody gets notified when we're about to run out of disk space?
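Concretely, I'm imagining something along these lines (the paths, schedule, and alert address below are placeholders; I don't know the actual server layout):

  # hypothetical /etc/cron.d/gitmo-housekeeping
  # nightly full gc of the big repo, with pack memory capped
  30 3 * * * git  cd /var/lib/gitolite/repositories/users/eakhgari@mozilla.com/mozilla-history-tools.git && git -c pack.windowMemory=512m gc --quiet
  # hourly check: warn if the git partition is more than 80 percent full
  0 * * * *  root usage=$(df -P /data/git | awk 'NR==2 {print $5+0}'); [ "$usage" -gt 80 ] && echo "git partition at $usage percent" | mail -s "git.m.o disk warning" webops@example.com || true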

Also, reading the IRC log pasted here, I can't actually figure out why my write access was revoked.  According to the logs the server was not about to run out of disk space in a short time; it seems to me like my access was revoked because people were under the wrong impression of how important this repository is, but I don't know why nobody considered even trying to contact me.
Mozilla has a contractual obligation to our partners to keep git.mozilla.org online with the releases/*.git repos available.

Ehsan, we once before had a bad outage, where developers had to/chose to switch (temporarily I believe) from your repo to the github:mozilla/releases-mozilla-central repo. It was painful, but kept development going.

Can you comment on whether such a course is possible this time using the newer mozilla/integration-gecko-dev which is the long term solution being discussed?
As I said in #c13:

> I'm currently attempting to repack this repo on a separate machine. I believe I've
> sorted the memory usage issue, but disk usage remains a significant concern. If, once
> the repack is done, disk usage is down to a reasonable amount, then I'll copy it
> back to git.m.o and re-enable it.

Monitoring on disk usage already exists. But we're already at the point where we need to be thinking about moving your repo elsewhere or buying new hardware for gitmo, as it's using two-thirds of the current disk. Hopefully, the repack will cut it down noticeably. 

I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and you will need to work out a plan for increasing the hardware (or other alternatives) next week. However, if disk usage goes critical in the meantime, we will have to disable it again, lest really, *really* bad things happen to all of the repos.

Write access was revoked because pushes to the repo were causing gc's/repacks to run, despite the auto-gc setting, which was causing the service to run out of memory.
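One guess at why pushes still trigger repacks despite the setting: receive-pack runs 'git gc --auto' after a push unless receive.autogc is turned off separately, so disabling only the gc.auto threshold may not cover the push path. The settings involved, as a sketch (not verified against the live config):

  # inside the bare repo
  git config gc.auto 0              # loose-object threshold off
  git config gc.autoPackLimit 0     # pack-count threshold off
  git config receive.autogc false   # don't run 'git gc --auto' after receiving a push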
(In reply to comment #19)
> Mozilla has a contractual obligation to our partners to keep git.mozilla.org
> online with the releases/*.git repos available.

I understand that.

> Ehsan, we once before had a bad outage, where developers had to/chose to switch
> (temporarily I believe) from your repo to the
> github:mozilla/releases-mozilla-central repo. It was painful, but kept
> development going.

IIRC only a few people actually switched; most people just chose to wait for a few hours while I fixed the outage.  But there were many fewer users of the git mirror back then; there are a lot more now.

I'm not sure what you mean by this, though.  Do you really think that disrupting the workflow of hundreds of our developers, and *at least* the engineers working for one of our partners, is acceptable?

> Can you comment on whether such a course is possible this time using the newer
> mozilla/integration-gecko-dev which is the long term solution being discussed?

I don't know what the current status of those repositories is.  You should check with Aki.  The only high-level issue that I can remember right now is figuring out how to "migrate" the actual github repository in a way that doesn't break all of the forks/stars/watches of that repo.  We may or may not want to give people a few days' notice in advance to give them time to prepare to rebase all of their branches, etc.

But unless we know that we can switch to that repository *right now* I don't see how this is relevant.  Can we discuss a fix to the issue at hand for now?
(In reply to comment #20)
> As I said in #c13:
> 
> > I'm currently attempting to repack this repo on a separate machine. I believe I've
> > sorted the memory usage issue, but disk usage remains a significant concern. If, once
> > the repack is done, disk usage is down to a reasonable amount, then I'll copy it
> > back to git.m.o and re-enable it.
> 
> Monitoring on disk usage already exists. But we're already at the point that we
> need to be worrying about moving your repo elsewhere or buying new hardware for
> gitmo, as it's using 2/3rds of the current disk. Hopefully, the repack will cut
> it down noticeably. 

I expect that it will, yes.  I _think_ the reason this repository gets so big is that git falls back to storing the git-mapfile (which is a 100+MB file) as a full blob on every commit or every few commits.  Doing a repack/gc should let it use delta compression and tremendously reduce the repo size.

> I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and
> you will need to work out a plan for increasing the hardware (or other
> alternatives) next week.

Can we have a periodic (once a day or so) repack of this repo?

> However, if disk usage goes critical in the meantime,
> we will have to disable it again, lest really, *really* bad things happen to
> all of the repos.
> 
> Write access was revoked because pushes to the repo were causing gc's/repacks
> to run, despite the auto-gc setting, which was causing the service to run out
> of memory.

That... is surprising.  Looks like somebody needs to debug why this happens.  There should be a way to disable auto-gc, but I'm not very familiar with the git server software being used.

But this is actually puzzling: I thought the repo was big because gc was turned off, but it looks like gc was not actually turned off...  Those two seem to contradict each other. :/
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #22)
>
> I expect that it will, yes.  I _think_ the reason why this repository gets
> so big is that git falls back to store the git-mapfile (which is a 100+MB
> file) in its full blob on every commit or every few commits.  Doing a
> repack/gc should let it use delta compression and tremendously reduce the
> repo size.

That's good news, then. The repack is currently at 72%; it's dog slow, though. If I had to guess, it'll probably take another 3-4 hours to complete. 

 
> Can we have a periodic (once a day or so) repack of this repo?

Possibly? I'm not sure, but I'll look into it. Also, see below.


> That... is surprising.  Looks like somebody needs to debug why this happens.
> There should be a way to disable auto-gc, but I'm not very familiar with the
> git server software being used.
> 
> But this is actually puzzling, I thought the repo is big because the gc was
> turned off, but it looks like the gc was not actually turned off...  Those
> two seem to contradict each other. :/

So, auto-gc IS disabled, but there's clearly some other housekeeping task that git is doing that looks, acts, and smells like gc. Or maybe it's just one of the things that gc does, but one that can otherwise be kicked off after some threshold. 

On the other hand, setting "config pack.windowMemory = 256m" seems to make the memory usage problem go away, in which case re-enabling gc would be possible again.
It might be worth playing with that variable a bit, but set at 256m it's currently using 4.2G of memory... git.m.o has a fair bit of memory but I'm not sure how far I'd want to push it.
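For reference, the knobs in question would be set per repository, something like this (whether git.m.o applies them via gitolite's config mechanism or plain git config is glossed over here):

  # inside the bare repository
  git config pack.windowMemory 256m   # cap delta-window memory per pack-objects thread
  git config pack.threads 2           # fewer threads also caps total memory (threads x windowMemory)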
(In reply to comment #23)
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #22)
> >
> > I expect that it will, yes.  I _think_ the reason why this repository gets
> > so big is that git falls back to store the git-mapfile (which is a 100+MB
> > file) in its full blob on every commit or every few commits.  Doing a
> > repack/gc should let it use delta compression and tremendously reduce the
> > repo size.
> 
> That's good news, then. The repack is currently at 72%; it's dog slow, though.
> If I had to guess, it'll probably take another 3-4 hours to complete. 

Thanks!  FWIW, it'll be faster next time, since it pretty much starts where it left off last time.

> > Can we have a periodic (once a day or so) repack of this repo?
> 
> Possibly? I'm not sure, but I'll look into it. Also, see below.
> 
> > That... is surprising.  Looks like somebody needs to debug why this happens.
> > There should be a way to disable auto-gc, but I'm not very familiar with the
> > git server software being used.
> > 
> > But this is actually puzzling, I thought the repo is big because the gc was
> > turned off, but it looks like the gc was not actually turned off...  Those
> > two seem to contradict each other. :/
> 
> So, auto-gc IS disabled, but there's clearly some other housekeeping task that
> git is doing, that looks, acts, and smells like gc. Or maybe it's just one of
> the things that gc does, but can otherwise be kicked off after some threshold. 

I have git auto-gc disabled on my side of things, and I have never seen git doing any kind of gc/repack under my nose...  What you're seeing is very weird.  Unless it's gitolite stepping on our toes here?

> On the other hand, setting "config pack.windowMemory = 256m" seems to make the
> memory usage problem go away, in which case re-enabling gc would be possible
> again.
> It might be worth playing with that variable a bit, but set at 256m it's
> currently using 4.2G of memory... git.m.o has a fair bit of memory but I'm not
> sure how far I'd want to push it.

That's good news!  Unfortunately I don't know what that setting does and how it affects things, so I can't be of much help there.
(In reply to Kendall Libby [:fubar] from comment #20)
> I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and
> you will need to work out a plan for increasing the hardware (or other
> alternatives) next week. However, if disk usage goes critical in the
> meantime, we will have to disable it again, lest really, *really* bad things
> happen to all of the repos.

When this discussion happens (via Vidyo?), please include jst (CC'd on this bug).  It would be great if we could (perhaps temporarily) transition what Ehsan is doing to jst some time next week (Oct 21-25) so it's off Ehsan's plate before his planned PTO and we can work through any issues by then.  Thanks!
(In reply to Andrew Overholt [:overholt] from comment #25)
> (In reply to Kendall Libby [:fubar] from comment #20)
> > I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and
> > you will need to work out a plan for increasing the hardware (or other
> > alternatives) next week. However, if disk usage goes critical in the
> > meantime, we will have to disable it again, lest really, *really* bad things
> > happen to all of the repos.
> 
> When this discussion happens (via Vidyo?), please include jst (CC'd on this
> bug).  It would be great if we could (perhaps temporarily) transition what
> Ehsan is doing to jst some time next week (Oct 21-25) so it's off Ehsan's
> plate before his planned PTO and we can work through any issues by then. 
> Thanks!

I have not received any meeting invites yet, but I talked to jst and he offered to help cover this when I'm away.
Ok, after a hiccup part way through, the repack finished and the resulting repo size was reduced to ... *drumroll* 6.2G. I've copied it back to git.m.o, re-enabled auto gc and reduced the window memory size to 512M. During the repack, memory usage didn't exceed ~10G, which git.m.o can handle, at least at the moment. :-)

Ehsan, please let us know if anything seems amiss with the repo!
Flags: needinfo?(ehsan)
(In case there are issues, the original 200G repo is still there, just renamed. If there are no issues, we'll nuke it.)
(In reply to Kendall Libby [:fubar] from comment #27)
> Ok, after a hiccup part way through, the repack finished and the resulting
> repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> During the repack, memory usage didn't exceed ~10G, which git.m.o can
> handle, at least at the moment. :-)
> 
> Ehsan, please let us know if anything seems amiss with the repo!

Thanks!  Everything seems to be fine, and my scripts can now push to this repo successfully.
Flags: needinfo?(ehsan)
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #12)
> This repository, by being required by the github m-c mirror *is* required
> for Firefox OS.  OEM partners have been and are still depending on that.
> 
> We need to find a solution for this repository, and we need it fast.

(In reply to :Ehsan Akhgari (needinfo? me!) from comment #15)
> This repository is used by our developers across all of our products, and at
> least by one of our partners (not sure if their name is public or not -- I'd
> be happy to name them in a private email if you want.)

To clarify this point, RelEng was recently contacted by a partner complaining that https://github.com/mozilla/mozilla-central contains non-fast-forward commits, and that they cannot trust it to be a dependable source of history if that happens.  We pointed them to http://git.mozilla.org/?p=releases/gecko.git;a=summary , which is the RelEng-supported, partner-oriented repository and has denyNonFastForwards and denyDeletes rules set.  To my knowledge, all partners are now using http://git.mozilla.org/?p=releases/gecko.git;a=summary ; if not, we should point them there.
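For reference, those rules correspond to standard git receive options set on the bare repository; a minimal sketch (exactly how they're applied on git.m.o isn't shown here):

  # inside the bare releases/gecko.git repository
  git config receive.denyNonFastForwards true   # reject forced / non-fast-forward updates
  git config receive.denyDeletes true           # reject branch and tag deletions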
(In reply to Kendall Libby [:fubar] from comment #27)
> Ok, after a hiccup part way through, the repack finished and the resulting
> repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> During the repack, memory usage didn't exceed ~10G, which git.m.o can
> handle, at least at the moment. :-)
> 
> Ehsan, please let us know if anything seems amiss with the repo!

(In reply to Kendall Libby [:fubar] from comment #28)
> (in case there are issues the original, 200g, repo is still there, just
> renamed. if there are no issues, we'll nuke it.)

(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #29)
> (In reply to Kendall Libby [:fubar] from comment #27)
> > Ok, after a hiccup part way through, the repack finished and the resulting
> > repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> > git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> > During the repack, memory usage didn't exceed ~10G, which git.m.o can
> > handle, at least at the moment. :-)
> > 
> > Ehsan, please let us know if anything seems amiss with the repo!
> 
> Thanks!  Everything seems to be fine, and my scripts can now push to this
> repo successfully.

ehsan: if you are happy with this 6GB compacted repo, can :fubar now proceed to delete the earlier pre-compacted 200GB repo?
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #26)
> (In reply to Andrew Overholt [:overholt] from comment #25)
> > (In reply to Kendall Libby [:fubar] from comment #20)
> > > I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and
> > > you will need to work out a plan for increasing the hardware (or other
> > > alternatives) next week. However, if disk usage goes critical in the
> > > meantime, we will have to disable it again, lest really, *really* bad things
> > > happen to all of the repos.
> > 
> > When this discussion happens (via Vidyo?), please include jst (CC'd on this
> > bug).  It would be great if we could (perhaps temporarily) transition what
> > Ehsan is doing to jst some time next week (Oct 21-25) so it's off Ehsan's
> > plate before his planned PTO and we can work through any issues by then. 
> > Thanks!
> 
> I have not received any meeting invites yet, but I talked to jst and he
> offered to help cover this when I'm away.

I've asked jakem to set up a meeting of all the stakeholders to find a solution that can be reliably supported in production.

(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #12)
> (In reply to Ben Kero [:bkero] from comment #11)
> > We recognise that it is bad that this was disabled and that it might have
> > affected services which depend on it, however it was becoming an issue of
> > availability of repositories that are required to build FirefoxOS
> 
> This repository, by being required by the github m-c mirror *is* required
> for Firefox OS.  OEM partners have been and are still depending on that.
...

(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #15)
> (In reply to Kendall Libby [:fubar] from comment #13)
> > OTOH, if all FFOS development has ceased because of that repo (which
> > according to RelEng is about to be replaced), then RelEng and IT need to
> > work together to purchase a system capable of handling a repo of this size
> > and growth.
> 
> This repository is used by our developers across all of our products, and at
> least by one of our partners (not sure if their name is public or not -- I'd
> be happy to name them in a private email if you want.)
...


(In reply to Aki Sasaki [:aki] from comment #30)
> (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #12)
> > This repository, by being required by the github m-c mirror *is* required
> > for Firefox OS.  OEM partners have been and are still depending on that.
> > 
> > We need to find a solution for this repository, and we need it fast.
> 
> (In reply to :Ehsan Akhgari (needinfo? me!) from comment #15)
> > This repository is used by our developers across all of our products, and at
> > least by one of our partners (not sure if their name is public or not -- I'd
> > be happy to name them in a private email if you want.)
> 
> To clarify this point, RelEng was contacted by a partner recently
> complaining about how https://github.com/mozilla/mozilla-central contains
> non-fastforward commits, and that they cannot trust it to be a dependable
> source of history if that happens.  We pointed them to
> http://git.mozilla.org/?p=releases/gecko.git;a=summary , which is the
> RelEng-supported partner-oriented repository, which has denyNonFastForwards
> and denyDeletes rules set.  To my knowledge, all partners are now using
> http://git.mozilla.org/?p=releases/gecko.git;a=summary ; if not we should
> point them there.

vlad/ehsan: 

It *used* to be true that some OEMs were using Ehsan's repo on github instead of the repos on git.m.o. However, RelEng and the Mozilla Technical Account Managers have worked over several months to transition all OEMs to git.m.o. As of last month, we believed all OEMs had transitioned and were happily using the repos on git.m.o. 

Please forward to me the name(s) of any OEMs you believe are still using the repo on github, and we'll work with the Mozilla TAMs to transition those OEMs to git.m.o asap.
Flags: needinfo?(vladimir)
(In reply to John O'Duinn [:joduinn] from comment #31)
> (In reply to Kendall Libby [:fubar] from comment #27)
> > Ok, after a hiccup part way through, the repack finished and the resulting
> > repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> > git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> > During the repack, memory usage didn't exceed ~10G, which git.m.o can
> > handle, at least at the moment. :-)
> > 
> > Ehsan, please let us know if anything seems amiss with the repo!
> 
> (In reply to Kendall Libby [:fubar] from comment #28)
> > (in case there are issues the original, 200g, repo is still there, just
> > renamed. if there are no issues, we'll nuke it.)
> 
> (In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment
> #29)
> > (In reply to Kendall Libby [:fubar] from comment #27)
> > > Ok, after a hiccup part way through, the repack finished and the resulting
> > > repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> > > git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> > > During the repack, memory usage didn't exceed ~10G, which git.m.o can
> > > handle, at least at the moment. :-)
> > > 
> > > Ehsan, please let us know if anything seems amiss with the repo!
> > 
> > Thanks!  Everything seems to be fine, and my scripts can now push to this
> > repo successfully.
> 
> ehsan: if you are happy with this 6GB compacted repo, can :fubar now proceed
> to delete the earlier pre-compacted 200GB repo?

Sure.

(In reply to John O'Duinn [:joduinn] from comment #32)
> (In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment
> #26)
> > (In reply to Andrew Overholt [:overholt] from comment #25)
> > > (In reply to Kendall Libby [:fubar] from comment #20)
> > > > I won't hold up re-enabling the repo on disk usage alone, but RelEng, IT and
> > > > you will need to work out a plan for increasing the hardware (or other
> > > > alternatives) next week. However, if disk usage goes critical in the
> > > > meantime, we will have to disable it again, lest really, *really* bad things
> > > > happen to all of the repos.
> > > 
> > > When this discussion happens (via Vidyo?), please include jst (CC'd on this
> > > bug).  It would be great if we could (perhaps temporarily) transition what
> > > Ehsan is doing to jst some time next week (Oct 21-25) so it's off Ehsan's
> > > plate before his planned PTO and we can work through any issues by then. 
> > > Thanks!
> > 
> > I have not received any meeting invites yet, but I talked to jst and he
> > offered to help cover this when I'm away.
> 
> I've asked jakem to setup a meeting of all the stakeholders and find a
> solution that can be reliably supported in production.

Thanks.  FWIW, now that this is no longer urgent, asynchronous communication would also work!
Flags: needinfo?(ehsan)
> (In reply to Aki Sasaki [:aki] from comment #30)
> > (In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #12)
> > > This repository, by being required by the github m-c mirror *is* required
> > > for Firefox OS.  OEM partners have been and are still depending on that.
> > > 
> > > We need to find a solution for this repository, and we need it fast.
> > 
> > (In reply to :Ehsan Akhgari (needinfo? me!) from comment #15)
> > > This repository is used by our developers across all of our products, and at
> > > least by one of our partners (not sure if their name is public or not -- I'd
> > > be happy to name them in a private email if you want.)
> > 
> > To clarify this point, RelEng was contacted by a partner recently
> > complaining about how https://github.com/mozilla/mozilla-central contains
> > non-fastforward commits, and that they cannot trust it to be a dependable
> > source of history if that happens.  We pointed them to
> > http://git.mozilla.org/?p=releases/gecko.git;a=summary , which is the
> > RelEng-supported partner-oriented repository, which has denyNonFastForwards
> > and denyDeletes rules set.  To my knowledge, all partners are now using
> > http://git.mozilla.org/?p=releases/gecko.git;a=summary ; if not we should
> > point them there.
> 
> vlad/ehsan: 
> 
> It *used* to be true that some OEMs were using Ehsan's repo on github
> instead of the repos on git.m.o. However, RelEng and Mozilla Technical
> Account Managers have worked over several months to transition all OEMs over
> to use git.m.o. As of last month, we believed all OEMs had transitioned, and
> were happily using the repos on git.m.o. 
> 
> Please forward to me the name(s) of any OEMs you believe are still using the
> repo on github, and we'll work with the Mozilla TAMs to transition those
> OEMs to git.m.o asap.

To close the loop: we talked offline, and the last remaining partner had already been migrated to git.m.o. At this point, we all agree there are no known external partners using this github m-c mirror.
> > (In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment
> > #29)
> > > (In reply to Kendall Libby [:fubar] from comment #27)
> > > > Ok, after a hiccup part way through, the repack finished and the resulting
> > > > repo size was reduced to ... *drumroll* 6.2G. I've copied it back to
> > > > git.m.o, re-enabled auto gc and reduced the window memory size to 512M.
> > > > During the repack, memory usage didn't exceed ~10G, which git.m.o can
> > > > handle, at least at the moment. :-)
> > > > 
> > > > Ehsan, please let us know if anything seems amiss with the repo!
> > > 
> > > Thanks!  Everything seems to be fine, and my scripts can now push to this
> > > repo successfully.
> > 
> > ehsan: if you are happy with this 6GB compacted repo, can :fubar now proceed
> > to delete the earlier pre-compacted 200GB repo?
> 
> Sure.
> 


:fubar - if you haven't already done so, you are now formally OK to delete the old pre-compacted 200GB repo whenever you have time.
Flags: needinfo?(vladimir) → needinfo?(klibby)
It's gone. I think everything else is all set or noted elsewhere, so I'm closing the bug. Feel free to reopen if there are still outstanding issues.
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(klibby)
Resolution: --- → FIXED
(In reply to Kendall Libby [:fubar] from comment #37)
> It's gone. I think everything else is all set or noted elsewhere, so closing
> bug. Feel free to reopen if there are still outstanding issues.

Excellent, thanks Fubar!
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services