Closed Bug 636314 Opened 13 years ago Closed 13 years ago

Problems updating git websites (timeouts, locks, etc.)

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

All
Other
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: clouserw, Unassigned)

Details

For the past couple of weeks we've had trouble updating git websites on our preview and production boxes. The update script on preview complains that a git process is already running (because the previous update is taking so long), and I know it has delayed our production pushes, because oremj has to keep retrying the update.

Updates run from anywhere outside the production/preview boxes are fast and finish within seconds. All the developers continue to use github on their boxes with no problems.

Have there been any iptables changes or firewall reconfigurations on the production/stage boxes in the past few weeks?
I've done some stracing and tcpdumps. Everything looks fine, but eventually we send a packet and never get anything back. Is this a netops thing? We weren't sure whether the problem was on our side or theirs.
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
Can we have some more data here?  What are the host IP pairs?  How can I reproduce the problem?
git clone git://github.com/jbalogh/zamboni.git
git fetch

The fetch will randomly hang. This happens most often on mradm02. I'm not certain this is a network problem, or even our problem, but it does seem to hang on a recv from github.
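A minimal sketch for catching the random hang, assuming GNU coreutils `timeout` is installed (older boxes may not ship it). It runs a command repeatedly under a time limit and prints one timestamped ok/timeout/error line per attempt, so results on mradm02 can be compared with a host where updates work. `probe` and `probe_loop` are hypothetical helper names; run it from inside the zamboni clone.

```shell
#!/usr/bin/env bash
# Sketch: run a command repeatedly under a time limit to catch intermittent hangs.
# Assumes GNU coreutils timeout, which exits with status 124 when the limit is hit.
# probe and probe_loop are hypothetical helper names.

# probe LIMIT_SECONDS CMD [ARGS...]
# Runs CMD once and prints "ok", "timeout", or "error STATUS".
probe() {
  local limit=$1; shift
  local status=0
  timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
  case $status in
    0)   echo ok ;;
    124) echo timeout ;;
    *)   echo "error $status" ;;
  esac
}

# probe_loop COUNT LIMIT_SECONDS CMD [ARGS...]
# Runs COUNT attempts back to back and prints one timestamped line per attempt.
probe_loop() {
  local count=$1 limit=$2; shift 2
  local i
  for ((i = 1; i <= count; i++)); do
    printf '%s attempt %d: %s\n' "$(date '+%F %T')" "$i" "$(probe "$limit" "$@")"
  done
}

# Example, from inside the zamboni clone (needs network access):
#   probe_loop 60 30 git fetch
```

Running the same loop on mradm02 and on a box where updates are fine gives directly comparable logs.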
This eventually times out:

[root@pm-app-amodev01 ~]# time ssh -vv git@github.com
OpenSSH_4.3p2, OpenSSL 0.9.8e-fips-rhel5 01 Jul 2008
debug1: Reading configuration data /root/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to github.com [207.97.227.239] port 22.
debug1: connect to address 207.97.227.239 port 22: Connection timed out
ssh: connect to host github.com port 22: Connection timed out

real	3m9.006s
user	0m0.001s
sys	0m0.003s

From anywhere else it connects immediately.
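The `git://` URL in the reproduction steps uses the git daemon port 9418, while `ssh git@github.com` exercises port 22, so checking both ports from the affected box helps separate a blocked network path from a git-level problem. A sketch assuming bash (for its `/dev/tcp` redirection) and GNU `timeout`; `tcp_check` and `report` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP connection to HOST:PORT opens within a time limit.
# Uses bash's /dev/tcp redirection (bash-only) and GNU coreutils timeout.
# tcp_check and report are hypothetical helper names.

# tcp_check HOST PORT [LIMIT_SECONDS]
# Returns 0 if the connection opens; non-zero on refusal, timeout, or DNS failure.
tcp_check() {
  local host=$1 port=$2 limit=${3:-10}
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# report HOST PORT...
# Prints one "HOST:PORT open" or "HOST:PORT unreachable" line per port.
report() {
  local host=$1; shift
  local port
  for port in "$@"; do
    if tcp_check "$host" "$port"; then
      echo "$host:$port open"
    else
      echo "$host:$port unreachable"
    fi
  done
}

# Example (needs network access): 22 is ssh, 9418 is the git:// daemon port.
#   report github.com 22 9418
```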
I've been doing a git fetch from cm-metricsapp02 every minute for two days now without any issues until github's acknowledged problem tonight (http://twitter.com/#!/github/status/45324428137070592).
What are you trying to accomplish by running the fetch every minute?  I'm unclear what you're asking to be looked into.  That you're having a difficult time accessing a 3rd party resource off our network?
Please see comment 3 and comment 4.  We're trying to update our sites.

I suspect comment 5 is just to show that the problem is with the mozilla infrastructure and not github.
(In reply to comment #7)
> What are you trying to accomplish by running the fetch every minute?  

I'm running the git fetch to get new code and restart my app. I'm running it every minute because I can. Our big sites run git fetch every 5 minutes and fail a lot these days.

> I'm unclear what you're asking to be looked into.  That you're having a difficult
> time accessing a 3rd party resource off our network?

I'm hoping to narrow down the source of this problem. Updates from mradm02 are failing but cm-metricsapp02 appears to be fine.
Can you tell me the correct way to reproduce this?  Did it ever work from pm-app-amodev01?

[root@mradm02 636314]# git clone git://github.com/jbalogh/zamboni.git
Cloning into zamboni...
remote: Counting objects: 37025, done.
remote: Compressing objects: 100% (9645/9645), done.
remote: Total 37025 (delta 26411), reused 35710 (delta 25248)
Receiving objects: 100% (37025/37025), 13.29 MiB | 7.08 MiB/s, done.
Resolving deltas: 100% (26411/26411), done.

[root@mradm02 636314]# git fetch
fatal: Not a git repository (or any of the parent directories): .git
So the script in question is kicked off by cron:

/etc/cron.d/updates-amo:*/5 * * * * root /data/bin/omg_push_zamboni_preview_live.sh > /dev/null

The script does a lot of different things, each of which could fail and cause issues.  I ran git fetch manually at 5-second intervals for 5 minutes, and each run completed in 0.6s.
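The cron entry fires every 5 minutes, and a hung connection can take minutes to time out (just over 3 minutes in the ssh test earlier in this bug), so one run can plausibly still be fetching when the next one starts, which would match the "git process already running" complaint. A sketch of serializing runs with util-linux `flock`; `run_exclusive` and the lock path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run a command only when no other run holds the same lock file, so a
# cron job firing every 5 minutes cannot stack up behind a slow fetch.
# Assumes util-linux flock; run_exclusive and the lock path are hypothetical.

# run_exclusive LOCKFILE CMD [ARGS...]
# If another process holds LOCKFILE, prints a message and returns 1 without
# running CMD; otherwise runs CMD while holding the lock and returns its status.
run_exclusive() {
  local lockfile=$1; shift
  (
    # -n: give up immediately instead of waiting for the lock
    flock -n 9 || { echo "lock $lockfile is held by another run; skipping" >&2; exit 1; }
    "$@"
  ) 9>"$lockfile"
}

# Cron can also call flock directly (hypothetical lock path):
#   */5 * * * * root flock -n /var/lock/updates-amo.lock /data/bin/omg_push_zamboni_preview_live.sh > /dev/null
```

One limitation of this design: a return status of 1 cannot distinguish "lock held" from a command that itself exits 1, so the stderr message is the reliable signal.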

It is still unclear to me if it ever worked from pm-app-amodev01.

I'm not sure who maintains the script.  Jeremy, is this you?  Can you add some debugging/verbosity to it to help track which part is failing?
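One way to add that verbosity, sketched under assumptions: the cron entry sends stdout to /dev/null, so the script could instead append all of its output to its own log and trace each command with `set -x`, prefixing every trace line with the seconds elapsed (bash's `SECONDS` variable) and the line number so a slow step stands out. `enable_trace` and the log path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: send all of a script's output to a log and trace every command.
# enable_trace and the log path in the example are hypothetical.

# enable_trace LOGFILE
# From this point on, append stdout and stderr to LOGFILE and print each
# command before it runs, prefixed with elapsed seconds and the line number.
enable_trace() {
  exec >>"$1" 2>&1
  # Single quotes: SECONDS and LINENO are expanded when each trace line prints.
  PS4='+ [${SECONDS}s line ${LINENO}] '
  set -x
}

# Example, near the top of the update script:
#   enable_trace /var/log/updates-amo.log
```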
(In reply to comment #11)
> So the script in question is kicked off by cron
> 
> /etc/cron.d/updates-amo:*/5 * * * * root
> /data/bin/omg_push_zamboni_preview_live.sh > /dev/null
> 
> The script does a lot of different things that each of which could fail causing
> issues.  Manually running a git fetch completed in 0.6s on 5 second intervals
> for 5 minutes.

The fetch is the failing part. Of course it's fine now, but when I looked earlier today, [1] said:

+ /usr/bin/git fetch
fatal: The remote end hung up unexpectedly

[1]: https://addons.allizom.org/media/updater.output.txt

fox2mike and oremj have had fetch issues during releases as well.
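Since the failures are intermittent and the workaround so far has been to rerun the update by hand, the fetch step could retry on its own before giving up. A sketch only: the function name, attempt count, and delays are illustrative assumptions, and retrying hides the symptom rather than fixing the cause.

```shell
#!/usr/bin/env bash
# Sketch: retry a flaky command a few times with a growing delay before
# giving up. retry, the attempt count, and the delays are hypothetical.

# retry ATTEMPTS INITIAL_DELAY_SECONDS CMD [ARGS...]
# Returns 0 on the first success, or the last exit status after ATTEMPTS failures.
retry() {
  local attempts=$1 delay=$2; shift 2
  local n status=1
  for ((n = 1; n <= attempts; n++)); do
    "$@" && return 0
    status=$?
    echo "attempt $n/$attempts failed (status $status)" >&2
    if ((n < attempts)); then
      sleep "$delay"
      delay=$((delay * 2))   # double the wait after each failure
    fi
  done
  return "$status"
}

# Example inside the update script:
#   retry 3 10 /usr/bin/git fetch || { echo 'Fetch failed'; exit 1; }
```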
As a point of reference, I was seeing a lot of fetch errors like this: https://hudson.mozilla.org/job/amo-master-js/16/console

But when I deleted my workspace and rebuilt, it worked fine.  I know clone is different from fetch, but it might be worth upgrading Hudson/Jenkins to see if it fixes the problem.  It would be nice to upgrade anyway.
Can we try adding this to the git config?

git config http.postBuffer 524288000

This was a recent error when trying to update PAMO:

+ date
Fri Apr 15 13:02:02 PDT 2011
+ /usr/bin/git fetch
error: RPC failed; result=52, HTTP code = 100
fatal: The remote end hung up unexpectedly
+ echo 'Fetch failed'
Fetch failed
+ exit 1

The HTTP code 100 (continue) seems to suggest that our git buffer needs to be bigger.  http://support.github.com/discussions/repos/4323-error-rpc-failed-result22-http-code-411#comment_3265602

(I don't know if this will address the other fetch problems though)
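For reference, 524288000 bytes is 500 × 1024 × 1024, i.e. 500 MiB. `http.postBuffer` is only used by git's smart HTTP transports, so it can matter for remotes fetched over HTTP(S), as in the PAMO error, but not for `git://` or ssh remotes; `result=52` in that error is libcurl's "empty reply from server" code. A sketch that sets the value in one checkout and reads it back; `set_post_buffer` and the example path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set http.postBuffer in one repository's config and read it back.
# 524288000 bytes = 500 * 1024 * 1024 = 500 MiB.
# Only smart HTTP(S) transports use this setting.

# set_post_buffer REPO_DIR BYTES
# Writes http.postBuffer into REPO_DIR's .git/config and prints the stored value.
set_post_buffer() {
  local repo=$1 bytes=$2
  (cd "$repo" && git config http.postBuffer "$bytes" && git config --get http.postBuffer)
}

# Example (hypothetical checkout path):
#   set_post_buffer /data/src/zamboni 524288000
```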
Added that to preview.
I'm not sure what the action here is anymore.  Resolved?
jbalogh created http://gitmirror.mozilla.org/ as a workaround which I believe is working well
I don't think it's an issue on our end. I opened http://support.github.com/discussions/repos/5897-flaky-access-to-git-fetch-from-mozillas-network with github.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard