Closed Bug 587674 Opened 14 years ago Closed 14 years ago

pp-web01 is serving planet.mozilla.org with stale data (posts no more recent than Aug. 12, 2010)

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: WeirdAl, Assigned: oremj)

I can't reproduce it now, but throughout the weekend the most recent post I saw on planet was dated August 12.

Ping times for planet.mozilla.org differ depending on which IP address responds:
Reply from 63.245.217.21: bytes=32 time=37ms TTL=48
Reply from 63.245.213.94: bytes=32 time=205ms TTL=50 (this one most frequent)
Reply from 63.245.209.11: bytes=32 time=23ms TTL=244
Reply from 63.245.213.93: bytes=32 time=206ms TTL=50
When it does appear, the topmost entry is:

Mobile Add-on Developers: Update Your Mobile Add-on to 2.0a1
Thursday, August 12, 2010 5:23 PM
Assignee: nobody → server-ops
Component: planet.mozilla.org → Server Operations
OS: Windows 7 → All
Product: Websites → mozilla.org
QA Contact: planet-mozilla-org → mrz
Hardware: x86 → All
Version: unspecified → other
I've been seeing it a lot this weekend, and again today, but until now I could never reproduce it in Firefox with Live HTTP Headers running. I finally caught it today, and the captured headers point to the cause: pp-web01 appears to be the backend serving the stale data.

http://planet.mozilla.org/

GET / HTTP/1.1
Host: planet.mozilla.org
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: __utma=150903082.2118154255.1270698538.1281849346.1281981606.26; __utmz=150903082.1270698538.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); WT_FPC=id=68.157.168.83-3451810576.30081694:lv=1281996024034:ss=1281996021327; __utmc=150903082; __utmb=150903082
If-Modified-Since: Fri, 13 Aug 2010 02:38:34 GMT
If-None-Match: "69bd9-48dab610cde80"
Cache-Control: max-age=0

HTTP/1.1 200 OK
Server: Apache
X-Backend-Server: pp-web01
Cache-Control: max-age=300
Content-Type: application/xhtml+xml;charset=utf-8
Date: Mon, 16 Aug 2010 18:02:46 GMT
Keep-Alive: timeout=20, max=994
Expires: Mon, 16 Aug 2010 18:07:46 GMT
Accept-Ranges: bytes
Etag: "69bd9-48dab8a6a7800"
Connection: Keep-Alive
Last-Modified: Fri, 13 Aug 2010 02:50:08 GMT
X-Cache-Info: caching
Content-Length: 433113
----------------------------------------------------------
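The check done by eye on the headers above (compare Date with Last-Modified, then read X-Backend-Server) could be automated roughly as follows. This is only a sketch: the helper names and the six-hour threshold are assumptions for illustration, not anything planet or the load balancer actually uses, and the sample is a trimmed copy of the response captured above.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def parse_headers(raw: str) -> dict:
    """Parse 'Name: value' lines into a dict with lower-cased names."""
    headers = {}
    for line in raw.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, e.g. the status line
            headers[name.strip().lower()] = value.strip()
    return headers


def staleness(headers: dict) -> timedelta:
    """How old the content was at the moment the response was served."""
    served = parsedate_to_datetime(headers["date"])
    modified = parsedate_to_datetime(headers["last-modified"])
    return served - modified


def is_stale(headers: dict, max_age: timedelta = timedelta(hours=6)) -> bool:
    # max_age is a hypothetical threshold; pick one that matches how
    # often the site is expected to be regenerated.
    return staleness(headers) > max_age


# Trimmed copy of the response headers captured in this comment.
sample = """X-Backend-Server: pp-web01
Date: Mon, 16 Aug 2010 18:02:46 GMT
Last-Modified: Fri, 13 Aug 2010 02:50:08 GMT"""

headers = parse_headers(sample)
if is_stale(headers):
    print(f"{headers['x-backend-server']} is stale by {staleness(headers)}")
```

Run against the capture above, this reports pp-web01 as more than three days behind. Repeating the request several times and grouping the results by X-Backend-Server would show whether the other backends are affected as well.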
Summary: planet frequently appears with posts no more recent than Aug. 12, 2010 → pp-web01 is serving planet.mozilla.org with stale data (posts no more recent than Aug. 12, 2010)
(See also bug 566038, bug 560433, and bug 550494 where "stale planet" has happened before.)
Jeremy,

Seems like something is busted with git.

[root@pp-web01 ~]# /data/bin/libget/get-static-www.sh
fatal: protocol error: bad line length character: git:

stat("/data", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
stat("/data/www", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
chdir("/data/www")                      = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b9ba9a02e60) = 4640
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGINT, {0x436c60, [], SA_RESTORER, 0x3aab8302d0}, {SIG_DFL, [], SA_RESTORER, 0x3aab8302d0}, 8) = 0
wait4(-1, fatal: protocol error: bad line length character: git:
[{WIFEXITED(s) && WEXITSTATUS(s) == 128}], 0, NULL) = 4640
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(-1, 0x7fffc3422e74, WNOHANG, NULL) = -1 ECHILD (No child processes)
rt_sigreturn(0xffffffffffffffff)        = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x3aab8302d0}, {0x436c60, [], SA_RESTORER, 0x3aab8302d0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(255, "", 283)                      = 0
exit_group(128
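The trace above shows the sync script exiting with status 128 after git's fatal error, a failure that is easy to miss when the job runs unattended. A wrapper along the following lines would surface it. This is a sketch under assumptions: the function names are made up for illustration, and how get-static-www.sh is actually scheduled is not stated in this bug.

```python
import subprocess
import sys


def run_sync(cmd: list) -> tuple:
    """Run a sync command and return (exit status, stripped stderr)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


def sync_ok(cmd: list) -> bool:
    """Return True on success; report the failure on stderr otherwise."""
    code, err = run_sync(cmd)
    if code != 0:
        # A non-zero status (128 in the trace above) means the content
        # was not refreshed, so the host will keep serving old data.
        print(f"sync failed with exit status {code}: {err}", file=sys.stderr)
        return False
    return True
```

For example, `sync_ok(["/data/bin/libget/get-static-www.sh"])` would return False in the state shown above, and a monitoring check or cron mailer could act on that result.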
Assignee: server-ops → jeremy.orem+bugs
I noticed that the entry that would normally appear above it was from Gen Kanai, but that entry returned HTTP 404.
Any progress here? Having planet be effectively broken for several days isn't great.
Severity: normal → major
I disabled pp-web01 until it's fixed.
It looks like git was updated on ip-admin01, but git-daemon was not installed along with it. Installing git-daemon fixed the problem.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard