Closed Bug 959769 Opened 12 years ago Closed 12 years ago

Intermittent issues loading try pushlog and json-pushes

Categories

(Developer Services :: General, task)

x86_64
Linux
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Unassigned)

We've been experiencing this for at least a couple of days now, maybe longer. Sometimes pages will load quickly, sometimes they won't load at all. This also means that TBPL often fails to load pages such as https://tbpl.mozilla.org/?tree=Try&rev=d3de6823f1cd or https://tbpl.mozilla.org/?tree=Try. Some examples: https://hg.mozilla.org/try/pushloghtml https://hg.mozilla.org/try/json-pushes?full=1&changeset=d3de6823f1cd
I have also run into this problem a ton in the last two days. For example: https://tbpl.mozilla.org/?tree=Try&rev=1ebdfe0c5320 https://tbpl.mozilla.org/?tree=Try&pusher=brian@briansmith.org However, I found that shift-reload somehow "fixed" it for me after repeated failures, though maybe that was just luck.
Bug 721152 will make this hg.m.o issue more tolerable, by allowing the &rev=foo case to partially load (using fake data for the rest), even if json-pushes isn't working.
Depends on: 721152
Summary: intermittent issues loading try pushlog and json-pushes → Intermittent issues loading try pushlog and json-pushes
Bunches of errors recently like this:
[Wed Jan 15 04:58:24 2014] [error] [client 10.22.74.212] Premature end of script headers: hgweb.wsgi, referer: https://hg.mozilla.org/try/json-pushes
And this:
httpd[30563]: segfault at 8 ip 00007fae2ea9ee9a sp 00007fae2d792d90 error 4 in libpython2.6.so.1.0[7fae2e9ae000+15d000]
Also, the OOM killer pops up occasionally:
Killed process 19491, UID 500, (httpd) total-vm:1276536kB, anon-rss:860440kB, file-rss:128kB
Naturally, the logs involved don't include anything to correlate the messages. Will see about bumping Apache logging up to info to see if we get anything useful about mod_wsgi.
Looks like most of the 'Premature end of script headers' errors are attributable to the mod_wsgi deadlock timer, e.g.:
[Wed Jan 15 11:21:29 2014] [info] mod_wsgi (pid=23201): Daemon process deadlock timer expired, stopping process 'hg_mozilla_org'.
[Wed Jan 15 11:21:29 2014] [info] mod_wsgi (pid=23201): Shutdown requested 'hg_mozilla_org'.
[Wed Jan 15 11:21:34 2014] [info] mod_wsgi (pid=23201): Aborting process 'hg_mozilla_org'.
[Wed Jan 15 11:21:34 2014] [error] [client 10.22.74.212] Premature end of script headers: hgweb.wsgi, referer: https://hg.mozilla.org/try/json-pushes
It's currently set to 30s (default is 300s). It was set in r35245 to try to prevent web heads getting hung up in a stuck state. I'll bump it up a bit and see if that cuts down noticeably on the end of script errors.
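For anyone wanting to check that attribution across a whole error log rather than by eye, here is a minimal sketch. It assumes the Apache error log format seen in the excerpts above and matches purely on time, since the client error lines carry no pid. The function names and the 10-second window are assumptions made for illustration, not anything taken from the web heads.

```python
import re
from datetime import datetime, timedelta

# Prefix of the error log lines quoted above: [timestamp] [level] message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse(line):
    """Return (timestamp, level, message), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    try:
        ts = datetime.strptime(m.group("ts"), TS_FORMAT)
    except ValueError:
        return None
    return ts, m.group("level"), m.group("msg")


def attribute_errors(lines, window=timedelta(seconds=10)):
    """Split 'Premature end of script headers' errors into those that follow
    a deadlock timer expiry within `window` (an assumed value) and the rest."""
    last_expiry = None
    attributed, unexplained = [], []
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue
        ts, _level, msg = parsed
        if "deadlock timer expired" in msg:
            last_expiry = ts
        elif "Premature end of script headers" in msg:
            if last_expiry is not None and timedelta(0) <= ts - last_expiry <= window:
                attributed.append(ts)
            else:
                unexplained.append(ts)
    return attributed, unexplained


# Log lines copied from the comments in this bug.
SAMPLE_LOG = [
    "[Wed Jan 15 04:58:24 2014] [error] [client 10.22.74.212] Premature end of script headers: hgweb.wsgi, referer: https://hg.mozilla.org/try/json-pushes",
    "[Wed Jan 15 11:21:29 2014] [info] mod_wsgi (pid=23201): Daemon process deadlock timer expired, stopping process 'hg_mozilla_org'.",
    "[Wed Jan 15 11:21:29 2014] [info] mod_wsgi (pid=23201): Shutdown requested 'hg_mozilla_org'.",
    "[Wed Jan 15 11:21:34 2014] [info] mod_wsgi (pid=23201): Aborting process 'hg_mozilla_org'.",
    "[Wed Jan 15 11:21:34 2014] [error] [client 10.22.74.212] Premature end of script headers: hgweb.wsgi, referer: https://hg.mozilla.org/try/json-pushes",
]

if __name__ == "__main__":
    attributed, unexplained = attribute_errors(SAMPLE_LOG)
    print("after a deadlock timer expiry:", len(attributed))
    print("no expiry seen nearby:", len(unexplained))
```

The 04:58 error lands in the "no expiry seen nearby" bucket only because info-level logging wasn't on yet when it was written, so that bucket means "unknown", not "some other cause".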
So far today it looks like we're at about one fifth as many 'Premature end of script headers' errors as yesterday, with only 13 on try (plus an equal number of NFS stale filehandle errors). Early next week I'll look again; I may bump the deadlock timeout up another 15-30 seconds.
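A day-over-day comparison like the one above can be reproduced with a small counting script. This is only a sketch: the function name is hypothetical, and the log lines below are synthetic, shaped like the real ones quoted in this bug but not taken from any web head.

```python
import re
from collections import Counter
from datetime import date, datetime

TS_RE = re.compile(r"^\[([^\]]+)\]")
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def premature_end_counts(lines):
    """Count 'Premature end of script headers' errors per calendar day."""
    counts = Counter()
    for line in lines:
        if "Premature end of script headers" not in line:
            continue
        m = TS_RE.match(line)
        if not m:
            continue
        try:
            day = datetime.strptime(m.group(1), TS_FORMAT).date()
        except ValueError:
            continue
        counts[day] += 1
    return counts


# Synthetic data for illustration only: five errors one day, one the next.
SYNTHETIC_LOG = [
    "[Wed Jan 15 11:%02d:00 2014] [error] [client 10.22.74.212] "
    "Premature end of script headers: hgweb.wsgi" % minute
    for minute in range(5)
] + [
    "[Thu Jan 16 09:00:00 2014] [error] [client 10.22.74.212] "
    "Premature end of script headers: hgweb.wsgi",
    "[Thu Jan 16 09:05:00 2014] [info] mod_wsgi (pid=23201): "
    "Shutdown requested 'hg_mozilla_org'.",
]

counts = premature_end_counts(SYNTHETIC_LOG)
ratio = counts[date(2014, 1, 16)] / counts[date(2014, 1, 15)]

if __name__ == "__main__":
    for day in sorted(counts):
        print(day, counts[day])
    print("second day / first day:", ratio)
```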
I think try should be reset.
Depends on: 962275
How have things been since Try was reset?
Much better.
Excellent. I'm seeing almost no premature end of script errors now, and we were getting triple digits on each webhead before the reset. Current plan going forward is regular Try resets during the 6wk tree closing windows, so hopefully this won't reappear.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 1030020
Component: WebOps: Source Control → General
Product: Infrastructure & Operations → Developer Services