developer.mozilla.org times out when loading *some* articles

RESOLVED INCOMPLETE

People

(Reporter: dbaron, Unassigned)

While some articles on developer.mozilla.org load fine, others, such as:
https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior
don't seem to load at all.  I just get "The connection was reset" when loading them.  (I've been seeing this problem consistently for about a week, since bug 93077 comment 17 prompted me to look at it -- I've attempted to load that article on most days since, and failed every time.)
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 712237
Reopening this, because bug 712237 has been marked fixed twice, but I still haven't been able to load this page (5 tries since bug 712237 was fixed, probably over 20 in the past 2 weeks).
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---

Comment 3

7 years ago
As a data point, I time out loading this page too.
I've done some tests with two pages that have this problem. Sometimes they do succeed; in that case, I always have X-Backend-Server:pm-dekiwiki01 in the response.

When they do succeed, the load time is always well below the timeout (between 5 and 20s, against a timeout slightly higher than 30s).

Both pages where I currently see this problem contain a lot of {{bugzilla}} templates.

(Times are CET, which is 9 hours ahead of PT.)

Raw data (for: https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior )
[08:01:22.752] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 31942ms]
--
[08:02:08.104] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 31760ms]
--
[08:09:10.899] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [undefined 59999ms]
--
[08:10:23.122] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 76217ms]
--
[08:14:40.557] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 32542ms]
--
[08:18:36.867] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 40620ms]
--
[08:21:20.315] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 34088ms]
--
[08:23:35.317] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 200 OK 21773ms]
--
[08:27:02.239] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 31708ms]
--
[08:34:51.525] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 33268ms]
--
[08:37:34.531] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 33238ms]
--
[08:50:23.185] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 32553ms]
--
[09:08:03.022] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 32101ms]
--
[09:20:29.279] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 30830ms]
--
[13:01:06.172] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 33298ms]

And for https://developer.mozilla.org/en/Mozilla_CSS_support_chart

[08:12:18.379] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 33271ms]
--
[08:13:02.881] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 200 OK 9826ms]
--
[08:14:55.777] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32294ms]
--
[08:15:48.607] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 33322ms]
--
[08:17:03.712] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 200 OK 10245ms]
--
[08:21:30.183] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 33642ms]
--
[08:27:15.826] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 33071ms]
--
[08:28:08.031] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 31362ms]
--
[08:35:25.950] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32028ms]
--
[08:37:30.916] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32739ms]
--
[08:38:47.163] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 31368ms]
--
[08:49:31.366] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 37461ms]
--
[09:01:59.997] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [undefined 33ms]
--
[09:07:23.269] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32729ms]
--
[09:15:36.085] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32066ms]
--
[09:18:07.193] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 200 OK 4898ms]
--
[09:25:47.386] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32757ms]
--
[09:34:45.366] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32612ms]
--
[09:41:00.064] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32381ms]
--
[09:46:35.012] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 30521ms]
--
[10:00:12.591] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 200 OK 9504ms]
--
[10:17:53.510] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 33066ms]
--
[10:23:47.752] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32909ms]
--
[11:12:45.472] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 32546ms]
--
[11:45:33.310] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 31275ms]
--
[13:00:20.734] GET https://developer.mozilla.org/en/Mozilla_CSS_support_chart [HTTP/1.1 500 Internal Server Error 31109ms]
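
To make the raw data above easier to compare, here is a minimal sketch that tallies outcomes and load times from log lines in this format. The names `LINE_RE` and `summarize` are made up for this illustration, and the three sample lines are copied from the data above; it is not part of any MDN or Bugzilla tooling.

```python
import re
from collections import Counter

# Matches lines such as:
#   [08:01:22.752] GET https://... [HTTP/1.1 500 Internal Server Error 31942ms]
#   [08:09:10.899] GET https://... [undefined 59999ms]
LINE_RE = re.compile(
    r"^\[(?P<time>[\d:.]+)\] GET (?P<url>\S+) "
    r"\[(?:HTTP/\d\.\d (?P<status>\d{3}) [^\]]*?|undefined )(?P<ms>\d+)ms\]$"
)


def summarize(lines):
    """Count outcomes and collect load times (in ms) per outcome."""
    counts = Counter()
    times = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip "--" separators and blank lines
        # "undefined" entries carry no HTTP status at all.
        outcome = match.group("status") or "no response"
        counts[outcome] += 1
        times.setdefault(outcome, []).append(int(match.group("ms")))
    return counts, times


SAMPLE = [
    "[08:01:22.752] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 500 Internal Server Error 31942ms]",
    "--",
    "[08:09:10.899] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [undefined 59999ms]",
    "--",
    "[08:23:35.317] GET https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior [HTTP/1.1 200 OK 21773ms]",
]

counts, times = summarize(SAMPLE)
```

Running `summarize` over the full lists should show the pattern described above: the 500 responses cluster just past 30 seconds, while the 200 responses are much faster.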

Comment 5

7 years ago
When you say "a lot of {{bugzilla}} templates", how many is "a lot"?

Could we find (or make) a set of pages that use progressively more and more {{bugzilla}} templates?

For me, loading a particular Bugzilla bug often takes a few seconds. I'm wondering if maybe we have a situation where too many slow Bugzilla responses are causing Zeus to time out the page load.


Yesterday afternoon the issue with MDN being unable to reach Bugzilla was fixed. Today it appears to be broken again. I have reopened bug 712237 about this, and it's marked as 'critical'.


As far as pm-dekiwiki01 being the only working server, I believe this is because 02 and 03 are currently set as passive/failover servers in Zeus. This was due to an earlier issue where they were performing poorly, which I believe may be a Lucene indexing issue (01 is currently the indexing server, and 02/03 send queries to it).
Jake, I can't give you the precise numbers right now (I can't access the two pages at the moment), but
the page given by D. Baron has about 50, and the second one about 20.

I am in the process of making pages on MDN that use progressively more {{bug(xyz)}} templates.

URLs are
https://developer.mozilla.org/User:Teoli/Bugzilla_test   (1 template)
https://developer.mozilla.org/User:Teoli/Bugzilla_test5  (5 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test10 (10 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test20 (20 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test50 (50 templates)

(it is a wiki, feel free to edit them if you need them)

I was able to load the last one once after I created it, but no longer.
Can we just make the bug template not access bugzilla, so that we can actually see these pages?
David: I did it. The pages should be accessible again.
Jake: the wiki pages that I just created now use a copy of the original templates and do connect to bmo, so they should still exhibit the problem.

Note that it is a workaround, and we really need to fix the problem.

Comment 9

7 years ago
Excellent, thanks for the test pages. The first one (1 template) took a very long time, but did ultimately load. All of the others timed out with the Firefox "connection was reset" page.

It looks like MDN still cannot connect to bugzilla at all. I'm guessing that on the first page, the mdn->bugzilla connection times out and MDN proceeds to load the page without that info... the others take long enough that Firefox times out the whole page load.

Updated

7 years ago
Depends on: 712237
Duplicate of this bug: 703969

Comment 11

7 years ago
The URLs in comment 6 are all working now... I believe the MDN->Bugzilla issue is fixed. It seems to have been a combination of two things, one ACL and one LB setting on Bugzilla's side, that was actually affecting more than just MDN (but MDN was a pretty visible case). The ACL was initially responsible for the total failure, and the LB setting was causing intermittent (but pretty frequent) problems.

Interestingly, those 5 pages all take approximately the same time to load (3-5 seconds for me). I suspect either they're being fetched in parallel, or there's some optimization which realizes they're all the same request and only does it once.

pm-dekiwiki02 and 03 are currently disabled again. I don't want to enable them right now as I'm about to leave for the day... we can try that tomorrow, if the MDN-Bugzilla link shows no more evidence of problems.

Comment 12

7 years ago
2 and 3 are turned back up now. The link to Bugzilla is stable now... go ahead and revert the main template to use it.

I'm going to move this bug to the MDN component, to make sure that fixing the template doesn't get forgotten.
Assignee: server-ops → nobody
Status: REOPENED → NEW
Component: Server Operations → Website
Product: mozilla.org → Mozilla Developer Network
QA Contact: cshields → website
Version: other → Deki
I reverted the change in the template. I'll close this bug later today if everything still looks ok.
Un-revert it; the quirks mode page, for example, is back to timing out. :(
Un-reverted. :-(
(In reply to Jake Maul [:jakem] from comment #16)
> How is the quirks mode page:
> 
> https://developer.mozilla.org/en/Mozilla_Quirks_Mode_Behavior
> 
> Different from the test pages:
> 
> https://developer.mozilla.org/User:Teoli/Bugzilla_test50

It's not; the test pages do it too. I just didn't say so.

Comment 18

7 years ago
The test pages are working fine for me currently. I understand they're a different template right now that's *not* disabled, so that should be a reasonable test, right? I didn't have time to hit them between comments 13 and 15.

Can anyone else confirm if the test pages are working as expected?
There is one difference between these pages and real ones: they link to the same bug and not to different bugs.

I will edit them to link to x different bugs.
I updated the test pages to use different bug numbers (it seems that DekiWiki caches the results, which is good in general but unfortunate in this specific case).
URLs are
https://developer.mozilla.org/User:Teoli/Bugzilla_test   (1 template)
https://developer.mozilla.org/User:Teoli/Bugzilla_test5  (5 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test10 (10 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test20 (20 templates)
https://developer.mozilla.org/User:Teoli/Bugzilla_test50 (50 templates)

The problem can be reproduced there.

Comment 21

7 years ago
With these updated test pages, I see very dramatic load time differences between them, and between runs. The first run against a URL is very slow... subsequent loads are generally somewhat faster, I believe due to Bugzilla-side caching (MySQL query cache and such). Eventually, once all 3 deki nodes have hit and cached the content, the page is always fast (at least until it gets purged from cache for some reason).

For example, on test10, my first curl run timed out (30s). My second run succeeded in 25s, and pm-dekiwiki02 was the chosen backend. My third run was 13s, with pm-dekiwiki03 as the backend... I attribute the difference to Bugzilla-side caching. The next load ran against pm-dekiwiki02 again, and was instant... deki cache.


https://developer.mozilla.org/User:Teoli/Bugzilla_test   (1 template)     5s
https://developer.mozilla.org/User:Teoli/Bugzilla_test5  (5 templates)    10-17s
https://developer.mozilla.org/User:Teoli/Bugzilla_test10 (10 templates)   25s-timeout
https://developer.mozilla.org/User:Teoli/Bugzilla_test20 (20 templates)   timeout
https://developer.mozilla.org/User:Teoli/Bugzilla_test50 (50 templates)   timeout

I suspect deki is hitting the bugs in serial, so more bugs == longer load time.
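
As a rough illustration of the serial-fetch hypothesis above, here is a small sketch of how load time would scale if each template costs one Bugzilla round trip, one after another. The 2.5 s per-bug latency is an assumption chosen only because 10 templates took about 25 s in the table above; the function names and the `base_s` parameter are made up for this example.

```python
def serial_load_time(n_templates, per_bug_s, base_s=0.0):
    """Estimated page load time if bugs are fetched one after another."""
    return base_s + n_templates * per_bug_s


def max_templates_before_timeout(timeout_s, per_bug_s, base_s=0.0):
    """Largest template count whose serial load time still fits the timeout."""
    return int((timeout_s - base_s) // per_bug_s)


PER_BUG_S = 2.5   # assumed uncached Bugzilla latency per bug, in seconds
TIMEOUT_S = 30.0  # the load balancer timeout observed in this bug

estimate_20 = serial_load_time(20, PER_BUG_S)
limit = max_templates_before_timeout(TIMEOUT_S, PER_BUG_S)
```

Under these assumptions a 20-template page would need about 50 s uncached, well past the 30 s limit, which matches the timeouts in the table. The model is deliberately crude: it ignores caching and does not fit the 1-template measurement (5 s), so the real per-bug cost clearly varies.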

Bypassing Zeus and hitting a node directly, here's some results loading the 20-template page. These are all with less than 1 minute between requests.
1st hit: 45 seconds
2nd hit: <1 second
3rd hit: 44 seconds (maybe cache got flushed?)
4th hit: 10 seconds
5th hit: 10 seconds

Subsequent hits seemed to settle in around 7-15 seconds fairly consistently.

This tells me there's not much we can do. Pages will need to be hit a few times (at least once per node) to get properly cached everywhere before they'll be fast.

This may have been working better before simply because those pages were being hit frequently. Now, it's been several days since the underlying Bugzilla bugs have actually been fetched... the data is likely out of cache on the Bugzilla side, and definitely on the Deki side.

One possible solution would be to attempt to "prime" the caches by prefetching certain pages (or just re-fetching them periodically). For this to work we'd need a list of URLs to prefetch, and to set up a simple cron job to hit them all (on each machine, ideally).

We can also increase the timeout in Zeus, which should give a bit more room for error on pages that aren't fully cached. This won't make them fast, but may avoid some complete timeouts.
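
The cache-priming idea above could be sketched roughly as follows. This is only an outline under assumptions: `fetch` and `prime` are hypothetical names, the retry count and the 60 s client timeout are arbitrary, and how to address each Deki node individually is not described in this bug, so the sketch simply hits whatever URLs it is given.

```python
import time
import urllib.request


def fetch(url, timeout_s=60):
    """Fetch url and return (HTTP status, elapsed seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
        status = resp.status
    return status, time.monotonic() - start


def prime(urls, fetcher=fetch, attempts=3):
    """Hit each URL up to `attempts` times, stopping at the first success.

    Returns a dict mapping each URL to the elapsed seconds of its first
    successful load, or None if every attempt failed.
    """
    results = {}
    for url in urls:
        results[url] = None
        for _ in range(attempts):
            try:
                status, elapsed = fetcher(url)
            except OSError:
                # Covers timeouts, connection resets and HTTP error
                # responses raised by urllib; just try again.
                continue
            if status == 200:
                results[url] = elapsed
                break
    return results
```

Retrying is the point here: per the measurements above, the first uncached hit often times out while still warming the Bugzilla-side cache, so a later attempt is more likely to succeed. A cron entry on each server could call `prime` with the agreed list of URLs.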

Comment 22

7 years ago
I have updated the Zeus timeouts to be 45 seconds, which should at least allow some pages to actually load as expected, even if they're not cached. Previous timeout was 30 seconds, so this is a 50% increase. Of course this won't help for pages that use a very large number of Bugzilla template invocations, and of course this should never come into play with pages that *are* cached.

I'm going to close this bug out, as I don't know what more IT can do on this. I can't easily make Bugzilla responses faster (that's a whole separate issue in itself), and I can't make Deki query them in parallel instead of serial.

If someone would like to experiment with cache priming, please re-open this bug (or file a new one, either way) and include a list of URLs that should be primed. We can set up a cron job that will hit them on a regular basis on each server, so that they're (hopefully) always fast.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → INCOMPLETE
(Assignee)

Updated

6 years ago
Version: Deki → unspecified
(Assignee)

Updated

6 years ago
Component: Website → Landing pages
Product: Mozilla Developer Network → Mozilla Developer Network