Closed Bug 573117 Opened 15 years ago Closed 15 years ago

Getting a lot of empty responses from the server when requesting pages on drumbeat.org

Categories

(mozilla.org Graveyard :: Server Operations, task)

x86
Linux
task
Not set
major

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: paulosman, Assigned: dmoore)

References

()

Details

I've been getting lot of empty responses from drumbeat.org today. Seems to be more common when requesting paths other than '/' (i.e. /people, /project-index). Is anything being reported in the Apache error log? The output would be consistent with PHP seg faulting or running out of memory / reaching max execution time / etc. $ telnet www.drumbeat.org 80 Trying 63.245.217.39... Connected to drumbeat.org. Escape character is '^]'. GET /people HTTP/1.1 Host: www.drumbeat.org Connection closed by foreign host. $ curl -D - http://www.drumbeat.org/project-index curl: (52) Empty reply from server Good response: $ curl -D - http://www.drumbeat.org/ HTTP/1.1 200 OK Server: Apache X-Backend-Server: pp-drumbeat01 Vary: Accept-Encoding Cache-Control: must-revalidate Content-Type: text/html; charset=utf-8 Date: Fri, 18 Jun 2010 20:26:43 GMT Expires: Sun, 19 Nov 1978 05:00:00 GMT Transfer-Encoding: chunked ETag: "5992eb239caafe36ef5c244f946a25c2" Set-Cookie: SESS10c66e53b380515677fb7bd5d6939b2a=j85l3pabu40jau7nv11tmij9e7; expires=Mon, 12-Jul-2010 00:00:03 GMT; path=/; domain=.drum beat.org Last-Modified: Fri, 18 Jun 2010 20:24:00 GMT X-Powered-By: PHP/5.2.9 X-Cache-Info: not cacheable; response has already expired <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> ...
Not much in the error logs (http://oremj.pastebin.mozilla.org/737556), but I think drupal does a bunch of error suppression.
Hmm. So helpful of Drupal :) Okay, could I get a MySQL dump of mozcrm_drupal and mozcrm_civicrm dropped into /root on drumbeat.stage.mozilla.com then? I can't repro the problem on staging or a local development instance and I'd like to see if it's related to any drupal changes made in the last little while (doesn't seem likely, but seems like a place to start debugging).
Done: drumbeat_org_civicrm.dump drumbeat_org_drupal.dump
Thanks. Unfortunately, like I suspected, it's not related to any Drupal settings (or any code changes).
You aren't able to reproduce at all on staging?
Turns out it is blocking on a slow query: SELECT COUNT(*) FROM (SELECT DISTINCT(node.nid) AS nid_1, node_data_field_p_image.field_p_image_fid AS node_data_field_p_image_field_p_image_fid, node_data_field_p_image.field_p_image_list AS node_data_field_p_image_field_p_image_list, node_data_field_p_image.field_p_image_data AS node_data_field_p_image_field_p_image_data, node.type AS node_type, node.nid AS nid, node.vid AS node_vid, node.title AS node_title, og.og_description AS og_og_description, (SELECT COUNT(*) FROM og_uid ou INNER JOIN users u ON ou.uid = u.uid WHERE ou.nid = og.nid AND u.status > 0 AND ou.is_active >= 1 AND ou.is_admin >= 0 ) AS member_count, votingapi_cache_node_points_vote_count.value AS votingapi_cache_node_points_vote_count_value, DATE_FORMAT((FROM_UNIXTIME(node.created) + INTERVAL -25200 SECOND), '%Y%m%d%H%i') AS node_created_minute FROM node node LEFT JOIN og_uid og_uid ON node.nid = og_uid.nid LEFT JOIN node node_og_uid ON og_uid.nid = node_og_uid.nid LEFT JOIN votingapi_cache votingapi_cache_node_points_vote_count ON node.nid = votingapi_cache_node_points_vote_count.content_id AND (votingapi_cache_node_points_vote_count.content_type = 'node' AND votingapi_cache_node_points_vote_count.value_type = 'points' AND votingapi_cache_node_points_vote_count.tag = 'vote' AND votingapi_cache_node_points_vote_count.function = 'count') LEFT JOIN content_type_project node_data_field_p_image ON node.vid = node_data_field_p_image.vid LEFT JOIN og og ON node.nid = og.nid WHERE (node.status <> 0 OR (node.uid = 3568 AND 3568 <> 0) OR 1 = 1) AND (node.type in ('project')) GROUP BY nid_1 ORDER BY node_created_minute DESC ) count_alias;
It looks like /people isn't running one long query, but running about 2000 queries and eventually timing out.
Yeah, I've seen pages that run ~ 2000 queries (mind boggling) but in the past they've just been slow, not timing out. That query above is troubling and surprises me even for this site, which uses a lot of similar but not-as-slow queries. Clearly that one will have to be at least optimized if not removed. I guess the main question I have is why it's stopped working now? The same slow queries work on staging, and take forever, but PHP and Apache still come back with something. It's only on the production server that the web server gives up and fails to return anything. Could it be a MySQL or PHP configuration issue?
I bumped the connect timeout on the load balancer. I think that will "fix" the issue. Real fix will be to fix those queries.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Awesome, thanks Jeremy. I got rid of the inner most SELECT COUNT(*) (member_count) which brought the query time from ~6 seconds to 0.16 seconds on my development machine, which should help. I also cut the number of queries on the people page down to about one half. Still a lot of performance optimization to do, but this helps a lot. Thanks again.
Getting empty responses regularly now, but only when logged in as an admin user. Load times are significantly slower. We're running a nearly identical database on staging and not experiencing the problems there. Check out load time differences: Anonymous request: $ time curl https://www.drumbeat.org ... real 0m1.938s Admin request: $ time curl -b SESS...=session_id https://www.drumbeat.org ... real 0m6.060s And many requests just time out. I.e. $ time curl -b SESS...=session_id https://www.drumbeat.org/admin/build/views/enable/project_mailing_list?destination=admin%2Fbuild%2Fviews%2Flist curl: (52) Empty reply from server real 0m32.560s user 0m0.010s sys 0m0.020s As I mentioned, can't replicate on staging. If you drop another database dump of 'mozcrm_drupal' and 'mozcrm_civicrm' in /root on drumbeat.stage.mozilla.com I can try again with the identical database config.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Compare with staging: $ time curl -k https://drumbeat.stage.mozilla.com ... real 0m1.165s $ time curl -k -b SESS...=session_id https://drumbeat.stage.mozilla.com ... real 0m1.689s As I mentioned in comment # 11, I can redo the comparison with an identical setup if you drop another database dump on staging.
I've got Paul Booker (community member) looking at ways of optimizing the SQL being generated on the site, but any explanation(s) for the performance discrepancy between production and staging?
I haven't had much time to look at this. Do you want to jump on and debug? Derek, can you give him network access to pp-drumbeat01 and then reassign back to me?
Assignee: jeremy.orem+bugs → dmoore
Thanks Jeremy. If I can get access, I can debug and see if I can figure anything out. Derek, let me know if you need anything from me (i.e. keys, etc).
Don't have a good way to get you remote access to this box. Should be fixed when we get the other KVM boxes up. I can hold this open until then or close it knowing we'll move stuff over to those boxes. Your call.
I'll close it out then. I'll investigate once we move over to the new boxes and I have access. That's all being tracked in Bug # 569251 anyway I assume. Thanks mrz.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → INCOMPLETE
Product: mozilla.org → mozilla.org Graveyard
You need to log in before you can comment on or make changes to this bug.