Closed
Bug 798756
Opened 12 years ago
Closed 2 years ago
plugins.mozilla.org and kohana
Categories
(Websites :: plugins.mozilla.org, defect)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: ericz, Assigned: laura)
References
Details
Attachments
(2 files)
plugins[5-6].webapp.phx1 flapped with these alerts: [00:17] < nagios-phx1> | Sat 00:17:19 PDT [171] plugins6.webapp.phx1.mozilla.com:https - port 81 plugins.m.o is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1007 bytes in 0.037 second response time [01:40] < nagios-phx1> | Sat 01:40:26 PDT [183] plugins5.webapp.phx1.mozilla.com:https - port 81 plugins.m.o is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 1007 bytes in 0.047 second response time Trying it in a browser gives this error: An error occurred during a connection to plugins5.webapp.phx1.mozilla.com:81. SSL received a record that exceeded the maximum permissible length. (Error code: ssl_error_rx_record_too_long)
Reporter | ||
Comment 1•12 years ago
|
||
plugins4.webapp.phx1.mozilla.com just alerted the same way.
Reporter | ||
Updated•12 years ago
|
Summary: plugins[5-6].webapp.phx1 SSL received a record that exceeded the maximum permissible length → plugins[4-6].webapp.phx1 SSL received a record that exceeded the maximum permissible length
Reporter | ||
Comment 2•12 years ago
|
||
plugins.mozilla.org is down now (was working before) even though all of the individual plugins alerts above cleared (and the site was working a few minutes ago).
Severity: normal → critical
Reporter | ||
Comment 3•12 years ago
|
||
More specifically https://plugins.mozilla.org/en-us says only: Ouch Something went south...
Reporter | ||
Comment 4•12 years ago
|
||
And now it is back up again. Lowering severity.
Severity: critical → normal
Summary: plugins[4-6].webapp.phx1 SSL received a record that exceeded the maximum permissible length → plugins.mozilla.org briefly down, SSL alerts
Reporter | ||
Comment 5•12 years ago
|
||
Noticed this in the logs on each of the plugins[3-6]: [Sun Sep 30 03:33:01 2012] [warn] WARNING: Attempt to change ServerLimit ignored during restart Not sure what that means but after manually restarting httpd it doesn't show.
Comment 6•12 years ago
|
||
Looks like we have increased traffic, and we probably need to add additional nodes.
Comment 7•12 years ago
|
||
I am also seeing a lot of these in my the plugins db nodes: [root@plugins2.db.phx1 ~]# tail -n2 /var/log/mysql/mysql.err 121006 6:23:41 [Warning] Too many connections 121006 6:23:41 [Warning] Too many connections Can we raise the connection limit or do we need to stand up additional slaves?
Comment 8•12 years ago
|
||
max_connections is 1200, which is pretty high, but we can raise it.
Comment 9•12 years ago
|
||
updated max_connections to 5000 (from 1200, the default for us) in puppet and on the machines. plugins1, the rw/master, is OK, only about 200 connections, but immediately the slave went from 1200 connections to 2779 connections....looking to see if there's a smoking gun wrt a query.
Comment 10•12 years ago
|
||
tons of these (over 1200): 1666072743 plugins_user 10.8.70.202:22948 plugins_mozilla_org Query 0 Opening tables SHOW COLUMNS FROM `plugins` 0 0 1 1666072744 plugins_user 10.8.70.202:22949 plugins_mozilla_org Query 0 Opening tables SHOW COLUMNS FROM `plugins` 0 0 1 1666072745 plugins_user 10.8.70.202:22950 plugins_mozilla_org Query 0 Opening tables SHOW COLUMNS FROM `plugins` 0 0 1 [root@plugins2.db.phx1 ~]# mysql -e "show full processlist" | grep SHOW.COLUMNS.FROM..plugins | wc -l 1223
Comment 11•12 years ago
|
||
Ganglia graphs for web and db show that yesterday (Friday) at noon, traffic really started kicking into high gear: https://ganglia-phx1.mozilla.org/ganglia/?r=day&cs=&ce=&m=load_one&s=by+name&c=plugins-web&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4 and https://ganglia-phx1.mozilla.org/ganglia/?r=day&cs=&ce=&m=load_one&s=by+name&c=plugins-db&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
Comment 12•12 years ago
|
||
The SHOW COLUMNS btw go away right away, so it's not anything locking/killing things.
Comment 13•12 years ago
|
||
The max time any query is taking is 6 seconds....but we're still seeing the SHOW COLUMNS FROM plugins query which is odd...especially in conjunction with temporary tables (this is on plugins2): mysql> select * from processlist order by TIME DESC limit 10\G *************************** 1. row *************************** ID: 1 USER: system user HOST: DB: NULL COMMAND: Connect TIME: 14323605 STATE: Waiting for master to send event INFO: NULL TIME_MS: 14323604575 ROWS_SENT: 0 ROWS_EXAMINED: 0 ROWS_READ: 1 *************************** 2. row *************************** ID: 3 USER: repl HOST: 10.8.70.153:34773 DB: NULL COMMAND: Binlog Dump TIME: 14323589 STATE: Master has sent all binlog to slave; waiting for binlog to be up INFO: NULL TIME_MS: 14323588958 ROWS_SENT: 0 ROWS_EXAMINED: 0 ROWS_READ: 2 *************************** 3. row *************************** ID: 2 USER: system user HOST: DB: NULL COMMAND: Connect TIME: 1422 STATE: Slave has read all relay log; waiting for the slave I/O thread t INFO: NULL TIME_MS: 1421624 ROWS_SENT: 0 ROWS_EXAMINED: 0 ROWS_READ: 2 *************************** 4. row *************************** ID: 1666361804 USER: plugins_user HOST: 10.8.70.202:12580 DB: plugins_mozilla_org COMMAND: Query TIME: 9 STATE: removing tmp table INFO: SHOW COLUMNS FROM `platforms` TIME_MS: 8346 ROWS_SENT: 5 ROWS_EXAMINED: 5 ROWS_READ: 6 *************************** 5. row *************************** ID: 1666363653 USER: plugins_user HOST: 10.8.70.202:15505 DB: plugins_mozilla_org COMMAND: Query TIME: 8 STATE: removing tmp table INFO: SHOW COLUMNS FROM `plugins` TIME_MS: 8064 ROWS_SENT: 12 ROWS_EXAMINED: 12 ROWS_READ: 13 *************************** 6. row *************************** ID: 1666362557 USER: plugins_user HOST: 10.8.70.202:13741 DB: plugins_mozilla_org COMMAND: Query TIME: 8 STATE: removing tmp table INFO: SHOW COLUMNS FROM `platforms` TIME_MS: 8014 ROWS_SENT: 5 ROWS_EXAMINED: 5 ROWS_READ: 6 *************************** 7. row *************************** ID: 1666360426 USER: plugins_user HOST: 10.8.70.202:10488 DB: plugins_mozilla_org COMMAND: Query TIME: 8 STATE: removing tmp table INFO: SHOW COLUMNS FROM `plugin_releases` TIME_MS: 8047 ROWS_SENT: 29 ROWS_EXAMINED: 29 ROWS_READ: 30 *************************** 8. row *************************** ID: 1666359693 USER: plugins_user HOST: 10.8.70.202:9307 DB: plugins_mozilla_org COMMAND: Query TIME: 8 STATE: removing tmp table INFO: SELECT `plugins`.`id` AS `plugin_id`, `plugins`.`pfs_id` AS `plugin_pfs_id`, `plugins`.`original_plugin_id` AS `plugin_original_plugin_id`, `plugins`.`sandbox_profile_id` AS `plugin_sandbox_profile_id`, `plugins`.`name` AS `plugin_name`, `plugins`.`description` AS `plugin_description`, `plugins`.`vendor` AS `plugin_vendor`, `plugins`.`url` AS `plugin_url`, `plugins`.`icon_url` AS `plugin_icon_url`, `plugins`.`license_url` AS `plugin_license_url`, `plugins`.`modified` AS `plugin_modified`, `plugins`.`created` AS `plugin_created`, `plugin_releases`.`id` AS `pluginrelease_id`, `plugin_releases`.`plugin_id` AS `pluginrelease_plugin_id`, `plugin_releases`.`os_id` AS `pluginrelease_os_id`, `plugin_releases`.`platform_id` AS `pluginrelease_platform_id`, `plugin_releases`.`status_code` AS `pluginrelease_status_code`, `plugin_releases`.`name` AS `pluginrelease_name`, `plugin_releases`.`description` AS `pluginrelease_description`, `plugin_releases`.`vendor` AS `pluginrelease_vendor`, `plugin_releases`.`url` AS `pluginrelease_url`, `plugin_releases`.`icon_url` AS `pluginrelease_icon_url`, `plugin_releases`.`vulnerability_description` AS `pluginrelease_vulnerability_description`, `plugin_releases`.`vulnerability_url` AS `pluginrelease_vulnerability_url`, `plugin_releases`.`guid` AS `pluginrelease_guid`, `plugin_releases`.`filename` AS `pluginrelease_filename`, `plugin_releases`.`version` AS `pluginrelease_version`, `plugin_releases`.`detected_version` AS `pluginrelease_detected_version`, `plugin_releases`.`detection_type` AS `pluginrelease_detection_type`, `plugin_releases`.`xpi_location` AS `pluginrelease_xpi_location`, `plugin_releases`.`installer_location` AS `pluginrelease_installer_location`, `plugin_releases`.`installer_hash` AS `pluginrelease_installer_hash`, `plugin_releases`.`installer_shows_ui` AS `pluginrelease_installer_shows_ui`, `plugin_releases`.`manual_installation_url` AS `pluginrelease_manual_installation_url`, `plugin_releases`.`license_url` AS `pluginrelease_license_url`, `plugin_releases`.`needs_restart` AS `pluginrelease_needs_restart`, `plugin_releases`.`min` AS `pluginrelease_min`, `plugin_releases`.`max` AS `pluginrelease_max`, `plugin_releases`.`xpcomabi` AS `pluginrelease_xpcomabi`, `plugin_releases`.`modified` AS `pluginrelease_modified`, `plugin_releases`.`created` AS `pluginrelease_created`, `oses`.`id` AS `os_id`, `oses`.`name` AS `os_name`, `platforms`.`id` AS `platform_id`, `platforms`.`app_id` AS `platform_app_id`, `platforms`.`app_release` AS `platform_app_release`, `platforms`.`app_version` AS `platform_app_version`, `platforms`.`locale` AS `platform_locale` FROM (`plugins`) JOIN `plugin_releases` ON (`plugin_releases`.`plugin_id` = `plugins`.`id`) JOIN `oses` ON (`oses`.`id` = `plugin_releases`.`os_id`) JOIN `platforms` ON (`platforms`.`id` = `plugin_releases`.`platform_id`) JOIN `mimes_plugins` ON (`mimes_plugins`.`plugin_id` = `plugins`.`id`) JOIN `mimes` ON (`mimes_plugins`.`mime_id` = `mimes`.`id`) WHERE plugins.sandbox_profile_id IS NULL AND `detection_type` IN ('version_available','*') AND `oses`.`name` IN ('windows nt 6.1','win','*') AND `platforms`.`app_id` IN ('{ec8030f7-c20a-464f-9b0e-13a3a9e97384}','*') AND `platforms`.`app_version` IN (20120905151427,'*') AND `platforms`.`app_release` IN ('15.0.1','*') AND `platforms`.`locale` IN ('en-US','*') AND `mimes`.`name` IN ('application/x-wlpg3-detect','application/x-wlpg-detect') GROUP BY `plugin_releases`.`id` ORDER BY `plugin_releases`.`version` DESC, `oses`.`name` DESC, `platforms`.`locale` DESC, `platforms`.`app_id` DESC, `platforms`.`app_version` DESC, `platforms`.`app_release` DESC TIME_MS: 7803 ROWS_SENT: 0 ROWS_EXAMINED: 6 ROWS_READ: 1 *************************** 9. row *************************** ID: 1666363983 USER: plugins_user HOST: 10.8.70.202:15974 DB: plugins_mozilla_org COMMAND: Query TIME: 7 STATE: removing tmp table INFO: SHOW COLUMNS FROM `plugins` TIME_MS: 6737 ROWS_SENT: 12 ROWS_EXAMINED: 12 ROWS_READ: 13 *************************** 10. row *************************** ID: 1666364699 USER: plugins_user HOST: 10.8.70.202:17171 DB: plugins_mozilla_org COMMAND: Query TIME: 6 STATE: removing tmp table INFO: SHOW COLUMNS FROM `plugins` TIME_MS: 5083 ROWS_SENT: 12 ROWS_EXAMINED: 12 ROWS_READ: 13 10 rows in set (5.49 sec)
Comment 14•12 years ago
|
||
Laura - is the SHOW COLUMNS stuff MySQL's magic, or is that actually in the code? Also was there a release or something at noon yesterday? graphs are showing an increase in overall traffic pretty sharply at noon yesterday, about an hour ramping up, and staying pretty high until just now.
Comment 15•12 years ago
|
||
Added plugins1.db (rw server) to ro pool in zeus for now.
Comment 16•12 years ago
|
||
Bug 798793 for additional web nodes
Comment 17•12 years ago
|
||
It would be nice if we could offload the traffic by caching on Zeus. The app is currently setting cache-control headers. ➜ ~ curl -I "https://plugins.mozilla.org/pfs/v2?appID=%7Bec8030f7-c20a-464f-9b0e-13a3a9e97384%7D&appRelease=15.0&appVersion=20120824154833&clientOS=Windows+NT+5.1&chromeLocale=de-DE&detection=version_available&mimetype=application%2Fx-director&callback=C" HTTP/1.1 200 OK Server: Apache X-Backend-Server: plugins2.webapp.phx1.mozilla.com Cache-Control: no-store, no-cache, must-revalidate, private Content-Type: text/javascript Date: Sat, 06 Oct 2012 15:43:43 GMT Expires: Thu, 19 Nov 1981 08:52:00 GMT Pragma: no-cache Transfer-Encoding: chunked Connection: Keep-Alive Set-Cookie: auth_profiles=deleted; expires=Fri, 07-Oct-2011 15:43:42 GMT; path=/; httponly Set-Cookie: kohanasession=jmsrlgqr6pn4pivudhrmfp4tb6; expires=Sat, 06-Oct-2012 17:43:43 GMT; path=/; HttpOnly Set-Cookie: kohanasession=jmsrlgqr6pn4pivudhrmfp4tb6; expires=Sat, 06-Oct-2012 17:43:43 GMT; path=/; httponly Set-Cookie: auth_profiles=deleted; expires=Fri, 07-Oct-2011 15:43:42 GMT; path=/; httponly Set-Cookie: auth_profiles=deleted; expires=Fri, 07-Oct-2011 15:43:42 GMT; path=/; httponly Set-Cookie: kohanasession_data=c2Vzc2lvbl9pZHxzOjI2OiJqbXNybGdxcjZwbjRwaXZ1ZGhybWZwNHRiNiI7dG90YWxfaGl0c3xpOjE7X2tmX2ZsYXNoX3xhOjA6e311c2VyX2FnZW50fHM6Nzg6ImN1cmwvNy4yNC4wICh4ODZfNjQtYXBwbGUtZGFyd2luMTIuMCkgbGliY3VybC83LjI0LjAgT3BlblNTTC8wLjkuOHIgemxpYi8xLjIuNSI7aXBfYWRkcmVzc3xzOjEzOiI3Mi40NC4yMTAuMjIwIjtsYXN0X2FjdGl2aXR5fGk6MTM0OTUzODIyMzs%3D; expires=Sat, 06-Oct-2012 17:43:43 GMT; path=/; httponly X-Frame-Options: DENY X-Powered-By: PHP/5.3.3 X-Cache-Info: not cacheable; response specified "Cache-Control: no-store"
Updated•12 years ago
|
Assignee: server-ops-webops → server-ops
Component: Server Operations: Web Operations → Server Operations: AMO Operations
QA Contact: cshields → oremj
Updated•12 years ago
|
Assignee: server-ops → jthomas
Comment 18•12 years ago
|
||
plugins[7-12].webapp.phx1 configured and added to Zeus plugins pool.
Comment 19•12 years ago
|
||
Sad again, all the webheads reached MaxClients. Zeus graphs show an increase in the last hours to almost 200k hits/minute, which is more than double of what it was 24h ago.
Comment 20•12 years ago
|
||
(In reply to Dumitru Gherman [:dumitru] from comment #19) > Sad again, all the webheads reached MaxClients. Zeus graphs show an increase > in the last hours to almost 200k hits/minute, which is more than double of > what it was 24h ago. Was there some type of change in the latest Firefox release or nightly that is causing this traffic increase? This seems very odd.
Comment 22•12 years ago
|
||
I doubled the MaxClients on all 11 webheads(!) (from 256 to 512), but per comment 17, some Zeus caching love would be really great.
Comment 23•12 years ago
|
||
The database servers are also hitting "Too many connections", obviously...
Updated•12 years ago
|
Severity: normal → critical
Comment 24•12 years ago
|
||
I've been stracing some processes. The SHOW COLUMNS queries do seem to be the culprit.
Comment 25•12 years ago
|
||
Connections to the zeus db vip where taking seconds to respond. I moved the db VIP over to pp-zlb08 for now and the problem has gone away.
Comment 26•12 years ago
|
||
happening again...
Comment 27•12 years ago
|
||
Traffic is way lower on plugins2: mysql> show status like 'max_used_connections'; +----------------------+-------+ | Variable_name | Value | +----------------------+-------+ | Max_used_connections | 1005 | +----------------------+-------+ 1 row in set (0.00 sec) Even though plugins1 is hovering around max_connections (5000). I've updated max_connections for both of them (dynamic and in puppet) to 10,000.
Comment 28•12 years ago
|
||
A lot of these http requests are the same. I modified the Cache-Control headers so these requests are being cached in Zeus and off load some of the traffic from our web nodes.
Comment 29•12 years ago
|
||
Traffic graphs for the past 7 days.
Comment 30•12 years ago
|
||
We are still experiencing increased traffic. Caching at Zeus is helping with performance and availability.
Severity: critical → normal
Assignee | ||
Comment 31•12 years ago
|
||
I would venture the spike was caused by this: https://blog.mozilla.org/addons/2012/10/05/prompting-our-users-to-update-their-plugins/ and a press push surrounding that.
Comment 32•12 years ago
|
||
See bug https://bugzilla.mozilla.org/show_bug.cgi?id=798995 - can we check if the traffic came from one IP or was malicious?
Comment 33•12 years ago
|
||
Laura - can you speak to all the "SHOW COLLUMNS" that seems to be coming in? is it from the code?
Comment 34•12 years ago
|
||
Getting a full thread from the general log: http://scabral.pastebin.mozilla.org/1860939 [10:08:31] <@laura> yeah, looks like framework games to me [10:08:38] <@laura> ugh, kohana, why can't I quit you [10:08:49] <@laura> I'll dig up the code that is the offender [10:08:53] <@laura> but it's not easy to fix [10:08:59] <sheeri> OK. Let me know if you want other threads [10:09:00] <@laura> not userspace code [10:09:07] <sheeri> :-\ [10:09:14] <sheeri> laura what's the framework? [10:09:24] <sheeri> I can look to see if there's a switch or something that can bypass that crap
Assignee | ||
Comment 35•12 years ago
|
||
Is there still a load problem that we need to be solving, or has it resolved? It's very unclear from the bug.
Comment 36•12 years ago
|
||
I changed the zeus config to make ro go only to plugins2, which has the mysql query cache on now. The zeus cache is still on, but we are already getting query cache hits.
Comment 37•12 years ago
|
||
We have a temporary fix in place, but we need a more permanent fix.
Assignee | ||
Comment 38•12 years ago
|
||
I'll see if I can solve the ORM issue, give me some time to test.
Comment 39•12 years ago
|
||
(In reply to Laura Thomson :laura from comment #31) > I would venture the spike was caused by this: > https://blog.mozilla.org/addons/2012/10/05/prompting-our-users-to-update- > their-plugins/ > and a press push surrounding that. bug 797378
Updated•12 years ago
|
Assignee: jthomas → laura
Component: Server Operations: AMO Operations → plugins.mozilla.org
Product: mozilla.org → Websites
QA Contact: oremj
Comment 40•12 years ago
|
||
Is there a plan/another bug for the ORM issue? Or has this been resolved/addressed?
Assignee | ||
Comment 41•12 years ago
|
||
There is a plan, and it may as well be this bug. I have not had time yet to address this, sorry, but have other upcoming work on plugins (in bug 811841) so I will address it then.
Updated•12 years ago
|
Summary: plugins.mozilla.org briefly down, SSL alerts → plugins.mozilla.org and kohana
Updated•11 years ago
|
Assignee: laura → bsavage
Updated•10 years ago
|
Assignee: bsavage → laura
Comment 42•9 years ago
|
||
Does plugins still use kohana? If so, are you (Laura) still the right contact for this issue?
Comment 43•9 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #42) > Does plugins still use kohana? If so, are you (Laura) still the right > contact for this issue? Hey Sheeri, for the moment unfortunately yes it does. Ricky Rosario is going to rewrite this service in Django and we are most likely going to move the whole thing to AWS. I am going to ping both Ricky and Lonnen for more info here.
Flags: needinfo?(rrosario)
Flags: needinfo?(chris.lonnen)
Comment 44•9 years ago
|
||
OK, great. Actuall
Comment 45•9 years ago
|
||
The intent is a whole rewrite into python this Q. We'll deploy it in *the cloud* and then this can be decomm'd. I'm not sure what it means for this bug.
Flags: needinfo?(rrosario)
Flags: needinfo?(chris.lonnen)
Comment 46•2 years ago
|
||
PluginCheck is no longer supported
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INCOMPLETE
You need to log in
before you can comment on or make changes to this bug.
Description
•