First symptom was a backlog of compactions. We turned off collection and processing for a while so it caught up, but it is now bogged down again. Stack has recommended config changes; dre will document them here. We'll apply those and do a cluster restart so they can take effect.
A good possibility for the underlying cause is the fact that in the last month we have significantly increased the amount of crash submissions coming in: http://www.screencast.com/users/DEinspanjer/folders/Jing/media/c2461be5-3ccf-4170-9677-b6a51572ef51

The cluster was stopped and restarted with the two new config settings:

<!--
<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <value>67108864</value>
  <description>Block size for HLog files. To minimize potential data loss,
  the size should be (avg key length) * (avg value length) * flushlogentries.
  (stack@stumbleupon 2010-09-30: HLogs are rolling too fast again and this is
  the likely cause of increased IO stress on the cluster causing slowdowns.
  Changing it back to the "Default" of 64MB.)
  </description>
</property>
-->
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value>
  <description>
  Memstore will be flushed to disk if size of the memstore exceeds this
  number of bytes. Value is checked by a thread that runs every
  hbase.server.thread.wakefrequency.
  </description>
</property>
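For reference, both HBase properties take raw byte counts, and the two values quoted in the config work out to 64 MiB and 128 MiB respectively. A quick sketch to sanity-check the arithmetic (the variable names are mine, not part of the config):

```python
# HBase size-type settings are plain byte counts; convert to MiB to check them.
MIB = 1024 * 1024

hlog_blocksize = 67108864    # hbase.regionserver.hlog.blocksize
memstore_flush = 134217728   # hbase.hregion.memstore.flush.size

print(hlog_blocksize // MIB)  # 64  -> the "default" 64MB HLog block size
print(memstore_flush // MIB)  # 128 -> memstore flush threshold
```

So the restart restored the HLog block size to its 64 MB default and set the memstore flush threshold to 128 MB.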
Things appear to be running smoothly again. Closing for now.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
Component: Socorro → General
Product: Webtools → Socorro