Bug 599195
Opened 14 years ago
Closed 14 years ago
Correlation reports broken the last couple of days...
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jst, Assigned: aravind)
Details
I first noticed this at least a few days ago, and I don't know how long it had been broken before then, but we no longer get correlation reports (extensions, modules, core count, etc.) for crashes on crash-stats.mozilla.com.
Reporter
Comment 1•14 years ago
This makes it hard to investigate the reasons for many of our crashes, so I'm raising the severity of this bug.
Severity: normal → blocker
Comment 2•14 years ago
Can you provide URLs or product/versions? This feature depends on the files in http://people.mozilla.com/crash_analysis/20100923/

Example:
This Fx 4.0b6 crash has correlations: http://crash-stats.mozilla.com/report/index/01951b76-7a6e-4353-a446-1ecc42100923
This Fx 3.6.9 crash report does not: http://crash-stats.mozilla.com/report/index/86a3b4d7-0a6a-4d75-9904-42a7d2100917

This may be a simple matter of getting the Fx versions you need generated in p.m.c/crash_analysis (IT bug, CC aravind).

A more robust system is under development (backend Bug#554373).
Comment 3•14 years ago
Does/should this cover the empty correlations found at http://crash-stats.mozilla.com/topcrasher/byversion/Firefox/3.6.9, or should that be a separate bug?
Comment 4•14 years ago
(In reply to comment #3)
This type of issue has been reported before, but it's not really a bug. Firefox 3.6.9 correlation reports weren't generated today (nor for a while); see http://people.mozilla.com/crash_analysis/20101012/

The real bug is that we haven't replaced the crash-analysis hack with the Hadoop backend. I think that is in the works.
Reporter
Comment 5•14 years ago
Still seems broken (3 weeks later). Correlation reports are *extremely* valuable; not having them for long periods of time is not acceptable.
Comment 6•14 years ago
(In reply to comment #5) If this is for 3.6.9, then we just have to ask IT (or whoever runs the dbaron reports on people) to add that version. I'm not sure of the bug# or schedule for adding Bug#554373 (Hadoop proper fix) to the frontend.
Reporter
Comment 7•14 years ago
What I've been seeing is more 4.0 beta stuff than anything else, but that doesn't mean it's a problem only for 4.0 beta; that's just what I've run into many, many times recently. To name a few, have a look at:
http://crash-stats.mozilla.com/report/index/e0b5a37b-7e76-41b4-8a49-020e02100927
http://crash-stats.mozilla.com/report/index/8280a8ff-067d-45c2-9c7d-ee6792100922
Comment 8•14 years ago
Yes, I don't see 4.0b7pre in http://people.mozilla.com/crash_analysis/20101013/. I'll ping IT to find out who can fix this.
Comment 9•14 years ago
@aravind: please add 4.0b7pre and 3.6.9 to http://people.mozilla.com/crash_analysis/20101013/
Assignee: nobody → server-ops
Component: Socorro → Server Operations
Product: Webtools → mozilla.org
QA Contact: socorro → mrz
Version: Trunk → other
Updated•14 years ago
Assignee: server-ops → aravind
Assignee
Comment 10•14 years ago
The problem here is that the hbase connection used to pull out the crashes is flaky. Here is the log from the Python script:

    DEBUG Ooid: "2f53ad4d-74d8-43df-8c2c-08aa82101013"
    DEBUG MainThread - retry_wrapper: get_processed_json_as_string, try number 1
    DEBUG MainThread - retry_wrapper: handled exception, timed out
    DEBUG MainThread - retry_wrapper: about to retry connection
    DEBUG make_connection, timeout = 5000
    DEBUG connection successful
    DEBUG MainThread - retry_wrapper: get_processed_json_as_string, try number 2
    DEBUG MainThread - retry_wrapper: handled exception, timed out
    Traceback (most recent call last):
      File "/data/breakpad/processor/socorro/hbase/hbaseClient.py", line 889, in ?
        connection.export_jsonz_tarball_for_ooids(*args)
      File "/data/breakpad/processor/socorro/hbase/hbaseClient.py", line 493, in export_jsonz_tarball_for_ooids
        json = self.get_processed_json_as_string(ooid)
      File "/data/breakpad/processor/socorro/hbase/hbaseClient.py", line 143, in f
        result = fn(self, *args, **kwargs)
      File "/data/breakpad/processor/socorro/hbase/hbaseClient.py", line 401, in get_processed_json_as_string
        listOfRawRows = self.client.getRowWithColumns('crash_reports',row_id,['processed_data:json'])
      File "/data/breakpad/processor/thirdparty/hbase/hbase.py", line 1116, in getRowWithColumns
        return self.recv_getRowWithColumns()
      File "/data/breakpad/processor/thirdparty/hbase/hbase.py", line 1129, in recv_getRowWithColumns
        (fname, mtype, rseqid) = self._iprot.readMessageBegin()
      File "/data/breakpad/processor/thirdparty/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
        sz = self.readI32()
      File "/data/breakpad/processor/thirdparty/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
        buff = self.trans.readAll(4)
      File "/data/breakpad/processor/thirdparty/thrift/transport/TTransport.py", line 58, in readAll
        chunk = self.read(sz-have)
      File "/data/breakpad/processor/thirdparty/thrift/transport/TTransport.py", line 155, in read
        self.__rbuf = StringIO(self.__trans.read(max(sz, self.DEFAULT_BUFFER)))
      File "/data/breakpad/processor/thirdparty/thrift/transport/TSocket.py", line 92, in read
        buff = self.handle.recv(sz)
    __main__.FatalException: the connection is not viable. retries fail:

I increased the hbase timeout to 60s.

Also, one thing to note here is that in the past choffman had asked me to generate reports for the two most active Firefox beta versions. Here is the count from the last 24 hours:

     version  | counts
    ----------+--------
     4.0b6    |  25632
     4.0b4    |   1857
     4.0b8pre |   1834
     4.0b5    |   1647
     4.0b1    |   1220
     4.0b2    |   1185
     4.0b3    |    869
     4.0b7pre |    776
     3.1b3    |    706
     3.6b4    |    542
    (10 rows)

Did we want to change the script to instead generate reports for specific versions?
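
For anyone reading the log above: it shows a call timing out, the connection being rebuilt, and a second timeout being treated as fatal. A minimal Python sketch of that retry pattern follows. It is not the actual Socorro code: `call_with_retries`, its parameters, and the use of `socket.timeout` as the failure signal are assumptions for illustration, and the 60000 default assumes the logged `timeout = 5000` is in milliseconds.

```python
import socket


class FatalException(Exception):
    """Raised once every attempt has timed out (name borrowed from the log)."""


def call_with_retries(fn, make_connection, max_tries=2, timeout_ms=60000):
    """Run fn(connection); on a timeout, reconnect and try again.

    fn and make_connection are hypothetical callables supplied by the caller.
    """
    connection = make_connection(timeout_ms)
    for attempt in range(1, max_tries + 1):
        try:
            return fn(connection)
        except socket.timeout:
            if attempt == max_tries:
                # Out of attempts: give up instead of looping forever.
                raise FatalException("the connection is not viable. retries fail")
            # Rebuild the connection before the next attempt.
            connection = make_connection(timeout_ms)
```

A longer timeout only lowers the chance of reaching the final `raise`; the retry limit still bounds how long a single crash fetch can stall the nightly run.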
Comment 11•14 years ago
When is this process scheduled to run?
Assignee
Comment 12•14 years ago
(In reply to comment #11)
> when is this process scheduled to run?

5:00 AM.
Assignee
Comment 13•14 years ago
Increasing the timeout seems to have helped. I also added a manual override to include 4.0b7pre and 3.6.9 in the generated reports.
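
The selection rule discussed in this bug (report on the most active versions, plus a manual override list) could be sketched roughly as below. This is an illustration only, not the real report script: `versions_to_report` and its parameters are hypothetical names, and the counts are a subset of the 24-hour figures quoted in comment 10.

```python
def versions_to_report(counts, top_n=2, overrides=()):
    """Return the top_n versions by crash count, then any manual overrides."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    selected = ranked[:top_n]
    for version in overrides:
        # Overrides are appended even when they have few or no counted crashes.
        if version not in selected:
            selected.append(version)
    return selected


# Subset of the counts from comment 10.
counts = {
    "4.0b6": 25632,
    "4.0b4": 1857,
    "4.0b8pre": 1834,
    "4.0b5": 1647,
    "4.0b7pre": 776,
}
print(versions_to_report(counts, overrides=["4.0b7pre", "3.6.9"]))
# prints ['4.0b6', '4.0b4', '4.0b7pre', '3.6.9']
```

Keeping the override list separate from the ranking means a low-volume version such as 4.0b7pre stays covered no matter how the daily counts shift.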
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard