Closed Bug 572174 Opened 14 years ago Closed 13 years ago

Every Camino 2.0.3 "null signature" crash since 11 Jun is missing the actual report

Categories

(Socorro :: General, task)

task
Not set
critical

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 607810

People

(Reporter: alqahira, Assigned: rhelmer)

Details

I was checking the Camino 2.0.3 topcrashers report today, and I noticed that the "null signature" crash was still ranked high and was increasing at what seemed like an abnormally high rate.

I had last looked at the null signature reports on 08 Jun, so I went to look at the crashes that had come in since then. (Previously, many of them were simply miscategorized unprocessed reports that processed just fine when accessed.)

The two reports from 10 Jun showed up, though not with the expected "processor could not figure out which thread crashed" message (instead: "WARNING: Json file missing Add-ons; expected string or buffer").

Every report after that, however, beginning with the 11 Jun crash, triggered the "wait for us to fetch it" UI, and, eventually, failed with "Oh Noes! This archived report could not be located."

I have no idea what's going on, but it's disconcerting that the null signature crashes continue to come in at a higher-than-normal rate for Camino, and, worse, the reports don't even actually exist!
Aravind, can you please take a look at this and confirm the data is missing?
Severity: normal → critical
Note also that this continues to be the case with all new "null signature" crashes that are coming in.

However (worse?), the new crashes I looked at this morning from the null signature report (in the URL field)

bp-d2b3f0c5-1a4d-42e8-b337-b93f82100617
bp-9d14291b-06e0-4b85-8a50-1530c2100615
bp-7cfbeea4-6182-4def-a215-7b4f72100617
bp-1434e4e0-5950-47c1-a1fe-87fdd2100616

are not showing up on the null signature report anymore this afternoon!  (The reports for those 4 crashes themselves are still missing/"Oh Noes! This archived report could not be located.")
While the null signature ranking is not abnormally high any more, this bug continues to exist.

bp-6f4b710e-4052-42f8-861c-a834f2100712
bp-2495a597-8a90-491c-9604-14d532100702
bp-fd7659f8-42cc-476d-a925-1c08e2100707
bp-e6804567-e8c0-4705-890c-8cb932100712

are some of the reports I looked at tonight. The table view[1] showed them as having user comments, so I decided to look at them, but they, like the other dozen or so reports I looked at, all came up as "Oh Noes! This archived report could not be located."

[1] https://crash-stats.mozilla.com/report/list?version=Camino%3A2.0.3&build_id=&query_search=signature&query_type=exact&query=&date=2010-07-13%2019%3A00%3A00&range_value=2&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=&signature=&missing_sig=NULL&page=1
Ted, is this the same malformed minidump issue?
Probably, yes, but the fact that you can't actually get to the reports anymore is new.
Target Milestone: --- → 1.7.7
Xavier, can you see if these crashes are in HBase?
Assignee: nobody → xstevens
Laura,

I didn't check all of the above, but the crash dumps are in HBase for every one I checked so far. I also did some stats over the past week on Camino 2.0.3.

20110118	Camino	2.0.3	submission	10
20110119	Camino	2.0.3	submission	9
20110120	Camino	2.0.3	submission	9
20110121	Camino	2.0.3	submission	9
20110122	Camino	2.0.3	submission	6
20110123	Camino	2.0.3	submission	11
20110124	Camino	2.0.3	submission	8
Six months later, there are not likely to be many Camino 2.0.3 crashes any more.

However, this problem still exists with 2.0.6 "null signature" crashes (though there aren't nearly as many as there were for 2.0.3 when I filed this):

https://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2011-01-25%2007%3A00%3A00&signature=&missing_sig=NULL&version=Camino%3A2.0.6

As in comment 3, bp-882a8731-c85f-4b1f-9b16-a438d2110118 and bp-6756fb88-38b7-450e-ad18-8dd2e2110120 show up in the table view as having comments.
Assignee: xstevens → rhelmer
Looking at http://crash-stats.mozilla.com/report/pending/882a8731-c85f-4b1f-9b16-a438d2110118 it looks like the system tries to do priority processing and fails:

"""
Queue Info

ID
    882a8731-c85f-4b1f-9b16-a438d2110118
Time Queued
    2011-01-26 14:50:58.305783
Time Started
    2011-01-26 14:51:03.442994
Message
    INFO: This record is a replacement for a previous record with the same uuid; WARNING: Json file missing Add-ons; /data/socorro/stackwalk/bin/stackwalk.sh returned no header lines for reportid: 211828910; No thread was identified as the cause of the crash; No signature could be created because we do not know which thread crashed; /data/socorro/stackwalk/bin/stackwalk.sh returned no frame lines for reportid: 211828910; /data/socorro/stackwalk/bin/stackwalk.sh failed with return code 1 when processing dump 882a8731-c85f-4b1f-9b16-a438d2110118 
"""
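
The Message field above packs several processor notes into one string. A minimal Python sketch for splitting it into individual notes, assuming the notes are always separated by semicolons and never contain one themselves (the helper name `split_processor_notes` is hypothetical, not part of Socorro):

```python
def split_processor_notes(message):
    """Split a processor Message string into a list of individual notes.

    Assumption: notes are joined with ';' and do not contain ';' internally.
    """
    return [part.strip() for part in message.split(";") if part.strip()]


if __name__ == "__main__":
    sample = (
        "WARNING: Json file missing Add-ons; "
        "No thread was identified as the cause of the crash"
    )
    for note in split_processor_notes(sample):
        print(note)
```

Split this way, the message for this report comes down to one root failure (stackwalk.sh returning code 1) plus the notes that follow from it.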

If I try to fetch raw JSON or dump for this OOID, that does work:

https://crash-stats.mozilla.com/rawdumps/882a8731-c85f-4b1f-9b16-a438d2110118.json
https://crash-stats.mozilla.com/rawdumps/882a8731-c85f-4b1f-9b16-a438d2110118.dump
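
For checking several crash IDs in one pass, the two raw-dump URLs above can be generated from the ID. A minimal Python sketch, assuming the URL pattern shown above holds for every ID; the helper name `raw_dump_urls` is hypothetical, and nothing here contacts the server:

```python
RAW_DUMP_BASE = "https://crash-stats.mozilla.com/rawdumps"


def raw_dump_urls(crash_id):
    """Return the raw JSON and raw dump URLs for a crash ID.

    Accepts the ID with or without the leading 'bp-' prefix.
    """
    ooid = crash_id[3:] if crash_id.startswith("bp-") else crash_id
    return {ext: f"{RAW_DUMP_BASE}/{ooid}.{ext}" for ext in ("json", "dump")}


if __name__ == "__main__":
    for url in raw_dump_urls("bp-882a8731-c85f-4b1f-9b16-a438d2110118").values():
        print(url)
```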

After a little more investigation and discussion in IRC, we can see that an entry was made in the reports table, but the processed_data:json column in HBase was never written. This means that this OOID will appear in searches and reports like TCBS, but any attempt to retrieve the processed report via the middleware will fail (and cause the job to be submitted for priority processing, which will also fail).
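
To illustrate the mismatch, here is a minimal Python sketch that lists OOIDs which have a reports-table row but no processed JSON. It is only an illustration: the in-memory collections stand in for the reports table and the HBase column, and `find_orphaned_reports` is a hypothetical name, not an existing Socorro function.

```python
def find_orphaned_reports(report_ids, processed_json):
    """Return OOIDs listed in the reports table that have no processed JSON.

    report_ids: iterable of OOIDs (stand-in for the reports table).
    processed_json: dict mapping OOID -> processed JSON (stand-in for the
    processed_data:json column in HBase).
    """
    return [ooid for ooid in report_ids if ooid not in processed_json]


if __name__ == "__main__":
    reports = [
        "882a8731-c85f-4b1f-9b16-a438d2110118",
        "6756fb88-38b7-450e-ad18-8dd2e2110120",
    ]
    stored = {}  # nothing was written for either report
    print(find_orphaned_reports(reports, stored))
```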

(In reply to comment #5)
> Probably, yes, but the fact that you can't actually get to the reports anymore
> is new.

What did we use to do in this situation and/or what would be preferred?

deinspanjer suggests:

1. set processing state to Y even if it failed
2. store a new column in the crash_reports HBase table called processed_data:notes with the notes so we can get them if we need.
3. Make priority processing not blindly reprocess if it previously failed
4. Make the UI aware of points 1-3
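
Points 1-3 above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not Socorro's actual code: `CrashRecord`, `record_processing_result` and `should_priority_reprocess` are hypothetical names, and the `notes` field only stands in for the proposed processed_data:notes column.

```python
from dataclasses import dataclass, field


@dataclass
class CrashRecord:
    ooid: str
    processed: str = "N"   # point 1: set to "Y" once processing was attempted
    success: bool = False  # whether that attempt produced a usable report
    notes: list = field(default_factory=list)  # point 2: processor notes


def record_processing_result(record, success, notes):
    """Mark the record as processed even on failure and keep the notes."""
    record.processed = "Y"
    record.success = success
    record.notes = list(notes)
    return record


def should_priority_reprocess(record):
    """Point 3: only queue records that have never been processed."""
    return record.processed != "Y"


if __name__ == "__main__":
    rec = CrashRecord("882a8731-c85f-4b1f-9b16-a438d2110118")
    record_processing_result(rec, False, ["stackwalk.sh failed with return code 1"])
    print(should_priority_reprocess(rec), rec.notes)
```

With the notes stored on the record, the UI (point 4) could show why processing failed instead of waiting on a reprocessing job that will fail again.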
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Verifying as a duplicate of bug 607810 per comment 10 and a conversation with stephend.
Status: RESOLVED → VERIFIED
Component: Socorro → General
Product: Webtools → Socorro