Closed Bug 631051 Opened 14 years ago Closed 14 years ago

Should be able to search based on Crash Reason

Categories

(Socorro :: General, task)

task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: jrmuizel, Assigned: rhelmer)

References

Details

Attachments

(2 files)

I was looking at a weird crash where the reason was EXCEPTION_INVALID_INSTRUCTION. I've seen weird crashes that crashed with EXCEPTION_INVALID_INSTRUCTION before and it would be nice to be able to search for them.
Ryan: can you add this while fixing the other search bug? If not, it can wait.
By "other search bug" I mean bug 613379. What, bugzilla can't read minds?
It will require a different fix than 613379, but I can toss this bug into the mix for 1.7.7.
Assignee: nobody → ryan
Status: NEW → ASSIGNED
Target Milestone: --- → 1.7.7
Attached patch Patch for 631051Splinter Review
Updating the query parameters and advanced search page to include "reason" as a searchable item. Also fixed some spacing discrepancies as I ran into them.
Attachment #509516 - Flags: review?(rhelmer)
Attachment #509516 - Flags: review?(rhelmer) → review?(laura)
Attachment #509516 - Flags: review?(laura) → review+
Thanks Laura. Committing for 1.7.7. Will be available for testing on stage in the Advanced Search page. == Sending webapp-php/application/controllers/query.php Sending webapp-php/application/libraries/MY_SearchReportHelper.php Sending webapp-php/application/models/common.php Sending webapp-php/application/views/common/query_form.php Sending webapp-php/application/views/query/query.php Transmitting file data ..... Committed revision 2916.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I should point out that 'reason' is not an indexed column within Postgres. If you want this to work with any sort of speed, we'll probably have to rectify that. And remember that inexact matches ('like' or 'contains') will always be slow because they will cause a scan of an entire table. I'm not sure how the new hardware/postgres will perform.
Reopening this bug because it cannot go live in 1.7.7 without performance testing. Likely we will need to add a full-text-index on the 'reason' column. What do strings in that column look like?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Some examples: EXCEPTION_BREAKPOINT EXCEPTION_ACCESS_VIOLATION_EXEC EXCEPTION_ACCESS_VIOLATION_READ EXCEPTION_ACCESS_VIOLATION_WRITE EXCEPTION_INVALID_HANDLE 0xc000070a / 0x00000000 e.g. it's mostly a standard set of strings, with some that look like addresses. jrmuizel may provide more insight.
The addresses are just "unknown" values in the fields. The fields are numeric, and minidump_stackwalk interprets known values and prints their names, but we can wind up with oddball values in there. I can't imagine anyone searching for substrings here, it's likely to be "show me all the EXCEPTION_INVALID_INSTRUCTION crashes" as mentioned in comment 0.
OK, a regular btree index should work here. Is there a strong reason to make it case-insensitive?
(In reply to comment #10) > OK, a regular btree index should work here. Is there a strong reason to make > it case-insensitive? I think a case-sensitive regular btree index would be perfectly fine. I can't think of any use cases for anything else.
There aren't many crash reasons, so a bitmap index might be more efficient. Anyway, it can certainly be case-sensitive.
Here are sample queries that can be used for testing. The following 2 queries are run every time a search is performed using the Contains option for the crash Reason. 1 week is the default query time range for the quick search. 12 weeks is the maximum query time range for logged in users. == SELECT COUNT(DISTINCT reports.signature) as count FROM reports WHERE (reports.product = 'Firefox') AND (reports.os_name = 'Windows NT') AND reports.reason LIKE '%EXCEPTION_ACCESS_VIOLATION_READ%' AND reports.date_processed BETWEEN CAST('02/11/2011 14:29:24' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 weeks' AS INTERVAL) AND CAST('02/11/2011 14:29:24' AS TIMESTAMP WITHOUT TIME ZONE) == SELECT reports.signature, count(reports.id), count(CASE WHEN (reports.os_name = 'Windows NT') THEN 1 END) AS is_windows, count(CASE WHEN (reports.os_name = 'Mac OS X') THEN 1 END) AS is_mac, count(CASE WHEN (reports.os_name = 'Linux') THEN 1 END) AS is_linux, count(CASE WHEN (reports.os_name = 'Solaris') THEN 1 END) AS is_solaris, SUM (CASE WHEN hangid IS NULL THEN 0 ELSE 1 END) AS numhang, SUM (CASE WHEN process_type IS NULL THEN 0 ELSE 1 END) AS numplugin FROM reports WHERE (reports.product = 'Firefox') AND (reports.os_name = 'Windows NT') AND reports.reason LIKE '%EXCEPTION_ACCESS_VIOLATION_READ%' AND reports.date_processed BETWEEN CAST('02/11/2011 14:29:24' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 weeks' AS INTERVAL) AND CAST('02/11/2011 14:29:24' AS TIMESTAMP WITHOUT TIME ZONE) GROUP BY reports.signature ORDER BY count(reports.id) DESC LIMIT 300 OFFSET 100
Ryan, Oh. Those are unanchored text searches; I can't index those. Also, if you want to do case_insensitive, use ILIKE instead of LIKE. At some point in the future we'll convert most of the TEXT fields to CITEXT, and that won't be an issue, but that day is not today. If the crash reason is an exact match most of the time, though, why are you doing it unanchored? Anyway, the same applies to this as the previous bugs: if the default is for the user to search on ( product, os_name, time limit ) then I don't care about other criteria for performance. Oh, and FYI: PostgreSQL doesn't do separate bitmap indexes. Instead, btree indexes are converted to bitmap indexes in memory if needed.
Oh, if the crash is an exact match most of the time, then I'll remove the option of using a LIKE statement and only support an exact match. I've updated the code accordingly. Josh, let me know if you need anything else from me to button up this ticket. == Sending application/models/common.php Sending application/views/common/query_form.php Transmitting file data .. Committed revision 2929.
If you're replacing "LIKE" with "=", then let's leave this bug open so that I know that I need to add an index to the "reason" column. Assigigning the bug to myself for that reason.
Assignee: ryan → josh
I think we should only allow using exact matches, as mentioned earlier in the bug. There's no compelling reason for searching for partial matches.
Ok, here's the code to add the new index: DO $f$ DECLARE part_name TEXT; part_week TIMESTAMP := '2010-06-07'; BEGIN set maintenance_work_mem = '1GB'; WHILE part_week < now() LOOP part_name := 'reports_' || to_char(part_week,'YYYYMMDD'); EXECUTE 'CREATE INDEX ' || part_name || '_reason ON ' || part_name || '(reason);'; part_week := part_week + INTERVAL '7 days'; END LOOP; END;$f$; note that this will take quite a while (hours) and do quite a bit of I/O. So it needs to be run after-hours.
Still TODO on this bug: Add the index to the cron job which creates new reports partitions.
Rob, Please add the following to the ReportPartitionsTemplate in schema.py: CREATE INDEX %(partitionName)s_reason ON %(partitionName)s (reason);
Assignee: josh → rhelmer
Committed revision 2956.
Status: REOPENED → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Flags: in-litmus?
Hardware: x86 → All
Verified. Searched on: (comment 0 and comment 8) EXCEPTION_INVALID_INSTRUCTION EXCEPTION_BREAKPOINT EXCEPTION_ACCESS_VIOLATION_EXEC EXCEPTION_ACCESS_VIOLATION_READ EXCEPTION_ACCESS_VIOLATION_WRITE EXCEPTION_INVALID_HANDLE
Status: RESOLVED → VERIFIED
Component: Socorro → General
Product: Webtools → Socorro
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: