Closed Bug 652829 Opened 15 years ago Closed 9 years ago

crash marked as duplicate of itself. need to identify leader crash.

Categories

(Socorro :: Webapp, task)

task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kairo, Unassigned)

References

()

Details

From IRC yesterday:
<marcia> https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-04-25%2011%3A00%3A00&signature=nsVariant%3A%3ASetAsInt64%28__int64%29&version=Firefox%3A6.0a1 is odd since all the reports are marked as dup
Looking through those, I found that https://crash-stats.mozilla.com/report/8007e3fc-fa7d-43ae-b476-064532110424 (which they're all duped to) seems to be marked as a duplicate of itself, which looks strange.
As a small UI nit, I saw that the links in the "Dup" column are slightly different from the links in the "Date" column, so the browser doesn't show an already visited report as visited because its link differs (the Date column goes to /report/index/ while the Dup column goes to /report/ instead). Should I file an extra bug on that?
Assignee: nobody → mpressman
Target Milestone: --- → 2.1
Rob, This is intentional. The selection of a "leader crash" in each duplicate set is arbitrary. As a result, the "leader" crash is *always* marked as a duplicate of itself. Otherwise, we wouldn't know that the leader was part of a duplicate set. However, this is clearly confusing to the users. I don't think we should change the data, but we might want to change the way we display it to make it less confusing. Suggestions?
Assignee: mpressman → nobody
Target Milestone: 2.1 → ---
Assignee: nobody → mpressman
Target Milestone: --- → 2.1
KaiRo, Oh, and the link thing is a separate bug.
(In reply to comment #3)
> This is intentional.
Ouch. That means that when we exclude dupes from results in the most logical way, i.e. everything that has a non-zero "duplicate of", we exclude not just the ones that are dupes of something else, but even the leaders. Sounds strange to me.
Target Milestone: 2.1 → 2.2
Robert, No, it's very simple. You include everything:
    WHERE ( duplicate_of IS NULL OR uuid = duplicate_of )
or even do the join that way:
    LEFT OUTER JOIN reports_duplicates ON ( reports.uuid = reports_duplicates.uuid AND reports_duplicates.uuid <> reports_duplicates.duplicate_of )
    ...
    WHERE reports_duplicates.uuid IS NULL
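To make the difference concrete, here is a minimal sketch that runs the join from the comment above against an in-memory SQLite database, next to the "exclude everything that has a duplicate_of" filter. The table and column names come from the comment; the schema, the sample uuids, and the helper names are assumptions for illustration only, not Socorro's actual schema or code.

```python
import sqlite3


def build_demo_db():
    """Build a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE reports (uuid TEXT PRIMARY KEY);
        CREATE TABLE reports_duplicates (
            uuid TEXT PRIMARY KEY,
            duplicate_of TEXT NOT NULL
        );
        -- 'a' is the leader (marked as a duplicate of itself),
        -- 'b' and 'c' are followers, 'd' is not in any duplicate set.
        INSERT INTO reports VALUES ('a'), ('b'), ('c'), ('d');
        INSERT INTO reports_duplicates VALUES ('a', 'a'), ('b', 'a'), ('c', 'a');
        """
    )
    return conn


def naive_non_duplicates(conn):
    """Drop every report that has any duplicate_of entry (loses the leader)."""
    rows = conn.execute(
        """
        SELECT reports.uuid
        FROM reports
        LEFT OUTER JOIN reports_duplicates
          ON ( reports.uuid = reports_duplicates.uuid )
        WHERE reports_duplicates.duplicate_of IS NULL
        ORDER BY reports.uuid
        """
    ).fetchall()
    return [row[0] for row in rows]


def leaders_and_singletons(conn):
    """Join only against follower rows, so leaders and non-dupes survive."""
    rows = conn.execute(
        """
        SELECT reports.uuid
        FROM reports
        LEFT OUTER JOIN reports_duplicates
          ON ( reports.uuid = reports_duplicates.uuid
               AND reports_duplicates.uuid <> reports_duplicates.duplicate_of )
        WHERE reports_duplicates.uuid IS NULL
        ORDER BY reports.uuid
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With this sample data the naive filter keeps only 'd', while the join from the comment keeps both the leader 'a' and the unrelated report 'd'.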
There are a few reasons why we want to have duplicate_of set for the leader in the data structure:
1) As mentioned before, the choice of leader is arbitrary, and if we ever decided to reprocess duplicates, some duplicate groups might not have the same leader the second time we generate the data.
2) Otherwise there's no way to know that the leader is part of a duplicate group except checking to see if it has "followers", which is both slower and more complex.
3) It makes the duplicate-detection job faster, because we can do one check to see if the report is already part of a duplicates group instead of two.
To repeat: my assessment is that this is a UI bug and not a database bug. Who works on the interface and can change how this is displayed?
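Reasons 2 and 3 can be illustrated with a small sketch. It models the duplicates table as a plain dict from report uuid to duplicate_of; the function names and the dict representation are assumptions made for illustration, not how the duplicate-detection job is actually written.

```python
def in_duplicate_group(duplicate_of, uuid):
    """Leader is self-marked: a single lookup answers the question."""
    return uuid in duplicate_of


def in_duplicate_group_without_self_mark(duplicate_of, uuid):
    """Leader is NOT self-marked: a second check for followers is needed."""
    if uuid in duplicate_of:
        # The report is a follower of some leader.
        return True
    # Otherwise scan for followers pointing at this report, which is the
    # slower and more complex check the comment above refers to.
    return any(leader == uuid for leader in duplicate_of.values())
```

In a database the second check would be an extra query on the duplicate_of column rather than a scan of dict values, but the shape of the problem is the same.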
I think this is a webapp only change. When we come to filter dupes, we'll need that leader info. (That already has a bug.) In the leader, how do you want it marked? "has dupes"?
(In reply to comment #8) > In the leader, how do you want it marked? "has dupes"? As long as it's different to the "followers", anything is OK for me.
Target Milestone: 2.2 → 2.3
Target Milestone: 2.3 → 2.4
Target Milestone: 2.4 → ---
Component: Socorro → General
Product: Webtools → Socorro
I may be overstating the case, but I am encountering many signatures where most of the crashes are marked as duplicates, so identifying the unique crashes in the list is quite problematic. OT question: to what extent are duplicates counted or not counted in topcrash numbers?
Severity: normal → major
Summary: crash marked as duplicate of itself → crash marked as duplicate of itself. need to identify leader crash.
Wayne, All possible duplicates are counted in topcrash numbers. We're still waiting for users to iterate over our duplicate-identification algo, help us perfect it, and then we can stop counting them.
Are we still seeing this? This appears to be stagnant; please reopen with more information if this was not accounted for.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
I am not aware anything has changed. Since when do we close stagnant bugs fixed?
I wasn't sure whether this was completed, since it was marked for a milestone some time ago. Based on your reply, I guess it isn't, so I have reopened the bug and will investigate where we are with this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I believe this is really a webapp bug (show the leader), moving and reassigning.
Assignee: mpressman → nobody
Component: General → Webapp
QA Contact: socorro → webapp
We'll want this until we actually delete dupes. We can do it in the webapp: the leader is the one with the earliest date.
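A rough sketch of that webapp-side approach is below: given the reports in one duplicate group, it picks the one with the earliest date as the leader and labels it differently from the followers. The function name, the input shape, and the tie-break on uuid are assumptions; the "has dupes" and "dup" labels are taken from earlier comments in this bug. Note that a leader chosen this way need not match the arbitrary leader stored in duplicate_of.

```python
from datetime import datetime


def label_duplicate_group(reports):
    """Label each report in one duplicate group.

    reports: list of (uuid, date) tuples for a single duplicate group.
    Returns a dict mapping uuid to "has dupes" (leader) or "dup" (follower).
    """
    if not reports:
        return {}
    # Earliest date wins; ties are broken by uuid so the result is stable.
    leader_uuid = min(reports, key=lambda r: (r[1], r[0]))[0]
    return {
        uuid: ("has dupes" if uuid == leader_uuid else "dup")
        for uuid, _date in reports
    }


# Example group with made-up uuids and dates.
example_group = [
    ("b", datetime(2011, 4, 24, 12, 0)),
    ("a", datetime(2011, 4, 24, 9, 30)),
    ("c", datetime(2011, 4, 25, 8, 15)),
]
example_labels = label_duplicate_group(example_group)
```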
We no longer have dupes in the report index or in SuperSearch which is what the new reports are based on.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 9 years ago
Resolution: --- → FIXED