Closed Bug 216321 Opened 22 years ago Closed 16 years ago

collectstats.pl randomly misses duplicates in trees

Categories

(Bugzilla :: Reporting/Charting, defect, P1)

Tracking

RESOLVED WORKSFORME

People

(Reporter: myk, Unassigned)

Details

collectstats.pl tries to count the total number of duplicates for each original bug by repeatedly aggregating every child bug's count of duplicates with its parent bug's count until the aggregated count reaches the original bug. After aggregating a child bug's count with its parent's, the script no longer looks at that child bug or any of its children (it flags the child as "done" by setting its count to zero), but it doesn't first ensure that the child has no uncounted children left.
Thus, depending on the order in which the bugs are processed (pseudo-random hash access), a child bug's count may be aggregated with its parent's count before the counts of the child's own children have been aggregated into it. Those counts will then never be aggregated (because their parent, the child bug, is already marked as "done"), and the total aggregate count for the original bug will miss them.
Line 223 starts the code that contains this flaw: http://lxr.mozilla.org/mozilla/source/webtools/bugzilla/collectstats.pl#213
Perhaps the solution is to aggregate only duplicates with no children in each iteration, thus working up from the bottom of the tree and ensuring that no child gets aggregated with its parent until its children have been accounted for.
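The bottom-up idea proposed above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Perl code from collectstats.pl. The function name `total_duplicates` and the shape of `dupes` (a mapping from each duplicate's bug id to the id of the bug it duplicates) are assumptions made for illustration.

```python
def total_duplicates(dupes):
    """Return {original bug id: number of duplicates anywhere below it}.

    `dupes` maps each duplicate bug id to the bug it duplicates.
    This shape is an assumption; the real script's hashes may differ.
    """
    remaining = dict(dupes)  # duplicates not yet folded into their parent
    totals = {}              # duplicates accumulated under each bug so far

    while remaining:
        # A leaf is a remaining duplicate that no remaining duplicate points at.
        parents = set(remaining.values())
        leaves = [bug for bug in remaining if bug not in parents]
        if not leaves:
            # Every remaining bug still has a child: the chain loops back on itself.
            raise ValueError("cycle in duplicate chain")

        for bug in leaves:
            parent = remaining.pop(bug)
            # Fold in the leaf itself plus everything already folded into it.
            totals[parent] = totals.get(parent, 0) + 1 + totals.pop(bug, 0)

    return totals
```

Because only leaves are folded in each pass, a bug is never merged into its parent while it still has unprocessed children, so the result does not depend on the iteration order of the mapping. The leaf test here rescans the mapping's values on every pass, which is simple but not cheap.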
Here's the problem, though. With the way the data structures are set up at the moment, finding out whether a bug is a "leaf" in the dupe tree you are collapsing is equivalent to asking "are there any members of the dupes hash whose value is X?", which means scanning the whole hash. This is a fairly expensive operation. Gerv
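One possible way around that cost, sketched here in Python rather than the script's Perl, is to build a reverse index once: count how many duplicates point at each bug, then decrement that count as children are folded in. The leaf test then becomes a single lookup. The names `total_duplicates_indexed` and `pending_children` are hypothetical, and `dupes` is again assumed to map a duplicate's id to the id of the bug it duplicates.

```python
from collections import Counter, deque


def total_duplicates_indexed(dupes):
    """Leaf-first aggregation using a precomputed child count per bug.

    `dupes` maps each duplicate bug id to the bug it duplicates
    (an assumed shape, not taken from the real script).
    """
    # One pass over the values: how many unfolded duplicates point at each bug.
    pending_children = Counter(dupes.values())
    totals = Counter()

    # Start with the duplicates that nothing points at: the leaves.
    ready = deque(bug for bug in dupes if pending_children[bug] == 0)
    folded = 0

    while ready:
        bug = ready.popleft()
        parent = dupes[bug]
        # Fold in the leaf itself plus everything already folded into it.
        totals[parent] += 1 + totals.pop(bug, 0)
        folded += 1

        pending_children[parent] -= 1
        # The parent becomes a leaf once its last child has been folded in,
        # provided it is itself a duplicate of something.
        if pending_children[parent] == 0 and parent in dupes:
            ready.append(parent)

    if folded != len(dupes):
        raise ValueError("cycle in duplicate chain")
    return dict(totals)
```

Each duplicate is visited once, so the whole aggregation takes time proportional to the size of the mapping, at the price of one extra counter per bug.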
myk: ping? Gerv
Priority: -- → P1
Myk: trying to understand this code bends my brain, but I've made a big tree of duplicates in my local install, which is three levels deep, and collectstats.pl gets it right every time. Can you point to an example on b.m.o. or landfill where this is wrong? Gerv
QA Contact: mattyt-bugzilla → default-qa
Assignee: gerv → charting
collectstats.pl no longer collects data about duplicates; duplicates.cgi now does this on the fly. Judging by the code there, this is no longer a problem.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME