Closed
Bug 721169
Opened 13 years ago
Closed 13 years ago
Inconsistent numbers across TCBS, search and report/list
Categories
(Socorro Graveyard :: Middleware, defect)
Socorro Graveyard
Middleware
Tracking
(Not tracked)
RESOLVED
FIXED
10
People
(Reporter: adrian, Assigned: adrian)
References
Details
We still have non matching numbers between TCBS and search / report/list. Those last two are running the same code base which mean they are consistent, but TCBS shows different numbers (often less crashes) for identical periods.
This bug is to investigate and fix those inconsistencies, wherever they are.
| Assignee | ||
Updated•13 years ago
|
Assignee: nobody → adrian
| Assignee | ||
Comment 1•13 years ago
|
||
Copy / pasting an email from Josh:
> SELECT count(*) FROM reports
>> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND
>> '2012-01-16T00:00:00+00:00';
>>
>> # Returns 332371 results
>>
>>
>> #..............................................................................
>> SELECT count(*) FROM reports_clean
>> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND
>> '2012-01-16T00:00:00+00:00';
>>
>> # Returns 244635 results
This difference is expected. Reports contains crashes which
reports_clean does not, mainly:
-- FF/TB/Fennec versions before 5.0
-- garbaged crashes where no product/version can be discerned
Try comparing counts for a specific version of FF.
>> #..............................................................................
>> SELECT sum(report_count) FROM tcbs
>> WHERE report_date BETWEEN '2012-01-15T00:00:00+00:00' AND
>> '2012-01-16T00:00:00+00:00';
>>
>> # Returns 218710 results
This is more interesting; I'm not sure why TCBS would have a lower count
than reports_clean. When we did 2.4 I tested specific product/versions
and results were correct, but I didn't compare grand totals.
If you have time, it would be worth finding specific
product/version/signature combos where the counts don't match between
reports_clean and TCBS as test cases.
| Assignee | ||
Comment 2•13 years ago
|
||
So, I tested with a specific product and version, and a smaller date range.
> SELECT count(*) FROM reports
> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-15T01:00:00+00:00'
> AND product=E'Firefox' AND version=E'12.0a1';
>
> # Returns 38 results
>
> #..............................................................................
> SELECT count(*) FROM reports_clean
> JOIN product_versions pv USING (product_version_id)
> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-15T01:00:00+00:00'
> AND product_name=E'Firefox' AND version_string=E'12.0a1';
>
> # Returns 34 results
And then I tried to display the difference between those two queries. They have the exact same distinct signatures but different counts in some cases:
> SELECT signature, count(*) FROM reports
> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-15T01:00:00+00:00'
> AND product=E'Firefox' AND version=E'12.0a1'
> GROUP BY signature
> EXCEPT
> SELECT signature, count(*) FROM reports_clean
> JOIN signatures USING (signature_id)
> JOIN product_versions pv USING (product_version_id)
> WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-15T01:00:00+00:00'
> AND product_name=E'Firefox' AND version_string=E'12.0a1'
> GROUP BY signature;
>
> # Returns 4 results
signature | count
-----------------------------------------------------------------------------------------------------------+-------
kernelbase.dll@0xcacd | 2
JSFunction::trace(JSTracer*) | 2
mozalloc_abort(char const* const) | NS_DebugBreak_P | nsIFrame::GetOffsetToCrossDoc(nsIFrame const*, int) | 2
JSObject::getGeneric(JSContext*, __int64, JS::Value*) | 2
(4 rows)
Here are the four concerned UUIDs:
bf504428-6e50-42ff-8298-3e3942120115
56f092a9-6c87-4e39-ad20-d20f82120115
520b5c9a-6589-453a-9189-dc4232120115
ef545c0a-0d46-4e6e-9384-af5ad2120115
Trying to hit them on prod gives me a "We couldn't find the OOID you're after." error. Does this mean anything? Josh, do you have any idea of what's going on?
| Assignee | ||
Comment 3•13 years ago
|
||
@Josh: You asked me to find out why there was a difference in the total number of crashes in reports_clean and in tcbs. Here is what I found:
SELECT product_name, version_string
FROM reports_clean
JOIN product_versions pv USING (product_version_id)
WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-16T00:00:00+00:00'
GROUP BY product_name, version_string
EXCEPT
SELECT product_name, version_string
FROM tcbs
JOIN product_versions pv USING (product_version_id)
WHERE report_date BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-16T00:00:00+00:00'
GROUP BY product_name, version_string
ORDER BY product_name, version_string;
# Returns 79 results
product_name | version_string
--------------+----------------
Fennec | 10.0a1
Fennec | 5.0
Fennec | 5.0b2
Fennec | 5.0b7
Fennec | 6.0
Fennec | 6.0.2
Fennec | 6.0b1
Fennec | 6.0b2
Fennec | 6.0b3
Fennec | 6.0b4
Fennec | 6.0b5
Fennec | 6.0b6
Fennec | 7.0
Fennec | 7.0b1
Fennec | 7.0b4
Fennec | 7.0b5
Fennec | 8.0b1
Fennec | 8.0b2
Fennec | 8.0b3
Fennec | 8.0b6
Fennec | 9.0b1
Firefox | 10.0a1
Firefox | 10.0a2
Firefox | 11.0a1
Firefox | 5.0
Firefox | 5.0.1
Firefox | 5.0b1
Firefox | 5.0b2
Firefox | 5.0b3
Firefox | 5.0b4
Firefox | 5.0b5
Firefox | 5.0b6
Firefox | 5.0b7
Firefox | 5.0(beta)
Firefox | 6.0
Firefox | 6.0.1
Firefox | 6.0.2
Firefox | 6.0b1
Firefox | 6.0b2
Firefox | 6.0b3
Firefox | 6.0b4
Firefox | 6.0b5
Firefox | 6.0(beta)
Firefox | 7.0
Firefox | 7.0.1(beta)
Firefox | 7.0b1
Firefox | 7.0b2
Firefox | 7.0b3
Firefox | 7.0b4
Firefox | 7.0b5
Firefox | 7.0b6
Firefox | 7.0(beta)
Firefox | 8.0a2
Firefox | 8.0b1
Firefox | 8.0b2
Firefox | 8.0b3
Firefox | 8.0b4
Firefox | 8.0b5
Firefox | 8.0b6
Firefox | 8.0(beta)
Firefox | 9.0a1
Firefox | 9.0a2
Firefox | 9.0b1
SeaMonkey | 2.3
SeaMonkey | 2.3.1
SeaMonkey | 2.3.3
SeaMonkey | 2.3b1
SeaMonkey | 2.3b2
SeaMonkey | 2.3b3
SeaMonkey | 2.4b2
SeaMonkey | 2.5b3
SeaMonkey | 2.5b4
Thunderbird | 10.0a1
Thunderbird | 6.0
Thunderbird | 6.0.1
Thunderbird | 6.0.2
Thunderbird | 7.0b3
Thunderbird | 8.0b1
Thunderbird | 8.0b5
(79 rows)
I suppose those product/version pairs are not concerned by TCBS?
Comment 4•13 years ago
|
||
Adrian,
Regarding the 4 UUIDs you can't hit in prod: see if you can get lars or rhelmer to help you as to why this might be. I suspect that whatever prevents you from viewing them is also prevening me from adding them to reports_clean.
As for the query regarding the differences between reports_clean and TCBS, what you've demonstrated is that differences in counts are not at all related to product or version. So something else is making up the difference.
Thanks for doing this!
| Assignee | ||
Comment 5•13 years ago
|
||
Hey Josh,
I may have missed something, but it seems to me that I demonstrated that TCBS is missing some products / versions from reports_clean. Maybe those queries will make it clearer:
SELECT count(*)
FROM (SELECT product_name, version_string
FROM reports_clean
JOIN product_versions pv USING (product_version_id)
WHERE date_processed BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-16T00:00:00+00:00'
GROUP BY product_name, version_string) products_versions;
# Returns 139
SELECT count(*)
FROM (SELECT product_name, version_string
FROM tcbs
JOIN product_versions pv USING (product_version_id)
WHERE report_date BETWEEN '2012-01-15T00:00:00+00:00' AND '2012-01-16T00:00:00+00:00'
GROUP BY product_name, version_string) products_versions;
# Returns 60
There is a difference of 79 pairs of product:version in TCBS compared to reports_clean. Or am I totally wrong and missing something?
| Assignee | ||
Comment 6•13 years ago
|
||
rhelmer: could you please help me finding what is wrong with those four reports? Here are the UUIDs:
bf504428-6e50-42ff-8298-3e3942120115
56f092a9-6c87-4e39-ad20-d20f82120115
520b5c9a-6589-453a-9189-dc4232120115
ef545c0a-0d46-4e6e-9384-af5ad2120115
They cannot be accessed through the UI (Crash Not Found), and Josh suppose what causes this also causes the bug we're investigating.
I have tried to find something different from those and regular reports but couldn't find anything. I also don't know why those reports are not included into reports_clean.
Comment 7•13 years ago
|
||
(In reply to Adrian Gaudebert [:adrian] from comment #6)
> rhelmer: could you please help me finding what is wrong with those four
> reports?
Sure, tracking them down now.
Comment 8•13 years ago
|
||
(In reply to Adrian Gaudebert [:adrian] from comment #6)
> rhelmer: could you please help me finding what is wrong with those four
> reports? Here are the UUIDs:
>
> bf504428-6e50-42ff-8298-3e3942120115
> 56f092a9-6c87-4e39-ad20-d20f82120115
> 520b5c9a-6589-453a-9189-dc4232120115
> ef545c0a-0d46-4e6e-9384-af5ad2120115
>
> They cannot be accessed through the UI (Crash Not Found), and Josh suppose
> what causes this also causes the bug we're investigating.
Which environment are you looking at? I don't see a record of these UUIDs in production/staging hbase (raw json) or postgres (reports table).
Comment 9•13 years ago
|
||
> There is a difference of 79 pairs of product:version in TCBS compared to
> reports_clean. Or am I totally wrong and missing something?
Oh, right! Sorry. reports_clean contains crashes from *all* versions. TCBS contains crashes only from "visible" versions, i.e. those between release_date and sunset_date. So, for example, a crash related to FF 7.0 would show up in reports_clean but not in TCBS.
So the real test here is to focus on specific product/versions which are currently "visible", not other total counts.
| Assignee | ||
Comment 10•13 years ago
|
||
I think I found one part of the problem: those 4 crash reports where processed on 2012-01-17 when they where received on 2012-01-15. So, when update_reports_clean ran, those four crashes where _not_ processed yet, and thus where not added to reports_clean.
However, if it true in this particular case, it is not in others. I have another list of UUIDs of crash reports missing in reports_clean:
59efac4e-3ee6-41f3-8dd9-744a92120110
b183de8f-b497-4ac7-a697-3a28f2120110
05913b2e-04d4-4ceb-a0f1-7acb02120110
781d92d0-95c9-4cb4-ae1e-9d5e02120110
428206ad-8b27-4a68-a44f-dae2c2120110
5f0577cc-e97c-4573-b88a-115b92120110
0c6e7a7d-3f9f-4c20-8e2e-f7bb92120110
e30ba1b1-a083-45f4-9345-60af02120110
42401c57-c34d-411b-95d2-121762120110
fbd8418b-3ef9-40a1-911a-83e542120110
a0e6acce-344e-4340-a18a-ab6b42120110
fdb0ee10-639c-4de0-a308-199c22120110
b80ed9f0-bc70-4297-a215-f43952120110
5539a4b5-8b42-4138-948e-7064b2120110
c75961c4-841a-46f3-98a5-184152120110
55051391-645f-466a-be28-190002120110
80e63828-3835-4d8d-9ae2-3e7702120110
135a47ff-321f-44b6-a1bd-adb002120110
f3790ef6-6bcb-485a-9ff9-720cf2120110
I really don't get it, those crashes seem to be alright.
| Assignee | ||
Comment 11•13 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #8)
> Which environment are you looking at? I don't see a record of these UUIDs in
> production/staging hbase (raw json) or postgres (reports table).
I'm running my tests in my VM, with the dump from Jan16.
(In reply to [:jberkus] Josh Berkus from comment #9)
> So the real test here is to focus on specific product/versions which are
> currently "visible", not other total counts.
Ok, I'll create a script to test all versions and see if we have any inconsistency, but I doubt I'll find any difference.
Comment 12•13 years ago
|
||
(In reply to Adrian Gaudebert [:adrian] from comment #10)
> I think I found one part of the problem: those 4 crash reports where
> processed on 2012-01-17 when they where received on 2012-01-15. So, when
> update_reports_clean ran, those four crashes where _not_ processed yet, and
> thus where not added to reports_clean.
>
> However, if it true in this particular case, it is not in others. I have
> another list of UUIDs of crash reports missing in reports_clean:
>
> 59efac4e-3ee6-41f3-8dd9-744a92120110
> b183de8f-b497-4ac7-a697-3a28f2120110
> 05913b2e-04d4-4ceb-a0f1-7acb02120110
> 781d92d0-95c9-4cb4-ae1e-9d5e02120110
> 428206ad-8b27-4a68-a44f-dae2c2120110
> 5f0577cc-e97c-4573-b88a-115b92120110
> 0c6e7a7d-3f9f-4c20-8e2e-f7bb92120110
> e30ba1b1-a083-45f4-9345-60af02120110
> 42401c57-c34d-411b-95d2-121762120110
> fbd8418b-3ef9-40a1-911a-83e542120110
> a0e6acce-344e-4340-a18a-ab6b42120110
> fdb0ee10-639c-4de0-a308-199c22120110
> b80ed9f0-bc70-4297-a215-f43952120110
> 5539a4b5-8b42-4138-948e-7064b2120110
> c75961c4-841a-46f3-98a5-184152120110
> 55051391-645f-466a-be28-190002120110
> 80e63828-3835-4d8d-9ae2-3e7702120110
> 135a47ff-321f-44b6-a1bd-adb002120110
> f3790ef6-6bcb-485a-9ff9-720cf2120110
>
> I really don't get it, those crashes seem to be alright.
Those crashes all have a Release Channel of "unknown" (at least, the 5 I checked all did) which prevents us from normalizing them or using then in reports. When we checked this earlier, such crashes constituted only 0.2% of all crashes sumbitted, and were a result of crashes which happened too soon after startup to collect useful data. However, the crashes you linked appear to have something different wrong, causing release channel not to be recorded.
I'll test later to see if we've had a jump in release_channel=unknown, or if it's affiliated with a particular signature.
Keep up the good work!
Comment 13•13 years ago
|
||
Daniel, Xavier: can you possibly help us analyze the reports that have no release_channel reported to work out what is common/special about them?
Comment 14•13 years ago
|
||
actually, let's see if Saptarshi might be up for the task. He might be able to use RIPHE to figure it out quickly.
Comment 15•13 years ago
|
||
Adding bug 704383 to this bug. Reports with unknown channel were originally fixed, and are apparently broken again. Please continue discussion of this particular data issue on that bug.
Depends on: 704383
Comment 16•13 years ago
|
||
Daniel,
Not necessary; we figured out the cause of that issue and it's been addressed.
| Assignee | ||
Comment 17•13 years ago
|
||
It seems the biggest part of the problem cannot be solved whatsoever. Moving to 2.4.3 to run further verifications.
Target Milestone: 2.4.2 → 2.4.3
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 2.4.3 → 2.4.4
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 2.4.4 → 2.5
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 2.5 → 2.5.1
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 2.5.1 → 2.5.3
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 3 → 4
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 5 → 6
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 6 → 7
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 7 → 9
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 9 → 10
| Assignee | ||
Updated•13 years ago
|
Target Milestone: 10 → Future
| Assignee | ||
Comment 19•13 years ago
|
||
This has been around for too long, and we never heard any complaint since we fixed the first issue. I assume this is fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
| Assignee | ||
Updated•12 years ago
|
Target Milestone: Future → 10
Updated•9 years ago
|
Product: Socorro → Socorro Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•