Closed
Bug 679065
Opened 14 years ago
Closed 14 years ago
Some Fennec 6.0b6 topcrash signatures return an empty report list
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
2.3.4
People
(Reporter: kairo, Assigned: rhelmer)
References
()
Details
While most signature links from https://crash-stats.mozilla.com/topcrasher/byversion/Fennec/6.0b6/7 return a report list as they should, we found two that don't:
https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-08-14&signature=mozilla%3A%3Alayout%3A%3ARenderFrameParent%3A%3AGetLayerManager&version=Fennec%3A6.0b6
https://crash-stats.mozilla.com/report/list?range_value=7&range_unit=days&date=2011-08-14&signature=nsHtml5StreamParser%3A%3AFinalizeSniffing&version=Fennec%3A6.0b6
Either they wrongly are listed in topcrashers, or not finding reports is wrong.
Comment 1•14 years ago
|
||
If this is a code error, I'd ship a 2.2.1 for it, seems pretty important.
Severity: normal → major
Target Milestone: --- → 2.3
| Assignee | ||
Comment 2•14 years ago
|
||
This looks like an error in new tcbs aggregation. I don't see any reports for Fennec 6.0b6 with this signature for the selected time range:
breakpad=> select build_id from product_versions as pv join product_version_builds as pvb on (pv.product_version_id = pvb.product_version_id) and product_name = 'Fennec' and version_string = '6.0b6';
build_id
----------------
20110810164700
20110810181057
20110810183719
20110810192454
20110810184133
(5 rows)
breakpad=> SELECT product, version, build FROM reports WHERE product = 'Fennec' AND version = '6.0'
AND signature = 'mozilla::layout::RenderFrameParent::GetLayerManager' AND reports.date_processed BETWEEN CAST('2011-08-14' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('7 days' AS INTERVAL) AND CAST('2011-08-14' AS TIMESTAMP WITHOUT TIME ZONE) ORDER BY reports.date_processed DESC;
product | version | build
---------+---------+----------------
Fennec | 6.0 | 20110804021247
Fennec | 6.0 | 20110804021247
(2 rows)
--------------------
Yet it appears in TCBS:
--------------------
breakpad=> select product_name, version_string, signature, report_date from tcbs join signatures as s on s.signature_id = tcbs.signature_id join product_versions as pv on tcbs.product_version_id = pv.product_version_id where signature = 'mozilla::layout::RenderFrameParent::GetLayerManager' and product_name = 'Fennec' and version_string = '6.0b6';
product_name | version_string | signature | report_date
--------------+----------------+-----------------------------------------------------+-------------
Fennec | 6.0b6 | mozilla::layout::RenderFrameParent::GetLayerManager | 2011-08-14
(1 row)
Assignee: nobody → josh
| Reporter | ||
Comment 3•14 years ago
|
||
Umm, but the CSV has one:
robert@robert:/mnt/mozilla/projects/socorro> grep "mozilla::layout::RenderFrameParent::GetLayerManager" 2011-08-14/20110814-pub-crashdata.csv
mozilla::layout::RenderFrameParent::GetLayerManager URL (removed) http://crash-stats.mozilla.com/report/index/eea72d6a-1866-40c1-b572-64d572110814 201108140834 201108140833 \N Fennec 6.0 20110810164700 6 Linux 2.6.32.9 ?arch?arm | None | 0 0xe0 603680 \N 4 \N ../../dist/include/nsIDocument.h checked [blank] \N SIGSEGV \N motorola DROID2 | verizon/droid2_vzw/cdma_droid2/droid2:2.2/VZW/23.20:user/ota-rel-keys,release-keys 104916 \N
That's the 20110810164700 buildID from what I'm seeing...
| Reporter | ||
Comment 4•14 years ago
|
||
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Alayout%3A%3ARenderFrameParent%3A%3AGetLayerManager&version=Fennec%3A6.0b6 works (without a date/time range). Are we running into a problem because TCBS is using UTC and /reports/list is assuming Pacific?
| Assignee | ||
Comment 5•14 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #3)
> Umm, but the CSV has one:
>
> robert@robert:/mnt/mozilla/projects/socorro> grep
> "mozilla::layout::RenderFrameParent::GetLayerManager"
> 2011-08-14/20110814-pub-crashdata.csv
> mozilla::layout::RenderFrameParent::GetLayerManager URL (removed)
> http://crash-stats.mozilla.com/report/index/eea72d6a-1866-40c1-b572-
> 64d572110814 201108140834 201108140833 \N Fennec 6.0
> 20110810164700 6 Linux 2.6.32.9 ?arch?arm | None | 0 0xe0
> 603680 \N 4 \N ../../dist/include/nsIDocument.h
> checked [blank] \N SIGSEGV \N motorola DROID2 |
> verizon/droid2_vzw/cdma_droid2/droid2:2.2/VZW/23.20:user/ota-rel-keys,
> release-keys 104916 \N
>
> That's the 20110810164700 buildID from what I'm seeing...
Hmm good point... so I notice the cutoff for that reports query is '2011-08-14' (the right-hand-side of the "BETWEEN" clause), so that's not going to show up.
So the problem here isn't that TCBS aggregation is wrong, it's that the reports query only shows up to today Pacific.
Note that the way the date decided is not new code, this is starting to look like a pre-existing bug (possibly exacerbated by new tcbs being UTC? not sure).
| Reporter | ||
Comment 6•14 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #5)
> So the problem here isn't that TCBS aggregation is wrong, it's that the
> reports query only shows up to today Pacific.
It shows everything before *yesterday* (midnight) Pacific (2011-08-14 is yesterday, not today, and the crash happened yesterday morning). ;-)
| Assignee | ||
Comment 7•14 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #6)
> (In reply to Robert Helmer [:rhelmer] from comment #5)
> > So the problem here isn't that TCBS aggregation is wrong, it's that the
> > reports query only shows up to today Pacific.
>
> It shows everything before *yesterday* (midnight) Pacific (2011-08-14 is
> yesterday, not today, and the crash happened yesterday morning). ;-)
Right, thanks :)
Comment 8•14 years ago
|
||
The time zone issues are expected. The only real cure for them is a gradual movement away from pacific timestamps and towards UTC time and/or timestamptz. There's so much code to be changed though, that we need to do this in stages.
| Reporter | ||
Comment 9•14 years ago
|
||
(In reply to Josh Berkus from comment #8)
> The time zone issues are expected.
1) This is not just timezone, this is almost a whole day that it's wrong.
2) In this case, it should not be hard to fix if TCBS knows the exact time its report ends and pass it in the link (possibly with some TZ conversion for now).
Updated•14 years ago
|
Flags: in-testsuite?
Flags: in-litmus?
Comment 10•14 years ago
|
||
This might or might not get fixed by 2.2.4. We'll see.
| Reporter | ||
Comment 11•14 years ago
|
||
Just to clarify, this is probably UI or mware. In the example in here, we have list?range_value=7&range_unit=days&date=2011-08-14& and that seems to show "7 days up to the timestamp 2011-08-14" when the report it is linked from shows "7 days up to including 2011-08-14" which equals to "7 days up to timestamp 2011-08-15 (00:00:00)".
This should be easier to trigger with our one-day topcrasher reports, e.g. https://crash-stats.mozilla.com/topcrasher/byversion/Fennec/7.0b5/1/browser
Comment 12•14 years ago
|
||
After hashing this out with KaiRo, this is apparently an middleware issue where the date limits used in the different screens aren't phrased the same. As such, reassigning to bsavage and moving to 2.3.
Assignee: josh → bsavage
Updated•14 years ago
|
Target Milestone: 2.3 → 2.3.1
| Assignee | ||
Updated•14 years ago
|
Target Milestone: 2.3.1 → 2.3.2
Comment 13•14 years ago
|
||
Which type of report do you want to have? Do you want to have it "up to and including <day>" or "up to but excluding <day>"?
| Reporter | ||
Comment 14•14 years ago
|
||
(In reply to Brandon Savage [:brandon] from comment #13)
> Which type of report do you want to have? Do you want to have it "up to and
> including <day>" or "up to but excluding <day>"?
Whatever matches the topcrash report that link is on!
I don't care what way you resolve it, either change the link in the topcrash report to go to the next day or change the list to be inclusive of the queried day.
Updated•14 years ago
|
Target Milestone: 2.3.2 → 2.3.3
Comment 15•14 years ago
|
||
It appears that the reports table is reporting different data than the tcbs table. The report/list page uses the reports table to fetch it's data; should it be using something else? If not, then this appears to be both a code problem and a data bug, since tcbs should report data consistent with reports.
Comment 16•14 years ago
|
||
This is still a problem as of 2.3.2.
Updated•14 years ago
|
Assignee: bsavage → josh
Comment 17•14 years ago
|
||
Brandon,
Please give me the query being passed by the middleware.
Comment 18•14 years ago
|
||
The following query is being used by the middleware in socorro/services/tcbs/modern.py:
WITH tcbs_r as (
SELECT signature_id,
signature,
product_name,
version_string,
sum(report_count) as report_count,
sum(win_count) as win_count,
sum(lin_count) as lin_count,
sum(mac_count) as mac_count,
sum(hang_count) as hang_count,
plugin_count(process_type,report_count) as plugin_count,
content_count(process_type,report_count) as content_count
FROM tcbs JOIN signatures USING (signature_id)
JOIN product_versions USING (product_version_id)
WHERE product_name = '%s'
AND version_string = '%s'
AND report_date BETWEEN '%s' AND '%s'
%s
GROUP BY signature_id, signature, product_name, version_string
),
tcbs_window AS (
SELECT tcbs_r.*,
sum(report_count) over () as total_crashes,
dense_rank() over (order by report_count desc) as ranking
FROM
tcbs_r
)
SELECT signature,
report_count,
win_count,
lin_count,
mac_count,
hang_count,
plugin_count,
content_count,
report_count / total_crashes::float
as percent_of_total
FROM tcbs_window
ORDER BY report_count DESC
LIMIT %s
Comment 19•14 years ago
|
||
Brandon:
I'm going to need to capture the actual query passed when you click through. Can we set up an isolated test of this on devdb later today? Thanks.
Comment 21•14 years ago
|
||
Brandon and I debugged this. First of all, the query above is not the one which is being passed to search reports. That's a different query.
The issue is actually a date issue, but not specifically one for the time zone. The problem is that when TCBS shows "1 day of 11-06" what it means is "11-06 00:00:00 UTC TO 11-07 00:00:00 UTC". But what gets passed in the reports search is:
BETWEEN CAST('2011-11-06' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 days' AS INTERVAL) AND CAST('2011-11-06' AS TIMESTAMP WITHOUT TIME ZONE)
In other words, '11-06 00:00:00 PST' TO '11-06 00:00:00 PST'. A period with only 7 hours of overlap with the TCBS day.
I don't know if it's the TCBS screen which is displaying the wrong dates or the reports screen which is searching on the wrong dates. But the two need to be consistent.
Comment 22•14 years ago
|
||
Entire queries per Brandon's request:
WITH tcbs_r as (
SELECT signature_id,
signature,
product_name,
version_string,
sum(report_count) as report_count,
sum(win_count) as win_count,
sum(lin_count) as lin_count,
sum(mac_count) as mac_count,
sum(hang_count) as hang_count,
plugin_count(process_type,report_count) as plugin_count,
content_count(process_type,report_count) as content_count
FROM tcbs JOIN signatures USING (signature_id)
JOIN product_versions USING (product_version_id)
WHERE product_name = 'Fennec'
AND version_string = '8.0b3'
AND report_date BETWEEN '2011-11-05' AND '2011-11-06'
AND process_type = 'browser'
GROUP BY signature_id, signature, product_name, version_string
),
tcbs_window AS (
SELECT tcbs_r.*,
sum(report_count) over () as total_crashes,
dense_rank() over (order by report_count desc) as ranking
FROM
tcbs_r
)
SELECT signature,
report_count,
win_count,
lin_count,
mac_count,
hang_count,
plugin_count,
content_count,
report_count / total_crashes::float
as percent_of_total
FROM tcbs_window
ORDER BY report_count DESC
LIMIT 300
",,,,,,,,,""
SELECT
reports.date_processed,
reports.uptime,
reports.user_comments,
reports.uuid,
reports.product,
reports.version,
reports.build,
reports.signature,
reports.url,
reports.os_name,
reports.os_version,
reports.cpu_name,
reports.cpu_info,
reports.address,
reports.reason,
reports.last_crash,
reports.install_age,
reports.hangid,
reports.process_type,
( reports.client_crash_date - ( reports.install_age * INTERVAL '1 second' ) ) as install_time,
reports_duplicates.duplicate_of
FROM reports
JOIN product_versions ON reports.version = product_versions.release_version AND reports.product = product_versions.product_name
LEFT OUTER JOIN reports_duplicates ON reports.uuid = reports_duplicates.uuid
WHERE reports.signature = 'RNG_FileUpdate'
AND ((reports.product = 'Fennec'
AND product_versions.version_string = '8.0b3'
AND reports.version = product_versions.release_version
AND reports.release_channel ILIKE 'beta'
AND product_versions.build_type = 'Beta'
AND EXISTS ( SELECT 1
FROM product_version_builds
WHERE product_versions.product_version_id = product_version_builds.product_version_id
AND build_numeric(reports.build) = product_version_builds.build_id ))) AND reports.date_processed
BETWEEN CAST('2011-11-06' AS TIMESTAMP WITHOUT TIME ZONE) - CAST('1 days' AS INTERVAL) AND CAST('2011-11-06' AS TIMESTAMP WITHOUT TIME ZONE)
ORDER BY reports.date_processed DESC LIMIT 500 OFFSET 0;
Comment 23•14 years ago
|
||
so as you can see, the issue comes from the change from oldtcbs to newtcbs. In oldtcbs, the "window_end" column was a timestamp, so "2011-11-05 to 2011-11-06" meant "all of 11/05, stopping at midnight". Whereas in newtcbs, the "report_date" column is a DATE, which means that "2011-11-05 to 2011-11-06" means "all of 11/05 and 11/06".
Updated•14 years ago
|
Target Milestone: 2.3.3 → 2.3.4
Updated•14 years ago
|
Assignee: josh → chris.lonnen
| Assignee | ||
Comment 24•14 years ago
|
||
I just randomly was able to reproduce this while working on a different bug (generating a minimal dev DB), so I am in a good position to test and fix (it's blocking me anyway).
Assignee: chris.lonnen → rhelmer
Status: NEW → ASSIGNED
| Assignee | ||
Comment 25•14 years ago
|
||
lonnen r?: https://github.com/mozilla/socorro/pull/161
After consulting with Josh and testing locally, here is a fix to treat /report/list queries as Pacific (since we can assume with newtcbs that incoming links will be UTC).
All of this code will go away and use the common search code when bug 678642 is fixed, so not going super deep into this. We should actually audit the search code to make sure it's doing this correctly (it will be an issue for Elastic Search queries as well, since those are Pacific while newtcbs is UTC).
The only downside I can think of is that now pre-rapid-release (old TCBS) links will have the problem described in this bug, but as that's deprecated I'm not sure it's worth the larger change we'd need to make to this code to make it determine this and choose the right method.
Comment 26•14 years ago
|
||
If it's one or the other, fennec has our priority. Tested locally, r+.
Comment 27•14 years ago
|
||
Commit pushed to https://github.com/mozilla/socorro
https://github.com/mozilla/socorro/commit/217f8fa514045c8e7149540bb5fbc09e14832cce
Merge pull request #161 from rhelmer/bug679065-fix-report-list-query
bug 679065 - incoming links will be UTC dates for newtcbs, be explicit a...
| Assignee | ||
Updated•14 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•14 years ago
|
Component: Socorro → General
Product: Webtools → Socorro
You need to log in
before you can comment on or make changes to this bug.
Description
•