Closed Bug 1301763 Opened 9 years ago Closed 8 years ago

Why are Thunderbird 49.0b1 crashes being reported in Socorro as 49.0b0?

Categories

(Socorro :: Backend, task)

x86
Windows 7
task
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: wsmwk, Assigned: adrian)

Details

Attachments

(1 file)

I don't know if this is a Socorro issue or a build issue. I noticed a few days ago that 49.0b1 crashes are showing in Socorro as 49.0b0, and 49.0b1 is not available in the Socorro UI.
49.0b1 is near zero at https://crash-stats.mozilla.com/daily?p=Thunderbird
49.0b0 is showing at normal beta crash rates (I can't query the version number directly): https://crash-stats.mozilla.com/search/?release_channel=beta&product=Thunderbird&_sort=-date&_facets=version&_columns=date&_columns=signature&_columns=version&_columns=build_id#facet-version
For example, my report bp-12e8af44-3473-4a03-81e1-e928d2160909 is showing as 49.0b0.
Severity: critical → normal
I'm not sure where to look, what questions to ask, or who to ask. So I'll probably be pinging people for ideas to make progress. Are we likely to hit this same issue for beta 50?
Flags: needinfo?(rail)
Thanks rail! FWIW I confirm that thunderbird-49.0b1 is marked "shipped" in ship-it.
(In reply to Rail Aliiev [:rail] from comment #2)
> https://github.com/mozilla/socorro/blob/991171dcf54fbe40b247ede4a72e4b77b7c64a29/socorro/processor/mozilla_transform_rules.py#L938-L942
> looks responsible for this. I'd ask Socorro folks.
adrian, help?
Flags: needinfo?(adrian)
Things get marked with version b0 when we do not have any data about that beta version. I suppose this could be a problem with the ftpscraper? I do not know how Thunderbird version data is pulled into Socorro these days...
Flags: needinfo?(adrian)
The last time this happened, peterbe sorted things out in bug 1257651.
Component: Build Config → General
Flags: needinfo?(peterbe)
Product: Thunderbird → Socorro
So, it happens when there's "not enough product version information" based on the parameters product, version, release_channel and build_id.
The ftpscraper is dumb. It just pulls down what's in archive.mozilla.org. Sadly, there's a lot of black magic that takes the data from archive.mozilla.org and populates the product_versions table.
Can you take a look at http://archive.mozilla.org/pub/thunderbird/ and try to figure out what's different in the .json files for 49 that wasn't a problem before? E.g., do 46, 47 and 48 have a valid build_id or release_channel where 49 doesn't?
Here is what our ftpscraper + our black magic postgres functions have managed to collect about the recent product versions:
breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, is_rapid_beta
           from product_versions
           where product_name = 'Thunderbird'
           order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds | is_rapid_beta
---------------+-----------------+----------------+-------------+------------+------------+------------+---------------
 52.0          | 52.0a1          | 52.0a1         |             | 2016-09-20 | nightly    | t          | f
 51.0          | 51.0a2          | 51.0a2         |             | 2016-09-20 | aurora     | t          | f
 51.0          | 51.0a1          | 51.0a1         |             | 2016-08-02 | nightly    | t          | f
 50.0          | 50.0a2          | 50.0a2         |             | 2016-08-02 | aurora     | t          | f
 50.0          | 50.0a1          | 50.0a1         |             | 2016-06-07 | nightly    | t          | f
 49.0          | 49.0            | 49.0b1         | 1           | 2016-08-05 | beta       | f          | f
 49.0          | 49.0a2          | 49.0a2         |             | 2016-06-07 | aurora     | t          | f
 49.0          | 49.0a1          | 49.0a1         |             | 2016-04-26 | nightly    | t          | f
 48.0          | 48.0            | 48.0b1         | 1           | 2016-07-12 | beta       | f          | f
 48.0          | 48.0a2          | 48.0a2         |             | 2016-04-26 | aurora     | t          | f
 48.0          | 48.0a1          | 48.0a1         |             | 2016-03-08 | nightly    | t          | f
 47.0          | 47.0            | 47.0b2         | 2           | 2016-06-17 | beta       | f          | f
 47.0          | 47.0            | 47.0b1         | 1           | 2016-06-04 | beta       | f          | f
 47.0          | 47.0a2          | 47.0a2         |             | 2016-03-08 | aurora     | t          | f
 47.0          | 47.0a1          | 47.0a1         |             | 2016-01-26 | nightly    | t          | f
 46.0          | 46.0a2          | 46.0a2         |             | 2016-01-26 | aurora     | t          | f
 46.0          | 46.0a1          | 46.0a1         |             | 2015-12-15 | nightly    | t          | f
 45.4          | 45.4.0          | 45.4.0         |             | 2016-09-28 | release    | f          | f
 45.4          | 45.4.0          | 45.4.0b99      | 99          | 2016-09-28 | beta       | f          | f
 45.3          | 45.3.0          | 45.3.0         |             | 2016-08-25 | release    | f          | f
 45.3          | 45.3.0          | 45.3.0b99      | 99          | 2016-08-25 | beta       | f          | f
 45.2          | 45.2            | 45.2           |             | 2016-06-28 | release    | f          | f
 45.2          | 45.2.0          | 45.2.0         |             | 2016-06-30 | release    | f          | f
 45.2          | 45.2.0          | 45.2.0b99      | 99          | 2016-06-30 | beta       | f          | f
 45.2          | 45.2            | 45.2b1         | 1           | 2016-05-19 | beta       | f          | f
 45.1          | 45.1.1          | 45.1.1         |             | 2016-05-26 | release    | f          | f
 45.1          | 45.1.0          | 45.1.0         |             | 2016-05-05 | release    | f          | f
 45.1          | 45.1.0          | 45.1.0b99      | 99          | 2016-05-05 | beta       | f          | f
 45.1          | 45.1            | 45.1b1         | 1           | 2016-04-28 | beta       | f          | f
 45.0          | 45.0            | 45.0           |             | 2016-04-07 | release    | f          | f
 45.0          | 45.0            | 45.0b99        | 99          | 2016-04-07 | beta       | f          | f
 45.0          | 45.0            | 45.0b4         | 4           | 2016-04-04 | beta       | f          | f
 45.0          | 45.0            | 45.0b3         | 3           | 2016-03-22 | beta       | f          | f
 45.0          | 45.0            | 45.0b2         | 2           | 2016-02-18 | beta       | f          | f
 45.0          | 45.0            | 45.0b1         | 1           | 2016-02-02 | beta       | f          | f
 45.0          | 45.0a2          | 45.0a2         |             | 2015-12-15 | aurora     | t          | f
 45.0          | 45.0a1          | 45.0a1         |             | 2015-10-30 | nightly    | t          | f
 44.0          | 44.0            | 44.0b1         | 1           | 2016-01-12 | beta       | f          | f
 44.0          | 44.0a2          | 44.0a2         |             | 2015-10-30 | aurora     | t          | f
 44.0          | 44.0a1          | 44.0a1         |             | 2015-09-22 | nightly    | t          | f
 43.0          | 43.0            | 43.0b1         | 1           | 2015-12-07 | beta       | f          | f
 43.0          | 43.0a2          | 43.0a2         |             | 2015-09-23 | aurora     | t          | f
 43.0          | 43.0a1          | 43.0a1         |             | 2015-08-11 | nightly    | t          | f
 42.0          | 42.0            | 42.0b2         | 2           | 2015-10-12 | beta       | f          | f
 42.0          | 42.0            | 42.0b1         | 1           | 2015-09-23 | beta       | f          | f
 42.0          | 42.0a2          | 42.0a2         |             | 2015-08-11 | aurora     | t          | f
 42.0          | 42.0a1          | 42.0a1         |             | 2015-06-30 | nightly    | t          | f
 41.0          | 41.0            | 41.0b2         | 2           | 2015-09-16 | beta       | f          | f
 41.0          | 41.0            | 41.0b1         | 1           | 2015-08-27 | beta       | f          | f
 41.0          | 41.0a2          | 41.0a2         |             | 2015-06-30 | aurora     | t          | f
(50 rows)
Anything there you think stands out?
Flags: needinfo?(peterbe)
I'm not entirely sure what I'm doing, but here's a comparison of the query the transform rule runs, for 47, 48 and 49.
breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform
           from product_versions pv
           left join product_version_builds b on (b.product_version_id = pv.product_version_id)
           where product_name = 'Thunderbird' and version_string = '49.0b1'
           order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds | build_id       | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 49.0          | 49.0            | 49.0b1         | 1           | 2016-08-05 | beta       | f          | 20160805071503 | linux
 49.0          | 49.0            | 49.0b1         | 1           | 2016-08-05 | beta       | f          | 20160805071503 | mac
(2 rows)
breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform
           from product_versions pv
           left join product_version_builds b on (b.product_version_id = pv.product_version_id)
           where product_name = 'Thunderbird' and version_string = '48.0b1'
           order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds | build_id       | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 48.0          | 48.0            | 48.0b1         | 1           | 2016-07-12 | beta       | f          | 20160712184236 | linux
 48.0          | 48.0            | 48.0b1         | 1           | 2016-07-12 | beta       | f          | 20160712184236 | mac
 48.0          | 48.0            | 48.0b1         | 1           | 2016-07-12 | beta       | f          | 20160712184236 | win
(3 rows)
breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform
           from product_versions pv
           left join product_version_builds b on (b.product_version_id = pv.product_version_id)
           where product_name = 'Thunderbird' and version_string = '47.0b1'
           order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds | build_id       | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 47.0          | 47.0            | 47.0b1         | 1           | 2016-06-04 | beta       | f          | 20160604054735 | linux
 47.0          | 47.0            | 47.0b1         | 1           | 2016-06-04 | beta       | f          | 20160604054735 | mac
 47.0          | 47.0            | 47.0b1         | 1           | 2016-06-04 | beta       | f          | 20160604054735 | win
(3 rows)
Seems fine (except the lack of a win build for 49.0b1). Perhaps reprocessing will set the version on the crash differently. Have you tried that?
Flags: needinfo?(adrian)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #10)
> Created attachment 8799284 [details]
> crash-stats verisons bonkers.png
>
> Now, neither 49.0b1 nor 49.0b0 are offered as a choice in
> https://crash-stats.mozilla.com/crashes-per-day/?p=Thunderbird
That might very well be unrelated and a matter of how those drop-downs aren't doing you any favors. The version drop-down choices are based on Firefox and it seems the list doesn't reload after you have selected Thunderbird. Another battle for another day, but please file a bug.
> New, is 50.0b1 was built on Friday. All crashes are being shown as 50.0b0
> [1] like
> https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-507262161010
According to the raw crash, that crash is version "50.0": https://crash-stats.mozilla.com/rawdumps/a320137a-9e8a-42f4-a911-507262161010.json
But the "pretty version" is turned into "50.0b0": https://crash-stats.mozilla.com/api/UnredactedCrash/?crash_id=a320137a-9e8a-42f4-a911-507262161010
Is that not correct?
> [1]
> https://crash-stats.mozilla.com/search/?version=50.0b0&version=50.0b1&product=Thunderbird&date=%3E%3D2016-10-03T01%3A56%3A00.000Z&date=%3C2016-10-10T01%3A56%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
According to https://crash-stats.mozilla.com/search/?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-10T14%3A29%3A00.000Z&_sort=-date&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version (all Thunderbird crashes in the last month), it seems that no crashes have come in under 50.* at all. That doesn't make any sense, because I'm pretty sure https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-507262161010 should be included. It happened 4 hours before the upper date bound on my search.
Adrian, can you explain that?
When a version gets a -b0 version number, it is because we could not find data in our database for that crash's (product, version, release channel, build id) tuple.
For bp-a320137a-9e8a-42f4-a911-507262161010, that means that when it was processed, our database had no data for (Thunderbird, 50.0, beta, 20161007134619).
@Peter, that would simply be because we only show the top 50 results in facets. I checked for version 50.0b and it has only 151 results, which is fewer than the 50th version has (388). If you want to verify that, you can use the API, like this: https://crash-stats.mozilla.com/api/SuperSearch/?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-10T14%3A29%3A00.000Z&_facets_size=200&_results_number=0&_facets=version
Wayne, for which versions should we run a reprocess? I don't think we have data for 50.0 yet, but maybe Peter can confirm that?
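To illustrate why a version with 151 hits can be missing from a 50-entry facet list, here is a small Python sketch. `top_facets` and `facet_url` are hypothetical helper names, not part of Socorro; the base URL and the `_facets`, `_facets_size` and `_results_number` parameters are taken from the API link in the comment above.

```python
from urllib.parse import urlencode

# Base URL as given in the comment above.
SUPERSEARCH = "https://crash-stats.mozilla.com/api/SuperSearch/"


def top_facets(counts, size=50):
    """Keep only the `size` most frequent terms, as a facet list would.

    `counts` maps a version string to its number of crash reports.
    Anything ranked below position `size` is simply not shown.
    """
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:size]


def facet_url(product, start, end, facets_size=200):
    """Build a SuperSearch URL that asks for a larger version facet.

    `_results_number=0` skips the individual crash reports so that only
    the facet counts are requested.
    """
    params = [
        ("product", product),
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_facets_size", str(facets_size)),
        ("_results_number", "0"),
        ("_facets", "version"),
    ]
    return SUPERSEARCH + "?" + urlencode(params)


if __name__ == "__main__":
    print(facet_url("Thunderbird",
                    "2016-09-10T14:29:00.000Z",
                    "2016-10-10T14:29:00.000Z"))
```

Raising `_facets_size` above the default only widens the list that comes back; it does not change how any crash was processed.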
Flags: needinfo?(adrian)
(In reply to Peter Bengtsson [:peterbe] from comment #11)
> ...
> > New, is 50.0b1 was built on Friday. All crashes are being shown as 50.0b0
> > [1] like
> > https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-507262161010
>
> According to the raw crash that crash is version "50.0"
> https://crash-stats.mozilla.com/rawdumps/a320137a-9e8a-42f4-a911-507262161010.json
>
> But the "pretty version" is turned into "50.0b0"
> https://crash-stats.mozilla.com/api/UnredactedCrash/?crash_id=a320137a-9e8a-42f4-a911-507262161010
>
> Is that not correct?
We did not build a 50.0b0, so I don't see how it could be 50.0b0. Build specs are at https://public.etherpad-mozilla.org/p/thunderbird-release-50.0b1
(In reply to Adrian Gaudebert [:adrian] from comment #12)
> When a version gets a -b0 version number, it is because we could not find
> data in our database for that crash's (product, version, release channel,
> build id) tuple.
>
> For bp-a320137a-9e8a-42f4-a911-507262161010, that means that when it was
> processed, our database had no data for (Thunderbird, 50.0, beta,
> 20161007134619).
>
> @Peter, that would simply be because we only show the top 50 results in
> facets. I checked for version 50.0b and it has only 151 results, which is
> less than the 50th version has (388). If you want to prove that, you can use
> the API, like this:
I don't understand what the number of crashes has to do with which versions are offered in the UI. But 21 hours after your comment we are at 412 crashes for 50.0b0, and in https://crash-stats.mozilla.com/search/ 50.0b0 is offered neither in the static version field nor for a "new line" set to "version has terms".
> https://crash-stats.mozilla.com/api/SuperSearch/?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-10T14%3A29%3A00.000Z&_facets_size=200&_results_number=0&_facets=version
>
> Wayne, for which versions should we run a reprocess? I don't think we have
> data for 50.0 yet, but Peter can maybe confirm that?
Version 49.0 beta and 50.0 beta.
Since we moved to rapid betas a while ago, products in their beta version stopped being aware of their actual version number. They only know about their "major version". So for a Thunderbird 50.0b2, the version number it knows is 50.0. That is what gets sent to crash-stats as part of the crash report.
Then, in crash-stats, we have a rule to rewrite these version numbers to what you would expect, so in that earlier case, turn 50.0 into 50.0b2. To do that, we use our database, in which we store a bunch of associations like I described earlier:
(product, version, release channel, build id) -> actual version number
For example:
("Thunderbird", "50.0", "beta", "20161007134619") -> "50.0b2"
When we cannot find a match in our database for that tuple (product, version, release channel, build id), we assume that we do not have data about that release yet, and thus give the crash report a "fake" version number ending with "b0". We use "0" because we know that it is not a valid version number for an actual beta, so that's a good way of seeing that something went wrong. It also makes it quite easy for us to spot all of those crashes and send them for reprocessing once we have the missing data.
So, it is expected that "50.0b0" is not in any drop-down. That is not a valid version number. It is there to show that a crash report has a bogus version number.
Now, the cause of this problem is that we do not have data about Thunderbird 49.0 and 50.0 beta versions in our database. The table that contains data about builds is called `product_version_builds`. As far as I know, it is populated by our FTP scraper cron job. Peter, do you know why that data is missing from our postgres database?
FYI, the query we use to find that data is here: https://github.com/mozilla/socorro/blob/master/socorro/processor/mozilla_transform_rules.py#L881-L891
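A minimal Python sketch of the rewrite rule described above, for illustration only. The `KNOWN_BUILDS` dictionary and both function names are hypothetical stand-ins: the actual rule in `mozilla_transform_rules.py` queries PostgreSQL, and passing non-beta versions through unchanged is an assumption made here to keep the example small.

```python
import re

# Hypothetical in-memory stand-in for the build associations stored in
# the database. The single entry is the example given in the comment.
KNOWN_BUILDS = {
    ("Thunderbird", "50.0", "beta", "20161007134619"): "50.0b2",
}


def pretty_version(product, version, release_channel, build_id,
                   known_builds=KNOWN_BUILDS):
    """Return the version to display for a crash report.

    Beta builds only report their major version (e.g. "50.0"), so the
    real beta number is looked up by the full tuple. When the tuple is
    unknown, a placeholder ending in "b0" is returned instead.
    """
    if release_channel != "beta":
        # Assumption: other channels keep the version they reported.
        return version
    key = (product, version, release_channel, build_id)
    return known_builds.get(key, version + "b0")


def needs_reprocessing(version):
    """True for the "b0" placeholder, which no real beta ever uses."""
    return re.search(r"b0$", version) is not None


if __name__ == "__main__":
    known = pretty_version("Thunderbird", "50.0", "beta", "20161007134619")
    unknown = pretty_version("Thunderbird", "49.0", "beta", "20160901155122")
    print(known, needs_reprocessing(known))
    print(unknown, needs_reprocessing(unknown))
```

Because the placeholder can never collide with a shipped beta, selecting every report whose version ends in "b0" is enough to find what has to be reprocessed once the missing build data is in place.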
Component: General → Backend
I've been looking at the ftpscraper recently so I ran it in dry-run mode. Here's an excerpt of the output for Thunderbird releases:
INSERT BUILD ('thunderbird', '48.0', 'win', u'20160712184236', 'beta', '1', u'mozilla-beta', 'build3') {'ignore_duplicates': True}
INSERT BUILD ('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta', 'build5') {'ignore_duplicates': True}
INSERT BUILD ('thunderbird', '50.0', 'win', u'20161007134619', 'beta', '1', u'comm-beta', 'build2') {'ignore_duplicates': True}
I've just shown Windows for brevity; there are also two entries for linux and one for mac. Anyway, the change from mozilla-beta to comm-beta seems like a decent lead to follow.
Not an answer, but here's what we have in the database:
breakpad=> SELECT
breakpad->   pv.version_string,
breakpad->   pv.build_type
breakpad-> FROM product_versions pv
breakpad-> WHERE pv.product_name = 'Thunderbird'
breakpad-> AND
breakpad-> (
breakpad(>   pv.release_version like '48%' OR
breakpad(>   pv.release_version like '49%' OR
breakpad(>   pv.release_version like '50%' OR
breakpad(>   pv.release_version like '51%')
breakpad-> group by pv.version_string, pv.build_type
breakpad-> order by version_string
breakpad-> ;
 version_string | build_type
----------------+------------
 48.0a1         | nightly
 48.0a2         | aurora
 48.0b1         | beta
 49.0a1         | nightly
 49.0a2         | aurora
 49.0b1         | beta
 50.0a1         | nightly
 50.0a2         | aurora
 51.0a1         | nightly
 51.0a2         | aurora
(10 rows)
In other words, for 48* and 49* there were 3 build types (nightly, aurora, beta). For 50* and 51* there are only 2 build types (nightly, aurora).
Is Nick's comment about the fact that "mozilla-beta" now seems to be "comm-beta" a clue?
Maybe, but the switch to comm-beta happens at 49, instead of 50.0. Could be another change had an impact too.
Are you saying this should be mozilla-beta, not comm-beta?
('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta', 'build5') {'ignore_duplicates': True}
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #19)
> Are you saying this should be mozilla-beta, not comm-beta?
>
> ('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta', 'build5')
> {'ignore_duplicates': True}
I'm not entirely sure what the details are, but I think what Nick is suggesting is that starting with version 49, the name of the repository changed. That is a clue that the stuff that ftpscraper picks up from archive.mozilla.org might be different in other ways.
Is it possible to update the database to get 50.0b1 in there and reprocess those beta crashes, while we sort out the longer term fix?
The `comm-beta` lead was a good one. That is indeed not a known repository in our database. To fix that:
> INSERT INTO release_repositories (repository) VALUES ('comm-beta');
Now we need to update the tables being queried:
> SELECT update_product_versions();
And finally we can verify that we now have builds for version 50.0:
> SELECT DISTINCT pvb.build_id FROM product_versions pv
> LEFT JOIN product_version_builds pvb ON pv.product_version_id = pvb.product_version_id
> WHERE pv.product_name = 'Thunderbird' AND pv.release_version = '50.0'
> AND pv.build_type ILIKE 'beta';
> build_id
> ----------------
> 20161007134619
> 20161017040505
> 20161003102417
> (3 rows)
I have applied all of these on stage, then went to look for a 50.0b0 crash report and reprocessed it. And tada! https://crash-stats.allizom.org/report/index/49b67ee9-f320-4fed-b57a-197cf2161026
It now has a version number of 50.0b2. So I think that solves it. I'm going to run the same commands on prod and then reprocess all -b0 crash reports we have.
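A toy model of why registering the repository makes the builds appear, sketched in Python with an in-memory SQLite database. It assumes that scraped builds are only kept when their repository is listed in `release_repositories`; that is an inference from the fix above, not the actual logic of `update_product_versions()`. The `scraped_builds` table and the `beta_build_ids` function are hypothetical, and the two sample rows reuse values from the ftpscraper dry-run output quoted earlier in this bug.

```python
import sqlite3


def beta_build_ids(conn):
    """Return beta build ids whose repository is a known one.

    The inner join silently drops every build that comes from a
    repository missing in release_repositories.
    """
    rows = conn.execute(
        """
        SELECT b.build_id
        FROM scraped_builds AS b
        JOIN release_repositories AS r ON r.repository = b.repository
        WHERE b.build_type = 'beta'
        ORDER BY b.build_id
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE release_repositories (repository TEXT PRIMARY KEY);
    CREATE TABLE scraped_builds (
        version TEXT, build_id TEXT, build_type TEXT, repository TEXT
    );
    INSERT INTO release_repositories VALUES ('mozilla-beta');
    INSERT INTO scraped_builds
        VALUES ('48.0', '20160712184236', 'beta', 'mozilla-beta');
    INSERT INTO scraped_builds
        VALUES ('50.0', '20161007134619', 'beta', 'comm-beta');
    """
)

# Before the fix, only the mozilla-beta build survives the join.
before = beta_build_ids(conn)

# The fix from the comment: make comm-beta a known repository.
conn.execute(
    "INSERT INTO release_repositories (repository) VALUES ('comm-beta')"
)
after = beta_build_ids(conn)

if __name__ == "__main__":
    print("before:", before)
    print("after: ", after)
```

In this model no scraped data is lost while the repository is unknown; it is only filtered out, which is consistent with the builds showing up as soon as the repository row was added and the product versions were refreshed.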
Assignee: nobody → adrian
I have applied the database changes to prod, and have started reprocessing all crashes with version 49.0b0 or 50.0b0 in stage and prod. That's ongoing and should be done quickly. I believe that resolves this bug!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
\o/ Thanks!
Status: RESOLVED → VERIFIED