Why are Thunderbird 49.0b1 crashes being reported in Socorro as 49.0b0?

VERIFIED FIXED

Status

Socorro
Backend
VERIFIED FIXED
a year ago
a year ago

People

(Reporter: wsmwk, Assigned: adrian)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

a year ago
I don't know if this is a socorro issue or a build issue.

Noticed a few days ago that 49.0b1 crashes are showing in socorro as 49.0b0. And 49.0b1 is not available in the socorro UI.

49.0b1 https://crash-stats.mozilla.com/daily?p=Thunderbird near zero

49.0b0 is showing at normal beta crash rates (can't query the version# directly)
 https://crash-stats.mozilla.com/search/?release_channel=beta&product=Thunderbird&_sort=-date&_facets=version&_columns=date&_columns=signature&_columns=version&_columns=build_id#facet-version

For example my report bp-12e8af44-3473-4a03-81e1-e928d2160909 is showing as 49.0b0
(Reporter)

Updated

a year ago
Severity: critical → normal
(Reporter)

Comment 1

a year ago
I'm not sure where to look, what questions to ask, or who to ask. So I'll probably be pinging people for ideas to make progress.
Are we likely to hit this same issue for beta 50?
Flags: needinfo?(rail)
(Reporter)

Comment 3

a year ago
Thanks rail!  FWIW I confirm that thunderbird-49.0b1 is marked "shipped" in ship-it.
(Reporter)

Comment 4

a year ago
(In reply to Rail Aliiev [:rail] from comment #2)
> https://github.com/mozilla/socorro/blob/
> 991171dcf54fbe40b247ede4a72e4b77b7c64a29/socorro/processor/
> mozilla_transform_rules.py#L938-L942 looks responsible for this. I'd ask
> Socorro folks.

adrian, help?
Flags: needinfo?(adrian)
(Assignee)

Comment 5

a year ago
Things get marked with version b0 when we do not have any data about that beta version. I suppose this could be a problem with the ftpscraper? I do not know how Thunderbird version data is pulled into Socorro these days...
Flags: needinfo?(adrian)
(Reporter)

Comment 6

a year ago
The last time this happened, peterbe sorted things out in bug 1257651
Component: Build Config → General
Flags: needinfo?(peterbe)
Product: Thunderbird → Socorro
So, it happens when there's "not enough product version information" based on the parameters product, version, release_channel and build_id. 

The ftpscraper is dumb. It just pulls down what's in the archive.mozilla.org. Sadly there's a lot of black magic that collects what data is in archive.mozilla.org and populates the product_versions table. 

Can you take a look at http://archive.mozilla.org/pub/thunderbird/ and try to figure out what's different in the .json files for 49 that wasn't a problem before? E.g. do 46, 47 and 48 have a valid build_id or release_channel while 49 doesn't?

Here is what our ftpscraper + our black magic postgres functions have managed to collect about the recent product versions:

breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, is_rapid_beta from product_versions where product_name ='Thunderbird' order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds | is_rapid_beta
---------------+-----------------+----------------+-------------+------------+------------+------------+---------------
 52.0          | 52.0a1          | 52.0a1         |             | 2016-09-20 | nightly    | t          | f
 51.0          | 51.0a2          | 51.0a2         |             | 2016-09-20 | aurora     | t          | f
 51.0          | 51.0a1          | 51.0a1         |             | 2016-08-02 | nightly    | t          | f
 50.0          | 50.0a2          | 50.0a2         |             | 2016-08-02 | aurora     | t          | f
 50.0          | 50.0a1          | 50.0a1         |             | 2016-06-07 | nightly    | t          | f
 49.0          | 49.0            | 49.0b1         |           1 | 2016-08-05 | beta       | f          | f
 49.0          | 49.0a2          | 49.0a2         |             | 2016-06-07 | aurora     | t          | f
 49.0          | 49.0a1          | 49.0a1         |             | 2016-04-26 | nightly    | t          | f
 48.0          | 48.0            | 48.0b1         |           1 | 2016-07-12 | beta       | f          | f
 48.0          | 48.0a2          | 48.0a2         |             | 2016-04-26 | aurora     | t          | f
 48.0          | 48.0a1          | 48.0a1         |             | 2016-03-08 | nightly    | t          | f
 47.0          | 47.0            | 47.0b2         |           2 | 2016-06-17 | beta       | f          | f
 47.0          | 47.0            | 47.0b1         |           1 | 2016-06-04 | beta       | f          | f
 47.0          | 47.0a2          | 47.0a2         |             | 2016-03-08 | aurora     | t          | f
 47.0          | 47.0a1          | 47.0a1         |             | 2016-01-26 | nightly    | t          | f
 46.0          | 46.0a2          | 46.0a2         |             | 2016-01-26 | aurora     | t          | f
 46.0          | 46.0a1          | 46.0a1         |             | 2015-12-15 | nightly    | t          | f
 45.4          | 45.4.0          | 45.4.0         |             | 2016-09-28 | release    | f          | f
 45.4          | 45.4.0          | 45.4.0b99      |          99 | 2016-09-28 | beta       | f          | f
 45.3          | 45.3.0          | 45.3.0         |             | 2016-08-25 | release    | f          | f
 45.3          | 45.3.0          | 45.3.0b99      |          99 | 2016-08-25 | beta       | f          | f
 45.2          | 45.2            | 45.2           |             | 2016-06-28 | release    | f          | f
 45.2          | 45.2.0          | 45.2.0         |             | 2016-06-30 | release    | f          | f
 45.2          | 45.2.0          | 45.2.0b99      |          99 | 2016-06-30 | beta       | f          | f
 45.2          | 45.2            | 45.2b1         |           1 | 2016-05-19 | beta       | f          | f
 45.1          | 45.1.1          | 45.1.1         |             | 2016-05-26 | release    | f          | f
 45.1          | 45.1.0          | 45.1.0         |             | 2016-05-05 | release    | f          | f
 45.1          | 45.1.0          | 45.1.0b99      |          99 | 2016-05-05 | beta       | f          | f
 45.1          | 45.1            | 45.1b1         |           1 | 2016-04-28 | beta       | f          | f
 45.0          | 45.0            | 45.0           |             | 2016-04-07 | release    | f          | f
 45.0          | 45.0            | 45.0b99        |          99 | 2016-04-07 | beta       | f          | f
 45.0          | 45.0            | 45.0b4         |           4 | 2016-04-04 | beta       | f          | f
 45.0          | 45.0            | 45.0b3         |           3 | 2016-03-22 | beta       | f          | f
 45.0          | 45.0            | 45.0b2         |           2 | 2016-02-18 | beta       | f          | f
 45.0          | 45.0            | 45.0b1         |           1 | 2016-02-02 | beta       | f          | f
 45.0          | 45.0a2          | 45.0a2         |             | 2015-12-15 | aurora     | t          | f
 45.0          | 45.0a1          | 45.0a1         |             | 2015-10-30 | nightly    | t          | f
 44.0          | 44.0            | 44.0b1         |           1 | 2016-01-12 | beta       | f          | f
 44.0          | 44.0a2          | 44.0a2         |             | 2015-10-30 | aurora     | t          | f
 44.0          | 44.0a1          | 44.0a1         |             | 2015-09-22 | nightly    | t          | f
 43.0          | 43.0            | 43.0b1         |           1 | 2015-12-07 | beta       | f          | f
 43.0          | 43.0a2          | 43.0a2         |             | 2015-09-23 | aurora     | t          | f
 43.0          | 43.0a1          | 43.0a1         |             | 2015-08-11 | nightly    | t          | f
 42.0          | 42.0            | 42.0b2         |           2 | 2015-10-12 | beta       | f          | f
 42.0          | 42.0            | 42.0b1         |           1 | 2015-09-23 | beta       | f          | f
 42.0          | 42.0a2          | 42.0a2         |             | 2015-08-11 | aurora     | t          | f
 42.0          | 42.0a1          | 42.0a1         |             | 2015-06-30 | nightly    | t          | f
 41.0          | 41.0            | 41.0b2         |           2 | 2015-09-16 | beta       | f          | f
 41.0          | 41.0            | 41.0b1         |           1 | 2015-08-27 | beta       | f          | f
 41.0          | 41.0a2          | 41.0a2         |             | 2015-06-30 | aurora     | t          | f
(50 rows)

Anything there you think stands out?
Flags: needinfo?(peterbe)
I'm not entirely sure what I'm doing, but here's a comparison of what the query used by the transform rule returns for 47, 48 and 49.

breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform from product_versions pv left join product_version_builds b on (b.product_version_id = pv.product_version_id) where product_name ='Thunderbird' and version_string = '49.0b1' order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds |    build_id    | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 49.0          | 49.0            | 49.0b1         |           1 | 2016-08-05 | beta       | f          | 20160805071503 | linux
 49.0          | 49.0            | 49.0b1         |           1 | 2016-08-05 | beta       | f          | 20160805071503 | mac
(2 rows)

breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform from product_versions pv left join product_version_builds b on (b.product_version_id = pv.product_version_id) where product_name ='Thunderbird' and version_string = '48.0b1' order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds |    build_id    | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 48.0          | 48.0            | 48.0b1         |           1 | 2016-07-12 | beta       | f          | 20160712184236 | linux
 48.0          | 48.0            | 48.0b1         |           1 | 2016-07-12 | beta       | f          | 20160712184236 | mac
 48.0          | 48.0            | 48.0b1         |           1 | 2016-07-12 | beta       | f          | 20160712184236 | win
(3 rows)

breakpad=> select major_version, release_version, version_string, beta_number, build_date, build_type, has_builds, b.build_id, b.platform from product_versions pv left join product_version_builds b on (b.product_version_id = pv.product_version_id) where product_name ='Thunderbird' and version_string = '47.0b1' order by version_sort desc limit 50;
 major_version | release_version | version_string | beta_number | build_date | build_type | has_builds |    build_id    | platform
---------------+-----------------+----------------+-------------+------------+------------+------------+----------------+----------
 47.0          | 47.0            | 47.0b1         |           1 | 2016-06-04 | beta       | f          | 20160604054735 | linux
 47.0          | 47.0            | 47.0b1         |           1 | 2016-06-04 | beta       | f          | 20160604054735 | mac
 47.0          | 47.0            | 47.0b1         |           1 | 2016-06-04 | beta       | f          | 20160604054735 | win
(3 rows)

Seems fine (except for the lack of a win build for 49.0b1). Perhaps reprocessing will set the version on the crash differently. Have you tried that?
(Reporter)

Comment 9

a year ago
https://archive.mozilla.org/pub/thunderbird/candidates/49.0b1-candidates/build5/win32/en-US/thunderbird-49.0b1.json looks OK to me.

who needs to run the reprocess step?
Flags: needinfo?(adrian)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #10)
> Created attachment 8799284 [details]
> crash-stats verisons bonkers.png
> 
> Now, neither 49.0b1 nor 49.0b0 are offered as a choice in
> https://crash-stats.mozilla.com/crashes-per-day/?p=Thunderbird 

That might very well be unrelated and a matter of how those drop-downs aren't doing you any favors. 
The version drop-down choices are based on Firefox and it seems it doesn't reload after you have selected Thunderbird. Another battle for another day, but please file a bug. 


> New, is 50.0b1 was built on Friday. All crashes are being shown as 50.0b0
> [1] like
> https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-
> 507262161010
> 

According to the raw crash that crash is version "50.0"
https://crash-stats.mozilla.com/rawdumps/a320137a-9e8a-42f4-a911-507262161010.json

But the "pretty version" is turned into "50.0b0"
https://crash-stats.mozilla.com/api/UnredactedCrash/?crash_id=a320137a-9e8a-42f4-a911-507262161010

Is that not correct?


> [1]
> https://crash-stats.mozilla.com/search/?version=50.0b0&version=50.
> 0b1&product=Thunderbird&date=%3E%3D2016-10-03T01%3A56%3A00.000Z&date=%3C2016-
> 10-10T01%3A56%3A00.000Z&_sort=-
> date&_facets=signature&_columns=date&_columns=signature&_columns=product&_col
> umns=version&_columns=build_id&_columns=platform#crash-reports

According to
https://crash-stats.mozilla.com/search/?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-10T14%3A29%3A00.000Z&_sort=-date&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version
(all Thunderbird crashes the last 1 month from now) 
It seems that there are no crashes that have come in under 50.* at all. That doesn't make any sense because I'm pretty sure https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-507262161010 should be included. It happened 4 hours since the upper date bound on my search. 

Adrian, can you explain that?
(Assignee)

Comment 12

a year ago
When a version gets a -b0 version number, it is because we could not find data in our database for that crash's (product, version, release channel, build id) tuple. 

For bp-a320137a-9e8a-42f4-a911-507262161010, that means that when it was processed, our database had no data for (Thunderbird, 50.0, beta, 20161007134619). 

@Peter, that would simply be because we only show the top 50 results in facets. I checked for version 50.0b and it has only 151 results, which is less than the 50th version has (388). If you want to prove that, you can use the API, like this: 

https://crash-stats.mozilla.com/api/SuperSearch/?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-10T14%3A29%3A00.000Z&_facets_size=200&_results_number=0&_facets=version
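
As a rough sketch, the same API URL can be built programmatically. The query parameters are taken from the URL above; the helper name `facet_url` is made up for illustration and is not part of Socorro.

```python
from urllib.parse import urlencode

# Base endpoint, as it appears in the URL above.
BASE = "https://crash-stats.mozilla.com/api/SuperSearch/"


def facet_url(product, start, end, facets_size=200):
    """Build a SuperSearch URL that returns only version facets.

    `facet_url` is a hypothetical helper. Raising _facets_size above the
    default lets versions outside the top 50 show up in the facet list.
    """
    params = [
        ("product", product),
        ("date", ">=" + start),  # lower date bound
        ("date", "<" + end),     # upper date bound
        ("_facets_size", str(facets_size)),
        ("_results_number", "0"),  # facets only, no individual crash rows
        ("_facets", "version"),
    ]
    return BASE + "?" + urlencode(params)


url = facet_url(
    "Thunderbird",
    "2016-09-10T14:29:00.000Z",
    "2016-10-10T14:29:00.000Z",
)
print(url)
```

With `_results_number=0` the response carries only the facet counts, which is all that is needed to check whether 50.0b0 falls outside the top 50.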

Wayne, for which versions should we run a reprocess? I don't think we have data for 50.0 yet, but Peter can maybe confirm that?
Flags: needinfo?(adrian)
(Reporter)

Comment 13

a year ago
(In reply to Peter Bengtsson [:peterbe] from comment #11)
> ...
> > New, is 50.0b1 was built on Friday. All crashes are being shown as 50.0b0
> > [1] like
> > https://crash-stats.mozilla.com/report/index/a320137a-9e8a-42f4-a911-
> > 507262161010
> > 
> 
> According to the raw crash that crash is version "50.0"
> https://crash-stats.mozilla.com/rawdumps/a320137a-9e8a-42f4-a911-
> 507262161010.json
> 
> But the "pretty version" is turned into "50.0b0"
> https://crash-stats.mozilla.com/api/UnredactedCrash/?crash_id=a320137a-9e8a-
> 42f4-a911-507262161010
> 
> Is that not correct?

We did not build a 50.0b0, so I don't see how it could be 50.0b0. 
Build specs at https://public.etherpad-mozilla.org/p/thunderbird-release-50.0b1
(Reporter)

Comment 14

a year ago
(In reply to Adrian Gaudebert [:adrian] from comment #12)
> When a version gets a -b0 version number, it is because we could not find
> data in our database for that crash's (product, version, release channel,
> build id) tuple. 
> 
> For bp-a320137a-9e8a-42f4-a911-507262161010, that means that when it was
> processed, our database had no data for (Thunderbird, 50.0, beta,
> 20161007134619). 
> 
> @Peter, that would simply be because we only show the top 50 results in
> facets. I checked for version 50.0b and it has only 151 results, which is
> less than the 50th version has (388). If you want to prove that, you can use
> the API, like this: 

I don't understand what the number of crashes has to do with which versions are offered in the UI. But 21 hours after your comment we are at 412 crashes for 50.0b0, and in https://crash-stats.mozilla.com/search/ 50.0b0 is offered neither in the static version field nor for a "new line" set to "version has terms".

> https://crash-stats.mozilla.com/api/SuperSearch/
> ?product=Thunderbird&date=%3E%3D2016-09-10T14%3A29%3A00.000Z&date=%3C2016-10-
> 10T14%3A29%3A00.000Z&_facets_size=200&_results_number=0&_facets=version
> 
> Wayne, for which versions should we run a reprocess? I don't think we have
> data for 50.0 yet, but Peter can maybe confirm that?

version 49.0 beta and 50.0 beta.
(Assignee)

Comment 15

a year ago
Since we moved to rapid betas a while ago, products in their beta version stopped being aware of their actual version number. They only know about their "major version". So for a Thunderbird 50.0b2, the version number it knows is 50.0. That is what gets sent to crash-stats as part of the crash report. Then, in crash-stats, we have a rule to rewrite these version numbers to what you would expect, so in that earlier case, turn 50.0 into 50.0b2. To do that, we use our database, in which we store a bunch of associations like I described earlier: 

(product, version, release channel, build id) -> actual version number

For example: 

("Thunderbird", "50.0", "beta", "20161007134619") -> "50.0b2"

When we cannot find a match in our database for that tuple (product, version, release channel, build id), we assume that we do not have data about that release yet, and thus give the crash report a "fake" version number ending with "b0". We use "0" because we know that it is not a valid version number for an actual beta, so that's a good way of seeing that something went wrong. It also makes it quite easy for us to spot all of those crashes to send them for reprocessing once we have the missing data. 

So, it is expected that "50.0b0" is not in any drop-down. That is not a valid version number. It is there to show a crash report has a bogus version number. 
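
To make the rule concrete, here is a minimal sketch of the lookup with its "b0" fallback. It is not the actual code from mozilla_transform_rules.py: the function name `pretty_version` and the in-memory dict standing in for the database are assumptions, and leaving non-beta versions untouched is a simplification.

```python
# Hypothetical stand-in for the database associations described above:
# (product, version, release channel, build id) -> actual version number
KNOWN_BUILDS = {
    ("Thunderbird", "50.0", "beta", "20161007134619"): "50.0b2",
}


def pretty_version(product, version, release_channel, build_id, known_builds):
    """Return the rewritten version number for a crash report.

    Sketch only: non-beta channels are passed through unchanged, which is
    an assumption made to keep the example short.
    """
    if release_channel != "beta":
        return version
    key = (product, version, release_channel, build_id)
    # No match means we have no data about that build yet, so the report
    # gets the deliberately invalid "b0" suffix.
    return known_builds.get(key, version + "b0")


# With the build data present, the real beta number is used.
with_data = pretty_version(
    "Thunderbird", "50.0", "beta", "20161007134619", KNOWN_BUILDS
)

# With no data at all (the situation in this bug), the report becomes b0.
without_data = pretty_version(
    "Thunderbird", "50.0", "beta", "20161007134619", {}
)
print(with_data, without_data)
```

Because "b0" can never be a real beta, selecting every report whose version ends in "b0" is enough to find the ones that need reprocessing once the data arrives.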

Now, the cause of this problem is that we do not have data about Thunderbird 49.0 and 50.0 beta versions in our database. The table that contains data about builds is called `product_version_builds`. As far as I know, it is populated by our FTP Scraper cron job. 

Peter, do you know why that data is missing from our postgres database? FYI, the query we use to find that data is here: https://github.com/mozilla/socorro/blob/master/socorro/processor/mozilla_transform_rules.py#L881-L891
Component: General → Backend
I've been looking at the ftpscraper recently so I ran it in dry-run mode. Here's an excerpt of the output for Thunderbird releases:

INSERT BUILD
('thunderbird', '48.0', 'win', u'20160712184236', 'beta', '1', u'mozilla-beta', 'build3')
{'ignore_duplicates': True}

INSERT BUILD
('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta', 'build5')
{'ignore_duplicates': True}

INSERT BUILD
('thunderbird', '50.0', 'win', u'20161007134619', 'beta', '1', u'comm-beta', 'build2')
{'ignore_duplicates': True}

I've just shown Windows for brevity; there are also 2 entries for linux, and one for mac. Anyway, the change from mozilla-beta to comm-beta seems like a decent lead to follow.
Not an answer but here's what we have in the database:

breakpad=> SELECT
breakpad->     pv.version_string,
breakpad->     pv.build_type
breakpad-> FROM product_versions pv
breakpad-> WHERE pv.product_name = 'Thunderbird'
breakpad-> AND
breakpad-> (
breakpad(>     pv.release_version like '48%' OR
breakpad(>     pv.release_version like '49%' OR
breakpad(>     pv.release_version like '50%' OR
breakpad(>     pv.release_version like '51%')
breakpad-> group by pv.version_string, pv.build_type
breakpad-> order by version_string
breakpad-> ;
 version_string | build_type
----------------+------------
 48.0a1         | nightly
 48.0a2         | aurora
 48.0b1         | beta
 49.0a1         | nightly
 49.0a2         | aurora
 49.0b1         | beta
 50.0a1         | nightly
 50.0a2         | aurora
 51.0a1         | nightly
 51.0a2         | aurora
(10 rows)


In other words, for 48* and 49* there were 3 build types (nightly, aurora, beta). 
For 50* and 51* there's only 2 build types (nightly, aurora)

Is Nick's comment about the fact that the "mozilla-beta" now seems to be "comm-beta" a clue?
Maybe, but the switch to comm-beta happens at 49, instead of 50.0. Could be another change had an impact too.
(Reporter)

Comment 19

a year ago
Are you saying this should be mozilla-beta, not comm-beta?

('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta', 'build5')
{'ignore_duplicates': True}
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #19)
> Are you saying this should be mozilla-beta, not comm-beta?
> 
> ('thunderbird', '49.0', 'win', u'20160901155122', 'beta', '1', u'comm-beta',
> 'build5')
> {'ignore_duplicates': True}

I'm not entirely sure what the details are but I think what Nick is suggesting is that starting with v 49, the name of the repository changed. That is a clue that the stuff that ftpscraper picks up from archive.mozilla.org might be different in other ways.
(Reporter)

Comment 21

a year ago
Is it possible to update the database to get 50.0b1 in there and reprocess those beta crashes, while we sort out the longer term fix?
(Assignee)

Comment 22

a year ago
The `comm-beta` lead was a good one. That is indeed not a known repository in our database. To fix that: 

> INSERT INTO release_repositories (repository) VALUES ('comm-beta');

Now we need to update the tables being queried: 

> SELECT update_product_versions();

And finally we can verify that now we have builds for version 50.0:

> SELECT DISTINCT pvb.build_id FROM product_versions pv
> LEFT JOIN product_version_builds pvb ON pv.product_version_id = pvb.product_version_id
> WHERE pv.product_name = 'Thunderbird' AND pv.release_version = '50.0'
> AND pv.build_type ILIKE 'beta';
>     build_id    
> ----------------
>  20161007134619
>  20161017040505
>  20161003102417
> (3 rows)

I have applied all of these on stage, then went to look for a 50.0b0 crash report and reprocessed it. And tada! https://crash-stats.allizom.org/report/index/49b67ee9-f320-4fed-b57a-197cf2161026 It now has a version number of 50.0b2. 

So I think that solves it. I'm going to run the same commands on prod and then reprocess all -b0 crash reports we have.
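
For anyone wondering why one missing row mattered, here is a toy reproduction using an in-memory SQLite database. It assumes that the step populating the build tables only keeps scraped builds whose repository is listed in `release_repositories`; that assumption, the `scraped_builds` table and the `usable_versions` helper are illustrative and do not reflect the real Socorro schema or the body of `update_product_versions()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Known repositories, before the fix: comm-beta is missing.
    CREATE TABLE release_repositories (repository TEXT PRIMARY KEY);
    INSERT INTO release_repositories VALUES ('mozilla-beta');

    -- Hypothetical table holding what the scraper found (win builds only).
    CREATE TABLE scraped_builds (version TEXT, build_id TEXT, repository TEXT);
    INSERT INTO scraped_builds VALUES
        ('48.0', '20160712184236', 'mozilla-beta'),
        ('49.0', '20160901155122', 'comm-beta'),
        ('50.0', '20161007134619', 'comm-beta');
    """
)


def usable_versions(conn):
    """Versions whose builds survive the join against known repositories."""
    rows = conn.execute(
        """
        SELECT b.version
        FROM scraped_builds b
        JOIN release_repositories r ON r.repository = b.repository
        ORDER BY b.version
        """
    )
    return [version for (version,) in rows]


before = usable_versions(conn)

# The fix from this comment: register the new repository name.
conn.execute("INSERT INTO release_repositories (repository) VALUES ('comm-beta')")
after = usable_versions(conn)

print(before, after)
```

In this toy model the 49.0 and 50.0 builds are silently dropped until `comm-beta` is registered, which matches the symptom seen here.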
Assignee: nobody → adrian
(Assignee)

Comment 23

a year ago
I have applied the database changes to prod, and have started reprocessing all crashes with version 49.0b0 or 50.0b0 in stage and prod. That's ongoing and should be done quickly. 

I believe that resolves this bug!
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
(Reporter)

Comment 24

a year ago
\o/

Thanks!
Status: RESOLVED → VERIFIED