Closed Bug 1499714 Opened 7 years ago Closed 7 years ago

crashes for b99 builds can get the wrong version_string

Categories

(Socorro :: Processor, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Details

Attachments

(1 file)

Crash reports for 63.0b99 (version: 63.0, build id: 20181015152800) are coming in and getting assigned "63.0" as the version string.

Example: bp-7e0f45e9-6246-448a-a3c9-6f3c70181017

This bug covers figuring out why, fixing it, and reprocessing affected crashes.
Socorro has a processor rule called BetaVersionRule, which uses the webapp's /api/VersionString API endpoint to look up a (product, version, buildid) combination in the product version data Socorro has, and then uses the resulting version_string in the processed crash. This allows crash reports for beta builds to get the correct version string (63.0b11 vs. 63.0).

In the case of the 63.0b99 builds, however, the query that /api/VersionString is using ends up with two different version_strings:

 version_string |    build_id    | platform | product_version_id |   repository
----------------+----------------+----------+--------------------+-----------------
 63.0b99        | 20181015152800 | linux    |                  7 | mozilla-beta
 63.0b99        | 20181015152800 | mac      |                  7 | mozilla-beta
 63.0b99        | 20181015152800 | win      |                  7 | mozilla-beta
 63.0           | 20181015152800 | linux    |                  5 | mozilla-release
 63.0           | 20181015152800 | mac      |                  5 | mozilla-release
 63.0           | 20181015152800 | win      |                  5 | mozilla-release
(6 rows)

The ftpscraper data looks ok. I verified this is true of 62.0b99 as well.

There's a comment in the code that goes like this:

    # The query can return multiple results, but they're the same value. So
    # we just return the first one.

I think that comment is wrong for the 0b99 case. I haven't looked at whether it's possibly wrong for other cases as well. Maybe.

I think the right fix here is to change /api/VersionString to also require the release channel. That would disambiguate the results and fix this issue. I think that fix also doesn't break other expectations/assumptions.

Grabbing this to do now.
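The ambiguity and the proposed fix can be sketched roughly as follows. This is a minimal illustration, not Socorro's actual code: the row layout, the `lookup_version_strings` helper, and the repository-to-channel mapping are all assumptions made up for the example. It only shows that a lookup on build id alone yields two distinct version strings for this build, while additionally filtering on channel yields one.

```python
# Rows mirroring the query output quoted in this bug. The dict layout is
# an assumption for illustration, not Socorro's schema.
BUILD_ID = "20181015152800"
ROWS = [
    {"version_string": vs, "build_id": BUILD_ID, "platform": p, "repository": repo}
    for vs, repo in [("63.0b99", "mozilla-beta"), ("63.0", "mozilla-release")]
    for p in ("linux", "mac", "win")
]

# Hypothetical mapping from repository to release channel.
REPO_TO_CHANNEL = {"mozilla-beta": "beta", "mozilla-release": "release"}


def lookup_version_strings(rows, build_id, channel=None):
    """Return the set of distinct version strings matching the lookup.

    When channel is None, only build_id is used, which reproduces the
    ambiguity described in this bug.
    """
    matches = set()
    for row in rows:
        if row["build_id"] != build_id:
            continue
        if channel is not None and REPO_TO_CHANNEL[row["repository"]] != channel:
            continue
        matches.add(row["version_string"])
    return matches


# Without the channel, two different answers are possible, so "take the
# first row" depends on row order.
ambiguous = lookup_version_strings(ROWS, BUILD_ID)

# With the channel, each lookup has exactly one answer.
by_beta = lookup_version_strings(ROWS, BUILD_ID, channel="beta")
by_release = lookup_version_strings(ROWS, BUILD_ID, channel="release")
```

In this sketch, a caller that takes the first match is only safe when the returned set has exactly one element, which is what the extra channel filter provides.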
Assignee: nobody → willkg
Status: NEW → ASSIGNED
Priority: -- → P1
Relatedly, I suspect this has been a bug for a long time. We probably didn't notice it because the "right choice" got cached often enough in the previous iteration. The rewrite of the BetaVersionRule bits caches for longer, so the problem became noticeable. If I have some time, I'll try to prove that theory.
Commits pushed to master at https://github.com/mozilla-services/socorro

https://github.com/mozilla-services/socorro/commit/60953244f9541da20c1d1bf8985917fda332c058
fix bug 1499714: further restrict VersionString by channel

This fixes the problem with b99 where there are two different version strings and the VersionString API code picked the "first" one, but it's unsorted, so it's really a random one and random isn't awesome here.

https://github.com/mozilla-services/socorro/commit/f05379e026cc4d763d93f1bfcec51db36b6fb81c
Merge pull request #4648 from willkg/1499714-b99

fix bug 1499714: further restrict VersionString by channel
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
I landed the fix, deployed it to stage, and tested it there... and it didn't work. Crashes were still getting 63.0 as the version string.

Reading through the logs, I noticed that the /api/VersionString/ request for this build was made ages ago. That made me wonder whether the processors came up before the webapp during the deploy, talked to the old webapp, and got the wrong version string. Because the value is cached in the processor, reprocessing didn't help.

I did another stage deploy and everything is fine now.

As an aside, this would have fixed itself since the cache in the processor has a TTL. I think it takes 6 hours or something like that.

We might need to recycle the processor nodes in prod after the prod deploy, depending on what comes up when.
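To illustrate why reprocessing didn't help until the cache was cleared, here is a minimal sketch of a TTL cache. It is not the processor's actual cache: the `TTLCache` class, its `get` signature, and the injectable clock are hypothetical, and the 6 hour TTL is just the rough figure mentioned above.

```python
import time


class TTLCache:
    """Tiny TTL cache sketch; hypothetical, not Socorro's implementation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be simulated
        self._data = {}  # key -> (value, expires_at)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on miss or expiry."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]
        value = loader(key)
        self._data[key] = (value, now + self._ttl)
        return value


# Simulated clock so the example doesn't need to wait.
fake_now = [0.0]
cache = TTLCache(ttl_seconds=6 * 3600, clock=lambda: fake_now[0])
key = ("Firefox", "63.0", "20181015152800")

# The processor starts first and asks the old webapp, which answers "63.0".
first = cache.get(key, lambda k: "63.0")

# An hour later the fixed webapp is up, but the stale value is still cached,
# so the loader is never called.
fake_now[0] = 3600.0
during_ttl = cache.get(key, lambda k: "63.0b99")

# Once the TTL has elapsed, the next lookup goes back to the webapp.
fake_now[0] = 6 * 3600.0
after_ttl = cache.get(key, lambda k: "63.0b99")
```

Restarting the process that holds the cache has the same effect as waiting out the TTL, which matches recycling the processor nodes.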
The changes are on prod. We did have to recycle the processor nodes. I reprocessed all the crashes with that build id. Pretty sure we're good here now.