Closed Bug 1139249 Opened 9 years ago Closed 9 years ago

Fix BadData errors in aus4.m.o for edge cases

Categories

(Release Engineering :: Release Requests, defect, P2)

x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Details

About 20k of these in the last 24 hours, eg  
 auslib.errors:BadDataError: ("Can't find platform '%s'", u'Darwin_Universal-gcc3') 

Looks like there are a couple of problems
* single arch builds before the 29.0 watershed on beta, eg
/update/3/Firefox/25.0/20130917123208/Darwin_x86_64-gcc3/en-US/beta/Darwin%2011.4.2/default/default/update.xml 

* 3.6.13+ ppc builds on release, eg
/update/3/Firefox/3.6.18/20110614230723/Darwin_x86-gcc3-u-ppc-i386/de/release/Darwin%2010.1.1/default/default/update.xml
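The build target named in these errors comes straight from the update URL. Here is a minimal sketch of how the two example paths break down. It assumes the usual version 3 URL layout, and the field names are illustrative rather than Balrog's own:

```python
from urllib.parse import unquote

# Assumed field order for version 3 update URLs:
# /update/3/<product>/<version>/<buildID>/<buildTarget>/<locale>/<channel>/
#     <osVersion>/<distribution>/<distVersion>/update.xml
V3_FIELDS = [
    "product", "version", "buildID", "buildTarget", "locale",
    "channel", "osVersion", "distribution", "distVersion",
]


def parse_v3_path(path: str) -> dict:
    """Split a version 3 update path into named, URL-decoded fields."""
    parts = path.strip("/").split("/")
    if len(parts) != 12 or parts[:2] != ["update", "3"] or parts[-1] != "update.xml":
        raise ValueError("not a version 3 update path: %r" % path)
    return {name: unquote(value) for name, value in zip(V3_FIELDS, parts[2:11])}


query = parse_v3_path(
    "/update/3/Firefox/3.6.18/20110614230723/Darwin_x86-gcc3-u-ppc-i386"
    "/de/release/Darwin%2010.1.1/default/default/update.xml"
)
print(query["buildTarget"], query["osVersion"])
# Darwin_x86-gcc3-u-ppc-i386 Darwin 10.1.1
```

The server looks up the `buildTarget` value in the matched blob's platforms. When that key is missing, the request produces a "Can't find platform" error like the ones above.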
Assignee: nobody → nthomas
OS: Mac OS X → All
Priority: -- → P2
Fixes, look ok Ben ?

* modify Firefox-29.0b8-build1-schema2, adding this to platforms:
    "Darwin_x86_64-gcc3": {
      "alias": "Darwin_x86_64-gcc3-u-i386-x86_64"

* looks like 3.6.28 was the last build on release for ppc, we dropped it at 4.0b7 on beta. We'll need a 3.6.28 blob to handle 3.6-3.6.12 updates anyway, so might as well create one here too. Then add two rules, one for low range, and the other for >=3.6.13 and Header Arch of ppc.


There were also some Darwin Universal errors in the logs, but newrelic doesn't seem completely consistent in the way it searches.
Flags: needinfo?(bhearsum)
(In reply to Nick Thomas [:nthomas] from comment #1)
> Fixes, look ok Ben ?
> 
> * modify Firefox-29.0b8-build1-schema2, adding this to platforms:
>     "Darwin_x86_64-gcc3": {
>       "alias": "Darwin_x86_64-gcc3-u-i386-x86_64"

Makes sense.

> * looks like 3.6.28 was the last build on release for ppc, we dropped it at
> 4.0b7 on beta. We'll need a 3.6.28 blob to handle 3.6-3.6.12 updates anyway,
> so might as well create one here too. Then add two rules, one for low range,
> and the other for >=3.6.13 and Header Arch of ppc.

To make sure I understand correctly:
* 3.6-3.6.12 will all go to 3.6.28 (because they need to watershed to get the new build targets).
* 3.6.13-3.6.28 i386 will get 12.0
* 3.6.13-3.6.27 ppc will get 3.6.28
* 3.6.28 ppc will dead end
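The four cases above can be written as a small lookup. This sketch shows only the intended update paths. It does not show how Balrog actually stores rules. The function names are hypothetical, and so is the `arch` parameter, which stands in for the header architecture:

```python
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '3.6' or '3.6.13'.

    Pads to three components so '3.6' compares as (3, 6, 0). Beta and
    'pre' suffixes are deliberately out of scope for this sketch.
    """
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def update_target(version: str, arch: str) -> Optional[str]:
    """Hypothetical mapping of 3.6.x Mac clients to the release they should get.

    arch is 'ppc' or 'i386'. None means no update is offered.
    """
    v = parse_version(version)
    if parse_version("3.6") <= v <= parse_version("3.6.12"):
        return "3.6.28"  # everyone watersheds through 3.6.28 first
    if parse_version("3.6.13") <= v <= parse_version("3.6.28"):
        if arch == "i386":
            return "12.0"
        if arch == "ppc" and v < parse_version("3.6.28"):
            return "3.6.28"
    return None  # 3.6.28 ppc dead-ends; anything else is outside this sketch
```

Under this sketch, a single rule cannot express the ppc split. Arch only matters from 3.6.13 onward, which is why the proposal uses one rule for the low range and a second rule keyed on the header architecture.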

(Some of these might be handled in bug 998721 or one of the other bugs for setting up old release update paths; I just wanted to spell out the whole picture.)
Flags: needinfo?(bhearsum)
(In reply to Nick Thomas [:nthomas] from comment #1)
> Fixes, look ok Ben ?
> 
> * modify Firefox-29.0b8-build1-schema2, adding this to platforms:
>     "Darwin_x86_64-gcc3": {
>       "alias": "Darwin_x86_64-gcc3-u-i386-x86_64"

This is done, except for pointing to Darwin_x86-gcc3-u-i386-x86_64 to match the existing alias. Discovered bug 1140262 along the way, which scared me.
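Here is a sketch of the alias mechanism this fix relies on. It assumes a blob's `platforms` map where an entry may hold nothing but an `alias` key. It also assumes that aliases are resolved only one level deep. Under that assumption, pointing the new entry at the real `Darwin_x86-gcc3-u-i386-x86_64` entry is the safe choice. `resolve_platform` and the stand-in `BadDataError` are hypothetical. The sample blob is trimmed to the relevant keys, and the existing alias it contains is assumed:

```python
class BadDataError(Exception):
    """Stand-in for auslib.errors.BadDataError (assumed, not the real class)."""


def resolve_platform(blob: dict, build_target: str) -> str:
    """Return the platform key that actually holds data for build_target.

    Assumes a single level of aliasing: an alias must point at a real
    entry, not at another alias.
    """
    platforms = blob.get("platforms", {})
    if build_target not in platforms:
        raise BadDataError("Can't find platform '%s'", build_target)
    target = platforms[build_target].get("alias", build_target)
    if target not in platforms or "alias" in platforms[target]:
        raise BadDataError("Can't find platform '%s'", target)
    return target


blob = {
    "name": "Firefox-29.0b8-build1-schema2",
    "platforms": {
        "Darwin_x86-gcc3-u-i386-x86_64": {"locales": {"en-US": {}}},
        # assumed pre-existing alias
        "Darwin_x86_64-gcc3-u-i386-x86_64": {"alias": "Darwin_x86-gcc3-u-i386-x86_64"},
        # the newly added alias, pointing at the real entry
        "Darwin_x86_64-gcc3": {"alias": "Darwin_x86-gcc3-u-i386-x86_64"},
    },
}

print(resolve_platform(blob, "Darwin_x86_64-gcc3"))  # Darwin_x86-gcc3-u-i386-x86_64
```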

(In reply to Ben Hearsum [:bhearsum] from comment #2)
> To make sure I understand correctly:
> * 3.6-3.6.12 will all go to 3.6.28 (because they need to watershed to get
> the new build targets).
> * 3.6.13-3.6.28 i386 will get 12.0
> * 3.6.13-3.6.27 ppc will get 3.6.28
> * 3.6.28 ppc will dead end
> 
> (Some of these might be handled in 998721 or one of the other bugs for
> setting up old release update paths, I just wanted to spell out the whole
> picture.)

That sounds right to me. And yeah, Windows will be different after bug 998721.
Added an alias for Darwin_x86-gcc3 to Firefox-29.0b8-build1-schema2 too.
I've updated the rule that was blocking updates for <3.6.13 to block <4.0 instead. Otherwise they get pointed at Firefox-12.0-build1, which is a schema 2 blob that 3.6.13-3.6.28 can't handle.

The other big problem we've had (worse than comment #0) is 
   auslib.errors:BadDataError: No such locale 'en-US' in platform 'Android_arm-eabi-gcc3' 
from the multilocale builds from Google Play. Note that they're querying with the en-US locale, which means we can't tell them apart from the single-locale builds, where we do want to offer an update. The updater should actually be off in the multilocale builds, since Google Play handles the updates there. See bug 800397 for the build side of this.

I've added two block rules, for Fennec with channel beta* and release*, and locale en-US. That should leave the single locale updates working on nightly/aurora.
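Here is a minimal sketch of how the two block rules partition requests. It assumes a trailing `*` in a channel behaves like a simple glob. The rule tuples and `is_blocked` are hypothetical and are not Balrog's rule format:

```python
from fnmatch import fnmatchcase

# Hypothetical representation of the two new block rules:
# (product, channel pattern, locale). A trailing '*' is treated as a glob.
BLOCK_RULES = [
    ("Fennec", "beta*", "en-US"),
    ("Fennec", "release*", "en-US"),
]


def is_blocked(product: str, channel: str, locale: str) -> bool:
    """Return True if any block rule matches this update request."""
    return any(
        product == rule_product
        and fnmatchcase(channel, channel_pattern)
        and locale == rule_locale
        for rule_product, channel_pattern, rule_locale in BLOCK_RULES
    )
```

The multilocale Google Play builds also report en-US. As a result, these rules block single-locale en-US builds on beta and release as well. That is only acceptable because Google Play delivers the updates on those channels.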
Those changes have brought down the error rate from ~0.4% to ~0.01%. The residuals are:
* invalid literal for int() with base 10: '1?avast=1', from malformed requests like ?force=1?avast=1, I think
* ("Can't find platform '%s'", u'Darwin_Universal-gcc3'), not blocking beta requests for < 4.0
* locales not found for deprecated locales, eg mn, or bogus ones, eg ru-RU, de-DE
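The first residual is easy to reproduce with the standard library. Everything after the first `?` is the query string, so the second `?` ends up inside the value of `force`. Below is a minimal sketch of defensive parsing. `force_flag` is hypothetical, and treating a malformed value as "not forced" is an assumption. It is not necessarily how AUS handles it:

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def force_flag(url: str) -> Optional[int]:
    """Return the integer value of ?force=, or None if it is absent or malformed."""
    params = parse_qs(urlsplit(url).query)
    raw = params.get("force", [None])[0]
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # force=1?avast=1 parses as the single value '1?avast=1'
        return None


url = (
    "/update/3/Firefox/34.0/20141125180439/Darwin_x86_64-gcc3-u-i386-x86_64"
    "/tr/release-localtest/default/default/default/update.xml?force=1?avast=1"
)
print(parse_qs(urlsplit(url).query))  # {'force': ['1?avast=1']}
print(force_flag(url))  # None
```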
(In reply to Nick Thomas [:nthomas] from comment #6)
> Those changes have brought down the error rate from ~0.4% to ~0.01%. The
> residuals are
> * invalid literal for int() with base 10: '1?avast=1', from malformed
> requests like ?force=1?avast=1 I think

This got fixed in bug 1141513. However, there appear to be other errors getting grouped into this bucket on newrelic. E.g., it still shows up as recent on https://rpm.newrelic.com/accounts/263620/applications/5929969/traced_errors, but when I dive into it, it shows me https://rpm.newrelic.com/accounts/263620/applications/5929969/traced_errors/3412816268, which is actually "ValueError: invalid version number '4.0b8pre'".

The fact that https://aus4.mozilla.org/update/3/Firefox/34.0/20141125180439/Darwin_x86_64-gcc3-u-i386-x86_64/tr/release-localtest/default/default/default/update.xml?force=1?avast=1 returns an update gives me confidence that it's newrelic acting funny here, and not an incomplete patch.
I've also noticed weirdness about the way it aggregates errors, maybe because they're all BadDataErrors and it doesn't have much of a chance.

I see we've got a lot of errors 
 ("Can't find platform '%s'", u'WINNT_x86_64-msvc')
on urls like
 /update/3/Firefox/37.0/20150319212106/WINNT_x86_64-msvc/en-US/beta/Windows_NT%206.3.0.0%20(x64)/default/default/update.xml 
Which is odd; that should end up at Firefox-37.0b7-build1, which has that build target. Most of them seem to come from 37.0b7 (20150319212106) itself.
(In reply to Nick Thomas [:nthomas] from comment #8)
> I've also noticed weirdness about the way it aggregates errors, maybe
> because they're all BadDataErrors and it doesn't have much of a chance.
> 
> I see we've got a lot of errors 
>  ("Can't find platform '%s'", u'WINNT_x86_64-msvc')
> on urls like
>  /update/3/Firefox/37.0/20150319212106/WINNT_x86_64-msvc/en-US/beta/
> Windows_NT%206.3.0.0%20(x64)/default/default/update.xml 
> Which is odd, that should end up at Firefox-37.0b7-build1, which has that
> build target. Most of them seem to be 37.0b7 (20150319212106) itself.

I'm guessing that's because we pushed 37.0 RC1 to beta already, which doesn't have a win64 build. I went ahead and created an extra rule for WINNT_x86_64-msvc to give it 37.0b7 again. We'll need to delete it when 38.0b1 ships.
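The temporary fix works because a more specific rule can outrank the general beta rule. Here is a minimal sketch, assuming the highest-priority matching rule wins. The `Rule` fields and priorities are hypothetical, and so is the `Firefox-37.0-build1` mapping name for RC1:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    """Hypothetical, simplified update rule; build_target None matches any target."""
    channel: str
    mapping: str
    priority: int
    build_target: Optional[str] = None


def pick_mapping(rules: List[Rule], channel: str, build_target: str) -> Optional[str]:
    """Return the mapping of the highest-priority rule that matches, if any."""
    candidates = [
        r for r in rules
        if r.channel == channel and r.build_target in (None, build_target)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.priority).mapping


rules = [
    Rule(channel="beta", mapping="Firefox-37.0-build1", priority=90),
    # temporary win64 rule; delete when 38.0b1 ships
    Rule(channel="beta", mapping="Firefox-37.0b7-build1", priority=95,
         build_target="WINNT_x86_64-msvc"),
]
```

Deleting the extra rule once 38.0b1 ships would send win64 beta users back through the general rule.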
Our error rate is typically < 0.1%, time to repress some perfectionist tendencies.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED