AUS needs to figure out latestCompleteBuild on a per-locale basis

RESOLVED FIXED

Status

AUS Graveyard
General
P2
normal
RESOLVED FIXED
9 years ago
8 years ago

People

(Reporter: Alexander L. Slovesnik, Assigned: nthomas)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

9 years ago
STR: Try to update Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.3a1pre) Gecko/20090918 Minefield/3.7a1pre ID:20090918030625

Expected results: Firefox is updated
Actual results: No updates found

Error console results (with app.update.log.all=true): 
AUS:UI gUpdates:onLoad - setting current page to startpage checking
AUS:SVC Checker:getUpdateURL - update URL: https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090918030625/Linux_x86-gcc3/ru/nightly/Linux%202.6.28-15-generic%20(GTK%202.16.1)/default/default/update.xml?force=1
AUS:SVC Checker:checkForUpdates - sending request to: https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090918030625/Linux_x86-gcc3/ru/nightly/Linux%202.6.28-15-generic%20(GTK%202.16.1)/default/default/update.xml?force=1
AUS:SVC Checker:onLoad - request completed downloading document
AUS:SVC Checker:onLoad - number of updates available: 0
AUS:UI gCheckingPage:onCheckComplete - no update found
AUS:UI gNoUpdatesPage:onPageShow - could not select an appropriate update. Either there were no updates or |selectUpdate| failed

From IRC chat with armenzg:

Unghost: I am sorry but everything seems to be fine with the RelEng side
[22:25:41]	this might be an AUS2 server bug
[22:26:22]	since you can see some locales like en-US and de https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090918030625/Linux_x86-gcc3/de/nightly/Linux%202.6.28-15-generic%20%28GTK%202.16.1%29/default/default/update.xml?force=1
[22:26:34]	https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090918030625/Linux_x86-gcc3/de/nightly/Linux%202.6.28-15-generic%20%28GTK%202.16.1%29/default/default/update.xml?force=1
[22:26:53]	you can file a bug about it
[22:27:28]	probably nthomas who will wake up in few hours might be able to debug it better than I do
[22:27:37]	or it might just be fixed in tomorrow's run
[22:33:42]	the snippets are physically on the server
[22:33:49]	<armenzg>	with the right contents
[22:34:18]	http://pastebin.mozilla.org/672232
[22:38:40]	<armenzg>	it is not the case since they have the same permissions of en-US
[22:38:45]	and de
[22:39:41]	I don't know how to continue debugging
[22:39:59]	<armenzg>	your best chance is that nthomas thinks of something else that I have not thought of it
The snippets are there and the partial MAR files are there, but they are not being served by AUS2.
(Assignee)

Comment 2

9 years ago
When I was looking (a wee while ago) only these locales were getting an update (for Linux mozilla-central):
 af ar as be bg bn-BD bn-IN ca cs cy da de el en-GB en-US eo es-AR es-ES es-MX 
 et eu fa fi it kn
which is a very strong overlap with the directories in /opt/aus2/incoming/2/Firefox/mozilla-central/Linux_x86-gcc3/20090921031032 (ie also a partial set of all-locales). What I think is happening is that AUS2 has to figure out the directory for the latest complete update, and that doesn't know about locales, so it gets it wrong.

Recall that for a 20090921HHMMSS build the partial patch generator creates
  20090920HHMMSS/partial.txt, complete.txt with genuine update information
  20090921HHMMSS/partial.txt, complete.txt which are empty files
To make sure we always offer the latest complete on the nightly channel, AUS2 has to figure out which directory that snippet is in, which it does here
  http://mxr.mozilla.org/mozilla/source/webtools/aus/xml/inc/patch.class.php#433
It's creating a reverse sorted list of directories and taking the 2nd entry of the list to get the complete (timestamps are not relevant, only directory names).

So now imagine the Firefox/$BRANCH/$PLATFORM/20090921HHMMSS directory comes along when the partial for the en-US build is generated. AUS2 will now look in the 20090920HHMMSS directory for the complete update. That works fine for en-US because the snippets there are non-zero size, but it will fail for every other locale, at least until that locale's own partial is generated. The problem was particularly obvious today because the partial generator processed the Linux m-c locales as far as fi, then switched to mozilla-1.9.2 Linux and so on, and only just got back to processing m-c (it's working through those as I write).
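
For illustration, the failure mode described above can be reproduced with a small sketch. This is not the code in patch.class.php: the function names are made up, the snippet store is modelled as a hypothetical in-memory array (build directory => locale => size of complete.txt), and the build IDs are invented.

```php
<?php
// Illustrative sketch only, not the code in patch.class.php.
// Hypothetical snippet store: build directory => locale => size of complete.txt.
// en-US has been processed for the 0921 build; ru has not.
$store = [
    '20090919030000' => ['en-US' => 512, 'ru' => 512],
    '20090920030000' => ['en-US' => 512, 'ru' => 0],
    '20090921030000' => ['en-US' => 0],
];

// Behaviour as described in this comment: reverse-sort the directory names
// and take the second entry, without looking at the locale at all.
function latestCompleteIgnoringLocale(array $store): ?string
{
    // PHP turns numeric-looking array keys into integers, so cast back to strings.
    $dirs = array_map('strval', array_keys($store));
    rsort($dirs, SORT_STRING);
    return $dirs[1] ?? null;
}

// A locale is offered an update only if its complete.txt in that
// directory exists and is non-empty.
function hasUpdate(array $store, string $locale): bool
{
    $dir = latestCompleteIgnoringLocale($store);
    return $dir !== null && ($store[$dir][$locale] ?? 0) > 0;
}

var_dump(hasUpdate($store, 'en-US'));
var_dump(hasUpdate($store, 'ru'));
```

In this model the ru snippet in the 20090919030000 directory is perfectly usable, but it is never consulted because the directory is chosen before the locale is considered.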
(Assignee)

Comment 3

9 years ago
So there's more than one problem here. Partly it's due to us taking many hours to generate all the partial updates, and coop is addressing that by moving the partial generation into the nightly build process. But I think AUS also needs to look for the latest complete on a per-locale basis. Say a locale doesn't build for a couple of days - we still want people to update to the latest build that did, rather than get no update at all until it's fixed. I'll have a quick play on the staging server, maybe I'll get lucky.
Summary: AUS doesn't serve updates for some Firefox 3.7 localized night builds → AUS needs to figure out latestCompleteBuild on a per-locale basis
(Assignee)

Comment 4

9 years ago
Created attachment 402013 [details] [diff] [review]
A fix

This works on aus2-staging, but I don't claim any PHP expertise so please run a careful eye over it.
Assignee: morgamic → nthomas
Status: NEW → ASSIGNED
Attachment #402013 - Flags: review?(morgamic)
(Assignee)

Comment 5

9 years ago
Comment on attachment 402013 [details] [diff] [review]
A fix

This will fail if there is a gap between builds, revoking r?.
Attachment #402013 - Flags: review?(morgamic)
Nick - I missed this, will review tomorrow.
(Assignee)

Comment 7

9 years ago
Created attachment 403993 [details] [diff] [review]
A better fix

This works better in situations where there is no locale for one or more builds, and still gives the locale-specific latest build.

The problem is we're deleting the mar files after a few days to save on disk space on ftp.m.o, which interacts badly when a build breaks and/or an l10n team doesn't have time to make fixes. The current example is that building win32 l10n for mozilla-central is broken in the build system; the last successful build was 20090921. Do we want to update people using older than 0921 to 0921, or is it somewhat pointless to be slightly less out of date? IIRC the app gets confused if you point it at a 404, so that may be the thing we want to dodge most. Perhaps we can avoid that by limiting that loop to some low-ish max-value, but that creates a hidden dependency between AUS and the cleanup cron job.

Not worth morgamic's precious review time until I have a firm opinion, and get more than a single test done.
(Assignee)

Updated

9 years ago
Duplicate of this bug: 529624
(Assignee)

Updated

8 years ago
Duplicate of this bug: 547263
Nick: will you have any time to do more testing this week? I'm sure this is going to come up at the l10n-releng meeting today.

Comment 11

8 years ago
FWIW, I'm seeing updates trickling in now, though Minefield can't verify the integrity of today's minor update.

Comment 13

8 years ago
(In reply to comment #7)
> Created an attachment (id=403993) [details]
> A better fix
> 

Just a nit,
>+            for ($i = 0; $i <= count($files); $i++) {
should be
>+            for ($i = 0; $i < count($files); $i++) {

Works the same either way though, since empty() will not complain at any error/warning level.
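
A quick way to convince yourself of that, assuming the loop body guards each entry with empty() as the foreach versions do. The function names and the directory names below are made up for the demonstration:

```php
<?php
// Demonstration only; function names and build IDs are hypothetical.

// Loop bound written with <=, so the final pass reads one index past the end.
// empty() returns true for the missing index without raising a notice or warning.
function collectWithInclusiveBound(array $files): array
{
    $seen = [];
    for ($i = 0; $i <= count($files); $i++) {
        if (!empty($files[$i])) {
            $seen[] = $files[$i];
        }
    }
    return $seen;
}

// Same loop with the conventional < bound.
function collectWithExclusiveBound(array $files): array
{
    $seen = [];
    for ($i = 0; $i < count($files); $i++) {
        if (!empty($files[$i])) {
            $seen[] = $files[$i];
        }
    }
    return $seen;
}

$files = ['20090921030000', '20090920030000', '20090919030000'];
var_dump(collectWithInclusiveBound($files) === collectWithExclusiveBound($files));
```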

Comment 14

8 years ago
(In reply to comment #7)
> Created an attachment (id=403993) [details]
> A better fix
> 

Well, 2 more things.

1. Need to update this comment at line 441:
>           // To get this build id, we sort the directory listing by number and retrieve $files[1].

2. The rest of the file uses foreach() instead of for(). So you could do this:

> +            // Return the directory with the non-empty complete.txt, which is the latest available build
> +            foreach ($files as $filename) {
> +                if (!empty($filename) && is_numeric($filename)) {
> +                    $testPath = $dir.'/'.$filename.'/'.$locale.'/complete.txt';
> +                    if (file_exists($testPath) && filesize($testPath) > 0) {
> +                        return $filesname;
> +                    }
> +                }
> +            }

Or, if limit checking on $i:

> +            // Return the directory with the non-empty complete.txt, which is the latest available build
> +            foreach ($files as $i=>$filename) {
> +                if (!empty($filename) && is_numeric($filename)) {
> +                    $testPath = $dir.'/'.$filename.'/'.$locale.'/complete.txt';
> +                    if (file_exists($testPath) && filesize($testPath) > 0) {
> +                        return $filename;
> +                    }
> +                }
> +                // insert some limit checking on $i
> +            }
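
Pulled out into a standalone function, the limit-checked variant might look like the sketch below. The function name, its parameters, and the default of 10 for $maxDirs are all made up for illustration; none of them come from patch.class.php.

```php
<?php
// Sketch only: the function name, parameters and $maxDirs default are
// illustrative and do not come from patch.class.php.
function latestCompleteForLocale(string $dir, string $locale, int $maxDirs = 10): ?string
{
    // Directory names in descending order, so the newest build ID comes first.
    $files = scandir($dir, SCANDIR_SORT_DESCENDING);
    if ($files === false) {
        return null;
    }

    $checked = 0;
    foreach ($files as $filename) {
        // Skips '.', '..' and anything else that is not a numeric build ID.
        if (empty($filename) || !is_numeric($filename)) {
            continue;
        }
        // Limit checking: give up after looking at $maxDirs build directories.
        if (++$checked > $maxDirs) {
            break;
        }
        // Return the first directory with a non-empty complete.txt for this locale.
        $testPath = $dir.'/'.$filename.'/'.$locale.'/complete.txt';
        if (file_exists($testPath) && filesize($testPath) > 0) {
            return $filename;
        }
    }
    return null;
}
```

A caller would pass the platform directory and the requested locale, and treat a null result as "no complete update to offer".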

Comment 15

8 years ago
s/$filesname/$filename/
sorry
(Assignee)

Comment 16

8 years ago
Going to revisit this soon.
Priority: -- → P2
(Assignee)

Comment 17

8 years ago
My concerns in comment #7 about deleting mar files have been resolved by work in bug 596391, where we remove l10n snippets from AUS before the mar files are removed from ftp.

So attachment 403993 [details] [diff] [review] needs the PHP fixes mentioned above, plus a test written, then up for review by someone like rhelmer.
(Assignee)

Comment 18

8 years ago
Created attachment 497193 [details] [diff] [review]
Patch with tests

Rob, the problem on this bug is that we attempt to figure out the latest build for all locales based on the en-US data. Right now we end up disabling all updates for a locale until there is a localized nightly based on the latest en-US one. Now that we generate updates for nightlies on slaves this is a fairly small interval, unless the locale is broken for some reason. 

This patch resolves both problems by digging in the data store to find the last complete.txt for a specified locale, that is, the one that actually points to a mar file. You may recall we create empty files for the current build (for reasons now buried with the nightly partial generator on prometheus). It also adds tests, which work on my instance of a test harness.
Attachment #402013 - Attachment is obsolete: true
Attachment #403993 - Attachment is obsolete: true
Attachment #497193 - Flags: review?(robert)
Attachment #497193 - Flags: review?(robert) → review+
(Assignee)

Comment 19

8 years ago
Created attachment 497634 [details] [diff] [review]
Patch with tests, and actual test files

Carrying over review. Checked in:

File 	Rev
mozilla/webtools/aus/tests/Verify.txt 	1.18
mozilla/webtools/aus/tests/data/2/Synthetic/2.0.0.x/platform/1100000001/otherlocale/complete.txt 	1.1
mozilla/webtools/aus/tests/data/2/Synthetic/2.0.0.x/platform/1100000001/otherlocale/partial.txt 	1.1
mozilla/webtools/aus/tests/data/2/Synthetic/2.0.0.x/platform/1100000002/otherlocale/complete.txt 	1.1
mozilla/webtools/aus/tests/data/2/Synthetic/2.0.0.x/platform/1100000002/otherlocale/partial.txt 	1.1
mozilla/webtools/aus/xml/inc/patch.class.php 	1.25
Attachment #497634 - Flags: review+
(Assignee)

Comment 20

8 years ago
Green on Hudson (https://hudson.mozilla.org/job/AUSv2/77/) after rhelmer fixed it up to pull any changes to Verify.txt into fitnesse. 

Tagging:
$ cd mozilla/webtools/aus/xml
$ cvs tag AUS2_RTM_201012141526
T README
T flush.php
T htaccess.dist
T index.php
T status.php
T bug602275/please_reinstall.xml
T inc/aus.class.php
T inc/config-dist.php
T inc/config-test.php
T inc/init.php
T inc/memcaching.php
T inc/patch.class.php
T inc/update.class.php
T inc/xml.class.php

$ cvs tag AUS2_PRODUCTION
W inc/patch.class.php : AUS2_PRODUCTION already exists on version 1.24 : NOT MOVING tag to version 1.25
$ cvs tag -F AUS2_PRODUCTION inc/patch.class.php
T inc/patch.class.php

I'll file a deployment bug.
(Assignee)

Updated

8 years ago
Depends on: 619208
(Assignee)

Comment 21

8 years ago
Fixed. For example, linux ca mozilla-1.9.2 was last built on Dec 13 and we weren't offering any updates at all; now we offer the build from the 13th.

https://aus2.mozilla.org/update/1/Firefox/3.6.14pre/20101212033253/Linux_x86-gcc3/ca/nightly/update.xml
<?xml version="1.0"?>
<updates>
    <update type="minor" version="3.6.14pre" extensionVersion="3.6.14pre" buildID="20101213033254">
        <patch type="complete" URL="http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/12/2010-12-13-03-mozilla-1.9.2-l10n/firefox-3.6.14pre.ca.linux-i686.complete.mar" hashFunction="sha512" hashValue="8ccee0662a9638c263ff11482f5f57d790a7ccef832eb2d207ab7536fa2f73b2c3dc10215c68dd605a24f913d8be710cb02f9618de799e8e0a0f2de9f18348e9" size="10528431"/>
        <patch type="partial" URL="http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/12/2010-12-13-03-mozilla-1.9.2-l10n/firefox-3.6.14pre.ca.linux-i686.partial.20101212033253-20101213033254.mar" hashFunction="sha512" hashValue="5f8691ffa0e383d3ad435008f3cb557a2622c7acc472110622f49df459cc6285d6ce64db0b61ca4cbda1f27c6c3cecadcc3ce83cc8d9772653addf8dcafe87b6" size="293501"/>
    </update>
</updates>
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED