Closed Bug 511724 Opened 15 years ago Closed 15 years ago

20090818 l10n nightlies (mozilla-central) are not updating

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: joduinn)

The nightly job on prometheus-vm finished quite a while ago, but it seems these builds still cannot update. I've had a report of win32 de not working, and I personally just tested Mac en-GB and saw the same behaviour. Here's a dump from the console:
bitters-2:MacOS bhearsum$ *** AUS:SVC UpdateService:canUpdate - testing /Users/bhearsum/Desktop/Minefield.app/Contents/MacOS/update.test
*** AUS:SVC UpdateService:canUpdate - testing /Users/bhearsum/Desktop/Minefield.app/Contents/MacOS/updates/0/update.test
*** AUS:SVC UpdateService:canUpdate - able to update
*** AUS:SVC General:readStringFromFile - file doesn't exist: /Users/bhearsum/Desktop/Minefield.app/Contents/MacOS/updates/0/update.status
*** AUS:SVC General:readStatusFile - status: null, path: /Users/bhearsum/Desktop/Minefield.app/Contents/MacOS/updates/0/update.status
*** AUS:SVC UpdateService:_postUpdateProcessing - no status, no update
*** AUS:SVC UpdateManager:_loadXMLFileIntoArray: XML file does not exist
*** AUS:SVC General:getLocale - getting locale from file: /Users/bhearsum/Desktop/Minefield.app/Contents/MacOS/update.locale, locale: en-GB
*** AUS:SVC Checker:getUpdateURL - update URL: https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090818035827/Darwin_Universal-gcc3/en-GB/nightly/Darwin%209.8.0/default/default/update.xml?force=1
*** AUS:SVC Checker:checkForUpdates - sending request to: https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090818035827/Darwin_Universal-gcc3/en-GB/nightly/Darwin%209.8.0/default/default/update.xml?force=1
*** AUS:SVC Checker:onLoad - request completed downloading document
*** AUS:SVC Checker:onLoad - number of updates available: 0
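The last log line reports zero available updates. As a rough illustration of what that count means, the sketch below counts `<update>` elements in an update.xml body. Both response bodies are hypothetical samples, not captured from aus2.mozilla.org, and the element layout is an assumption about the AUS response format.

```python
import xml.etree.ElementTree as ET


def count_updates(xml_text):
    """Return the number of <update> children under the <updates> root."""
    root = ET.fromstring(xml_text)
    return len(root.findall("update"))


# Hypothetical response bodies (assumed format, not real server output).
EMPTY_RESPONSE = '<?xml version="1.0"?><updates></updates>'
ONE_UPDATE_RESPONSE = (
    '<?xml version="1.0"?>'
    '<updates><update type="minor">'
    '<patch type="complete" URL="http://example.invalid/complete.mar"/>'
    '</update></updates>'
)

if __name__ == "__main__":
    print(count_updates(EMPTY_RESPONSE))
    print(count_updates(ONE_UPDATE_RESPONSE))
```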

AUS looks ok:
[bhearsum@dm-ausstage01 en-GB]$ pwd 
/opt/aus2/incoming/2/Firefox/mozilla-central/Darwin_Universal-gcc3/20090818035827/en-GB
[bhearsum@dm-ausstage01 en-GB]$ ls -l
total 8
-rw-r--r-- 1 ffxbld users 388 Aug 20 11:05 complete.txt
-rw-r--r-- 1 ffxbld users 414 Aug 20 11:06 partial.txt

Not sure what's going on here.
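One way to double-check that the directory on dm-ausstage01 corresponds to the URL the client requested is to split both into fields and compare them. This is a minimal sketch: the field names and the assumed directory layout (`<product>/<branch>/<build_target>/<build_id>/<locale>`) are inferred from the URL and path shown above, not taken from AUS documentation.

```python
from urllib.parse import urlsplit, unquote

# Field order inferred from the update URL in the console dump; names are illustrative.
URL_FIELDS = ("product", "version", "build_id", "build_target", "locale",
              "channel", "os_version", "distribution", "distribution_version")


def parse_update_url(url):
    """Split an update URL into named fields (assumed layout)."""
    parts = urlsplit(url).path.strip("/").split("/")
    # Expected: ["update", "3", <nine fields>, "update.xml"]
    if len(parts) != 12 or parts[0] != "update" or parts[-1] != "update.xml":
        raise ValueError("unexpected update URL layout: %s" % url)
    return dict(zip(URL_FIELDS, (unquote(p) for p in parts[2:-1])))


def parse_incoming_path(path):
    """Read the last five components of an incoming directory (assumed layout)."""
    keys = ("product", "branch", "build_target", "build_id", "locale")
    return dict(zip(keys, path.rstrip("/").split("/")[-5:]))


def mismatches(url, path):
    """Return the shared fields whose values differ between URL and path."""
    u, p = parse_update_url(url), parse_incoming_path(path)
    return [k for k in ("product", "build_target", "build_id", "locale")
            if u[k] != p[k]]


URL = ("https://aus2.mozilla.org/update/3/Firefox/3.7a1pre/20090818035827/"
       "Darwin_Universal-gcc3/en-GB/nightly/Darwin%209.8.0/default/default/"
       "update.xml?force=1")
PATH = ("/opt/aus2/incoming/2/Firefox/mozilla-central/"
        "Darwin_Universal-gcc3/20090818035827/en-GB")

if __name__ == "__main__":
    print(mismatches(URL, PATH))
```

An empty list means the build ID, build target, product and locale agree, which is consistent with "AUS looks ok" and suggests the problem is not a simple path mismatch.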
This looks like the same problem I was having with Mac builds (bug 510080).
We never actually fixed anything but they just started working as expected.

morgamic, any idea how we could debug this?
Taking for now, but I don't know how to debug things in AUS.
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Depends on: 480081, 510524
Priority: -- → P3
Talked with Armen, now taking for investigation. 

seth/axel: have you heard anything about this from localizers - and if so, on what o.s.?
Assignee: armenzg → joduinn
OS: Mac OS X → All
Hardware: x86 → All
To clarify: the updates are eventually served, it just seems like some locales/o.s. have a long delay before the available updates are detected. Once detected, the updates are served as usual.
Severity: major → normal
I know that my mac german updates aren't in 'til the next morning, or at least I've stopped looking for them earlier.

I don't expect that we can really understand this until we understand what the update machine does.

We do have intermittent build failures, too. I don't see any right now, but I recall that Armen and I saw at least one platform failing for 3 or 4 locales for "something" the other day. There's a bug on file to tweak nagios to complain about any build older than ... instead of just no build newer than ... .
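The difference between the two nagios checks mentioned above can be sketched as follows. This is only one reading of the comment: the function names are hypothetical, the real check is not shown in this bug, and `max_age` is a placeholder because the actual threshold is elided in the comment.

```python
from datetime import datetime, timedelta


def no_build_newer_than(build_times, now, max_age):
    """Existing-style check: alert only when every build is older than max_age."""
    return all(now - t > max_age for t in build_times.values())


def stale_builds(build_times, now, max_age):
    """Proposed-style check: list every locale whose build is older than max_age."""
    return sorted(k for k, t in build_times.items() if now - t > max_age)


if __name__ == "__main__":
    now = datetime(2009, 8, 20, 12, 0)
    max_age = timedelta(hours=24)  # placeholder threshold, not from the bug
    builds = {
        "de": now - timedelta(hours=2),      # fresh
        "en-GB": now - timedelta(hours=50),  # stale
    }
    print(no_build_newer_than(builds, now, max_age))
    print(stale_builds(builds, now, max_age))
```

With one fresh locale and one stale locale, the first check stays quiet while the second names the stale locale, which is the kind of per-locale failure described in the comment.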
(In reply to comment #5)
> I know that my mac german updates aren't in 'til the next morning, or at least
> I've stopped looking for them earlier.
Delays like this are expected right now. So long as you do get updates at some point during the day, we're calling that GOODENOUGHFORNOW. Trimming down which branches we produce l10n nightly updates on would really help reduce the load, and hence reduce the time delay you mention. I know we talked about that earlier this week - is there a bug tracking that?


> I don't expect that we can really understand this until we understand what the
> update machine does.
Yeah, it's a complex ball-of-string. Coop and Armen are working on that, as part of moving from the one dedicated update machine to queued jobs on the pool of slaves. That work is already tracked in separate bugs.


> We do have intermittent build failures, too. I don't see any right now, but I
> recall that Armen and I saw at least one platform failing for 3 or 4 locales
> for "something" the other day. There's a bug on file to tweak nagios to
> complain about any build older than ... instead of just no build newer than ... .
Do you have any more details, or do we wait to see if it happens again?
(In reply to comment #6)
> (In reply to comment #5)
> > I know that my mac german updates aren't in 'til the next morning, or at least
> > I've stopped looking for them earlier.
> Delays like this are expected right now. So long as you do get updates at some
> point during the day, we're calling that GOODENOUGHFORNOW. Trimming down which
> branches we produce l10n nightly updates on would really help
> reduce the load, and hence reduce the time delay you mention. I know we talked
> about that earlier this week - is there a bug tracking that? 
> 
Late updates will be fixed with bug 511967.

> 
> > I don't expect that we can really understand this until we understand what the
> > update machine does.
> Yeah, it's a complex ball-of-string. Coop and Armen are working on that, as part
> of moving from the one dedicated update machine to queued jobs on the pool of
> slaves. That work is already tracked in separate bugs.
>
I am working on prioritizing en-US over other locales (bug 511901). As a side effect, I have created a temporary Google group (http://groups.google.com/group/mozilla-nightly-updates/topics) where prometheus-vm will post, as it processes updates, that they have been pushed live. This will tell us when an update was pushed live and let us expose more of what is going on (even though I would rather wait for the updates-pushed-live-by-slaves approach than improve the current patch-packager.pl).
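The idea of processing en-US before other locales can be illustrated with a stable sort. This is a hypothetical sketch only; how bug 511901 actually changes the update processing is not described in this bug, and the `prioritize` helper is invented for illustration.

```python
def prioritize(locales, first="en-US"):
    """Move `first` to the front of the queue.

    sorted() is stable, so all other locales keep their original order.
    """
    return sorted(locales, key=lambda loc: loc != first)


if __name__ == "__main__":
    print(prioritize(["de", "en-GB", "en-US", "fr"]))
```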

> 
> > We do have intermittent build failures, too. I don't see any right now, but I
> > recall that Armen and I saw at least one platform failing for 3 or 4 locales
> > for "something" the other day. There's a bug on file to tweak nagios to
> > complain about any build older than ... instead of just no build newer than ... .
> Do you have any more details, or do we wait to see if it happens again?
The bug wrt tweaking nagios is bug 511696.

Some intermittent build failures are things like bug 511953, in which a slave starts throwing l10n repackage jobs because it needs a clobber of one of the needed repos.

I also can't recall what that "something" was (too much good action has happened in the last 2 weeks - I feel that one day we will look back at August 2009 and say: "do you remember the good old days when so many bugs appeared and disappeared in our l10n-releng infrastructure, making our days fun? - yeah man, we really fixed many bugs" ;)
Closing, as there's nothing to do here. We do generate nightly updates for l10n, but AUS sometimes does not serve an update that is correctly in place. The fix for this is being done in bug 517947, bug 511967, and bug 511696.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering