Closed
Bug 524798
Opened 15 years ago
Closed 15 years ago
Firefox 3.6, 3.7 l10n repacks on push not building since oct 4th
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Pike, Assigned: coop)
Details
(Whiteboard: [l10n])
Both on mozilla-central and on 1.9.2, the repacks on change don't seem to pay attention to pushes. I don't see any builds after Oct 4th for any of the "... build" builders.
Updated•15 years ago (Reporter)
Whiteboard: [l10n]
Comment 1•15 years ago
Verified; the builders are not picking up any changes. I have grepped the twistd logs and there is nothing obvious. Unless somebody else can pick this up (please do so), I can *probably* get to it today or tomorrow.
Comment 2•15 years ago (Assignee)
Grabbing for now, but Armen may take this from me tomorrow.
Assignee: nobody → ccooper
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 3•15 years ago (Assignee)
I'm seeing some timeouts in the logs. Not sure whether that would tank the whole scheduler or not.
2009-10-28 05:54:01-0700 [-] <HgLocalePoller for http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/ar>: polling failed, result Getting http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/ar/pushlog?fromchange=f211a1ffb498594947c7e1ab6a7a5d2c55066ea2 took longer than 30 seconds.
2009-10-28 05:54:01-0700 [-] Traceback (most recent call last):
2009-10-28 05:54:01-0700 [-] Failure: twisted.internet.defer.TimeoutError: Getting http://hg.mozilla.org/releases/l10n-mozilla-1.9.2/ar/pushlog?fromchange=f211a1ffb498594947c7e1ab6a7a5d2c55066ea2 took longer than 30 seconds.
2009-10-28 05:54:01-0700 [-] <HgLocalePoller for http://hg.mozilla.org/l10n-central/as>: polling failed, result
2009-10-28 05:54:01-0700 [-] Traceback (most recent call last):
2009-10-28 05:54:01-0700 [-] Failure: twisted.internet.error.TimeoutError: User timeout caused connection failure.
Comment 4•15 years ago (Assignee)
FWIW, repack-on-change seems to be working in staging: lots of builds since Oct 4th. Maybe we're seeing load issues on the production master?
Comment 5•15 years ago
We've had free slaves on pm if that's what you mean, but it's certainly busy since we added debug unittests and split mochitest-plain. There's also been one "Firefox mozilla-1.9.2 win32 l10n" build pending on pm for quite a while; perhaps that's wedging it?
Comment 6•15 years ago (Assignee)
(In reply to comment #5)
> We've had free slaves on pm if that's what you mean, but its certainly busy
> since we added debug unittests and split mochitest-plain. There's also been one
> "Firefox mozilla-1.9.2 win32 l10n" build pending on pm for quite a while,
> perhaps that's wedging it ?
I was being non-specific about "load" until I narrow it down. ;) I was specifically worried about network congestion here, due to the recent timeout when pulling an l10n pushlog (comment #3).
Comment 7•15 years ago (Reporter)
My suspicion is that there was a network problem at the time of the master start/config.
If there is a problem loading the l10n.ini's, the Dispatchers don't get added to the Scheduler, and thus it doesn't listen to the changes.
Not sure when the master got its last kicks.
Comment 8•15 years ago (Assignee)
(In reply to comment #7)
> My suspect is that there was a network problem at the time of the master
> start/config.
>
> If there is a problem loading the l10n.ini's, the Dispatchers don't get added
> to the Scheduler, and thus it doesn't listen to the changes.
>
> Not sure when the master got its last kicks.
AFAICT the last full restart happened on Sep 24, as witnessed by https://wiki.mozilla.org/ReleaseEngineering:Maintenance and the reported age of the buildbot master process on production-master. We've had many reconfigs since then though, including some on Oct 4-5. Would a single bad reconfig where the l10n.inis fail to load cause the problem to persist until the next restart?
Comment 9•15 years ago (Assignee)
I just scheduled some downtime for tomorrow (7am EDT) to restart this master and resurrect the scheduler. Axel: is there any way we could make this more robust? Multiple initial loading attempts? Periodic retries if there's nothing set up?
Comment 10•15 years ago (Reporter)
Yeah, probably there are ways to make this more robust. And it might be that l10n.ini failures don't recover on reconfig, but fixing that seems hard.
I didn't find a good way to report an error, fwiw: if there's something bad, you might just not have a builder to which you can hook an error message. So the best way to tell right now is to look at the waterfall and make sure that the l10n builders (the on-demand ones) report that the tree is configured.
I just cross-checked, the reconfigs of the other master got loaded.
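For illustration only, here is a minimal sketch of the "multiple initial loading attempts" idea raised in comment 9. It is not the buildbot or l10n scheduler code. `load_with_retries` and the `loader` callable are hypothetical names, and the sketch assumes a failed l10n.ini load surfaces as an `IOError`, which nothing in this bug confirms.

```python
import time


def load_with_retries(loader, attempts=3, delay=5.0, sleep=time.sleep):
    """Call loader() until it succeeds or the attempts are used up.

    loader   -- hypothetical callable that fetches and parses an l10n.ini
    attempts -- total number of tries (must be at least 1)
    delay    -- seconds to wait between tries
    sleep    -- injectable for tests; defaults to time.sleep
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except IOError as exc:  # assumption: load failures raise IOError
            last_error = exc
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed: re-raise so the failure is not silently dropped.
    raise last_error
```

Re-raising after the last attempt keeps the failure visible, which matters here because the underlying complaint is that a failed load left the scheduler silently inactive.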
Comment 11•15 years ago (Assignee)
(In reply to comment #10)
> I didn't find a good way to report an error, fwiw, as if there's something bad,
> you might just not have a builder to which you can hook an error message, so
> the best way to tell right now is to look at the waterfall and make sure that
> the l10n builders (the on-demand ones) report that the tree is configured.
Armen had a good suggestion about having any exceptions from the master twistd.log files mailed immediately to releng so things like this wouldn't go unnoticed for so long that we lose the logs needed to diagnose them. Any such process to pull out exceptions would have to be extremely lightweight though to avoid bogging down the already-slow masters. We generate twisted logs at quite a rate.
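As a rough sketch of what a lightweight exception extractor could look like, the generator below makes a single pass over log lines and yields only the "Failure:" entries. `iter_failures` is a hypothetical name, and the line format is an assumption based solely on the excerpt in comment 3; mailing the results to releng is left out.

```python
import re

# Assumed line shape, taken from the excerpt in comment 3:
#   2009-10-28 05:54:01-0700 [-] Failure: <exception class>: <message>
FAILURE_RE = re.compile(r"^(?P<stamp>\S+ \S+) \[[^\]]*\] Failure: (?P<error>.*)$")


def iter_failures(lines):
    """Yield (timestamp, error) pairs for each Failure line, one pass, no buffering."""
    for line in lines:
        match = FAILURE_RE.match(line.rstrip("\n"))
        if match:
            yield match.group("stamp"), match.group("error")
```

Because it is a generator over any iterable of lines, it can be fed an open file object directly and never holds more than one line in memory.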
Comment 12•15 years ago (Reporter)
Seems that coop restarted the master, but neither the central nor the 1.9.2 builders seem to indicate that the dispatchers went up. Can someone from releng attach the relevant twistd.log from that restart for reference and debugging?
Comment 13•15 years ago (Assignee)
Someone must have accidentally removed the symlink to l10nbuilds1.ini, probably on or around Oct 4. Re-creating the symlink got things working again.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
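Since the root cause was a missing symlink, a pre-start check along these lines might catch the same problem earlier. This is only a sketch: `missing_configs` is a hypothetical helper, and the directory layout of the master is not described in this bug, so the caller has to supply it. The only file name taken from the thread is l10nbuilds1.ini.

```python
import os


def missing_configs(directory, names):
    """Return the names from `names` that do not resolve to an existing file.

    os.path.exists follows symlinks, so a removed symlink and a dangling
    symlink are both reported as missing.
    """
    return [
        name for name in names
        if not os.path.exists(os.path.join(directory, name))
    ]
```

A caller could, for example, refuse to start or reconfig the master when `missing_configs(master_dir, ["l10nbuilds1.ini"])` is non-empty, where `master_dir` is whatever directory the master actually reads its config from.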
Updated•11 years ago
Product: mozilla.org → Release Engineering