All win32 l10n builds in http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central-l10n/ were last built on Aug 15th.
Build log ( http://tinderbox.mozilla.org/showlog.cgi?log=Mozilla-l10n-ru/1345300590.1345302212.8382.gz ) shows errors like:
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
Looks like the backout of this bug https://bugzilla.mozilla.org/show_bug.cgi?id=782981 causes problems with win32 Nightly l10n.
iirc :rail was looking into this yesterday, I don't recall if there was any outcome
Looks like it's been fixed. I see on ftp win32 Nightly l10n builds from Aug 19th.
Latest cs Win32 build (ZIP).
Built from http://hg.mozilla.org/mozilla-central/rev/86ee4deea55b
It looks like there is still an old version. No new updates are available.
Looks green to me:
Still seeing many timeouts on m-c. As in comment #0, final output in the logs always seems to be:
Adding file patch and add instructions to file 'update.manifest'
Timeout is currently set to 1200s.
All the failures are timing out in "make_partial_mar." The only repacks that are succeeding are for locales where we can't find the previous complete mar, so we don't even try to make a partial mar.
I tried re-running the "make_partial_mar" command on mw32-ix-slave06 after a failed repack to see whether we needed a slightly longer timeout, but the repack hadn't made any visible progress after 1 hour.
I should note that *all* the mw32 slaves have been rebooted since yesterday, and I've scheduled multiple clobbers. Repacks are still failing.
Hrmm, some of the clobbers failed too, but that doesn't stop the build.
Worse, manual clobbers of a few slaves (mw32-ix[02,03,06,26]) as Administrator haven't helped either. We're still timing out.
I did a reconfig this afternoon and whatever was blocking these builders seems to have become unblocked. We'll probably need to wait until tomorrow to get a full set of partial updates, but there will be some at least today.
Leaving open until I'm sure this is resolved.
(In reply to Chris Cooper [:coop] from comment #11)
> I did a reconfig this afternoon and whatever was blocking these builders
> seems to have become unblocked. We'll probably need to wait until tomorrow
> to get a full set of partial updates, but there will be some at least today.
Still broken AFAICT. We only managed to generate one partial successfully yesterday (it), and the timeout pattern has resumed this morning.
Looks like Linux and Mac builds also fail today.
Yeah, I noticed, I suspect that's something with running configure, and then updating to the code of the previous nightly. I hope those will just settle tomorrow.
(In reply to Axel Hecht [:Pike] from comment #14)
> Yeah, I noticed, I suspect that's something with running configure, and then
> updating to the code of the previous nightly. I hope those will just settle
> tomorrow.
Mac and linux will certainly be resolved by bug 785066. Limited testing in staging indicates that it *might* unblock Windows repacks as well.
I've got an update for my Nightly. I needed to download it manually, but it is working and stable.
Again, leaving this open until Monday to make sure we're green on both m-c and aurora again.
Still not OK, some builds are present some not.
(In reply to semtex2 from comment #18)
> Still not OK, some builds are present some not.
We're almost there. Aurora is back to normal, and about a third of m-c repacks failed yesterday. This could be due to certain repack slaves still needing a clobber: these slaves don't reboot very often, if at all, so they would have trouble clearing a wedged state on their own.
(In reply to Chris Cooper [:coop] from comment #19)
> We're almost there. Aurora is back to normal, and about a third of m-c
> repacks failed yesterday. This could be due to certain repack slaves still
> needing a clobber: these slaves don't reboot very often, if at all, so they
> would have trouble clearing a wedged state on their own.
We've been so long without nightlies that the repacks yesterday didn't even try to generate partial patches, that's why they were green.
So, if I get that right, we fail either in creating the patch (binary diff) for xul.dll - or the one after it (does make_partial_mar print the name of the file before or after working on it?).
Can we try in some place to re-enact this and maybe create more debug output from that script so we can get to the bottom of what exactly is failing or timing out there?
It feels to me like we changed xul.dll in some way that it either grew too large (so that make_partial_mar runs too long without output) or has something in it that the binary diff tools don't like.
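One way to get the extra debug output asked for above is to wrap the command and timestamp every line it prints, so the last line before a stall and the gap after it are visible in the log. This is only a minimal sketch: `run_with_timestamps` is a hypothetical helper, not part of the build automation, and it does not by itself answer whether make_partial_mar prints a file name before or after working on it.

```python
import subprocess
import sys
import time


def run_with_timestamps(cmd, out=sys.stdout):
    """Run cmd and prefix each output line with the seconds elapsed since start.

    Returns (exit_code, lines), where lines is a list of (elapsed, text).
    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    lines = []
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    )
    for raw in proc.stdout:
        elapsed = time.monotonic() - start
        text = raw.rstrip("\n")
        lines.append((elapsed, text))
        out.write(f"[{elapsed:8.1f}s] {text}\n")
        out.flush()  # make sure the line is logged before any stall
    return proc.wait(), lines
```

If the wrapped command hangs, the last timestamped line shows how far it got and when it went quiet.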
It's suspected that this may cause l10n repacks to fail once 17 merges to mozilla-aurora. Given that, nominating for tracking so that we check in again early in the week.
It'd be great to get a locally reproducible case for devs to look at, and a sample of the process that's timing out.
coop, if you take a failing working dir for m-c l10n, and copy in the mbsdiff executable from aurora (working, right?), does that resolve the issue diffing xul.dll?
If so I'd suggest bug 579517 is causing funkiness on windows
* the Makefile in other-licenses/bsdiff/ includes toolkit/mozapps/update/updater
* http://hg.mozilla.org/mozilla-central/rev/88e47f6905e9 landed on Aug 8 (but maybe didn't show up until a clobber took effect?)
KaiRo pointed out that this theory should affect en-US partials too, which I countered with 'but our minimized l10n build setup could be quite a different environment'. Callek also pointed out that we might be using older MSVC.
I see builds in https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora-l10n/ with differing timestamps. Does that mean the issue is now on Aurora as well?
Yes, I'm seeing lots of purple for win32 l10n aurora repacks now.
Sorry I haven't had a lot of time to spend on this so far this week. Between the release and pymake, it's been busy.
As I was debugging this, I noticed that the process creates a working dir called current.work/. I amended the rm_unpack_dirs step to also remove this working dir, and saw the following on future attempts:
rm: cannot remove `current.work/xul!1.pgc.patch': Permission denied
This error persists until the slave is rebooted, so I've also added a maybe_reboot step for l10n that reboots the slave after 5 jobs, just like everything else.
The slave can remove the current.work/ dir after the reboot, but the first repack just gets it wedged in the same way again. :/
Actually, is it in the end "just" a problem with the ! in that filename?
(In reply to Robert Kaiser (:email@example.com) from comment #27)
> Actually, is it in the end "just" a problem with the ! in that filename?
I can test that out, but as Ted indicates in https://bugzilla.mozilla.org/show_bug.cgi?id=785748#c2, the file shouldn't be there in the first place.
Created attachment 657034 [details] [diff] [review]
Add periodic reboot after l10n jobs, clean up current.work working
This should help prevent one errant slave from burning many l10n jobs (as) quickly.
Not sure that bug 785748 really blocks here, given that we actually generated l10n win32 partials on m-c last night. Still busted on aurora though.
This is now in production.
(In reply to Chris Cooper [:coop] from comment #28)
> (In reply to Robert Kaiser (:firstname.lastname@example.org) from comment #27)
> > Actually, is it in the end "just" a problem with the ! in that filename?
> I can test that out, but as Ted indicates in
> https://bugzilla.mozilla.org/show_bug.cgi?id=785748#c2, the file shouldn't
> be there in the first place.
I've tried this now without success. Partial patch generation still stalls on the .pgc file regardless of whether there are special chars in the filename.
I also tried subbing in working copies of mar and mbsdiff from m-c to aurora, but that didn't help. We still fail on the .pgc file.
Given those results, I went and looked at the unpacked directories for the complete mars on m-c since we're getting partial mars there again. The xul!1.pgc files are absent from those complete mars now, despite a lack of visible progress on bug 785748.
Did something land on m-c *after* the merge to aurora that would have fixed this on m-c? I'd like some help trying to track this down.
Created attachment 657472 [details] [diff] [review]
Remove .pgc files before creating the partial mar
This gets updates unblocked on aurora until we figure out why the .pgc files are being packaged in the first place.
Comment on attachment 657472 [details] [diff] [review]
Remove .pgc files before creating the partial mar
Got review from Aki on IRC after:
* escaping the wildcard
* adding the -print so we can see what gets deleted
* testing against Mac and Linux as well (we passed)
This is in production now.
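For anyone wanting to try the same cleanup locally, here is a minimal Python sketch of the idea: delete every .pgc file under an unpacked directory and print each path so it is clear what got removed. It is not the attached patch; `remove_pgc_files` and its arguments are hypothetical names used for illustration.

```python
import fnmatch
import os


def remove_pgc_files(root, pattern="*.pgc"):
    """Delete files under root matching pattern; return the removed paths.

    Hypothetical helper: a Python stand-in for the cleanup step,
    not the code that actually landed.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            os.remove(path)
            print(path)  # log every deletion so it shows up in the build log
            removed.append(path)
    return sorted(removed)
```

Because the pattern is passed as a Python string rather than through a shell, there is no wildcard to escape here, and a name such as xul!1.pgc matches like any other.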
Just checked https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-aurora-l10n/. The bug looks less pronounced, but these builds are from 9/4 while the rest are from 9/5. Can we call that fixed?
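To make the "differing timestamps" check less manual, a small sketch like the following flags entries that are older than the newest one in a directory listing. It assumes the listing has already been turned into a name-to-date mapping; `stale_builds` and the sample file names are hypothetical, not real entries from the ftp directory.

```python
from datetime import date


def stale_builds(listing):
    """Return names whose date is older than the newest date in listing.

    listing maps a file name to the date it was last modified.
    """
    if not listing:
        return []
    newest = max(listing.values())
    return sorted(name for name, built in listing.items() if built < newest)


# Made-up sample data for illustration only.
sample = {
    "cs.win32.zip": date(2012, 9, 4),
    "de.win32.zip": date(2012, 9, 5),
    "ru.win32.zip": date(2012, 9, 5),
}
print(stale_builds(sample))  # ['cs.win32.zip']
```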
I don't know if the current situation is related to this bug, but for the past 2 days random Linux and Win32 builds have been failing again. In any case, most of them are missing...
These appear to be two unrelated problems. Filed:
* win32 - bug 789838 - branding.nsi missing in win32 l10n builds
* linux32 - bug 789837 - fatal error: opening dependency file .deps/elf-dynstr-gc.pp in linux32 l10n builds