Closed Bug 1236259 Opened 7 years ago Closed 7 years ago

l10n builds fail to start with XML parsing error

Categories

(Firefox Build System :: General, defect)

defect
Not set
major

Tracking

(firefox46 fixed)

VERIFIED FIXED
Tracking Status
firefox46 --- fixed

People

(Reporter: theo148, Assigned: gps)

References

Details

Attachments

(1 file)

The 2016-01-02 l10n nightly builds don't appear to be starting correctly. Attempting to start them results in the following XML parsing error.

XML Parsing Error: undefined entity
Location: chrome://browser/content/browser.xul
Line Number 1415, Column 9:
        <toolbarbutton id="sync-tabs-menuitem2"
--------^

I can reproduce this with the 2016-01-02 win32 and win64 en-GB builds. The en-US builds start fine. Testing with a clean profile does not seem to make a difference.
Trying a couple of other builds, the win32 and win64 de builds also have this problem. The fr builds appear to work fine, however.
https://transvision.mozfr.org/string/?entity=browser/chrome/browser/browser.dtd:syncTabsMenu3.label&repo=central
Assignee: nobody → moz_en-gb
Component: Build Config → en-GB / English (United Kingdom)
Product: Firefox → Mozilla Localizations
QA Contact: moz_en-gb
Version: Trunk → unspecified
Moving back into build-config, as it effects multiple locales.

I don't have access to a windows VM right now, Theodore, can you test on other platforms?

Also, do you know which date was the last working build?
Assignee: moz_en-gb → nobody
Component: en-GB / English (United Kingdom) → Build Config
Product: Mozilla Localizations → Firefox
QA Contact: moz_en-gb
Blocks: 1236298
See Also: → 1236301
As far as I'm aware, the 2016-01-01 builds were working fine.

I've tested the en-US, en-GB, de, and fr builds from 2016-01-02 on 64 bit Linux. They all seem to start without a problem.
Italian works just fine on Win8.1 64bit (Firefox 46.0a1 20160102030217).

I tried the en-GB zip file and it's indeed broken.
Comparing
https://l10n.mozilla.org/dashboard/compare?run=561363

with the browser.dtd packaged in the Win32 zip, there are no merges from compare-locales. In the Linux build those strings are there.

Some builds are not failing just because those locales work on l10n-central, and have translated that string for Firefox 45.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I'm puzzled.

Looking at the build log from Jan 2nd, I see

compare-locales using:
c:\builds\moz2_slave\m-cen-w32-l10n-ntly-4-00000000\build\mozilla-central\obj-l10n\merged
installers-* using:
c:\builds\moz2_slave\m-cen-w32-l10n-ntly-4-00000000\build\mozilla-central\obj-l10n\merged/

December 26 shows the same paths, though.
c:\builds\moz2_slave\m-cen-w32-l10n-ntly-4-00000000\build\mozilla-central\obj-l10n\merged

CCing Ben and Rail, I wonder if anything regarding path handling changed on windows slaves?

Flod, can you try to do a repack on windows yourself?
(In reply to Axel Hecht [:Pike] from comment #7)
> Flod, can you try to do a repack on windows yourself?

I don't have any Windows machines or VMs set-up with Mozilla code and repositories, and I tried only a couple of times on Linux, so I don't think I can help with that.
'k, no problem.

I also think that we're probably best of testing this on the actual automation slaves.

I can't nail anything to a code change, moving this to releng for further investigation.

Testing I'd like to see. I personally won't be back at work in full until Tuesday, so releng help would be great:

Do a repack of the Jan 2nd win32 Nightly for en-GB, verify that it fails.
Do a repack of the Dec 30 win32 Nightly for en-GB, test.

If that fails, we're down in Windows changes land, if that succeeds, we need a shorter regression range.

I'd also investigate if the merge files are where we expect them, and if they have the content that's expected.

Another thing I'd try is to see if removing the trailing '/' on windows would help.

Another thing I'm curious to see tested is if our aurora builds run successfully. The Fxfn tests pass there for the few locales we have going, sadly we don't run those at all on central.
Severity: normal → major
Component: Build Config → General Automation
Product: Firefox → Release Engineering
QA Contact: catlee
As far as I tested win64 Nightly for ja, 2016-01-01 is good and 2016-01-02 is bad.
CCing Rob, maybe related to our switch AWS?
Just confirming that the win64 de and en-GB nightlies for 2016-01-03 still show the same issue.
Duplicate of this bug: 1236360
Considering there are no english strings in the locales *at all*, the question to answer is whether compare-locales does its job and creates the merged files and the subsequent build picks the files from the base locale directly, ignoring them, or whether compare-locales is not doing its job at all.
I'd like to mention once more that if we had l10n jobs on try, that would be easier to debug/bisect. And if this is caused by some build system change that landed between the last good and the first bad, those changes could have been validated on l10n beforehand.

This also probably calls for a validation step that turns those builds red if all the strings are not in the langpack.
Is it the same in Dev Edition? If yes, this sounds like a build machine configuration change, otherwise I'd dig into the build system changes in m-c.
I bet this would fix it:

diff --git a/toolkit/locales/l10n.mk b/toolkit/locales/l10n.mk
index 312e6aea..2cd9961 100644
--- a/toolkit/locales/l10n.mk
+++ b/toolkit/locales/l10n.mk
@@ -47,17 +47,17 @@ ZIP_IN ?= $(ABS_DIST)/$(PACKAGE)
 WIN32_INSTALLER_IN ?= $(ABS_DIST)/$(PKG_INST_PATH)$(PKG_INST_BASENAME).exe
 
 # Allows overriding the final destination of the repackaged file
 ZIP_OUT ?= $(ABS_DIST)/$(PACKAGE)
 
 ACDEFINES += \
        -DAB_CD=$(AB_CD) \
        -DMOZ_LANGPACK_EID=$(MOZ_LANGPACK_EID) \
-       -DMOZ_APP_ID=$(MOZ_APP_ID) \
+       -DMOZ_APP_ID='$(MOZ_APP_ID)' \
        -DMOZ_APP_VERSION=$(MOZ_APP_VERSION) \
        -DMOZ_APP_MAXVERSION=$(MOZ_APP_MAXVERSION) \
        -DLOCALE_SRCDIR=$(abspath $(LOCALE_SRCDIR)) \
        -DPKG_BASENAME='$(PKG_BASENAME)' \
        -DPKG_INST_BASENAME='$(PKG_INST_BASENAME)' \
        $(NULL)
Something else that would prevent it, is to normalize LOCALE_MERGEDIR with forward-slashes when passing it from buildbot or wherever it's set (mozharness, maybe).
Duplicate of this bug: 1236339
No longer blocks: 1236298
Duplicate of this bug: 1236298
Is this a regression by bug 1235676?
Changesets for bug 1235676 have been pushed at 2016-01-02 00:29 +0000.
BTW, I should mention the relevant thing in the logs is:
WARNING: --locale-mergedir passed, but 'c:buildsmoz2_slavem-cen-w64-l10n-ntly-3-00000000buildmozilla-centralobj-l10nmerged/' does not exist. Ignore this message if the locale is complete.

This should probably be made fatal too... if your locale is complete, don't pass --locale-mergedir.
(In reply to Takanori MATSUURA from comment #22)
> Is this a regression by bug 1235676?

My bet is on bug 1235151, thus comment 18.

As suggested by my current bugzilla "name", I won't be actively working on this being fixed, so I'd suggest interested parties to implement comment 18, comment 19 and comment 23.
(Added Reporter and Cc for bug 1236298)

(In reply to Mike Hommey [:glandium] (VAC: 31 Dec - 11 Jan) from comment #24)
> My bet is on bug 1235151, thus comment 18.

Ah, comment #22 is my mistake to paste.
I agree to comment #24.
However I don't have machine for development environment now. :-(
(In reply to Mike Hommey [:glandium] (VAC: 31 Dec - 11 Jan) from comment #23)
> BTW, I should mention the relevant thing in the logs is:
> WARNING: --locale-mergedir passed, but
> 'c:buildsmoz2_slavem-cen-w64-l10n-ntly-3-00000000buildmozilla-centralobj-
> l10nmerged/' does not exist. Ignore this message if the locale is complete.

Thanks for digging this up. Not yet sure how this is affected by escaping-or-not the app ID.

> This should probably be made fatal too... if your locale is complete, don't
> pass --locale-mergedir.

I think that'd make the system more complex still, I'd rather not. OTH, l10n merge could just add a dummy ".succeeded" file to the merge dir, and we could then make the missing mergedir fatal?

As for tests, yeah, I'd like to have meaningful tests. The l10n-check was a first step, but we never got beyond that. https://groups.google.com/forum/#!topic/mozilla.dev.automation/RbgbaShQb_Y is where I started the conversation, but I didn't get to follow up on that online nor in mozlando.
Maybe Releng should also re-prioritize the l10n-on-try work...
I'll have a look.
Assignee: nobody → gps
Status: NEW → ASSIGNED
I don't see any harm with the patch in comment #18, so I'm going to land that.
Comment on attachment 8703746 [details]
MozReview Request: Bug 1236259 - Normalize LOCALE_MERGEDIR paths; r?ted

https://reviewboard.mozilla.org/r/29457/#review26205

::: testing/mozharness/mozharness/mozilla/l10n/multi_locale_build.py:156
(Diff revision 1)
> -                command += " LOCALE_MERGEDIR=%s" % dirs['abs_merge_dir']
> +                command += " LOCALE_MERGEDIR=%s" % dirs['abs_merge_dir'].replace(os.sep, '/')

Do we not have something in mozpath to do this?
Attachment #8703746 - Flags: review+
https://reviewboard.mozilla.org/r/29457/#review26205

> Do we not have something in mozpath to do this?

We do. But this is mozharness code and I'm not sure it can access mozpath from whatever Python environment it is running from.
No promises these patches will fix things. I'm content with waiting around to see what happens when these are merged to central and the Nightly rebuilds occur. If someone else wants to test the l10n packaging with inbound, be my guest. (I haven't done that in a long time and quite frankly I can't justify a few hours to relearn it right now given other things on my plate today.)
(In reply to Axel Hecht [:Pike] from comment #26)
> (In reply to Mike Hommey [:glandium] (VAC: 31 Dec - 11 Jan) from comment #23)
> > BTW, I should mention the relevant thing in the logs is:
> > WARNING: --locale-mergedir passed, but
> > 'c:buildsmoz2_slavem-cen-w64-l10n-ntly-3-00000000buildmozilla-centralobj-
> > l10nmerged/' does not exist. Ignore this message if the locale is complete.
> 
> Thanks for digging this up. Not yet sure how this is affected by
> escaping-or-not the app ID.

The app id contains special characters (curly brackets) that make make invoke the command line through the shell. When doing so, the backslashed path is munged because the shell thinks they are escaping the following character. Quoting the app ID makes those characters quoted, such that make can invoke the command line itself with CreateProcess. When doing so, the backslashes path works properly.

> > This should probably be made fatal too... if your locale is complete, don't
> > pass --locale-mergedir.
> 
> I think that'd make the system more complex still, I'd rather not. OTH, l10n
> merge could just add a dummy ".succeeded" file to the merge dir, and we
> could then make the missing mergedir fatal?

Wait, compare-locales doesn't put anything in mergedir when the locale is complete? Why not change that? That would simplify things. As in, finding the right files would be straightforward for jarmaker, since there would only be one location to look for them.

> As for tests, yeah, I'd like to have meaningful tests. The l10n-check was a
> first step, but we never got beyond that.
> https://groups.google.com/forum/#!topic/mozilla.dev.automation/RbgbaShQb_Y
> is where I started the conversation, but I didn't get to follow up on that
> online nor in mozlando.

I asked catlee to have l10n jobs on try at mozlando. That's my prerequisite to any build system work involving l10n.
https://hg.mozilla.org/mozilla-central/rev/11acb7e361fe
https://hg.mozilla.org/mozilla-central/rev/18e67d15a1b6
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Axel: I'm not sure how to verify this unborked things. Could you? (Or anyone else who wants to for that matter.)
Flags: needinfo?(l10n)
German build, 46.0a1 (2016-01-05), works fine.
Status: RESOLVED → VERIFIED
Flags: needinfo?(l10n)
OS Windows 7, 64 bit.

Nightly 46.0a1 en-GB (win64) 2016-01-05 is OK.

(on 2016-01-03 I ran into the issue in comment # 0 when I updated Nightly)


Steps used to verify:

1. Used
http://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla-central-l10n/

to collect
firefox-46.0a1.en-GB.win64.zip (55M 05-Jan-2016 17:00)

2. As an Administrator renamed the location of my Win64 Nightly Program Files:
From
C:\Program Files\Nightly\
to
C:\Program Files\x-Nightly\

3. Unziped the downloaded file, in a temporary location.
This created a tree called "firefox" with subdirectories.

4. Renamed the "firefox" folder to "Nightly" and copied the tree to become
C:\Program Files\Nightly\

5. Used regedit to check that there were NO references to "x-Nightly"
(as expected there were none) and
then checked all references to "nightly" - which were all as I expected.

6. Started Nightly in Safe Mode, using a new Profile, all OK.

7. Started Nightly using my 'Admin Nightly Profile', updated some Extensions, all OK.

8. Checked my Firewall rules (all OK - no changes).

9. As a 'normal Windows user', with NO Administrator privileges, checked my 'normal Nightly Profile'.
Updated more Extensions, all OK.

DJ-Leith
Moving back to Firefox :: Build Config, as that's where fix was (was moved in comment 9)
Component: General Automation → Build Config
Product: Release Engineering → Firefox
QA Contact: catlee
Thank you for verifying the fix. I guess we're all set here.
[bugday-20160323]

Status: RESOLVED,FIXED -> UNVERIFIED

Comments:
STR: Not clear.
Developer specific testing

Component: 
Name			Firefox
Version			46.0b9
Build ID		20160322075646
Update Channel          beta
User Agent		Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0
OS			Windows 7 SP1 x86_64

Expected Results: 
Developer specific testing

Actual Results: 
As expected
Component: Build Config → General
Product: Firefox → Firefox Build System
You need to log in before you can comment on or make changes to this bug.