Closed Bug 879370 Opened 7 years ago Closed 6 years ago

Fix or disable Windows desktop B2G builds

Categories

(Release Engineering :: General, defect, critical)

x86
Windows 7

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Assigned: mossop)

Attachments

(1 file)

Spin-off from bug 876068 so that TBPL spam doesn't get in the way of fixing the issue.

Relevant comments from that bug:

(In reply to Jonathan Griffin (:jgriffin) from comment #54)
> We have no plans to use the per-commit b2g desktop builds on Windows for
> automation; I can't say whether anyone uses the nightlies, however.

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #55)
> I've seen this locally while building a toy project, it was due to including
> headers in the wrong order:
> http://stackoverflow.com/questions/4845198/fatal-error-no-target-architecture-in-visual-studio-2010
> 
> I'm not sure why we'd be seeing this intermittently on our b2g builds.

(In reply to Phil Ringnalda (:philor) from comment #56)
> We see it intermittently because we intermittently do dep builds - the
> periods where I'm not starring it constantly are the periods where we either
> blind-clobbered everything, or I got sick of this and clobbered just it, or
> we got a slave without an existing objdir.

(In reply to Phil Ringnalda (:philor) from comment #64)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=23688541&tree=Mozilla-Inbound
> 
> Or maybe it is an intermittent race, that's just neatly ducked by slowing
> things down with a clobber, I'm not entirely sure.
Blocks: 876068
No longer depends on: 876068
If we do need them, one workaround would be to always do clobber builds.
Yep. 80 minutes versus 30 minutes, and no guarantee that it will actually make it go away, only anecdotal evidence that it seems to decrease the frequency.

Since nobody has ever once voiced the slightest willingness to fix anything whatsoever about these builds, we could also just switch them to mozilla-central-nightly-only, and run them that way until they break and remain broken for so long that it's obviously the right thing to do to stop those builds, too. According to my vague memory, that's pretty much what we did with the similar XUL Fennec Desktop builds.
(In reply to Myk Melez [:myk] [@mykmelez] from bug 876068 comment #84)
> (In reply to Dave Townsend (:Mossop) from bug 876068 comment #79)
> > Myk, Alex, in the engineering call today they talked about turning off b2g
> > desktop builds on windows because of this. That seems likely harmful to
> > simulator, is that right?
> 
> It's indirectly harmful, because presumably some folks use those builds and
> find/fix problems that also affect the Simulator.  But it isn't directly
> harmful, because I build B2G Desktop myself for the Simulator; I don't rely
> on these B2G Desktop builds.

Ok but it seems like doing these regularly is going to give us an early warning that something broke.

Switching to just doing these nightly seems like it would be fine for that.
Assignee: nobody → dtownsend+bugmail
Attachment #760407 - Flags: review?(rail) → review+
(In reply to Dave Townsend (:Mossop) from comment #3)
> Ok but it seems like doing these regularly is going to give us an early
> warning that something broke.
> 
> Switching to just doing these nightly seems like it would be fine for that.

Agreed!
With any luck, bisecting bustage across a 24hr period of Gecko and Gaia commits won't be too burdensome.
This is in production.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #7)
> With any luck, bisecting bustage across a 24hr period of Gecko and Gaia
> commits won't be too burdensome.

Yep, didn't see this coming.
https://tbpl.mozilla.org/php/getParsedLog.php?id=24829368&tree=Mozilla-Central
Blocks: 889384
Per the platform meeting just now, mossop is working on fixing these b2g-win32-desktop builds, so the bug is correctly assigned to mossop.

Once these builds are working again, RelEng will re-enable builds-per-checkin and sheriffs will unhide on tbpl.
Ok, this looks like a build config or mozharness problem. The actual build failure occurs after the main build step has completed, during the multilocale compile. Part of that process is to run make in obj-firefox/b2g/app. In the cases where the build fails, the build system has for some reason decided that nsBrowserApp.cpp needs to be rebuilt (even though it was already built earlier). In the cases that pass, no attempt is made to rebuild nsBrowserApp.cpp.

My guess is that the environment setup for running mozharness/scripts/b2g_desktop_multilocale.py isn't capable of compilation and so we fail. Maybe the VS environment variables aren't set? Maybe aki can answer that.

I'm not sure why we'd be attempting to rebuild nsBrowserApp.cpp at this point, though. If we could figure that out and stop it, this problem would go away. gps?
Flags: needinfo?(gps)
Flags: needinfo?(aki)
nsBrowserApp.cpp rebuild is typically caused by bug 740359. tl;dr the build-time generated build ID is embedded in nsBrowserApp.cpp and any new build will create a new build ID and force nsBrowserApp.cpp rebuild.

My guess is you are running into a variation where the "multilocale compile" is generating a new build ID and thus incurring nsBrowserApp.cpp rebuild. I don't know enough about the mozharness job/step this is referring to to know if this is proper behavior.
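
As a rough illustration of why a regenerated build ID would trigger the rebuild, here is a minimal sketch of the timestamp rule that make applies to a target and its dependencies. The function name and all timestamps are hypothetical; how the build ID is actually wired into nsBrowserApp.cpp is covered in bug 740359, not here.

```python
def needs_rebuild(target_mtime, dependency_mtimes):
    """Return True when the target is missing or older than any dependency."""
    if target_mtime is None:  # target was never built
        return True
    return any(dep > target_mtime for dep in dependency_mtimes)


# Hypothetical timestamps (seconds): the object file was built at t=100.
object_mtime = 100
source_mtime = 50        # nsBrowserApp.cpp itself is unchanged
old_buildid_mtime = 90   # build ID input written before the object was built
new_buildid_mtime = 200  # build ID input regenerated by a later step

print(needs_rebuild(object_mtime, [source_mtime, old_buildid_mtime]))  # False
print(needs_rebuild(object_mtime, [source_mtime, new_buildid_mtime]))  # True
```

Under this rule, any step that rewrites the build ID input after the main build makes the existing object look stale, even though the source file did not change.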
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #12)
> My guess is you are running into a variation where the "multilocale compile"
> is generating a new build ID and thus incurring nsBrowserApp.cpp rebuild. I
> don't know enough about the mozharness job/step this is referring to to know
> if this is proper behavior.

We do create a buildid file, and the proper behavior would be to refer to this file if it exists, and otherwise generate a new buildid.  If we're generating a new buildid on a rebuild, that's problematic.
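
A minimal sketch of the "reuse the file if it exists, otherwise generate" behaviour described above. The function name, the file location, and the 14-digit timestamp format are assumptions for illustration, not the actual mozharness code.

```python
import os
import tempfile
import time


def get_buildid(path):
    """Return the build ID stored at `path`, creating it only if absent."""
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    # Assumed format: a YYYYMMDDHHMMSS timestamp.
    buildid = time.strftime("%Y%m%d%H%M%S")
    with open(path, "w") as f:
        f.write(buildid + "\n")
    return buildid


with tempfile.TemporaryDirectory() as tmp:
    buildid_file = os.path.join(tmp, "buildid")  # hypothetical location
    first = get_buildid(buildid_file)
    second = get_buildid(buildid_file)
    print(first == second)  # True
```

With this shape, a later step in the same job would see the same build ID as the main build rather than minting a new one.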

(In reply to Dave Townsend (:Mossop) from comment #11)
> My guess is that the environment setup for running
> mozharness/scripts/b2g_desktop_multilocale.py isn't capable of compilation
> and so we fail. Maybe the VS environment variables aren't set? Maybe aki can
> answer that.

We're using pkg_env instead of self.env.

The |make -f client.mk build| step is using self.env; the make sdk, make installer, make package, make package-tests steps all use pkg_env.

The envs are output in the logs for each step, e.g. http://ftp.mozilla.org/pub/mozilla.org/b2g/nightly/2013-07-03-03-08-42-mozilla-central/mozilla-central-win32_gecko-nightly-bm61-build1-build7.txt.gz

When I copy the |make -f client.mk build| env to one text file, and the b2g_desktop_multilocale env to another, and diff, I get:

<   !EXITCODE=00000001

The script also adds this to the env:

            partial_env={
                'LOCALE_MERGEDIR': dirs['abs_merge_dir'],
                'MOZ_CHROME_MULTILOCALE': 'en-US ' + ' '.join(gecko_locales),
            }
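
The comparison above can also be done programmatically. This is a sketch only: `diff_env` is a hypothetical helper, the environments are heavily trimmed placeholders, and the locale list and merge directory are made-up values standing in for `gecko_locales` and `dirs['abs_merge_dir']`.

```python
def diff_env(base, other):
    """Return (only_in_base, only_in_other, changed) for two env dicts."""
    only_in_base = {k: base[k] for k in base.keys() - other.keys()}
    only_in_other = {k: other[k] for k in other.keys() - base.keys()}
    changed = {
        k: (base[k], other[k])
        for k in base.keys() & other.keys()
        if base[k] != other[k]
    }
    return only_in_base, only_in_other, changed


# Hypothetical, heavily trimmed environments; real ones come from the logs.
build_env = {"PATH": "C:\\tools", "!EXITCODE": "00000001"}
multilocale_env = {"PATH": "C:\\tools"}

# Overlay the extra variables the script adds (values are placeholders).
partial_env = {
    "LOCALE_MERGEDIR": "merged",
    "MOZ_CHROME_MULTILOCALE": "en-US " + " ".join(["de", "fr"]),
}
multilocale_env.update(partial_env)

only_build, only_multi, changed = diff_env(build_env, multilocale_env)
print(only_build)          # {'!EXITCODE': '00000001'}
print(sorted(only_multi))  # ['LOCALE_MERGEDIR', 'MOZ_CHROME_MULTILOCALE']
print(changed)             # {}
```

An empty `changed` result would mean that variables such as INCLUDE are identical in both steps, which matches the observation that the logged environments differ only by `!EXITCODE` and the added locale variables.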
Flags: needinfo?(aki)
Depends on: 890003
(In reply to Aki Sasaki [:aki] from comment #13)
> (In reply to Gregory Szorc [:gps] from comment #12)
> We do create a buildid file, and the proper behavior would be to refer to
> this file if it exists, and otherwise generate a new buildid. If we're
> generating a new buildid on a rebuild, that's problematic.

There are good reasons why we're refreshing the buildid each time.
I found something interesting and probably a bug, though I can't tell if it is causing this problem just yet.

The builds run with the environment set up for VS 9, as the INCLUDE logged in the build output shows. I ran the build steps, but logged the current environment when building in b2g/app. During a top-level build, the logged environment shows that it has switched to building with VS 10. I don't know how or why.

Later, in the multilocale repack, the call to "make -C b2g/app" uses the VS 9 environment, so if an attempt is made to recompile nsBrowserApp.cpp, it does so with VS 9 instead of VS 10, which it (and the things it depends on) was compiled with previously.

In my tests that caused a missing header error. It is not the same error as we see here, but I think I can assume it is bad.

So, should these builds be being made with VS 9 or VS 10? If 10 we need to switch the environment variables to match. If 9 why is the build system overriding that?
(In reply to Dave Townsend (:Mossop) from comment #15)
> So, should these builds be being made with VS 9 or VS 10? If 10 we need to
> switch the environment variables to match. If 9 why is the build system
> overriding that?

Ok, the in-tree mozconfigs set the INCLUDE and LIBS path for VS 10, so that should probably be what we use, but "make -C b2g/app" doesn't load the mozconfig so the paths stay pointing to VS 9.
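
One way to spot this kind of mismatch is to check which Visual Studio version each step's INCLUDE points at. The sketch below assumes the default install directory names ("Microsoft Visual Studio 9.0" and "Microsoft Visual Studio 10.0"); the helper name and the sample paths are hypothetical, and the real values are in the build logs.

```python
import re


def vs_versions(path_list):
    """Return the set of Visual Studio versions named in a path list."""
    pattern = re.compile(r"Microsoft Visual Studio (\d+(?:\.\d+)?)", re.IGNORECASE)
    return set(pattern.findall(path_list))


# Hypothetical INCLUDE values; real ones appear in the build logs.
top_level_include = r"C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\include"
repack_include = r"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\include"

print(vs_versions(top_level_include))  # {'10.0'}
print(vs_versions(repack_include))     # {'9.0'}
print(vs_versions(top_level_include) == vs_versions(repack_include))  # False
```

If the two sets differ between the top-level build and the "make -C b2g/app" step, objects may be recompiled with a different compiler than the one that produced the rest of the objdir.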
Looks like this is a releng bug.
Component: Builds → Release Engineering
Product: Boot2Gecko → mozilla.org
Version: unspecified → other
Depends on: 903118
Product: mozilla.org → Release Engineering
Found in triage. I know pymake tripped us up on win32 last week with compiler versions, causing FF24.0b1 woes... Maybe this is related?
Component: Other → General Automation
(In reply to John O'Duinn [:joduinn] from comment #18)
> Found in triage. I know pymake tripped us up on win32 last week with
> compiler versions, causing FF24.0b1 woes... Maybe this is related?

Yes, I believe bug 903118 will solve this
Blocks: 908691
I've pinged bug 903118, since running these Nightly only is causing sheriff pain (e.g. bug 908691) almost as much as running them per-push was :-(
No longer blocks: 908691
Depends on: 908691
I've verified that the problem I could reproduce on the slave is fixed by bug 903118 so can we turn per-checkin builds back on and see whether they are working more reliably now?
Flags: needinfo?(catlee)
I wholeheartedly endorse this proposal. Start with b-i?
Sure, let's do it.

Reverted 2b47d9f93738 and pushed https://hg.mozilla.org/build/buildbot-configs/rev/85fbfb0ff48d
Flags: needinfo?(catlee)
In production
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: General Automation → General