Bug 831989 - use pymake by default on windows
Status: RESOLVED FIXED
Product: Release Engineering
Classification: Other
Component: Release Automation
Assigned To: Ben Hearsum (:bhearsum)
Depends on: 837006
Blocks: 827305 827306 836762
Reported: 2013-01-17 13:01 PST by Ben Hearsum (:bhearsum)
Modified: 2013-08-12 21:54 PDT


Attachments
pass enable_pymake for release builds (and xulrunner in general) (9.81 KB, patch)
2013-01-30 11:39 PST, Ben Hearsum (:bhearsum)
no flags
enable pymake by default for windows (12.90 KB, patch)
2013-01-31 08:26 PST, Ben Hearsum (:bhearsum)
coop: review+
bhearsum: checked‑in+
remove now-pointless enable_pymake from configs (5.87 KB, patch)
2013-01-31 08:26 PST, Ben Hearsum (:bhearsum)
coop: review+
bhearsum: checked‑in+

Description Ben Hearsum (:bhearsum) 2013-01-17 13:01:50 PST
In Thunderbird 19.0b1 we hit a "bad file number" problem. Unlike bug 827304, this one appears to be an issue with the total command line length, not an individual path. If you split these and look at the maximum individual path length, it appears to be around 120 chars. Nick did some analysis on the whole line in bug 825315; here's what he said:
(In reply to Nick Thomas [:nthomas] from comment #3)
> The windows en-US build has failed out copying some mochitests. In a release
> build it looks like: 
> 
> e:/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/objdir-tb/mozilla/
> _virtualenv/Scripts/python.exe
> /e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/config/nsinstall.py
> -t
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_hidden.html" ...
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_mozaudiochannel.html"
> ../../../../_tests/testing/mochitest/tests/content/html/content/test
> 
> The total line length is 33904 characters, each arg is at most 130
> characters long.
> 
> In comparison, a dep build begins
> e:/builds/moz2_slave/tb-c-beta-w32/build/objdir-tb/mozilla/_virtualenv/
> Scripts/python.exe
> /e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/config/nsinstall.py -t
> "/e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/content/html/content/test/
> test_hidden.html" ...
> 
> This is only 31409 characters long, because the builder shortens by 8
> characters in 312 instances. I'm suggesting that there is a 32K character
> limit somewhere, which the combination of more tests and longer builder name
> causes breakage.
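
The arithmetic in the quoted analysis can be checked with a short sketch. The two line lengths, the count of 312 and the builder names come from the comment above. The 32,767-character ceiling is an assumption: it is the documented maximum command-line length for Windows' CreateProcess, whereas the analysis only infers "a 32K character limit somewhere". The helper names are made up for illustration.

```python
# Figures quoted in the analysis above.
RELEASE_LINE_LEN = 33904
DEP_LINE_LEN = 31409
OCCURRENCES = 312

# Assumption: the ceiling is CreateProcess's documented 32,767-character
# maximum. The bug itself only says "32K".
ASSUMED_LIMIT = 32767

RELEASE_BUILDER = "tb-rel-c-beta-w32-bld"
DEP_BUILDER = "tb-c-beta-w32"


def name_saving(long_name, short_name):
    """Characters saved each time the shorter builder name appears."""
    return len(long_name) - len(short_name)


def estimated_dep_length(release_len, saving, occurrences):
    """Expected dep-build line length if only the builder name differs."""
    return release_len - saving * occurrences


saving = name_saving(RELEASE_BUILDER, DEP_BUILDER)
estimate = estimated_dep_length(RELEASE_LINE_LEN, saving, OCCURRENCES)
print(saving)    # 8
print(estimate)  # 31408
print(DEP_LINE_LEN < ASSUMED_LIMIT < RELEASE_LINE_LEN)  # True
```

The estimate lands within one character of the reported 31409, and the assumed ceiling falls between the two line lengths, which is consistent with the release build failing while the dep build succeeds.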



bug 827306 describes a "fix" that makes sure we catch these issues at build time rather than release time, but we need to fix this individual one first, or else we'll burn the tree when we land it.
Comment 1 Nick Thomas [:nthomas] 2013-01-18 09:06:22 PST
I'd like this to scale better as we add tests. Can we evaluate whatever glob is getting the list of files, then split the copy into multiple lines if it's over the 32K character boundary? Or just switch to copying them one by one if it's not a huge perf hit.
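
The splitting idea in this comment could look roughly like the sketch below. It is not what nsinstall or the build system does, and it is not the fix that eventually landed here. The function names, the default limit of 32000 and the example prefix and destination are all hypothetical.

```python
def split_into_batches(prefix, paths, dest, limit=32000):
    """Group paths so each rendered command line stays within `limit`.

    The length estimate is slightly conservative: it counts one
    separating space per path plus two around the path list.
    """
    base = len(prefix) + len(dest) + 2
    batches, current, current_len = [], [], base
    for path in paths:
        cost = len(path) + 3  # two quotes plus one separating space
        if current and current_len + cost > limit:
            batches.append(current)
            current, current_len = [], base
        current.append(path)
        current_len += cost
    if current:
        batches.append(current)
    return batches


def render(prefix, batch, dest):
    """Build one command line: prefix, quoted paths, destination."""
    quoted = " ".join('"%s"' % p for p in batch)
    return "%s %s %s" % (prefix, quoted, dest)


if __name__ == "__main__":
    files = ["/p/test_%03d.html" % i for i in range(50)]
    for batch in split_into_batches("nsinstall -t", files, "dest", limit=200):
        print(len(render("nsinstall -t", batch, "dest")))
```

A single path longer than the limit still ends up in a batch of its own, so this only addresses the total-length problem, not the per-path one from bug 827304.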
Comment 2 Ben Hearsum (:bhearsum) 2013-01-23 06:28:27 PST
I know this doesn't necessarily seem like a big deal, but it really hurts us in the critical path of releases whenever we hit it. Ted, can you suggest someone who can fix this, or even a rough outline of what a fix would look like? I'd be happy to take a stab at the patch myself with some guidance.
Comment 3 Ted Mielczarek [:ted.mielczarek] 2013-01-23 06:55:28 PST
So the issue here is that the "copy mochitests" command in content/ winds up overlong and fails? Is this only happening on Windows, or everywhere? If it's Windows-only, we may be able to get around it by ensuring that we use the native-command nsinstall instead of shelling out to Python. I thought we already fixed that in bug 774054, though, so I'm not sure what's happening here. Is this just Thunderbird's build system being out-of-date? Am I missing something?
Comment 4 Ben Hearsum (:bhearsum) 2013-01-23 07:06:56 PST
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> So the issue here is that the "copy mochitests" command in content/ winds up
> overlong and fails? Is this only happening on Windows, or everywhere?

We've only hit it on Windows, I'm not sure if that's just luck or not though.

> If it's Windows-only, we may be able to get around it by ensuring that we use
> the native-command nsinstall instead of shelling out to Python. I thought we
> already fixed that in bug 774054, though, so I'm not sure what's happening
> here. Is this just Thunderbird's build system being out-of-date? Am I
> missing something?

Hm, looking at a similar Firefox build log I see:
e:/builds/moz2_slave/rel-m-beta-w32-bld/build/obj-firefox/_virtualenv/Scripts/python.exe /e/builds/moz2_slave/rel-m-beta-w32-bld/build/config/nsinstall.py -t "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_hidden.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug589.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug691.html" ... /test/test_rowscollection.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_mozaudiochannel.html" ../../../../_tests/testing/mochitest/tests/content/html/content/test

...which seems to suggest that we're shelling out there for Firefox too. The full log I was looking at is at https://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/19.0b2-candidates/build1/logs/release-mozilla-beta-win32_build-bm25-build1-build6.txt.gz
Comment 5 Ben Hearsum (:bhearsum) 2013-01-30 09:53:10 PST
(In reply to Ben Hearsum [:bhearsum] from comment #4)

Interestingly, this log shows it using the native command: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-beta-win32/1359510108/mozilla-beta-win32-bm13-build1-build302.txt.gz
evaluation from e:\builds\moz2_slave\m-beta-w32\build\config\makefiles\mochitest.mk:32:0:2:0$ nsinstall nsinstall -t "e:/builds/moz2_slave/m-beta-w32/build/content/html/content/test/test_hidden.html"

Which says that we're not using native commands in our release builds for some reason. I wonder if we're using pymake at all in those?
Comment 6 Ben Hearsum (:bhearsum) 2013-01-30 09:55:55 PST
Given that the non-release builds already have shortened paths, it seems like the real bug here is that our release builds aren't using pymake.
Comment 7 Ted Mielczarek [:ted.mielczarek] 2013-01-30 10:00:20 PST
Sure glad you figured this out before we fixed bug 828317 and it got to release!
Comment 8 Ben Hearsum (:bhearsum) 2013-01-30 10:37:00 PST
Looks like we're not passing makeCmd to MercurialBuildFactory. Hooray!
Comment 9 Ben Hearsum (:bhearsum) 2013-01-30 11:39:52 PST
Created attachment 708233
pass enable_pymake for release builds (and xulrunner in general)

I changed up the flag we're passing so we could centralize the hardcoding of the path to make.py inside of the factory (rather than putting it in more places in factory.py, and in release.py). I've got dump master output if it helps to review it.
Comment 10 Ben Hearsum (:bhearsum) 2013-01-31 07:43:24 PST
Philor just found that localizer nightlies don't use pymake either. There's probably other things, too. Let's just make this the default and be done with it!
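
A minimal sketch of what "make this the default" might mean in code, assuming a single helper decides how make is invoked. This is not the actual factory code: `make_command`, its parameters and the placeholder `pymake_path` are hypothetical. Only the `enable_pymake` flag name and `make.py` come from this bug.

```python
def make_command(platform, enable_pymake=None, python="python",
                 pymake_path="build/pymake/make.py"):
    """Return the argv prefix used to invoke make for a builder.

    When enable_pymake is left as None, pymake is assumed to be the
    default on Windows platforms and off everywhere else. An explicit
    True or False always wins. pymake_path is a placeholder, kept in
    one spot so callers never hardcode it themselves.
    """
    if enable_pymake is None:
        enable_pymake = platform.startswith("win")
    if enable_pymake:
        return [python, pymake_path]
    return ["make"]


if __name__ == "__main__":
    print(make_command("win32"))
    print(make_command("linux"))
    print(make_command("win32", enable_pymake=False))
```

Keeping the decision in one place means a new builder type (release, l10n, XULRunner) gets the Windows default without anyone remembering to pass a flag.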
Comment 11 Ben Hearsum (:bhearsum) 2013-01-31 07:45:02 PST
Comment on attachment 708233
pass enable_pymake for release builds (and xulrunner in general)

Gonna need a new patch.
Comment 12 Ben Hearsum (:bhearsum) 2013-01-31 08:26:16 PST
Created attachment 708606
enable pymake by default for windows
Comment 13 Ben Hearsum (:bhearsum) 2013-01-31 08:26:38 PST
Created attachment 708607
remove now-pointless enable_pymake from configs
Comment 14 Ben Hearsum (:bhearsum) 2013-01-31 08:48:33 PST
Oh, this buildbot-configs patch also fixes up a missed merge day item for pymake riding the trains for Thunderbird ಠ_ಠ
Comment 15 Ben Hearsum (:bhearsum) 2013-02-01 06:13:52 PST
This seems to have worked fine. I see a XULRunner nightly using make.py: https://tbpl.mozilla.org/php/getParsedLog.php?id=19353331&tree=Firefox&full=1
Comment 16 Frank Wein [:mcsmurf] 2013-02-10 17:16:40 PST
Do I see it correctly that this bug morphed from the python.exe "bad file number" error to "enable pymake by default" and those two things are not directly related when a build is already using pymake? Just wondering as SeaMonkey Windows nightly builds have the same problem now (32k limit looks like). Guess we could try to shorten the objdir name.
Comment 17 Ben Hearsum (:bhearsum) 2013-02-11 06:35:13 PST
(In reply to Frank Wein [:mcsmurf] from comment #16)
> Do I see it correctly that this bug morphed from the python.exe "bad file
> number" error to "enable pymake by default" and those two things are not
> directly related when a build is already using pymake? Just wondering as
> SeaMonkey Windows nightly builds have the same problem now (32k limit looks
> like). Guess we could try to shorten the objdir name.

Yes, the original bug was a specific problem with a specific build. I morphed the bug into a more general fix for this symptom.
