Closed Bug 831989 Opened 7 years ago Closed 7 years ago

use pymake by default on windows

Categories

(Release Engineering :: Release Automation: Other, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

Attachments

(2 files, 1 obsolete file)

In Thunderbird 19.0b1 we hit a "bad file number" problem. Unlike bug 827304, this one appears to be an issue with the total command line length, not an individual path. If you split these and look at the maximum individual path length, it appears to be around 120 chars. Nick did some analysis on the whole line in bug 825315, here's what he said:
(In reply to Nick Thomas [:nthomas] from comment #3)
> The windows en-US build has failed out copying some mochitests. In a release
> build it looks like: 
> 
> e:/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/objdir-tb/mozilla/
> _virtualenv/Scripts/python.exe
> /e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/config/nsinstall.py
> -t
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_hidden.html" ...
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_mozaudiochannel.html"
> ../../../../_tests/testing/mochitest/tests/content/html/content/test
> 
> The total line length is 33904 characters, each arg is at most 130
> characters long.
> 
> In comparison, a dep build begins
> e:/builds/moz2_slave/tb-c-beta-w32/build/objdir-tb/mozilla/_virtualenv/
> Scripts/python.exe
> /e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/config/nsinstall.py -t
> "/e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/content/html/content/test/
> test_hidden.html" ...
> 
> This is only 31409 characters long, because the builder shortens by 8
> characters in 312 instances. In suggesting that there is a 32K character
> limit somewhere, which the combination of more tests and longer builder name
> causes breakeage.



bug 827306 describes a "fix" that makes sure we catch these issues at build time rather than release time, but we need to fix this individual one first, or else we'll burn the tree when we land it.
I'd like this to scale better as we add tests. Can we evaluate whatever glob is getting the list of files then split the copy into multiple lines if it's over the 32K character boundary ? Or just switch to copying them one by one if it's not a huge perf hit.
I know this doesn't necessarily seem like a big deal, but it really hurts us in the critical path of releases whenever we hit it. Ted, can suggest someone who can fix this, or even a rough outline of what a fix would look like? I'd be happy to take a stab at the patch myself with some guidance.
Flags: needinfo?(ted)
So the issue here is that the "copy mochitests" command in content/ winds up overlong and fails? Is this only happening on Windows, or everywhere? If it's Windows-only, we may be able to get around it by ensuring that we use the native-command nsinstall instead of shelling out to Python. I thought we already fixed that in bug 774054, though, so I'm not sure what's happening here. Is this just Thunderbird's build system being out-of-date? Am I missing something?
Flags: needinfo?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> So the issue here is that the "copy mochitests" command in content/ winds up
> overlong and fails? Is this only happening on Windows, or everywhere?

We've only hit it on Windows, I'm not sure if that's just luck or not though.

> f
> it's Windows-only, we may be able to get around it by ensuring that we use
> the native-command nsinstall instead of shelling out to Python. I thought we
> already fixed that in bug 774054, though, so I'm not sure what's happening
> here. Is this just Thunderbird's build system being out-of-date? Am I
> missing something?

Hm, looking at similar Firefox build log I see:
e:/builds/moz2_slave/rel-m-beta-w32-bld/build/obj-firefox/_virtualenv/Scripts/python.exe /e/builds/moz2_slave/rel-m-beta-w32-bld/build/config/nsinstall.py -t "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_hidden.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug589.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug691.html" ... /test/test_rowscollection.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_mozaudiochannel.html" ../../../../_tests/testing/mochitest/tests/content/html/content/test

...which seems to suggest that we're shelling out there for Firefox too. The full log I was looking at is at https://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/19.0b2-candidates/build1/logs/release-mozilla-beta-win32_build-bm25-build1-build6.txt.gz
(In reply to Ben Hearsum [:bhearsum] from comment #4)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> > So the issue here is that the "copy mochitests" command in content/ winds up
> > overlong and fails? Is this only happening on Windows, or everywhere?
> 
> We've only hit it on Windows, I'm not sure if that's just luck or not though.
> 
> > f
> > it's Windows-only, we may be able to get around it by ensuring that we use
> > the native-command nsinstall instead of shelling out to Python. I thought we
> > already fixed that in bug 774054, though, so I'm not sure what's happening
> > here. Is this just Thunderbird's build system being out-of-date? Am I
> > missing something?
> 
> Hm, looking at similar Firefox build log I see:
> e:/builds/moz2_slave/rel-m-beta-w32-bld/build/obj-firefox/_virtualenv/
> Scripts/python.exe
> /e/builds/moz2_slave/rel-m-beta-w32-bld/build/config/nsinstall.py -t
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_hidden.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_bug589.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_bug691.html" ... /test/test_rowscollection.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_mozaudiochannel.html"
> ../../../../_tests/testing/mochitest/tests/content/html/content/test
> 
> ...which seems to suggest that we're shelling out there for Firefox too. The
> full log I was looking at is at
> https://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/19.0b2-candidates/
> build1/logs/release-mozilla-beta-win32_build-bm25-build1-build6.txt.gz

Interestingly, this log shows it using the native command: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-beta-win32/1359510108/mozilla-beta-win32-bm13-build1-build302.txt.gz
evaluation from e:\builds\moz2_slave\m-beta-w32\build\config\makefiles\mochitest.mk:32:0:2:0$ nsinstall nsinstall -t "e:/builds/moz2_slave/m-beta-w32/build/content/html/content/test/test_hidden.html"

Which says that we're not using native commands in our release builds for some reason. I wonder if we're using pymake at all in those?
Given that the non-release builds already have shortened paths, it seems like the real bug here is that our release builds aren't using pymake.
Summary: shorten content mochitest command line → release builds need pymake
Sure glad you figured this out before we fixed bug 828317 and it got to release!
Looks like we're not passing makeCmd to MercurialBuildFactory. Hooray!
Component: Build Config → Release Engineering: Automation (Release Automation)
Product: Core → mozilla.org
QA Contact: bhearsum
Version: unspecified → other
I changed up the flag we're passing so we could centralize the hardcoding of the path to make.py inside of the factory (rather than putting it in more places in factory.py, and in release.py). I've got dump master output if it helps to review it.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Attachment #708233 - Flags: review?(coop)
Philor just found that localizer nightlies don't use pymake either. There's probably other things, too. Let's just make this the default and be done with it!
Summary: release builds need pymake → use pymake by default on windows
Comment on attachment 708233 [details] [diff] [review]
pass enable_pymake for release builds (and xulrunner in general)

Gonna need a new patch.
Attachment #708233 - Attachment is obsolete: true
Attachment #708233 - Flags: review?(coop)
Blocks: 836762
Oh, this buildbot-configs patch also fixes up a missed merge day item for pymake riding the trains for Thunderbird ಠ_ಠ
Attachment #708606 - Flags: review?(coop) → review+
Attachment #708607 - Flags: review?(coop) → review+
Attachment #708607 - Flags: checked-in+
Attachment #708606 - Flags: checked-in+
Depends on: 837006
This seems to have worked fine. I see a XULRunner nightly using make.py: https://tbpl.mozilla.org/php/getParsedLog.php?id=19353331&tree=Firefox&full=1
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Do I see it correctly that this bug morphed from the python.exe "bad file number" error to "enable pymake by default" and those two things are not directly related when a build is already using pymake? Just wondering as SeaMonkey Windows nightly builds have the same problem now (32k limit looks like). Guess we could try to shorten the objdir name.
(In reply to Frank Wein [:mcsmurf] from comment #16)
> Do I see it correctly that this bug morphed from the python.exe "bad file
> number" error to "enable pymake by default" and those two things are not
> directly related when a build is already using pymake? Just wondering as
> SeaMonkey Windows nightly builds have the same problem now (32k limit looks
> like). Guess we could try to shorten the objdir name.

Yes, the original bug was a specific problem with a specific build. I morphed the bug into a more general fix for this symptom.
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.