The default bug view has changed. See this FAQ.

use pymake by default on windows

RESOLVED FIXED

Status

Release Engineering
Release Automation
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: bhearsum, Assigned: bhearsum)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Assignee)

Description

4 years ago
In Thunderbird 19.0b1 we hit a "bad file number" problem. Unlike bug 827304, this one appears to be an issue with the total command line length, not an individual path. If you split these and look at the maximum individual path length, it appears to be around 120 chars. Nick did some analysis on the whole line in bug 825315, here's what he said:
(In reply to Nick Thomas [:nthomas] from comment #3)
> The windows en-US build has failed out copying some mochitests. In a release
> build it looks like: 
> 
> e:/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/objdir-tb/mozilla/
> _virtualenv/Scripts/python.exe
> /e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/config/nsinstall.py
> -t
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_hidden.html" ...
> "/e/builds/moz2_slave/tb-rel-c-beta-w32-bld/build/mozilla/content/html/
> content/test/test_mozaudiochannel.html"
> ../../../../_tests/testing/mochitest/tests/content/html/content/test
> 
> The total line length is 33904 characters, each arg is at most 130
> characters long.
> 
> In comparison, a dep build begins
> e:/builds/moz2_slave/tb-c-beta-w32/build/objdir-tb/mozilla/_virtualenv/
> Scripts/python.exe
> /e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/config/nsinstall.py -t
> "/e/builds/moz2_slave/tb-c-beta-w32/build/mozilla/content/html/content/test/
> test_hidden.html" ...
> 
> This is only 31409 characters long, because the builder shortens by 8
> characters in 312 instances. In suggesting that there is a 32K character
> limit somewhere, which the combination of more tests and longer builder name
> causes breakeage.



bug 827306 describes a "fix" that makes sure we catch these issues at build time rather than release time, but we need to fix this individual one first, or else we'll burn the tree when we land it.
I'd like this to scale better as we add tests. Can we evaluate whatever glob is getting the list of files then split the copy into multiple lines if it's over the 32K character boundary ? Or just switch to copying them one by one if it's not a huge perf hit.
(Assignee)

Comment 2

4 years ago
I know this doesn't necessarily seem like a big deal, but it really hurts us in the critical path of releases whenever we hit it. Ted, can suggest someone who can fix this, or even a rough outline of what a fix would look like? I'd be happy to take a stab at the patch myself with some guidance.
Flags: needinfo?(ted)
So the issue here is that the "copy mochitests" command in content/ winds up overlong and fails? Is this only happening on Windows, or everywhere? If it's Windows-only, we may be able to get around it by ensuring that we use the native-command nsinstall instead of shelling out to Python. I thought we already fixed that in bug 774054, though, so I'm not sure what's happening here. Is this just Thunderbird's build system being out-of-date? Am I missing something?
Flags: needinfo?(ted)
(Assignee)

Comment 4

4 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> So the issue here is that the "copy mochitests" command in content/ winds up
> overlong and fails? Is this only happening on Windows, or everywhere?

We've only hit it on Windows, I'm not sure if that's just luck or not though.

> f
> it's Windows-only, we may be able to get around it by ensuring that we use
> the native-command nsinstall instead of shelling out to Python. I thought we
> already fixed that in bug 774054, though, so I'm not sure what's happening
> here. Is this just Thunderbird's build system being out-of-date? Am I
> missing something?

Hm, looking at similar Firefox build log I see:
e:/builds/moz2_slave/rel-m-beta-w32-bld/build/obj-firefox/_virtualenv/Scripts/python.exe /e/builds/moz2_slave/rel-m-beta-w32-bld/build/config/nsinstall.py -t "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_hidden.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug589.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_bug691.html" ... /test/test_rowscollection.html" "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/test_mozaudiochannel.html" ../../../../_tests/testing/mochitest/tests/content/html/content/test

...which seems to suggest that we're shelling out there for Firefox too. The full log I was looking at is at https://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/19.0b2-candidates/build1/logs/release-mozilla-beta-win32_build-bm25-build1-build6.txt.gz
(Assignee)

Comment 5

4 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #4)
> (In reply to Ted Mielczarek [:ted.mielczarek] from comment #3)
> > So the issue here is that the "copy mochitests" command in content/ winds up
> > overlong and fails? Is this only happening on Windows, or everywhere?
> 
> We've only hit it on Windows, I'm not sure if that's just luck or not though.
> 
> > f
> > it's Windows-only, we may be able to get around it by ensuring that we use
> > the native-command nsinstall instead of shelling out to Python. I thought we
> > already fixed that in bug 774054, though, so I'm not sure what's happening
> > here. Is this just Thunderbird's build system being out-of-date? Am I
> > missing something?
> 
> Hm, looking at similar Firefox build log I see:
> e:/builds/moz2_slave/rel-m-beta-w32-bld/build/obj-firefox/_virtualenv/
> Scripts/python.exe
> /e/builds/moz2_slave/rel-m-beta-w32-bld/build/config/nsinstall.py -t
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_hidden.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_bug589.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_bug691.html" ... /test/test_rowscollection.html"
> "/e/builds/moz2_slave/rel-m-beta-w32-bld/build/content/html/content/test/
> test_mozaudiochannel.html"
> ../../../../_tests/testing/mochitest/tests/content/html/content/test
> 
> ...which seems to suggest that we're shelling out there for Firefox too. The
> full log I was looking at is at
> https://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/19.0b2-candidates/
> build1/logs/release-mozilla-beta-win32_build-bm25-build1-build6.txt.gz

Interestingly, this log shows it using the native command: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-beta-win32/1359510108/mozilla-beta-win32-bm13-build1-build302.txt.gz
evaluation from e:\builds\moz2_slave\m-beta-w32\build\config\makefiles\mochitest.mk:32:0:2:0$ nsinstall nsinstall -t "e:/builds/moz2_slave/m-beta-w32/build/content/html/content/test/test_hidden.html"

Which says that we're not using native commands in our release builds for some reason. I wonder if we're using pymake at all in those?
(Assignee)

Comment 6

4 years ago
Given that the non-release builds already have shortened paths, it seems like the real bug here is that our release builds aren't using pymake.
Summary: shorten content mochitest command line → release builds need pymake
Sure glad you figured this out before we fixed bug 828317 and it got to release!
(Assignee)

Comment 8

4 years ago
Looks like we're not passing makeCmd to MercurialBuildFactory. Hooray!
Component: Build Config → Release Engineering: Automation (Release Automation)
Product: Core → mozilla.org
QA Contact: bhearsum
Version: unspecified → other
(Assignee)

Comment 9

4 years ago
Created attachment 708233 [details] [diff] [review]
pass enable_pymake for release builds (and xulrunner in general)

I changed up the flag we're passing so we could centralize the hardcoding of the path to make.py inside of the factory (rather than putting it in more places in factory.py, and in release.py). I've got dump master output if it helps to review it.
Assignee: nobody → bhearsum
Status: NEW → ASSIGNED
Attachment #708233 - Flags: review?(coop)
(Assignee)

Comment 10

4 years ago
Philor just found that localizer nightlies don't use pymake either. There's probably other things, too. Let's just make this the default and be done with it!
Summary: release builds need pymake → use pymake by default on windows
(Assignee)

Comment 11

4 years ago
Comment on attachment 708233 [details] [diff] [review]
pass enable_pymake for release builds (and xulrunner in general)

Gonna need a new patch.
Attachment #708233 - Attachment is obsolete: true
Attachment #708233 - Flags: review?(coop)
Blocks: 836762
(Assignee)

Comment 12

4 years ago
Created attachment 708606 [details] [diff] [review]
enable pymake by default for windows
Attachment #708606 - Flags: review?(coop)
(Assignee)

Comment 13

4 years ago
Created attachment 708607 [details] [diff] [review]
remove now-pointless enable_pymake from configs
Attachment #708607 - Flags: review?(coop)
(Assignee)

Comment 14

4 years ago
Oh, this buildbot-configs patch also fixes up a missed merge day item for pymake riding the trains for Thunderbird ಠ_ಠ

Updated

4 years ago
Attachment #708606 - Flags: review?(coop) → review+

Updated

4 years ago
Attachment #708607 - Flags: review?(coop) → review+
(Assignee)

Updated

4 years ago
Attachment #708607 - Flags: checked-in+
(Assignee)

Updated

4 years ago
Attachment #708606 - Flags: checked-in+

Updated

4 years ago
Depends on: 837006
(Assignee)

Comment 15

4 years ago
This seems to have worked fine. I see a XULRunner nightly using make.py: https://tbpl.mozilla.org/php/getParsedLog.php?id=19353331&tree=Firefox&full=1
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Do I see it correctly that this bug morphed from the python.exe "bad file number" error to "enable pymake by default" and those two things are not directly related when a build is already using pymake? Just wondering as SeaMonkey Windows nightly builds have the same problem now (32k limit looks like). Guess we could try to shorten the objdir name.
(Assignee)

Comment 17

4 years ago
(In reply to Frank Wein [:mcsmurf] from comment #16)
> Do I see it correctly that this bug morphed from the python.exe "bad file
> number" error to "enable pymake by default" and those two things are not
> directly related when a build is already using pymake? Just wondering as
> SeaMonkey Windows nightly builds have the same problem now (32k limit looks
> like). Guess we could try to shorten the objdir name.

Yes, the original bug was a specific problem with a specific build. I morphed the bug into a more general fix for this symptom.
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.