Closed
Bug 1455221
Opened 7 years ago
Closed 7 years ago
Thunderbird automation and local build busted, 2018-04-18
Categories
(Thunderbird :: Build Config, defect)
Thunderbird
Build Config
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jorgk-bmo, Assigned: nika)
References
Details
Attachments
(3 files, 4 obsolete files)
1.92 KB,
patch
|
jorgk-bmo
:
review+
|
Details | Diff | Splinter Review |
1.06 KB,
patch
|
mccr8
:
review+
|
Details | Diff | Splinter Review |
2.54 KB,
patch
|
jorgk-bmo
:
review+
|
Details | Diff | Splinter Review |
This appears to have started with the Daily run on 2018-04-18.
The build (N) jobs still ran on M-C 789e30ff2e3d6e1fcfce1a373c1e563548 but the repacks might have run on M-C 4af4ae0aee552a99a995ce4b32198b98294a95f7.
In case that's not so, certainly builds succeed at 789e30ff2e3d6e1fcfce1a373c1e563548 but failed at 4af4ae0aee552a99a995ce4b32198b98294a95f7
with obscure errors:
Automation Error: mozprocess timed out after 2400 seconds running ['/usr/bin/python2.7', 'mach', '--log-no-times', 'build', '-v']
Aceman reported on IRC about a local build:
23:06:59 - aceman: I updated right now and it seems python is stuck in some loop for 10 minutes already
23:07:18 - aceman: it is in the 'export' tier of 'mach build'
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=789e30ff2e3d6e1fcfce1a373c1e563548 &tochange=4af4ae0aee552a99a995ce4b32198b9829
which is the push at Wed Apr 18 10:41:19 2018 +0000 and Wed Apr 18 10:50:38 2018 +0000.
Sadly I have no builds for 234 merged M-C changesets :-(
Flags: needinfo?(philipp)
Flags: needinfo?(mozilla)
Reporter | ||
Comment 1•7 years ago
|
||
Reporter | ||
Comment 2•7 years ago
|
||
On Windows, hangs in the XPT stage, last lines:
0:49.01 zipwriter.xpt
0:50.04 xptdata.cpp
Backout of
66a52ced217d Tom Prince — Bug 1436662: Remove `installer` MOZ_AUTOMATION stage; r=mshal
fba7cab660da Tom Prince — Bug 1436662: Package translated uninstaller; r=pike,mshal
Clear the hang. Well, I did Ctrl+C when it hung and then simply a non-clobber build, which already shows other bustage :-(
Reporter | ||
Comment 3•7 years ago
|
||
Try with latest C++ bustage fixed and this backed out:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=8fb8415648958f9bfd233b45688809d2cfdab234
Reporter | ||
Comment 4•7 years ago
|
||
Hmm, looks like bug 1436662 was false alarm, even with that backed out I'm still hanging on xptdata.cpp :-(
So maybe bug Bug 1444745.
Reporter | ||
Comment 5•7 years ago
|
||
Nika, Andrew and/or Eric: What do we need to port to get going again. This busted *all* our builds.
Flags: needinfo?(nika)
Flags: needinfo?(erahm)
Flags: needinfo?(continuation)
> 0:49.01 zipwriter.xpt
> 0:50.04 xptdata.cpp
This is the same as what I see. Just the mach TIER indicator says it is at "export".
The problem continues also this morning (after midnight's merge).
Reporter | ||
Comment 7•7 years ago
|
||
I tried backing out bug 1444745 locally, but that needs a backout of bug 1444991 which in turn need backout of bug 1452183 :-(
Reporter | ||
Comment 8•7 years ago
|
||
(In reply to :aceman from comment #6)
> The problem continues also this morning (after midnight's merge).
Sure, caused by bug 1444745, so why would it magically solve itself?
Comment 9•7 years ago
|
||
I can confirm the problem, I am stuck at the same point.
top shows a python process with 99% CPU load.
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Comment hidden (obsolete) |
Updated•7 years ago
|
Flags: needinfo?(mozilla)
Comment 14•7 years ago
|
||
Nika or I will take a look at this today. Bug 1444745 shouldn't have required any changes from Thunderbird, so maybe its XPIDL files are doing something that is hitting some kind of exponential behavior.
Blocks: 1444745
Flags: needinfo?(erahm)
Assignee | ||
Comment 15•7 years ago
|
||
The problem was that two different interfaces had the exact same UUID. This caused the perfect hash generator to hang forever, as it couldn't find a non-conflicting hash.
Giving both interfaces new UUIDs allowed the build to continue.
Attachment #8969352 -
Flags: review?(jorgk)
Assignee | ||
Comment 16•7 years ago
|
||
Attachment #8969353 -
Flags: review?(continuation)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(nika)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(philipp)
Flags: needinfo?(continuation)
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → nika
Comment 17•7 years ago
|
||
Comment on attachment 8969353 [details] [diff] [review]
Part 2: Assert if there is a duplicated UUID or name in xptcodegen.py
Review of attachment 8969353 [details] [diff] [review]:
-----------------------------------------------------------------
Fun. Thanks for fixing this.
Attachment #8969353 -
Flags: review?(continuation) → review+
Reporter | ||
Comment 18•7 years ago
|
||
Comment on attachment 8969352 [details] [diff] [review]
Part 1: Don't use the same UUID for two different interfaces (landed on C-C)
What can I say, old silly code from the year 1999 most likely :-(
I'll get this landed asap.
Attachment #8969352 -
Flags: review?(jorgk) → review+
Comment 19•7 years ago
|
||
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/5b57bba232f3
Part 1: Don't use the same UUID for two different interfaces. r=jorgk DONTBUILD
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 20•7 years ago
|
||
I believe we're still busted due to bug 1436662, so here's a try build which includes that the fix from bug 1436662 based on https://hg.mozilla.org/try/rev/40efd537899042e57676f168a16d1f1e1d95adde
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=8549426a9d205dc70e386743573a8872a3a39a4d
Reporter | ||
Comment 21•7 years ago
|
||
And a BIG THANKS or course!!!
Reporter | ||
Comment 22•7 years ago
|
||
Sadly this hasn't fully worked. The hang at
0:49.01 zipwriter.xpt
0:50.04 xptdata.cpp
is not gone, but I have another one as per comment ..., oops, I never posted it, so here goes:
1:47.99 Microsoft (R) 32b/64b MIDL Compiler Version 8.01.0622
1:47.99 Copyright (c) Microsoft Corporation. All rights reserved.
1:47.99 64 bit Processing c:\mozilla-source\comm-central\mozilla\other-licenses\ia2\IA2CommonTypes.idl
1:48.00 IA2CommonTypes.idl
1:50.44 \local\temp\tmpfewa1f
1:50.44 C:/PROGRA~2/WI3CF2~1/10/bin/100150~1.0/x64/fxc.exe -nologo -Tvs_4_0 c:/mozilla-source/comm-central/mozilla/gfx/layers/d3d11/mlgshaders\blend-vs.hlsl -EBlendVertexVS -VnBlendVertexVS -Vi -DVERTEX_SHADER -Fhc:\users\jorgk\appdata\local\temp\tmpu54xch
[snip]
1:50.83 C:/PROGRA~2/WI3CF2~1/10/bin/100150~1.0/x64/fxc.exe -nologo -Tvs_4_0 c:/mozilla-source/comm-central/mozilla/gfx/layers/d3d11/mlgshaders\test-features-vs.hlsl -ETestConstantBuffersVS -VnTestConstantBuffersVS -Vi -DVERTEX_SHADER -Fhc:\users\jorgk\appdata\local\temp\tmpeer7vk
So is that yet another problem?
Flags: needinfo?(nika)
Reporter | ||
Comment 23•7 years ago
|
||
BTW, the Mac try run had this problem:
https://taskcluster-artifacts.net/Z0TTxTMfSPGL7_gVYu2Drg/0/public/logs/live_backing.log
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - make[4]: Entering directory '/builds/worker/workspace/build/src/obj-thunderbird/ipc/ipdl'
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - /builds/worker/workspace/build/src/obj-thunderbird/_virtualenv/bin/python /builds/worker/workspace/build/src/config/pythonpath.py \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - -I/builds/worker/workspace/build/src/other-licenses/ply \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - /builds/worker/workspace/build/src/ipc/ipdl/ipdl.py \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - --sync-msg-list=/builds/worker/workspace/build/src/ipc/ipdl/sync-messages.ini \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - --msg-metadata=/builds/worker/workspace/build/src/ipc/ipdl/message-metadata.ini \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - --outheaders-dir=_ipdlheaders \
[task 2018-04-19T18:04:35.451Z] 18:04:35 INFO - --outcpp-dir=. \
[task 2018-04-19T18:04:35.453Z] 18:04:35 INFO - -I/builds/worker/workspace/build/src/obj-thunderbird/ipc/ipdl
[snip]
/src/widget/PlatformWidgetTypes.ipdlh /builds/worker/workspace/build/src/widget/headless/HeadlessWidgetTypes.ipdlh
[task 2018-04-19T18:04:35.473Z] 18:04:35 INFO - make[4]: Leaving directory '/builds/worker/workspace/build/src/obj-thunderbird/ipc/ipdl'
[task 2018-04-19T18:44:35.501Z] 18:44:35 INFO - Automation Error: mozprocess timed out after 2400 seconds running ['/usr/bin/python2.7', 'mach', '--log-no-times', 'build', '-v']
[task 2018-04-19T18:44:35.513Z] 18:44:35 ERROR - timed out after 2400 seconds of no output
Reporter | ||
Comment 24•7 years ago
|
||
There still seem to be problems surrounding that xptdata.cpp file. Should that be empty? Later I see:
6:17.43 c1xx: fatal error C1083: Cannot open source file: 'c:/mozilla-source/comm-central/obj-x86_64-pc-mingw32/xpcom/reflect/xptinfo/xptdata.cpp': Permission denied
I closed the window/process locking the file and restarted the build. Still hanging on
$ mozilla/mach build
0:00.95 Clobber not needed.
0:00.95 Adding make options from c:\mozilla-source\comm-central\.mozconfig
CONFIG_GUESS=x86_64-pc-mingw32
MOZ_OBJDIR=c:/mozilla-source/comm-central/obj-x86_64-pc-mingw32
OBJDIR=c:/mozilla-source/comm-central/obj-x86_64-pc-mingw32
FOUND_MOZCONFIG=c:/mozilla-source/comm-central/.mozconfig
export FOUND_MOZCONFIG
0:01.03 c:\mozilla-build\mozmake\mozmake.exe -f client.mk -s
0:02.02 Adding client.mk options from c:/mozilla-source/comm-central/.mozconfig:
0:02.02 CONFIG_GUESS=x86_64-pc-mingw32
0:02.02 FOUND_MOZCONFIG := c:/mozilla-source/comm-central/.mozconfig
0:03.77 Elapsed: 0.00s; From dist/public: Kept 0 existing; Added/updated 0; Removed 0 files and 0 directories.
0:03.77 Elapsed: 0.00s; From dist/private: Kept 0 existing; Added/updated 0; Removed 0 files and 0 directories.
0:03.77 Elapsed: 0.00s; From dist/xpi-stage: Kept 11 existing; Added/updated 0; Removed 0 files and 0 directories.
0:04.19 Elapsed: 0.41s; From dist/idl: Kept 0 existing; Added/updated 1178; Removed 0 files and 0 directories.
0:04.52 Elapsed: 0.73s; From _tests: Kept 0 existing; Added/updated 1455; Removed 0 files and 0 directories.
0:04.56 Elapsed: 0.80s; From dist/bin: Kept 0 existing; Added/updated 2121; Removed 0 files and 0 directories.
0:05.81 Elapsed: 2.03s; From dist/include: Kept 0 existing; Added/updated 5837; Removed 0 files and 0 directories.
0:07.44 Elapsed: 0.45s; From ../../dist/idl: Kept 0 existing; Added/updated 1178; Removed 0 files and 0 directories.
0:07.60 xptdata.cpp
:-(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 25•7 years ago
|
||
BTW, I added the M-C patch but see no messages about further duplicates :-(
Reporter | ||
Comment 26•7 years ago
|
||
It hangs in xptcodegen.py, first argument is the xptdata.cpp file then follow lots of xpt files.
I tried to add prints to the Python file, nothing got printed. There wouldn't be a limitation on the number of arguments this can take?
Nika reported on IRC:
23:25:20 - nika: jorgk: no
23:25:25 - nika: jorgk: It shouldn't be empty
23:25:33 - nika: jorgk: It seems to be running fine on my end
Sadly it's not running fine neither for me nor for Aceman.
Reporter | ||
Comment 27•7 years ago
|
||
Just for fun, I ran
$ mozilla/xpcom/reflect/xptinfo/xptcodegen.py jk obj-x86_64-pc-mingw32/config/makefiles/xpidl/layout_printing.xpt
That created jk with some content. Also, all the prints I had added were shown. However, as part of the build, it hangs. When you kill it, it leaves behind a locked file.
If you build again, it will go further and fail later on the locked and empty file. If you kill the shell that owns the killed but still zombie python process, then start the build again, it hangs again.
Strangely, in a clobber build it appears to go past the processing of xptdata.cpp but hangs later as per comment #22, on something like:
1:45.47 C:/PROGRA~2/WI3CF2~1/10/bin/100150~1.0/x64/fxc.exe -nologo -Tvs_4_0 c:/mozilla-source/comm-central/mozilla/gfx/layers/d3d11/mlgshaders\test-features-vs.hlsl -ETestConstantBuffersVS -VnTestConstantBuffersVS -Vi -DVERTEX_SHADER -Fhc:\users\jorgk\appdata\local\temp\tmpin_xkm
That's just misleading, since in the process explorer, one can see that xptcodegen.py hangs. No print have come out.
So I guess something is wrong with the invocation of the script.
Reporter | ||
Comment 28•7 years ago
|
||
https://stackoverflow.com/questions/714475/what-is-a-maximum-number-of-arguments-in-a-python-function
starts with:
It's somewhat common knowledge that Python functions can have a maximum of 256 arguments.
I counted half the screen in the image I attached and came to 70, so the whole lot should be ~140. In my
C:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\config\makefiles\xpidl
directory, I have 183 files.
OK, here's the next attempt:
$ mozilla/xpcom/reflect/xptinfo/xptcodegen.py jk obj-x86_64-pc-mingw32/config/makefiles/xpidl/*.xpt
This hangs and I only see the "111...111" from this debug:
def link_to_cpp(interfaces, fd):
# Perfect Hash from IID into the ifaces array.
print "11111111111111111111111111111"
iid_phf = PerfectHash(PHFSIZE, [
(iid_bytes(iface['uuid']), iface)
for iface in interfaces
])
# Perfect Hash from name to index in the ifaces array.
print "22222222222222222222222222222"
name_phf = PerfectHash(PHFSIZE, [
(bytearray(iface['name'], 'ascii'), idx)
for idx, iface in enumerate(iid_phf.values)
])
print "3333333333333333333333333333"
Reporter | ||
Comment 29•7 years ago
|
||
Reporter | ||
Comment 30•7 years ago
|
||
Killing that gives:
Traceback (most recent call last):
File "mozilla/xpcom/reflect/xptinfo/xptcodegen.py", line 540, in <module>
main()
File "mozilla/xpcom/reflect/xptinfo/xptcodegen.py", line 537, in main
link_and_write(args.xpts, fd)
File "mozilla/xpcom/reflect/xptinfo/xptcodegen.py", line 522, in link_and_write
link_to_cpp(interfaces, outfile)
File "mozilla/xpcom/reflect/xptinfo/xptcodegen.py", line 160, in link_to_cpp
for iface in interfaces
File "c:\mozilla-source\comm-central\mozilla\xpcom\reflect\xptinfo\perfecthash.py", line 98, in __init__
slot = hash(bucket.entries[idx].key, basis) % len(self.values)
File "c:\mozilla-source\comm-central\mozilla\xpcom\reflect\xptinfo\perfecthash.py", line 38, in hash
h *= FNV_PRIME # Multiply by the FNV prime
Reporter | ||
Comment 31•7 years ago
|
||
Works fine with PHFSIZE set to 512. There were 1630 UUIDs, so isn't that hash table a bit small?
Reporter | ||
Comment 32•7 years ago
|
||
Reporter | ||
Comment 33•7 years ago
|
||
I don't know what the 256 is about. If you think this makes sense, maybe get it landed with an appropriate commit message.
Next try:
M-C:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=be79666e4a595570642c9e8a0097e975c285c03e
C-C:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=f2145f8b3970fa5072b668e2e5dee9144d4b9d9b
Attachment #8969512 -
Flags: review?(nika)
Reporter | ||
Comment 34•7 years ago
|
||
Comment on attachment 8969512 [details] [diff] [review]
hash.patch
Andrew, given the urgency of the matter (no C-C builds for two days), perhaps you can take the review, suggest a good commit message and get this landed as well as Nika's other patch.
Attachment #8969512 -
Flags: review?(continuation)
Reporter | ||
Comment 35•7 years ago
|
||
People watching this bug:
hg import https://hg.mozilla.org/try/rev/be79666e4a595570642c9e8a0097e975c285c03e
will give you a working local build.
Comment 36•7 years ago
|
||
And since I just hit that trap:
import it in subdir mozilla, not directly in C-C...
Comment 37•7 years ago
|
||
Strangely, I updated c-c+m-c today morning (without applying the hash patch) and I could build successfully.
Reporter | ||
Comment 38•7 years ago
|
||
We also have a successful build on C-C on some platforms:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=5b57bba232f3883c11030edd337e66e58b582462
Linux32 and Linux64 seem to have passed, Mac hung and Windows seems to have passed but died on a build issue.
Watch the latest push:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=e3d205e2a797fa405f995a58d352ca942a3c29d9
Maybe some random bug in the perfect hash function that appears to be mitigated by doubling the size.
Reporter | ||
Comment 39•7 years ago
|
||
Looks like the Linux opt builds worked, Mac failed and Windows failed due to something else.
Comment 40•7 years ago
|
||
Ah, nice find. I'd guess that Thunderbird has more XPIDL files, so it needs a larger hash table.
Assignee | ||
Comment 41•7 years ago
|
||
Comment on attachment 8969512 [details] [diff] [review]
hash.patch
Review of attachment 8969512 [details] [diff] [review]:
-----------------------------------------------------------------
I'm guessing that the increased number of IDLs in Thunderbird caused the PHF stage of the build to take longer than expected, timing out.
::: xpcom/reflect/xptinfo/xptcodegen.py
@@ +18,2 @@
> # generate a more efficient modulo due to it being a power of 2.
> +PHFSIZE = 512
This is probably fine. This change in number will cause an additional 512 u32s to be included in static storage, totaling ~2kb additional uncompressed binary size.
This number controls how big the intermediate table used for the perfect hash function is. With the m-c setup, 256 worked well, so I didn't bother using 512, because it was fast enough as-is. This change will trade off compile time for binary size, which appears to be a pretty good tradeoff for Thunderbird at least.
Attachment #8969512 -
Flags: review?(nika)
Attachment #8969512 -
Flags: review?(continuation)
Attachment #8969512 -
Flags: review+
Reporter | ||
Comment 42•7 years ago
|
||
Carrying forward Nika's r+. Added commit message.
Attachment #8969481 -
Attachment is obsolete: true
Attachment #8969488 -
Attachment is obsolete: true
Attachment #8969512 -
Attachment is obsolete: true
Flags: needinfo?(nika)
Attachment #8969718 -
Flags: review+
Reporter | ||
Updated•7 years ago
|
Keywords: checkin-needed
Reporter | ||
Updated•7 years ago
|
Attachment #8969352 -
Attachment description: Part 1: Don't use the same UUID for two different interfaces → Part 1: Don't use the same UUID for two different interfaces (landed on C-C)
Reporter | ||
Comment 43•7 years ago
|
||
This is badly blocking TB, so please get this landed, preferably on M-C straight away.
Component: Build Config → XPCOM
Flags: needinfo?(aryx.bugmail)
Product: Thunderbird → Core
Reporter | ||
Comment 44•7 years ago
|
||
(In reply to :Nika Layzell from comment #41)
> I'm guessing that the increased number of IDLs in Thunderbird caused the PHF
> stage of the build to take longer than expected, timing out.
I beg to differ here. Running xptcodegen.py is simply hung, it times out after 40 minutes (2400 seconds) when a build on the fast servers takes 20-30 minutes. The behaviour is inconsistent across platforms and opt/debug builds, so I suggest there is a subtle bug somewhere. Please download attachment 8969488 [details] and see for yourself. I'd suggest to investigate this a bit further since hanging on a lot of input data is not really the right thing to do.
Comment 45•7 years ago
|
||
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c972ac2e177
Part 2: Assert if there is a duplicated UUID or name in xptcodegen.py, r=mccr8
https://hg.mozilla.org/integration/mozilla-inbound/rev/ca83c0b3a19d
Part 3: Increase size of table used by perfect hash function from 256 to 512. r=nika
Keywords: checkin-needed
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(aryx.bugmail)
Comment 46•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/6c972ac2e177
https://hg.mozilla.org/mozilla-central/rev/ca83c0b3a19d
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox61:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Updated•7 years ago
|
status-firefox61:
fixed → ---
Component: XPCOM → Build Config
Product: Core → Thunderbird
Target Milestone: mozilla61 → ---
You need to log in
before you can comment on or make changes to this bug.
Description
•