3.96 - 22.09% build times (windows2012-32, windows2012-64, windows2012-64-noopt) regression on push d2013de59e87ff63899ef1e634bd3889ec126c93 (Fri Nov 9 2018)
Categories
(Core :: JavaScript: WebAssembly, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox-esr68 | --- | wontfix |
firefox63 | --- | unaffected |
firefox64 | --- | unaffected |
firefox65 | + | disabled |
firefox66 | --- | wontfix |
firefox69 | --- | wontfix |
firefox70 | --- | wontfix |
People
(Reporter: igoldan, Assigned: sunfish)
References
Details
(Keywords: regression)
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 2•7 years ago
|
||
Updated•7 years ago
|
Assignee | ||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Updated•7 years ago
|
Assignee | ||
Comment 6•7 years ago
|
||
Reporter | ||
Comment 7•7 years ago
|
||
Comment 8•7 years ago
|
||
![]() |
||
Comment 9•7 years ago
|
||
Reporter | ||
Comment 10•7 years ago
|
||
Reporter | ||
Comment 12•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #9)
Otherwise, there have been some changes to how we compile Rust code recently
that may have helped this issue. We should wait and see for now; I don't
think we should back this code out or anything like that.
Can we resume the investigation into this, given comment 10?
![]() |
||
Comment 13•7 years ago
|
||
We could try. Now that I think about it, it's quite peculiar that the regression only shows up on Windows, and not any of our other platforms.
Cranelift invokes Python somewhere during its build process (or maybe it used to?)...is it possible that we're invoking Python a lot, and we don't notice it on Unix because new process creation is so face, but we do notice it on Windows?
Assignee | ||
Comment 14•7 years ago
|
||
If so, then one possible approach would be to check the generated files into the tree, rather than generating them at build time. That would avoid the need to run any python for typical builds. That'd require developers working on those files to manually run the scripts when making changes, but we might be ok with that if it fixes our build-time regression.
![]() |
||
Comment 15•7 years ago
|
||
OK, so it looks like we only invoke Python once:
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/build.rs#L65
so the multiple invocations of Python theory seems to not hold water.
I would say that the update_file function doesn't try to avoid updating mtimes by checking contents before writing:
https://github.com/CraneStation/cranelift/blob/master/lib/codegen/meta-python/srcgen.py#L86-L91
But I would think that cargo ought to not run the build script in the first place, so that...shouldn't matter?
It seems strange to me that we're telling cargo to rerun-if-changed even for directory changes:
Maybe the sourcedirs are getting updated (mtimes?) on Windows (but not Unix-y systems), but not the actual files, and therefore Cargo thinks that everything needs to be recompiled, and that's what's triggering constant rebuilds of cranelift?
Comment 17•7 years ago
|
||
Cranelift invokes Python somewhere during its build process (or maybe it used to?)...is it possible that we're invoking Python a lot, and we don't notice it on Unix because new process creation is so face, but we do notice it on Windows?
Python is called only once as you found out, so this can't be this. Maybe the sources being regenerated over time would prevent cache hits?
Is there an explanation of how the Talos measures build time? Do we have any tool to measure build time of a subcomponent or sub-directory, or anything really that's not measuring the time it takes to build the whole tree?
(We could do an experiment in Cranelift, where the sources aren't dynamically generated (the build system would pass this option when building Cranelift), and instead we'd copy them in tree, as suggested in comment 14. Then we can see if the build time regression disappears or not, and investigate more precisely, or eliminate this as a possible solution.)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #16)
Any updates here?
To wit: this is a Nightly-only regression; Cranelift is not getting built at all on other releases.
![]() |
||
Comment 18•7 years ago
|
||
https://github.com/CraneStation/cranelift/pull/656 implemented the directory changes idea in comment 15. We'll see if that helps on the next cranelift import, I guess?
(In reply to Benjamin Bouvier [:bbouvier] from comment #17)
Python is called only once as you found out, so this can't be this. Maybe the sources being regenerated over time would prevent cache hits?
Maybe? If the sources being regenerated caused mtime updates, I would expect build time regressions on all platforms, not just Windows.
Is there an explanation of how the Talos measures build time? Do we have any tool to measure build time of a subcomponent or sub-directory, or anything really that's not measuring the time it takes to build the whole tree?
We do not, unfortunately.
(We could do an experiment in Cranelift, where the sources aren't dynamically generated (the build system would pass this option when building Cranelift), and instead we'd copy them in tree, as suggested in comment 14. Then we can see if the build time regression disappears or not, and investigate more precisely, or eliminate this as a possible solution.)
I think I'd attempt this via try first, to avoid polluting cranelift's history. The regression is big enough that doing this, and several try builds ought to be enough to tell you whether things got fixed or not.
Comment 19•7 years ago
|
||
What's the try command to get measures for the build time, please? I'd like to try a few things.
Reporter | ||
Comment 20•7 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #19)
What's the try command to get measures for the build time, please? I'd like to try a few things.
I believe it is mach try fuzzy --no-artifact --query "build-win64-plain/opt | build-win64-asan/opt | build-win64/opt | build-win64/debug | build-win64-noopt/debug"
Comment 21•7 years ago
|
||
Did you get a chance to try to reproduce, Benjamin?
Comment 22•7 years ago
|
||
No, I haven't yet. I think the first step would be to do a bisection to confirm which change in Gecko introduced this regression, then in Cranelift to see if there's an obvious commit that raised compilation time (it might a death-by-a-thousand-commits scenario too).
Comment 23•7 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #22)
No, I haven't yet. I think the first step would be to do a bisection to confirm which change in Gecko introduced this regression, then in Cranelift to see if there's an obvious commit that raised compilation time (it might a death-by-a-thousand-commits scenario too).
Ionuț, is this something you can do?
Comment 24•7 years ago
|
||
(I'm going to mark this fix-optional for 66 since a build time regression doesn't affect users)
Reporter | ||
Comment 25•7 years ago
|
||
(In reply to Andrew Overholt [:overholt] from comment #23)
(In reply to Benjamin Bouvier [:bbouvier] from comment #22)
No, I haven't yet. I think the first step would be to do a bisection to confirm which change in Gecko introduced this regression, then in Cranelift to see if there's an obvious commit that raised compilation time (it might a death-by-a-thousand-commits scenario too).
Ionuț, is this something you can do?
I don't think bisecting on Cranelift is something I could work on.
As for looking over Gecko changes: I need more details, to understand whether I could help here.
Updated•6 years ago
|
Comment 26•6 years ago
|
||
(Note this should now be addressed by the landing of bug 1555894, if this was Cranelift that caused the long build times.)
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 27•6 years ago
|
||
(In reply to Benjamin Bouvier [:bbouvier] from comment #26)
(Note this should now be addressed by the landing of bug 1555894, if this was Cranelift that caused the long build times.)
Indeed, all regressions here seem fixed now.
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Updated•6 years ago
|
Comment 28•6 years ago
|
||
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.
Updated•6 years ago
|
Description
•