hazard tasks runs 175-195 min instead of 65-80 min after bug 1766866 landed
Categories
(Core :: Preferences: Backend, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox100 | --- | unaffected |
firefox101 | --- | unaffected |
firefox102 | --- | fixed |
People
(Reporter: aryx, Assigned: sfink)
References
(Regression)
Details
(Keywords: regression)
Attachments
(3 files)
See these hazard tasks. The one which ran for bug 1766866 or newer take ~3h to succeed, older ones 65-80 minutes. Use the "Similar Jobs" tab at the bottom of the Treeherder page to get an overview, ignore the run time for the blue, retried tasks.
Comment 1•2 years ago
|
||
Interesting. I might need some help from sfink to understand it.
Now in Bug 1752332 I landed all the pref-sanitizing architecture, including the AutoSuppressGCAnalysis inside AddAccessCount. (And AddAccessCount is only enabled in debug mode FWIW.) This AutoSuppressGCAnalysis instantiation is made for every pref access - not just ones that are sanitized, not just ones in the parent, but literally every single one. The Hazard runtime didn't increase from this patch, so that can't be the culprit.
But in Bug 1766866 I added a bunch more prefs to the list of prefs that get sanitized. That list is only used in a loop iterating and comparing.
I went looking for the timestamp jumps[0] and found them in the same place two times:
[task 2022-05-04T11:10:37.473Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:37.585Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:37.586Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T12:58:55.834Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T12:58:55.836Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp Unified_cpp_modules_libpref0.cpp
task 2022-05-04T11:10:11.253Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:11.368Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:11.368Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T13:10:40.083Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T13:10:40.084Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp Unified_cpp_modules_libpref0.cpp
TC's timestamps are always kind of weird, it seems likely the issue is with building the Preferences module. But I don't quite get why increasing the number of iterations in this loop would cause the hazard build to freak out like this.
[0] Here's some bad code I'll turn into an actual script if I need it again.
Assignee | ||
Comment 2•2 years ago
|
||
This is... interesting. I'm trying to look at it locally. Smells like something really stupid is happening. It really shouldn't care about the number of iterations, since it's only processing the source code. But there are a lot of ways in which my firm, well-founded beliefs don't seem to agree with reality very well. I'm still waiting for reality to get its act together.
Assignee | ||
Comment 3•2 years ago
|
||
It most certainly does reproduce locally. It takes a very long time in Xgill::SplitLoops
while compiling Unified_cpp_modules_libpref0.cpp
, specifically while compiling /home/sfink/src/mozilla/modules/libpref/Preferences.cpp
at line 5670, which is the body of the function static void InitStaticPrefsFromShared() {
which is a long autogenerated sequence of macro bodies instantiated for every preference.
Oddly, I don't see anything I would call a loop, so I'm not sure why the control flow graph is so confusing to it. But I'll need to try to understand what it's doing first.
Comment 4•2 years ago
|
||
Oh. I left out something in my comment (because I forgot about it.) In Bug 1766866 I didn't just add things to the list, I also added something to the macros in InitStaticPrefsFromShared()
as seen here.
Now, for every one of those prefs, I've added
- A call to
XRE_IsParentProcess()
IsString<cpp_type>::value
which ought to be statically determined at compile-time- A call to
XRE_IsContentProcess()
- A call to
ShouldSanitizePreference(char*, bool)
which itself will include: - A call to
pref_Lookup
which is itself not trivial... - A call to
ShouldSanitizePreference(Pref*, bool)
which mostly calls getters and strcmp
Maybe it's that 'statically determined at compile-time' that's freaking it out?
Assignee | ||
Comment 5•2 years ago
|
||
(In reply to Tom Ritter [:tjr] (ni for sec-approvals and such) from comment #4)
Maybe it's that 'statically determined at compile-time' that's freaking it out?
No, it's something much dumber. Those chunks of code contain MOZ_ASSERT_DIAGNOSTIC
and similar things, which expand to multiple do { ... } while (false)
"loops". gcc passes them through to sixgill relatively unmodified, and sixgill ends up calling Xgill::SplitLoops
on a body that contains 10512 loops. The processing it does is unfortunately general, since it wants to handle even the cases where the loops are irreducible. Which I think means they can't be converted to nested structured control flow, so things like:
L1: expr;
while (cond) {
if (cond2) {
goto L1
}
}
if I'm reading this correctly. Furthermore, the processing is algorithmically inefficient. It needs to compute the set of places in the body that are dominated by each and every loop head, and it does that by scanning nearly the entire graph twice, then looping over both sets of results to compute their difference by doing 1-2 hash lookups for each edge in the body (and there are 84077 of those). It normally doesn't matter much at all, but O(n²) starts to hurt when n comes from this long autogenerated list of preferences.
At the end of it, sixgill strips out all of these degenerate loops, so the output the main analysis code sees is a boring linear series of instructions.
I'm going to try to avoid the "real" fix, which would be to do a real dominator analysis over the whole body, and instead figure out how to detect that none of these loop heads are ever looped back to and drop them. But it's tricky code and I only have a minimal test suite for the sixgill portion of the analysis, so I want to be very careful.
Assignee | ||
Comment 6•2 years ago
|
||
I compared the output before and after the optimization for every function in js/src, and it's identical. I also found that there are some ancient tests that exercise this logic (and in fact broke with my first attempt), so I'm feeling better about its reliability.
I don't actually know how much this will speed things up, though. I have a try push running right now to check.
Assignee | ||
Comment 7•2 years ago
|
||
Updated•2 years ago
|
Assignee | ||
Comment 8•2 years ago
|
||
Assignee | ||
Comment 9•2 years ago
|
||
No good. My try push still took 197 minutes to run. Something's still going wrong.
Assignee | ||
Comment 10•2 years ago
|
||
Comment 11•2 years ago
|
||
The severity field is not set for this bug.
:KrisWright, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 12•2 years ago
|
||
This is currently landing, so I set the severity and priority. Somewhat randomly, I'll admit.
Comment 13•2 years ago
|
||
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/db80146ff379 Update to sixgill that optimizes out trivial loops
Comment 14•2 years ago
|
||
bugherder |
Description
•