Closed Bug 1767612 Opened 2 years ago Closed 2 years ago

hazard tasks runs 175-195 min instead of 65-80 min after bug 1766866 landed

Categories

(Core :: Preferences: Backend, defect, P2)

defect

Tracking

()

RESOLVED FIXED
102 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox100 --- unaffected
firefox101 --- unaffected
firefox102 --- fixed

People

(Reporter: aryx, Assigned: sfink)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

See these hazard tasks. The one which ran for bug 1766866 or newer take ~3h to succeed, older ones 65-80 minutes. Use the "Similar Jobs" tab at the bottom of the Treeherder page to get an overview, ignore the run time for the blue, retried tasks.

Flags: needinfo?(tom)

Interesting. I might need some help from sfink to understand it.

Now in Bug 1752332 I landed all the pref-sanitizing architecture, including the AutoSuppressGCAnalysis inside AddAccessCount. (And AddAccessCount is only enabled in debug mode FWIW.) This AutoSuppressGCAnalysis instantiation is made for every pref access - not just ones that are sanitized, not just ones in the parent, but literally every single one. The Hazard runtime didn't increase from this patch, so that can't be the culprit.

But in Bug 1766866 I added a bunch more prefs to the list of prefs that get sanitized. That list is only used in a loop iterating and comparing.

I went looking for the timestamp jumps[0] and found them in the same place two times:

[task 2022-05-04T11:10:37.473Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:37.585Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:37.586Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T12:58:55.834Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T12:58:55.836Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c  -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp   Unified_cpp_modules_libpref0.cpp


task 2022-05-04T11:10:11.253Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:11.368Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:11.368Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T13:10:40.083Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T13:10:40.084Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c  -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp   Unified_cpp_modules_libpref0.cpp

TC's timestamps are always kind of weird, it seems likely the issue is with building the Preferences module. But I don't quite get why increasing the number of iterations in this loop would cause the hazard build to freak out like this.

[0] Here's some bad code I'll turn into an actual script if I need it again.

Flags: needinfo?(tom) → needinfo?(sphink)

This is... interesting. I'm trying to look at it locally. Smells like something really stupid is happening. It really shouldn't care about the number of iterations, since it's only processing the source code. But there are a lot of ways in which my firm, well-founded beliefs don't seem to agree with reality very well. I'm still waiting for reality to get its act together.

It most certainly does reproduce locally. It takes a very long time in Xgill::SplitLoops while compiling Unified_cpp_modules_libpref0.cpp, specifically while compiling /home/sfink/src/mozilla/modules/libpref/Preferences.cpp at line 5670, which is the body of the function static void InitStaticPrefsFromShared() { which is a long autogenerated sequence of macro bodies instantiated for every preference.

Oddly, I don't see anything I would call a loop, so I'm not sure why the control flow graph is so confusing to it. But I'll need to try to understand what it's doing first.

Oh. I left out something in my comment (because I forgot about it.) In Bug 1766866 I didn't just add things to the list, I also added something to the macros in InitStaticPrefsFromShared() as seen here.

Now, for every one of those prefs, I've added

  • A call to XRE_IsParentProcess()
  • IsString<cpp_type>::value which ought to be statically determined at compile-time
  • A call to XRE_IsContentProcess()
  • A call to ShouldSanitizePreference(char*, bool) which itself will include:
  • A call to pref_Lookup which is itself not trivial...
  • A call to ShouldSanitizePreference(Pref*, bool) which mostly calls getters and strcmp

Maybe it's that 'statically determined at compile-time' that's freaking it out?

(In reply to Tom Ritter [:tjr] (ni for sec-approvals and such) from comment #4)

Maybe it's that 'statically determined at compile-time' that's freaking it out?

No, it's something much dumber. Those chunks of code contain MOZ_ASSERT_DIAGNOSTIC and similar things, which expand to multiple do { ... } while (false) "loops". gcc passes them through to sixgill relatively unmodified, and sixgill ends up calling Xgill::SplitLoops on a body that contains 10512 loops. The processing it does is unfortunately general, since it wants to handle even the cases where the loops are irreducible. Which I think means they can't be converted to nested structured control flow, so things like:

    L1: expr;
        while (cond) {
          if (cond2) {
            goto L1
          }
        }

if I'm reading this correctly. Furthermore, the processing is algorithmically inefficient. It needs to compute the set of places in the body that are dominated by each and every loop head, and it does that by scanning nearly the entire graph twice, then looping over both sets of results to compute their difference by doing 1-2 hash lookups for each edge in the body (and there are 84077 of those). It normally doesn't matter much at all, but O(n²) starts to hurt when n comes from this long autogenerated list of preferences.

At the end of it, sixgill strips out all of these degenerate loops, so the output the main analysis code sees is a boring linear series of instructions.

I'm going to try to avoid the "real" fix, which would be to do a real dominator analysis over the whole body, and instead figure out how to detect that none of these loop heads are ever looped back to and drop them. But it's tricky code and I only have a minimal test suite for the sixgill portion of the analysis, so I want to be very careful.

I compared the output before and after the optimization for every function in js/src, and it's identical. I also found that there are some ancient tests that exercise this logic (and in fact broke with my first attempt), so I'm feeling better about its reliability.

I don't actually know how much this will speed things up, though. I have a try push running right now to check.

Flags: needinfo?(sphink)
Assignee: nobody → sphink
Status: NEW → ASSIGNED

No good. My try push still took 197 minutes to run. Something's still going wrong.

The severity field is not set for this bug.
:KrisWright, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(kwright)

This is currently landing, so I set the severity and priority. Somewhat randomly, I'll admit.

Severity: -- → S4
Flags: needinfo?(kwright)
Priority: -- → P2
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/db80146ff379
Update to sixgill that optimizes out trivial loops
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 102 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: