Closed Bug 1767612 Opened 2 years ago Closed 2 years ago

hazard tasks runs 175-195 min instead of 65-80 min after bug 1766866 landed

Tracking

()

Status:

RESOLVED FIXED

Milestone:

102 Branch

Tracking Flags:

Tracking

Status

firefox-esr91

---

unaffected

firefox100

---

unaffected

firefox101

---

unaffected

firefox102

---

fixed

People

(Reporter: aryx, Assigned: sfink)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Bug 1767612 - Discard trivial loops 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1767612 - Update to sixgill that optimizes out trivial loops 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1767612 - Add a performance test for a body containing many, many degenerate loops. 2 years ago Steve Fink [:sfink] [:s:] 48 bytes, text/x-phabricator-request		Details \| Review

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Description

•

2 years ago

See these hazard tasks. The one which ran for bug 1766866 or newer take ~3h to succeed, older ones 65-80 minutes. Use the "Similar Jobs" tab at the bottom of the Treeherder page to get an overview, ignore the run time for the blue, retried tasks.

Flags: needinfo?(tom)

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 1

•

2 years ago

Interesting. I might need some help from sfink to understand it.

Now in Bug 1752332 I landed all the pref-sanitizing architecture, including the AutoSuppressGCAnalysis inside AddAccessCount. (And AddAccessCount is only enabled in debug mode FWIW.) This AutoSuppressGCAnalysis instantiation is made for every pref access - not just ones that are sanitized, not just ones in the parent, but literally every single one. The Hazard runtime didn't increase from this patch, so that can't be the culprit.

But in Bug 1766866 I added a bunch more prefs to the list of prefs that get sanitized. That list is only used in a loop iterating and comparing.

I went looking for the timestamp jumps[0] and found them in the same place two times:

[task 2022-05-04T11:10:37.473Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:37.585Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:37.585Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:37.586Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T12:58:55.834Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T12:58:55.836Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c  -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp   Unified_cpp_modules_libpref0.cpp


task 2022-05-04T11:10:11.253Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T11:10:11.368Z] /builds/worker/workspace/obj-analyzed-browser/_virtualenvs/build/bin/python -m mozbuild.action.check_binary --target ../../../dist/bin/jsapi-tests
[task 2022-05-04T11:10:11.368Z] TEST-PASS | check_textrel | jsapi-tests
[task 2022-05-04T11:10:11.368Z] gmake[4]: Leaving directory '/builds/worker/workspace/obj-analyzed-browser/js/src/jsapi-tests'
[task 2022-05-04T13:10:40.083Z] gmake[4]: Entering directory '/builds/worker/workspace/obj-analyzed-browser/modules/libpref'
[task 2022-05-04T13:10:40.084Z] /builds/worker/fetches/sixgill/usr/libexec/sixgill/scripts/wrap_gcc/basecc /builds/worker/fetches/gcc/bin/g++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -std=gnu++17 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu/c++/7.5.0 -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include/x86_64-linux-gnu -isystem /builds/worker/fetches/sysroot-x86_64-linux-gnu/usr/include -o Unified_cpp_modules_libpref0.o -c  -I/builds/worker/workspace/obj-analyzed-browser/dist/stl_wrappers -I/builds/worker/workspace/obj-analyzed-browser/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -DDEBUG=1 -DOS_POSIX=1 -DOS_LINUX=1 -DOS_ARCH=Linux -DMOZ_WIDGET_TOOLKIT=gtk -DMOZ_SERVICES_SYNC -DMOZ_BUILD_APP_IS_BROWSER -DMOZ_BACKGROUNDTASKS -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/modules/libpref -I/builds/worker/workspace/obj-analyzed-browser/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-analyzed-browser/dist/include -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nspr -I/builds/worker/workspace/obj-analyzed-browser/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-analyzed-browser/mozilla-config.h -Wno-attributes -Wno-ignored-attributes -Wall -Wempty-body -Wignored-qualifiers -Wpointer-arith -Wsign-compare -Wtype-limits -Wunreachable-code -Wno-invalid-offsetof -Wduplicated-cond -Wimplicit-fallthrough -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -Wno-multistatement-macros -Wno-error=class-memaccess -Wno-error=deprecated-copy -Wformat -Wformat-overflow=2 -Wno-psabi -fno-sized-deallocation -fno-aligned-new -Wno-attributes -Wno-ignored-attributes -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -freorder-blocks -Os -fno-omit-frame-pointer -funwind-tables -Werror -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/Unified_cpp_modules_libpref0.o.pp   Unified_cpp_modules_libpref0.cpp

TC's timestamps are always kind of weird, it seems likely the issue is with building the Preferences module. But I don't quite get why increasing the number of iterations in this loop would cause the hazard build to freak out like this.

[0] Here's some bad code I'll turn into an actual script if I need it again.

Flags: needinfo?(tom) → needinfo?(sphink)

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Updated

•

2 years ago

Comment 2

•

2 years ago

This is... interesting. I'm trying to look at it locally. Smells like something really stupid is happening. It really shouldn't care about the number of iterations, since it's only processing the source code. But there are a lot of ways in which my firm, well-founded beliefs don't seem to agree with reality very well. I'm still waiting for reality to get its act together.

Steve Fink [:sfink] [:s:]

Assignee

Comment 3

•

2 years ago

It most certainly does reproduce locally. It takes a very long time in Xgill::SplitLoops while compiling Unified_cpp_modules_libpref0.cpp, specifically while compiling /home/sfink/src/mozilla/modules/libpref/Preferences.cpp at line 5670, which is the body of the function static void InitStaticPrefsFromShared() { which is a long autogenerated sequence of macro bodies instantiated for every preference.

Oddly, I don't see anything I would call a loop, so I'm not sure why the control flow graph is so confusing to it. But I'll need to try to understand what it's doing first.

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 4

•

2 years ago

Oh. I left out something in my comment (because I forgot about it.) In Bug 1766866 I didn't just add things to the list, I also added something to the macros in InitStaticPrefsFromShared() as seen here.

Now, for every one of those prefs, I've added

A call to XRE_IsParentProcess()
IsString<cpp_type>::value which ought to be statically determined at compile-time
A call to XRE_IsContentProcess()
A call to ShouldSanitizePreference(char*, bool) which itself will include:
A call to pref_Lookup which is itself not trivial...
A call to ShouldSanitizePreference(Pref*, bool) which mostly calls getters and strcmp

Maybe it's that 'statically determined at compile-time' that's freaking it out?

Steve Fink [:sfink] [:s:]

Assignee

Comment 5

•

2 years ago

(In reply to Tom Ritter [:tjr] (ni for sec-approvals and such) from comment #4)

Maybe it's that 'statically determined at compile-time' that's freaking it out?

No, it's something much dumber. Those chunks of code contain MOZ_ASSERT_DIAGNOSTIC and similar things, which expand to multiple do { ... } while (false) "loops". gcc passes them through to sixgill relatively unmodified, and sixgill ends up calling Xgill::SplitLoops on a body that contains 10512 loops. The processing it does is unfortunately general, since it wants to handle even the cases where the loops are irreducible. Which I think means they can't be converted to nested structured control flow, so things like:

    L1: expr;
        while (cond) {
          if (cond2) {
            goto L1
          }
        }

if I'm reading this correctly. Furthermore, the processing is algorithmically inefficient. It needs to compute the set of places in the body that are dominated by each and every loop head, and it does that by scanning nearly the entire graph twice, then looping over both sets of results to compute their difference by doing 1-2 hash lookups for each edge in the body (and there are 84077 of those). It normally doesn't matter much at all, but O(n²) starts to hurt when n comes from this long autogenerated list of preferences.

At the end of it, sixgill strips out all of these degenerate loops, so the output the main analysis code sees is a boring linear series of instructions.

I'm going to try to avoid the "real" fix, which would be to do a real dominator analysis over the whole body, and instead figure out how to detect that none of these loop heads are ever looped back to and drop them. But it's tricky code and I only have a minimal test suite for the sixgill portion of the analysis, so I want to be very careful.

Steve Fink [:sfink] [:s:]

Assignee

Comment 6

•

2 years ago

I compared the output before and after the optimization for every function in js/src, and it's identical. I also found that there are some ancient tests that exercise this logic (and in fact broke with my first attempt), so I'm feeling better about its reliability.

I don't actually know how much this will speed things up, though. I have a try push running right now to check.

Flags: needinfo?(sphink)

Steve Fink [:sfink] [:s:]

Assignee

Comment 7

•

2 years ago

Attached file Bug 1767612 - Discard trivial loops — Details

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → sphink

Status: NEW → ASSIGNED

Steve Fink [:sfink] [:s:]

Assignee

Comment 8

•

2 years ago

Attached file Bug 1767612 - Update to sixgill that optimizes out trivial loops — Details

Steve Fink [:sfink] [:s:]

Assignee

Comment 9

•

2 years ago

No good. My try push still took 197 minutes to run. Something's still going wrong.

Steve Fink [:sfink] [:s:]

Assignee

Comment 10

•

2 years ago

Attached file Bug 1767612 - Add a performance test for a body containing many, many degenerate loops. — Details

BugBot [:suhaib / :marco/ :calixte]

Comment 11

•

2 years ago

The severity field is not set for this bug.
:KrisWright, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(kwright)

Steve Fink [:sfink] [:s:]

Assignee

Comment 12

•

2 years ago

This is currently landing, so I set the severity and priority. Somewhat randomly, I'll admit.

Severity: -- → S4

Flags: needinfo?(kwright)

Priority: -- → P2

Pulsebot

Comment 13

•

2 years ago

Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/db80146ff379
Update to sixgill that optimizes out trivial loops

Cristina Cozmuta (:CrissCozmuta)

Comment 14

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/db80146ff379

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox102: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 102 Branch

You need to log in before you can comment on or make changes to this bug.