Closed Bug 1638664 Opened 4 months ago Closed 4 months ago

Perma Windows searchfox idx [tier2] mozmake.EXE[4]: *** [Unified_cpp_dom_gamepad0.obj] Error 1 after clang: error: clang frontend command failed due to signal (use -v to see invocation)

Categories: Webtools :: Searchfox, defect, P5
Status: RESOLVED FIXED
Tracking: firefox78 fixed
People: Reporter: intermittent-bug-filer, Assigned: kats
Keywords: intermittent-failure
Attachments: 2 files

Filed by: ccoroiu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=302628849&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/XAK4HmJnRyaJ5yS2CZMVDQ/runs/0/artifacts/public/logs/live_backing.log


[task 2020-05-17T10:59:16.975Z] 10:59:16 INFO - clang: error: clang frontend command failed due to signal (use -v to see invocation)
[task 2020-05-17T10:59:16.975Z] 10:59:16 INFO - clang version 10.0.0
[task 2020-05-17T10:59:16.975Z] 10:59:16 INFO - Target: x86_64-pc-windows-msvc
[task 2020-05-17T10:59:16.976Z] 10:59:16 INFO - Thread model: posix
[task 2020-05-17T10:59:16.976Z] 10:59:16 INFO - InstalledDir: z:/task_1589710263/fetches/clang/bin
[task 2020-05-17T10:59:16.976Z] 10:59:16 INFO - clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
[task 2020-05-17T10:59:16.976Z] 10:59:16 INFO - clang: error: unable to execute command: Couldn't execute program 'z:\task_1589710263\fetches\clang\bin\clang.exe': The paging file is too small for this operation to complete. (0x5AF)
[task 2020-05-17T10:59:16.976Z] 10:59:16 INFO - clang: note: diagnostic msg: Error generating preprocessed source(s).
[task 2020-05-17T10:59:16.977Z] 10:59:16 INFO - MOZSEARCH: Z:\task_1589710263\build\src z:\task_1589710263\workspace\obj-build\mozsearch_index\ z:\task_1589710263\workspace\obj-build
[task 2020-05-17T10:59:16.977Z] 10:59:16 INFO - Z:/task_1589710263/build/src/config/rules.mk:746: recipe for target 'UnifiedBindings7.obj' failed
[task 2020-05-17T10:59:16.977Z] 10:59:16 INFO - mozmake.EXE[4]: *** [UnifiedBindings7.obj] Error 1
[task 2020-05-17T10:59:16.977Z] 10:59:16 INFO - mozmake.EXE[4]: Leaving directory 'z:/task_1589710263/workspace/obj-build/dom/bindings'
[task 2020-05-17T10:59:16.977Z] 10:59:16 INFO - mozmake.EXE[4]: Entering directory 'z:/task_1589710263/workspace/obj-build/dom/file/ipc'
[task 2020-05-17T10:59:16.978Z] 10:59:16 INFO - z:/task_1589710263/fetches/clang/bin/clang.exe --driver-mode=cl -Xclang -std=c++17 -FoUnified_cpp_dom_file_ipc0.obj -c -Iz:/task_1589710263/workspace/obj-build/dist/stl_wrappers -Xclang -ftrivial-auto-var-init=pattern -guard:cf -DDEBUG=1 -DUNICODE -D_UNICODE -D_CRT_RAND_S -DCERT_CHAIN_PARA_HAS_EXTRA_FIELDS -D_SECURE_ATL -DCHROMIUM_BUILD -DU_STATIC_IMPLEMENTATION -DOS_WIN=1 -DWIN32 -D_WIN32 -D_WINDOWS -DWIN32_LEAN_AND_MEAN -DCOMPILER_MSVC -DWINAPI_NO_BUNDLED_LIBRARIES -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -IZ:/task_1589710263/build/src/dom/file/ipc -Iz:/task_1589710263/workspace/obj-build/dom/file/ipc -IZ:/task_1589710263/build/src/dom/file -IZ:/task_1589710263/build/src/dom/ipc -IZ:/task_1589710263/build/src/xpcom/build -Iz:/task_1589710263/workspace/obj-build/ipc/ipdl/_ipdlheaders -IZ:/task_1589710263/build/src/ipc/chromium/src -IZ:/task_1589710263/build/src/ipc/glue -Iz:/task_1589710263/workspace/obj-build/dist/include -Iz:/task_1589710263/workspace/obj-build/dist/include/nspr -Iz:/task_1589710263/workspace/obj-build/dist/include/nss -MD -FI z:/task_1589710263/workspace/obj-build/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -Qunused-arguments -fcrash-diagnostics-dir=z:/task_1589710263/public/build -fcrash-diagnostics-dir=/z/task_1589710263/public/build -TP -Zc:sizedDealloc- -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -Gw -Wno-inline-new-delete -Wno-invalid-offsetof -Wno-microsoft-enum-value -Wno-microsoft-include -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -Wno-inconsistent-missing-override -Wno-implicit-exception-spec-mismatch -Wno-microsoft-exception-spec -Wno-unused-local-typedef -Wno-ignored-attributes -Wno-used-but-marked-unused -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -GR- -Z7 -Xclang -load -Xclang z:/task_1589710263/workspace/obj-build/build/clang-plugin/clang-plugin.dll -Xclang -add-plugin -Xclang moz-check 
-Xclang -add-plugin -Xclang mozsearch-index -Xclang -plugin-arg-mozsearch-index -Xclang Z:/task_1589710263/build/src -Xclang -plugin-arg-mozsearch-index -Xclang z:/task_1589710263/workspace/obj-build/mozsearch_index -Xclang -plugin-arg-mozsearch-index -Xclang z:/task_1589710263/workspace/obj-build -O2 -Oy- -Werror -Xclang -fexperimental-new-pass-manager -Xclang -MP -Xclang -dependency-file -Xclang .deps/Unified_cpp_dom_file_ipc0.obj.pp -Xclang -MT -Xclang Unified_cpp_dom_file_ipc0.obj Unified_cpp_dom_file_ipc0.cpp
[task 2020-05-17T10:59:16.978Z] 10:59:16 INFO - LLVM ERROR: out of memory
[task 2020-05-17T10:59:16.978Z] 10:59:16 INFO - Stack dump:
[task 2020-05-17T10:59:16.979Z] 10:59:16 INFO - 0. Program arguments: z:/task_1589710263/fetches/clang/bin/clang.exe --driver-mode=cl -Xclang -std=c++17 -FoUnified_cpp_dom_file_ipc0.obj -c -Iz:/task_1589710263/workspace/obj-build/dist/stl_wrappers -Xclang -ftrivial-auto-var-init=pattern -guard:cf -DDEBUG=1 -DUNICODE -D_UNICODE -D_CRT_RAND_S -DCERT_CHAIN_PARA_HAS_EXTRA_FIELDS -D_SECURE_ATL -DCHROMIUM_BUILD -DU_STATIC_IMPLEMENTATION -DOS_WIN=1 -DWIN32 -D_WIN32 -D_WINDOWS -DWIN32_LEAN_AND_MEAN -DCOMPILER_MSVC -DWINAPI_NO_BUNDLED_LIBRARIES -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -IZ:/task_1589710263/build/src/dom/file/ipc -Iz:/task_1589710263/workspace/obj-build/dom/file/ipc -IZ:/task_1589710263/build/src/dom/file -IZ:/task_1589710263/build/src/dom/ipc -IZ:/task_1589710263/build/src/xpcom/build -Iz:/task_1589710263/workspace/obj-build/ipc/ipdl/_ipdlheaders -IZ:/task_1589710263/build/src/ipc/chromium/src -IZ:/task_1589710263/build/src/ipc/glue -Iz:/task_1589710263/workspace/obj-build/dist/include -Iz:/task_1589710263/workspace/obj-build/dist/include/nspr -Iz:/task_1589710263/workspace/obj-build/dist/include/nss -MD -FI z:/task_1589710263/workspace/obj-build/mozilla-config.h -DMOZILLA_CLIENT -Qunused-arguments -Qunused-arguments -fcrash-diagnostics-dir=z:/task_1589710263/public/build -fcrash-diagnostics-dir=/z/task_1589710263/public/build -TP -Zc:sizedDealloc- -D_HAS_EXCEPTIONS=0 -W3 -Gy -Zc:inline -Gw -Wno-inline-new-delete -Wno-invalid-offsetof -Wno-microsoft-enum-value -Wno-microsoft-include -Wno-unknown-pragmas -Wno-ignored-pragmas -Wno-deprecated-declarations -Wno-invalid-noreturn -Wno-inconsistent-missing-override -Wno-implicit-exception-spec-mismatch -Wno-microsoft-exception-spec -Wno-unused-local-typedef -Wno-ignored-attributes -Wno-used-but-marked-unused -D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING -GR- -Z7 -Xclang -load -Xclang z:/task_1589710263/workspace/obj-build/build/clang-plugin/clang-plugin.dll -Xclang 
-add-plugin -Xclang moz-check -Xclang -add-plugin -Xclang mozsearch-index -Xclang -plugin-arg-mozsearch-index -Xclang Z:/task_1589710263/build/src -Xclang -plugin-arg-mozsearch-index -Xclang z:/task_1589710263/workspace/obj-build/mozsearch_index -Xclang -plugin-arg-mozsearch-index -Xclang z:/task_1589710263/workspace/obj-build -O2 -Oy- -Werror -Xclang -fexperimental-new-pass-manager -Xclang -MP -Xclang -dependency-file -Xclang .deps/Unified_cpp_dom_file_ipc0.obj.pp -Xclang -MT -Xclang Unified_cpp_dom_file_ipc0.obj Unified_cpp_dom_file_ipc0.cpp
[task 2020-05-17T10:59:16.979Z] 10:59:16 INFO - 1. z:/task_1589710263/workspace/obj-build/dist/include/nsLayoutUtils.h:85:24: current parser token ';'
[task 2020-05-17T10:59:16.979Z] 10:59:16 INFO - 2. z:/task_1589710263/workspace/obj-build/dist/include/nsLayoutUtils.h:64:1: parsing namespace 'mozilla'
[task 2020-05-17T10:59:16.979Z] 10:59:16 INFO - 3. z:/task_1589710263/workspace/obj-build/dist/include/nsLayoutUtils.h:78:1: parsing namespace 'mozilla::dom'
[task 2020-05-17T10:59:16.980Z] 10:59:16 INFO - 0x00007FF6975410C6 (0x00007FF69AD3D568 0x00007FF699A4D213 0x000000930EF1ABC8 0x0000000000000000)
[task 2020-05-17T10:59:16.980Z] 10:59:16 INFO - 0x00007FF699A4D583 (0x0000000000000001 0x0000000000000000 0x0000000000000000 0x0000000000000090)
[task 2020-05-17T10:59:16.980Z] 10:59:16 INFO - 0x00007FF699A3FE18 (0x0000000000000002 0x0000000000000090 0x000000930EF1AC10 0x0000000000100000)
[task 2020-05-17T10:59:16.980Z] 10:59:16 INFO - 0x00007FF6975312D8 (0x000000930F11C520 0x00007FF69895A72A 0x000067F3860D33D4 0x0000009312E182E0)
[task 2020-05-17T10:59:16.980Z] 10:59:16 INFO - 0x00007FF696BB2764 (0x0000000400000000 0x00007FF6990FAB5A 0x000067F3860D32B4 0x00007FF699546C50)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF69917502D (0x000000930F873ED0 0x0000000000000000 0x000000930EF1AD90 0x00007FF6990FAB5A)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF698AB61A0 (0x000000930F11C520 0x00007FF6990FCF26 0x0000000000000000 0x0000009300000000)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF698852FA7 (0x000000930EF1BB01 0x0000000002B53AA9 0x000000930F094730 0x00007FF69771DBFF)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF698877E0C (0x000067F3860D21E4 0x0000000000000000 0x0000009315262200 0x00000093152622E0)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF69882AAF0 (0x0000009327CD0864 0x00007FF697772508 0x0000009327CD084F 0xFFFFFFFF00000009)
[task 2020-05-17T10:59:16.981Z] 10:59:16 INFO - 0x00007FF69882A6F4 (0x000067F3860D2604 0x0000009328477BA0 0x0000009328477BA0 0x00000093152622E0)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF69882865C (0x00000093152622E0 0x000000930F0AA2C0 0x000000930F5C7120 0x000000930F5C7120)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF69884D09B (0x000067F3860D5894 0x000000930EF1C310 0x000000930EF1C240 0x000000930F5C7120)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF69884B581 (0x00000000000007D4 0x000000930F094730 0x000000930F03F990 0x0000000002B538CF)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF698876E20 (0x000067F3860D5C24 0x00000093284771F0 0x00000093284771F0 0x000000930F5FB600)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF6988285F3 (0x000000930F5FB600 0x000000930F0AA2C0 0x000000930F5C7120 0x000000930F5C7120)
[task 2020-05-17T10:59:16.982Z] 10:59:16 INFO - 0x00007FF69884D09B (0x000067F3860D5EB4 0x000000930EF1C930 0x000000930EF1C860 0x000000930F5C7120)
[task 2020-05-17T10:59:16.983Z] 10:59:16 INFO - 0x00007FF69884B581 (0x000067F3860D2C44 0x000000930EF1CB58 0x000000930EF1CDB0 0x00000093284199D8)
[task 2020-05-17T10:59:16.983Z] 10:59:16 INFO - 0x00007FF698876E20 (0x000067F3860D5264 0x0000000000000003 0x00007FF69A3F2CF0 0x00007FF69981A384)
[task 2020-05-17T10:59:16.983Z] 10:59:16 INFO - 0x00007FF6988285F3 (0x0000009328477160 0x0000009300000000 0x000000930F0A5050 0x00007FF69A62E176)
[task 2020-05-17T10:59:16.983Z] 10:59:16 INFO - 0x00007FF698826F22 (0x000000930F08FEC0 0x0000000000000000 0x0000000000000001 0x000000930EF1CEC8)
[task 2020-05-17T10:59:16.983Z] 10:59:16 INFO - 0x00007FF69882286E (0x000000010000000E 0x000067F3860D50F4 0x0000000000000000 0x000000000000000F)
[task 2020-05-17T10:59:16.984Z] 10:59:16 INFO - 0x00007FF697CEAD57 (0x00007FF6970D9E40 0x0000000000000001 0x000000930F0A7BB0 0x000000930F08FEC0)
[task 2020-05-17T10:59:16.984Z] 10:59:16 INFO - 0x00007FF697CA82F1 (0x00007FF699A88201 0x0000000000000000 0x0000000000000030 0x0000000000000000)
[task 2020-05-17T10:59:16.984Z] 10:59:16 INFO - 0x00007FF697D5B005 (0x00007FF69A35DF95 0x0000000000000002 0x000000930EF1D5F0 0x0000000000000006)
[task 2020-05-17T10:59:16.984Z] 10:59:16 INFO - 0x00007FF6964E7FC4 (0x00007FF6961FC080 0x0000000000000130 0x000000930EF1DF88 0x00007FF69710E430)
[task 2020-05-17T10:59:16.984Z] 10:59:16 INFO - 0x00007FF6964E50E2 (0x000067F3860D40A4 0x000000930F08FD90 0x000000930EF1DAD0 0x000000930EF1DAB8)

Component: General → Searchfox
Flags: needinfo?(emilio)
Product: Firefox Build System → Webtools
Summary: Perma idx [tier2] mozmake.EXE[4]: *** [Unified_cpp_dom_gamepad0.obj] Error 1 after clang: error: clang frontend command failed due to signal (use -v to see invocation) → Perma Windows searchfox idx [tier2] mozmake.EXE[4]: *** [Unified_cpp_dom_gamepad0.obj] Error 1 after clang: error: clang frontend command failed due to signal (use -v to see invocation)
Version: unspecified → other

It seems it ran out of memory. Aryx, do you know if anything has changed on our Windows builders recently? Also, do you know whether it fails on the same object file every time?

Flags: needinfo?(aryx.bugmail)

As far as I know, the Windows builders weren't changed. This started on Sunday; the changelog is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=882de07e4cbe31a0617d1ae350236123dfdbe17f&tochange=5f7897d4523021ee64a2d75eb9ace78bdd8d6133

It fails on different object files after 49-55 minutes into the build.

Flags: needinfo?(aryx.bugmail)

I triggered the job on the intervening m-c push to narrow the regression window down:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=searchfox&tochange=5f7897d4523021ee64a2d75eb9ace78bdd8d6133&fromchange=4221f87da7fe9ecdb1a7ceda3967a7cc0acbf942

which gives this regression range:

https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4221f87da7fe9ecdb1a7ceda3967a7cc0acbf942&tochange=5f7897d4523021ee64a2d75eb9ace78bdd8d6133

Nothing in there looks particularly likely to have caused this. I'm guessing it's more that memory usage has been creeping up over time and has just hit the worker limits. Unfortunately, the Taskcluster memory reporting for these workers seems to be broken (it reports 0 memory usage throughout the task), so I can't confirm that.

I'll do a Windows mozsearch-enabled build locally and see if I can reproduce the problem.

See Also: → 1638943

Also: in terms of the clang-plugin, one spot that I thought would use up big gobs of memory is this code where we read the entire analysis file for a source file into memory, append the new rows, and sort it all before writing it back out. If the analysis file for a source file gets really big this would eat up a lot of memory. I looked at a recent index of m-c to find the largest files in the win64 mozsearch-index.zip file, and got this:

$ find . -type f -print0 | xargs -0 wc -c | sort -rn  | head  # approximately what I did
125440912 ./__GENERATED__/dist/include/mozilla/Assertions.h
 67277607 ./js/src/zydis/Zydis/Generated/InstructionDefinitions.inc
 58214307 ./__GENERATED__/dom/bindings/TestJSImplGenBinding.cpp
 43461257 ./__GENERATED__/dom/bindings/TestCodeGenBinding.cpp
 36553630 ./third_party/sqlite3/src/sqlite3.c
 35245726 ./__GENERATED__/dom/bindings/CSS2PropertiesBinding.cpp
 32859531 ./__GENERATED__/dist/include/nsISupportsImpl.h
 26875546 ./__GENERATED__/dom/bindings/TestExampleGenBinding.cpp
 22401566 ./__GENERATED__/dist/include/mozilla/Likely.h
 19753412 ./__GENERATED__/toolkit/components/telemetry/TelemetryHistogramData.inc
 19729881 ./__GENERATED__/dist/include/mozilla/StaticAnalysisFunctions.h
 19566712 ./toolkit/components/reputationservice/chromium/chrome/common/safe_browsing/csd.pb.h

So Assertions.h generates an analysis file of about 125 MB. I don't know how that compares to what else clang is doing, or how much memory is available on the system.
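To make the memory cost concrete, the whole-file update described above (read the entire analysis file, append the new rows, sort, write the unique rows back out) boils down to something like the sketch below. The function name `updateWholeFile` and its stream-based signature are hypothetical, not the plugin's actual code. The point is that `lines` holds every row of the existing analysis file at once. A 125 MB analysis file therefore means at least that much string data resident in the clang process, on top of whatever clang itself is using.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch of the whole-file update (not the actual MozsearchIndexer
// code). Peak memory is proportional to the size of the entire analysis file,
// because every existing row is loaded before anything is written back.
void updateWholeFile(std::istream& existing,
                     const std::vector<std::string>& newEntries,
                     std::ostream& out) {
  std::vector<std::string> lines;
  std::string line;
  while (std::getline(existing, line)) {
    lines.push_back(line);  // the whole existing file accumulates here
  }
  // Append the rows produced for the current translation unit.
  lines.insert(lines.end(), newEntries.begin(), newEntries.end());
  // Sort everything, then keep only the unique rows.
  std::sort(lines.begin(), lines.end());
  lines.erase(std::unique(lines.begin(), lines.end()), lines.end());
  for (const auto& l : lines) {
    out << l << '\n';
  }
}
```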

Related but off-topic: frustratingly, the Assertions.h lines are all valid-ish uses. For example, when indexing the macro expansion of MOZ_ASSERT, the indexer chooses to generate a use at the point where MOZ_ASSERT bottoms out in a call to AnnotateMozCrashReason. The expansion looks like:

That's for the JSON analysis line:

{"loc":"00044:34-56","target":1,"kind":"use","pretty":"AnnotateMozCrashReason","sym":"_ZL22AnnotateMozCrashReasonPKc","context":"(anonymous namespace)::ASTSerializer::arrayPattern","contextsym":"_ZN12_GLOBAL__N_113ASTSerializer12arrayPatternEPN2js8frontend8ListNodeEN2JS13MutableHandleINS5_5ValueEEE"}

The size issue could be mitigated by generating the uses in the source files that invoke the macro, but there's a reason the heuristics attempt to pick actual source tokens to map to rather than opaquely claiming that a million things are happening inside the macro use point itself. Fixing bug 1583635 so we could have the macro expansions inline and generate the locations against the expanded lines would probably be the most useful way of addressing this specific situation.

Another option is to have a more memory-efficient approach to updating the analysis files. For example, sort the new entries in-memory, then read one entry at a time from the existing (sorted) file, and do a merge operation on the two sorted streams (one from disk one from memory) as we write to a new file. That way we don't have to load the entire existing file into memory, and the memory usage should stay bounded to whatever the currently-being-processed source file is generating.
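A minimal sketch of that merge, assuming the existing analysis file is already sorted and deduplicated and that each row is one line of text. `mergeSortedLines` is a hypothetical name. Reading from a `std::istream` and writing to a `std::ostream` stands in for whatever file handles the plugin actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical streaming merge. `existing` yields the sorted, unique rows of
// the analysis file; `newEntries` holds the rows for the current translation
// unit. Only the new rows plus one existing row are held in memory at a time.
void mergeSortedLines(std::istream& existing,
                      std::vector<std::string> newEntries,
                      std::ostream& out) {
  // Sort and deduplicate the (comparatively small) set of new rows.
  std::sort(newEntries.begin(), newEntries.end());
  newEntries.erase(std::unique(newEntries.begin(), newEntries.end()),
                   newEntries.end());

  auto next = newEntries.begin();
  std::string line;
  // Stream the existing file one row at a time.
  while (std::getline(existing, line)) {
    // Emit every new row that sorts before the current existing row.
    while (next != newEntries.end() && *next < line) {
      out << *next << '\n';
      ++next;
    }
    // Skip a new row that duplicates the current existing row.
    if (next != newEntries.end() && *next == line) {
      ++next;
    }
    out << line << '\n';
  }
  // Anything left over sorts after the last existing row.
  for (; next != newEntries.end(); ++next) {
    out << *next << '\n';
  }
}
```

With this shape, peak memory tracks the current translation unit's output rather than the accumulated size of the analysis file. In-memory `std::istringstream`/`std::ostringstream` objects are a convenient way to exercise it in isolation.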

That sounds like a great idea. If you end up pursuing this, https://github.com/asutherland/mozsearch/commit/a712c690bb0c3944e08f35d51a7f4c141f5411aa#diff-e9518421f76684479a12d2fb8070954b contains changes to use std::getline instead of fgets[1] if you're tempted to try and fix the hard-coded fixed-size Buffer as part of those changes.

1: Well, technically https://github.com/asutherland/mozsearch/commit/efe2fea6a981358084104cbc5da52a0dae92f9df#diff-e9518421f76684479a12d2fb8070954b got rid of fgets and replaced it with getline, and https://github.com/asutherland/mozsearch/commit/79012e6abd342cf41f486927e5d7ddd6e71e5fcb#diff-e9518421f76684479a12d2fb8070954b added the comment about that, which the above patch then fixed.

I wasn't able to repro the OOM locally, which lends credence to the theory that it just hit some threshold on the worker (as opposed to e.g. running into some sort of infinite memory-allocation loop).

I cobbled together a minimum viable patch to test the theory on try, and it seems to work. Try push is here and a cursory examination of the mozsearch-index.zip shows that it seems sane. I didn't dedupe all the things perfectly though, so I need to fix that, and I need to make the non-Windows codepath work as well. But still, this seems like the right fix, so I'll take this bug and finish it up.

Assignee: nobody → kats
Flags: needinfo?(emilio)

Also, it looks like the problem has fixed itself (i.e. Windows indexer jobs on m-c are green again), so fixing this is less urgent. But I might as well clean up my patch and land it.

The approach of "write to a tmp file, then move it to replace the original" turns out to be surprisingly hard on Linux. Right now the code does a flock on the original file, which I thought would continue to work, but with the new approach there's a race condition: a new process can start waiting on the flock while another process is busy working on the file. When the old process then deletes and renames the file, the new process is left with a file handle to the now-deleted file and is effectively working with stale data. The Linux semaphore mechanism (sem_open et al.) doesn't have any good place to call sem_unlink to remove the semaphores once you're done with them, as far as I can tell. I could leave them dangling, but as they take up kernel memory (AIUI), doing that could result in OOM. I could create a completely separate lock file in some garbage dir and use that instead. It just feels so gross.

We could use SQLite with WITHOUT ROWID tables, where the JSON text is also the primary key and we do idempotent inserts, so SQLite is effectively just one big B-tree. SQLite has cross-platform locking magic as long as NFS isn't involved.

SQLite is an interesting idea... that might be something to do in the future. For now I realized that I can use the source file (instead of the analysis file) as my flock target, since that is always present and doesn't move. That seems to work fine.
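A sketch of that locking scheme on POSIX follows. It assumes the analysis file is opened only after the lock is held. `SourceFileLock` is a hypothetical name; `open`, `flock` and `close` are the standard calls. The lock lives on the source file, which is never deleted or renamed. A process that was blocked waiting for it therefore cannot end up holding a descriptor for an analysis file that has since been replaced.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <system_error>
#include <unistd.h>

// Hypothetical RAII guard: take an exclusive advisory flock() on the source
// file (stable path, never renamed) rather than on the analysis file that is
// replaced via rename(). Open the analysis file only while this guard is alive.
class SourceFileLock {
 public:
  explicit SourceFileLock(const std::string& sourcePath) {
    fd_ = ::open(sourcePath.c_str(), O_RDONLY);
    if (fd_ < 0) {
      throw std::system_error(errno, std::generic_category(),
                              "open " + sourcePath);
    }
    // Block until no other process holds the lock; retry if a signal
    // interrupts the wait.
    int rv;
    do {
      rv = ::flock(fd_, LOCK_EX);
    } while (rv != 0 && errno == EINTR);
    if (rv != 0) {
      int err = errno;
      ::close(fd_);
      throw std::system_error(err, std::generic_category(),
                              "flock " + sourcePath);
    }
  }

  ~SourceFileLock() {
    // Closing our only descriptor for this open file description releases
    // the flock.
    ::close(fd_);
  }

  SourceFileLock(const SourceFileLock&) = delete;
  SourceFileLock& operator=(const SourceFileLock&) = delete;

 private:
  int fd_ = -1;
};
```

Usage is scoped: construct the guard, open and rewrite the analysis file, and let the destructor release the lock when the guard goes out of scope.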

So now the question is, should I still land the patch? It's probably going to make your life more difficult when it comes to rebasing the commits you linked in comment 8.

Actually maybe it makes it less hacky because now you can just open the files as streams directly without having to bother with keeping the FileDescriptor thing. I'll put up the patch and you can decide if it's worth it or not.

Instead of doing this:
a) read existing file into memory
b) append new entries
c) sort all entries
d) write unique entries back to file

We now do this:
a) sort new entries
b) loop through existing file entries, one at a time, writing them to a tmp file
c) insert the new entries in between the existing file entries in lexicographic order,
deduplicating the new entries
d) write any remaining new entries (that are lexicographically after the last
entry in the pre-existing file), again deduplicating the entries
e) move tmp file back to original file location

This avoids reading the entire file into memory, which could potentially be
hundreds of MB.

The changes in FileOperations.* are needed to support these changes, as we now
have two files that we're dealing with - reading from one and writing to the
other. We still use a Mutex (Windows) or exclusive-flock (POSIX) on the file
for the duration of the entire operation, so we should still be robust in the
face of multiple concurrently running clang processes.
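The tmp-file-and-move part (steps b through e above) might look roughly like the sketch below. It assumes the caller already holds the per-file lock for the whole call. `replaceViaTempFile` and the `.tmp` suffix are illustrative choices, not the names used in the actual patch. The rename is where the platforms differ. POSIX `rename()` atomically replaces an existing target. The C and C++ standards leave that case implementation-defined, and the Microsoft CRT's `rename` fails with EACCES when the target already exists.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fstream>
#include <functional>
#include <ostream>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical helper: write the new contents of `path` to a sibling tmp file,
// close it, then move it over the original. Assumes the caller holds the
// per-file lock, so no other indexer process reads `path` or writes
// `path + ".tmp"` at the same time.
void replaceViaTempFile(const std::string& path,
                        const std::function<void(std::ostream&)>& writeContents) {
  const std::string tmpPath = path + ".tmp";
  {
    std::ofstream tmp(tmpPath, std::ios::binary | std::ios::trunc);
    if (!tmp) {
      throw std::runtime_error("cannot create " + tmpPath);
    }
    writeContents(tmp);  // e.g. the merged, sorted analysis rows
    tmp.flush();
    if (!tmp) {
      throw std::runtime_error("error while writing " + tmpPath);
    }
  }  // the tmp file is closed here, before it is moved
  // POSIX rename() replaces an existing `path` atomically. On Windows a
  // primitive such as MoveFileExW(..., MOVEFILE_REPLACE_EXISTING) would be
  // needed instead, since the CRT's rename refuses to overwrite.
  if (std::rename(tmpPath.c_str(), path.c_str()) != 0) {
    throw std::system_error(errno, std::generic_category(),
                            "rename " + tmpPath + " -> " + path);
  }
}
```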

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #13)

So now the question is, should I still land the patch? It's probably going to make your life more difficult when it comes to rebasing the commits you linked in comment 8.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #14)

Actually maybe it makes it less hacky because now you can just open the files as streams directly without having to bother with keeping the FileDescriptor thing. I'll put up the patch and you can decide if it's worth it or not.

Yeah, this change seems like a massive improvement that will eliminate the need for the gross hack. Woo! (And thank goodness flock is just advisory locking and won't interfere with clang's own opening of the source files, etc.!)

So I did a couple of try pushes with this patch. The first one, everything was green:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=2942eb3fbed45becf14a356bad60f23da9503368

The second one, Windows was red, twice:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=6397c555e918e80746bfc633ff537b847c278f28

One failure was due to errno 13 (EACCES) when moving the tmp file into place. The other was again some sort of OOM. The only difference between the first and second try push was that I dropped the O_CREAT flag on the POSIX locking codepath, so that shouldn't affect the Windows side at all.

But this means that (a) the OOM problem is still there, even with my patch, and (b) my patch might make things worse by adding another failure mode. I'll hold off on landing it until I have a chance to dig into this a bit more.

The second try push in the previous comment actually has a stack pointing to the clang plugin:

[task 2020-05-20T21:15:10.183Z] 21:15:10     INFO -  1.	<eof> parser at end of file
[task 2020-05-20T21:15:10.183Z] 21:15:10     INFO -   #0 0x00007ffea5b18a5c (C:\Windows\system32\KERNELBASE.dll+0x8a5c)
[task 2020-05-20T21:15:10.184Z] 21:15:10     INFO -   #1 0x00007ffe9a5fa591 _CxxThrowException f:\dd\vctools\crt\vcruntime\src\eh\throw.cpp:133:0
[task 2020-05-20T21:15:10.184Z] 21:15:10     INFO -   #2 0x00007ffe9a5f8bbb __scrt_throw_std_bad_alloc(void) f:\dd\vctools\crt\vcstartup\src\heap\throw_bad_alloc.cpp:35:0
[task 2020-05-20T21:15:10.184Z] 21:15:10     INFO -   #3 0x00007ffe9a5f556d operator new(unsigned __int64) f:\dd\vctools\crt\vcstartup\src\heap\new_scalar.cpp:48:0
[task 2020-05-20T21:15:10.184Z] 21:15:10     INFO -   #4 0x00007ffe9a596e04 (z:\task_1590000809\workspace\obj-build\build\clang-plugin\clang-plugin.dll+0xb6e04)
[task 2020-05-20T21:15:10.186Z] 21:15:10     INFO -   #5 0x00007ffe9a59494c IndexConsumer::visitIdentifier(char const *,char const *,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class clang::SourceLocation,class std::vector<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::allocator<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > > > const &,struct IndexConsumer::Context,int,class clang::SourceRange,class clang::SourceRange) (z:\task_1590000809\workspace\obj-build\build\clang-plugin\clang-plugin.dll+0xb494c)
[task 2020-05-20T21:15:10.186Z] 21:15:10     INFO -   #6 0x00007ffe9a5c2a64 IndexConsumer::VisitNamedDecl(class clang::NamedDecl *) (z:\task_1590000809\workspace\obj-build\build\clang-plugin\clang-plugin.dll+0xe2a64)
[task 2020-05-20T21:15:10.193Z] 21:15:10     INFO -   #7 0x00007ffe9a59eae8 clang::RecursiveASTVisitor<class IndexConsumer>::TraverseEnumConstantDecl(class clang::EnumConstantDecl *) (z:\task_1590000809\workspace\obj-build\build\clang-plugin\clang-plugin.dll+0xbeae8)

The previous OOMs were inside clang proper, I think, so this could just be general memory exhaustion, but it's also possible that the symbol list in visitIdentifier is excessively large, per this comment. I'm doing another run with some more logging around there to see if I can get more data.

The symbol list is just the list of methods a method is overriding, I don't think that's likely to be the source of huge problems on its own. And the other unbounded list of symbols where concatSymbols does the same preparation comes from the context stack for "contextsym" which is bounded by the maximum nesting of declarations in source files.

Also, I just realized that another option here might be to migrate the searchfox Windows task to run on Linux instead, like the official win64/debug build does. That might get us the best of all worlds: faster, cheaper, no OOMs.

This updates the task definition for the win64 searchfox job to closely match
the win64/debug task definition in taskcluster/ci/build/windows.yml. So, instead
of running the build with the mozsearch-plugin on a windows worker, it runs
on a Linux worker and does a cross-compile of windows code. The end result in
terms of searchfox artifacts is equivalent, except for absolute filename paths.
I verified that with mozsearch/mozsearch#299 and mozsearch/mozsearch-mozilla#87
in place, searchfox correctly indexes windows-only C++ and Rust code.

So yeah, that seems to work. Try push is at https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=847dc26cdcfd1be11930fbdcd2e96b1c0fea36e9 and a searchfox indexing run based on that try push (and including mozsearch/mozsearch-mozilla#65 and mozsearch/mozsearch#299) is currently deployed at https://kats.searchfox.org/

As the first patch I uploaded here still seems like a nice-ish cleanup, I'd like to land that too, even though with this change the windows-ifdef parts never actually get run anymore.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #22)

As the first patch I uploaded here still seems like a nice-ish cleanup, I'd like to land that too, even though with this change the windows-ifdef parts never actually get run anymore.

Agreed.

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/56729c1c06ba
Run the windows searchfox build as cross-compilation on Linux. r=asuth
https://hg.mozilla.org/integration/autoland/rev/7ddc165dc82e
Improve the way MozsearchIndexer merges analysis data. r=asuth