Open Bug 1757116 Opened 4 years ago Updated 1 year ago

Windows ASAN build fails during due to STATUS_STACK_BUFFER_OVERRUN in e.g. serde

Categories

(Firefox Build System :: General, defect, P3)

Desktop
Windows 10
defect

Tracking

(firefox-esr91 wontfix, firefox98 wontfix, firefox99 wontfix, firefox100 wontfix, firefox101 fix-optional)

Tracking Status
firefox-esr91 --- wontfix
firefox98 --- wontfix
firefox99 --- wontfix
firefox100 --- wontfix
firefox101 --- fix-optional

People

(Reporter: bryce, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

Attached file buildLog.txt

I'm attempting to create a local ASAN build by following the steps here. I'm using a similar mozconfig, but with a few additional config lines:

 ac_add_options --enable-warnings-as-errors
 ac_add_options --enable-debug
 mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../mozilla-builds/obj-ff-asan-dbg

 # Enable EME
 ac_add_options --enable-eme=widevine

 # ASAN
 ac_add_options --enable-address-sanitizer
 ac_add_options --disable-jemalloc

 export LDFLAGS="clang_rt.asan_dynamic-x86_64.lib clang_rt.asan_dynamic_runtime_thunk-x86_64.lib"
 CLANG_LIB_DIR="$(cd ~/.mozbuild/clang/lib/clang/*/lib/windows && pwd)"
 export MOZ_CLANG_RT_ASAN_LIB_PATH="${CLANG_LIB_DIR}/clang_rt.asan_dynamic-x86_64.dll"
 export LIB=$LIB:$CLANG_LIB_DIR

I've tried removing all the lines except the ASAN ones and the objdir change, and am still hitting the issue.

STR:

  • Attempt to compile Firefox using the above mozconfig (MOZCONFIG=.mozconfig-asan-dbg ./mach build)

Actual result:
The build fails with an error (see the attached file for the full build log):

 1:40.95   process didn't exit successfully: `c:/projects/mozilla/mozilla-builds/obj-ff-asan-dbg\debug\build\serde-9b07c3ec2a70ce88\build-script-build` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)

Expected result:
Build runs without issue.

OS: Unspecified → Windows 10
Hardware: Unspecified → Desktop
Priority: -- → P3

I can reproduce this, but I'm not sure how to continue debugging it in isolation.

Assignee: nobody → ahochheiden

I spoke with Glandium and he said "there is essentially no solution but to disable asan in rust". The problem is that Cargo can't differentiate between what's built for the build itself (the build scripts) and what's built for Firefox, so with asan enabled, if there are any asan errors in the build scripts, we're hosed.

I still need to figure out how to disable asan just for Cargo, but in the meantime, is using ./mach try fuzzy and selecting build-win64-asan/opt, build-win64-asan/debug, or build-win64-asan-fuzzing/opt a viable workaround for you? For all of those, the build host runs Linux and the build scripts don't have the same problem there, so the asan build for the Windows 'target' should succeed.

Flags: needinfo?(bvandyk)

Thanks for the workaround! For bug 1754168, my workflow is such that the friction of not developing locally may just mean I park that for now, but it's not the most pressing thing in my queue at the moment anyhow.

Flags: needinfo?(bvandyk)

I think I've spent too much time on this without getting a working solution, so I'll put this on pause for now, since an (albeit annoying) workaround does exist. I'll write down what I've tried and what I'm thinking for next steps, to make it easier to continue this in the future.

As mentioned previously, I spoke with Glandium and his fix was to just disable ASAN in Rust. While this might sound easy, I wasn't able to get it to work. To enable ASAN, the mozconfig setting eventually adds -fsanitize=address to CFLAGS, CXXFLAGS, and LDFLAGS. Eventually, in Mach/Python land, we invoke the root makefile that builds both the C/C++ and Rust code, so we can't disable it there, only inside the makefile(s). In rust.mk I tried to filter out -fsanitize=address and re-export the flags, but this didn't work.

I used $(foreach v, $(.VARIABLES), $(if $(filter file,$(origin $(v))), $(info $(v)=$($(v))))) to list the makefile-defined variables visible in the file. The ones I listed above weren't visible (which is suspicious), but -fsanitize=address was present in OS_CXXFLAGS and COMPUTED_CXX_LDFLAGS. I removed -fsanitize=address from them and re-exported them with this code:

OS_CXXFLAGS :=$(filter-out  -fsanitize=address,$(OS_CXXFLAGS))
export OS_CXXFLAGS:=$(OS_CXXFLAGS)

COMPUTED_CXX_LDFLAGS :=$(filter-out  -fsanitize=address,$(COMPUTED_CXX_LDFLAGS))
export COMPUTED_CXX_LDFLAGS:=$(COMPUTED_CXX_LDFLAGS)

(Eventually this would be wrapped in ifeq (WINNT,$(HOST_OS_ARCH)) to only do this on Windows, but I left that out for now)

That didn't work, so I tried the same for CFLAGS, CXXFLAGS, and LDFLAGS, but that didn't work either. I can't figure out why, since that's essentially the purpose of this file, as per the header comment:

# /!\ In this file, we export multiple variables globally via make rather than
# in recipes via the `env` command to avoid round-trips to msys on Windows, which
# tend to break environment variable values in interesting ways.

Glandium also mentioned that it might be better to fix whatever is wrong in the Rust/Cargo build scripts that makes them crash with ASAN, but that's a bit outside my wheelhouse since I'm not that familiar with Rust/Cargo. Running the build in verbose mode (e.g. ./mach build -v) prints the command that builds the crates whose build-script-build.exe fails with STATUS_STACK_BUFFER_OVERRUN, but running that command outside of the Mach environment (i.e. directly in the MSYS shell) doesn't work, because several environment settings it needs seem to be missing. I fixed a few, but got stuck on one and wasn't able to get the command to run on its own.

Going down this road, it's probably easier to completely isolate the failing projects and try to build them with ASAN independently. For example, grab serde and build it outside of our environment with ASAN enabled.

Though, it's not just serde. I also saw failures in proc-macro2, winapi, and http3server. If we continue down this path, we'd need to fix all of these third-party build scripts and push the patches upstream. I would continue down this path now, but I've exceeded the time limit I set for myself on this, and a viable workaround is available.

Maybe what I've written can help somebody more knowledgeable find a solution, but for now I'll leave this on hold. (Though if there is urgent demand for this, I'll definitely jump back into it.)

Assignee: ahochheiden → nobody
Regressed by: 1695285

Set release status flags based on info from the regressing bug 1695285

:truber, since you are the author of the regressor, bug 1695285, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(jschwartzentruber)

I was able to reproduce this with a local win10 asan build, but the patch for bug 1762324 does not fix it. I tried reproducing it with serde standalone, but could not, and I'm not sure how to debug it in tree.

Flags: needinfo?(jschwartzentruber)
Keywords: regression
No longer regressed by: 1695285

It is the same thing as bug 1762324.

The one thing from comment 0 that makes it slightly different is that LDFLAGS is causing problems. Setting LIBS instead makes it work.
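A minimal sketch of that change applied to the ASAN lines of the comment 0 mozconfig. This assumes the only change needed is passing the same ASan runtime import libraries through LIBS instead of LDFLAGS. The other lines are copied unchanged.

```shell
# ASan runtime import libraries, passed via LIBS rather than LDFLAGS
# (assumption: this is the only change needed to avoid the build-script crash).
export LIBS="clang_rt.asan_dynamic-x86_64.lib clang_rt.asan_dynamic_runtime_thunk-x86_64.lib"
CLANG_LIB_DIR="$(cd ~/.mozbuild/clang/lib/clang/*/lib/windows && pwd)"
export MOZ_CLANG_RT_ASAN_LIB_PATH="${CLANG_LIB_DIR}/clang_rt.asan_dynamic-x86_64.dll"
export LIB=$LIB:$CLANG_LIB_DIR
```
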

Regressed by: 1695285

Set release status flags based on info from the regressing bug 1695285

Severity: -- → S3

I also have this problem, which makes it hard to develop ASan sec-issue fixes on Windows.
Since this affects our ability to develop security fixes, I think this should be S2.

.mozconfig

#ac_add_options --disable-debug
#ac_add_options --disable-optimize
ac_add_options --enable-warnings-as-errors

#ac_add_options --enable-debug
#ac_add_options --enable-optimize="-Og"
#ac_add_options --enable-optimize

ac_add_options --enable-address-sanitizer
ac_add_options --disable-jemalloc

export LDFLAGS="clang_rt.asan_dynamic-x86_64.lib clang_rt.asan_dynamic_runtime_thunk-x86_64.lib"
CLANG_LIB_DIR="$(cd ~/.mozbuild/clang/lib/clang/*/lib/windows && pwd)"
export MOZ_CLANG_RT_ASAN_LIB_PATH="${CLANG_LIB_DIR}/clang_rt.asan_dynamic-x86_64.dll"
export LIB=$LIB:$CLANG_LIB_DIR

Build failures:

 0:57.32    Compiling quote v1.0.23
 0:57.57    Compiling unicode-ident v1.0.6
 0:57.67    Compiling syn v1.0.107
 0:58.33 error: failed to run custom build command for `proc-macro2 v1.0.51`
 0:58.34 Caused by:
 0:58.34   process didn't exit successfully: `C:/dev/mozilla/gecko7/obj-x86_64-pc-windows-msvc\release\build\proc-macro2-656c92295ddb534e\build-script-build` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
 0:58.34 warning: build failed, waiting for other jobs to finish...
 0:58.74 error: failed to run custom build command for `quote v1.0.23`
 0:58.74 Caused by:
 0:58.74   process didn't exit successfully: `C:/dev/mozilla/gecko7/obj-x86_64-pc-windows-msvc\release\build\quote-f3bd30d8029b230c\build-script-build` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
 0:58.79 dom/cache
 0:59.06 error: failed to run custom build command for `syn v1.0.107`
 0:59.06 Caused by:
 0:59.06   process didn't exit successfully: `C:/dev/mozilla/gecko7/obj-x86_64-pc-windows-msvc\release\build\syn-81146587d4a92a3c\build-script-build` (exit code: 0xc0000409, STATUS_STACK_BUFFER_OVERRUN)
 0:59.35 mozmake[4]: *** [C:/dev/mozilla/gecko7/config/makefiles/rust.mk:438: force-cargo-library-build] Error 101
 0:59.35 mozmake[3]: *** [C:/dev/mozilla/gecko7/config/recurse.mk:72: toolkit/library/rust/target] Error 2
 0:59.35 mozmake[3]: *** Waiting for unfinished jobs....

Severity: S3 → S2
Summary: Windows ASAN build fails during due to serde STATUS_STACK_BUFFER_OVERRUN → Windows ASAN build fails during due to STATUS_STACK_BUFFER_OVERRUN in e.g. serde
No longer blocks: 1812932
See Also: → 1812932
No longer blocks: 1754168
See Also: → 1754168

Downgrading to S3. There is no short-term or even medium-term solution to this problem, but a workaround exists: cross-compiling (ASan builds on CI are cross-compiled; cross-compilation can be done locally in WSL).

Severity: S2 → S3