Closed Bug 1832591 Opened 1 year ago Closed 11 months ago

Windows Crash in [@ mozilla::URLPreloader::CacheKey::Code]

Categories

(Core :: XPConnect, defect)

All
Windows
defect

Tracking

()

RESOLVED FIXED
115 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 + disabled
firefox115 --- fixed

People

(Reporter: aryx, Assigned: kmag)

References

(Regression)

Details

(4 keywords)

Crash Data

Attachments

(1 file)

Frequent assert on Windows with Firefox 114.0 betas

Crash report: https://crash-stats.mozilla.org/report/index/3f9d3638-af02-4335-a9a8-dd83c0230511

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(mPath.Length() > 0)

Top 10 frames of crashing thread:

0  xul.dll  mozilla::URLPreloader::CacheKey::Code  js/xpconnect/loader/URLPreloader.h:167
0  xul.dll  mozilla::URLPreloader::CacheKey::CacheKey  js/xpconnect/loader/URLPreloader.cpp:678
0  xul.dll  mozilla::URLPreloader::ReadCache  js/xpconnect/loader/URLPreloader.cpp:300
1  xul.dll  mozilla::URLPreloader::BackgroundReadFiles  js/xpconnect/loader/URLPreloader.cpp:352
2  xul.dll  mozilla::detail::RunnableMethodArguments<>::apply<mozilla::ipc::PortLink::PortObserverThunk, void  const  xpcom/threads/nsThreadUtils.h:1164
2  xul.dll  std::_C__Invoker_functor::_Call  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/type_traits:16707566
2  xul.dll  std::_C_invoke  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/type_traits:16707566
2  xul.dll  std::_Apply_impl  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/tuple:1233
2  xul.dll  std::apply  /builds/worker/fetches/vs/VC/Tools/MSVC/14.16.27023/include/tuple:1241
2  xul.dll  mozilla::detail::RunnableMethodArguments<>::apply  xpcom/threads/nsThreadUtils.h:1162
Flags: needinfo?(kmaglione+bmo)
Severity: -- → S3

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta (startup)

:hsinyi, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(htsai)

This is a MOZ_DIAGNOSTIC_ASSERT, so it should go away after early beta, FWIW.

I'm working on adding a checksum to the headers so we can just fail gracefully in the case of on disk corruption rather than asserting. And, as Andrew says, this is a diagnostic assert. After early beta, we will just bail out rather than asserting. The assertion was there to find out where the corruption is happening, and so far it suggests that it's happening on disk after the file is written.

My best guess is that because we don't force a flush, we're probably winding up with garbage or zeros in the file. But since this is only a cache, I don't really want to force a flush. It would be better to just fail a CRC verification and write out a new cache for the next session.

Flags: needinfo?(kmaglione+bmo)

We've been seeing crashes in the wild from multiple entries with empty hash
key names being read from the preloader caches. Based on assertions we've
added to the code, it appears that this is happening on disk, after the cache
has been written, rather than during runtime.

Using a CRC for this sort of data is a sensible precaution in any case, and it
should make it easier to distinguish between on-disk corruption that we can
safely ignore and other sorts of consistency issues that we should ideally
fix.

Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/9df1cbb9878c
Add CRCs to preloader caches. r=mccr8

Backed out for causing build bustages on URLPreloader.cpp.

[task 2023-05-16T06:00:45.407Z] 06:00:45     INFO -  gmake[4]: Entering directory '/builds/worker/workspace/obj-build/js/xpconnect/loader'
[task 2023-05-16T06:00:45.411Z] 06:00:45     INFO -  /builds/worker/fetches/sccache/sccache /builds/worker/fetches/clang/bin/clang++ --sysroot /builds/worker/fetches/sysroot-x86_64-linux-gnu -Qunused-arguments -o URLPreloader.o -c  -I/builds/worker/workspace/obj-build/dist/stl_wrappers -I/builds/worker/workspace/obj-build/dist/system_wrappers -include /builds/worker/checkouts/gecko/config/gcc_hidden.h -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -fstack-protector-strong -fstack-clash-protection -DNDEBUG=1 -DTRIMMED=1 -DOS_POSIX=1 -DOS_LINUX=1 -DMOZ_HAS_MOZGLUE -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -I/builds/worker/checkouts/gecko/js/xpconnect/loader -I/builds/worker/workspace/obj-build/js/xpconnect/loader -I/builds/worker/checkouts/gecko/js/xpconnect/src -I/builds/worker/checkouts/gecko/js/xpconnect/wrappers -I/builds/worker/checkouts/gecko/dom/base -I/builds/worker/checkouts/gecko/js/loader -I/builds/worker/checkouts/gecko/xpcom/base -I/builds/worker/checkouts/gecko/xpcom/io -I/builds/worker/workspace/obj-build/ipc/ipdl/_ipdlheaders -I/builds/worker/checkouts/gecko/ipc/chromium/src -I/builds/worker/workspace/obj-build/dist/include -I/builds/worker/workspace/obj-build/dist/include/nspr -I/builds/worker/workspace/obj-build/dist/include/nss -DMOZILLA_CLIENT -include /builds/worker/workspace/obj-build/mozilla-config.h -D_GLIBCXX_USE_CXX11_ABI=0 -fno-sized-deallocation -fno-aligned-new -fcrash-diagnostics-dir=/builds/worker/artifacts -fno-exceptions -fPIC -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -pthread -pipe -gdwarf-4 -Xclang -load -Xclang /builds/worker/workspace/obj-build/build/clang-plugin/libclang-plugin.so -Xclang -add-plugin -Xclang moz-check -O2 -fno-omit-frame-pointer -funwind-tables -Werror -Wall -Wbitfield-enum-conversion -Wdeprecated-this-capture -Wempty-body -Wformat-type-confusion -Wignored-qualifiers -Wpointer-arith -Wshadow-field-in-constructor-modified -Wsign-compare -Wtautological-constant-in-range-compare -Wtype-limits -Wno-error=tautological-type-limit-compare -Wunreachable-code -Wunreachable-code-return -Wunused-but-set-parameter -Wno-invalid-offsetof -Wclass-varargs -Wempty-init-stmt -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wloop-analysis -Wno-range-loop-analysis -Wc++2a-compat -Wenum-compare-conditional -Wenum-float-conversion -Wno-ambiguous-reversed-operator -Wno-error=deprecated -Wno-error=deprecated-anon-enum-enum-conversion -Wno-error=deprecated-enum-enum-conversion -Wno-error=deprecated-pragma -Wno-error=deprecated-this-capture -Wcomma -Wimplicit-fallthrough -Wstring-conversion -Wno-inline-new-delete -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=free-nonheap-object -Wno-error=atomic-alignment -Wno-error=deprecated-builtins -Wformat -Wformat-security -Wno-psabi -Wthread-safety -Wno-unknown-warning-option -fno-strict-aliasing -ffp-contract=off  -MD -MP -MF .deps/URLPreloader.o.pp   /builds/worker/checkouts/gecko/js/xpconnect/loader/URLPreloader.cpp
[task 2023-05-16T06:00:45.411Z] 06:00:45    ERROR -  /builds/worker/checkouts/gecko/js/xpconnect/loader/URLPreloader.cpp:272:21: error: use of undeclared identifier 'MAGIC'
[task 2023-05-16T06:00:45.412Z] 06:00:45     INFO -    if (size < sizeof(MAGIC) + sizeof(headerSize) + sizeof(crc)) {
[task 2023-05-16T06:00:45.412Z] 06:00:45     INFO -                      ^
[task 2023-05-16T06:00:45.412Z] 06:00:45     INFO -  1 error generated.
[task 2023-05-16T06:00:45.412Z] 06:00:45    ERROR -  gmake[4]: *** [/builds/worker/checkouts/gecko/config/rules.mk:669: URLPreloader.o] Error 1
[task 2023-05-16T06:00:45.412Z] 06:00:45     INFO -  gmake[4]: Leaving directory '/builds/worker/workspace/obj-build/js/xpconnect/loader'
[task 2023-05-16T06:00:45.415Z] 06:00:45     INFO -  gmake[4]: Entering directory '/builds/worker/workspace/obj-build/js/xpconnect/loader'
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(htsai)

Tracking as this is a high volume startup crash affecting our beta/deved population.

Severity: S3 → S2

Should we back out the regressor on beta?

(In reply to Pascal Chevrel:pascalc from comment #8)

Should we back out the regressor on beta?

I would support backing out, given that I don't see an urgent and important reason to keep the regressor in the current beta.

Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/fb04fe6c339a
Add CRCs to preloader caches. r=mccr8

We can back out the patch, though I'm not sure it will actually decrease crash volume rather than moving it to the failed assertion from bug 1724336. It will only make a difference if we have users with a single empty path rather than two.

Flags: needinfo?(kmaglione+bmo)
Status: ASSIGNED → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

The patch landed in nightly and beta is affected.
:kmag, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox114 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(kmaglione+bmo)

Uplifting should be fairly safe, but backing out the patch that added the assertion would also be OK. Either way, the crashes from consistency errors it aims to fix should only affect early beta, since they're treated as nonfatal in late beta and release.

They should also, in theory, go away after a restart, since we should remove the startupCache directory after a startup crash. I'm not entirely convinced that the functionality is working as planned, though, since when I looked into the crashes in the past, I found some users who were repeatedly crashing. I investigated a bit more yesterday, and the only obvious way that could happen is if we're failing to create the startup canary file, or we're failing to remove the startupCrash directory.

It might be worth trying to make the system a bit more reliable by ignoring the cache when the removal fails, like we do for the old startup cache system. It looks like I wrote a patch to do that a couple of years ago, but it got dropped when I had to take medical leave, so I'll try to get it landed this week.

Flags: needinfo?(kmaglione+bmo)

We're past the end of early beta now for 114, so backporting is moot.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: