Closed Bug 1656973 Opened 5 years ago Closed 4 years ago

Crash in [@ nsMemoryReporterManager::RegisterReporterHelper]

Categories

(Core :: XPConnect, defect, P2)

80 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
81 Branch
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox79 --- wontfix
firefox80 + fixed
firefox81 --- fixed

People

(Reporter: philipp, Assigned: mccr8)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-7b417e22-0f1e-419b-927e-7545e0200803.

Top 10 frames of crashing thread:

0 xul.dll nsMemoryReporterManager::RegisterReporterHelper xpcom/base/nsMemoryReporterManager.cpp
1 xul.dll mozilla::RegisterWeakMemoryReporter xpcom/base/nsMemoryReporterManager.cpp:2670
2 xul.dll mozilla::URLPreloader::URLPreloader js/xpconnect/loader/URLPreloader.cpp:95
3 xul.dll static mozilla::URLPreloader::GetSingleton js/xpconnect/loader/URLPreloader.cpp:81
4 xul.dll mozilla::ScriptPreloader::InitCache js/xpconnect/loader/ScriptPreloader.cpp:439
5 xul.dll static mozilla::ScriptPreloader::GetChildSingleton js/xpconnect/loader/ScriptPreloader.cpp:148
6 xul.dll static mozilla::ScriptPreloader::GetSingleton js/xpconnect/loader/ScriptPreloader.cpp:107
7 xul.dll mozJSComponentLoader::ObjectForLocation js/xpconnect/loader/mozJSComponentLoader.cpp:756
8 xul.dll mozJSComponentLoader::Import js/xpconnect/loader/mozJSComponentLoader.cpp:1276
9 xul.dll mozilla::dom::module_getter::ModuleGetter dom/base/ChromeUtils.cpp:576

the volume of this crash signature is visibly increasing at the start of the 80.0b cycle and accounting for 1.7% of browser crash in beta 2. crashing installations are on windows and crashing with MOZ_CRASH(CrashIfRefcountIsZero: refcount is zero).

It looks like this is all the URLPreloader. The problem is that we're registering it as a memory reporter in the ctor, so the refcount is zero. This can cause some internal problems in the memory reporter manager, so we assert against it. dthayer has been doing some work on this class recently, but at a quick glance this issue would seem to be at least a few years old. Maybe Doug's changes made us use it more?

The fix would be to add an init method that does the registration, and only call it after there has been an addref.

I see a decent amount of these crashes in 78, but it does appear to have increased quite a bit in 80.

Assignee: nobody → continuation

In the parent process, URLPreloader::InitInternal() will call AddObserver with the preloader, which will normally ensure that there's at least one reference. (I'm not sure how it works in the child process.)

These crashes all have ShutdownProgress of xpcom-shutdown. During XPCOM shutdown, it looks like we shut down the observer service, then clear ClearOnShutdown pointers like the singleton, then eventually shut down the JS engine. Presumably at some point before the last step some JS runs and does something that accesses the preloader. It looks like nsStringBundleBase::ParseProperties() is accessing it while we're shutting down a thread.

It looks like URLPreloader::sInitialized is not cleared after the preloader is initially shut down, so we end up recreating it instead of erroring out in URLPreloader::ReadURI.

Another fun wrinkle. After the observer service shuts down, it still exists, so you can get the service, so this line will succeed: nsCOMPtr<nsIObserverService> obs = services::GetObserverService();. (Note that this will cause a null deref if you call it a little later, after Services has shut down!)

Then on the next line it does this: obs->AddObserver(this, DELAYED_STARTUP_TOPIC, false);. Now that will fail, because mShuttingDown is true, but the URLPreloader doesn't check the return value, so it thinks it has successfully initialized, and it registers as a weak memory reporter without being addrefed. Maybe that's the immediate cause of this crash.

I'm still not sure how the weak register call works in the child process.

I'm not sure what I was searching for before, but it looks like about 300 results of these crashes are happening via mozJSComponentLoader::ObjectForLocation, and only about 60 are happening via nsStringBundleBase (the latter only on 79 and older), which is more in line with what I'd expect.

I'm not really sure what should be done if we're in shutdown and the ScriptPreloader has already gone away and we're recreating it. The callers of ScriptPreloader::GetSingleton() aren't set up to deal with there not being one, and ScriptPreloader::InitCache() doesn't seem to be set up to deal with there not being a URL preloader. I'm not sure if it will be okay with a URL preloader that hasn't been successfully initialized. This code also looks old, so I'm not sure where things have started going wrong.

I'll move this to XPConnect, because I think the main problem is things going wrong when we recreate the preloader during shutdown.

Component: XPCOM → XPConnect

interestingly, so far there are no crashes in 80.0b3.

Yeah, I'm not sure why this is suddenly happening so much. The interesting part of the stack would be in JS, but of course we don't have that in the crash report.

The 78 crash involves l10n, so maybe the 80 crash also involves l10n in a way that is less obvious, and the localizations aren't updated in the freshly-rolled out beta 3? I dunno how any of that really works.

See Also: → 1657231

80b4 has a crash now: bp-9c4a4c18-6ec0-42e4-9af5-cf2e00200805

It looks a little different than the other 80 crashes. It is also after xpcom-shutdown. We're in ScriptCacheParent::Recv__delete__, and trying to put some cached script information into the preloader, but the preloader has gone away so we're recreating it. (ScriptCacheChild::ActorDestroy has the same issue.) I'm still not sure why we're seeing so many of these crashes on beta.

For this particular case, it looks like we can just not do anything if the preloader doesn't exist.

Attached image crash version.png

when taking a closer look at the crash data, it appears that we start getting crash reports from a version just once its successor was getting released (this is also what caught me off-guard in comment #8).
so it's likely something related to an update that's triggering this problem.

version release date crashes starting
80.0b1 2020-07-29 2020-07-31
80.0b2 2020-07-31 2020-08-03
80.0b3 2020-08-03 2020-08-05
80.0b4 2020-08-05 ?

P2

Severity: -- → S2
Priority: -- → P2

this dll correlation involving the BITS service also indicates the crash commonly happens around the time of an update:
(80.22% in signature vs 03.77% overall) Module "BitsProxy.dll" = true [96.69% vs 08.36% if platform_pretty_version = Windows 10]

Between this bug 1656974, maybe something is causing shutdown to take a lot longer. The crashes here look like shutdown races that were latent, and maybe JS code running very late in shutdown when it wouldn't have before is the issue.

See Also: → 1656974

So far it looks like the changes from bug 1657231 made this signature go away in 80.0b7.

(In reply to Julien Cristau [:jcristau] from comment #17)

So far it looks like the changes from bug 1657231 made this signature go away in 80.0b7.

I think it is too soon to say. The pattern we've seen before is that the crashes only show up when the next beta rolls out. So we need to see what happens to b7 when b8 is released. (See comment 13.)

Doh, right, I somehow forgot about that yesterday...

80.0b8 shipped on Friday, and still no crashes from 80.0b7 on crash-stats.

Crash Signature: [@ nsMemoryReporterManager::RegisterReporterHelper] → [@ nsMemoryReporterManager::RegisterReporterHelper] [@ CrashIfRefcountIsZero]

No crashes in 80.0b8 or 80.0rc1. Can we declare victory here?

Crash Signature: [@ nsMemoryReporterManager::RegisterReporterHelper] [@ CrashIfRefcountIsZero] → [@ nsMemoryReporterManager::RegisterReporterHelper] [@ CrashIfRefcountIsZero]
Flags: needinfo?(continuation)

Sure!

Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(continuation)
Resolution: --- → FIXED
Depends on: 1657231
Target Milestone: --- → 81 Branch
See Also: → 1663315
See Also: → 1663975
See Also: → 1663981
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: