Crash in mozilla::detail::log_test called from LOG()/MOZ_LOG()
Categories
(Core :: XPCOM, defect, P1)
Tracking
()
People
(Reporter: jesup, Assigned: erahm)
References
Details
(5 keywords)
Crash Data
Attachments
(3 files, 1 obsolete file)
3.09 KB,
patch
|
froydnj
:
review+
RyanVM
:
approval-mozilla-beta+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
2.11 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
1.35 KB,
patch
|
froydnj
:
review+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #1415441 +++ There are many crashes (and many of them wildptrs and sometimes clear UAFs) from logging code all over the tree. Perhaps 50% of them appear to be startup crashes; a few a shutdown crashes, some clearly are neither. Something is wrong with LazyLogModule/LogModule/LogModuleManager, clearly. The hack in bug 1415441 just moved when the LogModule was touched at startup, changing timing for that one instance. This bug is to cover the entire set of crashes, which likely all (or most) come from the same root cause. All the crashes since the DB dump have been Windows... perhaps chance, perhaps timing, perhaps something related to fork()/exec()/etc. One problem I see is that LogModule::Init() sets sLogModuleManager, and *then* calls sLogModuleManager->Init() (which can touch disk, etc). Clearly this is racy if two threads try to call LogModule::Init(). Also, while I haven't seen it in the traces, I'd be concerned if anything LogModuleManager::Init() does tries to log itself (like file IO routines). Probably nothing does now, but it's clearly a danger that's not protected against (or mentioned). I don't think this problem is the root of all of these crashes, however.
Assignee | ||
Comment 1•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #0) > +++ This bug was initially created as a clone of Bug #1415441 +++ > > There are many crashes (and many of them wildptrs and sometimes clear UAFs) > from logging code all over the tree. Perhaps 50% of them appear to be > startup crashes; a few a shutdown crashes, some clearly are neither. > > Something is wrong with LazyLogModule/LogModule/LogModuleManager, clearly. > The hack in bug 1415441 just moved when the LogModule was touched at > startup, changing timing for that one instance. FWIW that hack didn't work, see bp-2aed59d2-b2d4-45d3-b43f-ab9390180125 > This bug is to cover the entire set of crashes, which likely all (or most) > come from the same root cause. > > All the crashes since the DB dump have been Windows... perhaps chance, > perhaps timing, perhaps something related to fork()/exec()/etc. Windows doesn't have fork/exec, but it should be roughly equivalent to fork/execve (as opposed to the nuwa style fork). I'm partially inclined to think this might be msvc/pgo related. > One problem I see is that LogModule::Init() sets sLogModuleManager, and > *then* calls sLogModuleManager->Init() (which can touch disk, etc). Why do you think it will touch disk? We do `fopen` a file if an env var is set but I wouldn't expect anyone to actually be using logging in the wild, particularly on Windows where setting an env var is hard to do. > Clearly > this is racy if two threads try to call LogModule::Init(). Correct, it's not designed to be called by multiple threads. We could add a diagnostic assert to enforce main-thread only (we'll have to double check content processes initialize from the main thread). We could also add an assert that it's only initialized once. > Also, while I > haven't seen it in the traces, I'd be concerned if anything > LogModuleManager::Init() does tries to log itself (like file IO routines). > Probably nothing does now, but it's clearly a danger that's not protected > against (or mentioned). Correct, it currently does not call any functions that log. We use OffTheBooksMutex and raw c-strings for this reason. We could certainly add a note. That said, even if we called something that logged it would be harmless, we'd do log_test that looks up the module in our class initialized hashtable, creates the missing module, adds it in the disabled state, and then it wouldn't log due to being disabled. Holding off on assigning `sLogModuleManager` until we've initialized might be a good idea as that would lead to a pretty clear nullptr deref if we try to call log_test. > I don't think this problem is the root of all of these crashes, however. Agreed.
Assignee | ||
Comment 2•7 years ago
|
||
Now let's think about how LogModule::Init is called: What I'll call a normal startup: =============================== XREMain::XRE_main - *calls Init* -> XRE_mainStartup -> nsNativeAppSupportUnix::Start - *calls Init* -> ScopedXPCOMStartup::Initialize -> NS_InitXPCOM2 - *calls Init* So here's an instance where it's called twice, but is harmless. XPCShell testing (somewhat irrelevant to crashstats) ==================================================== XRE_XPCShellMain - calls Init -> NS_InitXPCOM2 - calls Init Again an instance where it's called twice, but is harmless. ChildProcess ============ XRE_InitChildProcess - *calls Init* -> PluginProcessChild::Init - *calls Init* -> ContentProcess::Init -> ScopedXREEmbed::Start -> XRE_InitEmbedding2 -> NS_InitXPCOM2 - *calls Init* -> GPUProcessImpl::Init -> NS_InitMinimalXPCOM - *calls Init* Again many instances where it's called twice, but is harmless. What I'm not seeing is any instance we're we'd initialize on different threads.
Assignee | ||
Comment 3•7 years ago
|
||
This adds some assertions to make the intended usage of LogModuleManager::Init more clear. Nathan, this is a diagnostic patch that might help us identify if any of the possible issues mentioned in commment 0 are actually happening.
Assignee | ||
Updated•7 years ago
|
![]() |
||
Comment 4•7 years ago
|
||
Comment on attachment 8949596 [details] [diff] [review] Avoid initializing LogModuleManager more than once Review of attachment 8949596 [details] [diff] [review]: ----------------------------------------------------------------- I like the idea; I think it needs some tweaks, below. ::: xpcom/base/Logging.cpp @@ +200,5 @@ > */ > void Init() > { > + MOZ_DIAGNOSTIC_ASSERT(!mInitialized); > + mInitialized = true; If `mInitialized` is a member variable, that means we have to call Init() twice from different threads to hit this assert, right? So we really want this to be some flavor of Atomic<> + a compare-and-swap, to make sure we're not racing on `mInitialized`, I think. @@ +568,4 @@ > { > // NB: This method is not threadsafe; it is expected to be called very early > // in startup prior to any other threads being run. > + MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread()); Are you even sure we *have* a main thread at this point to make this assertion succeed? @@ +582,5 @@ > + // Don't assign the pointer until after Init is called. This should help us > + // detect if any of the functions called by Init somehow rely on logging. > + auto mgr = new LogModuleManager(); > + mgr->Init(); > + sLogModuleManager = mgr; Should we add a check that we're not somehow racing here to set sLogModuleManager, too? I mean, it seems unlikely, but as long as we're checking in diagnostic patches, we might as well try to add in checks for all the conditions that we think could happen, and my impression of comment 0 is that jesup is concerned we could be calling this function more than once, not just trying to re-Init()' a single LogModuleManager. Am I confused?
Assignee | ||
Comment 5•7 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #4) > Comment on attachment 8949596 [details] [diff] [review] > Avoid initializing LogModuleManager more than once > > Review of attachment 8949596 [details] [diff] [review]: > ----------------------------------------------------------------- > > I like the idea; I think it needs some tweaks, below. > > ::: xpcom/base/Logging.cpp > @@ +200,5 @@ > > */ > > void Init() > > { > > + MOZ_DIAGNOSTIC_ASSERT(!mInitialized); > > + mInitialized = true; > > If `mInitialized` is a member variable, that means we have to call Init() > twice from different threads to hit this assert, right? So we really want > this to be some flavor of Atomic<> + a compare-and-swap, to make sure we're > not racing on `mInitialized`, I think. The NS_IsMainThread assert below guarantees it's not being accessed by multiple threads. > @@ +568,4 @@ > > { > > // NB: This method is not threadsafe; it is expected to be called very early > > // in startup prior to any other threads being run. > > + MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread()); > > Are you even sure we *have* a main thread at this point to make this > assertion succeed? Yes, it's generally set by NS_LogInit which is always called before LogModule::Init (note: the names are similar, but they're unrelated). FWIW my linux build runs fine with this added in. > @@ +582,5 @@ > > + // Don't assign the pointer until after Init is called. This should help us > > + // detect if any of the functions called by Init somehow rely on logging. > > + auto mgr = new LogModuleManager(); > > + mgr->Init(); > > + sLogModuleManager = mgr; > > Should we add a check that we're not somehow racing here to set > sLogModuleManager, too? I mean, it seems unlikely, but as long as we're > checking in diagnostic patches, we might as well try to add in checks for > all the conditions that we think could happen, and my impression of comment > 0 is that jesup is concerned we could be calling this function more than > once, not just trying to re-Init()' a single LogModuleManager. Am I > confused? The NS_IsMainThread assert above guarantees this isn't racy.
Assignee | ||
Comment 6•7 years ago
|
||
Comment on attachment 8949596 [details] [diff] [review] Avoid initializing LogModuleManager more than once Nathan, I think I addressed you concerns in comment 5. Let me know if I missed something.
![]() |
||
Comment 7•7 years ago
|
||
Comment on attachment 8949596 [details] [diff] [review] Avoid initializing LogModuleManager more than once Review of attachment 8949596 [details] [diff] [review]: ----------------------------------------------------------------- OK, let's try this!
Assignee | ||
Comment 8•7 years ago
|
||
Sorry for the churn, this slightly modifies the log module parser logic to avoid GetLog since sLogModuleManager isn't set yet.
Assignee | ||
Updated•7 years ago
|
![]() |
||
Updated•7 years ago
|
Assignee | ||
Comment 9•7 years ago
|
||
Comment on attachment 8950405 [details] [diff] [review] Avoid initializing LogModuleManager more than once Not sure about the policy for diagnostic patches for sec-highs so asking for approval. [Security approval request comment] How easily could an exploit be constructed based on the patch? Not easy, this is a diagnostic patch adding a few assertions. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? It's commented more like a code cleanliness commit. Which older supported branches are affected by this flaw? All. If not all supported branches, which bug introduced the flaw? N/A, but whenever we added LazyLogModule is the best guess. Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? No, this is a diagnostic patch. It should apply to beta easily. How likely is this patch to cause regressions; how much testing does it need? Low, just diagnostic asserts. We might see crashes on the new assertions.
Comment 10•7 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #9) > Comment on attachment 8950405 [details] [diff] [review] > Avoid initializing LogModuleManager more than once > > Not sure about the policy for diagnostic patches for sec-highs so asking for > approval. They should get approval but it isn't hard to get! This all seems fine so sec-approval+.
Updated•7 years ago
|
Updated•7 years ago
|
Comment 12•7 years ago
|
||
Is this something we need to uplift to Beta?
Assignee | ||
Comment 13•7 years ago
|
||
Comment on attachment 8950405 [details] [diff] [review] Avoid initializing LogModuleManager more than once Approval Request Comment [Feature/Bug causing the regression]: Unsure, we thinking it's LazyLogModule related. [User impact if declined]: We might not fix this sec bug. [Is this code covered by automated tests?]: Yes. [Has the fix been verified in Nightly?]: Yes. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: N/A [Is the change risky?]: Low risk. [Why is the change risky/not risky?]: We're just adding diagnostic asserts. [String changes made/needed]: N/A
Comment 14•7 years ago
|
||
Comment on attachment 8950405 [details] [diff] [review] Avoid initializing LogModuleManager more than once Gets more diagnostics for these crashes. Taking for 59b12.
Reporter | ||
Comment 15•7 years ago
|
||
what signature would the new crashes appear on?
Assignee | ||
Comment 16•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #15) > what signature would the new crashes appear on? Assertions in LogModule::Init and LogModuleManager::Init. Possible segv for nullptr in LogModule::Get.
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 17•7 years ago
|
||
Note: we have a bunch of crashes in GetOrCreate for LogModule, such as 90 in the last week in 58.0.2 all apparently in the Timer Thread: https://crash-stats.mozilla.com/signature/?signature=nsTHashtable%3CT%3E%3A%3As_HashKey%20%7C%20PLDHashTable%3A%3ASearch%20%7C%20mozilla%3A%3ALogModuleManager%3A%3ACreateOrGetModule&date=%3E%3D2018-02-13T08%3A17%3A58.000Z&date=%3C2018-02-20T08%3A17%3A58.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports All the addresses end in x7690....??? (0x51017690 for example). All READ failures
Reporter | ||
Comment 18•7 years ago
|
||
Lots of crashes for LogModule* too: https://crash-stats.mozilla.com/signature/?signature=mozilla%3A%3ALazyLogModule%3A%3Aoperator%20mozilla%3A%3ALogModule%2A&date=%3E%3D2018-02-13T13%3A11%3A03.000Z&date=%3C2018-02-20T13%3A11%3A03.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports READs, WRITES, ILLEGAL INSTRUCTION, etc
![]() |
||
Comment 19•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/2ea72bf995ee
Assignee | ||
Comment 20•6 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #16) > (In reply to Randell Jesup [:jesup] from comment #15) > > what signature would the new crashes appear on? > > Assertions in LogModule::Init and LogModuleManager::Init. Possible segv for > nullptr in LogModule::Get. I'm not seeing any crashes with the new assertions in 59.0b12/13, I think we can rule out racy/off main thread initialization.
Comment 21•6 years ago
|
||
Looks like we've had a couple reports from Beta60. Anything interesting in them, Eric?
Assignee | ||
Comment 22•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #21) > Looks like we've had a couple reports from Beta60. Anything interesting in > them, Eric? Still no crashes in the instrumented code. The failures for `mozilla::detail::log_test` are still all Windows. There's not really a clear pattern; in the past we've seen issues with externed LazyLogModules on Windows but not all are externed. Quite a few declare their own LOG macro instead of MOZ_LOG, it's possible this is causing issues, but again not all instances use this. The addresses are kind of all over the place, we're not seeing any poisoning so it's most likely not UAF. I'm not sure what to do at this point. We could add a canary to LazyLogModule and LogModuleManager to see if we're getting some sort of corruption.
Comment 23•6 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #22) > I'm not sure what to do at this point. We could add a canary to > LazyLogModule and LogModuleManager to see if we're getting some sort of > corruption. How much work would that be? Could be worthwhile to have on Nightly for a while.
Assignee | ||
Comment 24•6 years ago
|
||
(In reply to Frederik Braun [:freddyb] from comment #23) > (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment > #22) > > I'm not sure what to do at this point. We could add a canary to > > LazyLogModule and LogModuleManager to see if we're getting some sort of > > corruption. > > How much work would that be? Could be worthwhile to have on Nightly for a > while. It's pretty easy, the question is how often/where we should check the canary. I'll put together a patch to at least add it.
Assignee | ||
Comment 25•6 years ago
|
||
I don't feel great about checking the canary every time we access the log module, but I'm not sure of a better alternative.
![]() |
||
Comment 26•6 years ago
|
||
Comment on attachment 8974494 [details] [diff] [review] Add debug code to LazyLogModule Review of attachment 8974494 [details] [diff] [review]: ----------------------------------------------------------------- What is the ideal outcome from adding this canary? We'll identify the module that's getting corrupted and then...? No objection to the code, just not sure what it's going to tell us. I guess maybe we'll get better stacks? (Stacks from some of these reports look terrible.) ::: xpcom/base/Logging.h @@ +174,5 @@ > operator LogModule*(); > > private: > const char* const mLogName; > + CorruptionCanary mCanary; Do we want to restrict this member to MOZ_DIAGNOSTIC_ASSERT_ENABLED or somesuch, or do we want it to ride the trains all the way to release?
Comment 27•6 years ago
|
||
Looks like this got r+ a couple weeks ago and then...?
Assignee | ||
Comment 28•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #27) > Looks like this got r+ a couple weeks ago and then...? I'll get it landed.
Assignee | ||
Comment 29•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #26) > Comment on attachment 8974494 [details] [diff] [review] > Add debug code to LazyLogModule > > Review of attachment 8974494 [details] [diff] [review]: > ----------------------------------------------------------------- > > What is the ideal outcome from adding this canary? We'll identify the > module that's getting corrupted and then...? > > No objection to the code, just not sure what it's going to tell us. I guess > maybe we'll get better stacks? (Stacks from some of these reports look > terrible.) Yeah, I think better stacks and it will let us know if this is corruption. If it's *not* corruption I think we're possibly looking at a msvc-specific issue. > ::: xpcom/base/Logging.h > @@ +174,5 @@ > > operator LogModule*(); > > > > private: > > const char* const mLogName; > > + CorruptionCanary mCanary; > > Do we want to restrict this member to MOZ_DIAGNOSTIC_ASSERT_ENABLED or > somesuch, or do we want it to ride the trains all the way to release? I'm inclined to let this ride the trains as long as it's not a huge perf hit. If we see any perf issues after landing we can follow up limiting to beta.
Assignee | ||
Comment 30•6 years ago
|
||
This ended up getting backed out, we need to make CorruptionCanary constexpr or remove the constexpr from LazyLogModule.
> Check-in: https://hg.mozilla.org/integration/mozilla-inbound/rev/54e79ab99b0f - Coroiu Cristina - Backed out changeset 82ef13b02416 (bug 1436768) for build bustage at obj-firefox/dist/include/mozilla/Logging.h on a CLOSED TREE
Assignee | ||
Comment 31•6 years ago
|
||
This makes CorruptionCanary constexpr as well.
![]() |
||
Updated•6 years ago
|
Comment 32•6 years ago
|
||
Comment on attachment 8983252 [details] [diff] [review] Add debug code to LazyLogModule https://hg.mozilla.org/mozilla-central/rev/0014f483953e
Assignee | ||
Comment 33•6 years ago
|
||
This regressed the amount of static constructors by a large amount [1] and also regressed section sizes by ~17k. ~10k was .text, ~5k .data [2]. [1] https://treeherder.mozilla.org/perf.html#/graphs?timerange=86400&series=mozilla-inbound,1689472,1,2&series=mozilla-inbound,1468543,1,2&series=mozilla-inbound,1689462,1,2&selected=mozilla-inbound,1689472,344767,482494429,2 [2] https://treeherder.mozilla.org/perf.html#/graphs?series=mozilla-inbound,1686767,0,2&series=mozilla-inbound,1686771,1,2&series=mozilla-inbound,1686769,1,2&series=mozilla-inbound,1686770,1,2&series=mozilla-inbound,1686768,1,2&series=mozilla-inbound,1686772,1,2&highlightedRevisions=0014f483953e&zoom=1528127156193.9834,1528234183000,0,75223871.48729011
![]() |
||
Comment 34•6 years ago
|
||
This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?
Assignee | ||
Comment 35•6 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #34)
This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?
I have a somewhat unfounded theory that adding the canary got rid of crashes due to some weird msvc codegen. Now that we're building with clang-cl maybe that doesn't matter (and maybe my theory is bunk anyhow). I'm okay with removing it and seeing what happens.
Assignee | ||
Comment 36•5 years ago
|
||
(In reply to Nathan Froyd [:froydnj] from comment #34)
This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?
Nathan's doing this in bug 1529955. If we don't see an uptick in crashes I'd say we can just close this bug.
Assignee | ||
Comment 37•5 years ago
|
||
We're not seeing this anymore in recent versions.
Updated•5 years ago
|
Updated•5 years ago
|
Description
•