Closed Bug 1436768 Opened 7 years ago Closed 5 years ago

Crash in mozilla::detail::log_test called from LOG()/MOZ_LOG()

Categories

(Core :: XPCOM, defect, P1)

56 Branch
x86
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix

People

(Reporter: jesup, Assigned: erahm)

References

Details

(5 keywords)

Crash Data

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1415441 +++

There are many crashes (and many of them wildptrs and sometimes clear UAFs) from logging code all over the tree.  Perhaps 50% of them appear to be startup crashes; a few a shutdown crashes, some clearly are neither.

Something is wrong with LazyLogModule/LogModule/LogModuleManager, clearly.  The hack in bug 1415441 just moved when the LogModule was touched at startup, changing timing for that one instance.

This bug is to cover the entire set of crashes, which likely all (or most) come from the same root cause.

All the crashes since the DB dump have been Windows... perhaps chance, perhaps timing, perhaps something related to fork()/exec()/etc.

One problem I see is that LogModule::Init() sets sLogModuleManager, and *then* calls sLogModuleManager->Init() (which can touch disk, etc).  Clearly this is racy if two threads try to call LogModule::Init().  Also, while I haven't seen it in the traces, I'd be concerned if anything LogModuleManager::Init() does tries to log itself (like file IO routines).  Probably nothing does now, but it's clearly a danger that's not protected against (or mentioned).

I don't think this problem is the root of all of these crashes, however.
No longer depends on: 1415441
See Also: → 1415441
(In reply to Randell Jesup [:jesup] from comment #0)
> +++ This bug was initially created as a clone of Bug #1415441 +++
> 
> There are many crashes (and many of them wildptrs and sometimes clear UAFs)
> from logging code all over the tree.  Perhaps 50% of them appear to be
> startup crashes; a few a shutdown crashes, some clearly are neither.
> 
> Something is wrong with LazyLogModule/LogModule/LogModuleManager, clearly. 
> The hack in bug 1415441 just moved when the LogModule was touched at
> startup, changing timing for that one instance.

FWIW that hack didn't work, see bp-2aed59d2-b2d4-45d3-b43f-ab9390180125

> This bug is to cover the entire set of crashes, which likely all (or most)
> come from the same root cause.
> 
> All the crashes since the DB dump have been Windows... perhaps chance,
> perhaps timing, perhaps something related to fork()/exec()/etc.

Windows doesn't have fork/exec, but it should be roughly equivalent to fork/execve (as opposed to the nuwa style fork). I'm partially inclined to think this might be msvc/pgo related.

> One problem I see is that LogModule::Init() sets sLogModuleManager, and
> *then* calls sLogModuleManager->Init() (which can touch disk, etc).

Why do you think it will touch disk? We do `fopen` a file if an env var is set but I wouldn't expect anyone to actually be using logging in the wild, particularly on Windows where setting an env var is hard to do.

> Clearly
> this is racy if two threads try to call LogModule::Init().

Correct, it's not designed to be called by multiple threads. We could add a diagnostic assert to enforce main-thread only (we'll have to double check content processes initialize from the main thread). We could also add an assert that it's only initialized once.

> Also, while I
> haven't seen it in the traces, I'd be concerned if anything
> LogModuleManager::Init() does tries to log itself (like file IO routines). 
> Probably nothing does now, but it's clearly a danger that's not protected
> against (or mentioned).

Correct, it currently does not call any functions that log. We use OffTheBooksMutex and raw c-strings for this reason. We could certainly add a note.

That said, even if we called something that logged it would be harmless, we'd do log_test that looks up the module in our class initialized hashtable, creates the missing module, adds it in the disabled state, and then it wouldn't log due to being disabled. 

Holding off on assigning `sLogModuleManager` until we've initialized might be a good idea as that would lead to a pretty clear nullptr deref if we try to call log_test.

> I don't think this problem is the root of all of these crashes, however.

Agreed.
Now let's think about how LogModule::Init is called:

What I'll call a normal startup:
===============================
XREMain::XRE_main - *calls Init*
  -> XRE_mainStartup
    -> nsNativeAppSupportUnix::Start - *calls Init*
  -> ScopedXPCOMStartup::Initialize
      -> NS_InitXPCOM2 - *calls Init*

So here's an instance where it's called twice, but is harmless.

XPCShell testing (somewhat irrelevant to crashstats)
====================================================
XRE_XPCShellMain - calls Init
  -> NS_InitXPCOM2 - calls Init

Again an instance where it's called twice, but is harmless.

ChildProcess
============
XRE_InitChildProcess - *calls Init*
  -> PluginProcessChild::Init - *calls Init*
  -> ContentProcess::Init
    -> ScopedXREEmbed::Start
      -> XRE_InitEmbedding2
        -> NS_InitXPCOM2 - *calls Init*
  -> GPUProcessImpl::Init
    -> NS_InitMinimalXPCOM - *calls Init*

Again many instances where it's called twice, but is harmless.

What I'm not seeing is any instance we're we'd initialize on different threads.
This adds some assertions to make the intended usage of LogModuleManager::Init
more clear.

Nathan, this is a diagnostic patch that might help us identify if any of the
possible issues mentioned in commment 0 are actually happening.
Attachment #8949596 - Flags: review?(nfroyd)
Assignee: nobody → erahm
Status: NEW → ASSIGNED
Comment on attachment 8949596 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Review of attachment 8949596 [details] [diff] [review]:
-----------------------------------------------------------------

I like the idea; I think it needs some tweaks, below.

::: xpcom/base/Logging.cpp
@@ +200,5 @@
>     */
>    void Init()
>    {
> +    MOZ_DIAGNOSTIC_ASSERT(!mInitialized);
> +    mInitialized = true;

If `mInitialized` is a member variable, that means we have to call Init() twice from different threads to hit this assert, right?  So we really want this to be some flavor of Atomic<> + a compare-and-swap, to make sure we're not racing on `mInitialized`, I think.

@@ +568,4 @@
>  {
>    // NB: This method is not threadsafe; it is expected to be called very early
>    //     in startup prior to any other threads being run.
> +  MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread());

Are you even sure we *have* a main thread at this point to make this assertion succeed?

@@ +582,5 @@
> +  // Don't assign the pointer until after Init is called. This should help us
> +  // detect if any of the functions called by Init somehow rely on logging.
> +  auto mgr = new LogModuleManager();
> +  mgr->Init();
> +  sLogModuleManager = mgr;

Should we add a check that we're not somehow racing here to set sLogModuleManager, too?  I mean, it seems unlikely, but as long as we're checking in diagnostic patches, we might as well try to add in checks for all the conditions that we think could happen, and my impression of comment 0 is that jesup is concerned we could be calling this function more than once, not just trying to re-Init()' a single LogModuleManager.  Am I confused?
Attachment #8949596 - Flags: review?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #4)
> Comment on attachment 8949596 [details] [diff] [review]
> Avoid initializing LogModuleManager more than once
> 
> Review of attachment 8949596 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I like the idea; I think it needs some tweaks, below.
> 
> ::: xpcom/base/Logging.cpp
> @@ +200,5 @@
> >     */
> >    void Init()
> >    {
> > +    MOZ_DIAGNOSTIC_ASSERT(!mInitialized);
> > +    mInitialized = true;
> 
> If `mInitialized` is a member variable, that means we have to call Init()
> twice from different threads to hit this assert, right?  So we really want
> this to be some flavor of Atomic<> + a compare-and-swap, to make sure we're
> not racing on `mInitialized`, I think.

The NS_IsMainThread assert below guarantees it's not being accessed by multiple threads.

> @@ +568,4 @@
> >  {
> >    // NB: This method is not threadsafe; it is expected to be called very early
> >    //     in startup prior to any other threads being run.
> > +  MOZ_DIAGNOSTIC_ASSERT(NS_IsMainThread());
> 
> Are you even sure we *have* a main thread at this point to make this
> assertion succeed?

Yes, it's generally set by NS_LogInit which is always called before LogModule::Init (note: the names are similar, but they're unrelated). FWIW my linux build runs fine with this added in.

> @@ +582,5 @@
> > +  // Don't assign the pointer until after Init is called. This should help us
> > +  // detect if any of the functions called by Init somehow rely on logging.
> > +  auto mgr = new LogModuleManager();
> > +  mgr->Init();
> > +  sLogModuleManager = mgr;
> 
> Should we add a check that we're not somehow racing here to set
> sLogModuleManager, too?  I mean, it seems unlikely, but as long as we're
> checking in diagnostic patches, we might as well try to add in checks for
> all the conditions that we think could happen, and my impression of comment
> 0 is that jesup is concerned we could be calling this function more than
> once, not just trying to re-Init()' a single LogModuleManager.  Am I
> confused?

The NS_IsMainThread assert above guarantees this isn't racy.
Comment on attachment 8949596 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Nathan, I think I addressed you concerns in comment 5. Let me know if I missed something.
Attachment #8949596 - Flags: review?(nfroyd)
Comment on attachment 8949596 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Review of attachment 8949596 [details] [diff] [review]:
-----------------------------------------------------------------

OK, let's try this!
Attachment #8949596 - Flags: review?(nfroyd) → review+
Sorry for the churn, this slightly modifies the log module parser logic to avoid GetLog since sLogModuleManager isn't set yet.
Attachment #8950405 - Flags: review?(nfroyd)
Attachment #8949596 - Attachment is obsolete: true
Attachment #8950405 - Flags: review?(nfroyd) → review+
Comment on attachment 8950405 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Not sure about the policy for diagnostic patches for sec-highs so asking for approval.

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easy, this is a diagnostic patch adding a few assertions.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

It's commented more like a code cleanliness commit.

Which older supported branches are affected by this flaw?

All.

If not all supported branches, which bug introduced the flaw?

N/A, but whenever we added LazyLogModule is the best guess.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

No, this is a diagnostic patch. It should apply to beta easily.

How likely is this patch to cause regressions; how much testing does it need?

Low, just diagnostic asserts. We might see crashes on the new assertions.
Attachment #8950405 - Flags: sec-approval?
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #9)
> Comment on attachment 8950405 [details] [diff] [review]
> Avoid initializing LogModuleManager more than once
> 
> Not sure about the policy for diagnostic patches for sec-highs so asking for
> approval.


They should get approval but it isn't hard to get!

This all seems fine so sec-approval+.
Attachment #8950405 - Flags: sec-approval? → sec-approval+
Group: core-security → dom-core-security
Is this something we need to uplift to Beta?
Flags: needinfo?(erahm)
Comment on attachment 8950405 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Approval Request Comment
[Feature/Bug causing the regression]:

Unsure, we thinking it's LazyLogModule related.

[User impact if declined]:

We might not fix this sec bug.

[Is this code covered by automated tests?]:

Yes.

[Has the fix been verified in Nightly?]:

Yes.

[Needs manual test from QE? If yes, steps to reproduce]: 

No.

[List of other uplifts needed for the feature/fix]:

N/A

[Is the change risky?]:

Low risk.

[Why is the change risky/not risky?]:

We're just adding diagnostic asserts.

[String changes made/needed]:

N/A
Flags: needinfo?(erahm)
Attachment #8950405 - Flags: approval-mozilla-beta?
Comment on attachment 8950405 [details] [diff] [review]
Avoid initializing LogModuleManager more than once

Gets more diagnostics for these crashes. Taking for 59b12.
Attachment #8950405 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
what signature would the new crashes appear on?
Flags: needinfo?(erahm)
(In reply to Randell Jesup [:jesup] from comment #15)
> what signature would the new crashes appear on?

Assertions in LogModule::Init and LogModuleManager::Init. Possible segv for nullptr in LogModule::Get.
Flags: needinfo?(erahm)
Crash Signature: [@ mozilla::detail::log_test] → [@ mozilla::detail::log_test] [@ mozilla::LogModuleManager::Init] [@ mozilla::LogModule::Init] [@ mozilla::LogModule::Get]
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #16)
> (In reply to Randell Jesup [:jesup] from comment #15)
> > what signature would the new crashes appear on?
> 
> Assertions in LogModule::Init and LogModuleManager::Init. Possible segv for
> nullptr in LogModule::Get.

I'm not seeing any crashes with the new assertions in 59.0b12/13, I think we can rule out racy/off main thread initialization.
Looks like we've had a couple reports from Beta60. Anything interesting in them, Eric?
Flags: needinfo?(erahm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #21)
> Looks like we've had a couple reports from Beta60. Anything interesting in
> them, Eric?

Still no crashes in the instrumented code. The failures for `mozilla::detail::log_test` are still all Windows. There's not really a clear pattern; in the past we've seen issues with externed LazyLogModules on Windows but not all are externed. Quite a few declare their own LOG macro instead of MOZ_LOG, it's possible this is causing issues, but again not all instances use this. The addresses are kind of all over the place, we're not seeing any poisoning so it's most likely not UAF.

I'm not sure what to do at this point. We could add a canary to LazyLogModule and LogModuleManager to see if we're getting some sort of corruption.
Flags: needinfo?(erahm)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #22)
> I'm not sure what to do at this point. We could add a canary to
> LazyLogModule and LogModuleManager to see if we're getting some sort of
> corruption.

How much work would that be? Could be worthwhile to have on Nightly for a while.
Flags: needinfo?(erahm)
(In reply to Frederik Braun [:freddyb] from comment #23)
> (In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment
> #22)
> > I'm not sure what to do at this point. We could add a canary to
> > LazyLogModule and LogModuleManager to see if we're getting some sort of
> > corruption.
> 
> How much work would that be? Could be worthwhile to have on Nightly for a
> while.

It's pretty easy, the question is how often/where we should check the canary. I'll put together a patch to at least add it.
Flags: needinfo?(erahm)
I don't feel great about checking the canary every time we access the log module, but I'm not sure of a better alternative.
Attachment #8974494 - Flags: review?(nfroyd)
Comment on attachment 8974494 [details] [diff] [review]
Add debug code to LazyLogModule

Review of attachment 8974494 [details] [diff] [review]:
-----------------------------------------------------------------

What is the ideal outcome from adding this canary?  We'll identify the module that's getting corrupted and then...?

No objection to the code, just not sure what it's going to tell us.  I guess maybe we'll get better stacks?  (Stacks from some of these reports look terrible.)

::: xpcom/base/Logging.h
@@ +174,5 @@
>    operator LogModule*();
>  
>  private:
>    const char* const mLogName;
> +  CorruptionCanary mCanary;

Do we want to restrict this member to MOZ_DIAGNOSTIC_ASSERT_ENABLED or somesuch, or do we want it to ride the trains all the way to release?
Attachment #8974494 - Flags: review?(nfroyd) → review+
Looks like this got r+ a couple weeks ago and then...?
Flags: needinfo?(erahm)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #27)
> Looks like this got r+ a couple weeks ago and then...?

I'll get it landed.
Flags: needinfo?(erahm)
(In reply to Nathan Froyd [:froydnj] from comment #26)
> Comment on attachment 8974494 [details] [diff] [review]
> Add debug code to LazyLogModule
> 
> Review of attachment 8974494 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> What is the ideal outcome from adding this canary?  We'll identify the
> module that's getting corrupted and then...?
> 
> No objection to the code, just not sure what it's going to tell us.  I guess
> maybe we'll get better stacks?  (Stacks from some of these reports look
> terrible.)

Yeah, I think better stacks and it will let us know if this is corruption. If it's *not* corruption I think we're possibly looking at a msvc-specific issue.

> ::: xpcom/base/Logging.h
> @@ +174,5 @@
> >    operator LogModule*();
> >  
> >  private:
> >    const char* const mLogName;
> > +  CorruptionCanary mCanary;
> 
> Do we want to restrict this member to MOZ_DIAGNOSTIC_ASSERT_ENABLED or
> somesuch, or do we want it to ride the trains all the way to release?

I'm inclined to let this ride the trains as long as it's not a huge perf hit. If we see any perf issues after landing we can follow up limiting to beta.
This ended up getting backed out, we need to make CorruptionCanary constexpr or remove the constexpr from LazyLogModule.

> Check-in: https://hg.mozilla.org/integration/mozilla-inbound/rev/54e79ab99b0f - Coroiu Cristina - Backed out changeset 82ef13b02416 (bug 1436768) for build bustage at obj-firefox/dist/include/mozilla/Logging.h on a CLOSED TREE
This makes CorruptionCanary constexpr as well.
Attachment #8983252 - Flags: review?(nfroyd)
Attachment #8983252 - Flags: review?(nfroyd) → review+
Depends on: 1467048

This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?

Flags: needinfo?(erahm)

(In reply to Nathan Froyd [:froydnj] from comment #34)

This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?

I have a somewhat unfounded theory that adding the canary got rid of crashes due to some weird msvc codegen. Now that we're building with clang-cl maybe that doesn't matter (and maybe my theory is bunk anyhow). I'm okay with removing it and seeing what happens.

Flags: needinfo?(erahm)

(In reply to Nathan Froyd [:froydnj] from comment #34)

This crash doesn't really seem to be showing up, at least on release (not sure what's going on with ESR), and it'd be nice to just remove the Canary check if we wanted to re-inline LazyLogModule::operator LogModule*. WDYT about just removing the canary?

Nathan's doing this in bug 1529955. If we don't see an uptick in crashes I'd say we can just close this bug.

We're not seeing this anymore in recent versions.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Group: dom-core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: