Closed Bug 1257786 Opened 8 years ago Closed 8 years ago

Segfault within LogModule::Get() under B2G_DEBUG=1

Categories

(Firefox OS Graveyard :: Runtime, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(firefox48 fixed)

RESOLVED FIXED
Tracking Status
firefox48 --- fixed

People

(Reporter: gerard-majax, Assigned: gerard-majax)

References

Details

(Whiteboard: fixed-in-pine)

Attachments

(4 files, 3 obsolete files)

So as documented on bug 1256506 comment 11, I discovered that we are hitting what looks like a race condition: LogModule::Get("chromium") gets hits before we got any chance to LogModule::Init().

This is with bug 1219482 relanded and no "mozilla::LogModule::Init()" call within b2g/app/nsBrowserApp.cpp as fabrice verified we hit this code via the NS_XPCOMInit2 call path.

Yet, turning on B2G_DEBUG either bypass that call path or is just racy.
(In reply to Alexandre LISSY :gerard-majax from comment #1)
> Created attachment 8732126 [details] [diff] [review]
> Workaround racy chromium logging

We really just need to find out what's calling this before it's initialized and fix that. Do you have a stack trace?
Hm I have not kept a copy of it but I can surely generate it again. The moment we hit is at https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/logging.cc#61

But it's tricky and probably due to processes taking more time to start with B2G_DEBUG=1 and Nuwa kicking in. I am not sure nor convinced LogModule is at fault at all.
Attached file gdb backtrace (obsolete) —
And a few lines of the logcat output matching my patch:
> 03-16 14:31:51.680  2731  2734 I Gecko   : LogModule::Get(aName=chromium) sLogModuleManager == nullptr? 1
> 03-16 14:31:51.680  2731  2734 F MOZ_Assert: Assertion failure: sLogModuleManager != nullptr, at /home/alex/codaz/Mozilla/gecko-cinnabar/xpcom/base/Logging.cpp:262
> 03-16 14:31:52.000  2724  2724 I Gecko   : LogModule::Init() sLogModuleManager == nullptr? 1
Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So I'm surprised in fact.
(In reply to Alexandre LISSY :gerard-majax from comment #5)
> Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So
> I'm surprised in fact.

That stack isn't for a crash though, right?

In general I would guess the difference when setting |B2G_DEBUG| is that chromium logging is a noop on non-debug builds [1].

[1] https://dxr.mozilla.org/mozilla-central/rev/3e04659fdf6aef792f7cf9840189c6c38d08d1e8/ipc/chromium/src/base/logging.h#101-109
(In reply to Eric Rahm [:erahm] from comment #6)
> (In reply to Alexandre LISSY :gerard-majax from comment #5)
> > Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So
> > I'm surprised in fact.
> 
> That stack isn't for a crash though, right?
> 
> In general I would guess the difference when setting |B2G_DEBUG| is that
> chromium logging is a noop on non-debug builds [1].
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/rev/
> 3e04659fdf6aef792f7cf9840189c6c38d08d1e8/ipc/chromium/src/base/logging.h#101-
> 109

That's a stack out of a breakpoint set on |MOZ_ASSERT(sLogModuleManager != nullptr);| at https://dxr.mozilla.org/mozilla-central/rev/3e04659fdf6aef792f7cf9840189c6c38d08d1e8/xpcom/base/Logging.cpp#261

As far as I remember, I was never able to catch the crash itself with gdb, I cannot tell why.
If it's too much work to track down the issue, I'd r+ a patch that reverts the chromium logging to NSPR logging.
(In reply to Eric Rahm [:erahm] from comment #9)
> If it's too much work to track down the issue, I'd r+ a patch that reverts
> the chromium logging to NSPR logging.

It's more simple than this: this was blocking me from getting B2G OS booting with the new chrome:// URLs scheme. Now that this is working, I will be able to go back on this and verify carefully. My next target is checking if it's related to NUWA use in which case I guess I will have to ask help to people knowing NUWA properly :)
That one is what I get when adding breakpoint on the MOZ_ASSERT() line. I could confirm with my logging patch that the condition is the same.

Eric, before reverting, is this codepath you could have an idea what to check? Thanks!

Cervantes, just to make sure, is it enough to change MOZ_NUWA_PROCESS from 1 to 0 in b2g/confvars.sh to disable NUWA? Thanks!
Flags: needinfo?(erahm)
Flags: needinfo?(cyu)
The backtrace shows we call |NS_InitXPCOM2()| from https://hg.mozilla.org/projects/pine/file/618b9b36a92c/toolkit/xre/nsAppRunner.cpp#l1520

Then, we get within |nsComponentManagerImpl::Init| at https://hg.mozilla.org/projects/pine/file/618b9b36a92c/xpcom/build/XPCOMInit.cpp#l696

Between those two calls, around the start of |NS_InitXPCOM2()| we have a call to |mozilla::LogModule::Init()| https://hg.mozilla.org/projects/pine/file/618b9b36a92c/xpcom/build/XPCOMInit.cpp#l491

Yet logging shows we don't execute the LogModule::Init() code before the LogModule::Get() gets triggered.
(In reply to Alexandre LISSY :gerard-majax from comment #12)
> Created attachment 8733026 [details]
> gdb backtrace with MOZ_NUWA_PROCESS=0 in b2g/confvars.sh
> 
> That one is what I get when adding breakpoint on the MOZ_ASSERT() line. I
> could confirm with my logging patch that the condition is the same.
> 
> Eric, before reverting, is this codepath you could have an idea what to
> check? Thanks!
> 
> Cervantes, just to make sure, is it enough to change MOZ_NUWA_PROCESS from 1
> to 0 in b2g/confvars.sh to disable NUWA? Thanks!

That codepath looks fine, I doubt the assertion will assert if you step to the next line. We need a stack dump from when it actually asserts.
Flags: needinfo?(erahm)
So we are hitting logging infrastructure very early because of https://hg.mozilla.org/projects/pine/file/618b9b36a92c/ipc/chromium/src/base/at_exit.cc#l40
That error, "Tried to RegisterCallback without an AtExitManager", is clearly something that has always been here. it is my guess that the new logging module is just making it more obvious/visible because it requires Init() to have been done.
(In reply to Alexandre LISSY :gerard-majax from comment #12)
The right way to disabling Nuwa is setting pref "dom.ipc.processPrelaunch.enabled" to false. We haven't used build config MOZ_NUWA_PROCESS=0 for several releases, and disabling it on gonk may fail in an unexpected way.
Flags: needinfo?(cyu)
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #17)
> (In reply to Alexandre LISSY :gerard-majax from comment #12)
> The right way to disabling Nuwa is setting pref
> "dom.ipc.processPrelaunch.enabled" to false. We haven't used build config
> MOZ_NUWA_PROCESS=0 for several releases, and disabling it on gonk may fail
> in an unexpected way.

Thanks, disabling NUWA this way helps, I don't have the segfault anymore but I still see broken use of logging :)
Flags: needinfo?(cyu)
Attachment #8732236 - Attachment is obsolete: true
Attachment #8733020 - Attachment is obsolete: true
Attachment #8733026 - Attachment is obsolete: true
I think I have it.
Assignee: nobody → lissyx+mozillians
Flags: needinfo?(cyu)
Attached file patch
That should be enough. I am making sure as of now.
Comment on attachment 8733365 [details]
MozReview Request: Bug 1257786 - Proper init of LogModule for Nuwa r?khuey

https://reviewboard.mozilla.org/r/41739/#review38141
Attachment #8733365 - Flags: review?(khuey) → review+
https://hg.mozilla.org/projects/pine/rev/901013516094
Whiteboard: fixed-in-pine
https://hg.mozilla.org/mozilla-central/rev/0536032d5ac7
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.