Segfault within LogModule::Get() under B2G_DEBUG=1

RESOLVED FIXED

Status

Firefox OS
Runtime
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: gerard, Assigned: gerard)

Tracking

unspecified
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(firefox48 fixed)

Details

(Whiteboard: fixed-in-pine)

MozReview Requests

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(4 attachments, 3 obsolete attachments)

(Assignee)

Description

2 years ago
So as documented on bug 1256506 comment 11, I discovered that we are hitting what looks like a race condition: LogModule::Get("chromium") gets hits before we got any chance to LogModule::Init().

This is with bug 1219482 relanded and no "mozilla::LogModule::Init()" call within b2g/app/nsBrowserApp.cpp as fabrice verified we hit this code via the NS_XPCOMInit2 call path.

Yet, turning on B2G_DEBUG either bypass that call path or is just racy.
(Assignee)

Comment 1

2 years ago
Created attachment 8732126 [details] [diff] [review]
Workaround racy chromium logging
(In reply to Alexandre LISSY :gerard-majax from comment #1)
> Created attachment 8732126 [details] [diff] [review]
> Workaround racy chromium logging

We really just need to find out what's calling this before it's initialized and fix that. Do you have a stack trace?
(Assignee)

Comment 3

2 years ago
Hm I have not kept a copy of it but I can surely generate it again. The moment we hit is at https://dxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/logging.cc#61

But it's tricky and probably due to processes taking more time to start with B2G_DEBUG=1 and Nuwa kicking in. I am not sure nor convinced LogModule is at fault at all.
(Assignee)

Comment 4

2 years ago
Created attachment 8732236 [details]
gdb backtrace

And a few lines of the logcat output matching my patch:
> 03-16 14:31:51.680  2731  2734 I Gecko   : LogModule::Get(aName=chromium) sLogModuleManager == nullptr? 1
> 03-16 14:31:51.680  2731  2734 F MOZ_Assert: Assertion failure: sLogModuleManager != nullptr, at /home/alex/codaz/Mozilla/gecko-cinnabar/xpcom/base/Logging.cpp:262
> 03-16 14:31:52.000  2724  2724 I Gecko   : LogModule::Init() sLogModuleManager == nullptr? 1
(Assignee)

Comment 5

2 years ago
Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So I'm surprised in fact.
(In reply to Alexandre LISSY :gerard-majax from comment #5)
> Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So
> I'm surprised in fact.

That stack isn't for a crash though, right?

In general I would guess the difference when setting |B2G_DEBUG| is that chromium logging is a noop on non-debug builds [1].

[1] https://dxr.mozilla.org/mozilla-central/rev/3e04659fdf6aef792f7cf9840189c6c38d08d1e8/ipc/chromium/src/base/logging.h#101-109
(Assignee)

Comment 7

2 years ago
(In reply to Eric Rahm [:erahm] from comment #6)
> (In reply to Alexandre LISSY :gerard-majax from comment #5)
> > Oh, xpcom/build/XPCOMInit.cpp:696 but line 491 has the LogModule::Init(). So
> > I'm surprised in fact.
> 
> That stack isn't for a crash though, right?
> 
> In general I would guess the difference when setting |B2G_DEBUG| is that
> chromium logging is a noop on non-debug builds [1].
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/rev/
> 3e04659fdf6aef792f7cf9840189c6c38d08d1e8/ipc/chromium/src/base/logging.h#101-
> 109

That's a stack out of a breakpoint set on |MOZ_ASSERT(sLogModuleManager != nullptr);| at https://dxr.mozilla.org/mozilla-central/rev/3e04659fdf6aef792f7cf9840189c6c38d08d1e8/xpcom/base/Logging.cpp#261

As far as I remember, I was never able to catch the crash itself with gdb, I cannot tell why.
(Assignee)

Updated

2 years ago
Duplicate of this bug: 1256638
If it's too much work to track down the issue, I'd r+ a patch that reverts the chromium logging to NSPR logging.
(Assignee)

Comment 10

2 years ago
(In reply to Eric Rahm [:erahm] from comment #9)
> If it's too much work to track down the issue, I'd r+ a patch that reverts
> the chromium logging to NSPR logging.

It's more simple than this: this was blocking me from getting B2G OS booting with the new chrome:// URLs scheme. Now that this is working, I will be able to go back on this and verify carefully. My next target is checking if it's related to NUWA use in which case I guess I will have to ask help to people knowing NUWA properly :)
(Assignee)

Comment 11

2 years ago
Created attachment 8733020 [details]
gdb backtrace with MOZ_NUWA_PROCESS=0 in b2g/confvars.sh
(Assignee)

Comment 12

2 years ago
Created attachment 8733026 [details]
gdb backtrace with MOZ_NUWA_PROCESS=0 in b2g/confvars.sh

That one is what I get when adding breakpoint on the MOZ_ASSERT() line. I could confirm with my logging patch that the condition is the same.

Eric, before reverting, is this codepath you could have an idea what to check? Thanks!

Cervantes, just to make sure, is it enough to change MOZ_NUWA_PROCESS from 1 to 0 in b2g/confvars.sh to disable NUWA? Thanks!
Flags: needinfo?(erahm)
Flags: needinfo?(cyu)
(Assignee)

Comment 13

2 years ago
The backtrace shows we call |NS_InitXPCOM2()| from https://hg.mozilla.org/projects/pine/file/618b9b36a92c/toolkit/xre/nsAppRunner.cpp#l1520

Then, we get within |nsComponentManagerImpl::Init| at https://hg.mozilla.org/projects/pine/file/618b9b36a92c/xpcom/build/XPCOMInit.cpp#l696

Between those two calls, around the start of |NS_InitXPCOM2()| we have a call to |mozilla::LogModule::Init()| https://hg.mozilla.org/projects/pine/file/618b9b36a92c/xpcom/build/XPCOMInit.cpp#l491

Yet logging shows we don't execute the LogModule::Init() code before the LogModule::Get() gets triggered.
(In reply to Alexandre LISSY :gerard-majax from comment #12)
> Created attachment 8733026 [details]
> gdb backtrace with MOZ_NUWA_PROCESS=0 in b2g/confvars.sh
> 
> That one is what I get when adding breakpoint on the MOZ_ASSERT() line. I
> could confirm with my logging patch that the condition is the same.
> 
> Eric, before reverting, is this codepath you could have an idea what to
> check? Thanks!
> 
> Cervantes, just to make sure, is it enough to change MOZ_NUWA_PROCESS from 1
> to 0 in b2g/confvars.sh to disable NUWA? Thanks!

That codepath looks fine, I doubt the assertion will assert if you step to the next line. We need a stack dump from when it actually asserts.
Flags: needinfo?(erahm)
(Assignee)

Comment 15

2 years ago
So we are hitting logging infrastructure very early because of https://hg.mozilla.org/projects/pine/file/618b9b36a92c/ipc/chromium/src/base/at_exit.cc#l40
(Assignee)

Comment 16

2 years ago
That error, "Tried to RegisterCallback without an AtExitManager", is clearly something that has always been here. it is my guess that the new logging module is just making it more obvious/visible because it requires Init() to have been done.
(In reply to Alexandre LISSY :gerard-majax from comment #12)
The right way to disabling Nuwa is setting pref "dom.ipc.processPrelaunch.enabled" to false. We haven't used build config MOZ_NUWA_PROCESS=0 for several releases, and disabling it on gonk may fail in an unexpected way.
Flags: needinfo?(cyu)
(Assignee)

Comment 18

2 years ago
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #17)
> (In reply to Alexandre LISSY :gerard-majax from comment #12)
> The right way to disabling Nuwa is setting pref
> "dom.ipc.processPrelaunch.enabled" to false. We haven't used build config
> MOZ_NUWA_PROCESS=0 for several releases, and disabling it on gonk may fail
> in an unexpected way.

Thanks, disabling NUWA this way helps, I don't have the segfault anymore but I still see broken use of logging :)
Flags: needinfo?(cyu)
(Assignee)

Comment 19

2 years ago
Created attachment 8733322 [details]
adding NS_DebugBreak() and resolving symbols
Attachment #8732236 - Attachment is obsolete: true
Attachment #8733020 - Attachment is obsolete: true
Attachment #8733026 - Attachment is obsolete: true
(Assignee)

Comment 20

2 years ago
I think I have it.
Assignee: nobody → lissyx+mozillians
Flags: needinfo?(cyu)
(Assignee)

Comment 21

2 years ago
Created attachment 8733357 [details]
patch

That should be enough. I am making sure as of now.
(Assignee)

Comment 22

2 years ago
Created attachment 8733365 [details]
MozReview Request: Bug 1257786 - Proper init of LogModule for Nuwa r?khuey

Review commit: https://reviewboard.mozilla.org/r/41739/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/41739/
Attachment #8733365 - Flags: review?(khuey)
Comment on attachment 8733365 [details]
MozReview Request: Bug 1257786 - Proper init of LogModule for Nuwa r?khuey

https://reviewboard.mozilla.org/r/41739/#review38141
Attachment #8733365 - Flags: review?(khuey) → review+
(Assignee)

Comment 24

2 years ago
https://hg.mozilla.org/projects/pine/rev/901013516094
Whiteboard: fixed-in-pine

Comment 25

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/0536032d5ac7

Comment 26

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0536032d5ac7
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.