Closed Bug 938730 Opened 11 years ago Closed 11 years ago

avoid mix of memory allocators (crashes) when using system sqlite

Categories

(Toolkit :: Storage, defect)

25 Branch
All
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28
Tracking Status
firefox28 --- fixed

People

(Reporter: KaiE, Assigned: KaiE)

References

Details

(Whiteboard: [qa-])

Attachments

(1 file, 2 obsolete files)

I'm running into Firefox crashes and misbehaviour (nonworking cookies) on Fedora Linux.

I identified the cause to be a mix of memory allocators for sqlite,
initially reported at https://bugzilla.redhat.com/show_bug.cgi?id=1007603

I can reproduce only with a non-default NSS configuration, where I'm using the modern NSS database file format that's using sqlite. However, since we'd like to upgrade to that new sotrage file format, it's important to get this fixed.

What happens is:
- Mozilla inits NSS
- NSS calls a sqlite API (doesn't matter which, but it's sqlite3_mprintf)
- sqlite detects the need for init, and initializes itself using defaults
- Mozilla proceeds to init a storage service,
  which registers a different allocator to be used by sqlite

This results in a crash on shutdown.
(Although I cannot explain why, this also causes cookies to be broken for me. Maybe because some global sqlite code is already using the default allocator, and things fail when the other allocator is being used later on.)

Mozilla can init sqlite using either jemalloc or sqlite's default alloc code. I found there's already a makefile symbol, MOZ_STORAGE_MEMORY to choose one or the other.

Also, you already disable that on Android.

I suggest to disable it by default on Linux, too.
I think just fixing/implementing 730495 as proposed could fix the underlying problem.

It looks like not defining MOZ_STORAGE_MEMORY would regress the dark matter/memory detector on linux, causing it to not know about the SQLite memory allocations.  Potentially worse is that if jemalloc is used but the storage wrappers are not used, memory use by SQLite will bloat in a non-trivial fashion.  (This could be addressed, but if we're doing work, might as well do the work in bug 730495.)

See http://mxr.mozilla.org/mozilla-central/source/storage/src/mozStorageService.cpp#351
See Also: → 730495
(In reply to Andrew Sutherland (:asuth) from comment #2)
> I think just fixing/implementing 730495 as proposed could fix the underlying
> problem.

But that bug doesn't have any patch yet.

Also, it's not clear to me how you'd solve bug 730495 in a way that works in my configuration.

Please note that NSS is build and packaged separately on Linux distributions. NSS cannot call a Mozilla provided init function, because the NSS package is used by other software, too.
(In reply to Kai Engert (:kaie) from comment #3)
> Please note that NSS is build and packaged separately on Linux
> distributions. NSS cannot call a Mozilla provided init function, because the
> NSS package is used by other software, too.

... unless we'd define a callback... which is unnecessary, because if you were willing to register a callback function with NSS prior to doing anything else, instead you should rather init sqlite alloc very early, prior to anything else.
I agree that we cannot disable  MOZ_STORAGE_MEMORY on all Linux builds, because the functionality is too valuable and we have many (most) of our developers using Linux. If bug 730495 takes too long to fix, then maybe we could consider avoiding MOZ_STORAGE_MEMORY when --use-system-nss is define.
Depends on: 730495
Comment on attachment 832415 [details] [diff] [review]
patch v1, don't define MOZ_STORAGE_MEMORY on {Linux,Android}

Review of attachment 832415 [details] [diff] [review]:
-----------------------------------------------------------------

Andrew is exactly right in comment 2 -- this change will cause sqlite memory usage to be higher, and also prevent DMD from working.  Bug 730495 is the right solution.  In fact, I'm disappointed to see that MOZ_STORAGE_MEMORY is not defined on Android -- why is that?
Attachment #832415 - Flags: review-
(In reply to Andrew Sutherland (:asuth) from comment #2)
> I think just fixing/implementing 730495 as proposed could fix the underlying
> problem.

And to clarify, I'm pro us implementing the solution that works with how Linux distributions package Firefox for that bug.  While supporting system SQL in general is a complicating factor (No one likes writing configure.in checks! :), moving the sqlite3_config call way up in the core app startup process does not seem particularly intractable.
(In reply to Brian Smith (:briansmith, was :bsmith; Please NEEDINFO? me if you want a response) from comment #5)
> I agree that we cannot disable  MOZ_STORAGE_MEMORY on all Linux builds,
> because the functionality is too valuable and we have many (most) of our
> developers using Linux. If bug 730495 takes too long to fix, then maybe we
> could consider avoiding MOZ_STORAGE_MEMORY when --use-system-nss is define.

Yes, avoiding MOZ_STORAGE_MEMORY when using system nss is the way to go. Not something tied with the OS, which has nothing to do with the problem.
(In reply to Mike Hommey [:glandium] from comment #8)
> (In reply to Brian Smith from comment #5)
> > If bug 730495 takes too long to fix, then maybe we
> > could consider avoiding MOZ_STORAGE_MEMORY when --use-system-nss is define.
> 
> Yes, avoiding MOZ_STORAGE_MEMORY when using system nss is the way to go. Not
> something tied with the OS, which has nothing to do with the problem.

I like that approach, too. Attaching patch.
Attachment #832415 - Attachment is obsolete: true
Attachment #832855 - Flags: review?(bugmail)
Attached patch patch v3Splinter Review
same patch as v2 - added a comment
Attachment #832855 - Attachment is obsolete: true
Attachment #832855 - Flags: review?(bugmail)
Attachment #832872 - Flags: review?
Attachment #832872 - Flags: review? → review?(bugmail)
Actually...

Should we rather disable jemalloc for sqlite when using --enable-system-sqlite ?

This would be a more generic solution, which would work with other libraries that use sqlite, too, not just NSS.
Comment on attachment 832872 [details] [diff] [review]
patch v3

Sorry for the delay, I was really hoping that we could do the "just make sure we initialize SQLite before NSS/really early and have a really strong crasher assert on that" and just now got more time to look into it.  AKA do that in bug 730495 and system SQLite could be happy too.  Unfortunately, it does seem like doing that is complicated and may cause weird dependency issues.  Specifically:

If not for Fennec, I think we might be able to add SQLite init to XPCOMGlueStartup or some other common xulrunner-app code path.  We could add a similar call-out from Fennec, but because it wants to load libxul after libsqlite3 (which conceptually could be a system sqlite for our purposes), that suggests putting code into some other common library.  This adds a fair number of moving parts.

Another option is to address this by making sure EnsureNSSInitialized/nsNSSComponent calls the initialize method before calling  NSS_Initialize (or any preceding helpers it might have.)  But based on that Fennec bridge thing, I'm wondering if this has the same problem.


Either way, it seems reasonably clear cut that the fix proposed in bug 730495 is very simple, as is the general fix proposed here.  I wish the linux-distribution-built stuff could get the jemalloc and memory reporter wins, but that'll have to come in a fancier follow-up.

I am needinfo-ing mbrubeck in case he knows of any obvious common init paths and library configurations that could allow us to do make sure apps call a custom SQL init function regardless of the presence of system SQLite.


r=asuth contingent on changing from using _USE_SYSTEM_NSS to MOZ_NATIVE_SQLITE as :kaie proposed in comment 11.  This really only resolves the NSS issue through the correlation of both being used at the same time until bug 730495 is fixed, but ideally :nnethercote or someone will pounce on that once this lands.

:mak, I'm assuming you're cool with this since you haven't chimed in.  We can always back out / land a better solution later.
Attachment #832872 - Flags: review?(bugmail)
Attachment #832872 - Flags: review+
Attachment #832872 - Flags: feedback?(mbrubeck)
(In reply to Andrew Sutherland (:asuth) from comment #12)
> :mak, I'm assuming you're cool with this since you haven't chimed in.

Yep!
Attachment #832872 - Flags: feedback?(mbrubeck) → feedback+
Assignee: nobody → kaie
Adjusting summary, as this bug is specifically about the use of system sqlite.

The use of internal sqlite is handled in bug 730495.
Summary: mix of sqlite memory allocators causes NSS crashes on Linux → avoid mix of memory allocators (crashes) when using system sqlite
Thanks for the r+
I've adjusted the patch based on comment 12.

I've tested and confirmed that the patch fixes the crash in the Fedora 19 build configuration.

https://hg.mozilla.org/integration/mozilla-inbound/rev/247ff2131af5
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
No longer depends on: 730495
It seems the patch doesn't work. I still the problem with Firefox 26 on Fedora 19 which has picked up the patch.

I suspect symbol _USE_SYSTEM_NSS isn't visible inside the makefile we have patched. (I don't understand why I had seen success in my tests. Maybe I had made a mistake during testing.)
Did Fedora 19 pick up the landed patch that uses MOZ_NATIVE_SQLITE instead of _USE_SYSTEM_NSS?

We both AC_DEFINE and AC_SUBST MOZ_NATIVE_SQLITE.  This should get us the symbol in both source files and Makefiles.

_USE_SYSTEM_NSS appears to be internal to configure.in, so it wouldn't be surprising if that didn't work.  (There is an AC_SUBST(MOZ_NATIVE_NSS) that would work, though.)
Whiteboard: [qa-]
Depends on: 1026828
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: