Closed Bug 1249849 Opened 8 years ago Closed 8 years ago

DLL blocklist is using the wrong implementations of malloc and free

Categories

(Core :: mozglue, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox45 + wontfix
firefox46 + fixed
firefox47 + fixed
firefox48 --- fixed

People

(Reporter: bugzilla, Assigned: bugzilla)

References

(Blocks 2 open bugs)

Details

Attachments

(3 files)

Some UniquePtrs in WindowsDllBlocklist end up using jemalloc for operator new and MSVCRT for operator delete.

After a considerable amount of digging, I realized that this MOZ_NO_MALLOC code is generating references to operator new and operator delete that need to be resolved by the linker. Since those operators are declared as extern, and MSVCRT's operators are not inline, they end up being resolved by the linker and we are left with whatever it works out.

I am 99.9% certain that this is the cause of the Windows heap corruption that we were seeing in bug 1243098 and friends.

My knee-jerk response is to implement inline new and delete operators for the DLL blocklist so that it uses its own. OTOH, I have yet to get my head around all of the complexities surrounding mozalloc et al that I might be missing an alternative.
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #0)
> My knee-jerk response is to implement inline new and delete operators for
> the DLL blocklist so that it uses its own. OTOH, I have yet to get my head
> around all of the complexities surrounding mozalloc et al that I might be
> missing an alternative.

Mike, any suggestions?
Flags: needinfo?(mh+mozilla)
Try removing that MOZ_NO_MOZALLOC. Back when the blocklist was moved to mozglue, it made sense, but now that mozalloc is in mozglue as well, it makes less sense.
Flags: needinfo?(mh+mozilla)
[Tracking Requested - why for this release]:
This could cause heap corruption which is a pretty serious quality issue. Based on what I've seen, adding additional heap operations to the DLL blocklist is enough to trigger this.
Tracking for 45+ since this approach may fix a group of crashes (maybe startup crashes).
Aaron, is this something you intend to work on? If not, can you suggest who might take a deeper look? Thanks.
Flags: needinfo?(aklotz)
Yeah, I'll take it.
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Flags: needinfo?(aklotz)
Too late for 45 as today is release day!
Attached patch RFCSplinter Review
After disabling MOZ_NO_MOZALLOC, the problems persist:

- Allocating goes through jemalloc because operator new calls moz_xmalloc.
- Freeing calls free_impl, which maps to free() as expected outside of mozglue, but ends up referencing MSVC free() within mozglue.

It looks to me that any references to malloc_impl or free_impl in this code need to be map to the mozglue *_impl functions.

This patch does that as a successful test... is this the right idea?
Attachment #8727993 - Flags: feedback?(mh+mozilla)
Comment on attachment 8727993 [details] [diff] [review]
RFC

Review of attachment 8727993 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/build/WindowsDllBlocklist.cpp
@@ +2,5 @@
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
> +#include "mozmemory_wrap.h"

I'd #define MOZ_MEMORY_IMPL above this include, instead of putting it in moz.build.

@@ +6,5 @@
> +#include "mozmemory_wrap.h"
> +#define MALLOC_FUNCS MALLOC_FUNCS_MALLOC
> +#define MALLOC_DECL(name, return_type, ...) \
> +  extern "C" MOZ_MEMORY_API return_type name ## _impl(__VA_ARGS__);
> +#include "malloc_decls.h"

Copying the comment from memory/mozalloc/mozalloc.cpp could be a good thing.
Attachment #8727993 - Flags: feedback?(mh+mozilla) → feedback+
Summary: MOZ_NO_MOZALLOC code in mozglue needs inline new and delete operators → DLL blocklist is using the wrong implementations of malloc and free
Comment on attachment 8732978 [details]
MozReview Request: Bug 1249849: Make sure the correct implementations of malloc and free are used in DLL blocklist; r?glandium

https://reviewboard.mozilla.org/r/41485/#review38409

Can you file a followup to make things such that we don't need to solve this one libmozglue .cpp file at a time?
Attachment #8732978 - Flags: review?(mh+mozilla) → review+
https://hg.mozilla.org/mozilla-central/rev/c26f9e1dbad7
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8732978 [details]
MozReview Request: Bug 1249849: Make sure the correct implementations of malloc and free are used in DLL blocklist; r?glandium

Approval Request Comment
[Feature/regressing bug #]: DLL Blocklist
[User impact if declined]: Heap corruption leading to crashes or undefined behaviour
[Describe test coverage new/current, TreeHerder]: Verified locally by stepping into code to ensure that the correct malloc and free functions are being called.
[Risks and why]: Very low, trivial fix
[String/UUID change made/needed]: None
Attachment #8732978 - Flags: approval-mozilla-beta?
Attachment #8732978 - Flags: approval-mozilla-aurora?
Aaron, do you think it is best to land this cautiously on aurora first, then beta? Or jump right to land this on beta 6 next Monday? Is there anything else we can do to verify this or test around it? I am torn between landing it for beta 6 so we have time to respond to problems, vs. verifying on aurora and landing in beta 7 or 8.  Fixing a bunch of startup crashes and hangs sounds very tempting!  What do you think?
Flags: needinfo?(aklotz)
I'd recommend that we go straight to beta. This patch only affects one source file and I have manually verified that it fixes all allocations within that source file.
Flags: needinfo?(aklotz)
Comment on attachment 8732978 [details]
MozReview Request: Bug 1249849: Make sure the correct implementations of malloc and free are used in DLL blocklist; r?glandium

Straight to beta it is, sounds good for crash fixes
Attachment #8732978 - Flags: approval-mozilla-beta?
Attachment #8732978 - Flags: approval-mozilla-beta+
Attachment #8732978 - Flags: approval-mozilla-aurora?
Attachment #8732978 - Flags: approval-mozilla-aurora+
This is needed to fix problems on non-Nightly builds. Since malloc_decls.h is not exported when MOZ_REPLACE_ALLOC is not defined, we still need to be able to find it in the case of mozglue.
Flags: needinfo?(aklotz)
Attachment #8735998 - Flags: review?(mh+mozilla)
Attachment #8735998 - Flags: review?(mh+mozilla) → review+
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #20)
> Created attachment 8735998 [details] [diff] [review]
> Build fix for non-nightly channels
> 
> This is needed to fix problems on non-Nightly builds. Since malloc_decls.h
> is not exported when MOZ_REPLACE_ALLOC is not defined, we still need to be
> able to find it in the case of mozglue.

so this need now to land on beta and aurora ?
Flags: needinfo?(aklotz)
Yes. It should also land on central or else we'll have the same problem come merge day.
Flags: needinfo?(aklotz)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #22)
> Yes. It should also land on central or else we'll have the same problem come
> merge day.

landed on m-i as https://hg.mozilla.org/integration/mozilla-inbound/rev/cc70dae5693b
You need to log in before you can comment on or make changes to this bug.