Closed Bug 1559379 Opened 1 year ago Closed 1 year ago

opt build of mozglue/tests/TestBaseProfiler.cpp has mismatched `operator new` and `je_free`

Categories

(Core :: mozglue, defect)

x86_64
Windows 10
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- fixed

People

(Reporter: gerald, Assigned: glandium)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Before bug 1553363, I had the old "goop" at the top of each mozglue/baseprofiler//.cpp files, so that allocations and deallocations would always be done through the same memory manager.
At that time TestBaseProfiler was working fine -- or maybe I just didn't experience the issue then.

Since bug 1553363 landed, there is no more need for the "goop", and all is well when using the Base Profiler in Firefox.
However TestBaseProfiler doesn't work anymore in "opt" builds.
E.g., see https://treeherder.mozilla.org/#/jobs?repo=try&revision=79ac8c7ad7233bc4c2621d119f1f5bffa74400c1 , notice the failures in 'cpp' tests in some Windows opt builds, but never in debug.

I can reproduce this locally: --disable-optimize or --enable-optimize="-O1" work fine, but --enable-optimize fails.

Note that Base Profiler must be manually enabled on Windows, by uncommenting # define MOZ_BASE_PROFILER near the top of mozglue/baseprofiler/public/BaseProfiler.h

Trying to debug: (Sorry if I give too much or too little info, I'm not sure what's happening; hopefully an expert can work with this...)

Thanks to WinDbg with TTD (Time Travel Debugging) I was finally able to isolate the issue to platform.cpp:AppendSharedLibraries(). The assembly looks like this:

mozglue!mozilla::baseprofiler::AppendSharedLibraries:
00007ffe6278be0b e84023ffff call mozglue!mozilla::JSONWriter::EscapedString::EscapedString (00007ffe6277e150)

Internally this calls

mozglue!mozilla::JSONWriter::EscapedString::EscapedString:
00007ffe6277e1c2 e865840300 call mozglue!operator new[] (00007ffe627b662c)

which calls ucrtbase!_malloc_base -> ntdll!RtlAllocateHeap... This looks like standard Windows mallocs.

The AppendSharedLibraries() function writes an escaped string in there, which is then copied elsewhere. Then it frees the escaped string:

[...]
00007ffe6278be41 e80a4dfeff call mozglue!je_free (00007ffe62770b50)

But now this goes to mozglue!Allocator<MozJemallocBase>::free, which (of course) fails an assertion for magic arena numbers in mozjemalloc.cpp:arena_dalloc().

Note that the allocation happens under JSONWriter::EscapedString::EscapedString, which is in mfbt/JSONWriter.h.
And the deallocation happens under mozilla::baseprofiler::AppendSharedLibraries, which is in mozglue/baseprofiler/core/platform.cpp -- though that bit of code was inlined from JSONWriter.h as well!

In -O1, I see that JSONWriter::EscapedString::EscapedString() calls operator new[] (as above), but then AppendSharedLibraries calls operator delete[], so all is well there. But why is that latter one different when super-optimized?

Any help appreciated.
Did bug 1553363 miss something?

NI:glandium as discussed 😉 TIA!

Flags: needinfo?(mh+mozilla)

So according to objdump -dr Unified_cpp_mozglue_baseprofiler0.obj, this is what the code is supposed to look like:

00000000 <??0EscapedString@JSONWriter@mozilla@@QAE@PBD@Z>:
(...)
  63:   e8 00 00 00 00          call   68 <??0EscapedString@JSONWriter@mozilla@@QAE@PBD@Z+0x68>
                        64: DISP32      _moz_xmalloc

So the linker is supposed to transform that call to a call to the moz_xmalloc function (and dumpbin agrees).

But in the final mozglue.dll, this is what it looks like:

mozglue!mozilla::JSONWriter::EscapedString::EscapedString [z:\build\build\src\obj-firefox\dist\include\mozilla\JSONWriter.h @ 156]:
(...)
  185 10017a03 e80b3d0300      call    mozglue!operator new[] (1004b713)

It was transformed to a call to operator new[], in the crt.

And the reason this happens is because there is another instance of EscapedString::EscapedString that comes from Unified_cpp_memory_replace_dmd0.obj and that does have a relocation to operator new[].

Which brings another question: should DMD's JSONWriter be allocating memory with something else than DMD's table of original allocator functions?

Flags: needinfo?(mh+mozilla) → needinfo?(n.nethercote)

Here is green try of the same push as in comment 0, with DMD disabled: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=253448779&revision=7eb80bbd190d64e0ccad762b17de02e84c940910

Which brings another question: should DMD's JSONWriter be allocating memory with something else than DMD's table of original allocator functions?

from IRC:

<glandium> njn: as it stands, when DMD uses the JSONWriter, the JSONWriter allocates memory via operator new, which goes through malloc. On OSX and Linux, that 
           circles back to DMD ; on Windows, that actually goes to the system allocator.
<glandium> is that a desirable situation?
<glandium> should it instead allocate via the mozjemalloc table DMD holds?
 * njn thihnks
<njn> glandium: I don't think it really matters one way or the other
<njn> the JSONWriter is only used when Analyze() is called
<njn> glandium: AutoBlockIntercepts is used in Analyze(), so DMD won't track it's own allocations: 
      https://searchfox.org/mozilla-central/source/memory/replace/dmd/DMD.cpp#1629
<njn> so even if those allcoations circle back to DMD, DMD will then route them to the underlying mozjemalloc table
<njn> because of this: https://searchfox.org/mozilla-central/source/memory/replace/dmd/DMD.cpp#1112-1116
<njn> the OSX and Linux behaviour you described is what I would expect; I don't understand why on Windows it goes to the system allocator
<glandium> linker fun is what's going on on Windows

I think we can make this work a little better by exposing the operator new/delete symbols from mozglue.dll (which may actually have indirect benefits on other platforms as well).

Assignee: nobody → mh+mozilla
Flags: needinfo?(n.nethercote)

Bug 1147248 added the workaround for GCC 4.9, but from an attempt with
GCC 6, it seems unnecessary anymore.

  • On Android, we were already doing it, but using fallible allocations.
  • On *nix, it probably doesn't make a difference, but can't hurt. For
    most things in Gecko, operator new/delete are inlined and thus
    replaced by direct calls to the underlying allocator functions
    (moz_xmalloc, malloc, etc.). This may have a benefit for some third
    party libraries that would otherwise go through libstdc++'s to
    eventually end up back into our allocator via the zone allocator
    on macOS and via the exported symbols on others.
  • On Windows, because of how some CRT static libraries are, a non-inlined
    operator new (thanks to some disabled STL wrapping) would end up linked
    against the system malloc, causing problems.

Overall, this can only be better. This also reduces the number of places
where we define those functions.

And on Android, this means operator new within mozglue becomes infallible,
which is more consistent with everything else.

I tried your patch with TestBaseProfiler (which was failing on Windows as per comment #0), and it's now working. Thank you! 🎉
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dda27fd9e96bf211567f9d675c043530196c6133

Blocks: 1557564

(In reply to Tom Ritter [:tjr] from comment #7)

I was wondering if this would help with Bug 1547519 but it caused build failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9eb16964e6e28d7d790a3279a4bcefb48350a4e7&selectedJob=253757789

Actually, this is great, because it shows, as a build error, what's going wrong in bug 1547519. Which is actually better than pretending everything is find and realize it's not when you actually run the build. So I'll just land as is, and will let you disable jemalloc on those builds as we agreed last week.

Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/ee4f23ea8530
Remove GCC ASAN workaround that seems to be unnecessary nowadays. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/0defd54899e2
Export C++ allocation functions from mozglue on all platforms. r=froydnj
Depends on: 1562063
Flags: needinfo?(mh+mozilla)
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/06e23355e12f
Remove GCC ASAN workaround that seems to be unnecessary nowadays. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/1254bad83887
Export C++ allocation functions from mozglue on all platforms. r=froydnj
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
You need to log in before you can comment on or make changes to this bug.