Closed Bug 1260208 Opened 8 years ago Closed 8 years ago

switch Fennec builds to use libc++ by default

Categories

(Firefox Build System :: Android Studio and Gradle Integration, defect)

Platform: All Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox49 fixed)

RESOLVED FIXED
mozilla49

People

(Reporter: froydnj, Assigned: froydnj)

References

Details

Attachments

(7 files)

Newer versions of the NDK have a (presumably) fixed libc++.  We can ditch stlport and move that much closer to having a modern C++ library everywhere.
The libc++ included with the Android NDK does not seem to work correctly
with std::cerr; writing to it (or to std::cout, as confirmed by tests)
causes the process to hang indefinitely, causing test failures.  Using
fprintf and stderr, however, seems to work correctly.
Attachment #8736017 - Flags: review?(bgirard)
Because I failed to check it for the first NDK I built.
Attachment #8736018 - Flags: review?(snorp)
This needs some about:license changes not included here; I'll try uploading
those tomorrow, since there are some conflicts with the libc++/OS X work I'm
doing elsewhere.
Attachment #8736019 - Flags: review?(nalexander)
Attachment #8736017 - Flags: review?(bgirard) → review+
Attachment #8736018 - Flags: review?(snorp) → review+
(In reply to Nathan Froyd [:froydnj] from comment #1)
> Created attachment 8736017 [details] [diff] [review]
> part 1 - use C I/O facilities in TestWebGLElementArrayCache instead of C++
> ones
> 
> The libc++ included with the Android NDK does not seem to work correctly
> with std::cerr; writing to it (or to std::cout, as confirmed by tests)
> causes the process to hang indefinitely, causing test failures.  Using
> fprintf and stderr, however, seems to work correctly.

froydnj: how can we possibly use this?  There's 180 writers in the tree: https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-central&q=%22std%3A%3Acerr%22&redirect=true.  Are you hoping every path through std::cerr is exercised in tests, preventing anybody from burning us?

As for flipping the default, I'd like to see bootstrap updated when we do that: https://dxr.mozilla.org/mozilla-central/source/python/mozboot/mozboot/android.py#261.  But I won't block on that, if you really don't want to test it -- it can be a follow-up ticket.
Flags: needinfo?(nfroyd)
Comment on attachment 8736019 [details] [diff] [review]
part 3 - switch to libc++ when building Fennec

Review of attachment 8736019 [details] [diff] [review]:
-----------------------------------------------------------------

Redirecting to snorp.  I appreciate being looped in but I just don't know enough about the C++ and linking story on Android to review.
Attachment #8736019 - Flags: review?(nalexander) → review?(snorp)
(In reply to Nick Alexander :nalexander from comment #4)
> (In reply to Nathan Froyd [:froydnj] from comment #1)
> > Created attachment 8736017 [details] [diff] [review]
> > part 1 - use C I/O facilities in TestWebGLElementArrayCache instead of C++
> > ones
> > 
> > The libc++ included with the Android NDK does not seem to work correctly
> > with std::cerr; writing to it (or to std::cout, as confirmed by tests)
> > causes the process to hang indefinitely, causing test failures.  Using
> > fprintf and stderr, however, seems to work correctly.
> 
> froydnj: how can we possibly use this?  There's 180 writers in the tree:
> https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-
> central&q=%22std%3A%3Acerr%22&redirect=true.  Are you hoping every path
> through std::cerr is exercised in tests, preventing anybody from burning us?

Those instances fall into the following categories:

* gmp-plugin-openh264: If people want to log on Android, I'd be a little surprised.
* gfx/angle/: I think this is Windows-only code, so it doesn't matter.
* media/mtransport/test/: We actually only run a small number of these tests on Android:

jsep_track_unittest
sdp_unittests
mediaconduit_unittests (gets built, but doesn't actually run any tests, see test logs on TH or search for MOZ_WEBRTC_TESTS)
signaling_unittests (same)

jsep_track_unittest, at least, appears to output to std::cerr just fine, according to the logs.  So I'm not entirely sure what's going on there.

* media/webrtc/trunk/: I don't think we care about any of these.
* security/nss/: We don't run NSS tests.
* testing/gtest/: Android doesn't run gtests, AFAICT (!).
* toolkit/crashreporter/google-breakpad/: I think these are for tools that don't run on Android?

> As for flipping the default, I'd like to see bootstrap updated when we do
> that:
> https://dxr.mozilla.org/mozilla-central/source/python/mozboot/mozboot/
> android.py#261.  But I won't block on that, if you really don't want to test
> it -- it can be a follow-up ticket.

This actually looks pretty easy, as direct downloading from the URL works, despite the popup license you get when clicking on the link in the NDK download webpage.  I'll try it out.
Flags: needinfo?(nfroyd)
Direct downloading worked out OK, and a test run on my Debian system got it
installed to the correct location.
Attachment #8736329 - Flags: review?(nalexander)
We are now statically linking libc++ from the Android NDK into our binaries, so
we need to note its license in about:license.
Attachment #8736337 - Flags: review?(nalexander)
Attachment #8736337 - Flags: review?(gerv)
Comment on attachment 8736329 [details] [diff] [review]
part 4 - update mozboot's android NDK URL for r11b and above

Review of attachment 8736329 [details] [diff] [review]:
-----------------------------------------------------------------

wfm.

::: python/mozboot/mozboot/android.py
@@ +258,5 @@
>          print(MOBILE_ANDROID_MOZCONFIG_TEMPLATE % (sdk_path, ndk_path))
>  
>  
> +def android_ndk_url(os_name, ver='r11b'):
> +    # Produce a URL like 'https://dl.google.com/android/repository/android-ndk-r11b-linux-x86_64.zip

nit: closing quote.

@@ +266,5 @@
>          arch = 'x86_64'
>      else:
>          arch = 'x86'
>  
> +    return '%s-%s-%s-%s.zip' % (base_url, ver, os_name, arch)

Oh hey, they went back to regular packaging.  \o/
Attachment #8736329 - Flags: review?(nalexander) → review+
Comment on attachment 8736337 [details] [diff] [review]
part 5 - add libc++ license to about:license when using it on Android

Review of attachment 8736337 [details] [diff] [review]:
-----------------------------------------------------------------

I'll defer to gerv for the licensing details.

::: toolkit/content/moz.build
@@ +16,5 @@
>      DEFINES['ANDROID_PACKAGE_NAME'] = CONFIG['ANDROID_PACKAGE_NAME']
>  
>  if CONFIG['MOZ_ANDROID_CXX_STL'] == 'mozstlport':
>      DEFINES['USE_STLPORT'] = True
> +if CONFIG['MOZ_ANDROID_CXX_STL'] == 'libc++':

This should be in the cut-over patch that snorp is reviewing, right?  In any case, r=me.
Attachment #8736337 - Flags: review?(nalexander) → review+
(In reply to Nathan Froyd [:froydnj] from comment #6)
> (In reply to Nick Alexander :nalexander from comment #4)
> > (In reply to Nathan Froyd [:froydnj] from comment #1)
> > > Created attachment 8736017 [details] [diff] [review]
> > > part 1 - use C I/O facilities in TestWebGLElementArrayCache instead of C++
> > > ones
> > > 
> > > The libc++ included with the Android NDK does not seem to work correctly
> > > with std::cerr; writing to it (or to std::cout, as confirmed by tests)
> > > causes the process to hang indefinitely, causing test failures.  Using
> > > fprintf and stderr, however, seems to work correctly.
> > 
> > froydnj: how can we possibly use this?  There's 180 writers in the tree:
> > https://dxr.mozilla.org/mozilla-central/search?tree=mozilla-
> > central&q=%22std%3A%3Acerr%22&redirect=true.  Are you hoping every path
> > through std::cerr is exercised in tests, preventing anybody from burning us?
> 
> Those instances fall into the following categories:
> 
> * gmp-plugin-openh264: If people want to log on Android, I'd be a little
> surprised.
> * gfx/angle/: I think this is Windows-only code, so it doesn't matter.
> * media/mtransport/test/: We actually only run a small number of these tests
> on Android:
> 
> jsep_track_unittest
> sdp_unittests
> mediaconduit_unittests (gets built, but doesn't actually run any tests, see
> test logs on TH or search for MOZ_WEBRTC_TESTS)
> signaling_unittests (same)
> 
> jsep_track_unittest, at least, appears to output to std::cerr just fine,
> according to the logs.  So I'm not entirely sure what's going on there.
> 
> * media/webrtc/trunk/: I don't think we care about any of these.
> * security/nss/: We don't run NSS tests.
> * testing/gtest/: Android doesn't run gtests, AFAICT (!).
> * toolkit/crashreporter/google-breakpad/: I think these are for tools that
> don't run on Android?

It just seems incredibly fragile.  That is, you see bad behaviour that you can't explain; you work around the one case in our tests; and then you argue that it's not actually an issue, with no guarantee that it won't become one in the future.

However, making such calls is life.  If you and snorp are comfortable with this, roll on!
(In reply to Nathan Froyd [:froydnj] from comment #1)
> Created attachment 8736017 [details] [diff] [review]
> part 1 - use C I/O facilities in TestWebGLElementArrayCache instead of C++
> ones
> 
> The libc++ included with the Android NDK does not seem to work correctly
> with std::cerr; writing to it (or to std::cout, as confirmed by tests)
> causes the process to hang indefinitely, causing test failures.  Using
> fprintf and stderr, however, seems to work correctly.

Weird. Do we know what it's trying to do? I'm not very comfortable with this, as there is no chance anyone would expect this type of failure. I don't want to randomly start getting hangs because someone logs with std::cerr. I didn't see this patch when I r+ed the other one :/
I'm building with libc++ now to debug the cerr/cout hang
Attachment #8736337 - Flags: review?(gerv) → review+
Comment on attachment 8736019 [details] [diff] [review]
part 3 - switch to libc++ when building Fennec

Review of attachment 8736019 [details] [diff] [review]:
-----------------------------------------------------------------

I looked into the std::cout/cerr hangs and can't make heads or tails of it. I guess I'm ok with working around it as you've done here for now.
Attachment #8736019 - Flags: review?(snorp) → review+
Recently added code fails when compiled with libc++.  This is a temporary
workaround, the same as the one we used in gfx/skia/skia/include/private/SkTLogic.h.
Attachment #8740724 - Flags: review?(kinetik)
I compiled things with try and everything looked good except rc1:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ca0b3d62c83
https://treeherder.mozilla.org/logviewer.html#?job_id=19359370&repo=try

snorp, jchen, do either of you have special insight here?  The failure doesn't look libc++ related....
Flags: needinfo?(snorp)
Flags: needinfo?(nchen)
Attachment #8740724 - Flags: review?(kinetik) → review+
(In reply to Nathan Froyd [:froydnj] from comment #16)
> I compiled things with try and everything looked good except rc1:
> 
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ca0b3d62c83
> https://treeherder.mozilla.org/logviewer.html#?job_id=19359370&repo=try
> 
> snorp, jchen, do either of you have special insight here?  The failure
> doesn't look libc++ related....

Looks like some kind of memory corruption? If you can reproduce it locally, I think it'll be a lot easier to diagnose.
Flags: needinfo?(nchen)
Nathan and I both looked into this robocop crash. I couldn't seem to repro locally for some reason. Between this and the weird std::cerr issue (which looked like a bogus vptr to me), we think there might be a linker bug being tickled by libc++. Mike, do you have any ideas about things we could try to debug this further?
Flags: needinfo?(snorp) → needinfo?(mh+mozilla)
From the crash:
 08:40:56     INFO -   0  libmozglue.so!jemalloc_crash [jemalloc.c:3ca0b3d62c83 : 1625 + 0x2]
 08:40:56     INFO -       r0 = 0x00000000    r1 = 0x00000000    r2 = 0xe5e5e5e5    r3 = 0x00000000
 08:40:56     INFO -       r4 = 0x54f001f0    r5 = 0x62688000    r6 = 0x00000000    r7 = 0x00000030
 08:40:56     INFO -       r8 = 0x54f00044    r9 = 0x54f00040   r10 = 0x00000000   r12 = 0x528d6e84
 08:40:56     INFO -       fp = 0x000052b5    sp = 0x52b4aec0    lr = 0x528636cd    pc = 0x5285eb56
 08:40:56     INFO -      Found by: given as instruction pointer in context
 08:40:56     INFO -   1  libmozglue.so!arena_malloc [jemalloc.c:3ca0b3d62c83 : 4300 + 0x3]
 08:40:56     INFO -       r4 = 0x54f001f0    r5 = 0x62688000    r6 = 0x00000000    r7 = 0x00000030
 08:40:56     INFO -       r8 = 0x54f00044    r9 = 0x54f00040   r10 = 0x00000000    fp = 0x000052b5
 08:40:56     INFO -       sp = 0x52b4aec0    pc = 0x528636cd
 08:40:56     INFO -      Found by: call frame info
 08:40:56     INFO -   2  libmozglue.so!je_malloc [jemalloc.c:3ca0b3d62c83 : 4317 + 0x7]
 08:40:56     INFO -       r4 = 0x00000029    r5 = 0x00000028    r6 = 0x544770f0    r7 = 0x544770f0
 08:40:56     INFO -       r8 = 0x00000029    r9 = 0x642ec800   r10 = 0x00000000    fp = 0x000052b5
 08:40:56     INFO -       sp = 0x52b4aee0    pc = 0x528639c7
 08:40:56     INFO -      Found by: call frame info

jemalloc.c:4317 is:
    return (arena_malloc(choose_arena(), size, false));

jemalloc.c:4300 is:
    RELEASE_ASSERT(arena->magic == ARENA_MAGIC);

which suggests choose_arena() returns something broken.

And choose_arena, on Android, looks like (when removing irrelevant #ifdefed code):
    if (isthreaded && narenas > 1) {
         ...
    } else
         ret = arenas[0];

and narenas is always 1.

So something is overwriting jemalloc's arena struct, which doesn't look really good.

You could try something like adding a page-sized dummy field (minus 4 bytes) following arena_s's magic field, and mprotect the first page of the arena, so that whatever is doing the overwrite crashes. Which would then tell you what is doing the overwriting... assuming that still happens.
Flags: needinfo?(mh+mozilla)
I have caught this on my Android device, and GDB tells me arena->magic is ARENA_MAGIC, even though jemalloc insists that it isn't.  Which is always good.  Maybe some sort of race?
(In reply to Nathan Froyd [:froydnj] from comment #20)
> I have caught this on my Android device, and GDB tells me arena->magic is
> ARENA_MAGIC, even though jemalloc insists that it isn't.  Which is always
> good.  Maybe some sort of race?

Are you saying you caught the crash but gdb was telling you jemalloc shouldn't have crashed in the first place?
(In reply to Mike Hommey [:glandium] from comment #21)
> (In reply to Nathan Froyd [:froydnj] from comment #20)
> > I have caught this on my Android device, and GDB tells me arena->magic is
> > ARENA_MAGIC, even though jemalloc insists that it isn't.  Which is always
> > good.  Maybe some sort of race?
> 
> Are you saying you caught the crash but gdb was telling you jemalloc
> shouldn't have crashed in the first place?

That's correct.
(In reply to Mike Hommey [:glandium] from comment #19)
> You could try something like adding a page-sized dummy field (minus 4 bytes)
> following arena_s's magic field, and mprotect the first page of the arena,
> so that whatever is doing the overwrite crashes. Which would then tell you
> what is doing the overwriting... assuming that still happens.

I did this, and the crash moved to a different assert.  No segfault from the page protection.  Examining things in GDB again suggests the crash should not have happened.
Let's see if this works or not, https://treeherder.mozilla.org/#/jobs?repo=try&revision=e887f439393b

operator< for EHTable compares the LHS start PC with the RHS *end* PC.
Because the ranges are non-overlapping, this works fine for two distinct
EHTables. However, the comparison doesn't work if LHS and RHS refer to
the same EHTable; in that case operator< returns true, even though it
should return false because the two operands are identical.

The operator is used to sort a std::vector using std::sort [1].  I think
the libc++ std::sort implementation has a quirk where, if the comparison
function has the above bug, sort will sometimes get confused, and start
sorting "values" outside of the memory range that it's given.  This
results in memory corruption and subsequent unpredictable behavior.

The fix is simply to compare only the start PCs in EHTable, so that
std::sort can work on it correctly.

[1] http://mxr.mozilla.org/mozilla-central/source/tools/profiler/core/EHABIStackWalk.cpp?rev=86730d0a8209#485
(In reply to Jim Chen [:jchen] [:darchons] from comment #24)
> operator< for EHTable compares the LHS start PC with the RHS *end* PC.
> Because the ranges are non-overlapping, this works fine for two distinct
> EHTables. However, the comparison doesn't work if LHS and RHS refer to
> the same EHTable; in that case operator< returns true, even though it
> should return false because the two operands are identical.
> 
> The operator is used to sort a std::vector using std::sort [1].  I think
> the libc++ std::sort implementation has a quirk where, if the comparison
> function has the above bug, sort will sometimes get confused, and start
> sorting "values" outside of the memory range that it's given.  This
> results in memory corruption and subsequent unpredictable behavior.
> 
> The fix is simply to compare only the start PCs in EHTable, so that
> std::sort can work on it correctly.

I wonder whether we should just check if &lhs == &rhs and return false in that case before moving on to the start < end check.
Comment on attachment 8747124 [details] [diff] [review]
Correctly compare EHTable when sorting (v1)

(In reply to Nathan Froyd [:froydnj] from comment #25)
> I wonder whether we should just check if &lhs == &rhs and return false in
> that case before moving on to the start < end check.

Checking start < start seems simpler. The rc1 try run did pass testANRReporter (it failed later in testDistribution, but that looks unrelated to this specific problem).
Attachment #8747124 - Flags: review?(nfroyd)
Comment on attachment 8747124 [details] [diff] [review]
Correctly compare EHTable when sorting (v1)

Review of attachment 8747124 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent.
Attachment #8747124 - Flags: review?(nfroyd) → review+
Blocks: 1269251
Perfherder graphs registered a notable change in libxul size from these commits:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=3dd2dae0c4020efec0263c28e2aacbc0a669cd0e&newProject=mozilla-inbound&newRevision=4e848a9f85c4751449f4f072fbb3b2801a8b6d3d&originalSignature=c5af03f2e8b46331d479333c6b66802580f1be8a&newSignature=c5af03f2e8b46331d479333c6b66802580f1be8a&framework=2

an increase of about 120K in libxul.so size.  This was expected; what was not expected was a much larger regression in the APK size:

https://treeherder.mozilla.org/perf.html#/alerts?id=1090

I investigated, and this turns out to be the fault of our non-release builds adding --enable-profiling as a configure option.  This configure option turns on -funwind-tables for ARM:

http://dxr.mozilla.org/mozilla-central/source/build/autoconf/frameptr.m4#12

which in turn causes a bunch of extra symbols to be referenced, presumably from the unwind information added by the option.  These extra symbols include the C++ demangler, for pretty error messages from exception handling (which we don't use, but such is life).  The important thing is that these extra symbols are referenced by a bunch of different shared objects that we create: libmozglue.so and the plugin-container{,-pie}.so files, each of which grows by 200-300K due to these extra symbols.

We are not passing -Wl,--gc-sections to remove unused symbols for the plugin-container binaries, but given that we *are* passing -Wl,--gc-sections for libmozglue and still getting the extra symbols, I think it's safe to say that trying to remove them from plugin-container wouldn't have any effect.

The good news is that compiling without --enable-profiling makes these numbers go down significantly; plugin-container no longer includes unnecessary symbols, and mozglue gets somewhat smaller as well.  I verified this locally (indeed, it took me a little while to figure out that --enable-profiling was the culprit!) and pushed a job to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b546b1f063df&selectedJob=20416061

without --enable-profiling on top of a mozilla-central revision that contains the libc++ patches.  The sizes recorded are:

Size of fennec-49.0a1.en-US.android-arm.apk: 37626921 bytes
Size of libxul.so: 24752751 bytes
Size of omni.ja: 6032727 bytes

Compare the numbers from a central revision without the libc++ patches:

Size of fennec-49.0a1.en-US.android-arm.apk: 37427851 bytes
Size of libxul.so: 24658568 bytes
Size of omni.ja: 6033161 bytes

There are a lot of other revisions in there, so the numbers aren't strictly comparable, but at least they are much closer to the same ballpark than we were before.  I think we might be able to win back the mozglue size regressions if we figure out some way of not linking in the libc++ demangler implementation.

mfinkle is the person that I have seen posting the most about APK sizes, so given the above analysis, are you OK with calling this seemingly large change benign?
Flags: needinfo?(mark.finkle)
Seems reasonable. Given the desire to move to a modern C++ lib, these increases are not out of line.

Thankfully we have other work happening to further reduce the APK size.
Flags: needinfo?(mark.finkle)
Product: Firefox for Android → Firefox Build System
Target Milestone: Firefox 49 → mozilla49