Status

P3
normal
8 years ago
7 months ago

People

(Reporter: blassey, Assigned: glandium)

Tracking

({mobile, perf})

Trunk
ARM
Android

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: mobilestartupshrink)

Attachments

(2 attachments, 2 obsolete attachments)

Keywords: mobile, perf
Created attachment 511476 [details] [diff] [review]
WIP

This patch has some configure goop to make PGO work for Android, plus some changes to make other toolchains usable. Unfortunately, the toolchain chokes on the -fprofile-use phase of the build.

With ndk r4 gcc4.4 and ndk r5 gcc4.4, I get these sorts of errors while compiling mozalloc.cpp:

{standard input}:902: Error: branch out of range
{standard input}:1178: Error: branch out of range
{standard input}: Error: cant' resolve '_GLOBAL_OFFSET_TABLE_'
{standard input}: Error: cant' resolve '_GLOBAL_OFFSET_TABLE_'

with the ndk r5 gcc4.4.3 I get slightly different errors:
{standard input}: Assembler messages:
{standard input}:902: Error: branch out of range
{standard input}:1178: Error: branch out of range
{standard input}:951: Error: can't resolve `.bss' {.bss section} - `.LPIC92' {.text section}
{standard input}:952: Error: can't resolve `.text.unlikely' {.text.unlikely section} - `.LPIC102' {.text section}
{standard input}:1226: Error: can't resolve `.bss' {.bss section} - `.LPIC122' {.text section}
{standard input}:1227: Error: can't resolve `.text.unlikely' {.text.unlikely section} - `.LPIC132' {.text section}

Comment 2

8 years ago
(In reply to comment #1)
> Created attachment 511476 [details] [diff] [review]
> WIP
> 
> this patch has some configure goop to make pgo work for android, plus some
> stuff to make using other toolchains work. Unfortunately, the toolchain chokes
> on the -fprofile-use phase of the build
> 
> with ndk r4 gcc4.4 and ndk r5 gcc4.4, I get these sorts of errors while
> compiling mozalloc.cpp:
> 
> {standard input}:902: Error: branch out of range
> {standard input}:1178: Error: branch out of range
> {standard input}: Error: cant' resolve '_GLOBAL_OFFSET_TABLE_'
> {standard input}: Error: cant' resolve '_GLOBAL_OFFSET_TABLE_'
> 
> with the ndk r5 gcc4.4.3 I get slightly different errors:
> {standard input}: Assembler messages:
> {standard input}:902: Error: branch out of range
> {standard input}:1178: Error: branch out of range
> {standard input}:951: Error: can't resolve `.bss' {.bss section} - `.LPIC92'
> {.text section}
> {standard input}:952: Error: can't resolve `.text.unlikely' {.text.unlikely
> section} - `.LPIC102' {.text section}
> {standard input}:1226: Error: can't resolve `.bss' {.bss section} - `.LPIC122'
> {.text section}
> {standard input}:1227: Error: can't resolve `.text.unlikely' {.text.unlikely
> section} - `.LPIC132' {.text section}

I recall seeing something similar on desktop. Try taking -fno-reorder-functions out of our build. Can you also paste the full command line that fails?
(Assignee)

Comment 3

7 years ago
For the record, I'm trying to build a NDK with gcc 4.6.1 which should allow PGO.
Whiteboard: mobilestartupshrink
(Assignee)

Updated

7 years ago
Depends on: 675572
(Assignee)

Comment 4

7 years ago
Created attachment 550402 [details] [diff] [review]
PGO support for Android

I've been using this patch with gcc 4.6 + gold, building with:
$ make -f client.mk MOZ_PROFILE_GENERATE=1 MOZ_PROFILE_BASE=/sdcard/mozilla-pgo
$ make -C objdir package
Then install the apk on the device, start Fennec, and browse to Sunspider.
Pull /sdcard/mozilla-pgo from the device and copy the tree to / (if the objdir is in /tmp/objdir, the sdcard will contain /sdcard/mozilla-pgo/tmp/objdir/...).
Finally:
$ make -f client.mk MOZ_PROFILE_USE=1

This however is not exactly in good shape to land as-is, because it also ends up moving the nss gcda files, and they wouldn't be purged by bug 659942.
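
The path mapping described in the steps above (objdir in /tmp/objdir ends up under /sdcard/mozilla-pgo/tmp/objdir/... on the device, and the tree is copied back to /) can be sketched as follows. This is only an illustration of that mapping, not part of the patch; the helper names and the file name foo.gcda are hypothetical.

```python
import posixpath


def device_profile_dir(profile_base: str, objdir: str) -> str:
    """Directory on the device where profile data for `objdir` is expected.

    Mirrors the example from the comment: the absolute objdir path is
    re-rooted under MOZ_PROFILE_BASE.
    """
    return posixpath.join(profile_base, objdir.lstrip("/"))


def host_destination(device_path: str, profile_base: str) -> str:
    """Host path a pulled file belongs at when the tree is copied to /."""
    relative = posixpath.relpath(device_path, profile_base)
    return posixpath.join("/", relative)


if __name__ == "__main__":
    base = "/sdcard/mozilla-pgo"
    on_device = device_profile_dir(base, "/tmp/objdir")
    print(on_device)
    # foo.gcda is a made-up file name used only for illustration.
    print(host_destination(posixpath.join(on_device, "foo.gcda"), base))
```

The point of the round trip is that the profile-use build looks for the profile data next to the objects in the objdir, so the files have to end up back at their original absolute paths on the host.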
(Assignee)

Comment 5

7 years ago
Created attachment 550403 [details] [diff] [review]
imported patch -nspr
(Assignee)

Comment 6

7 years ago
Created attachment 550404 [details] [diff] [review]
PGO support for Android, nspr part
(Assignee)

Updated

7 years ago
Attachment #550403 - Attachment is obsolete: true
(Assignee)

Comment 8

7 years ago
(In reply to comment #4)
> Created attachment 550402 [details] [diff] [review]
> PGO support for Android
> 
> I've been using this patch with gcc 4.6 + gold, and building with:
> $ make -f client.mk MOZ_PROFILE_GENERATE=1
> MOZ_PROFILE_BASE=/sdcard/mozilla-pgo
> $ make -C objdir package
> install the apk on the device, start fennec, browse to sunspider
> pull /sdcard/mozilla-pgo from the device, and copy the tree to / (if the
> objdir is in /tmp/objdir, the sdcard will contain
> /sdcard/mozilla-pgo/tmp/objdir/...)

Forgot to add this step here, before the MOZ_PROFILE_USE build:
$ make -f client.mk maybe_clobber_profiledbuild

> $ make -f client.mk MOZ_PROFILE_USE=1
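
Since the procedure is now split across two comments, here is the full sequence in one place, written as a small sketch that only builds the list of steps and runs nothing. The make invocations are the ones quoted in this bug; the function name and the "# manual:" markers are just conventions for this sketch.

```python
from typing import List


def pgo_build_steps(profile_base: str = "/sdcard/mozilla-pgo",
                    objdir: str = "objdir") -> List[str]:
    """Return the PGO build steps in order, including the clobber step."""
    return [
        # Phase 1: instrumented build and package.
        f"make -f client.mk MOZ_PROFILE_GENERATE=1 MOZ_PROFILE_BASE={profile_base}",
        f"make -C {objdir} package",
        # Profile gathering happens by hand on the device.
        "# manual: install the apk on the device, start fennec, browse to sunspider",
        f"# manual: pull {profile_base} from the device and copy the tree to /",
        # Phase 2: clobber, then rebuild using the gathered profile.
        "make -f client.mk maybe_clobber_profiledbuild",
        "make -f client.mk MOZ_PROFILE_USE=1",
    ]


if __name__ == "__main__":
    for step in pgo_build_steps():
        print(step)
```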
(Assignee)

Updated

7 years ago
Attachment #511476 - Attachment is obsolete: true
(Assignee)

Comment 9

7 years ago
A new build, corresponding to yesterday's nightly, and profiled running V8, Sunspider and PageLoad from the Zippity Test Harness add-on:
http://people.mozilla.org/~mhommey/pgo/fennec-8.0a1.en-US.android-arm.1dddaeb1366b.pgo.apk

Unsurprisingly, it's still slower on Sunspider than the nightly or the corresponding non-PGO GCC 4.6 build ( http://people.mozilla.org/~mhommey/pgo/fennec-8.0a1.en-US.android-arm.1dddaeb1366b.gcc4.6.apk ). It is faster on V8 than the nightly, but that is also the case with GCC 4.6 without PGO...
Priority: -- → P1
(Assignee)

Updated

7 years ago
Assignee: nobody → mh+mozilla
(Assignee)

Comment 10

7 years ago
So, I just got an apparently proper PGO profile for Android, and the result is not very good, but better than before.
Most Sunspider tests are faster (between 3 and 12% depending on the test), except for a few that are significantly slower, which makes the overall result worse.
On the not-so-bright side, the apk is 900K bigger.

Sunspider result for PGOed build (best of 3 runs):
http://www.webkit.org/perf/sunspider-0.9.1/sunspider-0.9.1/results.html?%7B%22v%22:%20%22sunspider-0.9.1%22,%20%223d-cube%22:%5B191,193,193,197,191,194,195,190,172,192%5D,%223d-morph%22:%5B79,78,78,77,78,78,77,80,77,78%5D,%223d-raytrace%22:%5B184,179,183,180,178,182,176,181,177,177%5D,%22access-binary-trees%22:%5B42,42,43,43,43,42,43,42,42,43%5D,%22access-fannkuch%22:%5B147,147,103,158,146,147,147,144,146,146%5D,%22access-nbody%22:%5B202,205,202,202,204,203,203,202,113,110%5D,%22access-nsieve%22:%5B71,73,72,71,73,72,74,72,73,72%5D,%22bitops-3bit-bits-in-byte%22:%5B6,6,6,6,6,5,6,6,6,6%5D,%22bitops-bits-in-byte%22:%5B54,54,45,53,53,53,53,54,53,55%5D,%22bitops-bitwise-and%22:%5B18,17,16,16,17,17,17,17,17,17%5D,%22bitops-nsieve-bits%22:%5B39,38,39,37,37,36,37,37,37,39%5D,%22controlflow-recursive%22:%5B25,24,24,24,24,23,23,24,23,25%5D,%22crypto-aes%22:%5B109,114,108,103,113,103,102,104,104,106%5D,%22crypto-md5%22:%5B55,51,55,51,52,52,51,52,53,53%5D,%22crypto-sha1%22:%5B33,33,33,33,33,33,33,35,33,35%5D,%22date-format-tofte%22:%5B154,154,151,150,148,151,152,150,149,150%5D,%22date-format-xparb%22:%5B140,140,139,138,141,141,140,140,138,140%5D,%22math-cordic%22:%5B79,80,80,79,79,79,80,79,79,79%5D,%22math-partial-sums%22:%5B118,116,118,116,117,116,117,117,116,118%5D,%22math-spectral-norm%22:%5B77,78,78,85,77,78,77,81,76,77%5D,%22regexp-dna%22:%5B95,91,96,94,94,96,94,94,95,96%5D,%22string-base64%22:%5B35,34,37,34,33,36,34,34,36,35%5D,%22string-fasta%22:%5B100,100,102,101,100,101,101,102,102,103%5D,%22string-tagcloud%22:%5B151,152,156,152,152,155,154,155,154,158%5D,%22string-unpack-code%22:%5B158,153,152,154,153,153,153,153,157,151%5D,%22string-validate-input%22:%5B73,73,72,72,73,80,72,72,73,72%5D%7D

Sunspider result for non-PGOed build (best of 3 runs):
http://www.webkit.org/perf/sunspider-0.9.1/sunspider-0.9.1/results.html?%7B%22v%22:%20%22sunspider-0.9.1%22,%20%223d-cube%22:%5B196,196,196,195,195,195,193,193,193,196%5D,%223d-morph%22:%5B89,78,78,80,78,79,77,77,78,80%5D,%223d-raytrace%22:%5B177,180,180,180,189,180,179,179,184,178%5D,%22access-binary-trees%22:%5B45,46,46,46,45,47,45,45,45,45%5D,%22access-fannkuch%22:%5B148,145,146,147,101,145,142,143,147,147%5D,%22access-nbody%22:%5B78,79,77,78,80,78,76,76,89,78%5D,%22access-nsieve%22:%5B72,72,73,74,73,73,70,70,73,74%5D,%22bitops-3bit-bits-in-byte%22:%5B6,5,6,6,6,6,6,7,6,6%5D,%22bitops-bits-in-byte%22:%5B56,54,43,55,54,53,54,54,45,46%5D,%22bitops-bitwise-and%22:%5B16,17,17,17,17,17,17,16,17,17%5D,%22bitops-nsieve-bits%22:%5B40,39,48,39,37,39,152,37,38,39%5D,%22controlflow-recursive%22:%5B24,24,24,23,22,26,24,29,34,26%5D,%22crypto-aes%22:%5B110,110,108,111,109,110,104,105,117,109%5D,%22crypto-md5%22:%5B55,54,55,56,53,55,51,53,53,54%5D,%22crypto-sha1%22:%5B34,35,35,35,35,35,34,35,36,34%5D,%22date-format-tofte%22:%5B173,170,173,171,174,171,165,167,169,171%5D,%22date-format-xparb%22:%5B157,154,150,148,153,146,144,148,149,148%5D,%22math-cordic%22:%5B80,80,80,80,80,80,79,79,80,79%5D,%22math-partial-sums%22:%5B115,116,113,112,114,114,112,112,113,114%5D,%22math-spectral-norm%22:%5B78,79,77,76,77,77,76,76,78,77%5D,%22regexp-dna%22:%5B94,95,95,93,94,93,93,90,94,95%5D,%22string-base64%22:%5B43,35,39,39,38,38,39,35,39,37%5D,%22string-fasta%22:%5B104,106,104,104,105,106,104,101,105,106%5D,%22string-tagcloud%22:%5B166,165,164,165,164,166,160,160,165,168%5D,%22string-unpack-code%22:%5B160,160,160,160,161,162,156,158,162,162%5D,%22string-validate-input%22:%5B77,76,74,77,78,80,77,78,77,75%5D%7D
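
The result URLs above carry the raw per-run timings as percent-encoded JSON in the query string, so they can be decoded offline. A minimal sketch, assuming only the URL shape visible in this bug; the example URL below is shortened to a single test (access-nbody from the PGOed build) to keep it readable, and the function name is made up.

```python
import json
from urllib.parse import unquote, urlsplit

# Shortened example: same shape as the URLs in this bug, one test only.
EXAMPLE_URL = (
    "http://www.webkit.org/perf/sunspider-0.9.1/sunspider-0.9.1/results.html"
    "?%7B%22v%22:%20%22sunspider-0.9.1%22,%20%22access-nbody%22:"
    "%5B202,205,202,202,204,203,203,202,113,110%5D%7D"
)


def decode_sunspider_url(url: str) -> dict:
    """Return the results object embedded in a Sunspider results URL."""
    query = urlsplit(url).query   # everything after the '?'
    return json.loads(unquote(query))  # percent-decode, then parse as JSON


if __name__ == "__main__":
    results = decode_sunspider_url(EXAMPLE_URL)
    print(results["v"])
    print(results["access-nbody"])
```

Each test name maps to a list of timings in milliseconds, one per run.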

The significant differences where PGO is slower are:
test                    comparison          non-PGO             PGO
access nbody:           *2.34x as slow*     78.9ms +/- 3.4%     184.6ms +/- 14.9%    significant
math partial-sums:      *1.030x as slow*    113.5ms +/- 0.9%    116.9ms +/- 0.5%     significant
math spectral-norm:     *1.017x as slow*    77.1ms +/- 0.9%     78.4ms +/- 2.4%      significant

Only the first is *really* significant.
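
For reference, the access nbody row can be recomputed from the per-run timings embedded in the two result URLs. The sketch below assumes the "+/-" figure is a 95% confidence interval based on Student's t with 9 degrees of freedom, expressed as a percentage of the mean; that assumption is not stated anywhere in this bug, but it does reproduce the numbers in the table.

```python
import math
import statistics

# access-nbody timings (ms) taken from the result URLs in this comment.
NON_PGO_NBODY = [78, 79, 77, 78, 80, 78, 76, 76, 89, 78]
PGO_NBODY = [202, 205, 202, 202, 204, 203, 203, 202, 113, 110]

# Two-sided 95% critical value of Student's t for 9 degrees of freedom
# (10 runs). Assumed, see the note above.
T_95_DF9 = 2.262


def summarize(samples):
    """Return (mean in ms, confidence interval as a percentage of the mean)."""
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, 100.0 * T_95_DF9 * std_err / mean


if __name__ == "__main__":
    base_mean, base_pct = summarize(NON_PGO_NBODY)
    pgo_mean, pgo_pct = summarize(PGO_NBODY)
    print(f"non-PGO: {base_mean:.1f}ms +/- {base_pct:.1f}%")
    print(f"PGO:     {pgo_mean:.1f}ms +/- {pgo_pct:.1f}%")
    print(f"{pgo_mean / base_mean:.2f}x as slow")
```

Note that the PGO samples are not uniformly slow: eight runs are around 203ms and two are around 111ms, which is why the interval on that side is so wide.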

I haven't compared to the corresponding tinderbox build.

If people want to test the builds:
http://people.mozilla.org/~mhommey/pgo/fennec-9.0a1.en-US.android-arm.19a5f6177257.gcc4.6.apk for the non-PGOed build
http://people.mozilla.org/~mhommey/pgo/fennec-9.0a1.en-US.android-arm.19a5f6177257.pgo.apk for the PGOed build
(should be up in about 5 to 10 minutes)

Corresponding changeset is 19a5f6177257 (from build-system)
(Assignee)

Comment 12

7 years ago
Note that the PGO build is actually only less than 200K bigger than the tinderbox build. The 900K are between gcc 4.6 and gcc 4.6+PGO.
(Assignee)

Comment 13

7 years ago
Best of 3 runs with the tinderbox build:
http://www.webkit.org/perf/sunspider-0.9.1/sunspider-0.9.1/results.html?%7B%22v%22:%20%22sunspider-0.9.1%22,%20%223d-cube%22:%5B199,214,184,199,204,200,188,198,185,187%5D,%223d-morph%22:%5B81,81,79,80,79,79,79,79,80,80%5D,%223d-raytrace%22:%5B189,187,190,186,191,192,194,193,192,196%5D,%22access-binary-trees%22:%5B46,46,45,47,46,45,46,46,48,45%5D,%22access-fannkuch%22:%5B145,106,144,145,103,147,146,147,146,147%5D,%22access-nbody%22:%5B77,78,80,78,83,80,79,79,80,78%5D,%22access-nsieve%22:%5B74,72,73,74,76,76,75,75,77,73%5D,%22bitops-3bit-bits-in-byte%22:%5B7,6,6,6,6,6,6,5,6,6%5D,%22bitops-bits-in-byte%22:%5B53,55,44,46,54,53,44,54,53,55%5D,%22bitops-bitwise-and%22:%5B17,17,17,17,17,16,17,17,16,17%5D,%22bitops-nsieve-bits%22:%5B39,40,38,39,38,38,40,39,38,40%5D,%22controlflow-recursive%22:%5B24,27,24,25,25,24,26,26,26,26%5D,%22crypto-aes%22:%5B112,152,112,114,121,125,116,126,114,114%5D,%22crypto-md5%22:%5B54,56,54,55,54,53,55,56,53,56%5D,%22crypto-sha1%22:%5B36,36,35,36,35,35,35,36,35,35%5D,%22date-format-tofte%22:%5B179,181,178,177,176,178,182,179,179,176%5D,%22date-format-xparb%22:%5B162,162,160,159,160,160,159,166,161,162%5D,%22math-cordic%22:%5B80,79,80,80,80,85,87,87,87,79%5D,%22math-partial-sums%22:%5B116,117,126,116,118,115,118,118,118,117%5D,%22math-spectral-norm%22:%5B77,78,78,77,78,76,79,79,85,85%5D,%22regexp-dna%22:%5B97,95,94,94,95,93,95,94,94,94%5D,%22string-base64%22:%5B36,34,37,36,36,37,36,36,40,36%5D,%22string-fasta%22:%5B115,127,118,116,117,115,118,115,121,117%5D,%22string-tagcloud%22:%5B169,169,170,171,169,172,170,171,235,171%5D,%22string-unpack-code%22:%5B166,167,168,166,168,169,166,172,171,168%5D,%22string-validate-input%22:%5B74,74,75,74,76,75,76,75,73,74%5D%7D

Between 2 and 18% improvement depending on the test, except for access:nbody, which is 2.33x as slow.
(Assignee)

Comment 14

7 years ago
(In reply to Mike Hommey [:glandium] from comment #13)
> Between 2 and 18% improvement depending on the test, except for
> access:nbody, which is 2.33x as slow.

for the PGO build, compared to the tinderbox build, that is.
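
The same comparison can be redone from the raw timings in the tinderbox and PGO result URLs. A short sketch using two of the tests; the helper name is hypothetical and only mean times are compared.

```python
from statistics import mean

# Timings (ms) copied from the tinderbox and PGO result URLs in this bug.
TINDERBOX = {
    "access-nbody": [77, 78, 80, 78, 83, 80, 79, 79, 80, 78],
    "date-format-tofte": [179, 181, 178, 177, 176, 178, 182, 179, 179, 176],
}
PGO = {
    "access-nbody": [202, 205, 202, 202, 204, 203, 203, 202, 113, 110],
    "date-format-tofte": [154, 154, 151, 150, 148, 151, 152, 150, 149, 150],
}


def improvement_percent(baseline, candidate):
    """Percent of the baseline mean time saved by the candidate.

    Positive means the candidate is faster, negative means it is slower.
    """
    base = mean(baseline)
    return 100.0 * (base - mean(candidate)) / base


if __name__ == "__main__":
    for test in TINDERBOX:
        change = improvement_percent(TINDERBOX[test], PGO[test])
        print(f"{test}: {change:+.1f}%")
    ratio = mean(PGO["access-nbody"]) / mean(TINDERBOX["access-nbody"])
    print(f"access-nbody: {ratio:.2f}x as slow with PGO")
```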
(Assignee)

Comment 15

7 years ago
As a side note, latest NSS broke ARM PGO builds because of the MPI assembly using too many registers for the profiling code to be happy.
OS: Android → MeeGo
Target Milestone: --- → mozilla9
Version: Trunk → Other Branch
(Assignee)

Updated

7 years ago
OS: MeeGo → Android
Target Milestone: mozilla9 → ---
Version: Other Branch → Trunk
(Assignee)

Updated

7 years ago
Depends on: 736066
I recommend that we postpone building NSS itself with PGO until we have the testing infrastructure issues with NSS resolved (which will happen in Q2). NSS has never been tested in a PGO configuration, on any platform.

AFAICT, almost none of NSS is exercised by the profile-gathering runs, so it might be better for performance to build it with normal link-time optimization instead of PGO. I don't know how the profiler works exactly w.r.t. DLLs, but I would hate to see all of NSS de-optimized as cold/dead code. Has anybody measured the performance difference?
P1's that have been inactive for 22 months are not P1's.
Priority: P1 → P3
We're building Fennec with gcc 4.9 now. I wonder if it might be worth looking at PGO again.
Summary: enable pgo for android → Enable PGO for Android
Nathan, what do you think? PGO could be a thing now?
Flags: needinfo?(nfroyd)
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #19)
> Nathan, what do you think? PGO could be a thing now?

Um, sure?  I'd want some measurements first, though; I'm fairly sure GCC doesn't really try tuning their PGO stuff for ARM Android...
Flags: needinfo?(nfroyd)

Updated

7 months ago
Product: Core → Firefox Build System