Closed Bug 1556022 Opened 5 years ago Closed 4 years ago

speedof.me reports download speeds for Geckoview_example and Fenix to be ~40% to 50% of Chrome's - Moto G5

Categories

(Core :: Graphics, task, P3)

ARM
Android
task

Tracking

()

RESOLVED WORKSFORME
Performance Impact high
Tracking Status
firefox69 --- affected

People

(Reporter: acreskey, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Keywords: perf:pageload)

I collected speedof.me download speed results for geckoview_example, Firefox Preview, and Chrome on two android devices, the Pixel 3 and the reference phone, Moto G5.
(Based on Kamyar's observations)

speedof.me results

On the Moto G5, download speeds for Geckoview_example and Fenix are ~40% and ~50% of Chrome's, respectively.

On the Pixel 3, download speeds for Geckoview_example and Fenix are ~84% and ~94% of Chrome's, respectively.

Latency also appears to be lower on Chrome.

There is a short writeup on speedof.me regarding how the tests work:
It downloads progressively larger contiguous files until they take longer than 8 seconds to download. The timing for the last one is used.
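
For anyone who wants to reason about that test shape outside the browser, here is a minimal sketch of the loop as described in the writeup. It is not speedof.me's code: `run_download_test`, `fake_link`, the file sizes, and the injected `download` callable are all hypothetical names and values chosen for illustration, and the link is simulated rather than measured.

```python
from typing import Callable, Sequence


def run_download_test(
    download: Callable[[int], float],
    sizes_bytes: Sequence[int],
    threshold_s: float = 8.0,
) -> float:
    """Return the reported speed in Mbps.

    `download(size)` is a hypothetical callable that fetches a file of
    `size` bytes and returns the elapsed time in seconds.
    """
    if not sizes_bytes:
        raise ValueError("need at least one file size")
    last_size, last_elapsed = 0, 0.0
    for size in sizes_bytes:
        last_size, last_elapsed = size, download(size)
        if last_elapsed > threshold_s:
            break  # this sample took longer than the threshold; stop escalating
    # Only the timing of the last file downloaded is used for the result.
    return (last_size * 8) / last_elapsed / 1_000_000


def fake_link(mbps: float) -> Callable[[int], float]:
    """Simulate a link with a fixed throughput (no real network I/O)."""
    return lambda size: (size * 8) / (mbps * 1_000_000)


if __name__ == "__main__":
    sizes = [2**k * 1_000_000 for k in range(8)]  # assumed sizes: 1 MB .. 128 MB
    print(run_download_test(fake_link(40.0), sizes))
```

One consequence of this design is that anything slowing the page down during the last, largest download (main-thread work, painting, decryption) is charged directly to the reported bandwidth.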

I also ran some tests where I modified network preferences.
Increasing network.http.max-connections doesn't appear to improve Gecko download speed.
This makes sense: from what I can see of the site's behavior, it downloads only a few resources, but they are quite large.

I tried increasing other prefs that I thought might impact this:
network.http.max-persistent-connections-per-server
network.http.spdy.default-hpack-buffer (set to 4k on android)
network.http.spdy.push-allowance (set to 32k on android)

These didn't appear to impact the reported speed.
The resources I saw in the dev tools did come through via http/2.

If anyone knows of other configurations that could impact this, let me know and I'm happy to try them.

This is a simpleperf capture from the Moto G5 while the test is running:
http://bit.ly/2W44Tt4

I just did 4 runs on my Moto G5 with default settings.

GVE:
Download: 39M, 51M, 46M, 47M
Upload: 38M, 41M, 39M, 41M

Fenix:
Download: 53M, 47M, 48M, 56M
Upload: 44M, 39M, 43M, 43M

Chrome:
Download: 54M, 62M, 59M, 60M
Upload: 55M, 36M, 56M, 54M

Which looks like we are still slower than Chrome, but not that bad?

Looks like the G5 is close to where the P3 was - 90%-ish of Chrome on download, though maybe 75% on upload. I think a new profile would also be in order, and a re-check on a P3 (and maybe a few more runs; the noise per run is high, so 4 runs has a wide error bar). Thanks Sean!
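
Since the per-run noise is the concern here, a quick way to put numbers on it is to compute the mean, the sample standard deviation, and the ratio to Chrome for each set of runs. This is only a sketch over the four runs posted above; the helper names are made up for illustration, and four samples is still a small set.

```python
from statistics import mean, stdev

# Figures copied from the four default-settings Moto G5 runs above.
runs = {
    "GVE":    {"down": [39, 51, 46, 47], "up": [38, 41, 39, 41]},
    "Fenix":  {"down": [53, 47, 48, 56], "up": [44, 39, 43, 43]},
    "Chrome": {"down": [54, 62, 59, 60], "up": [55, 36, 56, 54]},
}


def summarize(samples):
    """Return (mean, sample standard deviation) for one list of runs."""
    return mean(samples), stdev(samples)


def ratio_to_chrome(browser, direction):
    """Mean throughput of `browser` as a fraction of Chrome's mean."""
    return mean(runs[browser][direction]) / mean(runs["Chrome"][direction])


if __name__ == "__main__":
    for browser in ("GVE", "Fenix"):
        for direction in ("down", "up"):
            m, s = summarize(runs[browser][direction])
            r = ratio_to_chrome(browser, direction)
            print(f"{browser} {direction}: mean={m:.1f} sd={s:.1f} ratio={r:.0%}")
```

A single outlier, such as the 36 in Chrome's upload runs, moves the mean noticeably at this sample size, which is another argument for more runs.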

Sean, I wonder if the network you are using is leading to these results?

I re-ran this test and my results still match Kamyar's comment 1:

GV_E (mozilla-central.nightly.2019.06.24)
Download (Mbps): 40.0, 39.9, 39.6
Upload (Mbps): 20.0, 19.4, 19.4

Chrome 74
Download (Mbps): 92.9, 97.8, 94.85
Upload (Mbps): 21.7, 20.7, 20.59

Do you have a faster network that you can test on?

FiOS 75/75 (but usually measures around 90/90), Moto G5, 3 feet from AP:

GVE: (local build): down: 33, Up 40. Odd, down is reliably less than up.
Chrome: down 91, Up 80

Right, I'm on TekSavvy 250Mbps down, 20Mbps up (hitting max upload with both browsers)
Tests were also done about 3 feet from the access point.

I was on the Toronto office wifi, I don't know where the AP is, but I don't see any AP-like things near me within 3 feet.

I wonder if it's caused by our geographic locations: my tests always connected to 63.245.212.198 (New York 1), and the latency was somewhere between 20ms and 50ms.

A couple more runs:
Chrome 75
Download (Mbps): 46.81 (Max 55.03), 47.39 (Max 56.08), 53.78 (Max 63.73)
Upload (Mbps): 73.89 (Max 74.24), 70.09 (Max 85.11), 51.27 (Max 71.12)

GVE (mozilla-central.nightly.2019.06.24)
Download (Mbps): 48.92 (Max 51.52), 44.8 (Max 53.39), 45.63 (Max 52.95)
Upload (Mbps): 39.62 (Max 41.29), 40.83 (Max 45.28), 42.92 (Max 44.83)

My tests show we are at about 90% of Chrome for download, and about 50% of Chrome for upload.

My latency is 17ms in Chrome and 30-110ms in Firefox, and latency strongly affects TCP speed. And now I'm seeing 30-35Mbps down, 20-25Mbps up. Test server: "unknown" (?)
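
To illustrate why latency matters: a single TCP connection cannot move more than one window of data per round trip, so throughput is bounded by window size divided by RTT. The sketch below applies that bound to the latencies mentioned above. The 256 KiB window is purely an assumed value for illustration, not something measured on these devices, and real connections are also shaped by congestion control and loss.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-connection throughput: window / RTT, in Mbps."""
    if rtt_ms <= 0:
        raise ValueError("rtt_ms must be positive")
    return window_bytes * 8 / (rtt_ms / 1000) / 1_000_000


if __name__ == "__main__":
    WINDOW = 256 * 1024  # hypothetical window size in bytes (assumption)
    for rtt in (17, 30, 110):  # latencies reported above, in ms
        bound = max_tcp_throughput_mbps(WINDOW, rtt)
        print(f"RTT {rtt} ms -> at most {bound:.1f} Mbps")
```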

We're strongly gated in Content on GFX: https://perfht.ml/2FF0hUy - 59% in Paint(). The Blob constructor is using one chunk in the middle(?) for about 5%; another 7 in between gfx chunks in OnDataAvailable, mostly memcpy.

SocketThread is using a LOT of time doing AES_Decrypt() - a total of over 55% in code called from WriteSegment/WritePipeSegments, almost all of it in AES_Decrypt - and another 20% in AES_Encrypt. m_kato's patches for AES on arm32 might help a TON here.

Markus - what are our options on gfx here? Or is the page just stupid?

m_kato: what sort of speedup do you expect for the set of calls shown in the profile for SocketThread on a Moto G5 (Arm32, 8 symmetric A53 cores at 1.4GHz == Qualcomm MSM8937 Snapdragon 430 (28 nm))? https://www.gsmarena.com/motorola_moto_g5-8454.php

Flags: needinfo?(mstange)
Flags: needinfo?(m_kato)
Whiteboard: [qf:investigate] → [qf:p1:pageload][geckoview]

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #8)

We're strongly gated in Content on GFX: https://perfht.ml/2FF0hUy - 59% in Paint().

This is the drawing of the glow effect on the speedometer dial that they display during the test. They're using an SVG filter which does a blur. I wouldn't necessarily call it stupid, but it's certainly an expensive effect. Once we have WebRender on Android and complete SVG filter support in WebRender (bug 1409486), this should become better.

You could try checking if the score improves if you have an override CSS style that disables the effect. I'm not sure how to achieve that, though.

Flags: needinfo?(mstange)

The patch from bug 1152625 doesn't seem to make a huge difference; probably the wrong cipher, or gfx is the blocker.

Leaf nodes (in AES_Decrypt/Encrypt) are rijndael_encryptBlock128(), gcm_HashMult_sftw32 and things called from it. sendto() is ~5%, recvfrom is around 4%.

The site also provides an API for running the tests. The free version has a limited number of requests, but it doesn't have that graphic effect.
https://speedof.me/api/doc/sample_advanced.html

So I did a few more runs!
GVE:
Download (Mbps): 45.35, 41.2, 47.55, 45.99
Upload (Mbps): 42.33, 38.59, 39.96, 41.43
Jitter (ms): 39, 45, 27, 40
Latency (ms): 26, 38, 46, 50
Profile: https://perfht.ml/2X8KlQz

Chrome 75:
Download (Mbps): 39.47, 46.93, 42.54, 49.43
Upload (Mbps): 77.06, 74.83, 73.45, 76.22
Jitter (ms): 24, 27, 17, 38
Latency (ms): 34, 23, 24, 23

GCM code also shows up in the profile. I filed bug 1562548 for GCM on aarch32.

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #9)

Markus - what are our options on gfx here? Or is the page just stupid?

m_kato: what sort of speedup do you expect for the set of calls shown in the profile for SocketThread on a Moto G5 (Arm32, 8 symmetric A53 cores at 1.4GHz == Qualcomm MSM8937 Snapdragon 430 (28 nm))? https://www.gsmarena.com/motorola_moto_g5-8454.php

I don't have a Moto G5 now. The chip itself supports AES, but some vendors may disable it in the Linux kernel's aarch32 mode. Browsing file:///proc/cpuinfo shows the current support. (To see what aarch32 supports, you have to use a 32-bit program.)

Flags: needinfo?(m_kato)

I don't have a Moto G5 now. The chip itself supports AES, but some vendors may disable it in the Linux kernel's aarch32 mode. Browsing file:///proc/cpuinfo shows the current support. (To see what aarch32 supports, you have to use a 32-bit program.)

bogomips: 38.0 half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32

Flags: needinfo?(m_kato)

It appears that we are indeed limited by the SVG filter.
Based on comment 8 and comment 10 I made nsSVGIntegrationUtils::PaintFilter a no-op and the reported download bandwidth almost doubled.

Baseline gv_example (local release build, Moto G5)

Download (Mbps): 37.1, 37.1, 37.8
Upload (Mbps): 19.8, 19.8, 17.7

gv_example (local release build, Moto G5, but skipping nsSVGIntegrationUtils::PaintFilter):
https://searchfox.org/mozilla-central/rev/11712bd3ce7454923e5931fa92eaf9c01ef35a0a/layout/svg/nsSVGIntegrationUtils.cpp#1057

Download (Mbps): 62.9, 63.0, 62.4
Upload (Mbps): 20.4, 20.3, 20.2

My home network is 250Mbps down and 20Mbps upload so I'm near the upload limit.

On Chrome I'm seeing ~100Mbps download.
I have a preference for independently-verified results, so if someone else who can repro the issue would like to repeat the test, that would be great.

But this makes me think:

  • Is this an isolated problem (an expensive filter running during their test), or could this impact pageload of real sites?
    I'll start a tp6m job to test this hypothesis.
  • It's great that we may be getting decryption optimizations out of this. But is there anything else we can do about the graphics side prior to WebRender + SVG?

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #16)

I don't have a Moto G5 now. The chip itself supports AES, but some vendors may disable it in the Linux kernel's aarch32 mode. Browsing file:///proc/cpuinfo shows the current support. (To see what aarch32 supports, you have to use a 32-bit program.)

bogomips: 38.0 half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32

Hmm, I guess that the Moto G5's kernel doesn't return valid features via AT_HWCAP2. In bug 1562611, I will add telemetry for it, and I will change the CPU detection in arm.h to use cpu-features on Android.

Flags: needinfo?(m_kato)

This is a performance comparison of baseline (left) vs a build where nsSVGIntegrationUtils::PaintFilter() does nothing (Moto G5 - arm7, geckoview_example PGO):
(please ignore the Pixel 2 results, it's not reasonable to get enough retries on that device at the moment)
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=84cc921108a44b6481e891b7b1bdfb05198bcfe6&newProject=try&newRevision=fdc0451a2a203c35b53e2ae35d689c7b56227830&framework=10

Thoughts:
• There are still some running jobs, but this one might show a real change because the SVG filter is painted 3 times during the test:
6% loadtime, 5% fnbp improvement on tp6m-amazon-search-geckoview over 20 runs

• This looked like a ~7% improvement on tp6m-bbc-geckoview except... from my logging the filter isn't even used in this page.

• When I test these PGO builds locally the reported download rate increases from ~40Mbps to ~62Mbps with the SVG filter disabled.

(In reply to Makoto Kato [:m_kato] from comment #18)

Hmm, I guess that the Moto G5's kernel doesn't return valid features via AT_HWCAP2. In bug 1562611, I will add telemetry for it, and I will change the CPU detection in arm.h to use cpu-features on Android.

We need a workaround like https://bugs.chromium.org/p/boringssl/issues/detail?id=46. BoringSSL reads cpuinfo if AT_HWCAP2 returns 0.
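
The fallback logic described here can be sketched as follows. This is an illustration of the idea only, not BoringSSL's or NSS's actual code: the function names are hypothetical, and the sample cpuinfo text in the usage example assumes the usual `Features : ...` line layout. The bit positions are the 32-bit ARM AT_HWCAP2 values from the Linux kernel's arch/arm/include/uapi/asm/hwcap.h.

```python
import re

# 32-bit ARM AT_HWCAP2 bits (Linux arch/arm/include/uapi/asm/hwcap.h).
HWCAP2_BITS = {
    "aes": 1 << 0,
    "pmull": 1 << 1,
    "sha1": 1 << 2,
    "sha2": 1 << 3,
    "crc32": 1 << 4,
}


def features_from_cpuinfo(cpuinfo_text: str) -> set:
    """Collect the flags listed on the 'Features' line(s) of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        match = re.match(r"\s*Features\s*:\s*(.*)", line)
        if match:
            flags.update(match.group(1).split())
    return flags


def crypto_features(hwcap2: int, cpuinfo_text: str) -> set:
    """Trust AT_HWCAP2 when it is non-zero; otherwise fall back to cpuinfo."""
    if hwcap2 != 0:
        return {name for name, bit in HWCAP2_BITS.items() if hwcap2 & bit}
    return features_from_cpuinfo(cpuinfo_text) & set(HWCAP2_BITS)


if __name__ == "__main__":
    # Assumed cpuinfo layout, using the feature flags reported for the Moto G5.
    sample = (
        "Features\t: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 "
        "idiva idivt vfpd32 lpae evtstrm aes pmull sha1 sha2 crc32\n"
    )
    print(sorted(crypto_features(0, sample)))
```

In a real implementation the hwcap2 value would come from the auxiliary vector and the text from reading /proc/cpuinfo; both are passed in here so the logic can be exercised without an ARM device.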

Should we move this bug to the "Core::Networking: HTTP" Bugzilla component?

Whiteboard: [qf:p1:pageload][geckoview] → [qf:p1:pageload]
Depends on: 1564715

Chris, from what I can tell the graphics in the content process is the biggest bottleneck (although great to see improvements to encryption/decryption being made as well).
I'm not sure how the bug should be moved based on that.

Perhaps make this a bug on SVG and spin off a clone for the networking issue (assigned to m_kato). Or make it a meta (especially if we think there are more than these 2 issues), and spin off 2 clones

Flags: needinfo?(acreskey)
Blocks: 1570313

Moved this bug to Core::Graphics.
In comment 10 and comment 17 we saw that the SVG blur filter is significantly reducing the reported download bandwidth.

Not sure if anything can be done outside of WebRender.

Component: Performance → Graphics
Flags: needinfo?(acreskey)
Blocks: wr-android
Priority: -- → P3

I had meant to try this earlier:
I enabled WebRender on the Moto G5 but the performance of this test did not improve.

Here's a profile:
https://perfht.ml/2o7OXuN

A lot of time in the content process in mozilla::dom::XMLHttpRequestMainThread::AppendToResponseText and nscstring_fallible_append_utf16_to_utf8_impl.

On the socket thread, very busy in GCM_DecryptUpdate.

In Bug 1576617 we discussed how speedof.me uses random text for the large files in its download test.
Not the most realistic scenario. I've explained this and asked them to change the XHRs to "arraybuffer" mode, but they did not respond.

Nonetheless, geckoview example is reporting ~35Mbps download while Chrome on the same device is at around 85Mbps.

See Also: → 1576617

When WebRender is turned off, filter processing takes a lot of time.

Depends on: 961759
Blocks: wr-android-perf
No longer blocks: wr-android
No longer depends on: wr-svg-filter-perf

This seems to work now!

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
Performance Impact: --- → P1
Keywords: perf:pageload
Whiteboard: [qf:p1:pageload]