Closed Bug 752245 Opened 12 years ago Closed 12 years ago

Interactions on the Google Play website are slow

Categories

(Firefox for Android Graveyard :: General, defect)

Hardware: ARM
OS: Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox15 verified, firefox16 verified, blocking-fennec1.0 -, fennec15+)

VERIFIED FIXED
Firefox 16
Tracking Status
firefox15 --- verified
firefox16 --- verified
blocking-fennec1.0 --- -
fennec 15+ ---

People

(Reporter: jfu, Assigned: bnicholson)

References

Details

(Whiteboard: [Engagement] [PMM])

Attachments

(3 files, 2 obsolete files)

Web page or screen you were on when you saw the issue: https://play.google.com/store/apps/details?id=org.mozilla.firefox

Steps to reproduce:
1. Open Aurora
2. Go to https://play.google.com/store/apps/details?id=org.mozilla.firefox_beta on Fennec
3. Try to tap on "install" button

What you expected: The Google Play Store native app pops up to direct me to download the app
This works for me, but I think I see what you're seeing.

It seems like there is a massive delay between user interaction and Gecko content actually responding to actions on Google Play (the throbber spins for minutes). Tapping sign-in on Google Play took several minutes, and clicking the Install button took several more. It also took quite some time for the Gecko console to show 'Got message: Tab:HasTouchListener' messages.


--
Nightly (05/05), Galaxy Nexus.
blocking-fennec1.0: --- → ?
Assignee: nobody → bnicholson
blocking-fennec1.0: ? → +
Kevin - Did you mention a possible issue with SPDY?
I copied the page here: http://people.mozilla.com/~bnicholson/test/androidmarket.html. I removed different scripts/resources from that page, and eventually narrowed it down to this script: https://checkout.google.com/customer/gadget/gwt/embeddedbuy2/com.google.checkout.gadgets.embeddedbuy2.client.embeddedbuy2.nocache.js.

For comparison, here's the page without the script: http://people.mozilla.com/~bnicholson/test/androidmarket_faster.html.

I don't know how to narrow it down from here, but something in that script is very CPU-intensive.
Brian, have you run a standard profiler on it yet? If it's pure JS, it may just show up as jitcode, but it's possible it will turn out to be parser or GC.

Luke/Steve/Bill--what are the best tools we've got for Brian right now?
Summary: Can't install apps from Google Play Store website → Interactions on the Google Play website are slow
Kevin, can you check re: comment #2?
(In reply to Aaron Train [:aaronmt] from comment #6)
> Kevin, can you check re: comment #2?

ping
The install message popped up quickly for me with or without SPDY, though I was signed into my Google account. That might be another factor.
going to http://play.google.com/store seems to keep the throbber loading forever, even though the page seems to finish. I see this error in the log:

E/GeckoConsole(14869): [JavaScript Error: "NS_ERROR_NOT_IMPLEMENTED: Component returned failure code: 0x80004001 (NS_ERROR_NOT_IMPLEMENTED) [nsIDOMWindow.crypto]" {file: "https://apis.google.com/_/abc-static/_/js/gapi/iframes_styles_bubble_mobile,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1/cb=gapi.loaded_0" line: 122}]
Simple STR: tap the sign in link on the top right (black Google bar).
Brian are you currently looking at this?
George, can you do a quick profile of this to find out where we're spending our time?
(In reply to Brad Lassey [:blassey] from comment #11)
> Brian are you currently looking at this?

I tried to follow the steps at https://wiki.mozilla.org/index.php?title=Using_SlowCalls, but my log.tl is empty and log.txt doesn't have any JS calls.

I used this to run Fennec:
adb shell am start -a android.activity.MAIN -n org.mozilla.fennec_brian/org.mozilla.fennec_brian.App --es env0 NSPR_LOG_MODULES=SlowCalls:5 --es env1 NSPR_LOG_FILE=/mnt/sdcard/log.txt --es env2 MOZ_FT=/mnt/sdcard/log.tl
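Side note, as a quick usage sketch assuming the file paths from the command above: once Fennec exits, the logs can be pulled off the device with

adb pull /mnt/sdcard/log.txt
adb pull /mnt/sdcard/log.tl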
Luke - Can you help us profile this issue?
Are there any profiles showing time in JS?  If the time is in JS, Alex Crichton is working right now to add JS information to SPS: bug 761261.  However, comment 13 suggests that JS isn't the problem: all the time is blocked on a futex.
When testing, I've noticed that sometimes the page loads (relatively) quickly, whereas other times it takes several minutes. Looking at Fennec's CPU usage, I noticed that the "quick" page loads corresponded to a CPU usage spike immediately when the page is loaded (attachment 630401 [details]). The slow loads, however, showed only moderate CPU usage, followed by a few minutes of low CPU usage, and then a heavy CPU load before a document stop event was finally received (attachment 630400 [details]).

Figuring this pattern may correspond to the disk cache, I tried disabling it. Without the disk cache, the page consistently takes 3-4 minutes to load. With the disk cache enabled (and making sure not to crash between tests due to bug 105843), it consistently takes 40-45 seconds.

3-4 minutes without the disk cache is a very long time, so I looked at some of our network limits in about:config. A few of our settings are low compared to desktop (e.g., network.http.max-connections is 6 on Fennec and 256 on desktop). I changed these to use the desktop values (attachment 630402 [details] [diff] [review]), and without the disk cache, the time to receive a document stop event dropped to ~24 seconds. With the disk cache enabled, though, it still takes 40-45 seconds.

In short, 1) our network.http.* prefs seem to need some tweaking, and 2) our disk cache is (at least in this case) hurting us. Note that my phone is on a 4G network, so the disk cache may be more useful for 3G phones or those with slow wifi.
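For anyone repeating these cache on/off runs: the comment doesn't say exactly how the disk cache was disabled, but the usual mechanism is the standard cache prefs in about:config (or a prefs file), e.g.

pref("browser.cache.disk.enable", false);
pref("browser.cache.memory.enable", false);

Treat those pref names as the likely mechanism rather than a record of exactly what was toggled here.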
(In reply to Brian Nicholson (:bnicholson) from comment #20)
> In short, 1) our network.http.* prefs seem to need some tweaking, and 2) our
> disk cache is (at least in this case) hurting us. Note that my phone is on a
> 4G network, so the disk cache may be more useful for 3G phones or those with
> slow wifi.

Rather, the disk cache is *sometimes* hurting us; it was clearly beneficial before I applied the patch posted here, but detrimental with the aggressive network prefs.
Adding Patrick for some insight into the network prefs. Is it safe to bump up some of the mobile values? I remember you mentioned in a different bug that bumping the values wasn't the right answer. There was something else you were looking into.
3-4 minutes, eh?

I bet you're hitting some kind of head of line blocking problem.

+pref("network.http.pipelining.maxrequests" , 4);

that's a reduction from 6 to 4. I wouldn't do that without at least an isolated test indicating this pref was the issue, and even then it probably indicates some other kind of issue that this just happens to work around.

+pref("network.http.keep-alive.timeout", 115);

sure

+pref("network.http.max-connections", 256);

this represents all connections to all hosts, and is probably your bottleneck. But 256 is probably too much parallelization and buffering for the lower bandwidths of mobile, which will have real trouble with prioritization when that happens. Bandwidth has grown here since this was set, but it's not at desktop levels. I'd start with something in the ~24 range and see if it helps the scenario you're looking at.

+pref("network.http.max-connections-per-server", 15);
+pref("network.http.max-persistent-connections-per-server", 6);
+pref("network.http.max-persistent-connections-per-proxy", 8);

that's all fine. The only one that really matters much there is the "6", which is an upgrade from 4. There are tradeoffs here, but inching that up to the desktop standard is fine.
qawanted to test these prefs changes
blocking-fennec1.0: + → betaN+
Keywords: qawanted
(In reply to Patrick McManus [:mcmanus] from comment #23)
> 
> +pref("network.http.pipelining.maxrequests" , 4);
> 
> that's a reduction from 6 to 4. I wouldn't do that without at least an
> isolated test indicating this pref was the issue, and even then it probably
> indicates some other kind of issue that this just happens to work around.

Agreed. Let's not change this without data specifically for the change.

> +pref("network.http.max-connections", 256);
> 
> this represents all connections to all hosts, and is probably your
> bottleneck. But 256 is probably too much parallelization and buffering for
> the lower bandwidths of mobile, which will have real trouble with
> prioritization when that happens. Bandwidth has grown here since this was
> set, but it's not at desktop levels. I'd start with something in the ~24
> range and see if it helps the scenario you're looking at.

Also agreed. Let's not make it too big.
Stupid question perhaps, but given that these are google properties and https, any chance we're looking at SPDY bustage?
This is what I would do if I were trying to make a low risk change here.

I'd do the minimum to get the google play case running better - starting by changing to

pref("network.http.max-connections", 20);
pref("network.http.max-connections-per-server", 15);
pref("network.http.max-persistent-connections-per-server", 6);
pref("network.http.max-persistent-connections-per-proxy", 8);

If that doesn't do the trick, ramp network.http.max-connections up, testing values up to 64. Don't make any of the "per" prefs larger than I note.

If that doesn't work, we probably need to see an NSPR HTTP log to make a well-reasoned diagnosis. See: https://developer.mozilla.org/en/HTTP_Logging

verify that this isn't (all) spdy. Response headers in spdy would carry X-Firefox-Spdy, or you could determine it from the NSPR log. The parameters that were changed in comment 20 don't really drive spdy, but perhaps there is some kind of non-spdy/spdy interaction going on (mix of sites).
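A minimal sketch of capturing that NSPR HTTP log on Fennec, reusing the env-var launch mechanism from comment 13 (the module list is the standard nsHttp one from the HTTP logging docs, not something quoted in this bug):

adb shell am start -a android.activity.MAIN -n org.mozilla.fennec_brian/org.mozilla.fennec_brian.App --es env0 NSPR_LOG_MODULES=nsHttp:5,timestamp --es env1 NSPR_LOG_FILE=/mnt/sdcard/http.log
adb pull /mnt/sdcard/http.log

Grepping that log (or checking responses for the X-Firefox-Spdy header, as noted above) should show whether the connections involved are SPDY sessions.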
Attached patch New network values (obsolete) — Splinter Review
Attachment #630402 - Attachment is obsolete: true
risk assessment for comment 27: 3 on a scale of 1 (low) to 10 (high). there will be winners and losers when you change the bandwidth vs. parallelism bottlenecks (and they will depend on your testing parameters, latency, etc.), but these ranges should be at least reasonable (if not optimal) for everyone.

double-check that against nytimes.com, etc. for at least reasonableness - they use a lot of images and hostnames, which makes them a good test case here.
Just saw comment 27; here are the updated values.
Attachment #630701 - Attachment is obsolete: true
(In reply to Johnathan Nightingale [:johnath] from comment #26)
> Stupid question perhaps, but given that this is google properties and https,
> any chance we're looking at SPDY bustage?

I had investigated in comment 8; I was logged in at the time. Seems that flipping spdy off makes the install/login dialog appear quickly.
disabling spdy (by setting network.http.spdy.enabled to false) seems to fix this. The login screen rendered 4.82s after clicking the link.
More times, Google Sign In from Google Play:

SPDY on: 17:31:56.156 (throbber start) -> 17:34:01.179 (throbber stop)
SPDY off: 17:43:02.734 (throbber start) -> 17:43:14.484 (throbber stop)

Agree with comment #32
(In reply to Brad Lassey [:blassey] from comment #32)
> disabling spdy (by setting network.http.spdy.enabled to false) seems to fix
> this. The login screen rendered 4.82s after clicking the link.

please ignore my data, I think I was getting a disk cache hit
with disk and memory caches off, I am not seeing the slowness with either youtube or google plus. So it appears that this is unique to google play.
More testing; Nightly (06/06), restarts in between. Results varied across the board. Two results yielded two-minute completion times; see below:

SPDY off, disk cache off, mem cache off:

06-06 18:41:14.336 I/GeckoToolbar(11090): zerdatime 23436386 - Throbber start
06-06 18:41:33.234 I/GeckoToolbar(11090): zerdatime 23455287 - Throbber stop

SPDY off, disk cache on, mem cache off:

06-06 18:42:48.484 I/GeckoToolbar(11090): zerdatime 23530537 - Throbber start
06-06 18:42:50.172 I/GeckoToolbar(11090): zerdatime 23532219 - Throbber stop


SPDY off, disk cache on, mem cache on:

06-06 18:44:13.734 I/GeckoToolbar(11090): zerdatime 23615786 - Throbber start
06-06 18:44:15.359 I/GeckoToolbar(11090): zerdatime 23617411 - Throbber stop

SPDY off, disk cache off, mem cache on:

06-06 18:45:22.718 I/GeckoToolbar(11090): zerdatime 23684769 - Throbber stop
06-06 18:45:26.453 I/GeckoToolbar(11090): zerdatime 23688502 - Throbber start  

SPDY on, disk cache off, mem cache off -- worst offender

06-06 18:48:11.679 I/GeckoToolbar(11090): zerdatime 23853728 - Throbber start
06-06 18:50:20.640 I/GeckoToolbar(11090): zerdatime 23982690 - Throbber start

SPDY on, disk cache on, mem cache off

06-06 18:51:35.937 I/GeckoToolbar(11090): zerdatime 24057986 - Throbber start
06-06 18:51:37.718 I/GeckoToolbar(11090): zerdatime 24059766 - Throbber stop


SPDY on, disk cache on, mem cache on

06-06 18:52:18.640 I/GeckoToolbar(11090): zerdatime 24100694 - Throbber start
06-06 18:52:20.453 I/GeckoToolbar(11090): zerdatime 24102502 - Throbber stop

SPDY on, disk cache off, mem cache on -- worst offender 

06-06 18:53:12.062 I/GeckoToolbar(11090): zerdatime 24154116 - Throbber start
06-06 18:55:21.695 I/GeckoToolbar(11090): zerdatime 24283745 - Throbber stop
if you can reproduce and know how to get at NSPR logging info, the log from comment #27 would be the first step at tackling this.
I'm also seeing an improvement with SPDY disabled with our current network prefs, but with the modified prefs, SPDY makes things faster. Here's what I'm seeing, and these numbers are consistent for me for multiple test runs:

SPDY, no patch:                 >150s
No SPDY, no patch:              40-50s
SPDY, patch from comment 30:    20-25s
No SPDY, patch from comment 30: 40-50s

Here's what I'm doing:
1) Start Fennec
2) Wait until 3rd throbber start shows up in logcat
3) Go to https://play.google.com/store/apps/details?id=org.mozilla.firefox_beta
4) Measure time between throbber stop and page's first throbber start
5) Close Fennec, repeat

This is a Droid RAZR, 4G, no disk/memory cache.
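One way to capture those throbber timestamps, sketched here on the assumption that the GeckoToolbar log lines quoted earlier in this bug are what's being timed:

adb logcat -v time | grep Throbber

Each 'Throbber start'/'Throbber stop' pair then gives the load interval directly from the logcat timestamps.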
(In reply to Brian Nicholson (:bnicholson) from comment #38)
> I'm also seeing an improvement with SPDY disabled with our current network
> prefs, but with the modified prefs, SPDY makes things faster. Here's what
> I'm seeing, and these numbers are consistent for me for multiple test runs:
> 
> SPDY, no patch:                 >150s
> No SPDY, no patch:              40-50s
> SPDY, patch from comment 30:    20-25s
> No SPDY, patch from comment 30: 40-50s
> 

good news - and it actually makes a lot of sense which is a relief too.

SPDY connections are long-lived. The website has a lot of different hosts involved (requiring many connections, because you obviously can't multiplex different hosts across the same connection in all situations), and therefore the max-connection limit was creating the bottleneck. Eventually the spdy connections idle out and the new connections can be made.

with spdy disabled, the HTTP/1 connections are much shorter-lived (they go idle and are eligible for cleanup when quiescent, where spdy is not), so it's not as apparent. It also means that valid persistent connections were being torn down often in order to have quota room to make new ones to a different host (that's the driving reason the limit on desktop went up from 30 to 256 in the last few months).

the improved spdy data (when the connection bottleneck is removed) is nice to see too.

so it sounds like the patch from comment 30 is the way to go.

let me know if there is something to follow up on. I'd still make sure to double-check a couple of other sites to make sure the needle on them hasn't moved radically (but like I said, there will be winners and losers with any change - though it should be fairly modest in the loss department).
For reference, here are the builds I'm using - with and without the network changes:
Unpatched: http://dl.dropbox.com/u/35559547/fennec-old-network-prefs.apk
Patched: http://dl.dropbox.com/u/35559547/fennec-new-network-prefs.apk
renom'ing based on this being unique to google play. 

We may still want to take the patch from comment 30 though; I get the impression from Patrick that these are saner values than we currently have.
blocking-fennec1.0: betaN+ → ?
(In reply to Brad Lassey [:blassey] from comment #41)
> renom'ing based on this being unique to google play. 
> 
> We may still want to take the patch from comment 30 though, I get the
> impression from Patrick that these are more sane values than we currently
> have.

I think you do want that patch because of the longer-lived nature of SPDY - that's going to put pressure on the tiny pool you currently have (and you see what happens if the pool runs empty).
Comment on attachment 630703 [details] [diff] [review]
New network values, v2

After trying nytimes.com, cnn.com, and youtube.com, I found the load times are exactly the same.
Attachment #630703 - Flags: review?(mark.finkle)
Attachment #630703 - Flags: review?(mark.finkle) → review+
blocking-fennec1.0: ? → -
tracking-fennec: --- → 15+
Removing qawanted since the needed information has been given and the patch has been pushed to inbound
Keywords: qawanted
https://hg.mozilla.org/mozilla-central/rev/745120500d54

(Merged by Ed Morley)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 16
Are we sure this is only Google Play? I've had a pretty awful time searching in Fennec lately, seemingly due to networking.
Comment on attachment 630703 [details] [diff] [review]
New network values, v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: very slow (several minutes) loading time for google play
Testing completed (on m-c, etc.): 3 days on m-c
Risk to taking this patch (and alternatives if risky): medium-low risk; there may be some cases where the new setting worsens performance (see comment 29)
String or UUID changes made by this patch: none
Attachment #630703 - Flags: approval-mozilla-aurora?
Attachment #630703 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The Google Play website responds to taps and can be used without major issues. Closing as verified.

Nightly 16.0a1 2012-07-09/Aurora 15.0a2 2012-07-09
HTC Desire
Android 2.2.2
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard