Closed Bug 752245 Opened 12 years ago Closed 12 years ago

Interactions on the Google Play website are slow

Categories

(Firefox for Android Graveyard :: General, defect)

Hardware: ARM
OS: Android
Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox15 verified, firefox16 verified, blocking-fennec1.0 -, fennec15+)

VERIFIED FIXED
Firefox 16
Tracking Status
firefox15 --- verified
firefox16 --- verified
blocking-fennec1.0 --- -
fennec 15+ ---

People

(Reporter: jfu, Assigned: bnicholson)

References

Details

(Whiteboard: [Engagement] [PMM])

Attachments

(3 files, 2 obsolete files)

Web page or screen you were on when you saw the issue: https://play.google.com/store/apps/details?id=org.mozilla.firefox

Steps to reproduce:
1. Open Aurora
2. Go to https://play.google.com/store/apps/details?id=org.mozilla.firefox_beta on Fennec
3. Try to tap on "install" button

What you expected: The Google Play Store native app pops up to direct me to download the app
This works for me, but I think I see what you're seeing.

It seems like there is a massive delay between user interaction and Gecko content actually responding to actions on Google Play (the throbber spins for minutes). Tapping sign-in on Google Play took several minutes, and clicking the Install button took several more. It also took quite some time for the Gecko console to show 'Got message: Tab:HasTouchListener' messages.


--
Nightly (05/05), Galaxy Nexus.
blocking-fennec1.0: --- → ?
Assignee: nobody → bnicholson
blocking-fennec1.0: ? → +
Kevin - Did you mention a possible issue with SPDY?
I copied the page here: http://people.mozilla.com/~bnicholson/test/androidmarket.html. I removed different scripts/resources from that page, and eventually narrowed it down to this script: https://checkout.google.com/customer/gadget/gwt/embeddedbuy2/com.google.checkout.gadgets.embeddedbuy2.client.embeddedbuy2.nocache.js.

For comparison, here's the page without the script: http://people.mozilla.com/~bnicholson/test/androidmarket_faster.html.

I don't know how to narrow it down from here, but something in that script is very CPU-intensive.
Brian, have you run a standard profiler on it yet? If it's pure JS, it may just show up as jitcode, but it's possible it will turn out to be parser or GC.

Luke/Steve/Bill--what are the best tools we've got for Brian right now?
Summary: Can't install apps from Google Play Store website → Interactions on the Google Play website are slow
Kevin, can you check re: comment #2?
(In reply to Aaron Train [:aaronmt] from comment #6)
> Kevin, can you check re: comment #2?

ping
The install message popped up quickly for me with or without SPDY, though I was signed into my Google account. That might be another factor.
going to http://play.google.com/store seems to keep the throbber loading forever, even though the page seems to finish. I see this error in the log:

E/GeckoConsole(14869): [JavaScript Error: "NS_ERROR_NOT_IMPLEMENTED: Component returned failure code: 0x80004001 (NS_ERROR_NOT_IMPLEMENTED) [nsIDOMWindow.crypto]" {file: "https://apis.google.com/_/abc-static/_/js/gapi/iframes_styles_bubble_mobile,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1/cb=gapi.loaded_0" line: 122}]
Simple STR: tap the sign in link on the top right (black Google bar).
Brian are you currently looking at this?
George, can you do a quick profile of this to find out where we're spending our time?
(In reply to Brad Lassey [:blassey] from comment #11)
> Brian are you currently looking at this?

I tried to follow the steps at https://wiki.mozilla.org/index.php?title=Using_SlowCalls, but my log.tl is empty and log.txt doesn't have any JS calls.

I used this to run Fennec:
adb shell am start -a android.activity.MAIN -n org.mozilla.fennec_brian/org.mozilla.fennec_brian.App --es env0 NSPR_LOG_MODULES=SlowCalls:5 --es env1 NSPR_LOG_FILE=/mnt/sdcard/log.txt --es env2 MOZ_FT=/mnt/sdcard/log.tl
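Side note, as a quick usage sketch assuming the file paths from the command above: once Fennec exits, the logs can be pulled off the device with

adb pull /mnt/sdcard/log.txt
adb pull /mnt/sdcard/log.tl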
Luke - Can you help us profile this issue?
Are there any profiles showing time in JS?  If the time is in JS, Alex Crichton is working right now to add JS information to SPS: bug 761261.  However, comment 13 suggests that JS isn't the problem: all the time is blocked on a futex.
When testing, I've noticed that sometimes the page loads (relatively) quickly, whereas other times it takes several minutes. Looking at Fennec's CPU usage, I noticed that the "quick" page loads corresponded to a CPU usage spike immediately when the page is loaded (attachment 630401 [details]). The slow loads, however, showed only moderate CPU usage, followed by a few minutes of low CPU usage, and then a heavy CPU load before a document stop event was finally received (attachment 630400 [details]).

Figuring this pattern may correspond to the disk cache, I tried disabling it. Without the disk cache, the page consistently takes 3-4 minutes to load. With the disk cache enabled (and making sure not to crash between tests due to bug 105843), it consistently takes 40-45 seconds.

3-4 minutes without the disk cache is a very long time, so I looked at some of our network limits in about:config. A few of our settings are low compared to desktop (e.g., network.http.max-connections is 6 on Fennec and 256 on desktop). I changed these to use the desktop values (attachment 630402 [details] [diff] [review]), and without the disk cache, the time to receive a document stop event dropped to ~24 seconds. With the disk cache enabled, though, it still takes 40-45 seconds.

In short, 1) our network.http.* prefs seem to need some tweaking, and 2) our disk cache is (at least in this case) hurting us. Note that my phone is on a 4G network, so the disk cache may be more useful for 3G phones or those with slow wifi.
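For anyone repeating these cache on/off runs: the comment doesn't say exactly how the disk cache was disabled, but the usual mechanism is the standard cache prefs in about:config (or a prefs file), e.g.

pref("browser.cache.disk.enable", false);
pref("browser.cache.memory.enable", false);

Treat those pref names as the likely mechanism rather than a record of exactly what was toggled here.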
(In reply to Brian Nicholson (:bnicholson) from comment #20)
> In short, 1) our network.http.* prefs seem to need some tweaking, and 2) our
> disk cache is (at least in this case) hurting us. Note that my phone is on a
> 4G network, so the disk cache may be more useful for 3G phones or those with
> slow wifi.

Rather, the disk cache is *sometimes* hurting us; it was clearly beneficial before I applied the patch posted here, but detrimental with the aggressive network prefs.
Adding Patrick for some insight into the network prefs. Is it safe to bump up some of the mobile values? I remember you mentioned in a different bug that bumping the values wasn't the right answer. There was something else you were looking into.
3-4 minutes, eh?

I bet you're hitting some kind of head of line blocking problem.

+pref("network.http.pipelining.maxrequests" , 4);

that's a reduction from 6 to 4. I wouldn't do that without at least an isolated test indicating this pref was the issue, and even then it probably indicates some other kind of issue that this just happens to work around.

+pref("network.http.keep-alive.timeout", 115);

sure

+pref("network.http.max-connections", 256);

this represents all connections to all hosts, and is probably your bottleneck. But 256 is probably too much parallelization and buffering for the lower bandwidths of mobile, which will have real trouble with prioritization when that happens. Bandwidth has grown here since this was set, but it's not at desktop levels. I'd start with something in the ~24 range and see if it helps the scenario you're looking at.

+pref("network.http.max-connections-per-server", 15);
+pref("network.http.max-persistent-connections-per-server", 6);
+pref("network.http.max-persistent-connections-per-proxy", 8);

that's all fine. The only one that really matters much there is the "6", which is an upgrade from 4. There are tradeoffs here, but inching that up to the desktop standard is fine.
qawanted to test these prefs changes
blocking-fennec1.0: + → betaN+
Keywords: qawanted
(In reply to Patrick McManus [:mcmanus] from comment #23)
> 
> +pref("network.http.pipelining.maxrequests" , 4);
> 
> that's a reduction from 6 to 4. I wouldn't do that without at least an
> isolated test indicating this pref was the issue, and even then it probably
> indicates some other kind of issue that this just happens to work around.

Agreed. Let's not change this without data specifically for the change.

> +pref("network.http.max-connections", 256);
> 
> this represents all connections to all hosts, and is probably your
> bottleneck. But 256 is probably too much parallelization and buffering for
> the lower bandwidths of mobile, which will have real trouble with
> prioritization when that happens. Bandwidth has grown here since this was
> set, but it's not at desktop levels. I'd start with something in the ~24
> range and see if it helps the scenario you're looking at.

Also agreed. Let's not make it too big.
Stupid question perhaps, but given that these are google properties and https, any chance we're looking at SPDY bustage?
This is what I would do if I were trying to make a low risk change here.

I'd do the minimum to get the google play case running better - starting by changing to

pref("network.http.max-connections", 20);
pref("network.http.max-connections-per-server", 15);
pref("network.http.max-persistent-connections-per-server", 6);
pref("network.http.max-persistent-connections-per-proxy", 8);

If that doesn't do the trick, ramp network.http.max-connections up, testing values up to 64. Don't make any of the "per" prefs larger than I note.

If that doesn't work, we probably need to see an NSPR HTTP log to make a well-reasoned diagnosis. See: https://developer.mozilla.org/en/HTTP_Logging

verify that this isn't (all) spdy. Response headers in spdy would carry X-Firefox-Spdy, or you could determine it from the NSPR log. The parameters that were changed in comment 20 don't really drive spdy, but perhaps there is some kind of non-spdy/spdy interaction going on (mix of sites).
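A minimal sketch of capturing that NSPR HTTP log on Fennec, reusing the env-var launch mechanism from comment 13 (the module list is the standard nsHttp one from the HTTP logging docs, not something quoted in this bug):

adb shell am start -a android.activity.MAIN -n org.mozilla.fennec_brian/org.mozilla.fennec_brian.App --es env0 NSPR_LOG_MODULES=nsHttp:5,timestamp --es env1 NSPR_LOG_FILE=/mnt/sdcard/http.log
adb pull /mnt/sdcard/http.log

Grepping that log (or checking responses for the X-Firefox-Spdy header, as noted above) should show whether the connections involved are SPDY sessions.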
Attached patch New network values (obsolete) — Splinter Review
Attachment #630402 - Attachment is obsolete: true
risk assessment for comment 27: 3 on a scale of 1 (low) to 10 (high). there will be winners and losers when you change the bandwidth vs. parallelism bottlenecks (and they will depend on your testing parameters, latency, etc.), but these ranges should be at least reasonable (if not optimal) for everyone.

double-check that against nytimes.com, etc. for at least reasonableness - they use a lot of images and hostnames, which makes them a good test case here.
Just saw comment 27; here are the updated values.
Attachment #630701 - Attachment is obsolete: true
(In reply to Johnathan Nightingale [:johnath] from comment #26)
> Stupid question perhaps, but given that this is google properties and https,
> any chance we're looking at SPDY bustage?

I had investigated in comment 8; I was logged in at the time. Seems that flipping spdy off makes the install/login dialog appear quickly.
disabling spdy (by setting network.http.spdy.enabled to false) seems to fix this. The login screen rendered 4.82s after clicking the link.
More times, Google Sign In from Google Play:

SPDY on: 17:31:56.156 (throbber start) -> 17:34:01.179 (throbber stop)
SPDY off: 17:43:02.734 (throbber start) -> 17:43:14.484 (throbber stop)

Agree with comment #32
(In reply to Brad Lassey [:blassey] from comment #32)
> disabling spdy (by setting network.http.spdy.enabled to false) seems to fix
> this. The login screen rendered 4.82s after clicking the link.

please ignore my data, I think I was getting a disk cache hit
with disk and memory caches off, I am not seeing the slowness with either youtube or google plus. So it appears that this is unique to google play.
More testing; Nightly (06/06), restarts in between. Results varied across the board. Two results yielded two-minute completion times; see below:

SPDY off, disk cache off, mem cache off:

06-06 18:41:14.336 I/GeckoToolbar(11090): zerdatime 23436386 - Throbber start
06-06 18:41:33.234 I/GeckoToolbar(11090): zerdatime 23455287 - Throbber stop

SPDY off, disk cache on, mem cache off:

06-06 18:42:48.484 I/GeckoToolbar(11090): zerdatime 23530537 - Throbber start
06-06 18:42:50.172 I/GeckoToolbar(11090): zerdatime 23532219 - Throbber stop


SPDY off, disk cache on, mem cache on:

06-06 18:44:13.734 I/GeckoToolbar(11090): zerdatime 23615786 - Throbber start
06-06 18:44:15.359 I/GeckoToolbar(11090): zerdatime 23617411 - Throbber stop

SPDY off, disk cache off, mem cache on:

06-06 18:45:22.718 I/GeckoToolbar(11090): zerdatime 23684769 - Throbber stop
06-06 18:45:26.453 I/GeckoToolbar(11090): zerdatime 23688502 - Throbber start  

SPDY on, disk cache off, mem cache off -- worst offender

06-06 18:48:11.679 I/GeckoToolbar(11090): zerdatime 23853728 - Throbber start
06-06 18:50:20.640 I/GeckoToolbar(11090): zerdatime 23982690 - Throbber start

SPDY on, disk cache on, mem cache off

06-06 18:51:35.937 I/GeckoToolbar(11090): zerdatime 24057986 - Throbber start
06-06 18:51:37.718 I/GeckoToolbar(11090): zerdatime 24059766 - Throbber stop


SPDY on, disk cache on, mem cache on

06-06 18:52:18.640 I/GeckoToolbar(11090): zerdatime 24100694 - Throbber start
06-06 18:52:20.453 I/GeckoToolbar(11090): zerdatime 24102502 - Throbber stop

SPDY on, disk cache off, mem cache on -- worst offender 

06-06 18:53:12.062 I/GeckoToolbar(11090): zerdatime 24154116 - Throbber start
06-06 18:55:21.695 I/GeckoToolbar(11090): zerdatime 24283745 - Throbber stop
if you can reproduce and know how to get at NSPR logging info, the log from comment #27 would be the first step at tackling this.
I'm also seeing an improvement with SPDY disabled with our current network prefs, but with the modified prefs, SPDY makes things faster. Here's what I'm seeing, and these numbers are consistent for me for multiple test runs:

SPDY, no patch:                 >150s
No SPDY, no patch:              40-50s
SPDY, patch from comment 30:    20-25s
No SPDY, patch from comment 30: 40-50s

Here's what I'm doing:
1) Start Fennec
2) Wait until 3rd throbber start shows up in logcat
3) Go to https://play.google.com/store/apps/details?id=org.mozilla.firefox_beta
4) Measure time between throbber stop and page's first throbber start
5) Close Fennec, repeat

This is a Droid RAZR, 4G, no disk/memory cache.
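One way to capture those throbber timestamps, sketched here on the assumption that the GeckoToolbar log lines quoted earlier in this bug are what's being timed:

adb logcat -v time | grep Throbber

Each 'Throbber start'/'Throbber stop' pair then gives the load interval directly from the logcat timestamps.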
(In reply to Brian Nicholson (:bnicholson) from comment #38)
> I'm also seeing an improvement with SPDY disabled with our current network
> prefs, but with the modified prefs, SPDY makes things faster. Here's what
> I'm seeing, and these numbers are consistent for me for multiple test runs:
> 
> SPDY, no patch:                 >150s
> No SPDY, no patch:              40-50s
> SPDY, patch from comment 30:    20-25s
> No SPDY, patch from comment 30: 40-50s
> 

good news - and it actually makes a lot of sense which is a relief too.

SPDY connections are long-lived. The website has a lot of different hosts involved (requiring many connections, because you obviously can't multiplex different hosts across the same connection in all situations), and therefore the max-connection limit was creating the bottleneck. Eventually the spdy connections idle out and the new connections can be made.

with spdy disabled, the HTTP/1 connections are much shorter-lived (they go idle and are eligible for cleanup when quiescent, where spdy is not), so it's not as apparent. It also means that valid persistent connections were being torn down often in order to have quota room to make new ones to a different host (that's the driving reason the limit on desktop went up from 30 to 256 in the last few months).

the improved spdy data (when the connection bottleneck is removed) is nice to see too.

so it sounds like the patch from comment 30 is the way to go.

let me know if there is something to follow up on. I'd still make sure to double-check a couple of other sites to make sure the needle on them hasn't moved radically (but like I said, there will be winners and losers with any change - though it should be fairly modest in the loss department).
For reference, here are the builds I'm using - with and without the network changes:
Unpatched: http://dl.dropbox.com/u/35559547/fennec-old-network-prefs.apk
Patched: http://dl.dropbox.com/u/35559547/fennec-new-network-prefs.apk
renom'ing based on this being unique to google play. 

We may still want to take the patch from comment 30 though; I get the impression from Patrick that these are saner values than we currently have.
blocking-fennec1.0: betaN+ → ?
(In reply to Brad Lassey [:blassey] from comment #41)
> renom'ing based on this being unique to google play. 
> 
> We may still want to take the patch from comment 30 though, I get the
> impression from Patrick that these are more sane values than we currently
> have.

I think you do want that patch because of the longer-lived nature of SPDY - that's going to put pressure on the tiny pool you currently have (and you see what happens if the pool runs empty).
Comment on attachment 630703 [details] [diff] [review]
New network values, v2

After trying nytimes.com, cnn.com, and youtube.com, I found the load times are exactly the same.
Attachment #630703 - Flags: review?(mark.finkle)
Attachment #630703 - Flags: review?(mark.finkle) → review+
blocking-fennec1.0: ? → -
tracking-fennec: --- → 15+
Removing qawanted since the needed information has been given and the patch has been pushed to inbound
Keywords: qawanted
https://hg.mozilla.org/mozilla-central/rev/745120500d54

(Merged by Ed Morley)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 16
Are we sure this is only Google Play? I've had a pretty awful time searching in Fennec lately, seemingly due to networking.
Comment on attachment 630703 [details] [diff] [review]
New network values, v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: very slow (several minutes) loading time for google play
Testing completed (on m-c, etc.): 3 days on m-c
Risk to taking this patch (and alternatives if risky): medium-low risk; there may be some cases where the new setting worsens performance (see comment 29)
String or UUID changes made by this patch: none
Attachment #630703 - Flags: approval-mozilla-aurora?
Attachment #630703 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The Google Play website responds to taps and can be used without major issues. Closing as verified.

Nightly 16.0a1 2012-07-09/Aurora 15.0a2 2012-07-09
HTC Desire
Android 2.2.2
Status: RESOLVED → VERIFIED
Product: Firefox for Android → Firefox for Android Graveyard