Last Comment Bug 869715 - Marketplace API performance needs improvement
: Marketplace API performance needs improvement
Status: RESOLVED DUPLICATE of bug 926640
: perf
Product: Marketplace
Classification: Server Software
Component: API (show other bugs)
: 1.0
: x86 Mac OS X
: P4 normal (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
:
Mentors:
Depends on: 870557 874588 876523 877714
Blocks: 855908 marketplace-perf
  Show dependency treegraph
 
Reported: 2013-05-07 16:31 PDT by Matt Basta [:basta]
Modified: 2014-10-02 08:12 PDT (History)
7 users (show)
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments

Description Matt Basta [:basta] 2013-05-07 16:31:29 PDT
Some unscientific benchmarks gathered from `time curl <url>`:

Homepage:
real	0m1.283s
user	0m0.009s
sys	0m0.011s

Search (twitter):
real	0m1.159s
user	0m0.007s
sys	0m0.008s

Twitter's rating list, page 1:
real	0m0.514s
user	0m0.004s
sys	0m0.004s

Every hundred millisecond matters.
Comment 1 Matt Basta [:basta] 2013-05-07 17:13:35 PDT
Note that Firefox OS introduces additional latency, which means that search results can take over five seconds to load, changing sort can take upwards of three to four seconds to load.
Comment 2 Andy McKay [:andym] 2013-05-07 19:48:26 PDT
Please include the URL you use, and ideally the response headers. So we can figure out servers and if things like geo-encoding occurred etc.
Comment 3 Christopher Van Wiemeersch [:cvan] 2013-05-07 20:19:15 PDT
%% curl -s -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} n" -o /dev/null 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
Connect: 0.037 TTFB: 0.276 Total time: 0.276 n%                                                                                                                 


%% curl -s -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} n" -o /dev/null 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
Connect: 0.042 TTFB: 2.031 Total time: 2.031 n%


%% time curl -I -X GET 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
HTTP/1.1 200 OK
Server: gunicorn/0.17.4
Vary: API-Filter, X-Requested-With, Accept-Language, Cookie, X-Mobile, User-Agent
X-Backend-Server: dev1.addons.phx1.mozilla.com
Content-Type: application/json; charset=utf-8
Access-Control-Expose-Headers: API-Filter, API-Status, API-Version
Strict-Transport-Security: max-age=2592000
Date: Wed, 08 May 2013 03:10:15 GMT
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
X-Content-Security-Policy-Report-Only: policy-uri /services/csp/policy?build=a1a1
Via: Moz-pp-zlb09
Connection: keep-alive
API-Filter: carrier=&device=tablet&lang=en-US&region=es
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content-Type
API-Version: 1

curl -I -X GET   0.01s user 0.00s system 3% cpu 0.236 total


%% time curl -I -X GET 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
HTTP/1.1 200 OK
Server: gunicorn/0.17.4
Vary: API-Filter, X-Requested-With, Accept-Language, Cookie, X-Mobile, User-Agent
X-Backend-Server: dev2.addons.phx1.mozilla.com
Content-Type: application/json; charset=utf-8
Access-Control-Expose-Headers: API-Filter, API-Status, API-Version
Strict-Transport-Security: max-age=2592000
Date: Wed, 08 May 2013 03:10:18 GMT
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
X-Content-Security-Policy-Report-Only: policy-uri /services/csp/policy?build=a1a1
Via: Moz-pp-zlb09
Connection: keep-alive
API-Filter: carrier=&device=tablet&lang=en-US&region=es
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content-Type
API-Version: 1

curl -I -X GET   0.01s user 0.00s system 0% cpu 2.064 total


Most of the time from `curl` I get 0.2-0.8s. In the browser, that's a different story.

Homepage:
http://f.cl.ly/items/001v2V0Q1t041j3N1Y1J/Screen%20Shot%202013-05-07%20at%208.15.47%20PM.png

Blank search:
http://f.cl.ly/items/2u0y2x2u1t39072R3J0a/Screen%20Shot%202013-05-07%20at%208.16.36%20PM.png

Search for "apple":
http://f.cl.ly/items/3O3B2W2t1y0d1r362r0J/Screen%20Shot%202013-05-07%20at%208.16.18%20PM.png

Notice how fast the featured apps response is. I wouldn't be surprised if TastyPie -> MySQL -> ES -> MySQL -> Tastypie / Django is slowing down search. The awesome news is that Rob Hudson is fixing all of this so we never hit MySQL for searches. (But if ES is what's slow we might be in trouble.)

If anyone (e.g., Basta) has any more observations, throw them in this bug. I'm very curious, but all I know is that on the Unagi phones the categories and search are pitifully slow.
Comment 4 Matt Basta [:basta] 2013-05-07 21:11:54 PDT
Note that your benchmarks are wrong because you're using -I, which only does a HEAD request.
Comment 5 Matt Basta [:basta] 2013-05-07 21:16:04 PDT
The endpoints that are particularly critical (i.e.: the ones that Krupa has been complaining about) are:

/api/v1/apps/search/
- Any search seems to take at least a second when I curl, even with just a ?q=. Some take two and up.

/api/v1/apps/search/featured/
- Each load of one of these pages with no particular args or headers exhibits the same as noted in comment #0.

/api/v1/apps/rating/?app=...
- Any list of ratings with ?app= with at least one rating seems to take a half second or more.
Comment 6 Rob Hudson [:robhudson] 2013-05-07 22:56:17 PDT
I've got a local mostly working ES-only search API working. Here's the difference locally (using cvan's fancy curl output):

The ES -> MySQL currently:
Connect: 0.002 TTFB: 0.559 Total time: 0.560

The ES only of the future:
Connect: 0.002 TTFB: 0.075 Total time: 0.075
Comment 7 Andy McKay [:andym] 2013-05-08 09:19:40 PDT
Thanks for data. That helps.
Comment 8 Andy McKay [:andym] 2013-05-08 16:35:27 PDT
FYI as part of bug 866942, I added in some statsd API logging and pages. Now we need some data (and probably to start refining our pages).

http://localhost:5000/graphite-api?site=marketplace-altdev&graph=apps
Comment 9 Andy McKay [:andym] 2013-05-08 16:35:53 PDT
Hmm, how about a none localhost URL :)

http://addonsadm.private.phx1.mozilla.com/dashboard/graphite-api?site=marketplace-altdev&graph=search
Comment 10 Andy McKay [:andym] 2013-05-14 06:27:56 PDT
Behold:

http://addonsadm.private.phx1.mozilla.com/dashboard/graphite-api?site=marketplace&graph=apps

750ms to 2 seconds is not good. We can get that faster.
Comment 11 Christopher Van Wiemeersch [:cvan] 2013-05-15 07:55:14 PDT
This is likely because of elasticsearch but if you want to see something painful watch http://screencast.com/t/NtTRsFacLC and jump to the parts where Victor is waiting for the Marketplace consumer pages to load (i.e., the API endpoints to load).
Comment 12 Andy McKay [:andym] 2013-05-24 09:43:11 PDT
The request middleware takes 30ms locally. 28ms of that is in one middleware:

09:42:08 statsd:INFO Timing: base.request.LocaleMiddleware, 28, 1 :/Users/andy/sandboxes/zamboni/django_statsd/clients/log.py:13
Comment 13 Andy McKay [:andym] 2013-05-24 09:54:44 PDT
That's pretty much all in tower.activate
Comment 14 Andy McKay [:andym] 2013-05-24 10:32:12 PDT
We should start with search:

https://rpm.newrelic.com/accounts/315282/applications/2787793/transactions#id=281592532
Comment 15 Rob Hudson [:robhudson] 2013-05-24 10:41:58 PDT
(In reply to Andy McKay [:andym] from comment #14)
> We should start with search:
> 
> https://rpm.newrelic.com/accounts/315282/applications/2787793/
> transactions#id=281592532

I can re-enable the 'search-api-es' waffle on -dev and compare to that. I'm hoping we can move to that soon.
Comment 16 Andy McKay [:andym] 2013-05-27 13:48:32 PDT
For other people looking at optimising: here is a page that pulls in the API and then shows it as HTML. Which then gives you DJDT on the request.

https://github.com/andymckay/zamboni/commit/472eb2
Comment 17 Andy McKay [:andym] 2013-05-30 09:21:45 PDT
Bugs are spawning off this, keeping this in the milestone to stay on my radar.
Comment 18 Andy McKay [:andym] 2013-06-19 13:53:20 PDT
I think the only slow API left from a timing point of view (well at least anything I'm worrying about) is search:

http://addonsadm.private.phx1.mozilla.com/dashboard/graphite-api?site=marketplace&graph=apps.search

Which I know robhudson is working.
Comment 19 Rob Hudson [:robhudson] 2013-06-19 16:42:06 PDT
The featured API is probably a bad one. And it's on the homepage too.
Comment 20 Andy McKay [:andym] 2013-06-20 09:46:07 PDT
Yay @robhudson and his magic search fixes:

http://cl.ly/image/2d1k2k3H2o1q
Comment 21 Matt Basta [:basta] 2013-07-04 10:24:55 PDT
So this is pretty open-ended right now, so here's some specific goals that I think we can hit that would satisfy the purposes of this bug:

Category API
https://marketplace-dev.allizom.org/api/v1/apps/category/

Currently takes about 0.4-0.6s [1] to complete. Since it's cacheable by region and carrier, this could imaginably be sped up very significantly. On the front end, we could remove the feature profile from requests to this API and the back-end could set cache control headers to get Zeus to cache this (it should never change with user state). I think we could easily get this one down to 0.1-0.3s.


Search Feature API
https://marketplace-dev.allizom.org/api/v1/fireplace/search/featured/

Currently takes a consistent 1.3s [1] to complete. Can we store user objects in memcached? The object that tells whether the user has installed/developed/purchased an app could be memoized and stored in memcached and fetched with a get_multi. Busting the cache for that object would only need to be done in the receipt endpoint(s).

I think we can realistically get this API to complete in under a second.


[1] All tests were carried out with `time curl <url>` on a VPS in Los Angeles. Times were adjusted to represent waiting time between confirmation of HTTP request headers and first byte.
Comment 22 Rob Hudson [:robhudson] 2013-07-08 08:52:20 PDT
I'm going to take this to wrap some explicit caching around the featured API. I can look at the category API and caching as well.
Comment 23 Rob Hudson [:robhudson] 2013-07-08 08:54:48 PDT
Basta: Can we draw the screen before the featured apps API returns? I.e. draw everything else but leave the featured apps area with a spinner? Why do we need to block on it? Likewise with the categories... they aren't shown unless the user clicks.
Comment 24 Rob Hudson [:robhudson] 2013-07-08 09:55:57 PDT
https://github.com/mozilla/zamboni/commit/94b8704 adds caching around /search/featured.
Comment 25 Matt Basta [:basta] 2013-07-08 10:04:22 PDT
(In reply to Rob Hudson [:robhudson] from comment #23)
> Basta: Can we draw the screen before the featured apps API returns? I.e.
> draw everything else but leave the featured apps area with a spinner?

We used to have that, but you end up with weirdness:

1. There's twice as many requests for the homepage, which means that on the Unagi, things load twice as slowly
2. If the featured apps load first, you're left with a spinner for the body
3. If there's no featured apps, the box and the spinner randomly disappear

> Likewise with the categories... they aren't shown
> unless the user clicks.

Categories don't block the homepage from rendering, but we need to fetch them on initial load for a few reasons:

1. We use that data to get the names of the categories, so if you're deeplinked to a category we know what the name of it is (we only have the slug from the URL)
2. If the user clicked the category dropdown and we showed a spinner for a few seconds, that's pretty crappy UX since we could have been loading that asynchronously (which we currently do)
3. Something something operator shelf SIM card
Comment 26 Mathieu Pillard [:mat] 2014-03-25 09:57:15 PDT
Closing since we have been working on more specific bugs. Marking as duplicate of the main API tracking bug.

*** This bug has been marked as a duplicate of bug 926640 ***

Note You need to log in before you can comment on or make changes to this bug.