Closed Bug 869715 Opened 11 years ago Closed 10 years ago

Marketplace API performance needs improvement

Categories

(Marketplace Graveyard :: API, defect, P4)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 926640

People

(Reporter: basta, Unassigned)

References

Details

(Keywords: perf)

Some unscientific benchmarks gathered from `time curl <url>`:

Homepage:
real	0m1.283s
user	0m0.009s
sys	0m0.011s

Search (twitter):
real	0m1.159s
user	0m0.007s
sys	0m0.008s

Twitter's rating list, page 1:
real	0m0.514s
user	0m0.004s
sys	0m0.004s

Every hundred millisecond matters.
Blocks: 855908
Note that Firefox OS introduces additional latency, which means that search results can take over five seconds to load, changing sort can take upwards of three to four seconds to load.
Please include the URL you use, and ideally the response headers. So we can figure out servers and if things like geo-encoding occurred etc.
%% curl -s -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} n" -o /dev/null 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
Connect: 0.037 TTFB: 0.276 Total time: 0.276 n%                                                                                                                 


%% curl -s -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} n" -o /dev/null 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
Connect: 0.042 TTFB: 2.031 Total time: 2.031 n%


%% time curl -I -X GET 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
HTTP/1.1 200 OK
Server: gunicorn/0.17.4
Vary: API-Filter, X-Requested-With, Accept-Language, Cookie, X-Mobile, User-Agent
X-Backend-Server: dev1.addons.phx1.mozilla.com
Content-Type: application/json; charset=utf-8
Access-Control-Expose-Headers: API-Filter, API-Status, API-Version
Strict-Transport-Security: max-age=2592000
Date: Wed, 08 May 2013 03:10:15 GMT
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
X-Content-Security-Policy-Report-Only: policy-uri /services/csp/policy?build=a1a1
Via: Moz-pp-zlb09
Connection: keep-alive
API-Filter: carrier=&device=tablet&lang=en-US&region=es
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content-Type
API-Version: 1

curl -I -X GET   0.01s user 0.00s system 3% cpu 0.236 total


%% time curl -I -X GET 'https://marketplace-altdev.allizom.org/api/v1/apps/category/?dev=desktop&format=json&lang=en-US&pro=3040fe.32.1&region=es'
HTTP/1.1 200 OK
Server: gunicorn/0.17.4
Vary: API-Filter, X-Requested-With, Accept-Language, Cookie, X-Mobile, User-Agent
X-Backend-Server: dev2.addons.phx1.mozilla.com
Content-Type: application/json; charset=utf-8
Access-Control-Expose-Headers: API-Filter, API-Status, API-Version
Strict-Transport-Security: max-age=2592000
Date: Wed, 08 May 2013 03:10:18 GMT
Transfer-Encoding: chunked
Access-Control-Allow-Origin: *
X-Content-Security-Policy-Report-Only: policy-uri /services/csp/policy?build=a1a1
Via: Moz-pp-zlb09
Connection: keep-alive
API-Filter: carrier=&device=tablet&lang=en-US&region=es
Access-Control-Allow-Methods: GET, OPTIONS
Access-Control-Allow-Headers: X-HTTP-Method-Override, Content-Type
API-Version: 1

curl -I -X GET   0.01s user 0.00s system 0% cpu 2.064 total


Most of the time from `curl` I get 0.2-0.8s. In the browser, that's a different story.

Homepage:
http://f.cl.ly/items/001v2V0Q1t041j3N1Y1J/Screen%20Shot%202013-05-07%20at%208.15.47%20PM.png

Blank search:
http://f.cl.ly/items/2u0y2x2u1t39072R3J0a/Screen%20Shot%202013-05-07%20at%208.16.36%20PM.png

Search for "apple":
http://f.cl.ly/items/3O3B2W2t1y0d1r362r0J/Screen%20Shot%202013-05-07%20at%208.16.18%20PM.png

Notice how fast the featured apps response is. I wouldn't be surprised if TastyPie -> MySQL -> ES -> MySQL -> Tastypie / Django is slowing down search. The awesome news is that Rob Hudson is fixing all of this so we never hit MySQL for searches. (But if ES is what's slow we might be in trouble.)

If anyone (e.g., Basta) has any more observations, throw them in this bug. I'm very curious, but all I know is that on the Unagi phones the categories and search are pitifully slow.
Flags: needinfo?(mattbasta)
Note that your benchmarks are wrong because you're using -I, which only does a HEAD request.
Flags: needinfo?(mattbasta)
The endpoints that are particularly critical (i.e.: the ones that Krupa has been complaining about) are:

/api/v1/apps/search/
- Any search seems to take at least a second when I curl, even with just a ?q=. Some take two and up.

/api/v1/apps/search/featured/
- Each load of one of these pages with no particular args or headers exhibits the same as noted in comment #0.

/api/v1/apps/rating/?app=...
- Any list of ratings with ?app= with at least one rating seems to take a half second or more.
I've got a local mostly working ES-only search API working. Here's the difference locally (using cvan's fancy curl output):

The ES -> MySQL currently:
Connect: 0.002 TTFB: 0.559 Total time: 0.560

The ES only of the future:
Connect: 0.002 TTFB: 0.075 Total time: 0.075
Thanks for data. That helps.
Priority: -- → P4
FYI as part of bug 866942, I added in some statsd API logging and pages. Now we need some data (and probably to start refining our pages).

http://localhost:5000/graphite-api?site=marketplace-altdev&graph=apps
Depends on: 870557
Behold:

http://addonsadm.private.phx1.mozilla.com/dashboard/graphite-api?site=marketplace&graph=apps

750ms to 2 seconds is not good. We can get that faster.
This is likely because of elasticsearch but if you want to see something painful watch http://screencast.com/t/NtTRsFacLC and jump to the parts where Victor is waiting for the Marketplace consumer pages to load (i.e., the API endpoints to load).
Assignee: nobody → amckay
Target Milestone: --- → 2013-05-30
The request middleware takes 30ms locally. 28ms of that is in one middleware:

09:42:08 statsd:INFO Timing: base.request.LocaleMiddleware, 28, 1 :/Users/andy/sandboxes/zamboni/django_statsd/clients/log.py:13
That's pretty much all in tower.activate
(In reply to Andy McKay [:andym] from comment #14)
> We should start with search:
> 
> https://rpm.newrelic.com/accounts/315282/applications/2787793/
> transactions#id=281592532

I can re-enable the 'search-api-es' waffle on -dev and compare to that. I'm hoping we can move to that soon.
Depends on: 874588
For other people looking at optimising: here is a page that pulls in the API and then shows it as HTML. Which then gives you DJDT on the request.

https://github.com/andymckay/zamboni/commit/472eb2
Depends on: 876523
Target Milestone: 2013-05-30 → ---
Bugs are spawning off this, keeping this in the milestone to stay on my radar.
Target Milestone: --- → 2013-06-06
Depends on: 877714
Target Milestone: 2013-06-06 → 2013-06-13
Target Milestone: 2013-06-13 → 2013-06-20
I think the only slow API left from a timing point of view (well at least anything I'm worrying about) is search:

http://addonsadm.private.phx1.mozilla.com/dashboard/graphite-api?site=marketplace&graph=apps.search

Which I know robhudson is working.
The featured API is probably a bad one. And it's on the homepage too.
Target Milestone: 2013-06-20 → 2013-06-27
Yay @robhudson and his magic search fixes:

http://cl.ly/image/2d1k2k3H2o1q
Target Milestone: 2013-06-27 → 2013-07-04
Assignee: amckay → nobody
So this is pretty open-ended right now, so here's some specific goals that I think we can hit that would satisfy the purposes of this bug:

Category API
https://marketplace-dev.allizom.org/api/v1/apps/category/

Currently takes about 0.4-0.6s [1] to complete. Since it's cacheable by region and carrier, this could imaginably be sped up very significantly. On the front end, we could remove the feature profile from requests to this API and the back-end could set cache control headers to get Zeus to cache this (it should never change with user state). I think we could easily get this one down to 0.1-0.3s.


Search Feature API
https://marketplace-dev.allizom.org/api/v1/fireplace/search/featured/

Currently takes a consistent 1.3s [1] to complete. Can we store user objects in memcached? The object that tells whether the user has installed/developed/purchased an app could be memoized and stored in memcached and fetched with a get_multi. Busting the cache for that object would only need to be done in the receipt endpoint(s).

I think we can realistically get this API to complete in under a second.


[1] All tests were carried out with `time curl <url>` on a VPS in Los Angeles. Times were adjusted to represent waiting time between confirmation of HTTP request headers and first byte.
I'm going to take this to wrap some explicit caching around the featured API. I can look at the category API and caching as well.
Assignee: nobody → robhudson.mozbugs
Target Milestone: 2013-07-04 → 2013-07-11
Basta: Can we draw the screen before the featured apps API returns? I.e. draw everything else but leave the featured apps area with a spinner? Why do we need to block on it? Likewise with the categories... they aren't shown unless the user clicks.
https://github.com/mozilla/zamboni/commit/94b8704 adds caching around /search/featured.
(In reply to Rob Hudson [:robhudson] from comment #23)
> Basta: Can we draw the screen before the featured apps API returns? I.e.
> draw everything else but leave the featured apps area with a spinner?

We used to have that, but you end up with weirdness:

1. There's twice as many requests for the homepage, which means that on the Unagi, things load twice as slowly
2. If the featured apps load first, you're left with a spinner for the body
3. If there's no featured apps, the box and the spinner randomly disappear

> Likewise with the categories... they aren't shown
> unless the user clicks.

Categories don't block the homepage from rendering, but we need to fetch them on initial load for a few reasons:

1. We use that data to get the names of the categories, so if you're deeplinked to a category we know what the name of it is (we only have the slug from the URL)
2. If the user clicked the category dropdown and we showed a spinner for a few seconds, that's pretty crappy UX since we could have been loading that asynchronously (which we currently do)
3. Something something operator shelf SIM card
Target Milestone: 2013-07-11 → ---
Assignee: robhudson.mozbugs → nobody
Closing since we have been working on more specific bugs. Marking as duplicate of the main API tracking bug.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.