564540 - Make our fancy new CSVs and JSON faster

Reporter

Description

•

14 years ago

We have a pile of new CSVs and from my click-and-wait testing strategy I think they're all being built on the fly with minimal caching afterwards.  Now that we have more infra options we should be building these smarter:

1) All CSVs and JSON are sent to celery to be built on the first request.  HTTP 202 is returned to the user until the output is ready

2) When celery finishes building the output it is cached to disk in a shared location with a structure of /stats/$addonid/$name.$type.  The shared space is the place remora currently calls NETAPP_STORAGE.  Example:  NETAPP_STORAGE/stats/1865/os-week-20100101-20100601.json

3) When a file is used from the disk cache it should also be touched.  We'll run a GC cron job that removes all files that haven't been touched in a month.

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Blocks: 564541

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Target Milestone: 5.11 → 5.12

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

14 years ago

Keywords: perf

Wil Clouser [:clouserw]

Reporter

Comment 2

•

14 years ago

Random notes:

We've decided to drop the "-week-" part of this, which should make our calculations and caching easier.  All exports will be daily, and people can extrapolate week/month from that.  The current stats models/views/urls should be modified for this.

The celery jobs will need locking so they don't duplicate work.

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Assignee: nobody → ianb

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Target Milestone: 5.12 → 5.11.3

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Target Milestone: 5.11.3 → 5.11.4

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Target Milestone: 5.11.4 → 5.12

Ian Bicking (:ianbicking)

Assignee

Comment 3

•

14 years ago

I have been trying to figure out where the performance problems on this code come from. The query is quite fast and well indexed, and shows only modest time increases as the amount of data goes up. Running locally with a complete data set, I can't reproduce any timing problem.

So I tried looking more closely on a production box, specifically pm-app-amodev01, adding bits of code to see where the time goes. So far I haven't figured it out. For instance, a request that for downloads can take 1.5 seconds of clock time to request (it's not entirely consistent). This hits zamboni.apps.stats.views.downloads_series. I looked into that code, expecting to find something that is slow, but the function takes 0.02 seconds to complete. I looked at TimingMiddleware, to see if something it was tracking might be slower, but it gave basically the same number. I tried disabling all the middleware (except amo.middleware.LocaleAndAppURLMiddleware which is required) and it didn't speed up. LocaleAndAppURLMiddleware doesn't appear very complicated. The only thing that didn't seem clear in that code was tower.activate(), and commenting that out made no change.

To be extra sure, I put code in zamboni.wsgi to check the time, and then it was only 0.2 seconds. So... I really have no idea where in the stack this slowness is happening. It's not in Python anywhere; at least, not in a way I can reproduce.

I don't know any more about what lies between mod_wsgi and completing an HTTP request in this configuration, so I don't know where else it would be slow.

Wil Clouser [:clouserw]

Reporter

Comment 4

•

14 years ago

https://preview.addons.mozilla.org/z/en-US/firefox/addon/1865/statistics/versions-day-20070101-20100507.csv is an example URL that times out

URL: https://preview.addons.mozilla.org/z/...

Target Milestone: 5.12 → 5.11.8

Jeff Balogh (:jbalogh)

Comment 6

•

14 years ago

ianb: Is there anything else besides http://github.com/jbalogh/zamboni/commit/a84f18d6ca?  Can we close this?

Jeff Balogh (:jbalogh)

Comment 7

•

14 years ago

Since the URL from comment 4 still times out, I suspect we have memory/object problems.  If these queries are creating db.Model objects, that's going to spend a ton of time building objects we probably don't need.  And then all those objects are going to get pickled and cached.  Caching will fail anyways if we go over memcached's 1MB limit.  Using .values or .values_list will probably help.

http://github.com/jbalogh/zamboni/blob/master/apps/stats/views.py#L268 doesn't help, but JSON times out too, so that's not the only culprit.

Ian Bicking (:ianbicking)

Assignee

Comment 8

•

14 years ago

Serving up even quite large content should be fine, and the CSV writer and JSON writer are really quite fast (CSV is faster than JSON in my tests, but the query happens in slightly different places in those two codepaths which obscure that).  So if it's timing out it's probably in other places.  Obviously the query itself could time out, which we can probably determine simply by running that query manually and seeing how long it takes.

If it's the cache system, we could avoid the ORM entirely and run SQL directly.

Wil Clouser [:clouserw]

Reporter

Comment 9

•

14 years ago

ianb is on vacation next week.  bumping

Target Milestone: 5.11.8 → 5.12

Justin Scott [:fligtar]

Updated

•

14 years ago

Priority: -- → P5

Ian Bicking (:ianbicking)

Assignee

Comment 10

•

14 years ago

Local performance testing indicates this commit brings the requests into an acceptable time limit: http://github.com/ianb/zamboni/commit/d1795ec224f6b3a481bff005d753c19227d340d2 (changes database data from phpserialize to JSON)

Performance increases require a data migration, which can be performed with:

  python manage.py convert_stats_to_json

It can't run in SQL, and will likely take a very long time, but can be applied incrementally and will pick up where it left off.

We need to check if the PHP side can produce JSON for stats, and if there is any other code consuming the phpserialized data.

Wil Clouser [:clouserw]

Reporter

Comment 11

•

14 years ago

(In reply to comment #10)
> Local performance testing indicates this commit brings the requests into an
> acceptable time limit:
> http://github.com/ianb/zamboni/commit/d1795ec224f6b3a481bff005d753c19227d340d2
> (changes database data from phpserialize to JSON)
> 
> Performance increases require a data migration, which can be performed with:
> 
>   python manage.py convert_stats_to_json
> 
> It can't run in SQL, and will likely take a very long time, but can be applied
> incrementally and will pick up where it left off.

This is looking great.  The best way to do long running offline stuff is with celeryd.  That lets us rate limit it and distribute it if needed.  You can see an example of a manage script doing that at http://github.com/jbalogh/zamboni/commit/1758532dfdad6935ee547a813178cecb7af83c4c .  The apply_async() call is what is interesting there.

> We need to check if the PHP side can produce JSON for stats, and if there is
> any other code consuming the phpserialized data.

I'll poke around in the PHP and see.

Ian Bicking (:ianbicking)

Assignee

Comment 12

•

14 years ago

One caveat which might be a problem for Celery is that when I was running this migration locally, after a few hours a memory leak stalled out the process.  MySQLdb is known to leak memory... or it could be Python, its memory management can fall down in some circumstances.  But either way, eventually the command will break the process it runs in.  I put in an option to limit the number of updates it runs, and the command exits with a non-zero status if the limit is reached, so it can be looped until done.  Does Celery recycle processes or have some maximum lifetime on processes?  (I guess I could actually just exit too, and I'm guessing Celery will restart?)

Ian Bicking (:ianbicking)

Assignee

Comment 13

•

14 years ago

Celery conversion here: http://github.com/ianb/zamboni/commit/536d87f5b280a87854b117795a4e21f1de3d19d1

I haven't tested it, as I don't have a running Celery setup in my development environment (the non-celery version still works; --queue implements it as a task).

Also I'm not sure about the lifecycle of celery processes, it's still a concern that memory will be leaked if those processes are reused for multiple tasks.

Wil Clouser [:clouserw]

Reporter

Comment 14

•

14 years ago

Thanks.  I tried to run this tonight, and I agree it's using crazy amounts of RAM and CPU.  It looks like I have to specify max-objs, and since I don't want to sit here all night I went for:

./manage.py convert_stats_to_json --queue --max-objs=999999999

It returns super fast, but celery thrashes.  If I do max-objs 1 or 2 it doesn't die, but 20 to 200 just makes it thrash.

I glanced at the code a bit - are you sure you're not recursing forever?  It looks like you pass max_objs into update_to_json, and then pass it in again from after_max_redo.  I didn't dig into it deeper since I'm falling asleep.

To use celery, the idea is to loop over your objects in chunks, and then pass each chunk to apply_async (like the script in comment 11).  I think the way you're doing it is passing the entire max_objs into celery, so celery is doing the work, but it's we're not getting any of the benefit of rate limiting or parallel processing.  (Unless you run the command a zillion times on the command line, I guess).

Ian Bicking (:ianbicking)

Assignee

Comment 15

•

14 years ago

That definitely shouldn't happen.  I'll look at the logic there.  I can't loop over any chunks ahead of time as simply finding the things that need to be updated takes a significant amount of the work.  I've been running with max-objs at 100000 or so without problem, and the way I intended the Celery task to work is that it should restart its own task after reaching that limit.  The whole task is safely rerunnable (but not useful to run in parallel), so the intention is to simply restart the task after doing a bunch of work.

Wil Clouser [:clouserw]

Reporter

Comment 16

•

14 years ago

(In reply to comment #11)
> 
> I'll poke around in the PHP and see.

The only code I saw using the JSON code were the statistics pages themselves, so that should make this transition manageable.  Once we make the transition and confirm the speedup, we'll have to let metrics know about the change(

In reply to comment #15)
> That definitely shouldn't happen.  I'll look at the logic there.  I can't loop
> over any chunks ahead of time as simply finding the things that need to be
> updated takes a significant amount of the work.  I've been running with
> max-objs at 100000 or so without problem, and the way I intended the Celery
> task to work is that it should restart its own task after reaching that limit. 
> The whole task is safely rerunnable (but not useful to run in parallel), so the
> intention is to simply restart the task after doing a bunch of work.

Whatever works is fine with me.  Keep in mind that this is a run once script - it doesn't need to be beautiful.  The more work we keep off the db and onto the celery boxes the better.  If the migration takes longer than a couple hours, we'll have to figure out a plan for an extended stats outage or some method of keeping both versions available.  Since historical stats don't change we can do that, it'll just be more coordinating.

Ian Bicking (:ianbicking)

Assignee

Comment 17

•

14 years ago

Well, if we're going to keep this simple, I've tested it as a simple shell script with a while loop to do the retrying.  It took about two full days on my laptop to do the update, and I only have a partial database dump.

I should clarify though that the migration can run with the application live, so nothing has to be taken offline.  It can simply run as long as it needs to, and be interrupted at any time and restarted.  The change I made to the code still allows phpserialized data to be read, it just detects and decodes JSON data as well.

We could do a smaller migration in staging to test as well, e.g., convert just the records from that known-slow URL.  The script doesn't support that kind of selectivity, but I could add that.

Wil Clouser [:clouserw]

Reporter

Comment 18

•

14 years ago

(In reply to comment #17)
> The change I made to the code
> still allows phpserialized data to be read, it just detects and decodes JSON
> data as well.

It detects it in python.  Our problem is that the PHP side is still expecting what is in the db, so if it changes it'll break.  We'd either need to modify the PHP code to see both, or to the switch all at once.

Ian Bicking (:ianbicking)

Assignee

Comment 19

•

14 years ago

I thought you said the PHP wasn't reading those values, only writing them?  Or maybe I misunderstood you.  Changing the PHP side seems reasonable -- and necessary, if we want the PHP side to read the new JSON values too, right?  The detection the Python code is doing is just looking for value.startswith('{') or value.startswith('['), and if one of those then using JSON.

Wil Clouser [:clouserw]

Reporter

Comment 20

•

14 years ago

PHP is definitely reading the values, but I can look into modifying that since we are on a new enough version of PHP.

I've tried to run your command with max-objs at 100000, 10000, and 1000 and it fails every time saying it's out of memory:


(zamboni)clouserw@khan:~/dev/zamboni$ ./manage.py convert_stats_to_json --max-objs=1000
/home/clouserw/.virtualenvs/zamboni/lib/python2.6/site-packages/celery/loaders/default.py:93: NotConfigured: No celeryconfig.py module found! Please make sure it exists and is available to Python.
  NotConfigured)
11:54:1284663249 nose.plugins.manager:DEBUG DefaultPluginManager load plugin sqlalchemy = sqlalchemy.test.noseplugin:NoseSQLAlchemy :/home/clouserw/.virtualenvs/zamboni/lib/python2.6/site-packages/nose/plugins/manager.py:359
11:54:1284663249 z.command:DEBUG Updating model Contribution.post_data (1/8) :/home/clouserw/dev/zamboni/apps/stats/tasks.py:221
11:54:1284663249 z.command:DEBUG SELECT `stats_contributions`.`id`, `stats_contributions`.`addon_id`, `stats_contributions`.`amount`, `stats_contributions`.`source`, `stats_contributions`.`source_locale`, `stats_contributions`.`annoying`, `stats_contributions`.`created`, `stats_contributions`.`uuid`, `stats_contributions`.`is_suggested`, `stats_contributions`.`suggested_amount`, `stats_contributions`.`comment`, `stats_contributions`.`transaction_id`, `stats_contributions`.`post_data` FROM `stats_contributions` WHERE (NOT ((`stats_contributions`.`post_data` LIKE BINARY {%  AND NOT (`stats_contributions`.`post_data` IS NULL))) AND NOT ((`stats_contributions`.`post_data` LIKE BINARY [%  AND NOT (`stats_contributions`.`post_data` IS NULL))) AND NOT ((`stats_contributions`.`post_data` =   AND NOT (`stats_contributions`.`post_data` IS NULL)))) :/home/clouserw/dev/zamboni/apps/stats/tasks.py:221
11:54:1284663257 z.command:DEBUG Updating model DownloadCount.sources (2/8) :/home/clouserw/dev/zamboni/apps/stats/tasks.py:221
11:54:1284663257 z.command:DEBUG SELECT `download_counts`.`id`, `download_counts`.`addon_id`, `download_counts`.`count`, `download_counts`.`date`, `download_counts`.`src` FROM `download_counts` WHERE (NOT ((`download_counts`.`src` LIKE BINARY {%  AND NOT (`download_counts`.`src` IS NULL))) AND NOT ((`download_counts`.`src` LIKE BINARY [%  AND NOT (`download_counts`.`src` IS NULL))) AND NOT ((`download_counts`.`src` =   AND NOT (`download_counts`.`src` IS NULL)))) :/home/clouserw/dev/zamboni/apps/stats/tasks.py:221
Exception MemoryError: MemoryError() in <bound method Cursor.__del__ of <MySQLdb.cursors.Cursor object at 0xc4e512c>> ignored
Traceback (most recent call last):
  File "./manage.py", line 57, in <module>
    execute_manager(settings)
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/core/management/base.py", line 191, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/core/management/base.py", line 220, in execute
    output = self.handle(*args, **options)
  File "/home/clouserw/dev/zamboni/apps/stats/management/commands/convert_stats_to_json.py", line 30, in handle
    updater.run()
  File "/home/clouserw/dev/zamboni/apps/stats/tasks.py", line 204, in run
    self.update_all_models()
  File "/home/clouserw/dev/zamboni/apps/stats/tasks.py", line 297, in update_all_models
    self.update_objs(model, field_name)
  File "/home/clouserw/dev/zamboni/apps/stats/tasks.py", line 265, in update_objs
    for obj in qs:
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/db/models/query.py", line 106, in _result_iter
    self._fill_cache()
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/db/models/query.py", line 760, in _fill_cache
    self._result_cache.append(self._iter.next())
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/db/models/query.py", line 282, in iterator
    obj = self.model(*row[index_start:aggregate_start])
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/db/models/base.py", line 273, in __init__
    setattr(self, field.attname, val)
  File "/home/clouserw/.virtualenvs/zamboni/src/django/django/db/models/fields/subclassing.py", line 98, in __set__
    obj.__dict__[self.field.name] = self.field.to_python(value)
MemoryError

Jeff Balogh (:jbalogh)

Comment 21

•

14 years ago

We could run this on preview to see how it helps speed and do loadtests.  We should be decomissioning all of remora's stats pages when we flip to potch's, so I don't think remora needs a compat layer.

Ian Bicking (:ianbicking)

Assignee

Comment 22

•

14 years ago

I'm not sure why that MemoryError is occurring... I haven't been able to trigger that without doing a lot of work, and 1000 objects is practically nothing.  Are you running exactly "./manage.py convert_stats_to_json
--max-objs=1000" -- no --queue option?  Does it do much work before you get the MemoryError?

For handling the transition Jeff's idea seems reasonable -- I can run the migration on preview (maybe just on select objects, so it doesn't take too long), we can be sure this actually resolves the problem, and if the changes to apps.stats.db are deployed then the migration can be done sometime in the future when we do the flip (and feel confident it doesn't need to be reverted).

Wil Clouser [:clouserw]

Reporter

Comment 23

•

14 years ago

The command I ran is what I pasted above.  It takes about ~5 minutes for it to get the memory error, I'm not sure what it's doing during that time.  This is on khan, fwiw.

I'd like to run the migration on my dev database before doing preview, but yeah, we'll definitely need to do some perf testing there before this goes live.

What else do you need as far as debugging output for the MemoryError?  Have you tried running this on khan?

Ian Bicking (:ianbicking)

Assignee

Comment 24

•

14 years ago

I did not realize there was a setup on khan to run it on, that'd be very useful!  Is the khan setup documented anywhere, or is there anything important I should know about it?  It seems like I can just copy your settings_local into a checkout/venv and work from that... I can't tell from the settings where the running instance actually lives though.

Also how is the database managed?  Like if I mess it up, or simply want to retest what I'm doing, is there a way to restore or revert my changes?

Wil Clouser [:clouserw]

Reporter

Comment 25

•

14 years ago

(In reply to comment #24)
> I did not realize there was a setup on khan to run it on, that'd be very
> useful!  Is the khan setup documented anywhere, or is there anything important
> I should know about it?  It seems like I can just copy your settings_local into
> a checkout/venv and work from that... I can't tell from the settings where the
> running instance actually lives though.

My running instance is in my home directory, you should make your own venv and configure it.  You're welcome to grab my settings_local, just don't use my db. ;)

The dbs all live on cm-webdev01-master-01.  I gave you credentials for that a long time ago, so I think you still have a complete dataset on there.  Do you still have them?

Wil Clouser [:clouserw]

Reporter

Comment 26

•

14 years ago

We're making good progress on this - script can currently run.  Ian: can you paste in the command you use to convert an entire database?

Target Milestone: 5.12 → 5.12.1

Ian Bicking (:ianbicking)

Assignee

Comment 27

•

14 years ago

Using the code from my branch (http://github.com/ianb/zamboni), run:

  while ! python ~/dev/zamboni/manage.py convert_stats_to_json --max-objs=10000 ; do echo "Another run" ; done

Wil Clouser [:clouserw]

Reporter

Comment 28

•

14 years ago

Here are some numbers for the current pages off PAMO.  All the remora /statistics/csv/* pages:

Good: 663
Max/Avg/Min/StdDev: 492737ms/152936ms/102ms/135337ms
Bad: 160
Total Time: 494.109225988
Req/s: 1.27299788572

Is that an average time of 152 seconds?  Yes, it is.

And the /statistics/json pages:

Good: 169
Max/Avg/Min/StdDev: 97401ms/65573ms/1893ms/35759ms
Bad: 0
Total Time: 97.4137899876
Req/s: 1.64247792864

Wil Clouser [:clouserw]

Reporter

Comment 29

•

14 years ago

Alright, I reran this to give remora more of a chance and grabbed some pleasantly good zamboni numbers as well.  The following is for 20 simultaneous requests on PAMO.

Remora:
Max/Avg/Min/StdDev: 38604ms/10794ms/146ms/8841ms
Total Time: 521.80540514
Req/s: 1.20543021173

Zamboni CSV (1000 URLs):
Max/Avg/Min/StdDev: 11729ms/1396ms/36ms/1787ms
Total Time: 110.506951809
Req/s: 8.48815377355

Zamboni JSON (1000 URLs):
Max/Avg/Min/StdDev: 8406ms/1143ms/43ms/1558ms
Total Time: 93.6633141041
Req/s: 10.0145933226

Please clean up/rebase your migration and I'll land it on our trunk and tomorrow night I'll run it and we'll grab some new numbers.  Thanks Ian.

Wil Clouser [:clouserw]

Reporter

Comment 30

•

14 years ago

I squashed this and landed it on trunk: http://github.com/jbalogh/zamboni/commit/8ab589019e94bf3a7cb6d7141159204014d998e7

Wil Clouser [:clouserw]

Reporter

Comment 31

•

14 years ago

Ran this on PAMO last night with:

while ! python26 manage.py convert_stats_to_json --max-objs=10000; do echo -n "sleeping..."; sleep 10; echo "gogo"; done

Still ran out of memory after a few hours.  Restarted again, but it's done a lot, we'll probably be done today.

Wil Clouser [:clouserw]

Reporter

Comment 32

•

14 years ago

Update:  No luck now, even with --max-objs=100 we run out of memory on the first run.  The box has 3G free before running.

Wil Clouser [:clouserw]

Reporter

Updated

•

14 years ago

Target Milestone: 5.12.1 → 5.12.2

Wil Clouser [:clouserw]

Reporter

Comment 33

•

14 years ago

(In reply to comment #29)
> Zamboni CSV (1000 URLs):
> Max/Avg/Min/StdDev: 11729ms/1396ms/36ms/1787ms
> Total Time: 110.506951809
> Req/s: 8.48815377355
> 
> Zamboni JSON (1000 URLs):
> Max/Avg/Min/StdDev: 8406ms/1143ms/43ms/1558ms
> Total Time: 93.6633141041
> Req/s: 10.0145933226

The script finished running yesterday at ~10ish.  New load test results:

CSV:
Max/Avg/Min/StdDev: 12745ms/1594ms/34ms/2194ms
Total Time: 127.117345095
Req/s: 7.37900873639

JSON:
Max/Avg/Min/StdDev: 10041ms/1245ms/64ms/1615ms
Total Time: 100.236711979
Req/s: 9.3578488508

So, overall worse performance, but within a margin of error that makes it essentially no change.  

At this point I suppose the question is - do we want to continue working on this bug?  Using the remora results from comment 29 it looks like we're already substantially faster and we should close the bug.  However, the average response time is 1.2-1.5 seconds which is a really long time, especially when the front page does 4 stats requests.  Ian: were there other areas that looked like they could give promising results?

Ian Bicking (:ianbicking)

Assignee

Comment 34

•

14 years ago

From discussion, it appears simplejson speedups isn't installed in the environment used to test this, and without those speedups the JSON decoding is only modestly faster than phpserialize decoding.

Wil Clouser [:clouserw]

Reporter

Comment 35

•

14 years ago

oremj installed simplejson, there was no noticable change:

Max/Avg/Min/StdDev: 12478ms/1757ms/39ms/2116ms
Total Time: 135.53547287
Req/s: 6.92069743912

Max/Avg/Min/StdDev: 8746ms/1167ms/61ms/1675ms
Total Time: 95.0672271252
Req/s: 9.86670199989


I re-ran the test directly afterwards to see how the mysql cache would affect us and got pretty good results:

Max/Avg/Min/StdDev: 176ms/62ms/18ms/26ms
Total Time: 5.4371099472
Req/s: 172.518122515

Max/Avg/Min/StdDev: 165ms/61ms/15ms/28ms
Total Time: 4.57465982437
Req/s: 205.042568412

I think that points to the bottleneck being an unprimed database, not the code processing json.

Ian Bicking (:ianbicking)

Assignee

Comment 36

•

14 years ago

Note the json conversion was specifically to address certain entries that were timing out (taking over 15 seconds), rendering the data inaccessible.  For those records the response time was consistent and caching didn't provide any improvement, and the changes were made after profiling those specific requests.

Wil Clouser [:clouserw]

Reporter

Comment 37

•

14 years ago

For the record, I didn't get any timeouts in my benchmark and the URL in comment 4 is working

Ian Bicking (:ianbicking)

Assignee

Comment 38

•

14 years ago

I'm not sure if timeouts are consistent across preview and the live site.  Whether I got a timeout or just a long response time depended on exactly the URL I hit, and what servers I went through.  I don't believe pieces like Zeus are in place on preview.

Also, while the performance of that specific case with phpserialize could have been addressed, I asked and the consensus seemed to be that people wanted to generally remove phpserialize data and use json instead, which this implements.

Wil Clouser [:clouserw]

Reporter

Comment 39

•

14 years ago

They should be consistent, and this test case was. (and zeus is on preview)

I think we're making good progress - I'm happy we moved to json.  I'm just asking now if there are other areas we can improve with a reasonable amount of investment.

Ian Bicking (:ianbicking)

Assignee

Comment 40

•

14 years ago

The best thing to do then would be to use a profiler with that same test (though the profiler slows things down by nearly an order of magnitude, so a trimmed test would be best).  I don't think there's any reason that needs to be done on preview instead of a local dev box.  I did that for the one request, but I never committed any of the profile rigging.

I didn't notice this page before but it seems like a reasonable approach (specifically python -m cProfile): http://code.djangoproject.com/wiki/ProfilingDjango

One caveat in my experience: some results can be surprising or misleading because queries and memcached can be hit at unexpected times, making code look slow when it's actually just the first thing to access a lazily computed value.

Ian Bicking (:ianbicking)

Assignee

Comment 41

•

14 years ago

I've started putting together some profiling results locally, but looks like I'll need to run it on khan to get access to an updated full database.

Wil Clouser [:clouserw]

Reporter

Comment 42

•

14 years ago

(In reply to comment #41)
> I've started putting together some profiling results locally, but looks like
> I'll need to run it on khan to get access to an updated full database.

How's that going?

Target Milestone: 5.12.2 → 5.12.3

Ian Bicking (:ianbicking)

Assignee

Comment 43

•

14 years ago

I have the profiles running with csvs-zz.log, and I'm trying to interpret the results at this time.

Ian Bicking (:ianbicking)

Assignee

Comment 44

•

14 years ago

(I'm not sure these profiling results would be the same in the presence of caching.)

Most of the time is going into database-related parts of the code.  About 20% of the time is actually the database (MySQLdb routines).  It looks like database-related code takes about 70% of the time (conveniently this goes through cache-machine caching.base, though the overhead of the routine itself isn't high -- it's simply a point of aggregation).  The extra 50% of the time is the Django ORM, deserializing fields, and other miscellaneous but fairly generic bits of database-related code.

The most notable query, which accounts for 60% of the time, is getting the addon object for the queried data (get_object_or_404(Addon, id=addon_id)).  (This particular query might be successfully cached in a non-development environment.)  The object itself is only used to check permissions, otherwise the addon_id itself is sufficient.

Jeff Balogh (:jbalogh)

Comment 45

•

14 years ago

(In reply to comment #44)
> The most notable query, which accounts for 60% of the time, is getting the
> addon object for the queried data (get_object_or_404(Addon, id=addon_id)).

Using Addon.objects.all().no_transforms() as the queryset skips all the extra Addon baggage.

Ian Bicking (:ianbicking)

Assignee

Comment 46

•

14 years ago

Using Addon.objects.all().no_transforms().filter(id=addon_id) as suggested appears to make a substantial difference, bringing addon fetching down to 20% of the total time and halving request times.

The MySQL cache also has a substantial affect -- leaving the code overnight, all the database caching expired and overall time per request went up about ten fold.  Overall MySQL was taking about 65% of the request time in this case.

Wil Clouser [:clouserw]

Reporter

Comment 47

•

14 years ago

Any recommendations for next steps on this bug or have we done all we can?

Ian Bicking (:ianbicking)

Assignee

Comment 48

•

14 years ago

Upon reflection, the worry about performance here is primarily in the case when the MySQL query cache isn't ready.  I believe for the most part performance is fairly decent once the query is cached (plus browser caching already catches this case frequently), and most of the perceived performance problems will be from the uncached case (which is 10x as slow).  

Resolving the database bottleneck would require either persistent caching (e.g., precalculating some values), or other kinds of database tuning.  Precalculating values seems likely to be too resource-intensive considering that probably most stats go unviewed anyway.  I don't much about database tuning, so I cannot suggest much there, though I did check the indexes early on and they seemed generally good (and without those indexes probably even after warming up the database the queries would be slow), though I believe it is possible that the database indexes are not identical between all database deployments (since indexes are essentially invisible, and I saw some small mismatches in previous testing).

So in summary, I don't believe there's anything worth doing to improve general performance, unless there is a particular situation which introduces specific problems (which I don't know of at this time).

Wil Clouser [:clouserw]

Reporter

Comment 49

•

14 years ago

Ok, thanks for investigating and the patches for json'ing the data.  We'll see how things go.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: addons.mozilla.org → addons.mozilla.org Graveyard