545006 - prod socorro hardware install max RAM on Postgres server

Aravind - could you give me the size of tables on disk for: 1. the three latest partitions of the reports table 2. the overall size of the productdims and osdims tables (or anything w/ *dims in it)

Aravind Gottipati [:aravind]

Comment 4

•

15 years ago

the recent reports tables are between 116970 and 121298 relpages, so that would be between 118MB to 112MB in size. For the other stuff.. breakpad=# SELECT relname, relpages FROM pg_class where relname like '%dim%' ORDER BY relpages DESC limit 10; relname | relpages --------------------------------------------------------+---------- top_crashes_by_signature_osdims_key | 94792 top_crashes_by_signature_productdims_key | 94669 top_crashes_by_signature_window_end_productdims_id_idx | 91967 top_crashes_by_url_urldims_key | 13258 top_crashes_by_url_osdims_key | 12924 top_crashes_by_url_productdims_key | 12893 urldims_url_domain_key | 7620 urldims | 6290 urldims_pkey1 | 1319 osdims_name_version_key | 16 (10 rows) That seems like the biggest one (top_crashes_by_signature_osdims_key) is 92MB in size.

Michael Morgan [:morgamic]

Assignee

Comment 5

•

15 years ago

Isn't 1 relpage 8KB?

Aravind Gottipati [:aravind]

Comment 6

•

15 years ago

Whoa.. my math was totally off.. The recent reports tables are around between 2030 MB and 2302 MB each. The other stuff is breakpad=# SELECT pg_size_pretty(pg_total_relation_size('top_crashes_by_signature_osdims_key')); pg_size_pretty ---------------- 741 MB (1 row) breakpad=# SELECT pg_size_pretty(pg_total_relation_size('top_crashes_by_signature_productdims_key')); pg_size_pretty ---------------- 740 MB (1 row) breakpad=# SELECT pg_size_pretty(pg_total_relation_size('top_crashes_by_url_urldims_key ')); pg_size_pretty ---------------- 105 MB (1 row) breakpad=# SELECT pg_size_pretty(pg_total_relation_size('top_crashes_by_url_osdims_key')); pg_size_pretty ---------------- 102 MB (1 row)

Michael Morgan [:morgamic]

Assignee

Comment 7

•

15 years ago

Thanks Aravind - can you run the same query for %report% too

Aravind Gottipati [:aravind]

Comment 8

•

15 years ago

I did, thats how I got those earlier numbers (The recent reports tables are around between 2030 MB and 2302 MB each.)

Michael Morgan [:morgamic]

Assignee

Comment 9

•

15 years ago

09:31 < jberkus> aravind actually needs to run pg_total_relation_size() 09:31 < jberkus> which will include the indexes Can you run that?

[:jberkus] Josh Berkus

Comment 10

•

15 years ago

Actually, he did on the 2nd run. However, we need to know *how many* recent reports partitions there are at 2GB each. That'll give us our sizing; that much data needs to fit into around 60% of RAM. Alternately, if you had some local disk instead of that slow-as-molasses SAN, hitting disk wouldn't be quite the kiss of death it is now. Overall, if this system is going to remain in use for the forseeable future, then I think a new server is called for, one with 64-128GB of RAM and decent local disk.

Aravind Gottipati [:aravind]

Comment 11

•

15 years ago

(In reply to comment #10) > Actually, he did on the 2nd run. > > However, we need to know *how many* recent reports partitions there are at 2GB > each. That'll give us our sizing; that much data needs to fit into around 60% > of RAM. > That has to come from webdev/engineering (how many recent reports partitions are relevant).. We can safely assume that the 2GB number is a lower bound, because we expect the number of reports processed per week to only go up. Also, in the long term, I'd think that these reports partitions will go away entirely, since this data will be in hadoop?

[:jberkus] Josh Berkus

Comment 12

•

15 years ago

Aravind, Morgamic says that you'll still need the cached data for reporting for the most recent 4 weeks, even if the data is coming from hadoop instead of matview generation. But, once you have your policy, the math is simple: Total frequently accessed data <= ( RAM * 65% )

Aravind Gottipati [:aravind]

Comment 13

•

15 years ago

Going by that, it would appear that we need about 4 weeks of reports - that's around 8 GB + matview stuff. The two largest matview tables look like they are around 1.5 GB together. I don't know enough about how much these tables could grow or how many more such tables we plan on adding. It seem like we should be able to handle the existing data with the current memory of 16 GB (65% of that is 10.4GB). I suspect we are missing something else here. It does seem like we are just at the limit of the system with the current memory on the box. I don't have a good way of estimating the future growth of the matview tables without engineering/webdev input.

matthew zeier [:mrz]

Updated

•

15 years ago

Component: Server Operations → Socorro

Product: mozilla.org → Webtools

Laura Thomson :laura

Updated

•

15 years ago

Target Milestone: --- → 1.7

Michael Morgan [:morgamic]

Assignee

Comment 14

•

15 years ago

-> 1.7. If the 1.6 push fixes perf, we'll likely skip the expense of new RAM.

timellis

Comment 15

•

15 years ago

This quote seems pertinent: """Alternately, if you had some local disk instead of that slow-as-molasses SAN, hitting disk wouldn't be quite the kiss of death it is now. Overall, if this system is going to remain in use for the forseeable future, then I think a new server is called for, one with 64-128GB of RAM and decent local disk.""" I've never seen a server that seems so slow for such simple queries before. Can we do a trial with this database on a local-disk machine before we go spending $gazillions on RAM? Also, 64-128GB RAM seems extremely large. I would expect something more along the lines of 8-32GB RAM to work out. I'm asking morgamic for the report from the Postgres consultants so I can sanity check the things I'm saying here.

[:jberkus] Josh Berkus

Comment 16

•

15 years ago

The slowness is a result of the data not fitting in RAM and disk being incredibly slow. If you can speed up disk, more RAM isn't necessary. Also, it would be worth looking at how the matviews are set up. Multi-GB matviews kind of defeats the purpose of having matviews at all; there should be a way to make them more refined and smaller. i.e. make the data smaller.

timellis

Comment 17

•

15 years ago

So we're in agreement. Ditch the SAN. Spend the money we would've spent on RAM getting some decent disks which will last us far longer and give us more benefit. It's a sensible financial choice. It does sound like the material views could use some re-thinking, but I'm not sure I'm prepared to go into that right now.

[:jberkus] Josh Berkus

Comment 18

•

15 years ago

Yes. It needs to be good disk, though, like a 6-8 drive HW RAID, or a pair of SSDs.

timellis

Comment 19

•

15 years ago

Here's one possibility mrz and I pow-wowed about. HP DL360G5 Quickspecs (http://h18000.www1.hp.com/products/quicKspecs/12476_div/12476_div.HTML): it has Up to six SAS/SATA hot plug drive bays. So we can get 6 HP fast drives, set them up in a local array, and see if it works better. We have 300GB of space used now, so 4TB of total space (6x1.5TB/2) should last quite some time. If anyone has an intelligent guess on how long 4TB would last, please chime in.

matthew zeier [:mrz]

Comment 20

•

15 years ago

6 drive bays but 2 are in use. Largest capacity SAS drive is 300GB. Assuming it's mirrored, it's only 600GB. I'd vote against SATA for any number of reasons here.

[:jberkus] Josh Berkus

Comment 21

•

15 years ago

Moving to SATA drives might not be a huge improvement due to the lack of control queueing (i.e. can't read and write at the same time). Also, the outer tracks of those big drives tend to be sloooooooow. If 2 of the drive bays are already in use, can they be reconfigured to support the transaction log? If we did that, then putting 4 SAS drives in the remaining bays would probably be an improvement over current. However, see capacity issue above. Finally, given that you guys are moving to hadoop for your raw processing, wouldn't it make more sense to have hadoop generate highly targeted, compact matviews -- and get rid of "reports" entirely -- than to try to solve this issue with hardware? Just a thought.

Laura Thomson :laura

Comment 22

•

15 years ago

Morgamic? Are we going to get more RAM?

Michael Morgan [:morgamic]

Assignee

Comment 23

•

15 years ago

We don't need more RAM if we aren't hitting the reports table hard and rely more on matviews as Josh alluded to.

[:jberkus] Josh Berkus

Comment 24

•

15 years ago

FWIW, I'm working with another company which is using PostgreSQL as the reporting engine in front of Hadoop. As such, I'm working on some performance optimizations for the use case where PostgreSQL is basically a "cache". I would love to talk over your eventual design for the application in this light.

Ryan Snyder [:ryansnyder] [:rsnyder] [:rysny]

Comment 25

•

15 years ago

The PHX hardware will take care of this requirement; re-open if incorrect.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Updated

•

14 years ago

Component: Socorro → General

Product: Webtools → Socorro