Closed
Bug 740629
Opened 14 years ago
Closed 13 years ago
Dataset host for media.xiph.org
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rillian, Assigned: fox2mike)
References
Details
Bug 729007 requests that Mozilla provide hosting for the Xiph.Org Foundation's collection of redistributable video masters for compression research. This is a resource used by developers working on codecs within Mozilla, and the greater community. It is, for example, where the 'Big Buck Bunny' clips everyone used for video demos last year came from.
Zandr suggested the MCS programme might be the right place to implement this.
Requirements:
- Current allocation is 5 TB, expect to need 20 TB by the end of 2012.
- Serve files over http(s) and public rsync.
- Answer to the media.xiph.org vhost to avoid breaking established urls.
- Most users are in North America, followed by Europe.
- The number of users is small, consisting of compression researchers and enthusiasts, but the filesets are big. Peak traffic at the current host has been 100 Mbps after a significant new release, but is normally in the 10-20 Mbps range.
I can set up and maintain the front-end server. I have been maintaining the current site on various Linux machines for the last decade. However, if we can set it up to pull website updates from version control, and allow a limited number of Xiph.Org contributors to push new filesets over ssh, I'm happy to have someone else manage the system updates.
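For reference, the "public rsync" requirement above is usually met with a small rsyncd.conf. A minimal read-only sketch follows; the module name, uid/gid, and connection limit are illustrative assumptions, not taken from this bug:

```
# /etc/rsyncd.conf -- minimal public read-only module (illustrative values)
uid = nobody
gid = nobody
use chroot = yes
max connections = 20

[media]
    path = /data
    comment = Xiph.Org media datasets
    read only = yes
    list = yes
```

Clients would then fetch with something like `rsync -av rsync://media.xiph.org/media/ local-copy/`.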
Comment 1•14 years ago
I believe this to be an appropriate use of Mozilla resources and shows that we continue to push and support open source video.
Comment 2•14 years ago
I agree that this is an appropriate use of Mozilla resources and support granting this request.
Updated•14 years ago
Assignee: nobody → arzhel
Comment 3 (Reporter)•13 years ago
Any news on this? Anything I can do to help?
Comment 4•13 years ago
Will you need RAID or backups? Is it going to do heavy computing (powerful CPU, lots of RAM)?
Is the required disk space going to grow after 2012?
Is an NFS share good enough?
I'm trying to figure out what the best option is.
Comment 5 (Reporter)•13 years ago
> Will you need RAID or backups?
Yes. I want this to be the primary source for these datasets, so there needs to be a reliability plan. Backups are more important than RAID; occasional downtime is tolerable.
> Is it going to do heavy computing (powerful CPU, lots of RAM)?
No, just serving the files. We do the number crunching on local copies.
> Is the required disk space going to grow after 2012?
Eventually, yes, though I can't estimate the growth; it depends on donations, which are sporadic. We know we have a large chunk coming in over the rest of 2012, which is the immediate need. I expect 20 TB would see us through 2013.
> Is an NFS share good enough?
NFS should be fine.
Comment 6•13 years ago
Hey Ralph, long time no see!! Glad to be helping you here!
I'm copying Rich. Rich, can you get us some server options with 20 TB of usable space (so a bit more than that for RAID + spares)?
Comment 7 (Reporter)•13 years ago
Hi Corey! Thanks for looking after me, again. :)
Comment 8 (Reporter)•13 years ago
Bug ping. Can I have an update on this? Our next donation is starting to come in.
Comment 9 (Reporter)•13 years ago
I spoke to Corey on irc today; this is blocked on him currently, and he's been traveling.
Comment 10•13 years ago
Just an update here: Rich and I have been working up a quote and spec for this server. We should have an ETA by early next week.
Comment 11 (Reporter)•13 years ago
Great, thanks for the update.
Comment 12•13 years ago
Server shipped via FedEx, tracking no. 038055751941648. ETA is 7/26.
The D2600 storage array and twelve 3 TB drives have already arrived in Phoenix.
Comment 13 (Reporter)•13 years ago
\o/
Updated•13 years ago
Assignee: arzhel → cshields
Comment 14 (Reporter)•13 years ago
Any update on this? Do you need anything from me to get the host set up?
Comment 15•13 years ago
Handing this over to DCOps for physical installation, then we'll kickstart it and get you in.
Assignee: cshields → server-ops
Component: Community IT Requests → Server Operations: DCOps
Product: Mozilla Reps → mozilla.org
QA Contact: dmoore
Version: unspecified → other
Comment 16•13 years ago
Passing this through netops to determine if we have suitable community network in phx1. Otherwise, this equipment will need to be redirected to scl3.
Ravi, can you comment on supporting community-style networks in phx1?
Assignee: server-ops → ravi
colo-trip: --- → phx1
Component: Server Operations: DCOps → Server Operations: Netops
QA Contact: dmoore → ravi
Comment 17 (Reporter)•13 years ago
Ping on this?
Comment 18 (Reporter)•13 years ago
Second ping on this. Corey, if Ravi isn't able to look at this, can you redirect?
Comment 19•13 years ago
Huh. I don't recall seeing bug mail from your 8-AUG update.
Derek, we currently don't have any community infra configured in phx and I don't think there is really any prior art for this particular setup. We currently have an extensive community environment in scl3 and a smaller one in scl1, but those are for Mozilla projects (Bugzilla, Camino, etc).
We can spin something up, but we'll need to find some IP space to carve out for this and future growth.
Comment 20 (Reporter)•13 years ago
Thanks Ravi. Can you say when space would be available? Does it make sense to ship the server to scl3 in the meantime?
Comment 21 (Reporter)•13 years ago
Ping again. Can we have a status update?
Comment 22•13 years ago
Apologies, my CC on this bug didn't stick and I missed the last few comments.
It sounds like we may be better served forwarding this server, storage arrays, and drives to Santa Clara. I'll get staff out to Phoenix to address this.
Assignee: ravi → dmoore
Component: Server Operations: Netops → Server Operations: DCOps
QA Contact: ravi → dmoore
Comment 23•13 years ago
The hardware for this project has been shipped to SCL3.
Master tracking number: 577007415001285 (1 of 4)
Packaging type: Package
Weight: 47.00 lb.
Tracking number: 577007415001292 (2 of 4)
Packaging type: Package
Weight: 15.00 lb.
Tracking number: 577007415001308 (3 of 4)
Packaging type: Package
Weight: 19.00 lb.
Tracking number: 577007415001315 (4 of 4)
Packaging type: Package
Weight: 50.00 lb.
colo-trip: phx1 → scl3
Comment 24•13 years ago
The hardware arrived today; I have installed the drives, racked the systems, configured iLO, and inventoried them.
https://inventory.mozilla.org/en-US/systems/show/8645/
https://inventory.mozilla.org/en-US/systems/show/8644/
Back to server-ops to kickstart.
Van
Updated•13 years ago
Assignee: dmoore → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 25 (Reporter)•13 years ago
Great news, Van. Thanks for the update.
Comment 26•13 years ago
ACK for this server to be in community.scl3.
Comment 27 (Assignee)•13 years ago
Van,
I can't get this server to pick up an address via PXE, but I'm not sure we have kickstart capabilities on the community VLAN. Thoughts?
Comment 28•13 years ago
I'm almost certain we don't have public kickstart services on the community VLAN (cc'ing dustin for clarification). Perhaps we kickstart internally and then move to external?
Comment 29 (Assignee)•13 years ago
(In reply to Derek Moore from comment #28)
> I'm almost certain we don't have public kickstart services on the community
> VLAN (cc'ing dustin for clarification). Perhaps we kickstart internally and
> then move to external?
Yeah, so I guess I'll wait for dustin. Also, was the storage blade configured already, or should I do that bit? If it was configured, did we RAID-10 that baby?
Comment 30•13 years ago
:fox2mike, I never saw a comment requesting a specific RAID config. Can you go ahead and configure it remotely? Let me know if you run into any lag issues and I'll do it locally with a crash cart.
Comment 31 (Assignee)•13 years ago
Van - No issues :) I'll try and get to it. Thanks!
Comment 32 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #0)
> - Current allocation is 5 TB, expect to need 20 TB by the end of 2012.
> - Serve files over http(s) and public rsync.
> - Answer to the media.xiph.org vhost to avoid breaking established urls.
Ralph - With RAID 1+0, which offers the best protection against multiple disk losses, we have about 16 TB of usable space on the array. If we want to take a chance and go with RAID 5+0, we'll have about 30 TB of space to use. Do you have a preference on how you'd like to go?
Also, consider that if we go with RAID 1+0 and you really need 20 TB of space now, your only option would be higher-capacity hard disks, which will take us some time to source and put in.
So to tl;dr this, pick an option? :D
1) More space = RAID 5+0 = 30 TB (no hardware change)
2) Less space, more safety = RAID 1+0 = 16 TB (no hardware change)
3) zomgwtfbbq no way = keep RAID 1+0 but need > 16 TB = we probably need higher-capacity disks, which costs more time.
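The capacity figures in these options can be sanity-checked with a back-of-envelope calculation for the 12 x 3 TB array. The RAID 5+0 grouping (two parity groups) is an assumption for illustration; the decimal-TB to binary-TiB conversion also explains why 18 TB of mirrored raw space shows up as "about 16 TB" in df:

```shell
# Rough usable capacity for 12 x 3 TB drives under the two proposed layouts.
drives=12; drive_tb=3

# RAID 1+0: half the drives hold mirror copies.
raid10_tb=$(( drives / 2 * drive_tb ))          # 18 decimal TB raw
# RAID 5+0 (assumed: two RAID-5 groups, one parity drive each).
raid50_tb=$(( (drives - 2) * drive_tb ))        # 30 decimal TB raw

# Drives are sold in decimal TB (10^12 bytes); df reports binary TiB (2^40).
awk -v a="$raid10_tb" -v b="$raid50_tb" 'BEGIN {
    f = 1e12 / 2^40
    printf "RAID 1+0: %.1f TiB usable\nRAID 5+0: %.1f TiB usable\n", a*f, b*f
}'
```

This prints roughly 16.4 TiB and 27.3 TiB, matching the "about 16TB" and "about 30TB" figures above.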
Comment 33 (Reporter)•13 years ago
RAID 1+0 please, for better safety.
This summer's donation was smaller than I expected, so 16 TB should be fine.
Comment 34 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #33)
> RAID 1+0 please, for better safety.
>
> This summer's donation was smaller than I expected, so 16 TB should be fine.
Cool. Thanks!
Comment 35 (Assignee)•13 years ago
Derek, can we get this moved to a VLAN that isn't community so I can kick the machine, and then move it back? Thanks!
Assignee: server-ops → server-ops-dcops
Component: Server Operations → Server Operations: DCOps
QA Contact: shyam → dmoore
Comment 36 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #33)
> RAID 1+0 please, for better safety.
>
> This summer's donation was smaller than I expected, so 16 TB should be fine.
Also, Ralph, any OS prefs (something non-standard will probably take more time)? If not, we'll probably slap on RHEL 6.x x86_64.
Comment 37 (Reporter)•13 years ago
Wow, I don't think I've ever had an actual RHEL system. Xiph.org has traditionally used Debian, but we're familiar with Fedora as well, so I expect RHEL 6 would be fine.
Comment 38 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #32)
> With RAID 1+0 which offers the best protection against multiple disk
> losses, we have about 16TB of usable space on the array.
After consultation with another developer, I should probably ask if RAID 6 is an option? The bug doesn't say what hardware was purchased, but wouldn't that offer increased capacity *and* better data safety? Data loss is still possible with two simultaneous failures in RAID 1+0, right?
The intended workload is serving large static files, so read performance is much more important than write.
Comment 39•13 years ago
:fox2mike, moved this to the private VLAN for you
Comment 40•13 years ago
RAID 6 is indeed safer than RAID 10. The latter can survive some two-drive failures but not others: if both failed drives are in the same mirror pair, you are SOL.
Write performance on RAID 6 is poor, and rebuild times aren't pretty either.
Just my 2c.
Comment 41 (Reporter)•13 years ago
I don't know much about it, and we do have a backup scheme of sorts. I'm happy to go with whatever you think is best for the workload.
Comment 42 (Assignee)•13 years ago
Sorry, HP doesn't support RAID 6. It's either RAID 5, RAID 5+0, RAID 1+0, or RAID 1, IIRC.
Comment 43 (Reporter)•13 years ago
Alright, 1+0 sounds best out of those. Thanks!
Comment 44 (Assignee)•13 years ago
(In reply to Derek Moore from comment #39)
> :fox2mike, moved this to the private VLAN for you
w00t! Kicked. Can we get this back into the community vlan please? :)
(In reply to Ralph Giles (:rillian) from comment #41)
> I don't know much about it, and we do have a backup scheme of sorts. I'm
> happy go with whatever you think is best for the workload.
Hey Ralph,
One last question (hopefully :p). What will you be storing on the 16TB partition? Lots of small files? large ones? So I can decide what filesystem to put on it. Thanks!
Comment 45•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #44)
>
> w00t! Kicked. Can we get this back into the community vlan please? :)
>
Done.
Updated•13 years ago
Assignee: server-ops-dcops → shyam
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 46 (Assignee)•13 years ago
Box is kicked, working on getting it online.
Comment 47 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #44)
> One last question (hopefully :p). What will you be storing on the 16TB
> partition? Lots of small files? large ones? So I can decide what filesystem
> to put on it. Thanks!
ext4 is good.
There are a couple of hundred thousand large files. Most are a few MB. Smallest are ~200 KB, largest ~20 GB.
Comment 48 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #46)
> Box is kicked, working on getting it online.
How's this going?
Comment 49 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #48)
> (In reply to Shyam Mani [:fox2mike] from comment #46)
> > Box is kicked, working on getting it online.
>
> How's this going?
We were hitting the ext4 16 TB limit with RHEL 6.x: http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/
I'm not too happy about using something that needs to be compiled from source (not a packaged release).
I tried reiserfs too and was undecided between laughing and crying :)
[root@xiph1 ~]# time mkfs.reiserfs -l xiph-data /dev/sdb1
All data on /dev/sdb1 will be lost. Do you realy want to create reiserfs 3.6 (y/n) y
Creating reiserfs 3.6 with standard journal on /dev/sdb1
Segmentation fault (core dumped)
real 0m4.779s
user 0m0.119s
sys 0m0.689s
I finally gave in, compiled e2fsprogs-1.42.6 locally, and managed to get this up:
[root@xiph1 sbin]# time ./mkfs.ext4 -j /dev/sdb1
mke2fs 1.42.6 (21-Sep-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
274710528 inodes, 4395350016 blocks
219767500 blocks (5.00%) reserved for the super user
First data block=0
134136 block groups
32768 blocks per group, 32768 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m20.132s
user 0m6.388s
sys 0m0.376s
[root@xiph1 sbin]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 273G 1.6G 257G 1% /
tmpfs 18G 0 18G 0% /dev/shm
/dev/sda1 985M 65M 870M 7% /boot
/dev/sdb1 17T 129M 16T 1% /data
****
Since I compiled this from source, if you need to do anything look at the fs/modify etc, please use the versions in /opt/sbin for this.
****
So all that's left to do is ask for your usernames and SSH public keys. I'll add them to the box and give you sudo access so you can log in and manage it. We'll be happy to help out in case you get stuck.
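Since the filesystem was created with a locally built e2fsprogs, one way to make sure the /opt/sbin copies win over the distro tools on a login shell is to put them first on PATH. This is a sketch; only the /opt/sbin location comes from this bug, and the tool names are the standard e2fsprogs binaries:

```shell
# Prefer the locally built e2fsprogs in /opt/sbin over the distro copies
# in /sbin (the /opt/sbin path is from the note above; adjust if it moves).
export PATH=/opt/sbin:$PATH
hash -r   # forget any previously cached binary locations

# Sanity-check which tools would actually run before touching /dev/sdb1.
command -v mkfs.ext4 tune2fs resize2fs || true
```

This could go in root's ~/.bashrc so fsck/resize operations pick up the right versions automatically.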
Comment 50 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #49)
> I tried reiserfs too and was undecided between laughing or crying :)
eep!
> [root@xiph1 sbin]# df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda3 273G 1.6G 257G 1% /
> tmpfs 18G 0 18G 0% /dev/shm
> /dev/sda1 985M 65M 870M 7% /boot
> /dev/sdb1 17T 129M 16T 1% /data
>
> ****
> Since I compiled this from source, if you need to do anything look at the
> fs/modify etc, please use the versions in /opt/sbin for this.
> ****
Noted. I think that's probably safer than using xfs would have been. I'm surprised the 16TB limit isn't fixed in an update; you'd think that would be a common configuration these days. In any case, thanks for getting it online!
> So all that's left it to do is to ask for your usernames and ssh public
> keys, I'll add them to the box and give you sudo access so you can login and
> manage it. We'll be happy to help out with stuff in case you get stuck.
Keys sent via email.
Comment 51•13 years ago
Added to jump1.community.scl3.mozilla.com, using existing keys from LDAP:
username: giles
Both supplied keys were already in LDAP
username: tterribe
I *added* the key supplied to the key already in Mozilla LDAP
So, login to jump1 using that username, then SSH to root@xiph1.community.scl3.mozilla.com. Give the changes an hour to propagate, please.
Need LDAP accounts first - I need to figure out the policy here:
username: xiphmont
username: greg
The alternative to adding LDAP accounts is to add a flow allowing direct SSH login to xiph1 from the internet.
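The two-hop login described above (jump1, then root on xiph1) can be captured in a client-side ~/.ssh/config stanza. A sketch, using the host names and the `giles` username from this comment; `ssh -W` needs OpenSSH 5.4 or newer, and newer clients could use ProxyJump instead:

```
# ~/.ssh/config -- hop through the community jump host (illustrative)
Host jump1
    HostName jump1.community.scl3.mozilla.com
    User giles

Host xiph1
    HostName xiph1.community.scl3.mozilla.com
    User root
    ProxyCommand ssh -W %h:%p jump1
```

After this, a plain `ssh xiph1` handles both hops transparently.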
Comment 52•13 years ago
We'll need mrz's signoff to create the accounts.
Comment 53 (Reporter)•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #51)
> Added to jump1.community.scl3.mozilla.com, using existing keys from LDAP:
> username: giles
> Both supplied keys were already in LDAP
That's correct. Thanks for making the changes.
> Need LDAP accounts first - I need to figure out the policy here:
> username: xiphmont
> username: greg
Monty and Greg aren't Mozilla employees. Direct access isn't essential in the near term, but it's not really a community box if community members don't have access...
> The alternative to adding LDAP accounts is to add a flow allowing direct SSH
> login to xiph1 from the internet.
That would be fine with us.
Comment 54 (Reporter)•13 years ago
Ok, I was able to log in via the jumphost and have started pulling data from the old host. So far so good.
Can you please open ports 80, 443 (http) and 873 (rsync) to this machine? At least, I assume the router is blocking traffic and that's why I can't reach the webserver externally. It doesn't look like the machine itself is blocking any ports.
Comment 55•13 years ago
Flows are in the dependent bug. That's generally done in a day or so.
Comment 56 (Assignee)•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #51)
> The alternative to adding LDAP accounts is to add a flow allowing direct SSH
> login to xiph1 from the internet.
I don't think we do that for *any* community host. I tried a few; none of them have SSH open to the world.
Comment 57•13 years ago
Bugzilla is. It's an option, but not a good one.
Comment 58 (Assignee)•13 years ago
Flows are done, looks like we're good to go here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 59 (Reporter)•13 years ago
Yep, we are all online. Thanks so much, everyone, for getting this machine up!
Updated•11 years ago
Product: mozilla.org → mozilla.org Graveyard