Closed
Bug 740629
Opened 14 years ago
Closed 13 years ago
Dataset host for media.xiph.org
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rillian, Assigned: fox2mike)
References
Details
Bug 729007 requests that Mozilla provide hosting for the Xiph.Org Foundation's collection of redistributable video masters for compression research. This is a resource used by developers working on codecs within Mozilla, and the greater community. It is, for example, where the 'Big Buck Bunny' clips everyone used for video demos last year came from.
Zandr suggested the MCS programme might be the right place to implement this.
Requirements:
- Current allocation is 5 TB, expect to need 20 TB by the end of 2012.
- Serve files over http(s) and public rsync.
- Answer to the media.xiph.org vhost to avoid breaking established urls.
- Most users are in North America, followed by Europe.
- The number of users is small, consisting of compression researchers and enthusiasts, but the filesets are big. Peak traffic at the current host has been 100 Mbps after a significant new release, but is normally in the 10-20 Mbps range.
I can set up and maintain the front-end server. I have been maintaining the current site on various Linux machines for the last decade. However, if we can set it up to pull website updates from version control, and allow a limited number of Xiph.Org contributors to push new filesets over ssh, I'm happy to have someone else manage the system updates.
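For reference, the "public rsync" requirement above is usually met with a small rsyncd.conf. A minimal read-only sketch follows; the module name, uid/gid, and connection limit are illustrative assumptions, not taken from this bug:

```
# /etc/rsyncd.conf -- minimal public read-only module (illustrative values)
uid = nobody
gid = nobody
use chroot = yes
max connections = 20

[media]
    path = /data
    comment = Xiph.Org media datasets
    read only = yes
    list = yes
```

Clients would then fetch with something like `rsync -av rsync://media.xiph.org/media/ local-copy/`.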
Comment 1•14 years ago
I believe this to be an appropriate use of Mozilla resources and shows that we continue to push and support open source video.
Comment 2•14 years ago
I agree that this is an appropriate use of Mozilla resources and support granting this request.
Updated•14 years ago
Assignee: nobody → arzhel
Comment 3 (Reporter)•13 years ago
Any news on this? Anything I can do to help?
Comment 4•13 years ago
Will you need RAID or backups? Is it going to do heavy computing (powerful CPU, lots of RAM)?
Is the required disk space going to grow after 2012?
Is an NFS share good enough?
I'm trying to figure out what the best option is.
Comment 5 (Reporter)•13 years ago
> Will you need RAID or backups?
Yes. I want this to be the primary source for these datasets, so there needs to be a reliability plan. Backups are more important than RAID; occasional downtime is tolerable.
> Is it going to do heavy computing (powerful CPU, lots of RAM)?
No, just serving the files. We do the number crunching on local copies.
> Is the required disk space going to grow after 2012?
Eventually, yes, though I can't estimate the growth; it depends on donations, which are sporadic. We know we have a large chunk coming in over the rest of 2012, which is the immediate need. I expect 20 TB would see us through 2013.
> Is an NFS share good enough?
NFS should be fine.
Comment 6•13 years ago
Hey Ralph, long time no see!! Glad to be helping you here!
I'm copying Rich. Rich, can you get us some server options with 20 TB of usable space (so a bit more than that for RAID + spares)?
Comment 7 (Reporter)•13 years ago
Hi Corey! Thanks for looking after me, again. :)
Comment 8 (Reporter)•13 years ago
Bug ping. Can I have an update on this? Our next donation is starting to come in.
Comment 9 (Reporter)•13 years ago
I spoke to Corey on irc today; this is blocked on him currently, and he's been traveling.
Comment 10•13 years ago
Just an update here: Rich and I have been working up a quote and spec for this server. We should have an ETA by early next week.
Comment 11 (Reporter)•13 years ago
Great, thanks for the update.
Comment 12•13 years ago
Server shipped via FedEx, tracking no. 038055751941648. ETA is 7/26.
The D2600 storage array and twelve 3 TB drives have already arrived in Phoenix.
Comment 13 (Reporter)•13 years ago
\o/
Updated•13 years ago
Assignee: arzhel → cshields
Comment 14 (Reporter)•13 years ago
Any update on this? Do you need anything from me to get the host set up?
Comment 15•13 years ago
Handing this over to DCOps for physical installation, then we'll kickstart it and get you in.
Assignee: cshields → server-ops
Component: Community IT Requests → Server Operations: DCOps
Product: Mozilla Reps → mozilla.org
QA Contact: dmoore
Version: unspecified → other
Comment 16•13 years ago
Passing this through netops to determine if we have suitable community network in phx1. Otherwise, this equipment will need to be redirected to scl3.
Ravi, can you comment on supporting community-style networks in phx1?
Assignee: server-ops → ravi
colo-trip: --- → phx1
Component: Server Operations: DCOps → Server Operations: Netops
QA Contact: dmoore → ravi
Comment 17 (Reporter)•13 years ago
Ping on this?
Comment 18 (Reporter)•13 years ago
Second ping on this. Corey, if Ravi isn't able to look at this, can you redirect?
Comment 19•13 years ago
Huh. I don't recall seeing bug mail from your 8-AUG update.
Derek, we currently don't have any community infra configured in phx and I don't think there is really any prior art for this particular setup. We currently have an extensive community environment in scl3 and a smaller one in scl1, but those are for Mozilla projects (Bugzilla, Camino, etc).
We can spin something up, but we'll need to find some IP space to carve out for this and future growth.
Comment 20 (Reporter)•13 years ago
Thanks Ravi. Can you say when space would be available? Does it make sense to ship the server to scl3 in the meantime?
Comment 21 (Reporter)•13 years ago
Ping again. Can we have a status update?
Comment 22•13 years ago
Apologies, my CC on this bug didn't stick and I missed the last few comments.
It sounds like we may be better served forwarding this server, storage arrays, and drives to Santa Clara. I'll get staff out to Phoenix to address this.
Assignee: ravi → dmoore
Component: Server Operations: Netops → Server Operations: DCOps
QA Contact: ravi → dmoore
Comment 23•13 years ago
The hardware for this project has been shipped to SCL3.
Master tracking number: 577007415001285 (1 of 4)
Packaging type: Package
Weight: 47.00 lb.
Tracking number: 577007415001292 (2 of 4)
Packaging type: Package
Weight: 15.00 lb.
Tracking number: 577007415001308 (3 of 4)
Packaging type: Package
Weight: 19.00 lb.
Tracking number: 577007415001315 (4 of 4)
Packaging type: Package
Weight: 50.00 lb.
colo-trip: phx1 → scl3
Comment 24•13 years ago
The hardware arrived today; I have installed the drives, racked the systems, configured iLO, and inventoried them.
https://inventory.mozilla.org/en-US/systems/show/8645/
https://inventory.mozilla.org/en-US/systems/show/8644/
Back to server-ops to kickstart.
Van
Updated•13 years ago
Assignee: dmoore → server-ops
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 25 (Reporter)•13 years ago
Great news, Van. Thanks for the update.
Comment 26•13 years ago
ACK for this server to be in community.scl3.
Comment 27 (Assignee)•13 years ago
Van,
I can't get this server to pick up an address via PXE, but I'm not sure we have kickstart capabilities on the community VLAN. Thoughts?
Comment 28•13 years ago
I'm almost certain we don't have public kickstart services on the community VLAN (cc'ing dustin for clarification). Perhaps we kickstart internally and then move to external?
Comment 29 (Assignee)•13 years ago
(In reply to Derek Moore from comment #28)
> I'm almost certain we don't have public kickstart services on the community
> VLAN (cc'ing dustin for clarification). Perhaps we kickstart internally and
> then move to external?
Yeah, so I guess I'll wait for dustin. Also, was the storage blade configured already, or should I do that bit? If it was configured, did we RAID-10 that baby?
Comment 30•13 years ago
:fox2mike, I never saw a comment requesting a specific RAID config. Can you go ahead and configure it remotely? Let me know if you run into any lag issues and I'll do it locally with a crash cart.
Comment 31 (Assignee)•13 years ago
Van - No issues :) I'll try and get to it. Thanks!
Comment 32 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #0)
> - Current allocation is 5 TB, expect to need 20 TB by the end of 2012.
> - Serve files over http(s) and public rsync.
> - Answer to the media.xiph.org vhost to avoid breaking established urls.
Ralph - With RAID 1+0, which offers the best protection against multiple disk losses, we have about 16 TB of usable space on the array. If we want to take a chance and go with RAID 5+0, we'll have about 30 TB of space to use. Do you have a preference on how you'd like to go?
Also, consider that if we go with RAID 1+0 and you really need 20 TB of space now, your only option would be higher-capacity hard disks, which will take us some time to source and put in.
So to tl;dr this, pick an option? :D
1) More space = RAID 5+0 = 30 TB (no hardware change)
2) Less space, more safety = RAID 1+0 = 16 TB (no hardware change)
3) zomgwtfbbq no way = keep RAID 1+0 but need > 16 TB = we probably need higher-capacity disks, which costs more time.
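The capacity figures in these options can be sanity-checked with a back-of-envelope calculation for the 12 x 3 TB array. The RAID 5+0 grouping (two parity groups) is an assumption for illustration; the decimal-TB to binary-TiB conversion also explains why 18 TB of mirrored raw space shows up as "about 16 TB" in df:

```shell
# Rough usable capacity for 12 x 3 TB drives under the two proposed layouts.
drives=12; drive_tb=3

# RAID 1+0: half the drives hold mirror copies.
raid10_tb=$(( drives / 2 * drive_tb ))          # 18 decimal TB raw
# RAID 5+0 (assumed: two RAID-5 groups, one parity drive each).
raid50_tb=$(( (drives - 2) * drive_tb ))        # 30 decimal TB raw

# Drives are sold in decimal TB (10^12 bytes); df reports binary TiB (2^40).
awk -v a="$raid10_tb" -v b="$raid50_tb" 'BEGIN {
    f = 1e12 / 2^40
    printf "RAID 1+0: %.1f TiB usable\nRAID 5+0: %.1f TiB usable\n", a*f, b*f
}'
```

This prints roughly 16.4 TiB and 27.3 TiB, matching the "about 16TB" and "about 30TB" figures above.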
Comment 33 (Reporter)•13 years ago
RAID 1+0 please, for better safety.
This summer's donation was smaller than I expected, so 16 TB should be fine.
Comment 34 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #33)
> RAID 1+0 please, for better safety.
>
> This summer's donation was smaller than I expected, so 16 TB should be fine.
Cool. Thanks!
Comment 35 (Assignee)•13 years ago
Derek, can we get this moved to a VLAN that isn't community so I can kick the machine, and then move it back? Thanks!
Assignee: server-ops → server-ops-dcops
Component: Server Operations → Server Operations: DCOps
QA Contact: shyam → dmoore
Comment 36 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #33)
> RAID 1+0 please, for better safety.
>
> This summer's donation was smaller than I expected, so 16 TB should be fine.
Also, Ralph, any OS prefs (something non-standard will probably take more time)? If not, we'll probably slap on RHEL 6.x x86_64.
Comment 37 (Reporter)•13 years ago
Wow, I don't think I've ever had an actual RHEL system. Xiph.org has traditionally used Debian, but we're familiar with Fedora as well, so I expect RHEL 6 would be fine.
Comment 38 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #32)
> With RAID 1+0 which offers the best protection against multiple disk
> losses, we have about 16TB of usable space on the array.
After consultation with another developer, I should probably ask if RAID 6 is an option? The bug doesn't say what hardware was purchased, but wouldn't that offer increased capacity *and* better data safety? Data loss is still possible with two simultaneous failures in RAID 1+0, right?
The intended workload is serving large static files, so read performance is much more important than write.
Comment 39•13 years ago
:fox2mike, moved this to the private VLAN for you
Comment 40•13 years ago
RAID 6 is indeed safer than RAID 10. The latter can survive some two-drive failures but not others: if both failed drives are in the same mirror pair, you are SOL.
Write performance on RAID 6 is poor, and rebuild times aren't pretty either.
Just my 2c.
Comment 41 (Reporter)•13 years ago
I don't know much about it, and we do have a backup scheme of sorts. I'm happy to go with whatever you think is best for the workload.
Comment 42 (Assignee)•13 years ago
Sorry, HP doesn't support RAID 6. It's either RAID 5, RAID 5+0, RAID 1+0, or RAID 1, IIRC.
Comment 43 (Reporter)•13 years ago
Alright, 1+0 sounds best out of those. Thanks!
Comment 44 (Assignee)•13 years ago
(In reply to Derek Moore from comment #39)
> :fox2mike, moved this to the private VLAN for you
w00t! Kicked. Can we get this back into the community vlan please? :)
(In reply to Ralph Giles (:rillian) from comment #41)
> I don't know much about it, and we do have a backup scheme of sorts. I'm
> happy go with whatever you think is best for the workload.
Hey Ralph,
One last question (hopefully :p). What will you be storing on the 16TB partition? Lots of small files? large ones? So I can decide what filesystem to put on it. Thanks!
Comment 45•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #44)
>
> w00t! Kicked. Can we get this back into the community vlan please? :)
>
Done.
Updated•13 years ago
Assignee: server-ops-dcops → shyam
Component: Server Operations: DCOps → Server Operations
QA Contact: dmoore → shyam
Comment 46 (Assignee)•13 years ago
Box is kicked, working on getting it online.
Comment 47 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #44)
> One last question (hopefully :p). What will you be storing on the 16TB
> partition? Lots of small files? large ones? So I can decide what filesystem
> to put on it. Thanks!
ext4 is good.
There are a couple of hundred thousand large files. Most are a few MB. Smallest are ~200 KB, largest ~20 GB.
Comment 48 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #46)
> Box is kicked, working on getting it online.
How's this going?
Comment 49 (Assignee)•13 years ago
(In reply to Ralph Giles (:rillian) from comment #48)
> (In reply to Shyam Mani [:fox2mike] from comment #46)
> > Box is kicked, working on getting it online.
>
> How's this going?
We were hitting the ext4 16 TB limit with RHEL 6.x: http://blog.ronnyegner-consulting.de/2011/08/18/ext4-and-the-16-tb-limit-now-solved/
I'm not too happy about using something that needs to be compiled from source (not a packaged release).
I tried reiserfs too and was undecided between laughing and crying :)
[root@xiph1 ~]# time mkfs.reiserfs -l xiph-data /dev/sdb1
All data on /dev/sdb1 will be lost. Do you realy want to create reiserfs 3.6 (y/n) y
Creating reiserfs 3.6 with standard journal on /dev/sdb1
Segmentation fault (core dumped)
real 0m4.779s
user 0m0.119s
sys 0m0.689s
I finally gave in, compiled e2fsprogs-1.42.6 locally, and managed to get this up:
[root@xiph1 sbin]# time ./mkfs.ext4 -j /dev/sdb1
mke2fs 1.42.6 (21-Sep-2012)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
274710528 inodes, 4395350016 blocks
219767500 blocks (5.00%) reserved for the super user
First data block=0
134136 block groups
32768 blocks per group, 32768 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
2560000000, 3855122432
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
real 0m20.132s
user 0m6.388s
sys 0m0.376s
[root@xiph1 sbin]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda3 273G 1.6G 257G 1% /
tmpfs 18G 0 18G 0% /dev/shm
/dev/sda1 985M 65M 870M 7% /boot
/dev/sdb1 17T 129M 16T 1% /data
****
Since I compiled this from source, if you need to do anything look at the fs/modify etc, please use the versions in /opt/sbin for this.
****
So all that's left to do is ask for your usernames and SSH public keys. I'll add them to the box and give you sudo access so you can log in and manage it. We'll be happy to help out in case you get stuck.
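Since the filesystem was created with a locally built e2fsprogs, one way to make sure the /opt/sbin copies win over the distro tools on a login shell is to put them first on PATH. This is a sketch; only the /opt/sbin location comes from this bug, and the tool names are the standard e2fsprogs binaries:

```shell
# Prefer the locally built e2fsprogs in /opt/sbin over the distro copies
# in /sbin (the /opt/sbin path is from the note above; adjust if it moves).
export PATH=/opt/sbin:$PATH
hash -r   # forget any previously cached binary locations

# Sanity-check which tools would actually run before touching /dev/sdb1.
command -v mkfs.ext4 tune2fs resize2fs || true
```

This could go in root's ~/.bashrc so fsck/resize operations pick up the right versions automatically.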
Comment 50 (Reporter)•13 years ago
(In reply to Shyam Mani [:fox2mike] from comment #49)
> I tried reiserfs too and was undecided between laughing or crying :)
eep!
> [root@xiph1 sbin]# df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda3 273G 1.6G 257G 1% /
> tmpfs 18G 0 18G 0% /dev/shm
> /dev/sda1 985M 65M 870M 7% /boot
> /dev/sdb1 17T 129M 16T 1% /data
>
> ****
> Since I compiled this from source, if you need to do anything look at the
> fs/modify etc, please use the versions in /opt/sbin for this.
> ****
Noted. I think that's probably safer than using xfs would have been. I'm surprised the 16TB limit isn't fixed in an update; you'd think that would be a common configuration these days. In any case, thanks for getting it online!
> So all that's left it to do is to ask for your usernames and ssh public
> keys, I'll add them to the box and give you sudo access so you can login and
> manage it. We'll be happy to help out with stuff in case you get stuck.
Keys sent via email.
Comment 51•13 years ago
Added to jump1.community.scl3.mozilla.com, using existing keys from LDAP:
username: giles
Both supplied keys were already in LDAP
username: tterribe
I *added* the key supplied to the key already in Mozilla LDAP
So, login to jump1 using that username, then SSH to root@xiph1.community.scl3.mozilla.com. Give the changes an hour to propagate, please.
Need LDAP accounts first - I need to figure out the policy here:
username: xiphmont
username: greg
The alternative to adding LDAP accounts is to add a flow allowing direct SSH login to xiph1 from the internet.
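The two-hop login described above (jump1, then root on xiph1) can be captured in a client-side ~/.ssh/config stanza. A sketch, using the host names and the `giles` username from this comment; `ssh -W` needs OpenSSH 5.4 or newer, and newer clients could use ProxyJump instead:

```
# ~/.ssh/config -- hop through the community jump host (illustrative)
Host jump1
    HostName jump1.community.scl3.mozilla.com
    User giles

Host xiph1
    HostName xiph1.community.scl3.mozilla.com
    User root
    ProxyCommand ssh -W %h:%p jump1
```

After this, a plain `ssh xiph1` handles both hops transparently.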
Comment 52•13 years ago
We'll need mrz's signoff to create the accounts.
Comment 53 (Reporter)•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #51)
> Added to jump1.community.scl3.mozilla.com, using existing keys from LDAP:
> username: giles
> Both supplied keys were already in LDAP
That's correct. Thanks for making the changes.
> Need LDAP accounts first - I need to figure out the policy here:
> username: xiphmont
> username: greg
Monty and Greg aren't Mozilla employees. Direct access isn't essential in the near term, but it's not really a community box if community members don't have access...
> The alternative to adding LDAP accounts is to add a flow allowing direct SSH
> login to xiph1 from the internet.
That would be fine with us.
Comment 54 (Reporter)•13 years ago
Ok, I was able to log in via the jumphost and have started pulling data from the old host. So far so good.
Can you please open ports 80, 443 (http) and 873 (rsync) to this machine? At least, I assume the router is blocking traffic and that's why I can't reach the webserver externally. It doesn't look like the machine itself is blocking any ports.
Comment 55•13 years ago
Flows are in the dependent bug. That's generally done in a day or so.
Comment 56 (Assignee)•13 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #51)
> The alternative to adding LDAP accounts is to add a flow allowing direct SSH
> login to xiph1 from the internet.
I don't think we do that for *any* community host. I tried a few; none of them have SSH open to the world.
Comment 57•13 years ago
Bugzilla is. It's an option, but not a good one.
Comment 58 (Assignee)•13 years ago
Flows are done, looks like we're good to go here.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 59 (Reporter)•13 years ago
Yep, we are all online. Thanks so much, everyone, for getting this machine up!
Updated•11 years ago
Product: mozilla.org → mozilla.org Graveyard