Closed Bug 725811 Opened 12 years ago Closed 12 years ago

Please mount cm-ixstore01 on surf for extra storage

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: coop, Unassigned)

Details

(Whiteboard: [stage][cleanup][storage][capacity])

We're currently hemorrhaging for space (bug 715840) and CPU (bug 711176) on stage.

Yesterday, Dustin sent mail to infra-all about an unused/under-used thumper system (nm-sun-xf01). Can we please stand this system up as extra storage for releng?

Releng would like to start shifting some load to this other storage, specifically either production try server builds OR release builds. We would double-upload these builds to stage as well, but point automation to the thumper.

This will hopefully be a temporary situation until the new NetApp appears in scl3.
Blocks: 725816
Assignee: server-ops-releng → dustin
jhopkins also emailed about using momo's thumper.  Neither is an attractive option -- both are not entirely unused, and nm-sun-xf01's capacity can't be shared for security reasons.

We have storage available on cm-ixstore01 that is not any less reliable than the thumpers, and much more accessible.

I'm bumping this back to releng in lieu of WONTFIXing it, as requested.
Assignee: dustin → nobody
Severity: critical → normal
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Per IRC, we are pausing this bug while we investigate the alternate workaround in bug#725838.

(In reply to Dustin J. Mitchell [:dustin] from comment #1)
> jhopkins also emailed about using momo's thumper.  Neither is an attractive
> option -- both are not entirely unused, and nm-sun-xf01's capacity can't be
> shared for security reasons.
Didn't follow: are you saying the *capacity* of the disk is classified?
* Can the data which *is* there be moved to tape?
* Is there enough unused space for RelEng to use if bug#725838 doesn't solve the problem?


> We have storage available on cm-ixstore01 that is not any less reliable than
> the thumpers, and much more accessible.
* Is there enough unused space for RelEng to use if bug#725838 doesn't solve the problem?


> I'm bumping this back to releng in lieu of WONTFIXing it, as requested.
Yep. Downgrading this; raising bug#725858 as that is now where critical-fix-work will be done.

(Also, tweaking summary - this disk would be used as "cache" to reduce load on stage/surf, not as additional storage.)
Depends on: 725838
Summary: Please stand up nm-sun-xf01 as extra storage for releng → Please stand up nm-sun-xf01 as temp "cache" storage for releng
Just to be clear, the thumpers are not a tier1 storage solution that's supported by the SRE group and are therefore not a viable solution to this (or any, at this point) problem. They are further down the ladder of desirability than cm-ixstore01 (also not a tier1 storage solution, but at least still under the purview of the SRE group) where releng already has GOBS of non-mission-critical space.
(In reply to Amy Rich [:arich] [:arr] from comment #3)
> Just to be clear, the thumpers are not a tier1 storage solution that's
> supported by the SRE group and are therefore not a viable solution to this
> (or any, at this point) problem. They are further down the ladder of
> desirability than cm-ixstore01 (also not a tier1 storage solution, but at
> least still under the purview of the SRE group) where releng already has
> GOBS of non-mission-critical space.

Based on the thumper feedback, I've changed the summary on this bug. 

It's now on releng to figure out whether we can carve out a(nother) slice of the existing firefox partition to go on a new partition on cm-ixstore01.
Severity: normal → major
Priority: -- → P2
Summary: Please stand up nm-sun-xf01 as temp "cache" storage for releng → Please mount cm-ixstore01 on surf for extra storage
Whiteboard: [stage][cleanup][storage][capacity]
bug 708865 would save a lot of space on the firefox mount on stage.
We have those backed up to tape, but I'm still a bit wary of moving these to a disk that we know had significant issues very recently.
If they're on the tape, and presumably not changing subsequently (thereby invalidating the backup), what's the issue with putting them on cm-ixstore01?
(In reply to Dustin J. Mitchell [:dustin] from comment #6)
> If they're on the tape, and presumably not changing subsequently (thereby
> invalidating the backup), what's the issue with putting them on cm-ixstore01?

They should not change.

My scenario:

a) cm-ixstore01 blows up, again
b) the tape backup, singular, is not valid or has issues.

And boom, we lose years' worth of builds, just like that.
There's always a data-loss scenario -- it's a question of how unlikely.  The one you suggest is in fact a triple-failure, since cm-ixstore01 is using RAID.  Bear in mind the changes made to cm-ixstore01 from the original, failing configuration (specifically, using software RAID and bypassing the hardware controllers).

This sounds like a good trade-off to me, but I'll let y'all come up with alternatives and make the call.
Nick has moved things around. Hopefully we'll be fine until scl3.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering