Closed Bug 866338 Opened 11 years ago Closed 11 years ago

Need two machines with bad memory

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Dolske, Assigned: Dolske)

Attachments

(1 file)

Following up from email -- I've got an experiment in mind (detecting memory errors), but need to play with a machine known to have bad RAM. I'm told releng has quite a few, and that it should be possible to borrow from the pile. I'd like 2 to start with.

I don't need any particular OS, I'll probably end up installing some different OS flavors to test different things anyway.

Physical access would be much preferred, hopefully a colo pull is easy.

No rush, this is a side project for now. :)
dolske: thanks for filing. Be careful or we'll send you a pallet full.

callek/rail: can you help find a couple of machines for dolske? He doesn't mind about OS, but delivering a couple of r2 or r3 minis is certainly easier to fit on dolske's desk than some 1U ix machines.
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: armenzg
Summary: Need a machine with bad memory → Need two machines with bad memory
Oh, and just to be explicit (and hopefully not too nitpicky): hopefully you've got some that, while they clearly have bad RAM, can manage to boot the OS and run the browser for a bit before croaking. Not too crashy and not too stable -- I like my bad RAM juuuuuust right. :)
Depends on: 864962
Depends on: 864979
Justin - Will r4 Mac minis work for you?
That would be just fine.
The Rev4 minis have proven to be the ones that have the most trouble.
It's good that last week DCOps diagnosed two of them with bad memory.
Justin - I'll have someone drop the two mac minis at your desk.
Assets: 06037, 06025
For the purposes of downtiming them for the appropriate amount of time in nagios, about how long do we expect these to be out of the datacenter?
Flags: needinfo?(dolske)
Let's say 2 weeks from now?
Flags: needinfo?(dolske)
:Dolske - Do you need to hold onto the two mac minis a little longer for testing?
Attached image memtestJS in action!
Yay. This is the first bad memory found with my code. :-)

It's on the mac mini from bug 864962. It has one bit stuck at 1 that fails quite reliably. The other mini has a more subtle failure -- even a long memtest86 run can go many passes (days) with no errors. But proof-of-concept successful!

I'd like to hold onto this reliably-failing mini for a bit more (say, 1 month?), but I'm done with the other.

Can we keep the bad RAM once it's replaced? Got any more hanging around? I might want to try collecting a variety of bad RAM, and swap 'em through a single machine to see if it fails in differently exciting ways.
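
For readers unfamiliar with the failure mode described above (one bit stuck at 1), here is a minimal sketch of how a pattern-based memory test can catch it. This is not the memtestJS code from the attachment, and it does not touch real RAM: the `FaultyMemory` class, its stuck-bit fault model, and the `pattern_test` helper are all hypothetical names made up for illustration. The idea is simply to write known patterns, read them back, and XOR the two values to see which bits disagree.

```python
class FaultyMemory:
    """Simulated RAM (hypothetical): a bytearray in which one bit is stuck at 1."""

    def __init__(self, size, stuck_addr, stuck_bit):
        self._cells = bytearray(size)
        self._stuck_addr = stuck_addr
        self._stuck_mask = 1 << stuck_bit

    def __len__(self):
        return len(self._cells)

    def write(self, addr, value):
        self._cells[addr] = value & 0xFF

    def read(self, addr):
        value = self._cells[addr]
        if addr == self._stuck_addr:
            # The faulty bit reads back as 1 no matter what was written.
            value |= self._stuck_mask
        return value


def pattern_test(mem, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill memory with each pattern, read it back, and collect mismatches.

    Returns a list of (address, expected, actual, differing_bits) tuples.
    """
    errors = []
    for pattern in patterns:
        for addr in range(len(mem)):
            mem.write(addr, pattern)
        for addr in range(len(mem)):
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got, got ^ pattern))
    return errors


# Assumed fault for the demo: bit 3 of byte 300 is stuck at 1.
mem = FaultyMemory(size=1024, stuck_addr=300, stuck_bit=3)
errors = pattern_test(mem)
for addr, expected, got, diff in errors:
    print(f"addr {addr}: wrote {expected:#04x}, read {got:#04x}, bad bits {diff:#04x}")
```

A stuck-at-1 bit only shows up for patterns that try to store a 0 in that position, which is why tests like this cycle through complementary patterns (0x55 and 0xAA, 0x00 and 0xFF) rather than a single value. An intermittent fault like the one on the second mini would need many repeated passes and could still be missed.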
(In reply to Justin Dolske [:Dolske] from comment #11)
> I'd like to hold onto this reliably-failing mini for a bit more (say, 1
> month?), but I'm done with the other.

Releng has no problem with this. I personally cannot answer the other Q's
:dolske - Let us know when we can pick up the mac mini. We can drop off more bad memory to you once they trickle in.
:vinh - is the bad memory you dropped off after picking up the one Mac mini (bug 864979) on Friday from that machine? Or just some other set of bad memory?
Justin - They are from a different mac mini.
Product: mozilla.org → Release Engineering
Justin: Do you need anything else from releng?  Have the Mac minis been returned?
Flags: needinfo?(dolske)
Assignee: nobody → dolske
Component: Buildduty → Loan Requests
QA Contact: armenzg → coop
I still have snow67 (bug 864962), the other machine (bug 864979) was returned.

I don't need anything else from releng, but thanks for the reminder. I'm hoping to wrap up some of this work shortly.
Flags: needinfo?(dolske)
Snow67 is a blocker for bug 864962.  Just let us know when the mini is ready to be picked up.  Thanks
Blocks: 864962
No longer depends on: 864962
Hi! All done with this one now, thanks for the extended loan.

2 notes:

* I've already removed the bad memory from this for my testing collection.

* The DVD drive seems to be bad too; it had rather reluctantly accepted a memtest86 disc, and now refuses to eject it (despite making noises that it's trying). Presumably the drive needs to be replaced; feel free to keep the disc that's stuck in it.

I'll leave it on my desk, it's clearly tagged. Pick it up whenever, or let me know where to drop it off.
:dolske - I will swing by your desk on Friday to pick up the mini.  Thanks
Mac mini has been picked up.
I assume the loans have been completed.

dolske, thanks for looking into this!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
dolske, how could I run this memtest test on my local machine?
Is this a type of test that releng could automate for machines that are acting up?
Component: Loan Requests → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard