Need two machines with bad memory



5 years ago
3 months ago


(Reporter: Dolske, Assigned: Dolske)




(1 attachment)



5 years ago
Following up from email -- I've got an experiment in mind (detecting memory errors), but need to play with a machine known to have bad RAM. I'm told releng has quite a few, and that it should be possible to borrow from the pile. I'd like 2 to start with.

I don't need any particular OS, I'll probably end up installing some different OS flavors to test different things anyway.

Physical access would be much preferred, hopefully a colo pull is easy.

No rush, this is a side project for now. :)
dolske: thanks for filing. Be careful or we'll send you a pallet full.

callek/rail: can you help find a couple of machines for dolske? He doesn't mind about OS, but a couple of r2 or r3 minis would certainly fit on dolske's desk more easily than some 1U ix machines.
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: armenzg
Summary: Need a machine with bad memory → Need two machines with bad memory

Comment 2

5 years ago
Oh, and just to be explicit (and hopefully not too nitpicky): hopefully you've got some that, while they clearly have bad RAM, can manage to boot the OS and run the browser for a bit before croaking. Not too crashy and not too stable -- I like my bad RAM juuuuuust right. :)
Depends on: 864962
Depends on: 864979

Comment 3

5 years ago
Justin - Will r4 Mac minis work for you?

Comment 4

5 years ago
That would be just fine.

Comment 5

5 years ago
The Rev4 minis have proven to be the ones with the most trouble.
Conveniently, DCOps diagnosed two of them with bad memory last week.

Comment 6

5 years ago
Justin - I'll have someone drop the two mac minis at your desk.

Comment 7

5 years ago
So that we can downtime them in nagios for the appropriate amount of time, about how long do we expect these to be out of the datacenter?
Flags: needinfo?(dolske)

Comment 9

5 years ago
Let's say 2 weeks from now?
Flags: needinfo?(dolske)

Comment 10

5 years ago
:Dolske - Do you need to hold onto the two mac minis a little longer for testing?

Comment 11

5 years ago
Created attachment 755105 [details]
memtestJS in action!

Yay. This is the first bad memory found with my code. :-)

It's on the mac mini from bug 864962. It has one bit stuck at 1 that fails quite reliably. The other mini has a more subtle failure -- even a long memtest86 run can go many passes (days) with no errors. But proof-of-concept successful!
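
For readers unfamiliar with how a stuck bit shows up in a pattern test, here is a minimal sketch in Python. It is not memtestJS's actual code; the `find_bad_bits` function and the `StuckBitMemory` simulator are hypothetical names made up for illustration, and the "memory" is just a bytearray with one bit forced to 1 on read. The idea is to write complementary patterns, read them back, and XOR the result against what was written.

```python
def find_bad_bits(read, write, size, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Write each pattern to every byte, read it back, and collect mismatched bits.

    Returns a dict mapping address -> bitmask of bits that ever read back wrong.
    """
    bad = {}
    for pattern in patterns:
        for addr in range(size):
            write(addr, pattern)
        for addr in range(size):
            diff = read(addr) ^ pattern  # non-zero bits differ from what was written
            if diff:
                bad[addr] = bad.get(addr, 0) | diff
    return bad


class StuckBitMemory:
    """Simulated RAM (hypothetical) where one bit of one byte always reads as 1."""

    def __init__(self, size, stuck_addr, stuck_bit):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_mask = 1 << stuck_bit

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.stuck_addr:
            value |= self.stuck_mask  # the faulty bit ignores what was stored
        return value


mem = StuckBitMemory(1024, stuck_addr=300, stuck_bit=5)
result = find_bad_bits(mem.read, mem.write, 1024)
for addr, mask in result.items():
    print(f"address {addr}: bad bit mask {mask:#04x}")
```

A bit stuck at 1 only shows up with patterns that write a 0 to that bit, which is why the sketch uses complementary pairs of patterns. A real tester also has to deal with caches, address-line faults, and intermittent failures, none of which this simulation models.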

I'd like to hold onto this reliably-failing mini for a bit more (say, 1 month?), but I'm done with the other.

Can we keep the bad RAM once it's replaced? Got any more hanging around? I might want to try collecting a variety of bad RAM, and swap 'em through a single machine to see if it fails in differently exciting ways.
(In reply to Justin Dolske [:Dolske] from comment #11)
> I'd like to hold onto this reliably-failing mini for a bit more (say, 1
> month?), but I'm done with the other.

Releng has no problem with this. I personally cannot answer the other questions.

Comment 13

5 years ago
:dolske - Let us know when we can pick up the mac mini. We can drop off more bad memory to you once they trickle in.

Comment 14

5 years ago
:vinh - is the bad memory you dropped off on Friday, after picking up the one Mac mini (bug 864979), from that machine? Or is it just some other set of bad memory?

Comment 15

5 years ago
Justin - They are from a different mac mini.
Product: → Release Engineering
Justin: Do you need anything else from releng?  Have the Mac minis been returned?
Flags: needinfo?(dolske)


5 years ago
Assignee: nobody → dolske
Component: Buildduty → Loan Requests
QA Contact: armenzg → coop

Comment 17

5 years ago
I still have snow67 (bug 864962), the other machine (bug 864979) was returned.

I don't need anything else from releng, but thanks for the reminder. I'm hoping to wrap up some of this work shortly.
Flags: needinfo?(dolske)

Comment 18

5 years ago
Snow67 is a blocker for bug 864962.  Just let us know when the mini is ready to be picked up.  Thanks
Blocks: 864962
No longer depends on: 864962

Comment 19

5 years ago
Hi! All done with this one now, thanks for the extended loan.

2 notes:

* I've already removed the bad memory from this for my testing collection.

* The DVD drive seems to be bad too; it rather reluctantly accepted a memtest86 disc, and now refuses to eject it (despite making noises that it's trying). Presumably the drive needs to be replaced; feel free to keep the disc that's stuck in it.

I'll leave it on my desk, it's clearly tagged. Pick it up whenever, or let me know where to drop it off.

Comment 21

5 years ago
:dolske - I will swing by your desk on Friday to pick up the mini.  Thanks

Comment 22

5 years ago
Mac mini has been picked up.

Comment 23

5 years ago
I assume the loans have been completed.

dolske, thanks for looking into this!
Last Resolved: 5 years ago
Resolution: --- → FIXED

Comment 24

5 years ago
dolske, how could I run this memtest on my local machine?
Is this a type of test that releng could automate for machines that are acting up?
Component: Loan Requests → Buildduty
Product: Release Engineering → Infrastructure & Operations