Open Bug 1579292 Opened 5 years ago Updated 4 years ago

Add udev rule mechanism to create symlinks to nvme devices as part of AMI setup

Categories

(Webtools :: Searchfox, enhancement)

Tracking

(Not tracked)

People

(Reporter: asuth, Unassigned)

In bug 1567724 I upgraded us to the new "nitro" instances, which expose EBS volumes as NVMe devices whose names the kernel assigns in a potentially race-prone fashion. Because of our usage patterns this isn't a problem for normal indexing, but it can be a problem when the indexer encounters an error and shuts itself down, because the EBS volume is automatically re-attached when the instance is restarted.

For example, just now I restarted a failed indexer and saw the EBS volume exposed as nvme1n1 and the instance's local storage exposed as nvme2n1, which should not be possible. lsblk said:

loop0         7:0    0    18M  1 loop /snap/amazon-ssm-agent/1455
loop1         7:1    0  88.7M  1 loop /snap/core/7396
nvme0n1     259:2    0     8G  0 disk 
└─nvme0n1p1 259:3    0     8G  0 part /
nvme1n1     259:0    0   300G  0 disk 
nvme2n1     259:1    0 186.3G  0 disk 

Thankfully udev rules are designed for this scenario, and there are existing options like https://github.com/oogali/ebs-automatic-nvme-mapping. Amazon also seems to already have some kind of setup for this in their own Linux distro, so we should look into that and/or whether there are pre-existing Ubuntu packages that do the same thing.
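To illustrate what such a udev rule or helper would key off, here is a minimal Python sketch that tells EBS volumes apart from instance-local storage by reading the NVMe model string from sysfs instead of trusting the kernel's device names. The function name `classify_nvme_devices` is hypothetical. The two model strings, and the idea that the serial of an EBS device carries the volume ID, are assumptions based on how AWS describes Nitro NVMe devices; they should be checked on a real instance before relying on them.

```python
from pathlib import Path

# Assumed model strings reported by Nitro NVMe devices (verify on a real instance).
EBS_MODEL = "Amazon Elastic Block Store"
INSTANCE_STORE_MODEL = "Amazon EC2 NVMe Instance Storage"


def classify_nvme_devices(sys_block=Path("/sys/block")):
    """Map each NVMe disk name (e.g. "nvme1n1") to a (kind, serial) tuple.

    kind is "ebs", "instance-store", or "unknown". The sysfs root is a
    parameter so the function can be exercised against a fake directory tree.
    """
    result = {}
    for dev in sorted(Path(sys_block).glob("nvme*n*")):
        model_file = dev / "device" / "model"
        if not model_file.is_file():
            continue
        # sysfs pads the model string with trailing spaces, so strip it.
        model = model_file.read_text().strip()
        serial_file = dev / "device" / "serial"
        serial = serial_file.read_text().strip() if serial_file.is_file() else ""
        if model == EBS_MODEL:
            kind = "ebs"
        elif model == INSTANCE_STORE_MODEL:
            kind = "instance-store"
        else:
            kind = "unknown"
        result[dev.name] = (kind, serial)
    return result
```

Calling `classify_nvme_devices()` on an indexer would be expected to report which of nvme1n1 and nvme2n1 is the EBS volume regardless of enumeration order. A udev rule could match on the same model attribute to create stable symlinks.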

Depends on: 1567724

The context of this enhancement has changed. I corrected the mount logic in bug 1601451 to be resilient, but when re-attaching to a stopped indexer it's still annoying for a human to have to manually figure out which nvme device to use. That said, it might make sense to just moot this issue with a helper script that is run automatically when an indexer is restarted, which is what I proposed in https://github.com/mozsearch/mozsearch/pull/261#issuecomment-566280672.
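As a rough idea of what the manual step involves, the sketch below parses the default `lsblk` text output (like the listing earlier in this bug) and returns the disks that have no mountpoint and no partitions, which are the candidates a human currently has to choose between. This is not the script proposed in the pull request comment; the function name `unmounted_disks` is hypothetical, and the parser assumes the default column order NAME, MAJ:MIN, RM, SIZE, RO, TYPE, MOUNTPOINT.

```python
# Characters lsblk uses to draw the device tree in front of child entries
# (both the Unicode and the ASCII variants), plus plain indentation.
TREE_CHARS = "└├─│|`- "


def unmounted_disks(lsblk_text):
    """Return [(name, size), ...] for disks with no mountpoint and no children."""
    disks = {}      # disk name -> {"size": str, "in_use": bool}
    current = None  # most recent top-level disk, so children can mark it used
    for line in lsblk_text.splitlines():
        if not line.strip():
            continue
        is_child = line[0] in TREE_CHARS
        fields = line.lstrip(TREE_CHARS).split()
        if len(fields) < 6 or fields[0] == "NAME":
            continue  # skip the header row and anything malformed
        name, _majmin, _rm, size, _ro, dev_type = fields[:6]
        mountpoint = fields[6] if len(fields) > 6 else None
        if is_child:
            # A partition (or other child) means the parent disk is in use.
            if current is not None:
                disks[current]["in_use"] = True
            continue
        if dev_type == "disk":
            disks[name] = {"size": size, "in_use": mountpoint is not None}
            current = name
        else:
            current = None
    return [(n, d["size"]) for n, d in disks.items() if not d["in_use"]]
```

Run against the listing above, this leaves nvme1n1 (300G) and nvme2n1 (186.3G) as candidates and drops nvme0n1 because it carries the root partition. Size alone could then distinguish the two on an indexer, though matching on the device model would be less fragile.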