Closed Bug 1426441 Opened 6 years ago Closed 5 years ago

Moonshot node fails to PXE boot from port 1 nic

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dividehex, Unassigned)

Netops recently reconfigured the switch ports associated with the c5 node (bug 1424322) in chassis 7 in order to PXE boot the node.  The same procedure was done to the c39 node (bug 1423699) in chassis without issue.

In this case, the PXE boot appears to complete a proper DHCP exchange and receives the correct next-server IP and bootfile (/bootx64.efi), but it is unable to download the EFI bootfile via TFTP.
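To double-check what the node was actually offered, the next-server IP and bootfile can be read out of a captured DHCP reply. This is only a minimal sketch: it assumes you already have the raw BOOTP/DHCP payload (UDP payload, not the whole Ethernet frame) as bytes, and it reads only the fixed header fields (siaddr and file). The function name parse_bootp_boot_fields is made up for illustration. Some servers send the bootfile in DHCP option 67 instead of the fixed file field, which this sketch does not handle.

```python
import socket

BOOTP_FIXED_HEADER_LEN = 236  # op .. file, before the options area


def parse_bootp_boot_fields(payload: bytes):
    """Return (next_server_ip, boot_filename) from a raw BOOTP/DHCP payload.

    Hypothetical helper for illustration; reads only the fixed header.
    """
    if len(payload) < BOOTP_FIXED_HEADER_LEN:
        raise ValueError("payload is shorter than the 236-byte BOOTP header")

    # siaddr (next-server) sits at bytes 20..23 of the fixed header.
    next_server = socket.inet_ntoa(payload[20:24])

    # file is a 128-byte NUL-padded field at bytes 108..235.
    file_field = payload[108:236]
    boot_filename = file_field.split(b"\x00", 1)[0].decode("ascii", errors="replace")

    return next_server, boot_filename
```

If the values returned here match the expected next-server and /bootx64.efi, the DHCP side is handing out the right boot parameters and the problem is more likely in the TFTP path.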

The error message consists of:
NBP filesize is 0 Bytes
PXE-E18: Server response timeout.

tcpdump on the TFTP host indicates that no traffic from the node in question is reaching the tftpd host (admin1a.private.mdc1.mozilla.com).

In troubleshooting this, I virtual booted a sysrescuecd environment (via iLO) and tested both NICs (eno1 and eno1d1).  eno1 initially would not get a DHCP address, but when I unchecked DHCP on the host record (t-linux64-xe-275.test.releng.mdc1.mozilla.com) in Infoblox, it successfully pulled an IP from the dynamic pool at the end of the range.  So there might be something strange going on with the Infoblox DHCP side of things.

Once both interfaces were up and configured with DHCP, I tested pulling the tftp bootfile directly (get /bootx64.efi).  I then did the same test with one port down and the other up, then vice versa.  All tftp tests worked.
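For reference, the manual "get /bootx64.efi" check can be approximated with a small script that sends a TFTP read request and waits for the first data block. This is a rough sketch, not the client I actually used: the function names build_rrq and tftp_first_block are made up, it only fetches the first block (up to 512 bytes) rather than the whole file, and it does no retransmission. A timeout here would look like the PXE-E18 symptom, i.e. no reply reaching the client.

```python
import socket
import struct

TFTP_PORT = 69
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\x00"
            + mode.encode("ascii") + b"\x00")


def tftp_first_block(server: str, filename: str, timeout: float = 5.0) -> bytes:
    """Request filename and return the payload of DATA block 1.

    Raises socket.timeout if the server never answers.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))

        # The server replies from a new ephemeral port (its transfer ID).
        reply, peer = sock.recvfrom(4 + 512)
        if len(reply) < 4:
            raise OSError("short TFTP reply")

        opcode, number = struct.unpack("!HH", reply[:4])
        if opcode == OP_ERROR:
            message = reply[4:].rstrip(b"\x00").decode("ascii", errors="replace")
            raise OSError("TFTP error %d: %s" % (number, message))
        if opcode != OP_DATA or number != 1:
            raise OSError("unexpected TFTP reply (opcode %d)" % opcode)

        # Acknowledge block 1 so the server does not keep retransmitting it.
        sock.sendto(struct.pack("!HH", OP_ACK, number), peer)
        return reply[4:]
```

Something like tftp_first_block("admin1a.private.mdc1.mozilla.com", "/bootx64.efi") would be the equivalent of the test above. To exercise one NIC at a time, take the other interface down first, as in the manual test; the script itself does not choose the egress interface.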

I'd also like to note that PXE booting on port 2 (eno1d1) works fine, just not on port 1.  The UEFI BIOS settings look identical on both NICs.

Also see https://bugzilla.mozilla.org/show_bug.cgi?id=1424322#c2
I have confirmed that all ports are up with the port channels (LACP) disabled.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME