Moonshot node fails to PXE boot from port 1 NIC

NEW
Unassigned

Status

a year ago
9 months ago

People

(Reporter: dividehex, Unassigned)

Tracking

(Blocks: 1 bug)

Details

Netops recently reconfigured the switch ports associated with the c5 node (bug 1424322) in chassis 7 in order to PXE boot the node.  The same procedure was done to the c39 node (bug 1423699) in chassis without issue.

In this case, PXE boot appears to complete a proper DHCP exchange and receives the correct next-server IP and bootfile (/bootx64.efi), but it is unable to download the EFI bootfile via TFTP.

The error message consists of:
NBP filesize is 0 Bytes
PXE-E18: Server response timeout.

tcpdump on the TFTP host (admin1a.private.mdc1.mozilla.com) indicates that no traffic from the node in question is reaching tftpd.

In troubleshooting this, I virtual booted a sysrescuecd environment (via iLO) and tested both NICs (eno1 and eno1d1).  eno1 initially would not get a DHCP address, but when I unchecked DHCP on the host record (t-linux64-xe-275.test.releng.mdc1.mozilla.com) in Infoblox, it successfully pulled an IP from the dynamic pool at the end of the range.  So there might be something strange going on with the Infoblox DHCP side of things.

Once both interfaces were up and configured with DHCP, I tested pulling the TFTP bootfile directly (get /bootx64.efi).  I then repeated the test with one port down and the other up, then vice versa.  All TFTP tests worked.
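For anyone who wants to repeat the direct TFTP pull without a tftp client installed in the rescue environment, here is a minimal sketch in Python of the read request described above. It is not the tool used in this bug; the function names (build_rrq, build_ack, parse_packet, tftp_get) are hypothetical, and it assumes a plain RFC 1350 server on UDP port 69 with 512-byte blocks and no option negotiation.

```python
import socket
import struct

# TFTP opcodes from RFC 1350
OP_RRQ, OP_DATA, OP_ACK, OP_ERROR = 1, 3, 4, 5
BLOCK_SIZE = 512  # default block size when no options are negotiated


def build_rrq(filename: str, mode: str = "octet") -> bytes:
    """Build a read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def build_ack(block: int) -> bytes:
    """Build an ACK for the given block number."""
    return struct.pack("!HH", OP_ACK, block)


def parse_packet(packet: bytes):
    """Split a packet into (opcode, block number or error code, payload)."""
    if len(packet) < 4:
        raise ValueError("short TFTP packet")
    opcode, number = struct.unpack("!HH", packet[:4])
    return opcode, number, packet[4:]


def tftp_get(host: str, filename: str, port: int = 69,
             timeout: float = 5.0) -> bytes:
    """Fetch a file over TFTP; raises socket.timeout if the server is silent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    data = bytearray()
    expected = 1
    try:
        sock.sendto(build_rrq(filename), (host, port))
        while True:
            # The server answers from a new ephemeral port, so reply to `peer`.
            packet, peer = sock.recvfrom(4 + BLOCK_SIZE)
            opcode, number, payload = parse_packet(packet)
            if opcode == OP_ERROR:
                message = payload.rstrip(b"\0").decode("ascii", "replace")
                raise RuntimeError(f"TFTP error {number}: {message}")
            if opcode != OP_DATA:
                continue
            # Acknowledge every DATA packet, including retransmitted ones.
            sock.sendto(build_ack(number), peer)
            if number == expected:
                data += payload
                expected = (expected + 1) & 0xFFFF
                if len(payload) < BLOCK_SIZE:
                    return bytes(data)  # a short block ends the transfer
    finally:
        sock.close()
```

Calling tftp_get("admin1a.private.mdc1.mozilla.com", "/bootx64.efi") from the node and checking that the returned length is nonzero would mirror the manual test. A socket.timeout here would correspond to the PXE-E18 symptom, though the PXE firmware's own TFTP client may behave differently (for example by negotiating options), so a success only shows that the path works from the booted OS.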

I'd also like to note that PXE booting on port 2 (eno1d1) works fine; it only fails on port 1.  UEFI BIOS settings look identical on both NICs.

Also see https://bugzilla.mozilla.org/show_bug.cgi?id=1424322#c2

Comment 1

11 months ago
I have confirmed that all ports are up with the port channels (LACP) disabled.

Updated

11 months ago
Depends on: 1430287
Depends on: 1444469