Closed Bug 1048623 Opened 11 years ago Closed 11 years ago

openstack pxe gets stuck downloading the deploy agent kernel

Categories: Infrastructure & Operations :: RelOps: General (task)
Platform: x86_64, Windows 7
Type: task, Priority: Not set, Severity: normal
Tracking: Not tracked
Status: RESOLVED WONTFIX
People: Reporter: dividehex, Assigned: dividehex

When Ironic provisions a bare-metal node, it PXE boots into a Linux environment to run the deploy agent. In this case, the PXE firmware hangs while downloading the kernel via TFTP. I've tried restarting dnsmasq, which serves TFTP, and I've also tried cold booting the bare-metal node. The failure seems fairly consistent, since I've rebooted the node several times.

My only other thought is UDP packet loss. Both the bare-metal node and the DHCP server are in the same VLAN, so the problem is isolated to L2. I would note, though, that we are doing funny business with the ESX vSwitch: we allow promiscuous mode and MAC spoofing so that the neutron server can act as a vswitch to isolate DHCP server agents.

This might also be a good argument for flashing our bare-metal NICs with iPXE to replace TFTP with HTTP. I know there is work underway to incorporate iPXE into Ironic for just this purpose.

Aug 4 16:29:53 ironic1 dnsmasq-tftp[7370]: sent /tftpboot/pxelinux.cfg/01-00-25-90-94-21-cc to 10.26.88.14
Aug 4 16:31:20 ironic1 dnsmasq-tftp[7370]: failed sending /tftpboot/aeb23234-f5f9-4c98-8472-6cc667b7f3f9/deploy_kernel to 10.26.88.14
Aug 4 16:32:06 ironic1 dnsmasq-tftp[7370]: failed sending /tftpboot/pxelinux.cfg/01-00-25-90-94-21-cc to 10.26.88.14
Aug 4 16:32:06 ironic1 dnsmasq-tftp[7370]: message repeated 2 times: [ failed sending /tftpboot/pxelinux.cfg/01-00-25-90-94-21-cc to 10.26.88.14]
Aug 4 16:32:08 ironic1 dnsmasq-tftp[7370]: failed sending /tftpboot/aeb23234-f5f9-4c98-8472-6cc667b7f3f9/deploy_kernel to 10.26.88.14
Aug 4 16:32:08 ironic1 dnsmasq-tftp[7370]: message repeated 2 times: [ failed sending /tftpboot/aeb23234-f5f9-4c98-8472-6cc667b7f3f9/deploy_kernel to 10.26.88.14]
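To check whether the failures stay consistent across reboots, the dnsmasq-tftp lines above could be tallied per file. The following is a minimal sketch, not part of the original report: the function name `summarize_tftp_log` is hypothetical, and it assumes that a syslog "message repeated N times" entry stands for N additional occurrences of the bracketed message.

```python
import re
from collections import Counter

# Matches the event part of a dnsmasq-tftp syslog line, for example:
#   failed sending /tftpboot/pxelinux.cfg/01-00-25-90-94-21-cc to 10.26.88.14
EVENT_RE = re.compile(
    r"(?P<status>sent|failed sending) (?P<path>\S+) to (?P<client>[\d.]+)"
)
# Matches the syslog repeat wrapper: "message repeated 2 times: [ ... ]"
REPEAT_RE = re.compile(r"message repeated (?P<n>\d+) times")


def summarize_tftp_log(lines):
    """Count (status, path) pairs seen in dnsmasq-tftp log lines."""
    counts = Counter()
    for line in lines:
        if "dnsmasq-tftp[" not in line:
            continue  # skip lines from other daemons
        event = EVENT_RE.search(line)
        if event is None:
            continue  # some other dnsmasq-tftp message
        repeat = REPEAT_RE.search(line)
        # Assumption: a repeat wrapper means N extra occurrences.
        n = int(repeat.group("n")) if repeat else 1
        counts[(event.group("status"), event.group("path"))] += n
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py < /path/to/syslog
    for (status, path), n in summarize_tftp_log(sys.stdin).most_common():
        print(f"{n:4d}  {status:15s} {path}")
```

A file that is only ever reported as "failed sending" and never as "sent" points at a transfer that never completes, as opposed to occasional loss with successful retries.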
Interesting that the tftp *server* knows something has gone wrong. Does a tcpdump show any interesting evidence?
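When reading such a capture, it helps to label each UDP payload by its TFTP opcode, so that a stalled transfer (for instance the same DATA block resent with no matching ACK) stands out. The sketch below is an illustration only: the function name `describe_tftp` is hypothetical, it takes raw UDP payload bytes already extracted from a capture, and it uses the opcode numbers from RFC 1350 plus OACK from RFC 2347.

```python
import struct

# TFTP opcodes: 1-5 from RFC 1350, 6 (OACK) from RFC 2347.
OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def describe_tftp(payload: bytes) -> str:
    """Return a one-line description of a TFTP packet payload."""
    if len(payload) < 2:
        return "truncated"
    (opcode,) = struct.unpack("!H", payload[:2])
    name = OPCODES.get(opcode, f"unknown({opcode})")
    if opcode in (1, 2):
        # Read/write request: filename and mode are NUL-terminated strings.
        parts = payload[2:].split(b"\x00")
        filename = parts[0].decode("ascii", "replace")
        mode = parts[1].decode("ascii", "replace") if len(parts) > 1 else ""
        return f"{name} file={filename} mode={mode}"
    if opcode in (3, 4, 5) and len(payload) >= 4:
        (number,) = struct.unpack("!H", payload[2:4])
        if opcode == 3:
            return f"DATA block={number} bytes={len(payload) - 4}"
        if opcode == 4:
            return f"ACK block={number}"
        message = payload[4:].split(b"\x00")[0].decode("ascii", "replace")
        return f"ERROR code={number} msg={message}"
    return name
```

For example, `describe_tftp(b"\x00\x04\x00\x01")` returns `"ACK block=1"`. A run of identical `DATA block=N` lines from the server with no `ACK block=N` from the client would be consistent with the packet-loss theory in the report.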
Assignee: relops → jwatkins
Marking WONTFIX since OpenStack has been sidelined. When we reevaluate OpenStack in the future, we can check whether this has been corrected in later releases. If not, this bug will be reopened. Hopefully iPXE will be well integrated with Ironic by that time, and we won't be relying on TFTP as heavily during the deployment process.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX