Experiment with OpenStack as a future cloud deployment platform for releng

RESOLVED FIXED

5 years ago
4 years ago

People

(Reporter: dustin, Assigned: dividehex)

Details

Let's get an instance of OpenStack set up in the relabs VLAN.  The cloudstack hosts can be VMs, and we can use ix-mn hardware for bare-metal stuff.
Depends on: 963172
openstack1.relabs.releng.scl3 should be ready for you.  I landed the node change on 'default', and puppetized against it in my puppet environment.  Have a look and, if it looks OK, merge to prod?
(Assignee)

Comment 2

5 years ago
thanks Dustin.  lg, I've merged to prod
(Assignee)

Comment 3

5 years ago
Trying to install the openstack rpms failed because of broken dependencies.  I disabled puppet and upgraded the OS against the external centos repos, since the puppetagain repos are not up-to-date.

Installation is going much smoother now.  Running 2013.2.1-1.el6 from the openstack-havana repo
(Assignee)

Comment 4

5 years ago
I have some of the openstack components running at the moment.  This includes keystone, glance, and nova.  All have been registered with keystone, and glance has a test image imported.  Next steps will be to install a compute node and test a deployment.
Depends on: 968615
No longer depends on: 968615
(Assignee)

Comment 5

5 years ago
I successfully set up a compute node on hp6.relabs and was able to launch kvm instances from both the web ui and the CLI.  It took some mucking with the nic interfaces and some bridging to accomplish this, but it is working nicely.  I also ran into a permission issue which prevented the flatnetwork driver from properly altering iptables on the nova compute host.  I couldn't find it mentioned in the installation guide, but a sudoers rule needed to be added on both the controller and the compute host to allow the wrapper script to do so.

The next step here is to prep some ix-mn machines for baremetal testing.  I'll be using ix-mn-2 and ix-mn-3 as baremetal nodes and hp5 as a baremetal compute host.  I might also spin up a new separate OS controller instance for simplicity's sake.  These will all be moved to a different vlan (vlan 260, mobile.releng) since the baremetal compute host must serve its own managed dhcp.  Dhcp-helper has already been removed in bug 971253.
I had to do:

keystone role-create --name="Member"

so that I could add additional projects (it was throwing errors about not having role Member in the http logs).  I'm not sure if role _member_ (which was created) was a typo or if that's used for something else.
(Assignee)

Updated

5 years ago
Depends on: 974697
(Assignee)

Comment 7

5 years ago
I ran through the instructions on baremetal provisioning and was able to complete most of the setup.  The instructions can be found here: https://wiki.openstack.org/wiki/Baremetal

I decided to reuse openstack1.relabs.  I've added a second nic (trunked to mobile.releng) and we'll need to reconfigure the networking on it.

* databases created and initialized
* nova configured for baremetal
* extra packages and services installed (ipmitools, dnsmasq, etc)
* baremetal ubuntu image created plus pxe boot image (kernel + initrd)
* images uploaded and associated in glance.
* various directories created

Not completed:
* dnsmasq started
* hardware enrollment
These are dependent on getting the ix-mn and hp nodes moved over to the mobile.releng vlan


-------- Snippets from CLI ----------

bin/disk-image-create -u baremetal base ubuntu -o pxe
-rw-r--r-- 1 root root   22121158 Feb 19 16:39 pxe.initrd
-rw-r--r-- 1 root root 1086521344 Feb 19 16:39 pxe.qcow2
-rw-r--r-- 1 root root    5631792 Feb 19 16:39 pxe.vmlinuz

bin/ramdisk-image-create -a i386 -o pxe-ramdisk deploy ramdisk base ubuntu
-rw-r--r-- 1 root root 77917122 Feb 19 16:54 pxe-ramdisk.initramfs
-rw----r-- 1 root root  5664656 Feb 19 16:54 pxe-ramdisk.kernel


[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name bm-ubuntu-vmlinuz --public --disk-format aki < pxe.vmlinuz
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 3d20b791dbf143334225f6fb78717909     |
| container_format | aki                                  |
| created_at       | 2014-02-20T00:59:50                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | 0a9959e7-60a2-46a5-a162-2f62efabd83e |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | bm-ubuntu-vmlinuz                    |
| owner            | 0e2ab7939c944174a47b7b8cbcdcfa3a     |
| protected        | False                                |
| size             | 5631792                              |
| status           | active                               |
| updated_at       | 2014-02-20T00:59:51                  |
+------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name bm-ubuntu-initrd --public --disk-format ari < pxe.initrd
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | f57144279a299572751710935f8b8c9c     |
| container_format | ari                                  |
| created_at       | 2014-02-20T01:01:07                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | 1b3a6ef2-2305-422c-83b7-cc60cd5ebdca |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | bm-ubuntu-initrd                     |
| owner            | 0e2ab7939c944174a47b7b8cbcdcfa3a     |
| protected        | False                                |
| size             | 22121158                             |
| status           | active                               |
| updated_at       | 2014-02-20T01:01:08                  |
+------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name bm-ubuntu-image --public --disk-format qcow2 --container-format bare --property kernel_id=0a9959e7-60a2-46a5-a162-2f62efabd83e --property ramdisk_id=1b3a6ef2-2305-422c-83b7-cc60cd5ebdca < pxe.qcow2
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| Property 'kernel_id'  | 0a9959e7-60a2-46a5-a162-2f62efabd83e |
| Property 'ramdisk_id' | 1b3a6ef2-2305-422c-83b7-cc60cd5ebdca |
| checksum              | 6e34c100ec33648a343c89619c414501     |
| container_format      | bare                                 |
| created_at            | 2014-02-20T01:04:25                  |
| deleted               | False                                |
| deleted_at            | None                                 |
| disk_format           | qcow2                                |
| id                    | a77fcc5a-6d9b-478f-ad48-b3f275ddd0ad |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | bm-ubuntu-image                      |
| owner                 | 0e2ab7939c944174a47b7b8cbcdcfa3a     |
| protected             | False                                |
| size                  | 1086521344                           |
| status                | active                               |
| updated_at            | 2014-02-20T01:04:38                  |
+-----------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name deploy-vmlinuz --public --disk-format aki < pxe-ramdisk.kernel
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 73e911c66cc052132e280a3b684137f0     |
| container_format | aki                                  |
| created_at       | 2014-02-20T01:06:00                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | 98f2a05e-d6b6-4812-8581-4c2c4a625bc2 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | deploy-vmlinuz                       |
| owner            | 0e2ab7939c944174a47b7b8cbcdcfa3a     |
| protected        | False                                |
| size             | 5664656                              |
| status           | active                               |
| updated_at       | 2014-02-20T01:06:00                  |
+------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name deploy-initrd --public --disk-format ari < pxe-ramdisk.initramfs
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | e789a9bf0e0002f69ac142ce639ade44     |
| container_format | ari                                  |
| created_at       | 2014-02-20T01:06:53                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | 128fb087-1229-4e4d-a226-b7bb222f5866 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | deploy-initrd                        |
| owner            | 0e2ab7939c944174a47b7b8cbcdcfa3a     |
| protected        | False                                |
| size             | 77917122                             |
| status           | active                               |
| updated_at       | 2014-02-20T01:06:53                  |
+------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# nova flavor-create ix-mn.crap 100 7864 250 8
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name       | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+
| 100 | ix-mn.crap | 7864      | 250  | 0         |      | 8     | 1.0         | True      |
+-----+------------+-----------+------+-----------+------+-------+-------------+-----------+

nova flavor-key ix-mn.crap set cpu_arch=x86_64 "baremetal:deploy_kernel_id"=98f2a05e-d6b6-4812-8581-4c2c4a625bc2 "baremetal:deploy_ramdisk_id"=128fb087-1229-4e4d-a226-b7bb222f5866


mysql> CREATE DATABASE nova_bm;
Query OK, 1 row affected (0.00 sec)

mysql> GRANT ALL ON nova_bm.* TO 'nova_bm'@'localhost' IDENTIFIED BY '$password';
Query OK, 0 rows affected (0.22 sec)


[root@openstack1.relabs.releng.scl3.mozilla.com nova]# mkdir -p /var/lib/nova/baremetal/dnsmasq
[root@openstack1.relabs.releng.scl3.mozilla.com nova]# mkdir -p /var/lib/nova/baremetal/console
[root@openstack1.relabs.releng.scl3.mozilla.com nova]# chown -R nova /var/lib/nova/baremetal
(Assignee)

Comment 8

5 years ago
I've increased the memory on openstack1 from 4gb to 8gb so the disk imaging utilities may take advantage of building with tmpfs (in ram) instead of on disk.
my notes (so I can find 'em later) - https://etherpad.mozilla.org/l3nEPnrR6n
(Assignee)

Comment 10

5 years ago
hp6, ix-mn-3 and ix-mn-4 have been moved to mobile.releng along with their OOB ports (w/ static ips)

I've launched dnsmasq and registered ix-mn-3 to the nova-compute host.  But when I attempt to launch a bare metal instance, it fails (as expected).
There are several things here that I will follow up on.  

* nova-compute might not be configured correctly
* netconfig template needs to be created
* nova-compute host name in baremetal node list doesn't match the hostname in service list
* nova-compute is trying to write to directories that have bad perms or don't exist
* baremetal node list does not show mac.  Might not have been registered correctly

------ some CLI snippets from today ---------

sudo dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/var/run/dnsmasq.pid --interface=eth1 --dhcp-range=10.26.61.150,10.26.61.199

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova baremetal-node-create --pm_address=10.26.61.102 --pm_user=############### --pm_password=########## openstack1.mobile.releng.scl3.mozilla.com 8 7864 250 00:25:90:94:22:b0
+---------------+----------------------------------------------------------------------------------------+
| Property      | Value                                                                                  |
+---------------+----------------------------------------------------------------------------------------+
| instance_uuid | None                                                                                   |
| pm_address    | 10.26.61.102                                                                           |
| interfaces    | [{u'datapath_id': None, u'id': 1, u'port_no': None, u'address': u'00:25:90:94:22:b0'}] |
| cpus          | 8                                                                                      |
| memory_mb     | 7864                                                                                   |
| service_host  | openstack1.mobile.releng.scl3.mozilla.com                                              |
| local_gb      | 250                                                                                    |
| id            | 1                                                                                      |
| pm_user       | ############                                                                           |
| terminal_port | None                                                                                   |
+---------------+----------------------------------------------------------------------------------------+
Host networking looks OK.  A dhclient run on ix-mn-3 gets me 10.26.61.177, and I see that in dnsmasq.leases.
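
As a quick sanity check on the lease, here is a minimal Python 3 sketch that confirms an address sits inside the pool handed to dnsmasq via --dhcp-range.  The helper name in_dhcp_range is made up for illustration; the addresses are the ones quoted in this bug.

```python
import ipaddress

# Pool boundaries taken from the --dhcp-range argument passed to dnsmasq.
RANGE_START = ipaddress.IPv4Address("10.26.61.150")
RANGE_END = ipaddress.IPv4Address("10.26.61.199")


def in_dhcp_range(address):
    """Return True if address falls inside the dnsmasq pool (inclusive)."""
    ip = ipaddress.IPv4Address(address)
    return RANGE_START <= ip <= RANGE_END


if __name__ == "__main__":
    # 10.26.61.177 is the lease ix-mn-3 received; 10.26.61.102 is the
    # pm_address used when registering the node, which should be static.
    for addr in ("10.26.61.177", "10.26.61.102"):
        print(addr, in_dhcp_range(addr))
```

The lease falls inside the pool, while the pm_address falls outside it, which is what you'd want for a statically addressed OOB port.
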

I tried creating an instance with
  nova boot --flavor ix-mn.crap --image bm-ubuntu-image test-bm-inst

A look at logfiles that change when creating a new node shows the images being downloaded from glance:

2014-02-21 05:36:09.169 2106 INFO glance.registry.api.v1.images [d1d98b0b-09e1-4431-b4a2-0263b54fe99e 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image 0a9959e7-60a2-46a5-a162-2f62efabd83e
2014-02-21 05:36:09.184 2106 INFO glance.registry.api.v1.images [6debdca3-8c5c-4a2b-b30f-4af7c2e180a5 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image 0a9959e7-60a2-46a5-a162-2f62efabd83e
2014-02-21 05:36:09.200 2106 INFO glance.registry.api.v1.images [175f03f4-cfe1-4742-9189-68a6f9b500ed 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image 1b3a6ef2-2305-422c-83b7-cc60cd5ebdca
2014-02-21 05:36:09.216 2106 INFO glance.registry.api.v1.images [bcffec93-e8d1-46aa-81bd-9e9e206e4aa7 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image 1b3a6ef2-2305-422c-83b7-cc60cd5ebdca
2014-02-21 05:36:09.372 2106 INFO glance.registry.api.v1.images [454d5dba-3331-484d-84d4-4ed9ab42ca9b 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image a77fcc5a-6d9b-478f-ad48-b3f275ddd0ad
2014-02-21 05:36:09.388 2106 INFO glance.registry.api.v1.images [a95081a0-77c1-44be-a307-3ac3272138ed 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Successfully retrieved image a77fcc5a-6d9b-478f-ad48-b3f275ddd0ad

No other interesting files under /var/log/ changed.  So the logging is going somewhere else.

In the iKVM view, I don't see any evidence of power management, nor do I see any traffic on eth1 (the vlan260 interface).

I enabled verbose logging in nova and restarted all of the openstack-nova-* services set to run in runlevel 3.  On trying again, still no traffic, but there's a message!

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova boot --flavor ix-mn.crap --image bm-ubuntu-image test-bm-instance
+--------------------------------------+-----------------------------------------------------------------------------------------------+
| Property                             | Value                                                                                         |
+--------------------------------------+-----------------------------------------------------------------------------------------------+
| OS-EXT-STS:task_state                | None                                                                                          |
| image                                | bm-ubuntu-image                                                                               |
| OS-EXT-STS:vm_state                  | error                                                                                         |
| OS-EXT-SRV-ATTR:instance_name        | instance-0000000d                                                                             |
| OS-SRV-USG:launched_at               | None                                                                                          |
| flavor                               | ix-mn.crap                                                                                    |
| id                                   | 23462889-3721-4902-bf47-7bfac4cdbf3e                                                          |
| security_groups                      | [{u'name': u'default'}]                                                                       |
| user_id                              | 4c05e549d8d24c0aba0a21433c0cf66a                                                              |
| OS-DCF:diskConfig                    | MANUAL                                                                                        |
| accessIPv4                           |                                                                                               |
| accessIPv6                           |                                                                                               |
| OS-EXT-STS:power_state               | 0                                                                                             |
| OS-EXT-AZ:availability_zone          | nova                                                                                          |
| config_drive                         |                                                                                               |
| status                               | ERROR                                                                                         |
| updated                              | 2014-02-21T13:45:21Z                                                                          |
| hostId                               |                                                                                               |
| OS-EXT-SRV-ATTR:host                 | None                                                                                          |
| OS-SRV-USG:terminated_at             | None                                                                                          |
| key_name                             | None                                                                                          |
| OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                                                                          |
| name                                 | test-bm-instance                                                                              |
| adminPass                            | ############                                                                                  |
| tenant_id                            | 0e2ab7939c944174a47b7b8cbcdcfa3a                                                              |
| created                              | 2014-02-21T13:45:20Z                                                                          |
| os-extended-volumes:volumes_attached | []                                                                                            |
| fault                                | {u'message': u'No valid host was found. ', u'code': 500, u'created': u'2014-02-21T13:45:21Z'} |
| metadata                             | {}                                                                                            |
+--------------------------------------+-----------------------------------------------------------------------------------------------+

And *slightly* more verbose logging:

2014-02-21 05:45:21.084 11541 INFO nova.scheduler.filter_scheduler [req-b6af021a-9748-46b6-b432-919c7edf8161 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Attempting to build 1 instance(s) uuids: [u'23462889-3721-4902-bf47-7bfac4cdbf3e']
2014-02-21 05:45:21.093 11541 WARNING nova.scheduler.driver [req-b6af021a-9748-46b6-b432-919c7edf8161 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: 23462889-3721-4902-bf47-7bfac4cdbf3e] Setting instance to ERROR state.

That message corresponds to the NoValidHost exception, which is raised by the chance and filter_scheduler schedulers.  Those are defined in the python-nova RPM.

I set debug=True in nova.conf and also added some extra logging to /usr/lib/python2.6/site-packages/nova/scheduler/filter_scheduler.py.  Lots more output, although most of it is a dump of the AMQP traffic.  Still:

2014-02-21 06:02:03.706 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Starting with 2 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:70
2014-02-21 06:02:03.706 11813 DEBUG nova.scheduler.filters.retry_filter [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Host [u'openstack1.relabs.releng.scl3.mozilla.com', u'aea07aca-7103-44cc-999a-9ca7b69628cf'] passes.  Previously tried hosts: [] host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/retry_filter.py:45
2014-02-21 06:02:03.707 11813 DEBUG nova.scheduler.filters.retry_filter [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Host [u'hp6.relabs.releng.scl3.mozilla.com', u'hp6.relabs.releng.scl3.mozilla.com'] passes.  Previously tried hosts: [] host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/retry_filter.py:45
2014-02-21 06:02:03.707 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Filter RetryFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85
2014-02-21 06:02:03.707 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Filter AvailabilityZoneFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85
2014-02-21 06:02:03.708 11813 DEBUG nova.scheduler.filters.ram_filter [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] (hp6.relabs.releng.scl3.mozilla.com, hp6.relabs.releng.scl3.mozilla.com) ram:2251 disk:214016 io_ops:0 instances:2 does not have 7864 MB usable ram, it only has 2251.0 MB usable ram. host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/ram_filter.py:60
2014-02-21 06:02:03.708 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Filter RamFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85
2014-02-21 06:02:03.708 11813 DEBUG nova.servicegroup.api [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Check if the given member [{u'binary': u'nova-compute', u'deleted': 0L, u'created_at': datetime.datetime(2014, 2, 21, 0, 58, 23), u'updated_at': datetime.datetime(2014, 2, 21, 14, 1, 59), u'report_count': 4701L, u'topic': u'compute', u'host': u'openstack1.relabs.releng.scl3.mozilla.com', u'disabled': False, u'deleted_at': None, u'disabled_reason': None, u'id': 7L}] is part of the ServiceGroup, is up service_is_up /usr/lib/python2.6/site-packages/nova/servicegroup/api.py:94
2014-02-21 06:02:03.709 11813 DEBUG nova.servicegroup.drivers.db [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] DB_Driver.is_up last_heartbeat = 2014-02-21 14:01:59 elapsed = 4.709111 is_up /usr/lib/python2.6/site-packages/nova/servicegroup/drivers/db.py:71
2014-02-21 06:02:03.709 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Filter ComputeFilter returned 1 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85
2014-02-21 06:02:03.709 11813 DEBUG nova.scheduler.filters.compute_capabilities_filter [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] extra_spec requirement 'x86_64' does not match {i386|x86_64}' _satisfies_extra_specs /usr/lib/python2.6/site-packages/nova/scheduler/filters/compute_capabilities_filter.py:63
2014-02-21 06:02:03.710 11813 DEBUG nova.scheduler.filters.compute_capabilities_filter [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] (openstack1.relabs.releng.scl3.mozilla.com, aea07aca-7103-44cc-999a-9ca7b69628cf) ram:7864 disk:256000 io_ops:0 instances:0 fails instance_type extra_specs requirements host_passes /usr/lib/python2.6/site-packages/nova/scheduler/filters/compute_capabilities_filter.py:73
2014-02-21 06:02:03.710 11813 DEBUG nova.filters [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Filter ComputeCapabilitiesFilter returned 0 host(s) get_filtered_objects /usr/lib/python2.6/site-packages/nova/filters.py:85
2014-02-21 06:02:03.710 11813 INFO nova.scheduler.filter_scheduler [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] HERE 7
2014-02-21 06:02:03.711 11813 WARNING nova.scheduler.driver [req-0dea13f6-cdea-4c2f-bdc6-70053b41eb93 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: cc51446a-4a65-4144-8bd4-7b7162b6dac7] Setting instance to ERROR state.
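
With this much debug output it can help to pull out only the per-filter summaries.  The following is a rough Python 3 sketch, not anything shipped with nova; it assumes the "Filter <name> returned <n> host(s)" wording seen in the log above, and the function names are hypothetical.

```python
import re

# Matches the summary nova.filters logs after each filter runs,
# e.g. "Filter RamFilter returned 1 host(s)".
FILTER_RE = re.compile(r"Filter (\w+) returned (\d+) host\(s\)")

# Message text copied from the debug log in this bug, with the
# timestamp/request-id prefixes trimmed for brevity.
SAMPLE = """\
Filter RetryFilter returned 2 host(s)
Filter AvailabilityZoneFilter returned 2 host(s)
Filter RamFilter returned 1 host(s)
Filter ComputeFilter returned 1 host(s)
Filter ComputeCapabilitiesFilter returned 0 host(s)
"""


def filter_counts(log_text):
    """Return [(filter_name, hosts_remaining), ...] in log order."""
    return [(name, int(count)) for name, count in FILTER_RE.findall(log_text)]


def first_empty_filter(log_text):
    """Return the name of the first filter that left zero hosts, or None."""
    for name, count in filter_counts(log_text):
        if count == 0:
            return name
    return None


if __name__ == "__main__":
    print(first_empty_filter(SAMPLE))
```

The regex also matches the full, unabridged log lines, since it only looks for the summary phrase within each line.
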

I'm guessing
  extra_spec requirement 'x86_64' does not match {i386|x86_64}'
is the problem.  But I'm not sure where those values are coming from.
Sure enough, that {i386|x86_64} value seems to be associated with the compute node:

INSERT INTO `compute_nodes` (`created_at`, `updated_at`, `deleted_at`, `id`, `service_id`, `vcpus`, `memory_mb`, `local_gb`, `vcpus_used`, `memory_mb_used`, `local_gb_used`, `hypervisor_type`, `hypervisor_version`, `cpu_info`, `disk_available_least`, `free_ram_mb`, `free_disk_gb`, `current_workload`, `running_vms`, `hypervisor_hostname`, `deleted`, `host_ip`, `supported_instances`, `pci_stats`) VALUES ('2014-02-21 02:46:24','2014-02-21 15:04:57',NULL,2,7,8,7864,250,0,0,0,'baremetal',1,'baremetal cpu',NULL,7864,250,0,0,'aea07aca-7103-44cc-999a-9ca7b69628cf',0,'192.168.91.10','[[\"{i386|x86_64}\", \"baremetal\", \"baremetal\"]]','{}');

and (probably derivative)

INSERT INTO `compute_node_stats` (`created_at`, `updated_at`, `deleted_at`, `id`, `compute_node_id`, `key`, `value`, `deleted`) VALUES ('2014-02-21 02:46:24',NULL,NULL,36,2,'cpu_arch','{i386|x86_64}',0);
Ah:

nova.conf:
[baremetal]
instance_type_extra_specs = cpu_arch:{i386|x86_64}

So

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova flavor-key ix-mn.crap set cpu_arch='{i386|x86_64}'
[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova flavor-show ix-mn.crap
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property                   | Value                                                                                                                                                                            |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| name                       | ix-mn.crap                                                                                                                                                                       |
| ram                        | 7864                                                                                                                                                                             |
| OS-FLV-DISABLED:disabled   | False                                                                                                                                                                            |
| vcpus                      | 8                                                                                                                                                                                |
| extra_specs                | {u'cpu_arch': u'{i386|x86_64}', u'baremetal:deploy_kernel_id': u'98f2a05e-d6b6-4812-8581-4c2c4a625bc2', u'baremetal:deploy_ramdisk_id': u'128fb087-1229-4e4d-a226-b7bb222f5866'} |
| swap                       |                                                                                                                                                                                  |
| os-flavor-access:is_public | True                                                                                                                                                                             |
| rxtx_factor                | 1.0                                                                                                                                                                              |
| OS-FLV-EXT-DATA:ephemeral  | 0                                                                                                                                                                                |
| disk                       | 250                                                                                                                                                                              |
| id                         | 100                                                                                                                                                                              |
+----------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova boot --flavor ix-mn.crap --image bm-ubuntu-image test-bm-instance
ERROR: Quota exceeded for cores: Requested 8, but already used 18 of 20 cores (HTTP 413) (Request-ID: req-8ac9b333-f515-4e73-8d78-6f6c9b4705d6)
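The arithmetic behind that error, as a minimal sketch. The function name is made up for illustration and this is not nova's quota code; it only restates the numbers in the message.

```python
def cores_over_quota(requested, in_use, limit):
    """Return how many cores a request exceeds the quota by (0 if it fits)."""
    return max(0, in_use + requested - limit)


# Numbers from the error message: 8 requested, 18 of 20 already in use.
print(cores_over_quota(requested=8, in_use=18, limit=20))  # 6
```

Any limit of at least 26 cores would have let this particular request through.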
I bumped the quota up to 1000 and re-ran the nova boot.  It successfully powered the host off!  Yet

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# nova show e2b71c76-c7c7-4f80-8dac-90dc7be37611 | grep fault
| security_groups                      | [{u'name': u'default'}]                                                                       |
| fault                                | {u'message': u'No valid host was found. ', u'code': 500, u'created': u'2014-02-21T15:31:57Z'} |

Judging from the logs, on the first round the scheduler found openstack1 to be a suitable host, but it failed, so it re-visited and couldn't find any more suitable hosts.

In compute.log:

> 2014-02-21 07:30:47.706 11448 AUDIT nova.compute.manager [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Starting instance...
> 2014-02-21 07:30:48.003 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Attempting claim: memory 7864 MB, disk 250 GB, VCPUs 8
> 2014-02-21 07:30:48.004 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Total Memory: 7864 MB, used: 0.00 MB
> 2014-02-21 07:30:48.004 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Memory limit: 7864.00 MB, free: 7864.00 MB
> 2014-02-21 07:30:48.004 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Total Disk: 250 GB, used: 0.00 GB
> 2014-02-21 07:30:48.005 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Disk limit not specified, defaulting to unlimited
> 2014-02-21 07:30:48.005 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Total CPU: 8 VCPUs, used: 0.00 VCPUs
> 2014-02-21 07:30:48.005 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] CPU limit not specified, defaulting to unlimited
> 2014-02-21 07:30:48.005 11448 AUDIT nova.compute.claims [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] [instance: e2b71c76-c7c7-4f80-8dac-90dc7be37611] Claim successful
> 2014-02-21 07:30:58.157 11448 AUDIT nova.compute.resource_tracker [-] Auditing locally available compute resources
> 2014-02-21 07:30:58.259 11448 AUDIT nova.compute.resource_tracker [-] Free ram (MB): 0
> 2014-02-21 07:30:58.260 11448 AUDIT nova.compute.resource_tracker [-] Free disk (GB): 0
> 2014-02-21 07:30:58.260 11448 AUDIT nova.compute.resource_tracker [-] Free VCPUS: 0
> 2014-02-21 07:30:58.326 11448 INFO nova.compute.resource_tracker [-] Compute_service record updated for openstack1.relabs.releng.scl3.mozilla.com:aea07aca-7103-44cc-999a-9ca7b69628cf
> 2014-02-21 07:31:48.280 11448 ERROR nova.compute.manager [-] Instance failed network setup after 1 attempt(s)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager Traceback (most recent call last):
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1244, in _allocate_network_async
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     dhcp_options=dhcp_options)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 94, in wrapped
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     return func(self, context, *args, **kwargs)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 49, in wrapper
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     res = f(self, context, *args, **kwargs)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/network/api.py", line 301, in allocate_for_instance
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     nw_info = self.network_rpcapi.allocate_for_instance(context, **args)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/network/rpcapi.py", line 184, in allocate_for_instance
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     macs=jsonutils.to_primitive(macs))
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/rpcclient.py", line 85, in call
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     return self._invoke(self.proxy.call, ctxt, method, **kwargs)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/rpcclient.py", line 63, in _invoke
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     return cast_or_call(ctxt, msg, **self.kwargs)
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/proxy.py", line 130, in call
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager     exc.info, real_topic, msg.get('method'))
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager Timeout: Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"
> 2014-02-21 07:31:48.280 11448 TRACE nova.compute.manager
> 2014-02-21 07:31:48.305 11448 ERROR nova.virt.baremetal.driver [req-aaba78b3-4696-4331-b901-d4c43bc93668 4c05e549d8d24c0aba0a21433c0cf66a 0e2ab7939c944174a47b7b8cbcdcfa3a] Error deploying instance e2b71c76-c7c7-4f80-8dac-90dc7be37611 on baremetal node aea07aca-7103-44cc-999a-9ca7b69628cf.

That looks like a timeout contacting the network component of nova, so I started it:

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# service openstack-nova-network start
Starting openstack-nova-network:                           [  OK  ]

Now:

> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3] Traceback (most recent call last):
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1423, in _spawn
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     block_device_info)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/virt/baremetal/driver.py", line 272, in spawn
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     _update_state(context, node, None, baremetal_states.DELETED)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/virt/baremetal/driver.py", line 241, in spawn
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     network_info=network_info,
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/virt/baremetal/pxe.py", line 343, in cache_images
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     injected_files, admin_password)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/virt/baremetal/pxe.py", line 315, in _inject_into_image
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     net_config = build_network_config(network_info)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/nova/virt/baremetal/pxe.py", line 134, in build_network_config
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     template = env.get_template(tmpl_file)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/Jinja2-2.6-py2.6.egg/jinja2/environment.py", line 719, in get_template
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     return self._load_template(name, self.make_globals(globals))
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/Jinja2-2.6-py2.6.egg/jinja2/environment.py", line 693, in _load_template
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     template = self.loader.load(self, name, globals)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/Jinja2-2.6-py2.6.egg/jinja2/loaders.py", line 115, in load
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     source, filename, uptodate = self.get_source(environment, name)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]   File "/usr/lib/python2.6/site-packages/Jinja2-2.6-py2.6.egg/jinja2/loaders.py", line 180, in get_source
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3]     raise TemplateNotFound(template)
> 2014-02-21 08:05:44.084 11448 TRACE nova.compute.manager [instance: ef669459-1be1-4086-ae9b-7710a389eeb3] TemplateNotFound: net-static.ubuntu.template

I blame the wiki page.  It says

[baremetal]
net_config_template = /opt/stack/nova/nova/virt/baremetal/net-static.ubuntu.template

Yet

[root@openstack1.relabs.releng.scl3.mozilla.com ~]# locate net-static.ubuntu.template
/usr/lib/python2.6/site-packages/nova/virt/baremetal/net-static.ubuntu.template

I edited this and also updated the openstack wiki.  Note that this particular error "wedges" openstack-nova-compute, so you have to restart it.
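A quick way to catch this kind of bad path before restarting nova-compute, as a rough sketch. It assumes nova.conf parses as plain INI; the function name is invented for illustration and nothing here is part of nova itself.

```python
import configparser
import os


def missing_template(nova_conf_text):
    """Return the configured net_config_template path if that file is absent.

    Returns None when the option is unset or the file exists.
    """
    # interpolation=None: values may contain literal '%' characters.
    # strict=False: tolerate duplicate options instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(nova_conf_text)
    path = parser.get("baremetal", "net_config_template", fallback=None)
    if path is not None and not os.path.isfile(path):
        return path
    return None
```

Feeding it the contents of /etc/nova/nova.conf with the wiki's /opt/stack path would return that path on an RPM install, where the template lives under site-packages instead.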

And, hey, it powered up and booted the ubuntu diskimage!!  But it fails to contact 10.26.61.0:10000 as an iSCSI target.
With that fixed, it boots (!!) into the deploy image.  However, that image comes up and tries to hit the nova compute node on port 10000, which is NDMP.  Which gives me flashbacks to my work at Zmanda (I implemented Amanda's support for NDMP).  But it isn't iSCSI, so I'm confused.
Ah, nova-baremetal-deploy-helper serves that port.  Amusingly, it's just doing a simple HTTP app on that port, not NDMP, although NDMP would be a *much* better choice for imaging machines than iSCSI.  Go fig.
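One way to tell what is actually listening on a port like this is to send it a plain HTTP request and see whether a valid status line comes back. A minimal sketch follows; the function name is hypothetical, and a GET on / is only a probe, not the request the deploy image really makes.

```python
import http.client


def speaks_http(host, port, timeout=3.0):
    """Return True if host:port answers GET / with a parseable HTTP response.

    Any status code counts; a non-HTTP service or a closed port gives False.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        conn.getresponse()
        return True
    except (http.client.HTTPException, OSError):
        # HTTPException covers malformed replies; OSError covers refused
        # connections and timeouts.
        return False
    finally:
        conn.close()
```

Called with the compute node's address and the port in question, a True result would point to an HTTP app rather than NDMP or iSCSI.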

nova-baremetal-deploy-helper doesn't have an initscript, so I just started it on the command line.  After PXE-booting again (since there seems to be no other way to un-wedge an un-booted baremetal node):

> 2014-02-21 09:20:11.532 16574 TRACE nova.virt.baremetal.deploy_helper ProcessExecutionError: Unexpected error while running command.
> 2014-02-21 09:20:11.532 16574 TRACE nova.virt.baremetal.deploy_helper Command: sfdisk -uM /dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d92258c9537-lun-1
> 2014-02-21 09:20:11.532 16574 TRACE nova.virt.baremetal.deploy_helper Exit code: 1
> 2014-02-21 09:20:11.532 16574 TRACE nova.virt.baremetal.deploy_helper Stdout: '\nDisk /dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d92258c9537-lun-1: 30522 cylinders, 255 heads, 63 sectors/track\nOld situation:\nUnits = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0\n\n   Device Boot Start   End    MiB    #blocks   Id  System\n/dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d9225  *     1    100    100     102400   83  Linux\n/dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d9225      101   4196   4096    4194304   82  Linux swap / Solaris\n/dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d9225     4197  239428  235232  240877568   83  Linux\n/dev/disk/by-path/ip-10.26.61.177:3260-iscsi-iqn-ee0cccb7-2018-44cf-b308-0d9225        0      -      0          0    0  Empty\n'
> 2014-02-21 09:20:11.532 16574 TRACE nova.virt.baremetal.deploy_helper Stderr: 'Checking that no-one is using this disk right now ...\nOK\nWarning: given size (256005) exceeds max allowable size (239421)\n\nsfdisk: bad input\n'
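The "max allowable size" can be reproduced from the geometry sfdisk printed, which suggests the flavor's disk value is being read in binary units while the drive holds 250 decimal GB. A sketch of that arithmetic, assuming 512-byte sectors and that a flavor disk of N means N * 1024 MiB; the log does not show where the extra 5 MiB in 256005 comes from.

```python
SECTOR_BYTES = 512        # assumption: classic 512-byte sectors
MIB = 1024 * 1024


def disk_size_mib(cylinders, heads, sectors_per_track):
    """Whole MiB on a disk with the given CHS geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES // MIB


max_mib = disk_size_mib(30522, 255, 63)  # geometry from sfdisk's stdout
print(max_mib)           # 239421, the "max allowable size" in the error
print(250 * 1024)        # 256000, what a flavor disk of 250 asks for
print(max_mib // 1024)   # 233, the largest whole value that fits
```

So on this hardware the flavor's disk has to be 233 or lower for the partitioning step to succeed.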
I bumped the disk size down to <250GB and tried again, with more success.  It seems to think it imaged the host, but when it booted, all I got was a blinking cursor.
The plan is to make openstack1 just run as a cloud controller - so no nova-compute service, and no nova-network service.  Figuring out how to configure networking has been problematic, as different sources give different advice.

I'm bringing up hp1 as an additional hypervisor node, to test the centos image, and we'll probably reinstall hp6 and make it the compute node supervising the bare-metal nodes.
(Assignee)

Comment 19

5 years ago
(In reply to Dustin J. Mitchell [:dustin] (I ignore NEEDINFO) from comment #18)

> I'm bringing up hp1 as an additional hypervisor node, to test the centos
> image, and we'll probably reinstall hp6 and make it the compute node
> supervising the bare-metal nodes.

Actually, I'm installing hp5 to be a compute node for baremetal only since it is already on the mobile.releng vlan.
I did this stuff:

yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
yum install -y openstack-nova-network
yum install -y openstack-nova-compute
yum install -y openstack-nova python-novaclient openstack-utils
openstack-config --set /etc/nova/nova.conf database connection mysql://nova:NOVA_DBPASS@openstack1.relabs.releng.scl3.mozilla.com/nova
openstack-config --set /etc/nova/nova.conf DEFAULT rpc_backend nova.openstack.common.rpc.impl_qpid
openstack-config --set /etc/nova/nova.conf DEFAULT qpid_hostname openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf DEFAULT auth_strategy keystone
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_protocol http
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_port 35357
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_user nova
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_password NOVA_PASS
openstack-config --set /etc/nova/nova.conf DEFAULT my_ip $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT vnc_enabled True
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_listen 0.0.0.0
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_proxyclient_address $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT novncproxy_base_url http://openstack1.relabs.releng.scl3.mozilla.com:6080/vnc_auto.html
openstack-config --set /etc/nova/nova.conf DEFAULT glance_host openstack1.relabs.releng.scl3.mozilla.com

and then started the service, and somehow it magically invented a subnet for itself, 192.168.122.0/24.  That doesn't appear in the config files, nor in the DB (which does list 192.168.93.0/24, one of the networks added via the 'nova' command).  That, and creating a new instance didn't work either.  Foo.
I moved hp5's nic1 (eth1) to vlan278, and assigned its SREG-dictated IP there.  This way we have access to both hp5 and openstack1 without consoling.
Because consoling makes me inconsolable.
Well, I've submitted four docs/helpstrings patches to gerrit already, but it's still not working.
HP5
(jake set up repos)
yum install -y openstack-nova-network
yum install -y openstack-nova-compute
yum install -y openstack-nova python-novaclient openstack-utils
nova.conf settings from setup manual
openstack-config --set /etc/nova/nova.conf database connection mysql://nova:NOVA_DBPASS@openstack1.relabs.releng.scl3.mozilla.com/nova
openstack-config --set /etc/nova/nova.conf DEFAULT rpc_backend nova.openstack.common.rpc.impl_qpid
openstack-config --set /etc/nova/nova.conf DEFAULT qpid_hostname openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf DEFAULT auth_strategy keystone
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_protocol http
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_port 35357
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_user nova
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_password NOVA_PASS
openstack-config --set /etc/nova/nova.conf DEFAULT my_ip $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT vnc_enabled True
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_listen 0.0.0.0
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_proxyclient_address $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT novncproxy_base_url http://openstack1.relabs.releng.scl3.mozilla.com:6080/vnc_auto.html
openstack-config --set /etc/nova/nova.conf DEFAULT glance_host openstack1.relabs.releng.scl3.mozilla.com
nova.conf settings by hand:
openstack-config --set /etc/nova/nova.conf DEFAULT network_manager nova.network.manager.FlatDHCPManager
# eth1 is the vlan278 interface on hp5 (note, different on openstack1)
openstack-config --set /etc/nova/nova.conf DEFAULT public_interface eth1
openstack-config --set /etc/nova/nova.conf DEFAULT flat_interface eth0
openstack-config --set /etc/nova/nova.conf DEFAULT flat_network_bridge br260
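For review purposes, here is roughly what those four hand-set options amount to in nova.conf. This is a sketch using Python's configparser to render the fragment; it is not how openstack-config is implemented, and the helper name is made up.

```python
import configparser
import io

# (section, option, value) triples matching the commands above.
NETWORK_SETTINGS = [
    ("DEFAULT", "network_manager", "nova.network.manager.FlatDHCPManager"),
    ("DEFAULT", "public_interface", "eth1"),
    ("DEFAULT", "flat_interface", "eth0"),
    ("DEFAULT", "flat_network_bridge", "br260"),
]


def render_ini(settings):
    """Render (section, option, value) triples as INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    for section, option, value in settings:
        if section != "DEFAULT" and not parser.has_section(section):
            parser.add_section(section)
        parser[section][option] = value
    buf = io.StringIO()
    parser.write(buf)
    return buf.getvalue()


print(render_ini(NETWORK_SETTINGS))
```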
[root@hp5.relabs.releng.scl3.mozilla.com network-scripts]# cat ifcfg-eth0
DEVICE="eth0"
BOOTPROTO="none"
BRIDGE="br260"
IPV6INIT="no"
MTU="1500"
NM_CONTROLLED="no"
ONBOOT="yes"
TYPE="Ethernet"
PEERNTP="no"
[root@hp5.relabs.releng.scl3.mozilla.com network-scripts]# cat ifcfg-eth1
DEVICE="eth1"
HWADDR="B4:99:BA:A6:A6:17"
NM_CONTROLLED="no"
ONBOOT="yes"
PEERNTP="no"
IPADDR=10.26.78.34
NETMASK=255.255.255.0
[root@hp5.relabs.releng.scl3.mozilla.com network-scripts]# cat ifcfg-br260
DEVICE=br260
TYPE=Bridge
ONBOOT=yes
BOOTPROTO=none
NM_CONTROLLED=no
DELAY=0
eth0 needs to be promiscuous; redhat net scripts don't support that, so
ifconfig eth0 promisc
FlatDHCP creates the bridge networks for us; Flat doesn't.  But too late now.
Fixed networks are configured via the 'nova' client, but it's not clear how things work when there are multiple networks.  The --fixed-cidr option selects a subset of the IPs in the subnet to be made available as fixed IPs, although it still reserves the first two IPs and the last one and won't hand them out.
nova network-create thereisonlyone --fixed-range-v4=192.168.93.0/24 --fixed-cidr=192.168.93.32/28  --bridge=br260 --multi-host=T
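To see how many addresses that leaves, here is a small sketch. It assumes the "first two and last one" reservation applies within the --fixed-cidr range itself; the function name and the reserved counts as parameters are illustrative, not nova's code.

```python
import ipaddress


def usable_fixed_ips(fixed_cidr, reserved_head=2, reserved_tail=1):
    """List addresses in fixed_cidr minus the reserved head and tail."""
    # Iterating a network yields every address, including the network
    # and broadcast addresses.
    addrs = list(ipaddress.ip_network(fixed_cidr))
    return addrs[reserved_head:len(addrs) - reserved_tail]


ips = usable_fixed_ips("192.168.93.32/28")
print(len(ips), ips[0], ips[-1])  # 13 192.168.93.34 192.168.93.46
```

Under that assumption a /28 gives 13 instance addresses, which is worth keeping in mind when sizing the range.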
Start all the services..
nova boot --flavor m1.tiny --image "CirrOS 0.3.1" cirros-test
it does eventually come up, but curling the metadata server fails.  Also, it's running a busybox shell and dies if you hit escape :(
metadata is fixed:

set
metadata_host=127.0.0.1
in nova.conf
and start the openstack-nova-metadata-api service, and restart the openstack-nova-network service
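A sketch of the check done from inside the instance, for hosts that have Python rather than just curl. The 169.254.169.254 address and /latest/meta-data/ path are the usual EC2-style metadata location and are an assumption here, since the notes don't give the URL; the function name is invented.

```python
import urllib.request

# Assumption: EC2-compatible metadata endpoint, not stated in the notes.
METADATA_URL = "http://169.254.169.254/latest/meta-data/"


def list_metadata_keys(base_url=METADATA_URL, timeout=5.0):
    """Return the non-empty lines of the metadata index."""
    # An empty ProxyHandler bypasses any http_proxy setting, which would
    # otherwise misroute a request to a link-local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(base_url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return [line for line in body.splitlines() if line]
```

If the metadata service is reachable this returns a list of key names; if not, it raises an error from urllib, which mirrors the failed curl.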

docs fix: https://review.openstack.org/#/c/76999/

(other fixes along the way: https://review.openstack.org/#/c/76980/ https://review.openstack.org/#/c/76971/ https://review.openstack.org/#/c/76962/ https://review.openstack.org/#/c/75480/)
hp5 had its kernel hostname set to hp5.mobile.relabs, and somehow that's leaked into something, somewhere, and I can't make it go away.  I'm going to try Windows Fix #1 (reboot), and if that doesn't work, Windows Fix #2 (reinstall the DB).
Or #3: reinstall the OS.  I've kickstarted both openstack1 and hp5.  I also switched the VLANs on hp5 to match openstack1: eth0 on vlan278, eth1 on vlan260.
With due apologies for etherpad's inability to get line-feeds right, here's what I did to set up openstack1 with keystone, glance, and the nova API:

----

kickstart in relabs
disable puppet (rm /etc/cron.d/puppetcheck, kill anything running)
clean out /etc/yum.repos.d
yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
put the following in /etc/yum.repos.d/base.repo:
----
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4
protect=1
#released updates 
[update]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4
protect=1 
----
yum install -y mysql mysql-server MySQL-python
set in /etc/my.cnf:
[mysqld]
bind-address = 10.26.78.17
service mysqld start
chkconfig mysqld on
mysql_install_db
mysql_secure_installation
# accept the defaults and set the root pw
yum install -y http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum install -y openstack-utils openstack-selinux
yum upgrade -y
reboot
yum install -y qpid-cpp-server memcached
set 
auth=no
in /etc/qpidd.conf
service qpidd start
chkconfig qpidd on
## Identity Service
yum install -y openstack-keystone python-keystoneclient
openstack-config --set /etc/keystone/keystone.conf sql connection mysql://keystone:KEYSTONE_DBPASS@controller/keystone
openstack-db --init --service keystone --password KEYSTONE_DBPASS
openstack-config --set /etc/keystone/keystone.conf DEFAULT admin_token ADMIN_TOKEN
keystone-manage pki_setup --keystone-user keystone --keystone-group keystone
chown -R keystone:keystone /etc/keystone/* /var/log/keystone/keystone.log
service openstack-keystone start
chkconfig openstack-keystone on
export OS_SERVICE_TOKEN=ADMIN_TOKEN
export OS_SERVICE_ENDPOINT=http://openstack1.relabs.releng.scl3.mozilla.com:35357/v2.0
keystone tenant-create --name=admin --description="Admin Tenant"
keystone tenant-create --name=service --description="Service Tenant"
keystone user-create --name=admin --pass=ADMIN_PASS --email=dustin@mozilla.com
keystone role-create --name=admin
keystone role-create --name="Member"
keystone user-role-add --user=admin --tenant=admin --role=admin
keystone service-create --name=keystone --type=identity --description="Keystone Identity Service"
# noting the generated id and using it in..
keystone endpoint-create --service-id=60e66ce94d7745a1872e0701c6d520c6 --publicurl=http://openstack1.relabs.releng.scl3.mozilla.com:5000/v2.0 --internalurl=http://openstack1.relabs.releng.scl3.mozilla.com:5000/v2.0 --adminurl=http://openstack1.relabs.releng.scl3.mozilla.com:35357/v2.0
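Copying the id by hand is error-prone, so here is a sketch of pulling it out of the client's table output. The sample table is illustrative (shaped like the Property/Value tables the clients print, not captured output), and the helper name is made up.

```python
def table_value(table_text, prop):
    """Return the value for prop in a two-column '| key | value |' table.

    Values that themselves contain '|' are not handled.
    """
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == prop:
            return cells[1]
    return None


# Illustrative sample only.
SAMPLE = """\
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |    Keystone Identity Service     |
|      id     | 60e66ce94d7745a1872e0701c6d520c6 |
|     name    |             keystone             |
|     type    |             identity             |
+-------------+----------------------------------+
"""

print(table_value(SAMPLE, "id"))  # 60e66ce94d7745a1872e0701c6d520c6
```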
unset OS_SERVICE_TOKEN OS_SERVICE_ENDPOINT
add to ~/keystonerc:
----
export OS_USERNAME=admin
export OS_PASSWORD=ADMIN_PASS
export OS_TENANT_NAME=admin
export OS_AUTH_URL=http://openstack1.relabs.releng.scl3.mozilla.com:35357/v2.0
----
and add 'source keystonerc' to ~/.bashrc
source keystonerc
keystone user-list
## Image Service
yum install -y openstack-glance
openstack-config --set /etc/glance/glance-api.conf \
   DEFAULT sql_connection mysql://glance:GLANCE_DBPASS@openstack1.relabs.releng.scl3.mozilla.com/glance
openstack-config --set /etc/glance/glance-registry.conf \
   DEFAULT sql_connection mysql://glance:GLANCE_DBPASS@openstack1.relabs.releng.scl3.mozilla.com/glance
openstack-db --init --service glance --password GLANCE_DBPASS
keystone user-create --name=glance --pass=GLANCE_PASS --email=dustin@mozilla.com
keystone user-role-add --user=glance --tenant=service --role=admin
openstack-config --set /etc/glance/glance-api.conf keystone_authtoken auth_uri http://openstack1.relabs.releng.scl3.mozilla.com:5000
openstack-config --set /etc/glance/glance-api.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/glance/glance-api.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/glance/glance-api.conf keystone_authtoken admin_user glance
openstack-config --set /etc/glance/glance-api.conf keystone_authtoken admin_password GLANCE_PASS
openstack-config --set /etc/glance/glance-api.conf paste_deploy flavor keystone
openstack-config --set /etc/glance/glance-registry.conf keystone_authtoken auth_uri http://openstack1.relabs.releng.scl3.mozilla.com:5000
openstack-config --set /etc/glance/glance-registry.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/glance/glance-registry.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/glance/glance-registry.conf keystone_authtoken admin_user glance
openstack-config --set /etc/glance/glance-registry.conf keystone_authtoken admin_password GLANCE_PASS
openstack-config --set /etc/glance/glance-registry.conf paste_deploy flavor keystone 
cp /usr/share/glance/glance-api-dist-paste.ini /etc/glance/glance-api-paste.ini
cp /usr/share/glance/glance-registry-dist-paste.ini /etc/glance/glance-registry-paste.ini
edit filter:authtoken in both:
[filter:authtoken]
paste.filter_factory=keystoneclient.middleware.auth_token:filter_factory
auth_host=controller
admin_user=glance
admin_tenant_name=service
admin_password=GLANCE_PASS
keystone service-create --name=glance --type=image \
  --description="Glance Image Service"
# and use that ID in..
keystone endpoint-create --service-id=the_service_id_above --publicurl=http://openstack1.relabs.releng.scl3.mozilla.com:9292 --internalurl=http://openstack1.relabs.releng.scl3.mozilla.com:9292 --adminurl=http://openstack1.relabs.releng.scl3.mozilla.com:9292 
service openstack-glance-api start
service openstack-glance-registry start
chkconfig openstack-glance-api on
chkconfig openstack-glance-registry on
yum install -y openstack-nova python-novaclient
openstack-config --set /etc/nova/nova.conf database connection mysql://nova:NOVA_DBPASS@openstack1.relabs.releng.scl3.mozilla.com/nova
openstack-config --set /etc/nova/nova.conf DEFAULT rpc_backend nova.openstack.common.rpc.impl_qpid
openstack-config --set /etc/nova/nova.conf DEFAULT qpid_hostname openstack1.relabs.releng.scl3.mozilla.com
openstack-db --init --service nova --password NOVA_DBPASS
# note that in the following we use the *public* IP of this host - the host doesn't have an IP on the internal network (as described elsewhere in the docs)
openstack-config --set /etc/nova/nova.conf DEFAULT my_ip 10.26.78.17
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_listen 10.26.78.17
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_proxyclient_address 10.26.78.17
keystone user-create --name=nova --pass=NOVA_PASS --email=dustin@mozilla.com
keystone user-role-add --user=nova --tenant=service --role=admin
openstack-config --set /etc/nova/nova.conf DEFAULT auth_strategy keystone
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_protocol http
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_port 35357
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_user nova
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_password NOVA_PASS
# edit /etc/nova/api-paste.ini:
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
auth_host = openstack1.relabs.releng.scl3.mozilla.com
auth_port = 35357
auth_protocol = http
auth_uri = http://openstack1.relabs.releng.scl3.mozilla.com:5000/v2.0
admin_tenant_name = service
admin_user = nova
admin_password = NOVA_PASS
keystone service-create --name=nova --type=compute  --description="Nova Compute service"
keystone endpoint-create --service-id=the_service_id_above --publicurl=http://openstack1.relabs.releng.scl3.mozilla.com:8774/v2/%\(tenant_id\)s --internalurl=http://openstack1.relabs.releng.scl3.mozilla.com:8774/v2/%\(tenant_id\)s --adminurl=http://openstack1.relabs.releng.scl3.mozilla.com:8774/v2/%\(tenant_id\)s
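The backslashes in that command only keep the shell from interpreting the parentheses; what gets stored is the literal placeholder %(tenant_id)s, which is later filled in per tenant. A sketch of that substitution using Python's % formatting (an illustration, not keystone's own code; the tenant id below is just an example value):

```python
TEMPLATE = "http://openstack1.relabs.releng.scl3.mozilla.com:8774/v2/%(tenant_id)s"


def expand_endpoint(template, tenant_id):
    """Fill the %(tenant_id)s placeholder in an endpoint URL template."""
    return template % {"tenant_id": tenant_id}


# Example tenant id, for illustration only.
print(expand_endpoint(TEMPLATE, "0e2ab7939c944174a47b7b8cbcdcfa3a"))
```

If the placeholder were mangled by the shell, for example by dropping the parentheses, the stored URL would no longer expand this way.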
service openstack-nova-api start
service openstack-nova-cert start
service openstack-nova-consoleauth start
service openstack-nova-scheduler start
service openstack-nova-conductor start
service openstack-nova-novncproxy start
chkconfig openstack-nova-api on
chkconfig openstack-nova-cert on
chkconfig openstack-nova-consoleauth on
chkconfig openstack-nova-scheduler on
chkconfig openstack-nova-conductor on
chkconfig openstack-nova-novncproxy on
## Dashboard
yum install -y memcached python-memcached mod_wsgi openstack-dashboard
edit /etc/openstack-dashboard/local_settings
## note: using the Django session cache, not memcached (why bother)
ALLOWED_HOSTS = ['openstack1.relabs.releng.scl3.mozilla.com']
service httpd start
service memcached start
chkconfig httpd on
chkconfig memcached on
Came across another method of creating centos images - https://github.com/gc3-uzh-ch/openstack-tools (using virt-inst)
Setup of hp5, the compute node:

HP5
kickstart in relabs
disable puppet (rm /etc/cron.d/puppetcheck, kill anything running)
clean out /etc/yum.repos.d
yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
put the following in /etc/yum.repos.d/base.repo:
----
[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
#baseurl=http://mirror.centos.org/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4
protect=1
#released updates 
[update]
name=CentOS-$releasever - Updates
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
#baseurl=http://mirror.centos.org/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=http://mirror.centos.org/centos/RPM-GPG-KEY-centos4
protect=1 
----
yum install -y mysql MySQL-python
yum install -y http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
yum upgrade -y
yum install -y openstack-utils openstack-selinux
reboot
yum install -y openstack-nova python-novaclient openstack-nova-compute
openstack-config --set /etc/nova/nova.conf DEFAULT auth_strategy keystone
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_host openstack1.relabs.releng.scl3.mozilla.com
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_protocol http
openstack-config --set /etc/nova/nova.conf keystone_authtoken auth_port 35357
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_user nova
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_tenant_name service
openstack-config --set /etc/nova/nova.conf keystone_authtoken admin_password ADMIN_PASS
openstack-config --set /etc/nova/nova.conf DEFAULT rpc_backend nova.openstack.common.rpc.impl_qpid 
openstack-config --set /etc/nova/nova.conf DEFAULT qpid_hostname openstack1.relabs.releng.scl3.mozilla.com 
openstack-config --set /etc/nova/nova.conf DEFAULT my_ip $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT vnc_enabled True
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_listen 0.0.0.0 
openstack-config --set /etc/nova/nova.conf DEFAULT vncserver_proxyclient_address $(facter ipaddress)
openstack-config --set /etc/nova/nova.conf DEFAULT novncproxy_base_url http://openstack1.relabs.releng.scl3.mozilla.com:6080/vnc_auto.html
openstack-config --set /etc/nova/nova.conf DEFAULT glance_host  openstack1.relabs.releng.scl3.mozilla.com
/etc/nova/api-paste.ini:
[filter:authtoken]
paste.filter_factory = keystoneclient.middleware.auth_token:filter_factory
auth_host = openstack1.relabs.releng.scl3.mozilla.com
auth_port = 35357
auth_protocol = http
admin_tenant_name = service
admin_user = nova
admin_password = ADMIN_PASS
service libvirtd start
service messagebus start
chkconfig libvirtd on
chkconfig messagebus on
service openstack-nova-compute start
chkconfig openstack-nova-compute on
## Nova-Network
openstack-config --set /etc/nova/nova.conf DEFAULT network_manager nova.network.manager.FlatDHCPManager
openstack-config --set /etc/nova/nova.conf DEFAULT firewall_driver nova.virt.libvirt.firewall.IptablesFirewallDriver
openstack-config --set /etc/nova/nova.conf DEFAULT network_size 254
openstack-config --set /etc/nova/nova.conf DEFAULT allow_same_net_traffic False
openstack-config --set /etc/nova/nova.conf DEFAULT multi_host True
openstack-config --set /etc/nova/nova.conf DEFAULT send_arp_for_ha True
openstack-config --set /etc/nova/nova.conf DEFAULT share_dhcp_address True
openstack-config --set /etc/nova/nova.conf DEFAULT force_dhcp_release True
openstack-config --set /etc/nova/nova.conf DEFAULT flat_interface eth1
openstack-config --set /etc/nova/nova.conf DEFAULT flat_network_bridge br260
openstack-config --set /etc/nova/nova.conf DEFAULT public_interface eth1
yum install -y openstack-nova-api
service openstack-nova-metadata-api start
chkconfig openstack-nova-metadata-api on
service openstack-nova-network start
chkconfig openstack-nova-network on
scp openstack1:keystonerc . 
source keystonerc
nova network-create vmnet --fixed-range-v4=192.168.93.0/24 --fixed-cidr=192.168.93.32/28 --bridge br260
Set
 net.ipv4.ip_forward = 1
in /etc/sysctl.conf on every nova-network node (hp5 only right now), and run `sysctl net.ipv4.ip_forward=1`.
I added rules for incoming ping and ssh to the security group.  That gives me *inbound* access to those ports on the VMs.  However, any traffic off of the 192.168.93.0/24 subnet is forwarded without NAT, which means it doesn't get very far.
To add nodes in different networks, you need to specify the network to nova boot:

nova boot --flavor m1.tiny --image 'cirros-0.3.1' --key_name dustin --nic net-id=4f8e867b-fbee-4432-9051-8b6eb1cc6f59 cirros93
nova boot --flavor m1.tiny --image 'cirros-0.3.1' --key_name dustin --nic net-id=7aaa1243-699c-4320-af8a-1cf9d54095eb cirros-test94

Otherwise you hit https://bugs.launchpad.net/nova/+bug/1211784, where every network is added to the instance, whether that network is associated with the compute node or not.  The patch for that bug languished due to a bad commit message, and is now bitrotted.

There's no way to have an IP selected dynamically after the compute node is chosen by the scheduler.  So even with 1211784 fixed, if a compute node had two networks configured, instances would be configured to use both networks.
(Assignee)

Comment 35

5 years ago
hp6.relabs has been installed as compute node and configured for baremetal.  Eth0 on vlan278 and eth1 on vlan260.

Here are the commands I ran minus all the troubleshooting BS
Images have been recreated and injected into glance:
bin/disk-image-create -u baremetal base ubuntu -o bm-ubuntu
bin/ramdisk-image-create -a i386 -o pxe-ramdisk deploy ramdisk base ubuntu
glance image-create --name bm-ubuntu-vmlinuz --public --disk-format aki < bm-ubuntu.vmlinuz
glance image-create --name bm-ubuntu-initrd --public --disk-format ari < bm-ubuntu.initrd
glance image-create --name bm-ubuntu-image --public --disk-format qcow2 --container-format bare --property kernel_id=d9fd8194-5015-4359-be24-2b80668a78b2 --property ramdisk_id=77f5fd1e-5b6a-4eef-b728-fe509e2ced23 < bm-ubuntu.qcow2
glance image-create --name deploy-vmlinuz --public --disk-format aki < pxe-ramdisk.kernel
glance image-create --name deploy-initrd --public --disk-format ari < pxe-ramdisk.initramfs

nova_bm database initialized and nova_bm user granted access

sudo mkdir -p /tftpboot/pxelinux.cfg
sudo cp /usr/share/syslinux/pxelinux.0 /tftpboot/
sudo chown -R nova /tftpboot
sudo mkdir -p /var/lib/nova/baremetal/dnsmasq
sudo mkdir -p /var/lib/nova/baremetal/console
sudo chown -R nova /var/lib/nova/baremetal

/usr/bin/nova-baremetal-deploy-helper &


nova flavor-create ix-mn.crap 100 7864 240 8
nova flavor-key ix-mn.crap set cpu_arch='{i386|x86_64}' "baremetal:deploy_kernel_id"=737f81c4-c387-4bba-85d3-53244b4e7c50 "baremetal:deploy_ramdisk_id"=fc63e70f-05ee-4d6b-a7b9-397021764eee
nova baremetal-node-create --pm_address=10.26.61.102 --pm_user=$$$$$$$$$$$$$ --pm_password=$$$$$$$$$ openstack1.mobile.releng.scl3.mozilla.com 8 7864 250 00:25:90:94:22:b0

dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/var/run/dnsmasq.pid --interface=eth1 --dhcp-range=192.168.91.10,192.168.91.99,255.255.255.0

(using 192.168.91 so it won't overlap with the vmnet)

First attempt to boot a bm instance got wedged since all the fixed_ips had been reserved.  I wasn't able to delete the instance and thought it was safer to reinitialize the nova db
After that, I recreated the bm flavor and flavor key plus the vmnet network.

On the next attempt, the instance seemed to enter the BUILDING state just fine but ipmi didn't power on the host (ix-mn-3).  pxeconfig files were written to /tftpboot so I manually powered on the host via ipmi, but it failed to get a dhcp address.  I tried killing the dnsmasq I had started up by hand and ran nova-network instead.  When nova-network came up it automatically created br260 on eth1 and assigned an ip from vmnet.  It also fired up 2 dnsmasq processes bound to br260.  I reset ix-mn-3 and tried pxe boot.  This time it got an IP but failed to find the tftp server ip.

A couple things to still figure out here:
(a) why is the manual dnsmasq process not serving up IPs (I'm fairly sure we shouldn't be running nova-network here since the assigned ips are injected with the template file)
-- might have to do with nova-network reconfiguring ifaces and changing iptables
(b) why is the host not being powered on by nova/ipmitool
Regarding (a), we're running in Flat DHCP mode with multi host mode, so I think every hypervisor needs to be running nova-network and that will run dnsmasq.  That means that hp5 is already running DHCP on vlan260, so perhaps that's why it didn't get PXE information?

From the wiki: "NOTE: This dnsmasq process must be the only process on the network answering DHCP requests from the MAC addresses of the enrolled bare metal nodes. If another DHCP server answers the PXE boot, deployment is likely to fail. This means that you must disable neutron-dhcp. Work on this limitation is planned for the Havana cycle. "

That means we need to run even our hypervisors in Flat mode, not FlatDHCP.  Which means we need to hack things to support injecting IPs into CentOS (and Windows and OS X) images.

It sounds like the manually-started dnsmasq process should be configured to use IPs within the same subnet as used by nova-network, but outside of the range of fixed IPs it's allocating from (so within --fixed-range-v4 but not within --fixed-cidr, for those following along at home).
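
To make the "within --fixed-range-v4 but not within --fixed-cidr" rule concrete, here is a minimal Python sketch (not something that was run on these hosts).  The function name dhcp_range_is_safe is hypothetical; the subnet and fixed CIDR come from the nova network-create command earlier in this bug, and the DHCP ranges are only illustrative.

```python
import ipaddress


def dhcp_range_is_safe(subnet, fixed_cidr, range_start, range_end):
    """Return True if [range_start, range_end] lies inside `subnet`
    and shares no address with `fixed_cidr`."""
    net = ipaddress.ip_network(subnet)
    fixed = ipaddress.ip_network(fixed_cidr)
    start = ipaddress.ip_address(range_start)
    end = ipaddress.ip_address(range_end)
    if start > end:
        raise ValueError("range_start must not be after range_end")
    # A subnet is contiguous, so checking both endpoints covers the whole range.
    inside_subnet = start in net and end in net
    # Two inclusive ranges overlap unless one ends before the other starts.
    overlaps_fixed = not (
        end < fixed.network_address or start > fixed.broadcast_address
    )
    return inside_subnet and not overlaps_fixed


# Same subnet, outside the fixed IPs (192.168.93.32 - 192.168.93.47)
print(dhcp_range_is_safe("192.168.93.0/24", "192.168.93.32/28",
                         "192.168.93.50", "192.168.93.60"))
# Same subnet, but colliding with the fixed IPs
print(dhcp_range_is_safe("192.168.93.0/24", "192.168.93.32/28",
                         "192.168.93.40", "192.168.93.60"))
# A different subnet entirely (the 192.168.91 range tried first)
print(dhcp_range_is_safe("192.168.93.0/24", "192.168.93.32/28",
                         "192.168.91.10", "192.168.91.99"))
```

Only the first range satisfies both conditions; the second collides with addresses nova-network may hand out, and the third is not in the vmnet subnet at all.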
(Assignee)

Comment 37

5 years ago
(In reply to Dustin J. Mitchell [:dustin] (I ignore NEEDINFO) from comment #36)
> Regarding (a), we're running in Flat DHCP mode with multi host mode, so I
> think every hypervisor needs to be running nova-network and that will run
> dnsmasq.  That means that hp5 is already running DHCP on vlan260, so perhaps
> that's why it didn't get PXE information?

That is interesting because when I killed off all dnsmasq processes on hp6 (not touching hp5),  I couldn't get any DHCP response during PXE.  If hp5 was serving dhcp on vlan260, ix-mn-3 should have at least gotten a response instead of timing out.

> 
> From the wiki: "NOTE: This dnsmasq process must be the only process on the
> network answering DHCP requests from the MAC addresses of the enrolled bare
> metal nodes. If another DHCP server answers the PXE boot, deployment is
> likely to fail. This means that you must disable neutron-dhcp. Work on this
> limitation is planned for the Havana cycle. "
> 
> That means we need to run even our hypervisors in Flat mode, not FlatDHCP. 
> Which means we need to hack things to support injecting IPs into CentOS (and
> Windows and OS X) images.
> 
> It sounds like the manually-started dnsmasq process should be configured to
> use IPs within the same subnet as used by nova-network, but outside of the
> range of fixed IPs it's allocating from (so within --fixed-range-v4 but not
> within --fixed-cidr, for those following along at home).

That contradicts the notes from the same wiki page.
"# Start dnsmasq for baremetal deployments. Change IFACE and RANGE as needed.
 # Note that RANGE must not overlap with the instance IPs assigned by Nova or Neutron."
(In reply to Jake Watkins [:dividehex] from comment #37)
> That is interesting because when I killed off all dnsmasq processes on hp6
> (not touching hp5),  I couldn't get any DHCP response during PXE.  If hp5
> was serving dhcp on vlan260, ix-mn-3 should have at least gotten a response
> instead of timing out.

Ah, I bet DHCP is iptable'd off to only hit that compute node.  That makes things easier!

> That contradicts the notes from the same wiki page.
> "# Start dnsmasq for baremetal deployments. Change IFACE and RANGE as needed.
>  # Note that RANGE must not overlap with the instance IPs assigned by Nova
> or Neutron."

It doesn't quite conflict: the instance IPs are the fixed IPs available, so as long as you're not configuring dnsmasq with any of those specific IPs, it's fine if you're configuring it with a range in the same subnet.  In fact, I think that will be preferable - if it's in another subnet, then how will it communicate with the pxe server?
(Assignee)

Comment 39

5 years ago
(In reply to Dustin J. Mitchell [:dustin] (I ignore NEEDINFO) from comment #38)
> It doesn't quite conflict: the instance IPs are the fixed IPs available, so
> as long as you're not configuring dnsmasq with any of those specific IPs,
> it's fine if you're configuring it with a range in the same subnet.  In
> fact, I think that will be preferable - if it's in another subnet, then how
> will it communicate with the pxe server?

Eth1 has a vip matching the same subnet served up by dnsmasq.  During the PXE boot processes, it can then reach the needed services on hp6 such as iscsi and tftp.  I'll try your suggestion of using the same subnet with the range outside the fixed_ip range.
(Assignee)

Comment 40

5 years ago
Got ix-mn-3 to pxeboot at least

- removed eth1:0 alias and assigned 192.168.93.100/24 to eth1
- ensured openstack-nova-network was disabled across runlevels (via chkconfig)
- rebooted to make sure networking and iptables were in a fresh state

Launched dnsmasq:
dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/var/run/dnsmasq.pid --interface=eth1 --dhcp-range=192.168.93.50,192.168.93.60 --log-dhcp

Launched nova-baremetal-deploy-helper:
nova-baremetal-deploy-helper &

Then rebooted ix-mn-3, hit F12 to netboot.  It pxebooted, got the kernel/ramdisk from tftp and initiated an iscsi session but failed because of the disk size mismatch.  Same thing dustin saw the first time around.  I'm going to adjust the flavor for a smaller disk and try again.

Here is the error as logged by nova-baremetal-deploy-helper:

2014-03-06 11:00:03.972 1683 ERROR nova.virt.baremetal.deploy_helper [req-d5c76ecd-f929-410b-bd37-dc28e38b1b3f None None] Cmd     : sfdisk -uM /dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92d0b4a79-lun-1
2014-03-06 11:00:03.986 1683 ERROR nova.virt.baremetal.deploy_helper [req-d5c76ecd-f929-410b-bd37-dc28e38b1b3f None None] StdOut  : '\nDisk /dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92d0b4a79-lun-1: 30522 cylinders, 255 heads, 63 sectors/track\nOld situation:\nUnits = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0\n\n   Device Boot Start   End    MiB    #blocks   Id  System\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0+ 20481- 20482-  20972857   83  Linux\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92    20481+ 22536-  2056-   2104515   82  Linux swap / Solaris\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0      -      0          0    0  Empty\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0      -      0          0    0  Empty\n'
2014-03-06 11:00:03.986 1683 ERROR nova.virt.baremetal.deploy_helper [req-d5c76ecd-f929-410b-bd37-dc28e38b1b3f None None] StdErr  : 'Checking that no-one is using this disk right now ...\nOK\nWarning: given size (245768) exceeds max allowable size (239421)\n\nsfdisk: bad input\n'
2014-03-06 11:00:04.493 1683 ERROR nova.virt.baremetal.deploy_helper [req-d5c76ecd-f929-410b-bd37-dc28e38b1b3f None None] deployment to node 1 failed
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper Traceback (most recent call last):
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/cmd/baremetal_deploy_helper.py", line 252, in run
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     deploy(**params)
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/cmd/baremetal_deploy_helper.py", line 217, in deploy
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     LOG.error(_("StdErr  : %r"), err.stderr)
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/cmd/baremetal_deploy_helper.py", line 211, in deploy
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     root_uuid = work_on_disk(dev, root_mb, swap_mb, image_path)
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/cmd/baremetal_deploy_helper.py", line 183, in work_on_disk
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     make_partitions(dev, root_mb, swap_mb)
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/cmd/baremetal_deploy_helper.py", line 96, in make_partitions
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     check_exit_code=[0])
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/utils.py", line 177, in execute
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     return processutils.execute(*cmd, **kwargs)
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper   File "/usr/lib/python2.6/site-packages/nova/openstack/common/processutils.py", line 178, in execute
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper     cmd=' '.join(cmd))
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper ProcessExecutionError: Unexpected error while running command.
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper Command: sfdisk -uM /dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92d0b4a79-lun-1
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper Exit code: 1
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper Stdout: '\nDisk /dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92d0b4a79-lun-1: 30522 cylinders, 255 heads, 63 sectors/track\nOld situation:\nUnits = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0\n\n   Device Boot Start   End    MiB    #blocks   Id  System\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0+ 20481- 20482-  20972857   83  Linux\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92    20481+ 22536-  2056-   2104515   82  Linux swap / Solaris\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0      -      0          0    0  Empty\n/dev/disk/by-path/ip-192.168.93.50:3260-iscsi-iqn-252ffc2a-fd27-4e30-b6ae-08c92        0      -      0          0    0  Empty\n'
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper Stderr: 'Checking that no-one is using this disk right now ...\nOK\nWarning: given size (245768) exceeds max allowable size (239421)\n\nsfdisk: bad input\n'
2014-03-06 11:00:04.493 1683 TRACE nova.virt.baremetal.deploy_helper
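
For picking a flavor disk size that fits, a rough Python sketch of the arithmetic follows.  It assumes the flavor's disk figure is converted at 1024 MiB per GB, which is an assumption and not something confirmed from the nova source here; the helper names are hypothetical.  Note that sfdisk was actually asked for 245768 MiB, slightly more than 240 * 1024, so the real limit is a little tighter than this sketch suggests and it is worth leaving some headroom.

```python
# Assumption: the flavor disk size (GB) is converted at 1024 MiB per GB.
MIB_PER_GB = 1024

# "max allowable size" reported by sfdisk in the StdErr line of the log.
SFDISK_MAX_MIB = 239421


def flavor_disk_fits(flavor_disk_gb, max_allowable_mib):
    """True if a flavor disk of this many GB fits on the target disk."""
    return flavor_disk_gb * MIB_PER_GB <= max_allowable_mib


def largest_flavor_disk_gb(max_allowable_mib):
    """Largest whole-GB flavor disk that does not exceed the target disk."""
    return max_allowable_mib // MIB_PER_GB


print(flavor_disk_fits(240, SFDISK_MAX_MIB))   # the flavor that failed
print(flavor_disk_fits(200, SFDISK_MAX_MIB))   # a smaller disk
print(largest_flavor_disk_gb(SFDISK_MAX_MIB))  # upper bound, before headroom
```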
(Assignee)

Comment 41

5 years ago
Before I adjust the flavor, I need to figure out how to remove the current instance without reinitializing the entire nova database.  I could delete it directly from the instances table but I'm not sure if that will have any side effects on state or if it is referenced elsewhere.

nova force-delete was a no go.
[root@hp6.relabs.releng.scl3.mozilla.com ~]# nova force-delete 252ffc2a-fd27-4e30-b6ae-08c92d0b4a79
ERROR: Cannot 'forceDelete' while instance is in vm_state building (HTTP 409) (Request-ID: req-d5f0d3c7-c5f7-42b3-9e86-2d9e6c1247db)
(Assignee)

Comment 42

5 years ago
A little help from #openstack and it is removed.

nova reset-state 252ffc2a-fd27-4e30-b6ae-08c92d0b4a79
nova delete 252ffc2a-fd27-4e30-b6ae-08c92d0b4a79
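
For anyone scripting this later, a small Python sketch of the same two-step cleanup.  The cleanup_commands helper is hypothetical and only builds the argument lists; it does not call nova itself, and treating "building" as the only state that needs a reset is an assumption based on the 409 error above.

```python
def cleanup_commands(instance_id, vm_state):
    """Return the nova CLI invocations (as argv lists) to remove an instance.

    An instance stuck in the building state cannot be deleted directly,
    so its state is reset first.
    """
    commands = []
    if vm_state.lower() == "building":
        commands.append(["nova", "reset-state", instance_id])
    commands.append(["nova", "delete", instance_id])
    return commands


for argv in cleanup_commands("252ffc2a-fd27-4e30-b6ae-08c92d0b4a79", "building"):
    print(" ".join(argv))
```

Each list could be handed to subprocess.run on a host where the nova client is installed and keystonerc has been sourced.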
(Assignee)

Comment 43

5 years ago
Although now I get a 500 internal server error from the web dashboard :-/
(Assignee)

Comment 44

5 years ago
(In reply to Jake Watkins [:dividehex] from comment #43)
> Although now I get a 500 internal server error from the web dashboard :-/

It was transient and might have been a coincidence
(Assignee)

Comment 45

5 years ago
I changed the network manager to the non-dhcp driver (FlatManager) and enabled ip injection.  Since the FlatManager does not set up bridges or launch dnsmasq, these will need to be configured manually for the time being (and later by puppet).

Added to nova.conf
network_manager = nova.network.manager.FlatManager
flat_injected=true

After restarting nova-compute and nova-network, launching an image (via nova boot) seems to work from beginning to end without any tracebacks or other errors, although the final image still ends up at a blinking cursor on boot.

I also ran into a few weird errors or issues.
1.  If a baremetal instance fails on deployment and the nova-compute service is restarted, nova-compute dies with an UnknownInstance error.  I was able to delete the instance but it took about a minute for things to time out before the instance was actually removed.  I was then able to restart nova-compute on hp6.

2. If a deploy never completes the building stage, you must 'nova reset-state' before being able to call 'nova delete'.  The pxe building state can have an integer timeout set in nova.conf.  It currently defaults to 0 (timeout disabled).

3. At one point the nova-baremetal-deploy-helper process logged 'CRITICAL nova [-] [Errno 5] Input/output error' and died during a deploy.  Probably a bug.  Restarted it and it has been running fine since.

4. After a couple successful baremetal boot->Running->delete tests, I got a traceback:
AttributeError: No such RPC function '_rpc_allocate_fixed_ip'
This makes me wonder if it is a problem to be running FlatDHCPManager on hp5 and FlatManager on hp6. :-/
(Assignee)

Comment 46

5 years ago
I shut down nova-compute, nova-network, and nova-api-metadata on hp5 and the baremetal deployments from hp6 are running smoothly now.  I can consistently nova boot a baremetal instance into a "RUNNING" state (which means "image put on disk as far as openstack is concerned") and then delete the instance and then repeat.

After a delete, it takes a few moments for nova compute to audit the freed up resources.  Attempting to nova boot before that will cause the instance to immediately appear but in an ERROR state.

Also saw ISE again on the web dashboard.  This time, I restarted httpd and it recovered.

Next step is to figure out why the image is not booting.
(Assignee)

Comment 47

5 years ago
Baremetal linux deployment success!

This cleared up a misconception I (and maybe others) had about the deployment of baremetal images from openstack.  The baremetal deployment helper does NOT install a bootloader, boot partition, kernel or initramfs image; it only partitions the baremetal disk and writes the root filesystem to it.  From that point, openstack boots baremetal nodes to pxe (via ipmi), which then boots the image kernel and initramfs.  To boot the image kernel and initramfs after the deployment, the deployment initramfs needs to run kexec, which is where it retrieves and boots the image kernel/initrd.  I rebuilt the deployment image with the kexec element and all is well.

I also rebuilt the test ubuntu image to be deployed with local-config and cloud-init-nocloud elements.  Local-config copies the root authorized keys to the image.  In this case, the keys are limited to those with root keys on the relabs puppet environment.  The cloud-init-nocloud points to on-disk metadata/userdata sources; useful while testing.  Proxy metadata still needs to be configured on the baremetal compute host.

The next step is to get baremetal and virt compute nodes to work from the same cloud controller.  According to the docs this is possible by using separate tenants.


Cmds used in creating the new deployment kernel/initrd and the ubuntu 13.10 test image:

bin/disk-image-create -u baremetal base ubuntu cloud-init-nocloud local-config -o ubuntu-13.10-1
bin/ramdisk-image-create -a i386 -o pxe-ramdisk-1 deploy ramdisk base ubuntu deploy-kexec

Cmds to import images to glance, add new flavor w/deployment images, and boot new flavor:

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name ubuntu-13.10-1-vmlinuz --public --disk-format aki < ubuntu-13.10-1.vmlinuz
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | d3ad83d258c73fa007142e1cf5019e2f     |
| container_format | aki                                  |
| created_at       | 2014-03-08T03:15:55                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | d2b05afa-9b0c-4024-aced-2bcd91415efd |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | ubuntu-13.10-1-vmlinuz               |
| owner            | 1758deaca4e54c95a75fca0c1033c582     |
| protected        | False                                |
| size             | 5634192                              |
| status           | active                               |
| updated_at       | 2014-03-08T03:15:55                  |
+------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name ubuntu-13.10-1-initrd --public --disk-format ari < ubuntu-13.10-1.initrd
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 3833bea5eb5dcc07c4355529ff7e5546     |
| container_format | ari                                  |
| created_at       | 2014-03-08T03:16:42                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | a8af3499-4bef-4755-9f18-666925491fc3 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | ubuntu-13.10-1-initrd                |
| owner            | 1758deaca4e54c95a75fca0c1033c582     |
| protected        | False                                |
| size             | 22073149                             |
| status           | active                               |
| updated_at       | 2014-03-08T03:16:42                  |
+------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name ubuntu-13.10-1-image --public --disk-format qcow2 --container-format bare --property kernel_id=d2b05afa-9b0c-4024-aced-2bcd91415efd --property ramdisk_id=a8af3499-4bef-4755-9f18-666925491fc3 < ubuntu-13.10-1.qcow2
+-----------------------+--------------------------------------+
| Property              | Value                                |
+-----------------------+--------------------------------------+
| Property 'kernel_id'  | d2b05afa-9b0c-4024-aced-2bcd91415efd |
| Property 'ramdisk_id' | a8af3499-4bef-4755-9f18-666925491fc3 |
| checksum              | a16f8ae9cde79e0605b693f985393056     |
| container_format      | bare                                 |
| created_at            | 2014-03-08T03:18:35                  |
| deleted               | False                                |
| deleted_at            | None                                 |
| disk_format           | qcow2                                |
| id                    | 826aed99-d30a-47db-a08b-9bc474b8d634 |
| is_public             | True                                 |
| min_disk              | 0                                    |
| min_ram               | 0                                    |
| name                  | ubuntu-13.10-1-image                 |
| owner                 | 1758deaca4e54c95a75fca0c1033c582     |
| protected             | False                                |
| size                  | 1256980480                           |
| status                | active                               |
| updated_at            | 2014-03-08T03:18:43                  |
+-----------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name pxe-ramdisk-1-kernel --public --disk-format aki < pxe-ramdisk-1.kernel
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | e8bd010aa70524abade9fad3a039694f     |
| container_format | aki                                  |
| created_at       | 2014-03-08T03:20:13                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | aki                                  |
| id               | 39c9bb46-7a94-4d6b-af2f-6794020dda67 |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | pxe-ramdisk-1-kernel                 |
| owner            | 1758deaca4e54c95a75fca0c1033c582     |
| protected        | False                                |
| size             | 5667920                              |
| status           | active                               |
| updated_at       | 2014-03-08T03:20:13                  |
+------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# glance image-create --name pxe-ramdisk-1-initramfs --public --disk-format ari < pxe-ramdisk-1.initramfs
+------------------+--------------------------------------+
| Property         | Value                                |
+------------------+--------------------------------------+
| checksum         | 072482e8298581d2daf27db033665e64     |
| container_format | ari                                  |
| created_at       | 2014-03-08T03:20:50                  |
| deleted          | False                                |
| deleted_at       | None                                 |
| disk_format      | ari                                  |
| id               | fd3448d8-5f96-448d-b433-c95bc206a2ad |
| is_public        | True                                 |
| min_disk         | 0                                    |
| min_ram          | 0                                    |
| name             | pxe-ramdisk-1-initramfs              |
| owner            | 1758deaca4e54c95a75fca0c1033c582     |
| protected        | False                                |
| size             | 78909835                             |
| status           | active                               |
| updated_at       | 2014-03-08T03:20:51                  |
+------------------+--------------------------------------+

[root@openstack1.relabs.releng.scl3.mozilla.com diskimage-builder]# nova flavor-create ix-mn.crap2 101 7864 200 8
+-----+-------------+-----------+------+-----------+------+-------+-------------+-----------+
| ID  | Name        | Memory_MB | Disk | Ephemeral | Swap | VCPUs | RXTX_Factor | Is_Public |
+-----+-------------+-----------+------+-----------+------+-------+-------------+-----------+
| 101 | ix-mn.crap2 | 7864      | 200  | 0         |      | 8     | 1.0         | True      |
+-----+-------------+-----------+------+-----------+------+-------+-------------+-----------+

nova flavor-key ix-mn.crap2 set cpu_arch='{i386|x86_64}' "baremetal:deploy_kernel_id"=39c9bb46-7a94-4d6b-af2f-6794020dda67 "baremetal:deploy_ramdisk_id"=fd3448d8-5f96-448d-b433-c95bc206a2ad

nova boot --flavor ix-mn.crap2 --image ubuntu-13.10-1-image ubuntu-test

Comment 48

5 years ago
(In reply to Jake Watkins [:dividehex] from comment #47)
> Baremetal linux deployment success!
> 
> This cleared up a misconception I (and maybe others) had about the
> deployment of baremetal images from openstack.  The baremetal deployment
> helper does NOT install a bootloader, boot partition, kernel or initramfs
> image; it only partitions and writes the root filesystem to the baremetal
> disk.   From that point, openstack boots baremetal nodes to pxe (via ipmi)
> which then boots the image kernel and initramfs.  To boot the image kernel
> and initramfs after the deployment, the deployment initramfs needs to run
> kexec.  Where it retrieves and boots the image kernel/initrd.  I rebuilt the
> deployment image with the kexec element and all is well.

\o/ for getting stuff working and figuring out our misconception.
(Assignee)

Updated

5 years ago
Depends on: 982739
(Assignee)

Comment 49

5 years ago
Neutron was installed and configured a couple of weeks ago and now replaces nova-networking.  This gives us a lot more features and control over network configuration.  I've opted to use openvswitch with gre tunneling, which dynamically creates and configures gre tunnels and bridges between all compute hosts and the neutron controller.  I've also created a router, network, and subnet for the virtual instances and a separate network and subnet for the baremetal instances.  The virtual instances are attached to demo-net while the baremetal instances are attached to releng.scl3 under the associated subnet 'mobile.releng.scl3'.

The baremetal network configuration is set to match the actual mobile.releng.scl3 network.  This means baremetal nodes are given a static ip, netmask, dns, and gateway ip, which allows the baremetal instances to traverse the releng fw rather than bridge back to neutron as the virtual instances do.

One limitation I noticed in doing this is that static network injection does not set search domains.  This is a problem when trying to resolve short names such as repo and puppet.  Another limitation when using the native router on the mobile.releng network is that baremetal instances cannot route to the metadata address '169.254.169.254'.  This can probably be overcome by adding a NAT rule to the router which points back at the openstack metadata service running on port 8775.
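
To illustrate why the metadata address is awkward on a natively routed network: it sits in the IPv4 link-local block, which ordinary routers are not expected to forward.  The sketch below uses only Python's standard ipaddress module.  169.254.169.254 is the conventional metadata address, the subnet is the baremetal one from this bug, and the helper name is made up for illustration.

```python
import ipaddress

# Conventional cloud metadata address (inside link-local 169.254.0.0/16).
METADATA_IP = ipaddress.ip_address("169.254.169.254")

# Baremetal subnet as used in this bug.
BAREMETAL_NET = ipaddress.ip_network("10.26.48.0/22")


def needs_nat_for_metadata(instance_net):
    """Hypothetical helper: True when the metadata address is link-local
    and outside the instance's own subnet, so a plain router will not
    forward it and some redirect (e.g. a NAT rule) is required."""
    return METADATA_IP.is_link_local and METADATA_IP not in instance_net


if __name__ == "__main__":
    print(needs_nat_for_metadata(BAREMETAL_NET))
```

This only demonstrates the addressing problem; the actual NAT rule would still have to be written on the router and has not been tested here.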

== CLI c&p notes ==

[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron net-create ext-net -- --router:external=True --provider:network_type gre --provider:segmentation_id 2
Created a new network:
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| id                        | 9cce4df0-3008-492f-8cd7-f648234a20df |
| name                      | ext-net                              |
| provider:network_type     | gre                                  |
| provider:physical_network |                                      |
| provider:segmentation_id  | 2                                    |
| router:external           | True                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tenant_id                 | 1758deaca4e54c95a75fca0c1033c582     |
+---------------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# nova network-list
+--------------------------------------+---------+------+
| ID                                   | Label   | Cidr |
+--------------------------------------+---------+------+
| 9cce4df0-3008-492f-8cd7-f648234a20df | ext-net | -    |
+--------------------------------------+---------+------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron subnet-create ext-net --allocation-pool start=10.26.51.1,end=10.26.51.254 --gateway=10.26.48.1 --enable_dhcp=false 10.26.48.0/22
Created a new subnet:
+------------------+------------------------------------------------+
| Field            | Value                                          |
+------------------+------------------------------------------------+
| allocation_pools | {"start": "10.26.51.1", "end": "10.26.51.254"} |
| cidr             | 10.26.48.0/22                                  |
| dns_nameservers  |                                                |
| enable_dhcp      | False                                          |
| gateway_ip       | 10.26.48.1                                     |
| host_routes      |                                                |
| id               | 6f44bdd5-6045-4f53-a974-6fc054cc48d2           |
| ip_version       | 4                                              |
| name             |                                                |
| network_id       | 9cce4df0-3008-492f-8cd7-f648234a20df           |
| tenant_id        | 1758deaca4e54c95a75fca0c1033c582               |
+------------------+------------------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# keystone tenant-create --name demo_tenant
+-------------+----------------------------------+
|   Property  |              Value               |
+-------------+----------------------------------+
| description |                                  |
|   enabled   |               True               |
|      id     | 487c6020289149c28cb36b89296afe3b |
|     name    |           demo_tenant            |
+-------------+----------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# keystone tenant-list
+----------------------------------+-------------+---------+
|                id                |     name    | enabled |
+----------------------------------+-------------+---------+
| 1758deaca4e54c95a75fca0c1033c582 |    admin    |   True  |
| 487c6020289149c28cb36b89296afe3b | demo_tenant |   True  |
| 7a9ad3bc126b499fabb82766c93c26df |   service   |   True  |
+----------------------------------+-------------+---------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron router-create ext-to-int --tenant-id 487c6020289149c28cb36b89296afe3b
Created a new router:
+-----------------------+--------------------------------------+
| Field                 | Value                                |
+-----------------------+--------------------------------------+
| admin_state_up        | True                                 |
| external_gateway_info |                                      |
| id                    | 924d9f52-d59a-4fce-bf7e-abf514438ca0 |
| name                  | ext-to-int                           |
| status                | ACTIVE                               |
| tenant_id             | 487c6020289149c28cb36b89296afe3b     |
+-----------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron router-gateway-set 924d9f52-d59a-4fce-bf7e-abf514438ca0 9cce4df0-3008-492f-8cd7-f648234a20df
Set gateway for router 924d9f52-d59a-4fce-bf7e-abf514438ca0
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron net-create --tenant-id 487c6020289149c28cb36b89296afe3b demo-net
Created a new network:
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| id                        | 41812a22-b903-46d1-a7fa-f2d8bed412d2 |
| name                      | demo-net                             |
| provider:network_type     | gre                                  |
| provider:physical_network |                                      |
| provider:segmentation_id  | 1                                    |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   |                                      |
| tenant_id                 | 487c6020289149c28cb36b89296afe3b     |
+---------------------------+--------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron subnet-create --tenant-id 487c6020289149c28cb36b89296afe3b demo-net 192.168.95.0/24 --gateway 192.168.95.1
Created a new subnet:
+------------------+----------------------------------------------------+
| Field            | Value                                              |
+------------------+----------------------------------------------------+
| allocation_pools | {"start": "192.168.95.2", "end": "192.168.95.254"} |
| cidr             | 192.168.95.0/24                                    |
| dns_nameservers  |                                                    |
| enable_dhcp      | True                                               |
| gateway_ip       | 192.168.95.1                                       |
| host_routes      |                                                    |
| id               | 133b4ffa-540e-45b3-8f97-c946e0894f41               |
| ip_version       | 4                                                  |
| name             |                                                    |
| network_id       | 41812a22-b903-46d1-a7fa-f2d8bed412d2               |
| tenant_id        | 487c6020289149c28cb36b89296afe3b                   |
+------------------+----------------------------------------------------+
[root@openstack1.relabs.releng.scl3.mozilla.com neutron]# neutron router-interface-add 924d9f52-d59a-4fce-bf7e-abf514438ca0 133b4ffa-540e-45b3-8f97-c946e0894f41
Added interface 9585b39c-7dec-41fd-90bb-d8162962b21c to router 924d9f52-d59a-4fce-bf7e-abf514438ca0.
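
The subnet parameters in the notes above can be sanity-checked before running subnet-create.  This is a minimal sketch using Python's standard ipaddress module; the function name and the specific checks are my own assumptions, not something neutron provides.

```python
import ipaddress


def check_subnet(cidr, gateway, pool_start, pool_end):
    """Return the allocation pool size if the gateway and pool lie inside
    cidr and do not collide; raise ValueError otherwise."""
    net = ipaddress.ip_network(cidr)
    gw = ipaddress.ip_address(gateway)
    start = ipaddress.ip_address(pool_start)
    end = ipaddress.ip_address(pool_end)

    for label, addr in (("gateway", gw), ("pool start", start), ("pool end", end)):
        if addr not in net:
            raise ValueError(f"{label} {addr} is outside {net}")
    if start > end:
        raise ValueError("pool start is after pool end")
    if start <= gw <= end:
        raise ValueError("gateway falls inside the allocation pool")

    return int(end) - int(start) + 1


if __name__ == "__main__":
    # Values copied from the ext-net and demo-net subnets above.
    print(check_subnet("10.26.48.0/22", "10.26.48.1", "10.26.51.1", "10.26.51.254"))
    print(check_subnet("192.168.95.0/24", "192.168.95.1", "192.168.95.2", "192.168.95.254"))
```

Both subnets from the notes pass these checks; a gateway outside the CIDR or inside the pool would raise instead.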
(Assignee)

Comment 50

5 years ago
The ip range for baremetal instances has been reserved in inventory.  I've also created 2 A/PTR records for the 2 ix-mn hosts already available to openstack.
https://inventory.mozilla.org/en-US/core/range/21/
(Assignee)

Comment 51

5 years ago
I've successfully deployed an image of Ubuntu 12.04 which is based on a kickstarted install from puppetagain.

The image is named 'ubuntu-12.04-7.qcow2'

I tried using the ubuntu cloud images and the ubuntu minimal images without success. They would fail either during a diskimage-builder build or during a puppet run after deployment.  In both cases the root cause was mismatched package dependencies between what came installed on the image and what was available in the puppetagain repos.  Needless to say, our repos are out of date.  Once I captured the post-kickstart image and deployed it, puppet runs completed flawlessly.

After capture, the image also needed to be run through the diskimage-builder process to prep and sanitize it before being imported into glance.  Since the root fs image wasn't coming from the pre-built cloud images, some extra steps were needed: resetting /etc/fstab and the udev net rules, installing cloud-init, and setting its conf to point to the NoCloud datasource.  The cloud-init step was crucial to getting ssh host keys generated and set on first run.
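
For reference, the file-level cleanup can be sketched roughly as follows.  This is not the diskimage-builder element I used, just a stand-in in plain Python that operates on an unpacked image tree.  The udev rules path is the usual one on Ubuntu 12.04; the fstab label and the cloud-init drop-in file name are assumptions.

```python
import os
import tempfile

# Paths relative to the unpacked image root.
UDEV_NET_RULES = "etc/udev/rules.d/70-persistent-net.rules"
FSTAB = "etc/fstab"
CLOUD_CFG = "etc/cloud/cloud.cfg.d/90_datasource.cfg"  # hypothetical file name


def sanitize_image_root(root):
    """Apply the cleanup steps to an image tree rooted at `root`."""
    # 1. Drop cached NIC naming so interface names are regenerated on first boot.
    rules = os.path.join(root, UDEV_NET_RULES)
    if os.path.exists(rules):
        os.remove(rules)

    # 2. Reset fstab to a single root entry (the label is an assumption).
    fstab = os.path.join(root, FSTAB)
    os.makedirs(os.path.dirname(fstab), exist_ok=True)
    with open(fstab, "w") as fh:
        fh.write("LABEL=cloudimg-rootfs / ext4 defaults 0 1\n")

    # 3. Restrict cloud-init to the NoCloud datasource.
    cfg = os.path.join(root, CLOUD_CFG)
    os.makedirs(os.path.dirname(cfg), exist_ok=True)
    with open(cfg, "w") as fh:
        fh.write("datasource_list: [ NoCloud ]\n")


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real image.
    with tempfile.TemporaryDirectory() as root:
        sanitize_image_root(root)
        print(sorted(os.listdir(os.path.join(root, "etc"))))
```

Installing the cloud-init package itself is not covered here; that still has to happen inside the image (for example in a chroot) before the datasource setting has any effect.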
(Assignee)

Updated

5 years ago
Depends on: 990733
Summary: Install OpenStack in relabs → Experiment with OpenStack as a future cloud deployment platform for releng
(Assignee)

Comment 52

5 years ago
While looking into Ironic last week, one particular blueprint stood out:

tl;dr:
Windows images are not supported in ironic just yet.  In its current form, the ironic driver insists on creating partitions and deploying an image to a root partition, which doesn't allow an entire windows disk image (w/ bootloader) to be pushed to disk.  The solution would be to flag an image in glance as either a partition image or a whole-disk image.

https://blueprints.launchpad.net/ironic/+spec/windows-disk-image-support
I talked to some OS folks at their booth at PyCon, and they suggested that we will most likely need to write our own drivers.  In fact, they indicated that just about every OpenStack deployment that is more than the most basic private cloud involves substantial custom development.  That's pretty easy when the model is a good match, and progressively harder as what you're trying to do diverges from "basic private cloud".
We're moving ahead with staging.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED