Bug 1499054 Comment 10 Edit History

This bug is currently blocked by [AWS support case 6410417131](https://console.aws.amazon.com/support/cases#/6410417131/en), copied/pasted here for convenience.

-------

Hi!

We create AMIs for our worker machines. To create an AMI, we launch a standard base image and pass a bash script as userdata, which bootstraps the system installations and shuts down the machine. We then take a snapshot once we detect that the instance has shut down. Finally, we launch spot instances based on the AMI we created, and everything works as expected, except for the following new situation we are encountering.
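
For reference, the flow described above can be sketched roughly as follows. This is only an illustrative sketch, not the code we actually run (that is the `update.sh` script linked at the end of this ticket). It assumes `ec2` is a boto3 EC2 client; the function name `bake_ami` and the default instance type are made up for the example.

```python
def bake_ami(ec2, base_ami, userdata, name, instance_type="t3.medium"):
    """Launch base_ami with a bootstrap script, wait for it to stop, then image it.

    `ec2` is assumed to be a boto3 EC2 client. `instance_type` is a
    hypothetical default chosen only for this sketch.
    """
    reservation = ec2.run_instances(
        ImageId=base_ami,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=userdata,
        # The userdata script ends with a shutdown, which should stop
        # (not terminate) the instance so that it can be imaged.
        InstanceInitiatedShutdownBehavior="stop",
    )
    instance_id = reservation["Instances"][0]["InstanceId"]

    # Block until the bootstrap script has shut the machine down.
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Create the AMI (and its backing snapshot) from the stopped instance.
    image = ec2.create_image(InstanceId=instance_id, Name=name)
    return instance_id, image["ImageId"]
```

Passing the client in as an argument keeps the sketch independent of how credentials and regions are configured.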

When we install `ubuntu-gnome-desktop` while creating the AMI, the spot instances that we launch from this AMI start up with no network and fail the Instance Status Checks (the System Status Checks pass).

If I remove `ubuntu-desktop` / `ubuntu-gnome-desktop` from the installed packages, we do not have this problem.

Looking at the system logs, I see the following network device info and routing tables (`ens3` is down with no address, and the IPv6 route table is empty):

```
[   19.231811] cloud-init[831]: Cloud-init v. 19.1-1-gbaa47854-0ubuntu1~18.04.1 running 'init' at Mon, 02 Sep 2019 16:59:20 +0000. Up 19.09 seconds.
[   19.237650] cloud-init[831]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[   19.243737] cloud-init[831]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   19.249784] cloud-init[831]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
[   19.255828] cloud-init[831]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   19.261921] cloud-init[831]: ci-info: |  ens3  | False |     .     |     .     |   .   | 06:16:8a:ed:e2:12 |
[   19.267866] cloud-init[831]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
[   19.273801] cloud-init[831]: ci-info: |   lo   |  True |  ::1/128  |     .     |  host |         .         |
[   19.279765] cloud-init[831]: ci-info: +--------+-------+-----------+-----------+-------+-------------------+
[   19.285710] cloud-init[831]: ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
[   19.291030] cloud-init[831]: ci-info: +-------+-------------+---------+-----------+-------+
[   19.296309] cloud-init[831]: ci-info: | Route | Destination | Gateway | Interface | Flags |
[   19.301670] cloud-init[831]: ci-info: +-------+-------------+---------+-----------+-------+
[   19.306931] cloud-init[831]: ci-info: +-------+-------------+---------+-----------+-------+
```
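
In case it helps with detecting this condition in automation, here is a rough sketch of pulling the "Net device info" table out of a console log like the one above and flagging interfaces that are down or have no address. The function names are made up for this example, and the parsing assumes the table layout shown in the log above.

```python
import re

# Section banners look like "ci-info: ++++Net device info++++".
BANNER = re.compile(r"ci-info: \++([A-Za-z][^+|]*)\++\s*$")
# Table rows look like "ci-info: | ens3 | False | . | ...|".
ROW = re.compile(r"ci-info: \|(.*)\|\s*$")


def parse_net_devices(console_log):
    """Return one dict per data row of cloud-init's 'Net device info' table."""
    devices, header, in_table = [], None, False
    for line in console_log.splitlines():
        banner = BANNER.search(line)
        if banner:
            # A new banner starts a new section; only one is of interest.
            in_table = banner.group(1).strip() == "Net device info"
            header = None
            continue
        row = ROW.search(line) if in_table else None
        if not row:
            continue  # border lines and unrelated log output
        cells = [cell.strip() for cell in row.group(1).split("|")]
        if header is None:
            header = cells  # first row of the table names the columns
        else:
            devices.append(dict(zip(header, cells)))
    return devices


def interfaces_without_network(devices):
    """Names of non-loopback devices that are down or have no address."""
    return sorted({
        d["Device"] for d in devices
        if d["Device"] != "lo" and (d["Up"] != "True" or d["Address"] == ".")
    })


SAMPLE = """\
[   19.237650] cloud-init[831]: ci-info: +++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++
[   19.249784] cloud-init[831]: ci-info: | Device |   Up  |  Address  |    Mask   | Scope |     Hw-Address    |
[   19.261921] cloud-init[831]: ci-info: |  ens3  | False |     .     |     .     |   .   | 06:16:8a:ed:e2:12 |
[   19.267866] cloud-init[831]: ci-info: |   lo   |  True | 127.0.0.1 | 255.0.0.0 |  host |         .         |
"""

print(interfaces_without_network(parse_net_devices(SAMPLE)))  # ['ens3']
```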

Note that if I start the original instance that the AMI was taken from, it starts up with no problems, has valid routing tables, and I can ssh onto the machine, etc. The problem occurs only with the spot instances created from the AMI.

As I said, I've localised the issue to the `ubuntu-gnome-desktop` / `ubuntu-desktop` package installations, since without these packages the machines start up without a problem.

Example instances:

Base instance that I created the AMI from:
* us-east-1: [i-054c4f889fb6ee399](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:search=i-054c4f889fb6ee399;sort=instanceId) (currently stopped)

Instances spawned from the AMI created from this base instance:
* us-east-1: [i-01cb177231623ae01](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Instances:search=i-01cb177231623ae01;sort=instanceId)

From the base instance, you should be able to see the userdata passed in to bootstrap it, but I'll also attach it to the ticket for reference.

As I said, this happens consistently in automation. The AMI that the base instance was created from is ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20190814 ([ami-064a0193585662d74](https://console.aws.amazon.com/ec2/v2/home?region=us-east-1#Images:visibility=public-images;search=ami-064a0193585662d74;sort=name)). Nothing was done on that instance other than launching it with the attached file `1.txt` provided as userdata, which ends with the shutdown command that caused the instance to stop.

Lastly, this seems to be an AWS issue rather than an Ubuntu issue: if I launch the spot instances from an otherwise identical AMI without the `ubuntu-desktop` / `ubuntu-gnome-desktop` packages installed, I am then able to install `ubuntu-desktop` / `ubuntu-gnome-desktop` successfully, restart the instances, and they come up with networking working. The problem occurs only if I install the desktop *before* taking the snapshot. And as I said, the original instance starts up without a problem if I restart it. The issue shows up only on the spot instances that were created from the image, not on the machine that was used to create the image itself.

Many thanks in advance for your help!

FWIW this is the code we use in automation that creates the instance and snapshots it: https://github.com/taskcluster/generic-worker/blob/bug1499054/worker_types/update.sh

Kind regards,
Pete

Attachments
* 1.txt