Here's the initial response from AWS support:
My name is Luke and I'm from the AWS Linux Support team and will be looking into your case.
I understand you are seeing a lot of timeouts running your test jobs on m5.large instance types. You would like a review to see whether there are any commonalities among the instances that you provided.
I focused the response on the m5.large instances (i-03bb3c305fe29770f, i-04dc5b3e0f42c3a6f, i-03d8d270819160d2c, i-0eb0734fd390d4dc0). On review of these instances I found no issue with the underlying hardware or networking (dropped packets, packet loss). The console on these instances looked good. On review of the CloudWatch metrics everything looks good too, although I did see a few large spikes in Network In/Out traffic for these nodes. For example, NetworkOut for i-03d8d270819160d2c shows spikes at the following times:
I have a few recommendations from reviewing these instances. These updates will improve the stability of the operating system and help rule out issues:
- The kernel is 4.4.0-1014-aws on Ubuntu 14, which is a few revisions behind the latest one available. The latest kernel available for Ubuntu 14 is 4.4.0-1044.47, released on 2019-05-16. We also suggest migrating to a newer version of Ubuntu with a more up-to-date kernel.
- The ENA driver is version 1.3.0K. The current ENA driver is past version 2.
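To confirm which kernel and ENA driver an instance is actually running, a minimal sketch along these lines may help. It is not part of the support response; the interface name `eth0` and the helper function names are assumptions and may need adjusting for a given image.

```shell
#!/bin/sh
# Interface name is an assumption; override with e.g. IFACE=ens5.
IFACE="${IFACE:-eth0}"

# Hypothetical helper: print the running kernel release.
kernel_release() {
    uname -r
}

# Hypothetical helper: print the ENA module version, if the module declares one.
ena_driver_version() {
    modinfo ena 2>/dev/null | awk '/^version:/ {print $2}'
}

echo "kernel: $(kernel_release)"
echo "ena module version: $(ena_driver_version)"
# ethtool -i reports the driver name and version bound to the interface.
ethtool -i "$IFACE" 2>/dev/null || echo "could not query $IFACE with ethtool"
```

Comparing this output before and after an upgrade shows whether the new kernel and driver are actually in use.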
So to get to the bottom of what is happening we will need to get some more information. I would like the following information from 1 or 2 current instances that are facing the issue:
Timestamp + timezone where you see the failures.
I see that the m5 instance types also have instance store volumes. At the time of your issue can you please run iostat every 1 second to see how the instance is working with the Volumes that are attached to it:
$ iostat -mydtxz 1
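To keep the iostat output together with the time zone information requested above, the command could be wrapped in a small script such as this sketch. The log directory, file naming, and function names are assumptions for illustration only; `iostat` comes from the sysstat package.

```shell
#!/bin/sh
# LOG_DIR and the file naming scheme are assumptions, not part of the support request.
LOG_DIR="${LOG_DIR:-/tmp}"

# Build a log file name that embeds the UTC start time.
iostat_log_name() {
    printf '%s/iostat-%s.log\n' "$LOG_DIR" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Collect $1 one-second samples into a new log file and print its path.
capture_iostat() {
    log="$(iostat_log_name)"
    # Record the local time zone so log timestamps can be correlated with failures.
    echo "# started $(date '+%Y-%m-%d %H:%M:%S %Z (%z)')" > "$log"
    # -m MB/s, -y skip the since-boot report, -d device stats,
    # -t timestamps, -x extended stats, -z omit idle devices
    iostat -mydtxz 1 "$1" >> "$log"
    echo "$log"
}
```

For example, `capture_iostat 60` would collect roughly one minute of samples and print the path of the resulting log file.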
Can you provide logs from both the application and the operating system (for example, the file /var/log/messages) from the time of the issue?
Do you have any instances that are running your workload that do not require termination? If so can you provide us with a couple of instance IDs and we can review these too.
Finally, I was taking a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1617552 and there was a mention of this issue occurring on Ubuntu 18 images. From the console, I see that the instances provided are using an Ubuntu 14 image from AMI ami-03788cad4724efdbc.
In summary, this review is inconclusive about the issue you are facing, so we would like some more information to assist with further troubleshooting. Please let me know if you have any further questions or concerns and we will be happy to help. Have a great day!
 i-03d8d270819160d2c - NetworkOut: