ENA driver restarting network interface

Recently I noticed an increased rate of 502 errors on the Application Load Balancer (ALB) sitting in front of an ECS service. After analysing all the load balancer parameters, such as connection draining, I had not found a direct cause for this behaviour. I noticed that the application served the response with a 200 OK status code, yet the ALB still sent a 502 to the client. Thankfully, the instance was still around. After investigating its logs, I found a cryptic message about the Elastic Network Adapter being reset:

ena 0000:00:05.0: Device reset completed successfully, Driver info: Elastic Network Adapter (ENA) v2.2.10g

A quick check showed that it happens quite often:

grep 'Device reset completed successfully' messages | wc -l
45
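A total count does not show how the resets are spread over time. A per-day breakdown can be produced with awk. This is a minimal sketch that assumes the file uses the traditional syslog timestamp format (`Mon DD HH:MM:SS host ...`), so the month and day are the first two fields; the function name `ena_resets_per_day` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count ENA reset messages per day in a syslog-style file.
# Assumes traditional syslog timestamps, so $1 is the month and $2 the day.
# Days are printed in the order they first appear in the file.
ena_resets_per_day() {
  awk '
    /Device reset completed successfully/ {
      day = $1 " " $2
      if (!(day in n)) order[++k] = day   # remember first-seen order
      n[day]++
    }
    END { for (i = 1; i <= k; i++) print n[order[i]], order[i] }
  ' "$1"
}
```

Running `ena_resets_per_day messages` prints one `COUNT Mon DD` line per day that had at least one reset.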

OK, so a restart of the network adapter driver can definitely cause dropped connections between the ALB and the instance. But what causes the ENA to reset? Digging deeper, I found this issue on GitHub, which at first sight looked related.

And it was! It turned out that a single Java application running out of memory (OOM) dumped a stack trace to the serial console, and the busy serial console in turn caused the ENA driver to reset.
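One way to check this theory against an instance's own logs is to print OOM kills and ENA resets side by side in log order. This sketch assumes the same syslog timestamp format as above and that the kernel logs each kill on a line containing `Out of memory:`; the function name `oom_and_ena_timeline` and the `OOM` / `ENA-RESET` labels are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a compact timeline of OOM kills and ENA resets.
# Each matching line is reduced to "Mon DD HH:MM:SS LABEL" (syslog fields 1-3).
oom_and_ena_timeline() {
  awk '
    /Out of memory:/                      { print $1, $2, $3, "OOM";       next }
    /Device reset completed successfully/ { print $1, $2, $3, "ENA-RESET" }
  ' "$1"
}
```

If an `ENA-RESET` entry regularly follows an `OOM` entry within seconds, that supports the serial console explanation.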

The solution was to disable the task dump to the kernel log during OOM kills:

# Disable the OOM task dump to the kernel log (and the resulting serial console spam)
vm.oom_dump_tasks = 0
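To make the setting survive reboots, it can be placed in a drop-in file under `/etc/sysctl.d/`. This is a minimal sketch: the function name `write_oom_dump_dropin` and the file name `90-oom-dump-tasks.conf` are hypothetical choices, and the target directory is a parameter so the function can be tried against a scratch directory first. The comment sits on its own line because sysctl configuration files only treat lines starting with `#` or `;` as comments.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a sysctl drop-in that sets vm.oom_dump_tasks = 0.
# Defaults to /etc/sysctl.d (needs root); pass another directory to test it.
# Prints the path of the file it wrote.
write_oom_dump_dropin() {
  local dir="${1:-/etc/sysctl.d}"
  local file="$dir/90-oom-dump-tasks.conf"
  cat > "$file" <<'EOF'
# Do not dump the task list to the kernel log when the OOM killer runs
vm.oom_dump_tasks = 0
EOF
  echo "$file"
}
```

As root, running `write_oom_dump_dropin` and then `sysctl --system` loads the file. `sysctl -w vm.oom_dump_tasks=0` applies the value immediately without writing any file, and `sysctl vm.oom_dump_tasks` shows the current value.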