My Docker swarm failed to start services after a Docker daemon upgrade. The system log showed:
Jan 4 11:13:25 systemd[851112]: run-docker-netns-1\x2d0nwxt2ozxt.mount: Succeeded.
Jan 4 11:13:25 kernel: [7327688.428417] br0: renamed from ov-001001-0nwxt
Jan 4 11:13:25 dockerd[886568]: time="2025-01-04T11:13:25.902878298Z" level=warning msg="Peer flush operation failed" error="Unable to find the peerDB for nid:0nwxt2ozxt0rxudrmu4hw2737"
Jan 4 11:13:25 dockerd[886568]: time="2025-01-04T11:13:25.902966519Z" level=error msg="fatal task error" error="network sandbox join failed: subnet sandbox join failed for \"10.0.1.0/24\": error creating vxlan interface: file exists" module=node/agent/taskmanager node.id=gdjq1b5lald677iqxhlaz1osd service.id=ymo1vfjuzljrlsgrbnyxgvvv4 task.id=wvblqpgj9wv3p7p1kyol4s6z7
Jan 4 11:13:25 systemd[1]: run-docker-netns-lb_0nwxt2ozx.mount: Succeeded.
An AI assistant kindly explained it to me as follows:
The error logs from the Docker daemon indicate a network configuration issue within a Docker environment, specifically related to the creation and management of virtual network interfaces. The key error, “network sandbox join failed: subnet sandbox join failed for ‘10.0.1.0/24’: error creating vxlan interface: file exists,” suggests that Docker attempted to create a VXLAN interface for an overlay network, but encountered a conflict because an interface with the same identifier already exists. This can occur if previous network configurations were not properly cleaned up, leading to residual interfaces that interfere with new network setups. The accompanying warning about a failed peer flush operation further points to potential issues in Docker’s overlay network management, possibly due to misconfigured or stale network state information.
The solution was to delete the stale VXLAN interface and restart Docker:
udevadm info /sys/class/net/vx-001001-0nwxt
sudo ip -d link show vx-001001-0nwxt
sudo ip link delete vx-001001-0nwxt
sudo systemctl restart docker
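If more than one interface is left behind, the same steps can be scripted. The following is only a minimal sketch: it assumes that any interface named vx-* in the host network namespace is a leftover like the one above, which should be confirmed with ip -d link show before deleting anything. The function names list_stale_vxlan and print_cleanup_commands are made up for this example, and the script only prints the delete commands rather than running them.

```shell
#!/bin/sh
# Sketch: find leftover Docker overlay VXLAN interfaces and print cleanup commands.
# Assumption: stale interfaces follow the vx-<vni>-<network id prefix> naming
# seen in this post (e.g. vx-001001-0nwxt). Nothing is deleted by this script.

# Print the names of entries under a sysfs-style directory that start with "vx-".
# The directory is a parameter so the function can be tried on a scratch directory.
list_stale_vxlan() {
    net_dir="${1:-/sys/class/net}"
    for path in "$net_dir"/vx-*; do
        # When the glob matches nothing, the pattern itself is returned; skip it.
        [ -e "$path" ] || continue
        basename "$path"
    done
}

# Print one "ip link delete" command per interface found; review before running.
print_cleanup_commands() {
    list_stale_vxlan "$1" | while read -r ifname; do
        echo "sudo ip link delete $ifname"
    done
}

print_cleanup_commands
```

After reviewing and running the printed commands, Docker still has to be restarted, as in the manual fix above.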