Kea Interacting with Wired Networking
After a reboot of my new server, the Kea DHCP server did not respond to DHCPDISCOVERs or DHCPREQUESTs.
I had trouble noticing this because my laptop retained its DHCP lease for a while after the new server rebooted. I would close the laptop, come back later, and it could not get a lease. I had to experience this a few times before I believed it.
Running systemctl restart kea-dhcp4 on the new server got me past the proximal problem:
the restarted kea-dhcp4 would give out leases.
My laptop could join the IP network without hassle.
I figured out the root cause by looking at the kea-dhcp4 logs:
sudo journalctl -u kea-dhcp4.service --since 2024-06-16T14:34:00 > kea.logs
In this command, “2024-06-16T14:34:00” is a time just before I rebooted the entire new server.
I could compare the sequence of log entries from boot to restart
with the log entries from restart to the present time.
I could see kea-dhcp4 give out an IPv4 address to my laptop after I restarted it,
but not between system boot and the kea-dhcp4 restart.
Timestamps aside, the first unusual log entry
between system boot and the kea-dhcp4 restart was:
WARN DHCPSRV_OPEN_SOCKET_FAIL failed to open socket: the interface enp4s0 is not running
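An entry like this can also be found by filtering the saved log by severity instead of reading it top to bottom. This is only a minimal sketch: it assumes the kea.logs file produced by the journalctl command above, and that Kea writes the severity (WARN, ERROR, FATAL) as a separate word on each line. The helper name first_warnings is made up for illustration.

```shell
# Print the first few warning-or-worse entries from a saved Kea log.
# Assumption: the severity shows up as a standalone word on each line.
# $1 = log file, $2 = how many entries to show (default 5)
first_warnings() {
    grep -E -w 'WARN|ERROR|FATAL' "$1" | head -n "${2:-5}"
}
```

Calling first_warnings kea.logs 1 should print the DHCPSRV_OPEN_SOCKET_FAIL line, provided it really is the first warning in the file.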
Why, according to kea-dhcp4, is interface enp4s0 not running?
It’s certainly running now, if I run ip -br link or ip -br address.
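The state that ip -br link summarizes is also exposed by the kernel under /sys/class/net, which is handy for a scripted check. The sketch below is only a diagnostic under assumptions: iface_state is a made-up helper, the SYSFS_NET override exists purely so the function can be tried against a scratch directory, and it is an assumption that what kea-dhcp4 calls “not running” roughly tracks the kernel's operational state for the interface.

```shell
# Report the kernel's operational state ("up", "down", "unknown", ...) for an interface.
# $1 = interface name
# SYSFS_NET is a hypothetical override of the sysfs directory, useful for testing.
iface_state() {
    state_file="${SYSFS_NET:-/sys/class/net}/$1/operstate"
    if [ -r "$state_file" ]; then
        cat "$state_file"
    else
        # No such interface (or no readable state file).
        echo "missing"
        return 1
    fi
}
```

Running iface_state enp4s0 early in boot and again later would show whether the interface was still coming up at the earlier moment.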
I also have to believe that kea-dhcp4 does not try to open the socket again later.
Since I now have kea-dhcp4 running on my Dell R530,
I double-checked the log entries on that machine,
which confirmed that the enp4s0 “not running” warning indicates the real problem.
Here’s a diagram of my network, for purposes of this post:
I can ssh to my new server by joining a WiFi network served by
my production server, the Dell R530, then using the IPv4 address
assigned to interface enp7s0 on the new server.
There’s some iptables setup done at system boot time to assign
a static IPv4 address to interface enp4s0 (172.24.0.1),
to set up IPv4 forwarding,
and to do Network Address Translation (NAT) masquerading.
This is almost certainly where enp4s0 “not running” happens.
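For readers who have not seen this kind of boot-time setup, the three steps can be sketched roughly as below. This is not the actual script: the /24 prefix length, the choice of enp7s0 as the outbound interface, and the helper names run and setup_wired are all assumptions made for illustration. With DRY_RUN=1 the commands are printed rather than executed.

```shell
# Sketch of boot-time wired setup: static address, forwarding, NAT masquerade.
# Assumptions: /24 prefix, enp7s0 as the outbound (upstream) interface.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = LAN-side interface (default enp4s0), $2 = outbound interface (default enp7s0)
setup_wired() {
    lan_if="${1:-enp4s0}"
    wan_if="${2:-enp7s0}"
    run ip address add 172.24.0.1/24 dev "$lan_if"    # static address
    run ip link set dev "$lan_if" up                  # bring the link up
    run sysctl -w net.ipv4.ip_forward=1               # IPv4 forwarding
    run iptables -t nat -A POSTROUTING -o "$wan_if" -j MASQUERADE
}

# Preview only; nothing is changed on the system.
DRY_RUN=1 setup_wired
```

The point of the sketch is the ordering problem: anything that wants a socket on enp4s0, such as kea-dhcp4, has to start after the address and link steps have completed.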
My first thought was that the systemd unit that I wrote to do all
the iptables work didn’t run at the correct time.
I have only a hazy understanding of how systemd (PID 1)
decides to order units (services, targets, devices) at system startup.
I have had this systemd unit (imaginatively called network.service)
running on three different machines since sometime in 2018.
Here’s the Unit specification that caused kea to start before the enp4s0
network interface was ready:
[Unit]
Description=Wired Static IP Connectivity
Wants=network.target
Before=network-pre.target
BindsTo=sys-subsystem-net-devices-eno1.device
After=sys-subsystem-net-devices-eno1.device
I read some man pages and looked up some system admin blogs on systemd unit files,
especially the Before=, After=, Wants= and Requires= config items.
I tried a few things, but when they didn’t work (kea started before enp4s0
was UP), I looked at /usr/lib/systemd/system/iptables.service for inspiration.
iptables.service has Before=network-pre.target, which really doesn’t seem to match
the documentation, which says something like this:
network-pre.target is used to order services before any network interfaces start to be configured.
I ended up with this in the /etc/systemd/system/network.service file:
[Unit]
Description=Wired Static IP Connectivity
Before=network-pre.target
Wants=network-pre.target
After=sys-subsystem-net-devices-enp4s0.device
Requires=sys-subsystem-net-devices-enp4s0.device
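One way to confirm that systemd actually picked up an edit like this is to ask it for the unit's effective dependency lists. The helper below is a small sketch, with unit_lists as a made-up name; it relies on systemctl show with the -p and --value options, which print a property's value as a space-separated list.

```shell
# Succeed if unit $1 lists unit $3 in dependency property $2
# (for example Before, After, Wants, Requires).
unit_lists() {
    systemctl show -p "$2" --value "$1" | tr ' ' '\n' | grep -qxF "$3"
}
```

After a systemctl daemon-reload, unit_lists network.service Before network-pre.target should succeed if the file was loaded as written. Running systemd-analyze critical-chain kea-dhcp4.service after a reboot is another way to see what kea-dhcp4 waited for.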
It looks to me like this problem occurs on my new server,
but not my production server,
because the production server has units to start pppd and PPPoE,
and do Path-MTU-clamping.
My current theory is that these units on my production server cause
a different synchronization,
so dhcpd or kea-dhcp4 gets started later in the boot sequence.