Thor Simon
2018-04-05 14:22:21 UTC
We've been investigating some sporadic issues with clients failing to receive addresses via the DHCP plugin. I think we've found a general problem with address plugins and would appreciate others' feedback on the best path towards a fix.
First, a few observations:
1) The interface for address plugins is that they're asked for an "address" attr, which is a bare host_t.
2) Mode Config supports an "address lifetime" key but because of #1 above, we don't and can't easily feed this to the client -- it gets an address only, with no notion of expiry.
3) IKEv2 Mode Config is pull-only, so the server can't later push a new address to the client.
4) StrongSwan's behavior as a client is to issue a new mode config pull when it renews its Phase 1 SA. This causes another call into the DHCP plugin on the server side, thus another full DORA cycle with the DHCP server or servers, thus a reacquisition of the client's address. This is the only mechanism which causes renewal of a DHCP-allocated address in StrongSwan as far as we can tell.
Issues:
1) Issuing a new config pull on phase 1 SA renewal does not appear to be required by the standard and some peers may not do this, thus allowing their leased addresses to expire with no attempt to reacquire them.
2) If the Phase 1 SA lifetime is longer than the DHCP lease time, the client will continue to use an address whose corresponding lease has expired. For most ISC-derived DHCP server code, this will cause a serious pathology:
A) the lease expires with no attempt to renew and is marked as "abandoned"
B) if the backing address range runs out of addresses, the server will attempt to ping addresses it believes to have abandoned leases
C) If the Phase 1 SA has not yet expired, the client will respond to the ping
D) The server will thereafter refuse to allocate the address
E) This will eventually exhaust the DHCP server's address space and nobody will get addresses.
3) We can _minimize_ this issue by configuring long DHCP lease times and short Phase 1 SA lifetimes, but not eliminate it. In failover DHCP configurations, per the standard, factors such as the MCLT (max client lead time) will result in some clients receiving shorter-than-default leases until their first renewal, and this will eventually trigger the pathology detailed above.
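To make the timing in issues 2 and 3 concrete, here is a rough sketch (Python, purely illustrative) of how long a client could keep using an address after its lease has run out. The function names are made up for this example, and the model assumes that in a failover setup a new client's first lease is capped at the MCLT; real servers may behave differently in detail.

```python
from typing import Optional


def first_lease_seconds(default_lease: int, mclt: Optional[int] = None) -> int:
    """Shortest lease a new client might be handed.

    Assumption: with DHCP failover, the first lease is capped at the MCLT;
    without failover (mclt is None) the default lease time applies.
    """
    if mclt is None:
        return default_lease
    return min(default_lease, mclt)


def exposure_window(sa_lifetime: int, default_lease: int,
                    mclt: Optional[int] = None) -> int:
    """Seconds during which the client may still use an expired lease.

    The address is only reacquired when the Phase 1 SA is renewed, so any
    time the SA outlives the lease is time spent on an expired address.
    """
    return max(0, sa_lifetime - first_lease_seconds(default_lease, mclt))


# 3 hour Phase 1 SA, 8 hour default lease: no exposure without failover.
print(exposure_window(3 * 3600, 8 * 3600))
# Same settings with a 1 hour MCLT: the first lease is shorter than the SA.
print(exposure_window(3 * 3600, 8 * 3600, mclt=3600))
```

With these numbers the second case leaves a two-hour window, which is why long leases and short SA lifetimes alone do not close the gap.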
Possible fixes:
1) Feed the client an address-lifetime attribute along with its address. Then the client will trigger a new DORA cycle at a more appropriate time.
* Clients may not support this attribute (does StrongSwan? We haven't yet looked)
* This requires a change to the API for all plugins which supply addresses, to return the lifetime along with the address.
* But it does seem most correct, since it does tell the client the lease is not eternal.
2) Add more state and a timer to the DHCP plugin such that it knows all addresses it's handled and tries to renew them itself.
* This is fairly heavy
* Not clear what to do if renewal fails
3) Artificially constrain the effective phase 1 SA lifetime server-side so that the server tries to renew the phase 1 before the DHCP lease expires. This should still trigger a new config pull by the client.
* This is an abstraction violation - the DHCP plugin would have to find the relevant client state and mess with it directly, unless the plugin's API were changed as per #1 above
4) Change the client to do a config pull at each Phase 2 renewal instead.
* Some other clients do this (or so the comments in the standard say)
* But it isn't really a fix, just reduces the chance of an issue because the phase 2 lifetimes are shorter
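As a rough illustration of how fixes 1 and 3 fit together, the sketch below (Python for brevity, not StrongSwan's actual C API) has the address source return a lifetime alongside the address, and lets the caller, rather than the plugin, clamp the Phase 1 renewal time. Every name here (AddressLease, effective_rekey_time, the margin parameter) is hypothetical and only meant to show the shape of the change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AddressLease:
    """Hypothetical return value for an address plugin: address plus lifetime."""
    address: str
    lifetime: Optional[int]  # seconds; None means the source has no expiry


def effective_rekey_time(configured_rekey: int, lease: AddressLease,
                         margin: int = 60) -> int:
    """Pick a Phase 1 renewal time that falls before the lease expires.

    If the plugin reports no lifetime, the configured value is kept.
    Otherwise the renewal is scheduled at least `margin` seconds before
    lease expiry, and never later than the configured value.
    """
    if lease.lifetime is None:
        return configured_rekey
    return max(1, min(configured_rekey, lease.lifetime - margin))


# A one hour lease pulls a three hour renewal time forward.
print(effective_rekey_time(10800, AddressLease("10.0.0.5", 3600)))
# A pool-style source with no expiry leaves the configuration untouched.
print(effective_rekey_time(10800, AddressLease("10.0.0.6", None)))
```

Because the caller does the clamping from the returned lifetime, the plugin never has to reach into client state, which is the abstraction concern raised under fix 3. The same lifetime value could also be sent to the client as the address-lifetime attribute from fix 1, if the client supports it.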
++++++
What do others think the best path forwar