Discussion:
[strongSwan-dev] DoS protection questions
Emeric POUPON
2018-04-03 12:44:41 UTC
Hello,

As far as I understand, IKE_SAs are only registered as half-open after their first message (the IKE_SA_INIT) has been successfully handled from the job queue.

If we are under a DoS attack (even a small one, like 320 packets/s), we end up with a huge number of jobs in the queue and the system takes hours to recover, which is definitely questionable.

Example:
"2018-02-06 16:14:09" zone=GMT tz=+0000 ntp=Off
worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 3
"2018-02-06 16:14:19" zone=GMT tz=+0000 ntp=Off
worker threads: 11 of 16 idle, 5/0/0/0 working, job queue: 0/0/0/0, scheduled: 3
"2018-02-06 16:14:29" zone=GMT tz=+0000 ntp=Off
worker threads: 0 of 16 idle, 5/0/11/0 working, job queue: 0/0/221/0, scheduled: 3
"2018-02-06 16:14:39" zone=GMT tz=+0000 ntp=Off
worker threads: 0 of 16 idle, 5/0/11/0 working, job queue: 0/0/3102/0, scheduled: 2
"2018-02-06 16:14:49" zone=GMT tz=+0000 ntp=Off
worker threads: 0 of 16 idle, 5/0/11/0 working, job queue: 0/0/7137/0, scheduled: 2
...
"2018-02-06 16:25:47" zone=GMT tz=+0000 ntp=Off
worker threads: 0 of 16 idle, 5/0/11/0 working, job queue: 0/0/122518/0, scheduled: 2
"2018-02-06 16:25:58" zone=GMT tz=+0000 ntp=Off
worker threads: 0 of 16 idle, 5/0/11/0 working, job queue: 0/0/123698/0, scheduled: 2

Even if charon.block_threshold is set to 5, each time we successfully establish an IKE_SA, a huge number of jobs can be queued until the next IKE_SA_INIT is processed and the half-open counter is increased.

Questions:
- why is this counter only increased after the first message has been successfully handled from the job queue?
- is charon.init_limit_job_load the only relevant setting for DoS protection?

Regards,

Emeric
Tobias Brunner
2018-04-04 08:16:41 UTC
Hi Emeric,
Post by Emeric POUPON
- why is this counter only increased after the first message has been successfully handled from the job queue?
The half-open SA counter is increased whenever an IKE_SA object is
checked into the IKE_SA manager after processing (or initiating) an
IKE_SA_INIT request, and reduced when an IKE_SA is checked in after
successfully establishing it with the last IKE_AUTH request.
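Schematically, for a responder:

  process IKE_SA_INIT request, check IKE_SA in    -> half-open count + 1
  process last IKE_AUTH request (SA established),
  check IKE_SA in                                 -> half-open count - 1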
Post by Emeric POUPON
- is charon.init_limit_job_load the only relevant setting for DoS protection?
No, there are several others. The first is charon.cookie_threshold (and
charon.dos_protection), which causes COOKIEs to get returned if the
global number of half-open SAs exceeds the limit, which helps if the
IKE_SA_INITs are sent from fake IPs. If the requests are sent from real
hosts that actually retry initiating with the returned COOKIE payload
and (if they send multiple requests) modify the nonces/KE payload, the
next option is charon.block_threshold, which sets a limit for half-open
SAs per source IP. Then the next limit is charon.init_limit_half_open,
which drops IKE_SA_INITs if the global half-open SA count exceeds a
certain number. Similarly, the charon.init_limit_job_load option will
cause IKE_SA_INITs to get dropped if the total number of queued jobs
exceeds a certain number. Next are options that might help processing
the queued jobs faster, e.g. using hash tables in the IKE_SA manager [1]
and optimizing thread allocation [2].
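For reference, all of these are strongswan.conf options in the charon
section; a rough sketch with purely illustrative values (not tuned
recommendations):

charon {
    # return COOKIEs once the global half-open SA count exceeds the threshold
    dos_protection = yes
    cookie_threshold = 10
    # limit on half-open SAs per source IP
    block_threshold = 5
    # drop IKE_SA_INITs if the global half-open SA count exceeds this
    init_limit_half_open = 1000
    # drop IKE_SA_INITs if the total number of queued jobs exceeds this
    init_limit_job_load = 200
}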

Regards,
Tobias

[1] https://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable
[2] https://wiki.strongswan.org/projects/strongswan/wiki/JobPriority
Emeric POUPON
2018-04-04 08:33:45 UTC
Post by Tobias Brunner
Post by Emeric POUPON
- is charon.init_limit_job_load the only relevant setting for DoS protection?
No, there are several others. The first is charon.cookie_threshold (and
charon.dos_protection), which causes COOKIEs to get returned if the
global number of half-open SAs exceeds the limit, which helps if the
IKE_SA_INITs are sent from fake IPs. If the requests are sent from real
hosts that actually retry initiating with the returned COOKIE payload
and (if they send multiple requests) modify the nonces/KE payload, the
next option is charon.block_threshold, which sets a limit for half-open
SAs per source IP. Then the next limit is charon.init_limit_half_open,
which drops IKE_SA_INITs if the global half-open SA count exceeds a
certain number. Similarly, the charon.init_limit_job_load option will
cause IKE_SA_INITs to get dropped if the total number of queued jobs
exceeds a certain number. Next are options that might help processing
the queued jobs faster, e.g. using hash tables in the IKE_SA manager [1]
and optimizing thread allocation [2].
Regards,
Tobias
[1] https://wiki.strongswan.org/projects/strongswan/wiki/IkeSaTable
[2] https://wiki.strongswan.org/projects/strongswan/wiki/JobPriority
Hello,

Thanks for your answer.
I know these settings and they look promising. Unfortunately, as I said before, they seem to be useless since the counter is increased too late in the IKE_SA manager.
We simulated a DoS attack and charon did not handle it well (see the logs in the initial question).

What do you think?

Emeric
Tobias Brunner
2018-04-04 09:12:50 UTC
Post by Emeric POUPON
I know these settings and they look promising.
Why not use them then?
Post by Emeric POUPON
Unfortunately, as I said before, they seem to be useless since the counter is increased too late in the IKE_SA manager.
Yeah, I noticed that it's quite late. Since strongSwan calculates the
IKE keys while processing the IKE_SA_INIT request (and not e.g. when
processing the IKE_AUTH request) it might be better to increase this
counter when the IKE_SA is checked out. But even so I guess there could
be lots of packets queued initially until a number of them have been
processed to increase the half-open SA count. I suppose you could
counter that with some rate limiting in the firewall (e.g. only allow a
few UDP packets/s per source IP). We currently also don't recheck the
limits when processing queued packets (they are only checked early in
the receiver before they get queued).
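As a hypothetical example of such a firewall rule (assuming iptables with
the hashlimit match is available; the 5/sec figure is arbitrary):

iptables -A INPUT -p udp --dport 500 \
    -m hashlimit --hashlimit-name ike-init --hashlimit-mode srcip \
    --hashlimit-above 5/sec -j DROP

This drops IKE packets from any source IP sending more than 5 packets/s,
before they reach charon's receiver at all.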
Post by Emeric POUPON
We simulated a DoS attack and charon did not handle it well (see the logs in the initial question).
How exactly? And what settings did you use on the responder? (I saw
that there are e.g. only 16 threads and I guess you didn't set a job limit.)

Regards,
Tobias
Emeric POUPON
2018-04-04 10:05:53 UTC
Post by Tobias Brunner
Post by Emeric POUPON
I know these settings and they look promising.
Why not use them then?
We do, but we still suffer from DoS attacks that seem trivial to set up.
Post by Tobias Brunner
Post by Emeric POUPON
Unfortunately, as I said before, they seem to be useless since the counter is
increased too late in the IKE_SA manager.
Yeah, I noticed that it's quite late. Since strongSwan calculates the
IKE keys while processing the IKE_SA_INIT request (and not e.g. when
processing the IKE_AUTH request) it might be better to increase this
counter when the IKE_SA is checked out. But even so I guess there could
be lots of packets queued initially until a number of them have been
processed to increase the half-open SA count. I suppose you could
counter that with some rate limiting in the firewall (e.g. only allow a
few UDP packets/s per source IP). We currently also don't recheck the
limits when processing queued packets (they are only checked early in
the receiver before they get queued).
Furthermore, I am afraid we actually queue a lot of jobs (more than one) each time the counter is decreased by one.
Could this be the root of the problem?
Post by Tobias Brunner
Post by Emeric POUPON
We simulated a DoS attack and charon did not handle it well (see the logs in the
initial question).
How exactly? And what settings did you use on the responder? (I saw
that there are e.g. only 16 threads and I guess you didn't set a job limit.)
Here is the initiator configuration:

charon {
    reuse_ikesa = no
    threads = 32

    plugins {
        load-tester {
            enable = yes
            initiators = 32
            iterations = 1000000
            delay = 100
            responder = 172.21.21.33
            proposal = aes128-sha1-modp1024
            initiator_auth = psk
            responder_auth = psk
            request_virtual_ip = yes
            ike_rekey = 0
            child_rekey = 60
            delete_after_established = no
            shutdown_when_complete = no
        }
    }
}

On the responder, the cookie threshold is set to 10 and block_threshold is set to 5, and neither a job limit nor a half-open limit is set.
There is no visible effect if we set both the cookie and block thresholds to 1. The same goes for init_limit_half_open set to 5.
The only setting with a visible effect is the job limit, but since it is global, it could prevent high-priority jobs from running properly.
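In strongswan.conf terms, the responder setup is roughly (a sketch of the
values just described):

charon {
    cookie_threshold = 10
    block_threshold = 5
    # neither init_limit_half_open nor init_limit_job_load is set
}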

Regards,

Emeric
Tobias Brunner
2018-04-04 10:12:15 UTC
Post by Emeric POUPON
Furthermore, I am afraid we actually queue a lot of jobs (more than one) each time the counter is decreased by one.
Could this be the root of the problem?
Yes, until the next IKE_SA is checked in, packets will be processed.
Post by Emeric POUPON
The only setting with a visible effect is the job limit, but since it is global, it could prevent high-priority jobs from running properly.
It's not a limit on the number of jobs; it's a limit that causes
IKE_SA_INITs to get dropped when the number of queued jobs exceeds the
configured number.
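I.e. something like the following (200 is just an example value) only
causes new IKE_SA_INITs to be dropped; already-queued and higher-priority
jobs are unaffected:

charon {
    # drop IKE_SA_INITs while more than 200 jobs are queued
    init_limit_job_load = 200
}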

Regards,
Tobias
Emeric POUPON
2018-04-05 09:16:56 UTC
Post by Tobias Brunner
Post by Emeric POUPON
Furthermore, I am afraid we actually queue a lot of jobs (more than one) each
time the counter is decreased by one.
Could this be the root of the problem?
Yes, until the next IKE_SA is checked in, packets will be processed.
Do you want me to file an issue for that?
Post by Tobias Brunner
Post by Emeric POUPON
The only setting with a visible effect is the job limit, but since it is global, it
could prevent high-priority jobs from running properly.
It's not a limit on the number of jobs; it's a limit that causes
IKE_SA_INITs to get dropped when the number of queued jobs exceeds the
configured number.
Ok, it does the job then.
Thanks again for your answers.

Regards,

Emeric
