Discussion:
[strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Christophe Gouault
2014-09-19 15:15:32 UTC
Permalink
From: Zheng Zhong <***@6wind.com>

Do a little cleanup when deleting a connection via "ipsec update"
command:
- delete all established CHILD_SAs
- unroute the connection
- delete IKE_SAs that have no more CHILD_SAs
- delete the connection
- make sure to refuse an undesired negotiation request from the peer,
by deleting the connection before terminating it.

Signed-off-by: Zheng Zhong <***@6wind.com>
Acked-by: Christophe Gouault <***@6wind.com>
---
src/starter/starter.c | 8 +++++--
src/starter/starterstroke.c | 49 +++++++++++++++++++++++++++++++++++++++++++
src/starter/starterstroke.h | 2 ++
3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/src/starter/starter.c b/src/starter/starter.c
index 71f33ae..aa740ba 100644
--- a/src/starter/starter.c
+++ b/src/starter/starter.c
@@ -713,11 +713,13 @@ int main (int argc, char **argv)
{
if (starter_charon_pid())
{
+ starter_stroke_del_conn(conn);
+ starter_stroke_terminate_conn(conn);
+ starter_stroke_purge_ike();
if (conn->startup == STARTUP_ROUTE)
{
starter_stroke_unroute_conn(conn);
}
- starter_stroke_del_conn(conn);
}
conn->state = STATE_TO_ADD;
}
@@ -774,11 +776,13 @@ int main (int argc, char **argv)
{
if (starter_charon_pid())
{
+ starter_stroke_del_conn(conn);
+ starter_stroke_terminate_conn(conn);
+ starter_stroke_purge_ike();
if (conn->startup == STARTUP_ROUTE)
{
starter_stroke_unroute_conn(conn);
}
- starter_stroke_del_conn(conn);
}
}
}
diff --git a/src/starter/starterstroke.c b/src/starter/starterstroke.c
index 1e305db..e1b369b 100644
--- a/src/starter/starterstroke.c
+++ b/src/starter/starterstroke.c
@@ -16,6 +16,7 @@
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
+#include <stdarg.h>

#include <credentials/auth_cfg.h>

@@ -47,6 +48,28 @@ static char* push_string(stroke_msg_t *msg, char *string)
}
}

+static char* push_format(stroke_msg_t *msg, const char *format, ...)
+{
+ unsigned long string_start = msg->length;
+ size_t room = sizeof(stroke_msg_t) - msg->length;
+ int written;
+ va_list ap;
+
+ va_start(ap, format);
+
+ written = vsnprintf((char*)msg + string_start, room, format, ap);
+
+ if (written >= room)
+ {
+ return NULL;
+ }
+ else
+ {
+ msg->length += written + 1;
+ return (char*)string_start;
+ }
+}
+
static int send_stroke_msg (stroke_msg_t *msg)
{
stream_t *stream;
@@ -280,6 +303,32 @@ int starter_stroke_initiate_conn(starter_conn_t *conn)
return send_stroke_msg(&msg);
}

+/*
+ * Terminate all established CHILD_SAs of a connection
+ */
+int starter_stroke_terminate_conn(starter_conn_t *conn)
+{
+ stroke_msg_t msg;
+
+ msg.type = STR_TERMINATE;
+ msg.length = offsetof(stroke_msg_t, buffer);
+ msg.initiate.name = push_format(&msg, "%s{*}", connection_name(conn));
+ return send_stroke_msg(&msg);
+}
+
+/*
+ * Delete IKE_SAs without a CHILD_SA
+ */
+int starter_stroke_purge_ike(void)
+{
+ stroke_msg_t msg;
+
+ msg.type = STR_PURGE;
+ msg.length = offsetof(stroke_msg_t, buffer);
+ msg.purge.flags = PURGE_IKE;
+ return send_stroke_msg(&msg);
+}
+
int starter_stroke_add_ca(starter_ca_t *ca)
{
stroke_msg_t msg;
diff --git a/src/starter/starterstroke.h b/src/starter/starterstroke.h
index 1264863..4d23b97 100644
--- a/src/starter/starterstroke.h
+++ b/src/starter/starterstroke.h
@@ -23,6 +23,8 @@ int starter_stroke_del_conn(starter_conn_t *conn);
int starter_stroke_route_conn(starter_conn_t *conn);
int starter_stroke_unroute_conn(starter_conn_t *conn);
int starter_stroke_initiate_conn(starter_conn_t *conn);
+int starter_stroke_terminate_conn(starter_conn_t *conn);
+int starter_stroke_purge_ike(void);
int starter_stroke_add_ca(starter_ca_t *ca);
int starter_stroke_del_ca(starter_ca_t *ca);
int starter_stroke_configure(starter_config_t *cfg);
--
1.7.10.4
Martin Willi
2014-10-02 08:08:26 UTC
Permalink
Hi Christophe,

Thanks for your patch.
Post by Christophe Gouault
Do a little cleanup when deleting a connection via "ipsec update"
- delete all established CHILD_SAs
- unroute the connection
- delete IKE_SAs that have no more CHILD_SAs
- delete the connection
- make sure to refuse an undesired negotiation request from the peer,
by deleting the connection before terminating it.
These chances certainly make sense in some scenarios. However, the
behavioral change is non-trivial. That an "update" of connections
deletes all associated SAs is not that obvious, especially as we did not
do that before. I'd guess we'd break many scripted installations with
that change.

If we introduce such a behavioral change, I think we need to make that
optional, and probably disable it by default.

Regards
Martin
Christophe Gouault
2014-10-02 08:13:33 UTC
Permalink
Post by Martin Willi
Hi Christophe,
Thanks for your patch.
Post by Christophe Gouault
Do a little cleanup when deleting a connection via "ipsec update"
- delete all established CHILD_SAs
- unroute the connection
- delete IKE_SAs that have no more CHILD_SAs
- delete the connection
- make sure to refuse an undesired negotiation request from the peer,
by deleting the connection before terminating it.
These chances certainly make sense in some scenarios. However, the
behavioral change is non-trivial. That an "update" of connections
deletes all associated SAs is not that obvious, especially as we did not
do that before. I'd guess we'd break many scripted installations with
that change.
If we introduce such a behavioral change, I think we need to make that
optional, and probably disable it by default.
Regards
Martin
Hi Martin,

You're right, this makes sense. I'll provide an update that makes it optional.

Best regards,
Christophe
Emeric POUPON
2015-01-29 14:18:46 UTC
Permalink
Hello,

Thanks for your patch: I think it is definitely a good idea to flush connections that are no longer up to date with the configuration files.
Did you manage to make an updated patch?

I have another related problem:
I have two CA certificates in ipsec.d/cacerts. I can see them using "ipsec listcacerts"
If I remove one of them and perform a "ipsec rereadcacerts", I can see in charon's log that the only remaining CA certificate is reloaded.
However, I still see the two CA certs using the "ipsec listcacerts" command. "ipsec purgecerts" does not seem to help.
Remote peers successfully manage to authenticate using the removed CA cert, that is quite annoying.

Any idea?

Best Regards,

Emeric


----- Mail original -----
De: "Christophe Gouault" <***@6wind.com>
À: "Martin Willi" <***@strongswan.org>
Cc: ***@lists.strongswan.org
Envoyé: Jeudi 2 Octobre 2014 10:13:33
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Martin Willi
Hi Christophe,
Thanks for your patch.
Post by Christophe Gouault
Do a little cleanup when deleting a connection via "ipsec update"
- delete all established CHILD_SAs
- unroute the connection
- delete IKE_SAs that have no more CHILD_SAs
- delete the connection
- make sure to refuse an undesired negotiation request from the peer,
by deleting the connection before terminating it.
These chances certainly make sense in some scenarios. However, the
behavioral change is non-trivial. That an "update" of connections
deletes all associated SAs is not that obvious, especially as we did not
do that before. I'd guess we'd break many scripted installations with
that change.
If we introduce such a behavioral change, I think we need to make that
optional, and probably disable it by default.
Regards
Martin
Hi Martin,

You're right, this makes sense. I'll provide an update that makes it optional.

Best regards,
Christophe
_______________________________________________
Dev mailing list
***@lists.strongswan.org
https://lists.strongswan.org/mailman/listinfo/dev
Christophe Gouault
2015-01-29 15:52:12 UTC
Permalink
Post by Emeric POUPON
Hello,
Thanks for your patch: I think it is definitely a good idea to flush connections that are no longer up to date with the configuration files.
Did you manage to make an updated patch?
Hello Emeric,

I had to switch to priority tasks, so I let this patch in standby
(long term standby ;-)). I'll try to find some time to add an option
in strongswan.conf.
Post by Emeric POUPON
I have two CA certificates in ipsec.d/cacerts. I can see them using "ipsec listcacerts"
If I remove one of them and perform a "ipsec rereadcacerts", I can see in charon's log that the only remaining CA certificate is reloaded.
However, I still see the two CA certs using the "ipsec listcacerts" command. "ipsec purgecerts" does not seem to help.
Remote peers successfully manage to authenticate using the removed CA cert, that is quite annoying.
Any idea
Obviously additional clean up is desirable.

Best Regards,

Christophe
Emeric POUPON
2015-02-20 17:10:54 UTC
Permalink
Hello,

Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Therefore the new configuration may be applied several minutes later, which is quite unexpected.

What do you think?

Emeric

----- Mail original -----
De: "Christophe Gouault" <***@6wind.com>
À: "Emeric POUPON" <***@stormshield.eu>
Cc: "Martin Willi" <***@strongswan.org>, ***@lists.strongswan.org
Envoyé: Jeudi 29 Janvier 2015 16:52:12
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Emeric POUPON
Hello,
Thanks for your patch: I think it is definitely a good idea to flush connections that are no longer up to date with the configuration files.
Did you manage to make an updated patch?
Hello Emeric,

I had to switch to priority tasks, so I let this patch in standby
(long term standby ;-)). I'll try to find some time to add an option
in strongswan.conf.
Post by Emeric POUPON
I have two CA certificates in ipsec.d/cacerts. I can see them using "ipsec listcacerts"
If I remove one of them and perform a "ipsec rereadcacerts", I can see in charon's log that the only remaining CA certificate is reloaded.
However, I still see the two CA certs using the "ipsec listcacerts" command. "ipsec purgecerts" does not seem to help.
Remote peers successfully manage to authenticate using the removed CA cert, that is quite annoying.
Any idea
Obviously additional clean up is desirable.

Best Regards,

Christophe
Christophe Gouault
2015-02-23 17:23:09 UTC
Permalink
Hello Emeric,
Post by Emeric POUPON
Hello,
Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Indeed, if the peer does not respond, the actual tear down of the
connection will last until the timeout is reached, but as far as I
know, this does not prevent from completing the cleanup and applying
the new configuration.
Post by Emeric POUPON
Therefore the new configuration may be applied several minutes later, which is quite unexpected.
What do you think?
Well, I think the new conf can be used immediately (the old connection
will just survive for a while until the timeout is reached). I'll try
to do a little test.

Best Regards,
Christophe
Christophe Gouault
2015-02-25 09:57:58 UTC
Permalink
Post by Christophe Gouault
Hello Emeric,
Post by Emeric POUPON
Hello,
Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Indeed, if the peer does not respond, the actual tear down of the
connection will last until the timeout is reached, but as far as I
know, this does not prevent from completing the cleanup and applying
the new configuration.
Post by Emeric POUPON
Therefore the new configuration may be applied several minutes later, which is quite unexpected.
What do you think?
Well, I think the new conf can be used immediately (the old connection
will just survive for a while until the timeout is reached). I'll try
to do a little test.
Hello Emeric,

After testing, I confirm the problem you describe: the unsuccessful
sending of a delete message delays the cleanup and applying of the new
conf.

This patch obviously needs some rework. Thanks for raising the issue.

Best Regards,
Christophe
Emeric POUPON
2015-02-25 10:24:15 UTC
Permalink
Hello,

Thanks for your support.

I noticed the terminate_XX methods of the controller are synchronous when a callback is provided:

* Terminate an IKE_SA and all of its CHILD_SAs.
*
* If a callback is provided the function is synchronous and thus blocks
* until the IKE_SA is properly deleted, or the call timed out.

Therefore, doing things like this in the src/libcharon/plugins/stroke/stroke_control.c file seems to correct the problem:

@@ -390,7 +390,7 @@ METHOD(stroke_control_t, terminate, void,
while (enumerator->enumerate(enumerator, &del))
{
status = charon->controller->terminate_ike(charon->controller, del,
- (controller_cb_t)stroke_log, &info, this->timeout);
+ msg->output_verbosity == -1 ? NULL : (controller_cb_t)stroke_log, &info, this->timeout);
report_terminate_status(this, status, out, del, FALSE);
}
enumerator->destroy(enumerator);

I really don't know if it is the right fix, since it may raise other problems...

What do you think?

Emeric

----- Mail original -----
De: "Christophe Gouault" <***@6wind.com>
À: "Emeric POUPON" <***@stormshield.eu>
Cc: "Martin Willi" <***@strongswan.org>, ***@lists.strongswan.org
Envoyé: Mercredi 25 Février 2015 10:57:58
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Christophe Gouault
Hello Emeric,
Post by Emeric POUPON
Hello,
Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Indeed, if the peer does not respond, the actual tear down of the
connection will last until the timeout is reached, but as far as I
know, this does not prevent from completing the cleanup and applying
the new configuration.
Post by Emeric POUPON
Therefore the new configuration may be applied several minutes later, which is quite unexpected.
What do you think?
Well, I think the new conf can be used immediately (the old connection
will just survive for a while until the timeout is reached). I'll try
to do a little test.
Hello Emeric,

After testing, I confirm the problem you describe: the unsuccessful
sending of a delete message delays the cleanup and applying of the new
conf.

This patch obviously needs some rework. Thanks for raising the issue.

Best Regards,
Christophe
Christophe Gouault
2015-02-25 11:00:32 UTC
Permalink
Post by Emeric POUPON
Hello,
Thanks for your support.
* Terminate an IKE_SA and all of its CHILD_SAs.
*
* If a callback is provided the function is synchronous and thus blocks
* until the IKE_SA is properly deleted, or the call timed out.
@@ -390,7 +390,7 @@ METHOD(stroke_control_t, terminate, void,
while (enumerator->enumerate(enumerator, &del))
{
status = charon->controller->terminate_ike(charon->controller, del,
- (controller_cb_t)stroke_log, &info, this->timeout);
+ msg->output_verbosity == -1 ? NULL : (controller_cb_t)stroke_log, &info, this->timeout);
report_terminate_status(this, status, out, del, FALSE);
}
enumerator->destroy(enumerator);
I really don't know if it is the right fix, since it may raise other problems...
What do you think?
Well spotted!

It seems that there was a will to support non blocking connection
initiation/termination in 2013 (stroke up-nb and down-nb commands were
added in commit 4182c86aed84933b3efa0367 "stroke: Add non-blocking
versions of up and down"; they precisely use the output_verbosity),
but only stroke up-nb works as expected: terminate does not take
output_verbosity in account...

Christophe
Post by Emeric POUPON
Emeric
----- Mail original -----
Envoyé: Mercredi 25 Février 2015 10:57:58
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Christophe Gouault
Hello Emeric,
Post by Emeric POUPON
Hello,
Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Indeed, if the peer does not respond, the actual tear down of the
connection will last until the timeout is reached, but as far as I
know, this does not prevent from completing the cleanup and applying
the new configuration.
Post by Emeric POUPON
Therefore the new configuration may be applied several minutes later, which is quite unexpected.
What do you think?
Well, I think the new conf can be used immediately (the old connection
will just survive for a while until the timeout is reached). I'll try
to do a little test.
Hello Emeric,
After testing, I confirm the problem you describe: the unsuccessful
sending of a delete message delays the cleanup and applying of the new
conf.
This patch obviously needs some rework. Thanks for raising the issue.
Best Regards,
Christophe
Emeric POUPON
2015-02-26 08:58:39 UTC
Permalink
Ok, thanks for this explanation.

Martin, do you plan to fix this? Do you want me to fill in a bug report?

Best regards,

Emeric

----- Mail original -----
De: "Christophe Gouault" <***@6wind.com>
À: "Emeric POUPON" <***@stormshield.eu>
Cc: "Martin Willi" <***@strongswan.org>, ***@lists.strongswan.org
Envoyé: Mercredi 25 Février 2015 12:00:32
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Emeric POUPON
Hello,
Thanks for your support.
* Terminate an IKE_SA and all of its CHILD_SAs.
*
* If a callback is provided the function is synchronous and thus blocks
* until the IKE_SA is properly deleted, or the call timed out.
@@ -390,7 +390,7 @@ METHOD(stroke_control_t, terminate, void,
while (enumerator->enumerate(enumerator, &del))
{
status = charon->controller->terminate_ike(charon->controller, del,
- (controller_cb_t)stroke_log, &info, this->timeout);
+ msg->output_verbosity == -1 ? NULL : (controller_cb_t)stroke_log, &info, this->timeout);
report_terminate_status(this, status, out, del, FALSE);
}
enumerator->destroy(enumerator);
I really don't know if it is the right fix, since it may raise other problems...
What do you think?
Well spotted!

It seems that there was a will to support non blocking connection
initiation/termination in 2013 (stroke up-nb and down-nb commands were
added in commit 4182c86aed84933b3efa0367 "stroke: Add non-blocking
versions of up and down"; they precisely use the output_verbosity),
but only stroke up-nb works as expected: terminate does not take
output_verbosity in account...

Christophe
Post by Emeric POUPON
Emeric
----- Mail original -----
Envoyé: Mercredi 25 Février 2015 10:57:58
Objet: Re: [strongSwan-dev] [PATCH] starter: cleanup SAs when deleting a connection
Post by Christophe Gouault
Hello Emeric,
Post by Emeric POUPON
Hello,
Unfortunately, I am facing an issue with this patch.
For example, we may want to update the configuration file since the remote host's IP address has changed.
When charon receives the terminate stroke message, it sends the DELETE message but it may take minutes before giving up if the remote host is down!
Indeed, if the peer does not respond, the actual tear down of the
connection will last until the timeout is reached, but as far as I
know, this does not prevent from completing the cleanup and applying
the new configuration.
Post by Emeric POUPON
Therefore the new configuration may be applied several minutes later, which is quite unexpected.
What do you think?
Well, I think the new conf can be used immediately (the old connection
will just survive for a while until the timeout is reached). I'll try
to do a little test.
Hello Emeric,
After testing, I confirm the problem you describe: the unsuccessful
sending of a delete message delays the cleanup and applying of the new
conf.
This patch obviously needs some rework. Thanks for raising the issue.
Best Regards,
Christophe
Emeric POUPON
2015-03-04 13:45:11 UTC
Permalink
Hello,
Post by Christophe Gouault
Post by Emeric POUPON
@@ -390,7 +390,7 @@ METHOD(stroke_control_t, terminate, void,
while (enumerator->enumerate(enumerator, &del))
{
status = charon->controller->terminate_ike(charon->controller, del,
- (controller_cb_t)stroke_log, &info, this->timeout);
+ msg->output_verbosity == -1 ? NULL : (controller_cb_t)stroke_log, &info, this->timeout);
report_terminate_status(this, status, out, del, FALSE);
}
enumerator->destroy(enumerator);
I really don't know if it is the right fix, since it may raise other problems...
What do you think?
Well spotted!
It seems that there was a will to support non blocking connection
initiation/termination in 2013 (stroke up-nb and down-nb commands were
added in commit 4182c86aed84933b3efa0367 "stroke: Add non-blocking
versions of up and down"; they precisely use the output_verbosity),
but only stroke up-nb works as expected: terminate does not take
output_verbosity in account...
Christophe
Unfortunately, I think this is not enough to solve this issue.
1- change the remote GW's IP address (from 172.16.0.3 to 172.16.2.3), keep the same left/right subnets
2- kill the remote host
3- SIGHUP the daemon.

The old connection is being destroyed in the background (since the DELETE are not acked), and the new connection is loaded.
Since the new connection has the same traffic selectors, the generated SP are the same.
Therefore I get things like that:

Feb 28 03:30:14 11[CFG] added configuration 'new_connection'
...
Feb 28 03:30:14 14[CFG] received stroke: route 'new_connection'
Feb 28 03:30:14 14[CFG] proposing traffic selectors for us:
Feb 28 03:30:14 14[CFG] 192.168.3.0/24
Feb 28 03:30:14 14[CFG] proposing traffic selectors for other:
Feb 28 03:30:14 14[CFG] 192.168.13.0/24
Feb 28 03:30:14 14[CFG] configured proposals: ESP:DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ, ESP:DES_CBC/HMAC_MD5_96/NO_EXT_SEQ
Feb 28 03:30:14 14[KNL] policy 192.168.3.0/24 === 192.168.13.0/24 out already exists, increasing refcount
Feb 28 03:30:14 14[KNL] policy 192.168.13.0/24 === 192.168.3.0/24 in already exists, increasing refcount

The SP in the kernel still references the IP of the previous GW. The tunnel can never goes up again...

# setkey -DP
192.168.13.0/24[any] 192.168.3.0/24[any] 255
in ipsec
esp/tunnel/172.16.1.13-172.16.0.3/unique:1
spid=1638 seq=2 pid=92527
refcnt=1
192.168.3.0/24[any] 192.168.13.0/24[any] 255
out ipsec
esp/tunnel/172.16.0.3-172.16.1.13/unique:1
spid=1637 seq=0 pid=92527
refcnt=1

Maybe we would need to flush the SP first if the last connection that uses it is being deleted?

What do you think?

Emeric
Emeric POUPON
2015-03-05 08:45:17 UTC
Permalink
This post might be inappropriate. Click to display it.
Christophe Gouault
2015-03-05 09:08:32 UTC
Permalink
Post by Emeric POUPON
Hello,
Post by Christophe Gouault
Post by Emeric POUPON
@@ -390,7 +390,7 @@ METHOD(stroke_control_t, terminate, void,
while (enumerator->enumerate(enumerator, &del))
{
status = charon->controller->terminate_ike(charon->controller, del,
- (controller_cb_t)stroke_log, &info, this->timeout);
+ msg->output_verbosity == -1 ? NULL : (controller_cb_t)stroke_log, &info, this->timeout);
report_terminate_status(this, status, out, del, FALSE);
}
enumerator->destroy(enumerator);
I really don't know if it is the right fix, since it may raise other problems...
What do you think?
Well spotted!
It seems that there was a will to support non blocking connection
initiation/termination in 2013 (stroke up-nb and down-nb commands were
added in commit 4182c86aed84933b3efa0367 "stroke: Add non-blocking
versions of up and down"; they precisely use the output_verbosity),
but only stroke up-nb works as expected: terminate does not take
output_verbosity in account...
Christophe
Unfortunately, I think this is not enough to solve this issue.
1- change the remote GW's IP address (from 172.16.0.3 to 172.16.2.3), keep the same left/right subnets
2- kill the remote host
3- SIGHUP the daemon.
The old connection is being destroyed in the background (since the DELETE are not acked), and the new connection is loaded.
Since the new connection has the same traffic selectors, the generated SP are the same.
Feb 28 03:30:14 11[CFG] added configuration 'new_connection'
...
Feb 28 03:30:14 14[CFG] received stroke: route 'new_connection'
Feb 28 03:30:14 14[CFG] 192.168.3.0/24
Feb 28 03:30:14 14[CFG] 192.168.13.0/24
Feb 28 03:30:14 14[CFG] configured proposals: ESP:DES_CBC/HMAC_SHA1_96/NO_EXT_SEQ, ESP:DES_CBC/HMAC_MD5_96/NO_EXT_SEQ
Feb 28 03:30:14 14[KNL] policy 192.168.3.0/24 === 192.168.13.0/24 out already exists, increasing refcount
Feb 28 03:30:14 14[KNL] policy 192.168.13.0/24 === 192.168.3.0/24 in already exists, increasing refcount
The SP in the kernel still references the IP of the previous GW. The tunnel can never goes up again...
# setkey -DP
192.168.13.0/24[any] 192.168.3.0/24[any] 255
in ipsec
esp/tunnel/172.16.1.13-172.16.0.3/unique:1
spid=1638 seq=2 pid=92527
refcnt=1
192.168.3.0/24[any] 192.168.13.0/24[any] 255
out ipsec
esp/tunnel/172.16.0.3-172.16.1.13/unique:1
spid=1637 seq=0 pid=92527
refcnt=1
Maybe we would need to flush the SP first if the last connection that uses it is being deleted?
What do you think?
Emeric
Hi Emeric,

Right, the SP should probably be removed first (as is done in the
original code).

The order of operations must be carefully checked. Probably first
deleting connection (starter_stroke_del_conn), to avoid immediate
renegotiation, then delete SPs (starter_stroke_unroute_conn), then
asynchronously terminate connections.

The main problem when terminating connections is to terminate the right IKE_SAs.

Identifying CHILD_SAs to delete is rather easy (their name is the same
as the connection name), but that is not the case of IKE_SAs: several
"connections" may have the same left/right pair, so share the same
IKE_SA, so:

- the name of an IKE_SA used by a connection may be the name of
another connection (the IKE_SA name is the name of the first
connection that established an IKE negotiation between the same
left/right pair).
- if you find an IKE_SA named after the connection that you are
deleting, you must be sure that it is not used by other CHILD SAs
derived from another (not deleted) connection.

That's why starter_stroke_terminate_conn then starter_stroke_purge_ike
are called in the patch (the first one terminates CHILD_SAs of deleted
connections, the second one terminates IKE_SAs with no CHILD_SA). But
in the asyncrhonous deletion case, the IKE_SAs will not be deleted
since deletion of CHILD_SAs is pending.

Best Regards,
Christophe
Emeric POUPON
2015-03-10 15:22:32 UTC
Permalink
Hello,
Post by Emeric POUPON
Post by Emeric POUPON
Maybe we would need to flush the SP first if the last connection that uses it is being deleted?
What do you think?
Emeric
Hi Emeric,
Right, the SP should probably be removed first (as is done in the
original code).
The order of operations must be carefully checked. Probably first
deleting connection (starter_stroke_del_conn), to avoid immediate
renegotiation, then delete SPs (starter_stroke_unroute_conn), then
asynchronously terminate connections.
I tried this way, but unfortunately the starter_stroke_unroute_conn call has no immediate effect since the route is still used by the established CHILD SA. Therefore I still get the issue described before.
I also tried to have a look at the "flush" method of the ike_sa_manager to see of the "delete" with no ack is performed: maybe we would need something similar in order to synchronously flush the connections that are no longer handled in the configuration file?
Post by Emeric POUPON
The main problem when terminating connections is to terminate the right IKE_SAs.
Identifying CHILD_SAs to delete is rather easy (their name is the same
as the connection name), but that is not the case of IKE_SAs: several
"connections" may have the same left/right pair, so share the same
- the name of an IKE_SA used by a connection may be the name of
another connection (the IKE_SA name is the name of the first
connection that established an IKE negotiation between the same
left/right pair).
- if you find an IKE_SA named after the connection that you are
deleting, you must be sure that it is not used by other CHILD SAs
derived from another (not deleted) connection.
That's why starter_stroke_terminate_conn then starter_stroke_purge_ike
are called in the patch (the first one terminates CHILD_SAs of deleted
connections, the second one terminates IKE_SAs with no CHILD_SA). But
in the asyncrhonous deletion case, the IKE_SAs will not be deleted
since deletion of CHILD_SAs is pending.
How would you implement that? I guess using a new stroke message?

BTW, configuration reloading is quite a common subject on the strongswan's mailing lists,
I am wondering if one of the official developers has already some ideas about this topic and things that may be done?

Best Regards,

Emeric
Martin Willi
2015-03-10 16:05:13 UTC
Permalink
Post by Emeric POUPON
I also tried to have a look at the "flush" method of the ike_sa_manager
to see of the "delete" with no ack is performed: maybe we would need
something similar in order to synchronously flush the connections that
are no longer handled in the configuration file?
Note that a hard delete without a confirmed exchange is something we
should avoid when possible; The peer might think that the tunnel is
still alive, and sends traffic to a black hole.

Having that said, you may try to issue two subsequent "down" commands.
The first will trigger a graceful tunnel shutdown with confirmation.
Once in the DELETING state, an additional "down" command will
immediately remove the IKE_SA.
Post by Emeric POUPON
BTW, configuration reloading is quite a common subject on the
strongswan's mailing lists, I am wondering if one of the official
developers has already some ideas about this topic and things that may
be done?
As we can see in this and other discussions, the topic is not trivial.
One reason for this is that the internal configuration hierarchy in
charon does not match very well to the ipsec.conf format inherited from
FreeS/WAN.

The current behavior is therefore limited to reloading connection
definitions, while keeping existing tunnels (which I think is absolutely
legitimate). It requires that the administrator manually closes/
initiates affected tunnels if he wants to. Of course that won't work
well in a scripted environment, but this is something where the stroke
interface has been bad at ever since.

In my opinion, I think we should focus more on the swanctl interface and
the underlying vici IPC mechanism. It avoids many problems by closer
resembling the configuration hierarchy in swanctl.conf. When reloading
connections, it inverses any specified start_action, and so basically
affects established connection (not manually initiated). This is all
relatively new, and certainly far away from perfect. But as we have a
better configuration format and a proper return channel in vici, the
foundation is much better to implement such functionality.

Regards
Martin
Emeric POUPON
2015-03-11 11:16:16 UTC
Permalink
Helo,
Post by Martin Willi
Note that a hard delete without a confirmed exchange is something we
should avoid when possible; The peer might think that the tunnel is
still alive, and sends traffic to a black hole.
Having that said, you may try to issue two subsequent "down" commands.
The first will trigger a graceful tunnel shutdown with confirmation.
Once in the DELETING state, an additional "down" command will
immediately remove the IKE_SA.
Thanks for the tip!
In order to make it work, I had to modify the terminate command of the stroke plugin in order to delete the IKE SA if a CHILD SA exists with the same name.
The second call indeed remove the IKE SA immediately.
Post by Martin Willi
In my opinion, I think we should focus more on the swanctl interface and
the underlying vici IPC mechanism. It avoids many problems by closer
resembling the configuration hierarchy in swanctl.conf. When reloading
connections, it inverses any specified start_action, and so basically
affects established connection (not manually initiated). This is all
relatively new, and certainly far away from perfect. But as we have a
better configuration format and a proper return channel in vici, the
foundation is much better to implement such functionality.
I will take some time to have a look at this new configuration interface, but I'm afraid we are likely to hit trouble too.

Best Regards,

Emeric

Loading...