Emeric POUPON
2016-06-16 16:49:26 UTC
Hello,
The HA plugin uses sockets to communicate with the other members of the HA cluster.
The problem is that if there is a transmission error, we end up in a desynchronized state, which may be very difficult to recover from.
Actually, we have a modified version of the HA plugin that uses corosync instead of sockets, but the problem is still the same, even if it is mitigated.
The question is: how can we automatically recover from such a situation?
I was thinking about sending the non-responsible nodes a FLUSH message so that they clean up everything and then respond with a RESYNC message.
The problem is that ha_socket has no knowledge of segments, messages or responsibilities... Maybe we would need a new event?
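To make the proposed FLUSH/RESYNC exchange concrete, here is a minimal single-process sketch. Everything in it is hypothetical: `msg_type_t`, `MSG_FLUSH`, `MSG_RESYNC`, `node_t`, `on_flush()` and `on_resync()` are illustrative names, not the actual strongSwan HA message format or API, and "sending" is modelled as a direct function call rather than a socket or corosync transmission. It assumes every synchronized SA is tagged with its segment, which is exactly the knowledge ha_socket lacks, so in practice these handlers would have to live in a segment-aware layer (hence a new event).

```c
#include <assert.h>

/* Hypothetical message types for the proposed exchange; these are NOT the
 * actual strongSwan HA wire format. */
typedef enum {
    MSG_FLUSH,  /* responsible node -> non-responsible node: drop a segment */
    MSG_RESYNC, /* non-responsible node -> responsible node: resend it */
} msg_type_t;

typedef struct {
    msg_type_t type;
    int segment; /* segment the message applies to */
} msg_t;

/* Toy stand-in for a synchronized SA: an id and the segment it belongs to. */
typedef struct {
    int id;
    int segment;
} sa_t;

#define MAX_SA 16

/* A node's local copy of the cluster state. */
typedef struct {
    sa_t sa[MAX_SA];
    int count;
} node_t;

static void node_add(node_t *n, int id, int segment)
{
    assert(n->count < MAX_SA);
    n->sa[n->count].id = id;
    n->sa[n->count].segment = segment;
    n->count++;
}

/* Non-responsible node: drop everything cached for the flushed segment
 * (including stale entries), then answer with RESYNC for that segment. */
static msg_t on_flush(node_t *n, msg_t in)
{
    assert(in.type == MSG_FLUSH);
    int kept = 0;
    for (int i = 0; i < n->count; i++) {
        if (n->sa[i].segment != in.segment) {
            n->sa[kept++] = n->sa[i];
        }
    }
    n->count = kept;
    msg_t reply = { MSG_RESYNC, in.segment };
    return reply;
}

/* Responsible node: on RESYNC, re-send its authoritative state for the
 * segment. The "send" is modelled as a direct copy into the peer. */
static void on_resync(const node_t *self, node_t *peer, msg_t in)
{
    assert(in.type == MSG_RESYNC);
    for (int i = 0; i < self->count; i++) {
        if (self->sa[i].segment == in.segment) {
            node_add(peer, self->sa[i].id, self->sa[i].segment);
        }
    }
}

/* Returns 1 if the node holds an SA with the given id, 0 otherwise. */
static int node_has(const node_t *n, int id)
{
    for (int i = 0; i < n->count; i++) {
        if (n->sa[i].id == id) {
            return 1;
        }
    }
    return 0;
}
```

For example, if the responsible node holds SAs 10 and 11 in segment 1 while a passive node holds 10 plus a stale 12 (the add of 11 and the delete of 12 were lost), flushing segment 1 and answering the resulting RESYNC leaves the passive node with exactly 10 and 11. Note that the FLUSH and RESYNC messages themselves travel over the same unreliable channel, so a real implementation would still need a timeout and retry for a lost FLUSH or RESYNC; the sketch leaves that out.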
What do you think?
Emeric