Description
I am using your library to implement a leader election algorithm, and it's working so well, thanks for the wonderful work.
Here is a certain section of my code:
protected override async Task OnElectAsync(CancellationToken cancellationToken)
{
var dlock = _distributedLockManager.CreateLock($"{typeof(DistributedLockLeaderElectionProvider).FullName}/leader/8E7B4AC6-6C08-4467-801A-AA0AA76533AC/leader.lock");
while (!cancellationToken.IsCancellationRequested) {
var handle = await dlock.TryAcquireAsync(cancellationToken: cancellationToken);
if (handle != null) {
// save for later disposal when the current node stops or we lose the lock
_lockHandle = handle;
// register for handle loss
if (handle.HandleLostToken.CanBeCanceled) {
handle.HandleLostToken.Register(() => {
_lockHandle = null;
// lock connection lost, lease expired etc
// that means we are no longer leader
NotifyLeaderChanged(null);
StartElection("Lock handle lost");
// get rid of handle
try {
handle.Dispose();
} catch { }
});
}
// lock acquired
Logger.LogInformation($"Successfully aqcuired distributed leader lock, Instance Id={Constants.InstanceId.ToString("N")}, Instance Name={CommonHelper.GetAppFlag<string>(Constants.AppFlags.InstanceName)}");
var others = Participants.GetOtherPeers(Cluster.CurrentPeer);
Logger.LogDebug($"I am sending LEADER CHANGED message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
try {
await Cluster.Channel.SendMessageAsync(new LeaderChangedMessage {
LeaderAddress = Cluster.CurrentPeer.Address.ToParsableString()
}, others, cancellationToken: cancellationToken);
} catch (OperationCanceledException) {
} catch (Exception ex) {
Logger.LogError(ex, $"I could not send LEADER CHANGED message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
}
NotifyLeaderChanged(Cluster.CurrentPeer);
// no need to loop, we acquired lock, we will keep it until shutdown
break;
} else {
// couldn't acquire lock, so let's wait for a while and then try again
if (CurrentLeader == null) {
// no leader, so let's find out who is the current leader
var others = Participants.GetOtherPeers(Cluster.CurrentPeer);
Logger.LogDebug($"I am sending ASK LEADER message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
try {
await Cluster.Channel.SendMessageAsync(new AskLeaderRequest {
SenderAddress = Cluster.CurrentPeer.Address.ToParsableString()
}, others, cancellationToken: cancellationToken);
} catch (OperationCanceledException) {
} catch (Exception ex) {
Logger.LogError(ex, $"I could not send ASK LEADER message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
}
}
try {
await Task.Delay(TimeSpan.FromMinutes(5), cancellationToken);
} catch (Exception) {
break;
}
}
}
}
When this method runs, it blocks on handle.HandleLostToken
for very very long time (i guess indefinitely).
But when I access that property on another thread like this, it works like a charm.
using (CommonHelper.SuppressAllScopes()) {
Task.Run(() => {
// register for handle loss
if (handle.HandleLostToken.CanBeCanceled) {
handle.HandleLostToken.Register(() => {
_lockHandle = null;
// lock connection lost, lease expired etc
// that means we are no longer leader
NotifyLeaderChanged(null);
StartElection("Lock handle lost");
// get rid of handle
try {
handle.Dispose();
} catch { }
});
}
}).Ignore();
}
And also, when app shuts down, when I call _lockHandle.DisposeAsync
method in another method, it either blocks or succeeds depending on whether I accessed _lockHandle.HandleLostToken
property. In the first section of code (where I didn't access _lockHandle.HandleLostToken
) _lockHandle.DisposeAsync
method works without blocking, but in the second section of code where I'm accessing _lockHandle.HandleLostToken
, it blocks.
I think there is some kind of synchronization context dead lock or something here. Thanks