Skip to content

SqlDistributedLockHandle.HandleLostToken blocks current thread indefinitely-ish #85

Closed
@joshua-mng

Description

@joshua-mng

I am using your library to implement a leader election algorithm, and it's working so well, thanks for the wonderful work.

Here is a certain section of my code:

protected override async Task OnElectAsync(CancellationToken cancellationToken)
{
    var dlock = _distributedLockManager.CreateLock($"{typeof(DistributedLockLeaderElectionProvider).FullName}/leader/8E7B4AC6-6C08-4467-801A-AA0AA76533AC/leader.lock");

    while (!cancellationToken.IsCancellationRequested) {
        var handle = await dlock.TryAcquireAsync(cancellationToken: cancellationToken);
        if (handle != null) {
            // save for later disposal when the current node stops or we lose the lock
            _lockHandle = handle;

            // register for handle loss
            if (handle.HandleLostToken.CanBeCanceled) {
                handle.HandleLostToken.Register(() => {
                    _lockHandle = null;

                    // lock connection lost, lease expired etc
                    // that means we are no longer leader
                    NotifyLeaderChanged(null);

                    StartElection("Lock handle lost");

                    // get rid of handle
                    try {
                        handle.Dispose();
                    } catch { }
                });
            }

            // lock acquired
            Logger.LogInformation($"Successfully aqcuired distributed leader lock, Instance Id={Constants.InstanceId.ToString("N")}, Instance Name={CommonHelper.GetAppFlag<string>(Constants.AppFlags.InstanceName)}");

            var others = Participants.GetOtherPeers(Cluster.CurrentPeer);
            Logger.LogDebug($"I am sending LEADER CHANGED message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
            try {
                await Cluster.Channel.SendMessageAsync(new LeaderChangedMessage {
                    LeaderAddress = Cluster.CurrentPeer.Address.ToParsableString()
                }, others, cancellationToken: cancellationToken);
            } catch (OperationCanceledException) {
            } catch (Exception ex) {
                Logger.LogError(ex, $"I could not send LEADER CHANGED message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
            }

            NotifyLeaderChanged(Cluster.CurrentPeer);

            // no need to loop, we acquired lock, we will keep it until shutdown
            break;
        } else {
            // couldn't acquire lock, so let's wait for a while and then try again
            if (CurrentLeader == null) {
                // no leader, so let's find out who is the current leader
                var others = Participants.GetOtherPeers(Cluster.CurrentPeer);
                Logger.LogDebug($"I am sending ASK LEADER message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
                try {
                    await Cluster.Channel.SendMessageAsync(new AskLeaderRequest {
                        SenderAddress = Cluster.CurrentPeer.Address.ToParsableString()
                    }, others, cancellationToken: cancellationToken);
                } catch (OperationCanceledException) {
                } catch (Exception ex) {
                    Logger.LogError(ex, $"I could not send ASK LEADER message to Count={others.Count()} List={Utils.EnumerableToString(others)} other peers ME={Cluster.CurrentPeer}");
                }
            }

            try {
                await Task.Delay(TimeSpan.FromMinutes(5), cancellationToken);
            } catch (Exception) {
                break;
            }
        }
    }
}

When this method runs, it blocks on handle.HandleLostToken for very very long time (i guess indefinitely).

But when I access that property on another thread like this, it works like a charm.

using (CommonHelper.SuppressAllScopes()) {
    Task.Run(() => {
        // register for handle loss
        if (handle.HandleLostToken.CanBeCanceled) {
            handle.HandleLostToken.Register(() => {
                _lockHandle = null;

                // lock connection lost, lease expired etc
                // that means we are no longer leader
                NotifyLeaderChanged(null);

                StartElection("Lock handle lost");

                // get rid of handle
                try {
                    handle.Dispose();
                } catch { }
            });
        }
    }).Ignore();
}

And also, when app shuts down, when I call _lockHandle.DisposeAsync method in another method, it either blocks or succeeds depending on whether I accessed _lockHandle.HandleLostToken property. In the first section of code (where I didn't access _lockHandle.HandleLostToken) _lockHandle.DisposeAsync method works without blocking, but in the second section of code where I'm accessing _lockHandle.HandleLostToken, it blocks.

I think there is some kind of synchronization context dead lock or something here. Thanks

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions