[DP-110] NC blocks when remote nodes are unavailable #204

Open
@qnikst

[Imported from JIRA. Reported by Facundo Dominguez (@facundominguez) as DP-110 on 2015-04-16 17:39:09]
The node controller (NC) sometimes sends messages to remote nodes. When a remote node is unreachable, the NC may block for a while.

To fix ideas, let's assume we are using network-transport-tcp.

The NC uses sendBinary to send messages to other nodes. When a node is unreachable and there is no connection, the attempt to establish a new connection has to time out before sendBinary returns control to the NC. If there is a connection, sending a message through it may still block if the send buffer is full.

One tentative fix could be to have the NC spawn an auxiliary thread to call sendBinary. However, when sendBinary blocks, multiple auxiliary threads can accumulate trying to communicate with the unreachable node, and this can have some impact on performance depending on the number of accumulated threads.
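
To illustrate the thread-per-send idea and why threads can pile up, here is a minimal sketch using only `base`. It is not the NC's actual code: `InFlight`, `newInFlight`, `sendAsync` and `inFlight` are hypothetical names, and the `IO ()` argument stands in for a call to sendBinary. The counter only makes the accumulation observable; it does nothing to limit it.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, modifyMVar_, newMVar, readMVar)
import Control.Exception (finally)
import Control.Monad (void)

-- | Number of auxiliary sender threads that have not finished yet.
-- (Hypothetical type, for illustration only.)
newtype InFlight = InFlight (MVar Int)

newInFlight :: IO InFlight
newInFlight = InFlight <$> newMVar 0

-- | Run a possibly blocking send in its own thread so that the caller
-- returns immediately. The action stands in for a sendBinary call.
sendAsync :: InFlight -> IO () -> IO ()
sendAsync (InFlight ref) blockingSend = do
  -- Count the send before forking, so the counter never under-reports.
  modifyMVar_ ref (return . (+ 1))
  -- Decrement when the send finishes, whether normally or by exception.
  void $ forkIO $ blockingSend `finally` modifyMVar_ ref (return . subtract 1)

-- | How many sender threads are still running or blocked.
inFlight :: InFlight -> IO Int
inFlight (InFlight ref) = readMVar ref
```

If every send to an unreachable node blocks, each call to `sendAsync` leaves one more thread parked, so `inFlight` grows with the number of messages rather than with the number of destinations.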

Another solution is to have a message queue with a dedicated thread per remote NodeId. When the NC needs to send a message to a node, the message is placed in the corresponding queue. A bit of cleverness can make the collection of queues dynamic, so queues and threads are created on demand and disposed of when empty.
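
The queue-per-node proposal could be sketched as follows, using only `base` and `containers`. Everything here is an assumption made for illustration: `Outbox`, `newOutbox` and `enqueue` are hypothetical names, the type variable `dest` stands in for NodeId, and the `blockingSend` argument stands in for sendBinary. The sketch creates queues and worker threads on demand but does not dispose of them when they become empty, and its queues are unbounded.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar)
import Control.Monad (forever, void)
import qualified Data.Map.Strict as Map

-- | One outgoing queue per destination. 'dest' stands in for NodeId.
-- (Hypothetical type, for illustration only.)
newtype Outbox dest msg = Outbox (MVar (Map.Map dest (Chan msg)))

newOutbox :: IO (Outbox dest msg)
newOutbox = Outbox <$> newMVar Map.empty

-- | Enqueue a message without waiting on the network. The first message
-- for a destination creates its queue and its dedicated worker thread.
enqueue :: Ord dest
        => (dest -> msg -> IO ())  -- possibly blocking send (assumed)
        -> Outbox dest msg
        -> dest
        -> msg
        -> IO ()
enqueue blockingSend (Outbox ref) dest msg = do
  chan <- modifyMVar ref $ \queues ->
    case Map.lookup dest queues of
      -- A worker already exists: reuse its queue.
      Just existing -> return (queues, existing)
      -- No worker yet: create the queue and a thread that drains it.
      Nothing -> do
        fresh <- newChan
        void $ forkIO $ forever $ readChan fresh >>= blockingSend dest
        return (Map.insert dest fresh queues, fresh)
  -- Chan is unbounded, so this write does not block the caller.
  writeChan chan msg
```

With this shape, an unreachable node blocks at most one thread (its own worker), and messages for other nodes keep flowing through their own queues. A fuller version would also need a policy for bounding a queue that keeps growing and for tearing down idle workers.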

At the transport level, we could ask for send and connect to be asynchronous, at least for unreliable connections, and have the NC use unreliable connections to send messages.
