Homework 3: a relay chat server

Due date: Mar. 1, 2019

Given an existing chat node program (this was the topic of a cs450 homework in an earlier year), your job is to extend it to support multiple nodes. Clients connect to a nearby node, and all messages are delivered to all clients, no matter what node they connect to.

You'll find the template code here: https://github.com/bitslab/cs450/tree/master/relay-chat

Basic Operation

The node can be started in standalone or peer mode. In standalone mode, a listening port is provided on the command line, as in the template code. In peer mode, the address and port of another node are also provided, as the second and third arguments.
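The two start-up modes can be told apart by the argument count alone. The following is a minimal sketch, not the template's actual code: the struct node_config record and the parse_port and parse_args helpers are hypothetical names introduced here for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical configuration record; not part of the template code. */
struct node_config {
    unsigned short listen_port; /* port this node accepts connections on */
    int peer_mode;              /* 1 if a remote peer was given, else 0 */
    const char *peer_addr;      /* remote peer address (peer mode only) */
    unsigned short peer_port;   /* remote peer port (peer mode only) */
};

/* Parse a decimal port in 1..65535. Returns 0 on success, -1 on error. */
static int parse_port(const char *s, unsigned short *out) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 1 || v > 65535) return -1;
    *out = (unsigned short)v;
    return 0;
}

/* argv[1] is the listening port; argv[2] and argv[3], if present, are the
 * remote peer's address and port. Returns 0 on success, -1 on bad usage. */
int parse_args(int argc, char **argv, struct node_config *cfg) {
    memset(cfg, 0, sizeof *cfg);
    if (argc != 2 && argc != 4) return -1;
    if (parse_port(argv[1], &cfg->listen_port) != 0) return -1;
    if (argc == 4) {
        cfg->peer_mode = 1;
        cfg->peer_addr = argv[2];
        if (parse_port(argv[3], &cfg->peer_port) != 0) return -1;
    }
    return 0;
}
```

A main function would call parse_args(argc, argv, &cfg) first, print a usage message if it returns -1, and connect to the remote peer only when cfg.peer_mode is set.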

Any message sent by a client is forwarded to all peers, so the client experience is the same as if all clients were connected to a single node. You do not need to support the "whisper" functionality provided by the template code. Nodes follow a standard protocol for node-to-node communication:

Peer-to-Peer Protocol

Your program needs to conform to a standard protocol in order to be compatible both with the reference node and with the nodes of your classmates. When passed two additional parameters, the remote peer address and the remote peer port, your node should connect to the remote peer to join the peer-to-peer chat network.

To connect to a peer node, use the same port on which that node accepts client connections. Once connected, send the command:

peer <port>

to indicate that this is a peer node, and not a client connecting. The port provided is the port number that the connecting node is listening on. No IP is specified: instead, the IP of the incoming connection is used (see the Fault Tolerance part below).
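As a rough sketch, the peer command could be built and recognized as shown here. The newline terminator is an assumption (the command shares a port with line-oriented telnet clients, but the exact terminator should be checked against the reference node), and format_peer_command and parse_peer_command are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "peer <port>\n" into buf. Returns the number of bytes to send,
 * or -1 if the buffer is too small. The trailing newline is an assumption. */
int format_peer_command(char *buf, size_t cap, unsigned short listen_port) {
    int n = snprintf(buf, cap, "peer %hu\n", listen_port);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}

/* Recognize a "peer <port>" line on the accepting side.
 * Returns 0 and stores the port on success, -1 if the line is not a
 * well-formed peer command (for example, an ordinary chat line). */
int parse_peer_command(const char *line, unsigned short *port) {
    char *end;
    unsigned long v;
    if (strncmp(line, "peer ", 5) != 0) return -1;
    v = strtoul(line + 5, &end, 10);
    if (end == line + 5) return -1;               /* no digits */
    while (*end == '\r' || *end == '\n' || *end == ' ') end++;
    if (*end != '\0' || v < 1 || v > 65535) return -1;
    *port = (unsigned short)v;
    return 0;
}
```

The accepting node would pair the parsed port with the source address of the incoming connection, since the command itself carries no IP.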

All communication after the peer command conforms to a packetized messaging protocol, as described by these C types:

struct header {
    enum {MESSAGE=1,FAILOVER=2,REBALANCE=3} type;
    unsigned int length; // in host order (little endian)
};

struct peer {
    in_addr_t addr;
    unsigned short port; // in network byte order
};

struct failover_message {
    struct header hdr;
    struct peer node;
};

struct rebalance_message {
    struct header hdr;
    struct peer node;
};

struct chat_message {
    struct header hdr;
    char payload[];
};

The type and length fields in the header are used to split the continuous binary stream of bytes into individual messages. The chat message payload is just a string; its length is determined by the total length of the message, as specified in the header.
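Because TCP delivers a byte stream, a read may return half a message or several messages at once. The sketch below shows one way to find message boundaries in a receive buffer. It assumes that the length field counts the whole message, header included, which is one reading of the description above and should be verified against the reference node. The next_message function and the MAX_MESSAGE_LEN limit are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct header {
    enum { MESSAGE = 1, FAILOVER = 2, REBALANCE = 3 } type;
    unsigned int length; /* host order; assumed to include the header */
};

#define MAX_MESSAGE_LEN 65536u /* hypothetical sanity limit */

/* Inspect the first `avail` bytes of a receive buffer.
 * Returns the total length of the first complete message,
 *         0 if more bytes are needed,
 *        -1 if the header is malformed (the connection should be dropped). */
long next_message(const unsigned char *buf, size_t avail, struct header *hdr) {
    if (avail < sizeof *hdr) return 0;
    memcpy(hdr, buf, sizeof *hdr); /* copy out to avoid unaligned access */
    if (hdr->type < MESSAGE || hdr->type > REBALANCE) return -1;
    if (hdr->length < sizeof *hdr || hdr->length > MAX_MESSAGE_LEN) return -1;
    if (avail < hdr->length) return 0;
    return (long)hdr->length;
}
```

A receive loop would append incoming bytes to the buffer, call next_message repeatedly, process each complete message, and shift the remaining bytes to the front. Under the stated assumption, the chat payload is the hdr.length - sizeof(struct header) bytes that follow the header.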

The failover and rebalance messages have the same structure. The peer struct is used to describe another peer in the network. The addr field is the node's address, as seen by its directly connected peer, and the port is the port specified by the peer in the initial peer message.
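To illustrate how the peer struct is filled in, here is a small sketch that builds a failover message from the address a connection arrived from and the port that peer announced. The make_failover helper is hypothetical. Two assumptions are made: that addr holds the address exactly as it appears in sin_addr.s_addr (network byte order), and that the header length covers the entire message.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

struct header {
    enum { MESSAGE = 1, FAILOVER = 2, REBALANCE = 3 } type;
    unsigned int length; /* host order */
};

struct peer {
    in_addr_t addr;      /* assumed: as found in sin_addr.s_addr */
    unsigned short port; /* network byte order */
};

struct failover_message {
    struct header hdr;
    struct peer node;
};

/* seen_from:       source address of the failover peer's connection
 * advertised_port: port from that peer's "peer <port>" command, host order */
struct failover_message make_failover(const struct sockaddr_in *seen_from,
                                      unsigned short advertised_port) {
    struct failover_message m;
    memset(&m, 0, sizeof m);          /* also zeroes struct padding bytes */
    m.hdr.type = FAILOVER;
    m.hdr.length = sizeof m;          /* assumption: counts the whole message */
    m.node.addr = seen_from->sin_addr.s_addr;
    m.node.port = htons(advertised_port);
    return m;
}
```

Note that the source port of the incoming connection (sin_port) is deliberately ignored: it is an ephemeral port, not the port the peer listens on.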

Fault Tolerance

Occasionally, a node will disappear from the network. Its immediate (telnet) clients are then out of luck. However, the network as a whole needs to immediately recover from this failure. To ensure continued correct operation, nodes announce failover peers to their neighbors. If a node fails, its downstream neighbors must all connect to the failover peer. 

For correct operation, announce the upstream peer as the failover peer. If the node has no upstream peer, select one downstream peer and stick with it until it disconnects. If the failover peer disconnects, select a new failover peer as appropriate and announce it to the other peers.
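The selection rule can be sketched as a small pure function. The struct neighbor record and choose_failover are hypothetical names, and picking the first connected downstream peer is only one possible reading of "select one downstream peer".

```c
#include <stddef.h>

/* Hypothetical per-connection bookkeeping record. */
struct neighbor {
    int fd;          /* socket descriptor, or -1 once disconnected */
    int is_upstream; /* 1 for the peer this node itself connected to */
};

/* Returns the index of the peer to announce as failover, or -1 if the node
 * has no connected peers. `current` is the previously chosen index, or -1. */
int choose_failover(const struct neighbor *n, size_t count, int current) {
    size_t i;
    /* Rule 1: the upstream peer, if there is one, is always the failover. */
    for (i = 0; i < count; i++)
        if (n[i].fd >= 0 && n[i].is_upstream) return (int)i;
    /* Rule 2: without an upstream, stick with the earlier choice
     * for as long as it stays connected. */
    if (current >= 0 && (size_t)current < count && n[current].fd >= 0)
        return current;
    /* Rule 3: otherwise pick a new downstream peer (here: the first one). */
    for (i = 0; i < count; i++)
        if (n[i].fd >= 0) return (int)i;
    return -1;
}
```

Whenever the returned index differs from the previous one, the node would send a fresh failover message to every connected peer other than the chosen one.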

Load Balancing

The rebalance message serves to allow nodes to shed downstream peers to other nodes. When a node receives a rebalance message, it must connect to the specified peer, and upon success, disconnect from the current upstream peer.
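The ordering matters here: the new connection is made first, and the old upstream is dropped only once that succeeds, so a bad target never leaves the node disconnected. A minimal sketch of that ordering follows. The switch_upstream function is hypothetical, it uses a plain blocking connect, and it assumes the addr field is in network byte order.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct peer {
    in_addr_t addr;      /* assumed network byte order */
    unsigned short port; /* network byte order */
};

/* Try to move to the peer named in a rebalance message.
 * Returns 0 if *upstream_fd now refers to the new peer,
 *        -1 if the target was unreachable and the old upstream was kept. */
int switch_upstream(int *upstream_fd, const struct peer *target) {
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = target->addr;
    sa.sin_port = target->port;      /* already in network byte order */
    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) != 0) {
        close(fd);                   /* keep the current upstream */
        return -1;
    }
    /* A full node would send its "peer <port>" command on fd here. */
    if (*upstream_fd >= 0) close(*upstream_fd);
    *upstream_fd = fd;
    return 0;
}
```

In an event-driven node a blocking connect would stall message forwarding, so a real implementation would more likely use a non-blocking connect and complete the switch from its event loop.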

If your node has direct connections to more than 3 nodes, make it try to shed some of them to its downstream peers. Warning: not every peer will be reachable at the specified IP and port. Before sending a rebalance message, verify that the target peer is indeed reachable.
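One way to verify reachability is a short probe connection with a timeout, sketched below using a non-blocking connect and poll. The peer_reachable function is a hypothetical helper and assumes the addr field is in network byte order. Keep in mind that a successful probe only shows the target is reachable from this node, which does not guarantee it is reachable from the peer being shed.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

struct peer {
    in_addr_t addr;      /* assumed network byte order */
    unsigned short port; /* network byte order */
};

/* Returns 1 if a TCP connection to `target` completes within timeout_ms
 * milliseconds, 0 otherwise. The probe connection is closed immediately. */
int peer_reachable(const struct peer *target, int timeout_ms) {
    struct sockaddr_in sa;
    struct pollfd pfd;
    int err = 0, ok = 0;
    socklen_t len = sizeof err;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 0;
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = target->addr;
    sa.sin_port = target->port;

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) == 0) {
        ok = 1;                               /* connected at once */
    } else if (errno == EINPROGRESS) {
        pfd.fd = fd;
        pfd.events = POLLOUT;                 /* writable = connect finished */
        if (poll(&pfd, 1, timeout_ms) == 1 &&
            getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
            err == 0)
            ok = 1;                           /* finished without error */
    }
    close(fd);
    return ok;
}
```

The target sees the probe as a connection that closes without sending anything, so nodes should tolerate such connections on their listening port.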

Some test cases

Here are some things we will test.

  • two peers, a telnet client connected to each: chat messages are correctly forwarded.
  • three peers, a telnet client connected to each: chat messages are correctly forwarded.
  • three peers, a telnet client connected to each. The middle peer is killed: chat messages continue to be forwarded correctly.
  • three peers, last two connected to the first one. The first peer is killed: chat messages continue to be forwarded correctly.
  • four peers, last three connected to the first one. The initial failover peer is killed, then the first peer. Communication between the remaining peers continues as expected.
  • five peers, last four connected to the first peer. The fifth peer is told to rebalance to another peer: it disconnects and joins the other peer. Chat messages are forwarded as expected.

Reference server

Once you have a working implementation, try connecting to the reference server at IP address 131.193.34.70, port 8080. You may telnet to it directly, to act as a chat client, or connect your peer(s) to it. Verify that locally connected telnet clients are able to chat with telnet clients connected to the reference server.

Be aware that the reference server may issue rebalance requests and failover messages, and may disconnect abruptly. Rebalance and failover events may connect you to your classmates' peers, and chat messages will arrive from all peers connected to the network.

Turn-in

Use this link to turn in your homework via GitHub Classroom.