Load balancing & consistent hashing

LOAD BALANCING

Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool.
Modern high-traffic websites must serve hundreds of thousands, if not millions, of concurrent requests from users or clients and return the correct text, images, video, or application data, all in a fast and reliable manner. To cost-effectively scale to meet these high volumes, modern computing best practice generally requires adding more servers.
A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.
In this manner, a load balancer performs the following functions:
  • Distributes client requests or network load efficiently across multiple servers
  • Ensures high availability and reliability by sending requests only to servers that are online
  • Provides the flexibility to add or subtract servers as demand dictates

Load Balancing Algorithms

Different load balancing algorithms provide different benefits; the choice of load balancing method depends on your needs (a minimal sketch of the first two strategies follows this list):
  • Round Robin – Requests are distributed across the group of servers sequentially.
  • Least Connections – A new request is sent to the server with the fewest current connections to clients. The relative computing capacity of each server is factored into determining which one has the fewest connections.
  • IP Hash – The IP address of the client is used to determine which server receives the request.
  • Weighted Round Robin – Round robin, but servers with higher assigned weights receive proportionally more requests.
  • Weighted Least Connections – Least connections, with each server’s connection count scaled by its assigned weight.
  • Agent-Based Adaptive Load Balancing – An agent on each server reports its real-time load to the load balancer, which adjusts traffic accordingly.
  • Chained Failover (Fixed Weighted) – Requests go to the first server in a predetermined chain; if it becomes unavailable, traffic moves to the next server in the chain.
  • Weighted Response Time – Servers with faster response times receive a larger share of requests.
  • Software Defined Networking (SDN) Adaptive – Uses information combined from multiple network layers to adapt how traffic is distributed.
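
As a rough illustration of the first two strategies, here is a minimal sketch in Python. The Server class and its fields are hypothetical, invented for this example; real balancers such as NGINX or HAProxy implement these decisions in optimized native code.

    import itertools

    class Server:
        def __init__(self, name, weight=1):
            self.name = name
            self.weight = weight            # relative capacity
            self.active_connections = 0     # updated as requests start/finish

    class RoundRobinBalancer:
        """Hand out servers in a fixed, repeating order."""
        def __init__(self, servers):
            self._cycle = itertools.cycle(servers)

        def pick(self):
            return next(self._cycle)

    class LeastConnectionsBalancer:
        """Pick the server with the fewest connections relative to capacity."""
        def __init__(self, servers):
            self.servers = servers

        def pick(self):
            return min(self.servers,
                       key=lambda s: s.active_connections / s.weight)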

Session Persistence

Information about a user’s session is often stored locally in the browser. For example, in a shopping cart application the items in a user’s cart might be stored at the browser level until the user is ready to purchase them. Changing which server receives requests from that client in the middle of the shopping session can cause performance issues or outright transaction failure. In such cases, it is essential that all requests from a client are sent to the same server for the duration of the session. This is known as session persistence.
The best load balancers can handle session persistence as needed. Another use case for session persistence is when an upstream server stores information requested by a user in its cache to boost performance. Switching servers would cause that information to be fetched a second time, creating performance inefficiencies.
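
One simple way to implement persistence is to derive the server choice from a stable client identifier, so the same client always reaches the same server. Here is a minimal sketch; the identifier used is the client IP, but a session cookie works the same way:

    import hashlib

    def sticky_pick(servers, client_ip):
        """Always map the same client IP to the same server index.
        Caveat: if the server list changes, most mappings change too --
        the consistent hashing section below addresses exactly that."""
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return servers[int(digest, 16) % len(servers)]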

Dynamic Configuration of Server Groups

Many fast-changing applications require new servers to be added or taken down on a constant basis. This is common in environments such as the Amazon Elastic Compute Cloud (EC2), which enables users to pay only for the computing capacity they actually use, while at the same time ensuring that capacity scales up in response to traffic spikes. In such environments it greatly helps if the load balancer can dynamically add or remove servers from the group without interrupting existing connections.

Hardware vs. Software Load Balancing

Load balancers typically come in two flavors: hardware-based and software-based. Vendors of hardware-based solutions load proprietary software onto the machine they provide, which often uses specialized processors. To cope with increasing traffic at your website, you have to buy more or bigger machines from the vendor. Software solutions generally run on commodity hardware, making them less expensive and more flexible. You can install the software on the hardware of your choice or in cloud environments like AWS EC2.

Consistent Hashing
To distribute requests among servers using consistent hashing, HAProxy takes a hash of part of the request (in our case, the part of the URL that contains the video ID), and uses that hash to choose an available backend server. With traditional “modulo hashing”, you simply consider the request hash as a very large number. If you take that number modulo the number of available servers, you get the index of the server to use. It’s simple, and it works well as long as the list of servers is stable. But when servers are added or removed, a problem arises: the majority of requests will hash to a different server than they did before. If you have nine servers and you add a tenth, only one-tenth of requests will (by luck) hash to the same server as they did before.
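
The scale of that reshuffling is easy to verify with a short simulation (a sketch, not HAProxy’s actual code; the key names are made up):

    import hashlib

    def server_index(key, n_servers):
        # Modulo hashing: treat the request hash as a big integer and
        # take it modulo the number of available servers.
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return h % n_servers

    keys = [f"video-{i}" for i in range(10_000)]
    unchanged = sum(server_index(k, 9) == server_index(k, 10) for k in keys)
    print(f"{unchanged / len(keys):.1%} of requests keep their old server")
    # Prints roughly 10%, matching the one-in-ten figure above.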
Then there’s consistent hashing. Consistent hashing uses a more elaborate scheme, where each server is assigned multiple hash values based on its name or ID, and each request is assigned to the server with the “nearest” hash value. The benefit of this added complexity is that when a server is added or removed, most requests will map to the same server that they did before. So if you have nine servers and add a tenth, about 1/10 of requests will have hashes that fall near the newly added server’s hashes, and the other 9/10 will have the same nearest server that they did before. Much better! So consistent hashing lets us add and remove servers without completely disturbing the set of cached items that each server holds. That’s a very important property when those servers are running in the cloud.
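
To make the scheme concrete, here is a minimal hash ring in Python. This is a sketch with arbitrary choices (MD5, 100 points per server); production implementations such as HAProxy’s differ in detail.

    import bisect
    import hashlib

    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, servers, points_per_server=100):
            # Give each server many positions ("virtual nodes") on the ring
            # so that keys spread more evenly across servers.
            self._ring = sorted((_hash(f"{s}#{i}"), s)
                                for s in servers
                                for i in range(points_per_server))
            self._hashes = [h for h, _ in self._ring]

        def pick(self, key):
            # Walk clockwise to the first server point at or after the key's
            # hash, wrapping around to the start of the ring.
            idx = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
            return self._ring[idx][1]

    # Adding a tenth server moves only about a tenth of the keys:
    old = HashRing([f"s{i}" for i in range(9)])
    new = HashRing([f"s{i}" for i in range(10)])
    keys = [f"video-{i}" for i in range(10_000)]
    moved = sum(old.pick(k) != new.pick(k) for k in keys)
    print(f"{moved / len(keys):.1%} of keys moved")   # ~10%, not ~90%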

Issues with Consistent Hashing
However, consistent hashing comes with its own problem: uneven distribution of requests. Because of its mathematical properties, even when requests are uniformly distributed, consistent hashing only balances load about as well as choosing a random server for each request. And if some content is much more popular than the rest (as is usual on the internet), it can be worse than that: consistent hashing will send all of the requests for that popular content to the same subset of servers, which will have the bad luck of receiving far more traffic than the others. This can result in overloaded servers, bad video playback, and unhappy users.
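
Using the HashRing sketch above with a made-up skewed workload, the effect is easy to reproduce:

    from collections import Counter

    # Half of all requests are for one very popular video, and every one
    # of those requests lands on the same server.
    ring = HashRing([f"s{i}" for i in range(10)])
    requests = ["video-hot"] * 5_000 + [f"video-{i}" for i in range(5_000)]
    load = Counter(ring.pick(k) for k in requests)
    print(load.most_common(3))  # one server carries over half the traffic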
