A Load Balancer (LB) distributes incoming requests and network traffic evenly across multiple servers.
LB placement
- between users and web servers
- between web servers and application servers
  - web server: handles HTTP requests from clients (usually web browsers), serves static content (HTML, CSS, JavaScript, images), and forwards dynamic content requests to an application server.
  - application server: manages and executes business logic and dynamic content generation for client requests. Typically interacts with databases and other backend systems.
- between application servers and database servers
  - DB server: manages data storage, retrieval, and manipulation for applications. It handles database queries from application servers or clients.
Keywords
- load balancing algorithm: method used by LB to determine the distribution
- health checks: periodic checks sent by the LB to backend servers. Unhealthy servers are removed from the server pool until they recover.
- session persistence: ensures that subsequent requests from the same client are directed to the same backend server, maintaining session state and providing a consistent user experience.
- SSL/TLS termination: decrypting SSL/TLS encrypted traffic at the LB level, offloading the decryption burden from the backend servers, and allowing for centralized SSL/TLS management
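A minimal sketch of the health-check idea above. It assumes a hypothetical `probe` callable that returns True when a server answers correctly (in practice this might be an HTTP or TCP check); `ServerPool` and its method names are illustrative and not taken from any particular LB product.

```python
class ServerPool:
    """Keeps the full server list plus the subset currently considered healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)  # assume all servers start healthy

    def run_health_checks(self, probe):
        """Run one round of checks; `probe(server)` is a hypothetical health probe."""
        for server in self.servers:
            if probe(server):
                self.healthy.add(server)      # recovered servers rejoin the pool
            else:
                self.healthy.discard(server)  # failing servers stop receiving traffic

    def available(self):
        """Servers that a balancing algorithm may choose from, in configured order."""
        return [s for s in self.servers if s in self.healthy]
```

A real LB would run `run_health_checks` on a timer and usually require several consecutive failures before removing a server.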
Types of LB Algorithms
Round Robin Algorithm
distributes requests in cyclic order: the first request goes to the first server, the next to the second, then the third, and when the last server is reached, back to the first.
- (+): ensures equal distribution of requests among servers, works well when servers have similar capacities.
- (-):
- doesn’t take into account the current load/capacity of each server.
- subsequent requests from the same client may go to different servers. -> problem for stateful applications
- predictable distribution pattern -> an attacker who learns the server order may be able to anticipate which server handles a given request
- Use Cases: suitable for homogeneous environments & stateless applications
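The cyclic assignment can be sketched as follows. This is a minimal illustration, not a production implementation; `RoundRobinBalancer` and the server names are made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed cyclic order, ignoring their current load."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = cycle(servers)  # wraps back to the first server after the last

    def pick(self):
        """Return the next server in the cycle."""
        return next(self._cycle)


rr = RoundRobinBalancer(["s1", "s2", "s3"])
order = [rr.pick() for _ in range(5)]  # s1, s2, s3, s1, s2
```

Note that the same client lands on different servers across requests, which is the stateful-application problem listed under (-).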
Least Connections Algorithm
dynamic LB technique: assigns each incoming request to the server with the fewest active connections at the time of the request
- (+):
- load awareness
- dynamic distribution: helps prevent bottlenecks
- efficient in heterogeneous environments because it allocates requests to less busy servers.
- (-):
- higher complexity: requires real-time monitoring of active connections.
- state maintenance: the LB has to maintain the state of active connections -> more overhead
- potential for connection spikes: when connections are short-lived, a server closes them rapidly, looks like it has "less load", and keeps picking up new requests -> a spike on that server. Because server load can change very quickly, this situation requires frequent rebalancing to distribute the load effectively.
- if the capacities of the servers vary significantly, it might be beneficial to use weighted least connections (explained later).
- Use Cases
- heterogeneous environment
- applications with variable traffic patterns
- stateful applications: Since sessions in stateful applications can persist for a long time (as users interact with the system), balancing the number of active connections across servers is crucial to prevent a single server from becoming a bottleneck.
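A minimal sketch of least connections, assuming the LB is told when a connection opens and closes. The class and method names (`acquire`, `release`) are illustrative. The `active` dictionary is the per-server state mentioned under (-).

```python
class LeastConnectionsBalancer:
    """Tracks active connections per server and picks the least busy one."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        # State the LB must maintain: active connection count per server.
        self.active = {server: 0 for server in servers}

    def acquire(self):
        """Pick the server with the fewest active connections and count the new one."""
        # On a tie, min() returns the earliest-listed server.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection closes so the count stays accurate."""
        if self.active[server] > 0:
            self.active[server] -= 1
```

If `release` is never called (for example, because closed connections go unnoticed), the counts drift and the algorithm degrades, which is why real-time monitoring matters here.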
Weighted Round Robin (WRR)
Round Robin + a weight assigned to each server based on its capacity or performance; incoming requests are distributed in proportion to these weights. >> stronger servers handle a larger share of the load
- (-):
- complexity in weight assignment
- increased overhead for managing & updating weights
- not ideal for highly variable loads -> static weights do not reflect current server load, so WRR may not always provide optimal load balancing
- Use Case
- heterogeneous server environments: different processing capabilities
- scalable web applications: suitable for web applications where different servers may have varying performance characteristics.
- database clusters: useful in database clusters where some nodes have higher processing power and can handle more queries.
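One simple way to sketch WRR is to repeat each server in the schedule as many times as its weight and then cycle through that schedule. This is an assumption made for illustration (integer weights, naive expansion); real balancers often interleave picks more smoothly so that a heavy server does not receive long bursts.

```python
from itertools import cycle


def weighted_round_robin(weights):
    """Return an endless iterator of servers, proportional to integer weights.

    `weights` maps server name -> positive integer weight (hypothetical values).
    """
    if not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be non-empty positive integers")
    # A server with weight 3 appears 3 times per cycle.
    schedule = [server for server, w in weights.items() for _ in range(w)]
    return cycle(schedule)


picker = weighted_round_robin({"big": 3, "small": 1})
picks = [next(picker) for _ in range(8)]  # "big" gets 6 of 8 requests
```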
Weighted Least Connections (Least Connections + WRR)
takes into account each server's current load (active connections) + relative capacity (weight)
- (+):
- dynamic load balancing: adjusts based on real-time connection counts
- capacity awareness
- flexibility
- (-):
- state maintenance: need to keep track of active connections & server weights -> increased overhead
- determining weight assignment can be challenging
- Use Cases
- good for environments where servers have different processing capacities and workloads
- high-traffic web applications, to avoid bottlenecks
- useful in database clusters where nodes have varying performance capabilities and query loads.
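A common way to combine the two signals is to pick the server with the lowest ratio of active connections to weight. The sketch below assumes that scoring rule; the class name and the example weights are hypothetical.

```python
class WeightedLeastConnectionsBalancer:
    """Picks the server with the lowest active_connections / weight ratio."""

    def __init__(self, weights):
        if not weights or any(w <= 0 for w in weights.values()):
            raise ValueError("weights must be non-empty and positive")
        self.weights = dict(weights)                 # relative capacity per server
        self.active = {server: 0 for server in weights}  # current load per server

    def acquire(self):
        """Choose the relatively least-loaded server and count the new connection."""
        server = min(self.active, key=lambda s: self.active[s] / self.weights[s])
        self.active[server] += 1
        return server

    def release(self, server):
        """Call when a connection closes."""
        if self.active[server] > 0:
            self.active[server] -= 1


wlc = WeightedLeastConnectionsBalancer({"big": 2, "small": 1})
first_three = [wlc.acquire() for _ in range(3)]  # big, small, big
```

With weight 2, "big" ends up holding twice as many connections as "small" before their ratios are equal.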
IP Hash
assigns requests based on the client's IP address. A hash function converts the IP address into a hash value, which is then used to determine which server should handle the request.
This ensures that requests from the same client IP address are consistently routed to the same server, providing session persistence.
- (+):
- session persistence
- no state to manage
- predictable
- (-):
- possibly uneven distribution
- adding/removing servers can disrupt hash mapping
- inefficiency: no consideration of current load or server capacity
- Use Cases
- when clients are distributed across different regions and consistent routing is required.
- applications that need session persistence, such as online shopping carts and user sessions.
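A minimal sketch of IP hashing, assuming a simple "hash modulo number of servers" mapping. `pick_server_by_ip` is an illustrative name. It uses `hashlib` rather than Python's built-in `hash()`, because `hash()` on strings is randomized per process and would not give a stable mapping across restarts.

```python
import hashlib


def pick_server_by_ip(client_ip, servers):
    """Map a client IP to a server; the same IP always maps to the same server
    as long as the server list does not change."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap to the list size.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["s1", "s2", "s3"]
chosen = pick_server_by_ip("203.0.113.7", servers)
```

Because the index depends on `len(servers)`, adding or removing a server changes the mapping for many clients, which is the disruption listed under (-).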
Least Response Time
directs incoming requests to the server with the lowest response time.
- How it works
(1) Monitors the response time of each server
(2) When a new request arrives, the LB assigns it to the server with the lowest average response time.
(3) Dynamic adjustment: the LB dynamically adjusts the assignment of requests based on real-time performance data.
- (+): optimizes performance and resource utilization; adapts dynamically to load.
- (-):
- complex to implement and requires continuous monitoring
- overhead costs from dynamically adjusting + monitoring
- short-term variability: response times can vary in the short term due to network fluctuations or transient server issues, potentially causing frequent and unnecessary rebalancing.
- Use Cases
- applications where low latency and fast response times are critical like online gaming, video streaming, or financial trading platforms.
- suitable for environments with fluctuating loads and varying server performance.
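The three steps above can be sketched as follows. As an assumption for this example, the "average" is an exponentially weighted moving average (EWMA) with a hypothetical smoothing factor `alpha`, which dampens the short-term variability mentioned under (-); a plain running mean would also fit the description.

```python
class LeastResponseTimeBalancer:
    """Routes to the server with the lowest smoothed response time."""

    def __init__(self, servers, alpha=0.2):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self.alpha = alpha                        # weight given to the newest sample
        self.avg_ms = {s: None for s in servers}  # None = not measured yet

    def record(self, server, response_ms):
        """Step 1: fold a measured response time into the server's average."""
        prev = self.avg_ms[server]
        if prev is None:
            self.avg_ms[server] = float(response_ms)
        else:
            self.avg_ms[server] = (1 - self.alpha) * prev + self.alpha * response_ms

    def pick(self):
        """Steps 2 and 3: choose the currently fastest server.

        Unmeasured servers count as 0 ms so that they get tried first.
        """
        return min(
            self.avg_ms,
            key=lambda s: self.avg_ms[s] if self.avg_ms[s] is not None else 0.0,
        )
```

A smaller `alpha` reacts more slowly to spikes, trading responsiveness for stability.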
Random
Works as the name suggests: each incoming request is assigned to a randomly chosen server.
- (+):
- simple to implement, no state maintenance
- low overhead costs
- uniform distribution over time
- (-):
- no load awareness, potential for imbalance in the short term
- no session affinity/persistence: requests from the same user can be directed to different servers.
- Security systems that rely on detecting anomalies (e.g., to mitigate DDoS attacks) might find it slightly more challenging to identify malicious patterns if a Random algorithm is used, due to the inherent unpredictability in request distribution. This could potentially dilute the visibility of attack patterns.
- Use Cases
- homogeneous environments: servers with similar capacity & performance
- stateless apps: requests are handled independently
- when more complex LB algorithms are simply not needed (= not justifiable).
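A minimal sketch of random selection. `pick_random` is an illustrative name; the optional `rng` parameter is an assumption added so that the choice can be made reproducible in tests.

```python
import random


def pick_random(servers, rng=random):
    """Return a uniformly random server; no state is kept between calls."""
    if not servers:
        raise ValueError("at least one server is required")
    return rng.choice(servers)


servers = ["s1", "s2", "s3"]
server = pick_random(servers)
```

Over many requests the counts per server even out, but a short run of requests can easily be lopsided, which is the short-term imbalance listed under (-).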
Least Bandwidth
Routes each new request to the server that is consuming the least amount of bandwidth at the time.
- (+):
- dynamic LB
- efficient resource utilization by balancing bandwidth usage.
- (-):
- requires continuous monitoring of bandwidth usage
- monitoring + dynamic LB -> increased overhead
- short-term variability: bandwidth usage fluctuates in the short term, potentially causing frequent rebalancing.
- Use Cases
- High-bandwidth applications like video streaming, file downloads, and large data transfers
- Content Delivery Networks (CDNs): useful for CDNs that need to balance traffic efficiently to deliver content quickly.
- Real-time applications: e.g. YouTube, PUBG, Zoom
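A minimal sketch of least bandwidth, assuming some monitoring component periodically reports each server's current throughput in Mbps. The class, the `report` method, and the unit are assumptions made for the example.

```python
class LeastBandwidthBalancer:
    """Routes each new request to the server currently using the least bandwidth."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self.mbps = {s: 0.0 for s in servers}  # latest reported throughput per server

    def report(self, server, mbps):
        """Called by the (hypothetical) monitor with a fresh measurement."""
        self.mbps[server] = float(mbps)

    def pick(self):
        """Return the server with the lowest last-reported bandwidth usage."""
        return min(self.mbps, key=self.mbps.get)
```

The decision is only as fresh as the last `report` call, so a long reporting interval can send several requests to a server that is no longer the least loaded.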
Custom Load
highly configurable: distribution is driven by user-defined metrics and rules
- How it works
(1) Define custom metrics (e.g. CPU/memory usage, disk I/O etc)
(2) Implement Monitoring (integrate monitoring tools)
(3) Create Load Balancing Rules (e.g. simple weighted sum)
(4) Dynamic Adjustment
- Use Cases: complex apps, highly dynamic env, custom requirements
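The "simple weighted sum" rule from step (3) can be sketched as follows. The metric names, the weights in `METRIC_WEIGHTS`, and the assumption that every metric is normalized to the range 0 to 1 are all hypothetical choices for illustration.

```python
# Hypothetical importance of each metric; values are assumed to sum to 1.
METRIC_WEIGHTS = {"cpu": 0.5, "memory": 0.3, "disk_io": 0.2}


def load_score(metrics, weights=METRIC_WEIGHTS):
    """Weighted sum of normalized metrics (0.0 = idle, 1.0 = saturated)."""
    return sum(weights[name] * metrics[name] for name in weights)


def pick_server(server_metrics, weights=METRIC_WEIGHTS):
    """Return the server with the lowest combined load score."""
    if not server_metrics:
        raise ValueError("at least one server is required")
    return min(server_metrics, key=lambda s: load_score(server_metrics[s], weights))


# Example snapshot, as a monitoring tool from step (2) might provide it.
snapshot = {
    "a": {"cpu": 0.9, "memory": 0.5, "disk_io": 0.1},
    "b": {"cpu": 0.2, "memory": 0.6, "disk_io": 0.4},
}
best = pick_server(snapshot)  # "b": score 0.36 vs 0.62 for "a"
```

Step (4), dynamic adjustment, would amount to refreshing `snapshot` (and possibly the weights) as new monitoring data arrives.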