CAP Theorem
What is the CAP Theorem?
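The CAP theorem, also known as Brewer’s theorem, is a fundamental concept in distributed system design. Eric Brewer, a professor at U.C. Berkeley, introduced it as a conjecture in 2000 in his keynote at the Symposium on Principles of Distributed Computing (PODC). Seth Gilbert and Nancy Lynch published a formal proof in 2002.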
The CAP theorem states that a distributed system can guarantee at most two of three properties at the same time: consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the theorem really describes the trade-off between consistency and availability while a partition is in effect.
Remember, when a partition occurs you can't keep both consistency and availability, so you need to decide in advance which one matters most for your system.
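To make the trade-off concrete, here is a minimal sketch of a single replica that has lost contact with its peers. The `Mode` enum, the `Replica` class, and the `partitioned` flag are hypothetical names made up for this illustration, not part of any real database API.

```python
from enum import Enum


class Mode(Enum):
    CP = "prefer consistency"
    AP = "prefer availability"


class Replica:
    """A single replica that may be cut off from its peers (illustrative model)."""

    def __init__(self, mode, value):
        self.mode = mode
        self.value = value          # last value this replica knows about
        self.partitioned = False    # True while peers are unreachable

    def read(self):
        if self.partitioned and self.mode is Mode.CP:
            # CP choice: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable: cannot confirm latest value")
        # AP choice (or no partition): always answer, possibly with stale data.
        return self.value


cp = Replica(Mode.CP, value=500)
ap = Replica(Mode.AP, value=500)
cp.partitioned = ap.partitioned = True

print(ap.read())  # 500 (may be stale)
try:
    cp.read()
except RuntimeError as err:
    print(err)
```

The AP replica keeps answering during the partition, while the CP replica returns an error until it can confirm the latest value again.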

Consistency
A system is said to be consistent if all nodes see the same data at the same time.
In a consistent system, once a write has been acknowledged it is visible on every node, so there is no confusion or discrepancy when multiple nodes serve the same data at the same time.
Put simply, a read on a consistent system returns the value of the most recent write, no matter which node serves the request.
Example:
Imagine you have a distributed database system with multiple nodes, and you want to add points to a user's account. To maintain consistency, the system ensures that all nodes have the same updated points for that user.
Here's how consistency works in this scenario:
- Initial State: Initially, all nodes in the distributed system have the same points. Let's say the user has 500 points.
- Update: You initiate a request to update the user's point balance to 1000 points.
- Consistent Behavior: In a consistent system, after the update request is processed and acknowledged, all nodes in the system immediately reflect the change. So, all nodes now have 1000 points for that user.
- Read: If you perform a read operation on any node after the update, it will return 1000 points, because consistency ensures that all nodes are synchronized and have the same data.
In summary, consistency ensures that all nodes in the distributed system provide the same, up-to-date information, even when data is updated or accessed concurrently. It eliminates the possibility of one node returning 500 points while another returns 1000 points during a read operation after the update.
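The points scenario above can be sketched as follows. This is a simplified, in-memory illustration that assumes synchronous replication: the write is acknowledged only after every node has applied it. The `PointsNode` and `ConsistentCluster` classes and the user name `alice` are hypothetical.

```python
class PointsNode:
    """One replica holding user point balances (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.points = {}

    def apply(self, user, points):
        self.points[user] = points

    def read(self, user):
        return self.points[user]


class ConsistentCluster:
    """Acknowledges a write only after every node has applied it."""

    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, user, points):
        for node in self.nodes:
            node.apply(user, points)
        return "ack"  # returned only once all replicas are updated


nodes = [PointsNode("a"), PointsNode("b"), PointsNode("c")]
cluster = ConsistentCluster(nodes)

cluster.write("alice", 500)    # initial state
cluster.write("alice", 1000)   # update

print([node.read("alice") for node in nodes])  # [1000, 1000, 1000]
```

Because the acknowledgement waits for all replicas, a read on any node after the update returns 1000. The cost is that a single unreachable node would block the write in a real system.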
Availability
Availability in a distributed system means that the system stays up and responsive: every request sent to a working node receives a non-error response, no matter the current state of the other nodes.
For example, in a highly available system, even if some nodes are temporarily offline or experiencing issues, the system as a whole continues to function, and users can still interact with it without disruptions.
However, it's important to note that availability alone doesn't guarantee that the response will contain the most recent data or write. It simply means that the system is responsive and accessible.
Example:
Imagine an e-commerce platform that relies on a distributed database to manage its product catalog and customer orders. In this scenario, Availability is a critical requirement. The primary goal is to ensure that customers can always access the website, browse products, and place orders, even in the face of potential network issues or node failures.
In this example, the e-commerce platform prioritizes Availability by ensuring that customers can continue shopping and using the platform even in the presence of network partitions or node failures. During such disruptions, there may be temporary data inconsistencies, but the system focuses on returning to a consistent state once the network issues are resolved.
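One way to picture this is a client that falls back to another replica when the first one is unreachable. The sketch below is illustrative only: `CatalogNode`, `read_with_failover`, and the product prices are made-up names and values, and the replica is assumed to hold a slightly older price than the primary.

```python
class CatalogNode:
    """A replica of the product catalog (illustrative)."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = dict(catalog)
        self.online = True

    def get_price(self, product):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.catalog[product]


def read_with_failover(nodes, product):
    """Return the first answer from any reachable node."""
    for node in nodes:
        try:
            return node.get_price(product)
        except ConnectionError:
            continue  # try the next replica
    raise ConnectionError("no replica reachable")


primary = CatalogNode("primary", {"mug": 12})
replica = CatalogNode("replica", {"mug": 10})  # has not received the latest price yet

primary.online = False
print(read_with_failover([primary, replica], "mug"))  # 10
```

The customer still gets an answer while the primary is down, but the answer is the older price. That is the temporary inconsistency an availability-first system accepts.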
Partition Tolerance
Partition Tolerance refers to a distributed system's ability to continue functioning even in the presence of network partitions or communication failures between different components or nodes of the system. Partition tolerance is a critical property for systems that need to operate reliably in a networked environment, where network connections can be unreliable or may temporarily fail.
Example:
Imagine a system used by an e-commerce company to manage customer orders and product information. This system is designed to ensure that order placement and product data remain accessible, even during network issues or server problems.
- Normal Operation: On regular days, customers can place orders, check order statuses, and view product details on the e-commerce website. The system operates smoothly, with order and product data kept up to date.
- Network Interruption: Suddenly, a network interruption occurs due to a technical glitch or a temporary issue with the server hosting the system. Some parts of the system become temporarily disconnected from the rest of the network.
- Partition Tolerance: Partition tolerance is vital in this system. It means that even when some parts of the system are cut off from the network due to the interruption, the remaining parts of the system continue to function. Customers can still place orders and access product information, but they might experience delays or partial access due to the network issue.
- Server Recovery: While some parts of the system continue to function during the network interruption, the company's IT team works to resolve the network problem and restore full connectivity.
- Synchronization: Once the network issue is fixed, the system begins synchronizing data across all parts. It ensures that order and product information are consistent throughout the system.
In this example, partition tolerance ensures that even during a network interruption, the order and product details system continues to provide essential functionality, such as order placement and limited access to product information. Once the network problem is resolved, data synchronization ensures that the entire system returns to a consistent state.
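The interruption and synchronization steps above can be sketched with two replicas that keep accepting writes while disconnected and then merge their data. This sketch assumes a simple last-write-wins rule based on version numbers, which is one possible reconciliation strategy and not necessarily what a given system uses. The `OrderReplica` class, the `synchronize` helper, and the version numbers are hypothetical.

```python
class OrderReplica:
    """Stores key -> (version, value); versions stand in for a logical clock."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        self.data[key] = (version, value)

    def merge_from(self, other):
        # Last-write-wins: keep whichever side has the higher version per key.
        for key, (version, value) in other.data.items():
            if key not in self.data or self.data[key][0] < version:
                self.data[key] = (version, value)


def synchronize(a, b):
    """Exchange data in both directions so the replicas converge."""
    a.merge_from(b)
    b.merge_from(a)


east, west = OrderReplica(), OrderReplica()

# Normal operation: both sides agree.
east.write("order-1", "placed", version=1)
synchronize(east, west)

# Network interruption: each side keeps accepting writes on its own.
east.write("order-1", "shipped", version=2)
west.write("order-2", "placed", version=3)
print(east.data == west.data)  # False, the replicas have diverged

# Synchronization once the network is restored.
synchronize(east, west)
print(east.data == west.data)  # True
```

Both replicas stayed usable during the interruption and hold identical data after the merge. Last-write-wins can silently drop one of two conflicting updates, so treat it as a simplification here.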
Conclusion
The conclusion of the CAP theorem is that a distributed system can't guarantee all three properties at the same time. Since network partitions are unavoidable in practice, the real choice is usually between consistency and availability while a partition lasts. The theorem helps you decide how your distributed system should behave based on your priorities.