What is High Availability?
by Iwan Price-Evans on High Availability • May 19, 2022
High availability refers to the ability of an application or service to continue operating despite failures within its environment.
High availability is important for any business that relies on technology. It helps ensure that critical applications will keep running even when there are problems in other parts of the system.
What Is Redundancy?
Redundancy is a common method for achieving high availability. Redundant applications, services, and technologies have backup systems that can maintain operation in the event of a failure.
High availability systems usually include redundancy in the following areas:
- Geographic locations (for example, multiple data centers)
- Power supplies
- Network connections
- Server hardware
- Data storage
- Compute resources
- Application services (for example, load balancers)
- Software components
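Simple arithmetic shows why redundancy helps. The sketch below rests on a simplifying assumption that the article does not make: redundant copies fail independently of each other. Under that assumption, a group of n copies is unavailable only when every copy is down at the same time, so its availability is 1 - (1 - a)^n. Components that must all be up at once (in series, such as power, network, and server) multiply their availabilities. The per-component figure of 0.99 is hypothetical.

```python
def parallel_availability(a: float, n: int) -> float:
    """Availability of n redundant copies, each with availability a.

    Assumes independent failures: the group is down only if every copy is down.
    """
    return 1 - (1 - a) ** n


def series_availability(*parts: float) -> float:
    """Availability of a chain in which every part must be up at the same time."""
    result = 1.0
    for a in parts:
        result *= a
    return result


if __name__ == "__main__":
    single = 0.99  # hypothetical availability of one server
    for n in (1, 2, 3):
        print(f"{n} redundant server(s): {parallel_availability(single, n):.6f}")
    # A request path that needs power, network, and a server all up at once.
    print(f"power -> network -> server: {series_availability(0.999, 0.999, single):.6f}")
```

In practice, failures are often correlated (for example, two servers that share one power feed or one data center), so the real gain is smaller than this independent-failure estimate. That is why redundancy is spread across several of the layers listed above.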
What Are The Types Of Redundancy?
There are different types of redundancy, with different relationships between primary and backup systems.
- 1+1 Redundancy (one-to-one) features one backup system for every primary system.
- 1+N Redundancy (more commonly written N+1) features one backup system shared by any number of primary systems in a cluster.
- 2+N Redundancy (N+2) features two backup systems for any number of primary systems in a cluster, so the cluster can tolerate two simultaneous failures instead of one.
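The difference between these schemes can be quantified under the same independent-failure assumption as before. A cluster of N primaries plus k backups (k = 1 for 1+1 and 1+N, k = 2 for 2+N) keeps full capacity as long as no more than k nodes are down at the same time. The sketch below adds up those cases with the binomial distribution. It also assumes identical nodes and that any backup can replace any primary. The 0.99 per-node figure is hypothetical.

```python
from math import comb


def cluster_availability(primaries: int, backups: int, node_availability: float) -> float:
    """Probability that at least `primaries` of the primaries + backups nodes are up.

    Assumes identical, independently failing nodes and that any backup can stand in
    for any primary (1+1: primaries=1, backups=1; 1+N: backups=1; 2+N: backups=2).
    """
    total = primaries + backups
    p_up = node_availability
    p_down = 1 - p_up
    # Sum over every tolerable number of failed nodes: 0, 1, ..., backups.
    return sum(
        comb(total, failed) * p_down**failed * p_up ** (total - failed)
        for failed in range(backups + 1)
    )


if __name__ == "__main__":
    a = 0.99  # hypothetical availability of a single node
    for backups in (0, 1, 2):
        print(f"4 primaries + {backups} backup(s): {cluster_availability(4, backups, a):.6f}")
```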
Active and Passive Systems
- Active-Active Redundancy features primary and backup systems that are both active and serve clients simultaneously and interchangeably. In this configuration, if one system fails, its clients are moved to the remaining active system or systems. This has the advantage of fast failover, but it requires more active resources, because the surviving systems need enough spare capacity to absorb the failed system's load.
- Active-Passive Redundancy features an active primary with a passive backup. In this configuration, only the active system serves clients. If the active system fails, the backup system activates and all clients are migrated to it. This has the advantage of lower resource costs, but it requires time to initialize the backup system and risks losing clients' session state.
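The two configurations differ mainly in how traffic is assigned to nodes. The toy model below is hypothetical: the `Node` class, its `healthy` flag, and the `Router` class are illustrative and do not come from any product's API. In active-active mode, requests rotate round-robin across every healthy node. In active-passive mode, all requests go to the first healthy node in priority order, so the passive node receives traffic only after the primary is marked down.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import count


@dataclass
class Node:
    name: str
    healthy: bool = True


class Router:
    """Toy request router contrasting active-active and active-passive failover."""

    def __init__(self, nodes: list[Node], mode: str) -> None:
        if mode not in ("active-active", "active-passive"):
            raise ValueError(f"unknown mode: {mode}")
        self.nodes = nodes  # in active-passive mode, list order is priority order
        self.mode = mode
        self._counter = count()

    def pick(self) -> Node:
        healthy = [n for n in self.nodes if n.healthy]
        if not healthy:
            raise RuntimeError("no healthy nodes available")
        if self.mode == "active-active":
            # Every healthy node serves traffic; rotate through them.
            return healthy[next(self._counter) % len(healthy)]
        # Active-passive: the highest-priority healthy node takes all traffic.
        return healthy[0]


if __name__ == "__main__":
    primary, backup = Node("primary"), Node("backup")
    aa = Router([primary, backup], "active-active")
    ap = Router([primary, backup], "active-passive")
    print([aa.pick().name for _ in range(4)])  # alternates between both nodes
    print([ap.pick().name for _ in range(2)])  # only the primary serves
    primary.healthy = False  # simulate a failure of the primary
    print([aa.pick().name for _ in range(2)], [ap.pick().name for _ in range(2)])
```

This model leaves out the backup's initialization delay and the possible loss of session state during active-passive failover. Both depend on the application.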
What Is A High Availability Service Level Agreement?
A high availability service level agreement (SLA) is a contract to provide a certain level of uptime, usually expressed as a percentage. A common target for high availability SLAs is 99.999%, otherwise known as "five nines".
You can calculate the availability ratio by dividing the uptime by the total time. For example, if a system is down for 1 minute in a 30-day month:
- Total Time = 43,200 minutes
- Downtime = 1 minute
- Uptime = 43,199 minutes
- Ratio = 43,199 / 43,200 ≈ 0.99998 (about 99.998%)
In this example, availability falls just short of the common target of "five nines", which allows only about 26 seconds (0.432 minutes) of downtime in a 30-day month.
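The same calculation can be scripted, along with the monthly downtime budget for a few common targets. Like the example above, the sketch assumes a 30-day month (43,200 minutes). The function names are illustrative.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month


def availability(downtime_minutes: float, total_minutes: float = MINUTES_PER_MONTH) -> float:
    """Availability ratio: uptime divided by total time."""
    uptime = total_minutes - downtime_minutes
    return uptime / total_minutes


def downtime_budget(target: float, total_minutes: float = MINUTES_PER_MONTH) -> float:
    """Maximum downtime in minutes that an availability target (e.g. 0.99999) allows."""
    return (1 - target) * total_minutes


if __name__ == "__main__":
    ratio = availability(1)
    print(f"1 minute down per month: {ratio:.4%} available")
    print("Meets five nines:", ratio >= 0.99999)
    for target in (0.999, 0.9999, 0.99999):
        print(f"{target:.3%} allows {downtime_budget(target):.2f} minutes of downtime per month")
```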
What Are Replication Technologies?
To ensure high availability, applications must be able to tolerate failure without losing data. This requires replication technologies that copy data safely across multiple nodes, so that a surviving node still holds a current copy when another node fails.
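One widely used way to make replicated writes safe is to treat a write as committed only once a majority (a quorum) of replicas has stored it. Because any two majorities overlap, a later read from a majority always reaches at least one replica that holds the latest committed value. The in-memory sketch below illustrates that idea. The `Replica` and `QuorumStore` classes, the single-writer version counter, and the strict-majority rule are all illustrative assumptions, not a description of any specific replication product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict[str, tuple[int, str]] = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """Toy key-value store that commits a write only if a majority of replicas store it."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1  # strict majority
        self.version = 0  # single writer assigns increasing versions

    def write(self, key: str, value: str) -> bool:
        reachable = [r for r in self.replicas if r.up]
        if len(reachable) < self.quorum:
            return False  # refuse rather than leave a partial, uncommitted write behind
        self.version += 1
        for r in reachable:
            r.data[key] = (self.version, value)
        return True

    def read(self, key: str) -> str | None:
        responses = [r.data.get(key) for r in self.replicas if r.up]
        if len(responses) < self.quorum:
            raise RuntimeError("not enough replicas reachable for a quorum read")
        found = [resp for resp in responses if resp is not None]
        # Majorities overlap, so the highest version seen is the latest committed write.
        return max(found)[1] if found else None


if __name__ == "__main__":
    nodes = [Replica("a"), Replica("b"), Replica("c")]
    store = QuorumStore(nodes)
    print(store.write("user:1", "alice"))     # all three replicas are up
    nodes[0].up = False                       # one node fails
    print(store.write("user:1", "alice-v2"))  # two of three is still a majority
    print(store.read("user:1"))
    nodes[1].up = False                       # a second failure leaves no majority
    print(store.write("user:1", "alice-v3"))
```

The trade-off is that the store stops accepting writes once a majority is unreachable. It gives up some availability to avoid losing or diverging data.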
What Is Disaster Recovery?
A disaster recovery plan is a set of procedures designed to help a business recover quickly after a major disruption. It typically includes measures such as keeping copies of data at several different locations, so that if one location fails, another can take over.
What Is Business Continuity Planning?
Business continuity planning (BCP) is a process used by businesses to prepare for potential disruptions to business operations. It includes identifying risks and developing strategies to mitigate those risks. BCP also involves creating plans to recover quickly after a disruption occurs.
Does Snapt Help Ensure High Availability?
Yes. Snapt Nova provides multi-location and multi-cloud load balancing with health checks. If one location, cloud, or server fails, degrades in performance, or reaches capacity, Snapt Nova redistributes traffic to the best alternative nodes, usually without losing session state.
Snapt Nova is itself highly available: it can generate dynamic application services on demand and self-heal in seconds if the underlying infrastructure fails.
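The general pattern behind health-check-driven load balancing can be sketched as follows. This is a generic illustration under stated assumptions, not Snapt Nova's implementation or API. Each backend is taken out of rotation after a hypothetical number of consecutive failed checks (`FAIL_THRESHOLD`). A session stays on its node while that node remains healthy (session affinity). New sessions, and sessions whose node has failed, go to the healthy backend with the most spare capacity in any location. The `Backend` and `Balancer` classes and their fields are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

FAIL_THRESHOLD = 3  # hypothetical: consecutive failed checks before a node is taken out


@dataclass
class Backend:
    name: str
    location: str  # e.g. a data center or cloud region (illustrative)
    capacity: int  # maximum concurrent sessions (illustrative)
    active: int = 0
    failed_checks: int = 0

    @property
    def healthy(self) -> bool:
        return self.failed_checks < FAIL_THRESHOLD

    def record_check(self, ok: bool) -> None:
        # One passing check resets the counter; failures accumulate.
        self.failed_checks = 0 if ok else self.failed_checks + 1

    def has_room(self) -> bool:
        return self.active < self.capacity


class Balancer:
    """Generic health-check-aware balancer with session affinity (not Snapt Nova's API)."""

    def __init__(self, backends: list[Backend]) -> None:
        self.backends = backends
        self.sessions: dict[str, Backend] = {}  # session id -> backend it is pinned to

    def route(self, session_id: str) -> Backend:
        pinned = self.sessions.get(session_id)
        if pinned is not None and pinned.healthy:
            return pinned  # keep the session on the node that holds its state
        candidates = [b for b in self.backends if b.healthy and b.has_room()]
        if not candidates:
            raise RuntimeError("no healthy backend with spare capacity")
        # Prefer the node with the most spare capacity, in any location.
        best = max(candidates, key=lambda b: b.capacity - b.active)
        best.active += 1
        if pinned is not None:
            pinned.active -= 1  # the session has left its failed node
        self.sessions[session_id] = best
        return best


if __name__ == "__main__":
    a = Backend("a", "cloud-1", capacity=2)
    b = Backend("b", "cloud-2", capacity=2)
    lb = Balancer([a, b])
    print(lb.route("s1").name, lb.route("s2").name, lb.route("s1").name)
    for _ in range(FAIL_THRESHOLD):
        a.record_check(ok=False)  # cloud-1 stops answering health checks
    print(lb.route("s1").name)  # s1 is redistributed to the surviving node
```

Moving a session to another node preserves its state only if that state is shared or replicated between nodes, which this sketch does not model.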