RTO and RPO in Disaster Recovery Planning

The terms RTO and RPO are often thrown about in the tech world. But, like many acronyms, they are often misunderstood. RTO (Recovery Time Objective) and RPO (Recovery Point Objective) are key factors in planning database backups and disaster recovery. People think of them as technical terms, and there is nothing wrong with that. However, it is important to think of them in a business sense as well.

Let’s talk about each of these terms and discuss what they mean in a technical as well as a business sense.

RTO: Recovery Time Objective

A standard definition of RTO is the duration of time in which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in business continuity.

Let’s break that down a bit further. Let’s say that your business says the website (or some other process) can be down for 30 minutes before you start to lose serious money and damage the brand. (This is just an example; some websites, of course, cannot afford to be down for 30 minutes.) What this means is that your RTO is 30 minutes. If a disaster strikes, you want to ensure that you are up and running within 30 minutes. It is important to think of these times from a business point of view first. The business unit must take into consideration the actual costs (loss of business), brand damage (will an outage make the news?), and operational costs (what is going to be required to get up and running?). Once a true cost is determined, the business can take this information and tell the technical team what the time limit is for each business system. Most business people will typically say ‘No downtime is acceptable’ or give a limit of ‘seconds’ to ‘minutes’.
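To make the cost side of that conversation concrete, here is a minimal sketch of the arithmetic in Python. The function name and every figure in it are hypothetical, and it only covers lost revenue plus a fixed recovery cost; brand damage is much harder to put a number on and is left out.

```python
def downtime_cost(minutes_down, revenue_per_hour, recovery_cost=0.0):
    """Rough outage cost: lost revenue plus a fixed cost to get running again.

    All inputs are illustrative assumptions, not figures from a real business.
    """
    lost_revenue = revenue_per_hour * minutes_down / 60
    return lost_revenue + recovery_cost


# Hypothetical site earning 12,000 per hour, with a 2,000 recovery cost.
for minutes in (5, 30, 240):
    cost = downtime_cost(minutes, revenue_per_hour=12_000, recovery_cost=2_000)
    print(f"{minutes:>3} minutes down: {cost:,.0f}")
```

Running the numbers for several outage lengths like this can help the business decide where the cost becomes unacceptable, and that point is a reasonable starting value for the RTO.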

From here we have to think in terms of what the business is asking of the technical people. If they say that RTO is seconds…what does this really mean? Typically it means expensive hardware and fully redundant systems that are ready to take over at all times. For business systems that can tolerate 30 minutes or an hour, this can mean something different. And for business systems that can tolerate 4+ hours, that’s something different again. Knowing the RTO tolerance for each system is important, as not all business systems are the same. For example, for an e-commerce website, you might want the RTO to be seconds, whereas for an internal HR website you might tolerate a few days.
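One way to keep track of per-system tolerances is a simple table of targets that recovery drills can be checked against. The sketch below is only an illustration: the system names, the target values, and the `meets_rto` helper are all made up for this example.

```python
from datetime import timedelta

# Hypothetical RTO targets, one per business system.
RTO_TARGETS = {
    "ecommerce_site": timedelta(seconds=30),
    "order_database": timedelta(minutes=30),
    "internal_hr_site": timedelta(days=2),
}


def meets_rto(system, measured_recovery):
    """Return True if a measured recovery time is within the system's RTO."""
    return measured_recovery <= RTO_TARGETS[system]


# Recovery times as they might be measured in a drill (made-up values).
print(meets_rto("internal_hr_site", timedelta(hours=6)))
print(meets_rto("ecommerce_site", timedelta(minutes=5)))
```

In this made-up drill, a six-hour recovery is fine for the HR site, while a five-minute recovery misses the target for the e-commerce site.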

A careful study of each system and its unique needs is required. Getting a system up and running after it is down is a challenge. Care and planning are needed to ensure that the organization can be restored quickly and effectively.

RPO: Recovery Point Objective

RPO is often defined as the maximum targeted period of time over which data might be lost as a result of a disaster.

Like RTO, this can vary by business as well as by business system within a company. How much data can be ‘lost’? Again, most people will say ‘Zero Data Loss’, and in many systems this can be achieved, but at a cost. Your systems will have to be designed for no loss in the event of an emergency, and such designs are often expensive.

There is a very large difference between losing 1 hour of data vs losing 10 minutes vs losing 1 minute. When designing the system it is important to think of these things. As the tolerance for data loss shrinks, the cost of ensuring a low RPO typically rises dramatically. IT departments must plan proper backups, disaster recovery plans, offsite IT systems, and a myriad of other factors. Many of these concerns can be alleviated with proper planning and by finding the proper balance of RTO and RPO for your organization.
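As a rough illustration of how RPO relates to a backup schedule, the sketch below assumes simple periodic backups, where each backup captures the data as it was when the backup started. Under that assumption, a failure just before a backup finishes loses everything since the previous backup started. The function names and the schedule values are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval, backup_duration=timedelta(0)):
    """Worst-case data loss for periodic backups.

    Assumes each backup captures the state at the moment it starts, so a
    failure just before a backup completes loses one full interval plus the
    time that backup had been running.
    """
    return backup_interval + backup_duration


def meets_rpo(rpo, backup_interval, backup_duration=timedelta(0)):
    """Return True if the schedule's worst-case loss is within the RPO."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


# Hypothetical schedule: hourly backups that take 10 minutes to complete.
loss = worst_case_data_loss(timedelta(hours=1), timedelta(minutes=10))
print(loss)
print(meets_rpo(timedelta(hours=1), timedelta(hours=1), timedelta(minutes=10)))
```

With these made-up numbers, an hourly backup does not quite satisfy a one-hour RPO, because the time the backup itself takes counts against the target.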

