Designing for Uncertainty: Lessons from COVID-19 and what comes next

by Bethany Hendricks on DevOps • July 3, 2020

In an upcoming series of blog articles we analyse the lessons to be learned from COVID-19 and how to design for uncertainty. The series tackles five main topics, which are introduced below. To follow the series “Designing for Uncertainty” please subscribe to our blog.


If there is one thing to be learned from the COVID-19 pandemic, it is that there is very little we can be certain about. Over the last few months, in both our personal and professional lives, we have likely all had to make some really tough decisions, many of which we would not or could not have predicted. Our ability to handle these decisions depends on our capacity to react well to unpredictable events and to be resilient to future uncertainty. Moving forward in the post-COVID-19 landscape means treating resilience to uncertainty as a key factor in design. As this blog series will show, one proven way of designing for uncertainty is to adopt an agile methodology, such as the DevOps methodology, that allows for speed, continuous deployment, continuous integration and automation.

Designing for Scalability: lessons for changing buyer behavior

While this pandemic saw many industries reach never-before-seen levels of commerce (let’s not forget the mad rush for toilet paper), others such as hospitality had the rug ripped out from under them. These drastic changes in global consumer behavior happened in a matter of weeks, or in some cases, days.

Figure: Statista TP revenue chart (Source: Statista)

As detailed by EY in their article “Eight ways to keep up with your customers during and after COVID-19”, COVID-19 has been a reminder of how quickly the ground can shift beneath your feet, demanding equally rapid, dynamic and agile action to adapt to changing demands and restrictions. 

The key lesson from these unpredictable shifts is that systems need to be capable of handling a situation in which rapid growth or rapid shrinkage in different parts of a business could happen at any time and in any combination. System design needs to account for scalability and flexibility in order to cope with these kinds of strains. Static servers and fixed software licenses no longer serve the needs of businesses dealing with unpredictable changes in demand and traffic. Looking to the future, cloud storage and compute (public, private or hybrid) with scalable, automated applications (and scalable licenses to match) will provide the best solutions to the problem of scalability.
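As a rough illustration of what scaling in both directions means in practice, the sketch below sizes a service from its current load instead of from a fixed server count. It is a minimal example, not a description of any particular cloud provider's autoscaler: the function name `desired_replicas`, the per-replica capacity of 100 requests per second, and the minimum and maximum bounds are all hypothetical values chosen for illustration.

```python
import math


def desired_replicas(requests_per_second: float,
                     capacity_per_replica: float,
                     min_replicas: int = 1,
                     max_replicas: int = 50) -> int:
    """Return how many replicas are needed to serve the given load.

    All parameters are illustrative assumptions: capacity_per_replica is
    the load one instance can handle, and the min/max bounds stand in
    for a cost ceiling and an availability floor.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Round up so that a partially loaded replica is still provisioned.
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # Clamp to the configured floor and ceiling.
    return max(min_replicas, min(max_replicas, needed))


if __name__ == "__main__":
    # Demand spikes, then collapses; capacity follows in both directions.
    for load in (120, 2400, 30):
        print(load, desired_replicas(load, capacity_per_replica=100))
```

The point of the sketch is that capacity is recomputed from demand each time, so a sudden rise or a sudden fall in traffic is handled by the same rule, and costs shrink along with the load.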

Designing for Distributed Systems: lessons for digital workplaces

Not only have consumers’ needs changed, but employees’ needs have changed as well. As can be seen in the graph below, a Gartner CFO survey reveals that nearly a quarter of respondents intend to shift at least 20% of employees to remote work permanently.

Figure: Gartner CFO Survey

Remote working has rapidly evolved from a privilege to a norm, and travel bans have prevented many from travelling for in-person business meetings. The concept of digital workplaces encompasses not only the shift to decentralized staff and Zoom meetings but the total digital transformation of communication, collaboration, operational systems and procedures to ensure optimal performance and efficiency. 

The challenges caused by an immediate need to shift to remote working include whether there is enough remote desktop capacity, whether to rely on public clouds for data sharing and storage (with the accompanying questions around data security), and how to effectively track staff engagement and output. 

The key lesson here is that these challenges need to be solved in a way that allows not only for an immediate shift to remote working but also an immediate shift back to centralized working, rapid shifts between the two models, and the combination of centralized and decentralized working simultaneously.

Those who had previously resisted digital transformation, or who still rely on static infrastructure that can cope only with predictable patterns, face an urgent requirement to adopt flexible software solutions that can cope with a constantly changing, dynamic work environment.

Designing for Agility: lessons for economic opportunity

With the global economy suffering the consequences of COVID-19 and national lockdowns, there has been a large focus on cutting costs, reducing staff and stripping back to the bare necessities. But what happens when an opportunity arises and a business needs to take action quickly? How do you ensure that you are not only making quick decisions but also making durable decisions that will serve you in the future? 

IT leaders have needed to respond urgently and make decisions in haste during the unfolding of COVID-19. Decisions made in such an environment need to be future-proof, taking into account the needs of tomorrow, without the upfront expenditure typically required by future planning. The ability to rapidly scale business operations – and for costs to scale alongside – is something that organisations need to consider when designing their operations in the age of uncertainty.

Unpredictable events such as COVID-19 required businesses and state agencies to pivot rapidly. For example, production lines were re-purposed to build ventilators or to produce testing kits in vast quantities, and clothing manufacturers switched over to mask manufacturing. Without flexibility in systems and operations, organizations are unable to act quickly to take advantage of new opportunities.

The key lesson here is that underlying systems must be designed with the capacity for rapid re-purposing and redirection, and must avoid costly long-term investment in things that cannot be changed easily. 

A concept originating in manufacturing but becoming more relevant in application delivery is the Just in Time (JIT) methodology. In its manufacturing roots, it refers to a method of streamlining workflows and shortening response times in order to reduce holding costs or, more simply, making what’s needed when it’s needed. From a software perspective, this methodology translates into investing in people and systems that can react quickly to unforeseen events and provide immediate solutions, rather than investing in pre-emptive solutions that could fail. In application delivery, a JIT methodology is enabled by adopting distributed cloud-based systems, microservices architectures, systems for rapid testing and deployment, and system-wide observability.

Designing for Automation and Intelligence: lessons for human resources

Due to the agility needed to survive the COVID-19 pandemic, organizations are realizing the importance of strong cross-functional IT, development and operations teams. The problem that these organizations face, however, is that vacancies for roles such as DevOps engineers are notoriously hard to fill and skilled employees even harder to retain – a problem that is only becoming more difficult as demand increases.

Another issue at play with high staff turnover in this space is what is known as “tribal knowledge”. Tribal knowledge is something seen often in software development, as explained by Snapt CEO Dave Blakey: “Tribal knowledge comes from setups that are custom and not well documented. A single developer (or small group) creates a system that requires a large amount of knowledge to maintain and manage. The stack, the custom scripts, the code to run it, the infrastructure, etc. This causes an issue where it takes an extremely long time for a new person to manage it and work on it, or to diagnose problems.” Not only does this cause challenges when staff leave, but during a pandemic such as COVID-19, should employees be off sick, the knowledge those employees hold is unavailable at that point in time, which creates a huge amount of risk.

The key lesson here is that systems must be designed to cope with a shortage of specialist staff or the loss of key individuals. Adopting a DevOps methodology is one way of beginning to solve the issue of tribal knowledge, as it shifts organizations away from this behavior and towards automation and repeatability, while breaking down the walls between departments and encouraging collaboration. The adoption of machine learning and AI technologies can help with automated decision making on large and complex data sets, reducing the dependence on key individuals with expert knowledge of company systems.

Designing for Resilience: lessons for disaster mitigation and recovery

Most organizations, big or small, have long had a disaster mitigation and recovery plan – though this might have seemed like a box-checking exercise until COVID-19 appeared. Now that some of these plans have been tested in a real global disaster, any weaknesses in these plans will have been exposed.

Unpredictable events can wreak havoc on an organization in many ways, and sometimes they cannot be prevented but can only be mitigated. The lockdowns in many countries rendered many businesses inactive and not viable. In another kind of disaster event, something such as the large-scale loss of power or connectivity could disable or cut off critical systems. It is not certain what the next global or even local disaster might be. 

The key lesson here is that it is vital for disaster planning that organizations design systems to cope not just with these examples but with the ones we cannot predict. This can be achieved by spreading risk. Ways of designing for resilience (spreading the risk) include: 1) avoiding fixed upfront costs and focusing instead on flexibility; 2) distributing systems and data globally, with N+1 redundancy; 3) achieving complete observability into all systems, to identify the remaining healthy parts of an organization so that resources can be shifted as needed.
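To make the N+1 redundancy mentioned above more concrete, the following is a minimal sketch of the arithmetic behind it: provision enough nodes to carry peak load (N), plus one spare, and check that losing any single node still leaves enough capacity. The function names and the example figures (a peak load of 900 units and nodes that each carry 300) are assumptions made for illustration only.

```python
import math
from typing import Sequence


def nodes_for_n_plus_1(peak_load: float, node_capacity: float) -> int:
    """Return the node count for N+1 redundancy.

    N is the smallest number of equal-sized nodes that can carry
    peak_load; one extra node is added as the spare.
    """
    if node_capacity <= 0:
        raise ValueError("node_capacity must be positive")
    n = math.ceil(peak_load / node_capacity)
    return n + 1


def survives_single_failure(node_capacities: Sequence[float],
                            peak_load: float) -> bool:
    """True if peak_load can still be served after the largest node fails.

    Checking the largest node covers the worst case for a single failure.
    """
    if not node_capacities:
        return False
    remaining = sum(node_capacities) - max(node_capacities)
    return remaining >= peak_load


if __name__ == "__main__":
    count = nodes_for_n_plus_1(peak_load=900, node_capacity=300)
    print(count, survives_single_failure([300] * count, peak_load=900))
```

The same check can be run per region or per data center when systems are distributed globally, so that the spare capacity is not concentrated in the one place that fails.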

Recent events have highlighted something we already knew: businesses need to be agile, flexible and able to pivot at a moment's notice. This is why a DevOps methodology is a vital element of business practice today. When no one can predict what the next “pandemic” will bring and what response will be required, it is the agile and resilient organizations that will survive and thrive under severe pressure.

What is a DevOps methodology and what makes it so vital in designing for uncertainty?

We have referred to the DevOps methodology as a solution to the challenges facing us during times of uncertainty. So let’s take a look at what this really means.

AWS defines DevOps as “the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.” 

Expanding on this definition, DevOps involves breaking down the silos of development and operations to improve collaboration and efficiency through continuous delivery, continuous deployment, continuous integration and automation.

Whether or not your organization has traditionally been rooted in software, growing global uncertainty has made an agile DevOps methodology increasingly essential to survival and growth.

We will continue to explore these challenges and lessons learned from COVID-19, and how to design for uncertainty, in a series of blog posts.

To follow this series please subscribe to our blog.